Data Processing Module

Get unique values

Description:
For a given array, extract the unique values for a given field or fields.

{
    "type": "data_processing",
    "action": "unique_values",
    "args": {
        "source": "$data.people",
        "fields": ["name"],
        "rows": [1, 2, 3, 4, 5],
        "target": "$data.names"
    }
}

The result is an array of the distinct values found for that field in the collection.

The ‘rows’ property is optional. When it is omitted, the action applies to the entire source. When it is specified, the action targets only the records at the given indexes in the source data. The same holds wherever the ‘rows’ property appears in this document.
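The behaviour described above can be sketched in Python. This is an illustration only, not the module's implementation: the function name `unique_values`, the zero-based row indexes, and the result shape (one list of distinct values per field, in first-seen order) are all assumptions.

```python
def unique_values(source, fields, rows=None):
    """Collect the distinct values of each field (hypothetical helper).

    rows is optional: when given, only the records at those indexes are read.
    """
    indexes = range(len(source)) if rows is None else rows
    result = {field: [] for field in fields}
    for index in indexes:
        record = source[index]
        for field in fields:
            value = record.get(field)
            if value not in result[field]:
                result[field].append(value)
    return result


people = [
    {"name": "John"},
    {"name": "Jane"},
    {"name": "John"},
    {"name": "Andre"},
]

# Without rows, every record is inspected.
all_names = unique_values(people, ["name"])
# With rows, only the records at indexes 0 and 2 are inspected.
some_names = unique_values(people, ["name"], rows=[0, 2])
```

Here `all_names` holds the three distinct names, while `some_names` holds only the name found at the two requested indexes.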

Filter

Description:
Filter the collection to get a subset based on the filter intent.
A basic intent has three properties:

  1. field – what is the field name we are evaluating
  2. operator – what operator is used in the evaluation.
    • eq or = or ==
    • neq or !=
    • gt or >
    • ge or >=
    • lt or <
    • le or <=
    • is_null
    • not_null
    • like
    • not_like
    • contains
    • in
    • between
    • starts_with
    • ends_with
  3. value – what is the value we are checking for

The intent can also contain keywords for boolean expressions such as:

  1. not
  2. and
  3. or
  4. and the round brackets ( and )

The result of a filter intent is a collection of record index values.
For example [1, 29, …]. These define the record indexes in the source collection that match the filter expression.

Basic step structure

{
    "type": "data_processing",
    "action": "filter",
    "args": {
        "source": "$data.people",
        "intent": {
            "field": "code",
            "operator": "eq",
            "value": "abc"
        },
        "case_sensitive": true
    }
}

The data structure for intents is consistent across all operators, so we won’t provide an example for each one. The equals operator is shown above; the intents below show the operators that deviate from that structure.

intent: {field: "value", operator: "is_null"}

intent: {field: "value", operator: "not_null"}

intent: {field: "value", operator: "in", value: ["test1", "alpha", "test2"]}
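As a rough sketch of how such simple intents could be evaluated, the Python below handles only "eq", "is_null", "not_null" and "in". The helper names (`matches`, `filter_indexes`), the zero-based indexes, and the way `case_sensitive` is interpreted are assumptions made for illustration, not the module's actual code.

```python
def _fold(value):
    """Lower-case strings so they compare without regard to case."""
    return value.lower() if isinstance(value, str) else value


def matches(record, intent, case_sensitive=True):
    """Evaluate one simple intent against one record (illustrative only)."""
    operator = intent["operator"]
    actual = record.get(intent["field"])

    # is_null and not_null carry no value property.
    if operator == "is_null":
        return actual is None
    if operator == "not_null":
        return actual is not None

    expected = intent["value"]
    if not case_sensitive:
        # Assumed behaviour: string comparisons ignore case.
        actual = _fold(actual)
        if operator == "in":
            expected = [_fold(item) for item in expected]
        else:
            expected = _fold(expected)

    if operator in ("eq", "=", "=="):
        return actual == expected
    if operator == "in":
        return actual in expected
    raise ValueError(f"operator not covered by this sketch: {operator}")


def filter_indexes(source, intent, case_sensitive=True):
    """Return the indexes of the records that satisfy the intent."""
    return [
        index
        for index, record in enumerate(source)
        if matches(record, intent, case_sensitive)
    ]


records = [
    {"value": "alpha"},
    {"value": None},
    {"value": "test1"},
    {"value": "Beta"},
]

null_rows = filter_indexes(records, {"field": "value", "operator": "is_null"})
not_null_rows = filter_indexes(records, {"field": "value", "operator": "not_null"})
in_rows = filter_indexes(
    records,
    {"field": "value", "operator": "in", "value": ["test1", "alpha", "test2"]},
)
beta_rows = filter_indexes(
    records,
    {"field": "value", "operator": "eq", "value": "beta"},
    case_sensitive=False,
)
```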

Logical operators

You can also use the logical operators “and” and “or” as part of an intent expression.

"intent": {
    "operator": "or",
    "expressions": [
        { "field": "value", "operator": "eq", "value": "alpha" },
        { "field": "code", "operator": "eq", "value": "a2" }
    ]
}

In these cases, the operator is the logical operator, and instead of a value property the intent has an expressions collection property. This collection holds multiple simple expressions that are combined with the logical operator to define the intent.

{
    "type": "data_processing",
    "action": "filter",
    "args": {
        "source": "$data.people",
        
        "intent": {
            "operator": "or",
            "expressions": [
                { "field": "value", "operator": "eq", "value": "alpha" },
                { "field": "code", "operator": "eq", "value": "a2" }
            ]          
        },
        
        "case_sensitive": true
    }
}
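One way to picture the evaluation of a logical intent is as a recursive walk: a logical operator evaluates each entry in its expressions collection and combines the results. The Python below is a minimal sketch under that assumption. The function names are hypothetical, only "eq" is supported for simple expressions, and nesting logical intents inside each other is assumed to be allowed.

```python
def evaluate(record, intent):
    """Evaluate a simple or logical intent against one record (sketch)."""
    operator = intent["operator"]

    if operator in ("and", "or"):
        # Logical intents carry an expressions collection instead of a value.
        results = (evaluate(record, expression) for expression in intent["expressions"])
        return all(results) if operator == "and" else any(results)

    if operator == "eq":
        return record.get(intent["field"]) == intent["value"]

    raise ValueError(f"operator not covered by this sketch: {operator}")


def filter_indexes(source, intent):
    """Return the indexes of the records that satisfy the intent."""
    return [index for index, record in enumerate(source) if evaluate(record, intent)]


people = [
    {"value": "alpha", "code": "a1"},
    {"value": "beta", "code": "a2"},
    {"value": "gamma", "code": "a3"},
    {"value": "alpha", "code": "a2"},
]

expressions = [
    {"field": "value", "operator": "eq", "value": "alpha"},
    {"field": "code", "operator": "eq", "value": "a2"},
]

or_rows = filter_indexes(people, {"operator": "or", "expressions": expressions})
and_rows = filter_indexes(people, {"operator": "and", "expressions": expressions})
```

With "or", a record matches when either expression holds; with "and", only the record that satisfies both expressions is returned.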

Grouping data

Grouping is done using the group intent.

{
    "type": "data_processing",
    "action": "group",
    "args": {
        "source": "$data.people",
        "intent": ["lastName", "firstName"],
        "rows": [1, 2, 3, 4]
    }
}

The intent involves grouping records based on a collection of field names, where each unique value in the collection leads to a group formation. For instance, individuals sharing the same last name are grouped together. Further, those with both identical last names and first names are grouped into a more specific category. This method of grouping is depicted in a hierarchical structure, with the leaf nodes of this structure representing the indexes of records that match the specified grouping criteria.

The resulting hierarchical structure is an object with the following properties:

  1. id: This is the unique UUID assigned to the group.
  2. value: Represents the unique value defining the group, such as a first name in this context.
  3. child_count: Indicates the total number of records or subgroups contained within this group.
  4. children: A collection of subgroups, for instance, groups of first names associated with a particular last name.

When you get to the last group record, you will not have a children property but instead a rows collection. This is a collection of indexes for the rows that make up the group breakdown.

The group result object always starts with a property called “root” that holds the first level of grouping objects as its children. The value of the root is also “root”.

{
    "root": {
        "id": "e051372b-8e9b-450d-870b-64fd04c7444f",
        "value": "root",
        "child_count": 3,
        "children": [
            {
                "id": "...",
                "value": "Smith",
                "child_count": 2,
                "children": [
                    {
                        "id": "...",
                        "value": "John", 
                        "child_count": 1,
                        "rows": [0]
                    },
                    {
                        "id": "...",
                        "value": "Jane",
                        "child_count": 1,
                        "rows": [1]
                    }
                ]
            },
            
            {
                "id": "...",
                "value": "Jones", 
                .... structure for the Joneses ...
            },
            
            {
                "id": "...",
                "value": "Pieters",
                ... structure for the Pieters ....
            }
        ]
    }
}
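The structure above could be produced by a recursive routine along the following lines. This Python sketch is an assumption about how the hierarchy might be built, not the module's implementation: the function names are hypothetical, groups are emitted in first-seen order, and a leaf's child_count is taken to be its number of rows, as the example suggests.

```python
import uuid


def build_group(source, rows, fields, value="root"):
    """Build one group node, recursing once per remaining grouping field."""
    if not fields:
        # Last grouping level: store the matching row indexes.
        return {
            "id": str(uuid.uuid4()),
            "value": value,
            "child_count": len(rows),
            "rows": rows,
        }

    field, remaining = fields[0], fields[1:]
    buckets = {}
    for index in rows:
        buckets.setdefault(source[index].get(field), []).append(index)

    children = [
        build_group(source, bucket, remaining, key)
        for key, bucket in buckets.items()
    ]
    return {
        "id": str(uuid.uuid4()),
        "value": value,
        "child_count": len(children),
        "children": children,
    }


def group(source, fields, rows=None):
    """Group the whole source, or only the given rows, by the listed fields."""
    indexes = list(range(len(source))) if rows is None else list(rows)
    return {"root": build_group(source, indexes, fields)}


people = [
    {"lastName": "Smith", "firstName": "John"},
    {"lastName": "Smith", "firstName": "Jane"},
    {"lastName": "Jones", "firstName": "Amy"},
]

result = group(people, ["lastName", "firstName"])
smith = result["root"]["children"][0]
jones = result["root"]["children"][1]
```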

Sort

Sorting is defined using a collection of strings, each combining a field name and a direction separated by a colon, for example “age:ace” and “age:dec”.

{
    "type": "data_processing",
    "action": "sort",
    "args": {
        "source": "$data.people",
        "intent": ["lastName:ace", "age:dec"]
    }
}

The result is a collection of indexes giving the order in which the source records would appear once sorted. For example: [3, 4, 2, 1]
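A sort intent of this form could be applied roughly as follows. The Python sketch assumes that a “dec” direction means descending and that any other direction means ascending; the function name `sort_indexes` and the zero-based indexes are likewise assumptions. It relies on Python's stable sort, applying the least significant field first.

```python
def sort_indexes(source, intent):
    """Return record indexes in sorted order without reordering the source."""
    indexes = list(range(len(source)))
    # Sort by the last field first; because the sort is stable, earlier
    # fields in the intent end up taking precedence.
    for item in reversed(intent):
        field, _, direction = item.partition(":")
        # Assumption: "dec" sorts descending, anything else ascending.
        indexes.sort(key=lambda i: source[i][field], reverse=(direction == "dec"))
    return indexes


people = [
    {"lastName": "Smith", "age": 30},
    {"lastName": "Jones", "age": 25},
    {"lastName": "Smith", "age": 45},
    {"lastName": "Jones", "age": 52},
]

order = sort_indexes(people, ["lastName:ace", "age:dec"])
```

Here `order` lists the two Jones records before the two Smith records, with the older person first inside each last name.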

Get perspective

You can combine any of the above intents into a single perspective call.

{
    "type": "data_processing",
    "action": "get_perspective",
    "args": {
        "source": "$data.people",      
        "intent": {
            "filter": { "field": "lastName", "operator": "eq", "value": "Doe" },
            "sort": ["age:dec"]
        }        
    }
}

When a filter intent is set, it is applied first. The remaining operations then work only on the filtered results, not on the entire source dataset: the collection is sorted next and grouped last.
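The ordering described above can be illustrated with a small Python sketch. It covers only the filter and sort stages (grouping is left out for brevity), supports only the "eq" operator, and assumes that “dec” means descending. The function name `perspective_rows` is hypothetical.

```python
def perspective_rows(source, intent):
    """Apply the filter first, then sort only the rows that survived it."""
    rows = list(range(len(source)))

    filter_intent = intent.get("filter")
    if filter_intent is not None:
        # This sketch only understands the "eq" operator.
        rows = [
            index
            for index in rows
            if source[index].get(filter_intent["field"]) == filter_intent["value"]
        ]

    # Stable sort, least significant field first.
    for item in reversed(intent.get("sort", [])):
        field, _, direction = item.partition(":")
        rows.sort(key=lambda i: source[i][field], reverse=(direction == "dec"))

    return rows


people = [
    {"lastName": "Doe", "age": 30},
    {"lastName": "Smith", "age": 50},
    {"lastName": "Doe", "age": 41},
    {"lastName": "Doe", "age": 19},
]

rows = perspective_rows(
    people,
    {
        "filter": {"field": "lastName", "operator": "eq", "value": "Doe"},
        "sort": ["age:dec"],
    },
)
```

The Smith record never reaches the sort stage, so the result contains only the indexes of the Doe records, ordered from oldest to youngest.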

Aggregate

Calculate the following aggregates for numeric fields in a collection of data:

  1. sum
  2. min
  3. max
  4. ave
  5. count

{
    "type": "data_processing",
    "action": "aggregate",
    "args": {
        "source": "$data.people",      
        "intent": ["age", "income"]
    }
}

The example above computes the aggregate values for the properties ‘age’ and ‘income’. The result is an object with one property per field, each holding the aggregates for that field. For instance, “result.age.min” is the minimum age.

{
    "age": {
        "sum": 400,
        "min": 20,
        "max": 60, 
        "ave": 40, 
        "count": 10
    },
    "income": {
        ... aggregates ...
    }
}
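For reference, the five aggregates could be computed as in the Python sketch below. It is illustrative only: the function name is hypothetical, every listed field is assumed to be present and numeric on each record, and the source is assumed to be non-empty.

```python
def aggregate(source, fields, rows=None):
    """Compute sum, min, max, ave and count for each listed numeric field."""
    indexes = range(len(source)) if rows is None else rows
    result = {}
    for field in fields:
        values = [source[index][field] for index in indexes]
        total = sum(values)
        result[field] = {
            "sum": total,
            "min": min(values),
            "max": max(values),
            "ave": total / len(values),
            "count": len(values),
        }
    return result


people = [
    {"age": 20, "income": 1000},
    {"age": 40, "income": 2000},
    {"age": 60, "income": 3000},
]

result = aggregate(people, ["age", "income"])
minimum_age = result["age"]["min"]
```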