group
Use the "group" directive to collect rows according to shared criteria. Use this directives sub-directives to perform convenient operations on each group.
Grouping rows in a worksheet based on common criteria is a powerful data processing technique. Users may group rows of data per specified criteria. Once rows are grouped, other operations can be applied to the group as an added convenience.
For example, consider a worksheet with columns "Account ID" and "Date". Use this directive to group the rows in this worksheet.
pebblestream:from(Accounts)
pebblestream:group("Account ID", "Date")
Grouping Feature
All rows with matching "Account ID" and "Date" values are put next to each other. This often results in easier data processing.
The following examples illustrate how group's sub directives are used.
coalesce
Here is the sub-directive coalesce declaration.
pebblestream:coalesce()
After group calculations are performed, coalesce each group into a single row.
Coalesce is not compatible with ratio or merge sub-directives.
sum
Use this directive to get the sum of the values in a column for rows grouped by criteria.
pebblestream:from(Amounts)
pebblestream:group("Last", "First")
pebblestream:sum("Amount"[, "Result"])
The "Amount" rows are summed together for all rows that have the same "Last" and "First" values. If "Result" is provided, a new column called "Result" is created with the value. Otherwise, the "Amount" column is updated with the result of the sum.
Sum up all transfers for the each person in the Transactions worksheet.
First | Last | Transfer |
---|---|---|
John | Jones | -60 |
Adam | Jones | 186 |
Cathy | Albertson | 48794 |
John | Jones | 545 |
Cathy | Albertson | 8777 |
Here is the directive that accomplishes this task.
pebblestream:from(Transactions)
pebblestream:group("Last", "First")
pebblestream:sum("Transfer")
pebblestream:coalesce()
This is the resulting worksheet.
First | Last | Transfer |
---|---|---|
John | Jones | 485 |
Adam | Jones | 186 |
Cathy | Albertson | 57571 |
If the sub-directive "coalesce" is not included, the following is the resulting worksheet:
First | Last | Transfer |
---|---|---|
John | Jones | 485 |
John | Jones | 485 |
Cathy | Albertson | 48794 |
Cathy | Albertson | 48794 |
Adam | Jones | 186 |
Computation Tip
It can be computationally convenient to not coalesce group results.
mult
pebblestream:mult("Factor"[, "Result"])
The "Factor" rows are multiplied together. If "Result" is provided, a new column called "Result" is created with the value. Otherwise, the "Factor" column is updated.
Discount ID | Account ID | Discount |
---|---|---|
1 | 1 | 0.9 |
2 | 2 | 0.8 |
3 | 1 | 0.95 |
Applyi the following directive declaration:
pebblestream:from(Discounts)
pebblestream:group("Account ID")
pebblestream:mult("Discount")
pebblestream:coalesce()
To obtain the resulting worksheet.
Account ID | Discount |
---|---|
1 | 0.855 |
2 | 0.8 |
Computation Tip
It can be computationally convenient to not coalesce group results.
equal
The equal sub-directive group's rows that have equal values in the provided column.
pebblestream:equal("Amount"[, "Result"])
The "Amount" rows are checked for whether they are all equal. If "Result" is provided, a new column called "Result" is created with the value. Otherwise, the "Amount" column is updated.
Requires group.
Consider this worksheet Logins
Account ID | Account User | # IP Address |
---|---|---|
1 | 2 | 127.0.38.9 |
1 | 1 | 127.0.5.8 |
1 | 1 | 127.0.5.8 |
1 | 2 | 127.0.48.2 |
2 | 1 | 127.0.59.12 |
User the following directive declaration:
pebblestream:from(Logins)
pebblestream:group("Account ID", "Account User")
pebblestream:equal("IP Address", "Same IP?")
pebblestream:coalesce()
to obtain the worksheet Same IP
Account ID | Account User | Same IP? |
---|---|---|
1 | 2 | FALSE |
1 | 1 | TRUE |
2 | 1 | TRUE |
ratio
The "ratio" sub-directive performs a typical operation amongst accountants and financial analysts. Given a group of numbers, determine the relative weight of each number in the group.
pebblestream:ratio("Amount"[, "Result"])
The "Amount" rows are summed together and the ratio of each row is calculated based on the total of the group's values. If "Result" is provided, a new column called "Result" is created with the row's group ratio. Otherwise, the "Amount" column is updated.
Important
The ratio directive cannot be used with coalesce.
Consider the following worksheet of Transactions
First | Last | Value |
---|---|---|
John | Jones | 3 |
Adam | Jones | 5 |
John | Jones | 1 |
Cathy | Albertson | -1 |
Cathy | Albertson | 1 |
John | Jones | 0 |
Using the following directive declaration
pebblestream:from(Transactions)
pebblestream:group("Last", "First")
pebblestream:ratio("Value", "Ratio")
will produce the following Ratio worksheet when placed in a comment in the header row.
First | Ratio |
---|---|
John | 0.75 |
John | 0.25 |
John | 0.0 |
Adam | 1.0 |
Cathy | 0 |
Cathy | 0 |
regex
Checks whether every row in "Name" matches the provided regular expression. If "Result" is provided, a new column called "Result" is created with the result. Otherwise, the "Name" column is updated.
pebblestream:regex("Name", "[0-9]*MyRegex.*"[, "Result"])
Consider the worksheet Transactions,
Account ID | Account User # | Amount | Tags |
---|---|---|---|
1 | 2 | 894 | home verified |
1 | 1 | 45981 | office verified |
1 | 1 | 15 | home |
1 | 2 | 5958 | verified home |
2 | 1 | 255 | verified mobile |
Determine whether every transaction for a given user was verified using the following directive declaration:
pebblestream:from(Transactions)
pebblestream:group("Account ID", "Account User #")
pebblestream:regex("Tags", ".*\\bverified\\b.*", "User always verified?")
By placing his declaration in the comment in the header section of the Verification Status worksheet to obtain the results below.
Account ID | Account User # | Amount | User Always Verified? |
---|---|---|---|
1 | 2 | 894 | TRUE |
1 | 2 | 5958 | TRUE |
1 | 1 | 45981 | FALSE |
1 | 1 | 15 | FALSE |
2 | 1 | 255 | TRUE |
Updated about 2 years ago