Internals of Group-Aggregate (Using Sorted Rows)

This document describes the structure and execution of group aggregate, implemented as the groupAggregate function.

Static ALU Block Structure

Group aggregate is similar to the primitive aggregate, but it categorizes the input items by group key. For each group key, a normal aggregate is performed.

The classification by group key is done by a sort. This group-aggregate primitive is therefore the block connected after a sort primitive. In other words, the input streams of this primitive are already sorted by the group-by key.
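Because the input is already sorted, every row of a group arrives back to back, so one pass with a single running aggregate is enough. The following is a minimal software sketch of that idea, not the library's groupAggregate implementation or signature. The `Row` type and the `groupSumSorted` function are hypothetical names introduced only for illustration, and a sum stands in for any of the supported aggregates.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row type for this sketch: one group-by key and one value.
struct Row {
    std::uint32_t key;
    std::int64_t value;
};

// Sum the values of each group in a single pass.
// Precondition (assumed, not checked): rows are sorted by key,
// so all rows of one group are adjacent.
std::vector<std::pair<std::uint32_t, std::int64_t>>
groupSumSorted(const std::vector<Row>& rows) {
    std::vector<std::pair<std::uint32_t, std::int64_t>> out;
    for (const Row& r : rows) {
        if (out.empty() || out.back().first != r.key) {
            // Key changed: the previous group is complete, start a new one.
            out.emplace_back(r.key, r.value);
        } else {
            // Same key as the previous row: fold into the running aggregate.
            out.back().second += r.value;
        }
    }
    return out;
}
```

If the input were not sorted, the same key could reappear later and would produce a second output entry, which is why the sort primitive has to come first.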

The supported primitives of group-aggregate are:
  • group_aggr_max
  • group_aggr_min
  • group_aggr_sum
  • group_aggr_count
  • group_aggr_cnz
  • group_aggr_mean
  • group_aggr_variance
  • group_aggr_normL1
  • group_aggr_normL2
Correspondingly, the related OPS are:
  • AOP_MAX
  • AOP_MIN
  • AOP_SUM
  • AOP_COUNT
  • AOP_COUNTNONZEROS
  • AOP_MEAN
  • AOP_VARIANCE
  • AOP_NORML1
  • AOP_NORML2

Caution

  • For the primitives group_aggr_sum, group_aggr_mean, group_aggr_variance, group_aggr_normL1, and group_aggr_normL2, a double is used to hold the intermediate sum. The initiation interval (II) of these primitives is therefore 15. For the other primitives, the II is 1.
  • The groupAggregate primitives support only one group-by key (indexing key). A SQL query, however, normally uses several indexing keys together. Therefore, before the group-aggregate block, a combine unit merges the indexing keys into one key, and the newly generated combined key is passed to the groupAggregate primitive.
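
One simple way to picture key combination is bit concatenation. The sketch below is an illustration only and does not reproduce the library's combine unit. The function name `combineKeys` and the fixed 32-bit key widths are assumptions made for this example.

```cpp
#include <cstdint>

// Hypothetical helper: pack two 32-bit group-by keys into one 64-bit key.
// k0 occupies the high 32 bits and k1 the low 32 bits.
inline std::uint64_t combineKeys(std::uint32_t k0, std::uint32_t k1) {
    return (static_cast<std::uint64_t>(k0) << 32) | k1;
}
```

With this packing, distinct key pairs always give distinct combined keys, and sorting by the combined key orders rows by k0 first and by k1 second, so a downstream single-key group aggregate sees each (k0, k1) group as one contiguous run.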