Joining arrays within group by clause

We have a problem grouping arrays into a single array. We want to combine the values of two columns into one array per row, and then aggregate those arrays across multiple rows into a single array.

Given the following input:

| id | name | col_1 | col_2 |
| 1  |  a   |   1   |   2   |
| 2  |  a   |   3   |   4   |
| 4  |  b   |   7   |   8   |
| 3  |  b   |   5   |   6   |

We want the following output:

| a | { 1, 2, 3, 4 } |
| b | { 5, 6, 7, 8 } |

The order of the elements is important and should correlate with the id of the aggregated rows.
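A minimal setup to reproduce the input above might look like the following. This is only a sketch: the varchar column types are an assumption inferred from the error message shown below, and the primary key is added for illustration.

```sql
-- Hypothetical setup for the sample input.
-- The varchar column types are an assumption based on the error message.
CREATE TABLE mytable (
    id    integer PRIMARY KEY,
    name  varchar,
    col_1 varchar,
    col_2 varchar
);

-- Rows for 'b' are inserted out of id order on purpose, as in the table above.
INSERT INTO mytable (id, name, col_1, col_2) VALUES
    (1, 'a', '1', '2'),
    (2, 'a', '3', '4'),
    (4, 'b', '7', '8'),
    (3, 'b', '5', '6');
```

The answers use shorter names (tbl, col1, col2 and t, n, c1, c2), so their queries need the identifiers adjusted to run against this table.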

We tried the array_agg() function:

SELECT name, array_agg(ARRAY[col_1, col_2]) FROM mytable GROUP BY name;

Unfortunately, this statement raises an error:

ERROR: could not find array type for data type character varying[]

It seems to be impossible to merge arrays in a group by clause using array_agg().

Any ideas?



Solution 1:[1]

UNION ALL

You could "unpivot" with UNION ALL first:

SELECT name, array_agg(c) AS c_arr
FROM  (
   SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
   UNION ALL
   SELECT name, id, 2, col2 FROM tbl
   ORDER  BY name, id, rnk
   ) sub
GROUP  BY 1;

Adapted to produce the order of values you later requested. The manual:

The aggregate functions array_agg, json_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work.

Bold emphasis mine.
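If "will usually work" is not strong enough, the ordering can be moved into the aggregate call itself, as the quoted passage describes. A sketch of that variant, reusing the table and column names from the query above:

```sql
-- Same unpivot as above, but the order is fixed inside array_agg()
-- instead of relying on the sort order of the subquery.
SELECT name, array_agg(c ORDER BY id, rnk) AS c_arr
FROM  (
   SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
   UNION ALL
   SELECT name, id, 2, col2 FROM tbl
   ) sub
GROUP  BY name;
```

This is typically somewhat slower than feeding pre-sorted rows, since each group is sorted separately, but the result order no longer depends on the query plan.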

LATERAL subquery with VALUES expression

LATERAL requires Postgres 9.3 or later.

SELECT t.name, array_agg(c) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
CROSS  JOIN LATERAL (VALUES (t.col1), (t.col2)) v(c)
GROUP  BY 1;

Same result. Only needs a single pass over the table.

Custom aggregate function

Or you could create a custom aggregate function, as discussed in related answers:

CREATE AGGREGATE array_agg_mult (anyarray)  (
    SFUNC     = array_cat
  , STYPE     = anyarray
  , INITCOND  = '{}'
);

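In Postgres 14 or later, array_cat is declared with anycompatiblearray, so use anycompatiblearray instead of anyarray in both places of the definition above.

Then you can: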

SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Or, typically faster, while not standard SQL:

SELECT name, array_agg_mult(ARRAY[col1, col2]) AS c_arr
FROM  (SELECT * FROM tbl ORDER BY name, id) t
GROUP  BY 1;

The ORDER BY id inside the aggregate call of the first query (which can be appended to such aggregate functions) guarantees your desired result:

a | {1,2,3,4}
b | {5,6,7,8}

Or you might be interested in this alternative:

SELECT name, array_agg_mult(ARRAY[ARRAY[col1, col2]] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Which produces 2-dimensional arrays:

a | {{1,2},{3,4}}
b | {{5,6},{7,8}}

The last one can be replaced (and should be, as it's faster!) with the built-in array_agg() in Postgres 9.5 or later - with its added capability of aggregating arrays:

SELECT name, array_agg(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Same result. The manual:

input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or null)

So not exactly the same as our custom aggregate function array_agg_mult().
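The difference shows up when the input arrays do not all have the same length. A small self-contained sketch using literal values (the names v, ord and a are made up for illustration):

```sql
-- Built-in array_agg() in Postgres 9.5 or later: inputs with identical
-- dimensions are stacked into an array of one higher dimension.
SELECT array_agg(a ORDER BY ord) AS agg
FROM  (VALUES (1, ARRAY[1,2]), (2, ARRAY[3,4])) v(ord, a);

-- Inputs of different lengths are rejected by the built-in array_agg();
-- this statement is expected to raise an error.
SELECT array_agg(a ORDER BY ord) AS agg
FROM  (VALUES (1, ARRAY[1,2]), (2, ARRAY[3])) v(ord, a);

-- array_cat(), the transition function behind array_agg_mult(),
-- simply concatenates, so unequal lengths are fine.
SELECT array_cat(ARRAY[1,2], ARRAY[3]) AS cat;
```

So array_agg_mult() remains useful when the arrays to merge may differ in length, or when a flat one-dimensional result is wanted.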

Solution 2:[2]

select n, array_agg(c) as c
from (
    select n, unnest(array[c1, c2]) as c
    from t
) s
group by n

Or simpler:

select
    n,
    array_agg(c1) || array_agg(c2) as c
from t
group by n

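The simpler variant puts all c1 values before all c2 values, so it cannot interleave the two columns row by row. To address the new ordering requirement: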

select n, array_agg(c order by id, o) as c
from (
    select
        id, n,
        unnest(array[c1, c2]) as c,
        unnest(array[1, 2]) as o
    from t
) s
group by n
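As an alternative to the two parallel unnest() calls, the position of each element can be taken from WITH ORDINALITY, which requires Postgres 9.4 or later. This is a sketch rather than part of the original answer. The inline VALUES list stands in for the table t so the query runs on its own, and the alias u is made up for illustration.

```sql
-- Stand-in for table t, using the sample rows from the question.
with t(id, n, c1, c2) as (
    values (1, 'a', 1, 2), (2, 'a', 3, 4), (4, 'b', 7, 8), (3, 'b', 5, 6)
)
select t.n, array_agg(u.c order by t.id, u.o) as c
from t
-- u.o is the 1-based position of each element within array[c1, c2]
cross join lateral unnest(array[t.c1, t.c2]) with ordinality as u(c, o)
group by t.n
order by t.n;
```

Sorting by id first and by the ordinality column second keeps c1 before c2 within each row, which is the order the question asks for.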

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2