'BigQuery: Aggregate multiple fields into array
I have some data where for each ID I want to aggregate two or more fields into an array, and I want them to match in terms of order.
So for example if I have the following data:
I want to turn it into this:
Alternatively, something like this would also be fine:
So firstly, if I were to use a query like this, would it do what I want or does it not guarantee that the two fields pull through in the same order (i.e. that the corresponding values in Value_1 and Value_2 may not match)?
SELECT
ID,
ARRAY_AGG (
Value_1
) AS Value_1,
ARRAY_AGG (
Value_2
) AS Value_2
FROM
table
GROUP BY
ID
If not, how can I go about doing this?
Solution 1:[1]
Use ARRAY_AGG
with STRUCT
if you want to pair the values together. For example,
SELECT
ID,
ARRAY_AGG (
STRUCT(Value_1, Value_2)
) AS Values
FROM
table
GROUP BY
ID;
Solution 2:[2]
... and for your alternative
ask:
SELECT
id,
ARRAY_AGG(CONCAT('[', Value_1, ',', Value_2, ']')) AS Values
FROM `yourTable`
GROUP BY id
Solution 3:[3]
Working solution of Array_Agg ()
with Python BigQuery:
!pip install -U google-cloud-bigquery
import pandas as pd
from google.cloud import bigquery
strvalue = """SELECT users ARRAY_AGG(STRUCT(session, page )) as hasComp FROM <datasetname>.<tableName> WHERE Group by users order by users limit 100 """
bigquery_client = bigquery.Client(project="")
dataset = bigquery_client.dataset("")
table = dataset.table('')
table.view_query_legacy_sql = False
query_job = bigquery_client.query(str_value)
df = query_job.to_dataframe()
print(df)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | New Alexandria |
Solution 2 | Mikhail Berlyant |
Solution 3 | Muhammad |