'BigQuery SQL JSON Returning additional rows when current row contains multiple values

I have a table that looks like this

keyA           | data:{"value":false}}
keyB           | data:{"value":3}}
keyC           | data:{"value":{"paid":10,"unpaid":20}}} 

For keyA,keyB I can easily extract a single value with JSON_EXTRACT_SCALAR, but for keyC I would like to return multiple values and change the key name, so the final output looks like this:

keyA           | false
keyB           | 3
keyC-paid      | 10
keyD-unpaid    | 20

I know I can use UNNEST and JSON_EXTRACT multiple values and create additional but unsure how to combine them to adjust the key column name as well?



Solution 1:[1]

Try this one:

WITH sample AS (
  SELECT 'keyA' AS col, '{"value":false}' AS data
   UNION ALL
  SELECT 'keyB' AS col, '{"value":3}' AS data
   UNION ALL
  SELECT 'keyC' AS col, '{"value":{"paid":10,"unpaid":20}}' AS data
)
SELECT col || IFNULL('-' || k, '') AS col,
       IFNULL(v, JSON_VALUE(data, '$.value')) AS data
  FROM (
    SELECT col, data, 
           `bqutil.fn.json_extract_keys`(JSON_QUERY(data, '$.value')) AS keys,
           `bqutil.fn.json_extract_values`(JSON_QUERY(data, '$.value')) AS vals
      FROM sample
  ) LEFT JOIN UNNEST(keys) k WITH OFFSET ki 
    LEFT JOIN UNNEST(vals) v WITH OFFSET vi ON ki = vi;

enter image description here

Solution 2:[2]

Even more generic approach

create temp function  extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
  """;
create temp function  extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
  """;
create temp function extract_all_leaves(input string) returns string language js as '''
  function flattenObj(obj, parent = '', res = {}){
    for(let key in obj){
        let propName = parent ? parent + '.' + key : key;
        if(typeof obj[key] == 'object'){
            flattenObj(obj[key], propName, res);
        } else {
            res[propName] = obj[key];
        }
    }
    return JSON.stringify(res);
  }
  return flattenObj(JSON.parse(input));
  ''';
select col || replace(replace(key, 'value', ''), '.', '-') as col, value, 
from your_table,
unnest([struct(extract_all_leaves(data) as json)]),
unnest(extract_keys(json)) key with offset
join unnest(extract_values(json)) value with offset
using(offset)    

if applied to sample data in your question - output is

enter image description here

Benefit of this approach is that it is quite generic and thus can handle any level of nesting in json

For example for below data/table

enter image description here

the output is

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Mikhail Berlyant