'How do you extract a specific field from a JSON array in Big Query?

I currently have a JSON array that looks like this in Big Query:

[{"name":"","username":null},{"name":"Jimmy Dean","username":"iamjc"},{"name":"Ben Simmons","username":"bens"}]

I want to create a column that just has a list of username values. For the example above, I would like the column to display

"iamjc", "bens"

I've tried using json_extract_scalar with unnest but it is throwing an error because of the null value.

CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(json).map(x=>JSON.stringify(x));
""";
select
g.created_at as message_sent,
username as sender,
members as original_members,
ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.username')
  FROM UNNEST(json2array(JSON_EXTRACT(members, '$'))) x
  ) AS members
from `table_1` g
join `table_2` u
 on u._id = json_value(g.user, "$.id") and is_staff is false
where g.type = 'message.new'
order by 1 desc

The error on output is this:

Array cannot have a null element; error in writing field members

Would anyone know how to get this desired output?



Solution 1:[1]

Just add WHERE clause as in below example

  ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.username')
  FROM UNNEST(json2array(JSON_EXTRACT(members, '$'))) x
  WHERE NOT JSON_EXTRACT_SCALAR(x, '$.username') IS NULL
  ) AS members  

Or, use below option (in which case you don't even need json2array UDF

  array(select json_extract_scalar(x, '$.username')
  from unnest(json_extract_array(members)) x
  where not json_extract_scalar(x, '$.username')is null
  ) as members    

if applied to dummy data as in your question

with your_table as (
  select '[{"name":"","username":null},{"name":"Jimmy Dean","username":"iamjc"},{"name":"Ben Simmons","username":"bens"}]' members
)
select *,
  array(select json_extract_scalar(x, '$.username')
  from unnest(json_extract_array(members)) x
  where not json_extract_scalar(x, '$.username')is null
  ) as members
from your_table             

output is

enter image description here

Solution 2:[2]

Specifying IGNORE NULLS when you create an array would be helpful. Refer to below query.

WITH sample_data AS (
  SELECT '[{"name":"","username":null},{"name":"Jimmy Dean","username":"iamjc"},{"name":"Ben Simmons","username":"bens"}]' AS members
)
SELECT ARRAY_AGG(JSON_VALUE(m, '$.username') IGNORE NULLS) AS members 
  FROM sample_data, UNNEST(JSON_QUERY_ARRAY(members, '$')) m;

+------+---------+
| Row  | members |
+------+---------+
| 1    | iamjc   |
|      | benj    |
+------+---------+

And since JSON_EXTRACT_* functions became a legacy, better use new JSON functions:

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2