'SparkSQL error: collect_set() cannot have map type data

For SparkSQL on hive, when I used named_struct in the query, it returns results:

SELECT id, collect_set(emp_info) as employee_info
FROM
    (
     SELECT t.id, named_struct("name", t.emp_name, "dept", t.emp_dept) AS emp_info
     FROM mytable t
    ) a
GROUP BY id

But when I replaced named_struct with map, I get an error:

SELECT id, collect_set(emp_info) as employee_info
FROM
    (
     SELECT t.id, map("name", t.emp_name, "dept", t.emp_dept) AS emp_info
     FROM mytable t
    ) a
GROUP BY id

ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'collect_set(a.`emp_info`)' due to data type mismatch: collect_set() cannot have map type data; line 36 pos 27;
'Distinct

I wish to return a map of name and dept, how to use with collect_set? FYI: This query with map runs without issues in Hive(Hue)



Solution 1:[1]

Try converting the map to json first using to_json() method, then you can collect the set in same query such as:

SELECT id, collect_set(to_json(map("name", emp_name, "dept", emp_dept))) AS emp_info
FROM mytable
GROUP BY id

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tiff sea