SparkSQL error: collect_set() cannot have map type data
Using Spark SQL on Hive, when I use named_struct in the query, it returns results:
SELECT id, collect_set(emp_info) as employee_info
FROM
(
SELECT t.id, named_struct("name", t.emp_name, "dept", t.emp_dept) AS emp_info
FROM mytable t
) a
GROUP BY id
But when I replace named_struct with map, I get an error:
SELECT id, collect_set(emp_info) as employee_info
FROM
(
SELECT t.id, map("name", t.emp_name, "dept", t.emp_dept) AS emp_info
FROM mytable t
) a
GROUP BY id
ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'collect_set(a.`emp_info`)' due to data type mismatch: collect_set() cannot have map type data; line 36 pos 27;
'Distinct
I want to return a map of name and dept; how can I use that with collect_set?
FYI: This query with map runs without issues in Hive (Hue).
Solution 1:[1]
Try converting the map to JSON first using the to_json() function; then you can collect the set in the same query, for example:
SELECT id, collect_set(to_json(map("name", emp_name, "dept", emp_dept))) AS emp_info
FROM mytable
GROUP BY id
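
If the result should contain actual maps rather than JSON strings, a possible follow-up (a minimal sketch, assuming Spark 2.4+, where the transform higher-order function and from_json with a map schema are available) is to parse each collected string back into a map:

SELECT id,
       transform(
         collect_set(to_json(map("name", emp_name, "dept", emp_dept))),
         -- parse each collected JSON string back into a map<string,string>
         x -> from_json(x, 'map<string,string>')
       ) AS employee_info
FROM mytable
GROUP BY id

Deduplication still happens on the JSON strings produced by to_json, which is what makes collect_set legal here; the conversion back to maps only runs on the already-deduplicated array.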
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution   | Source   |
|------------|----------|
| Solution 1 | tiff sea |