How to extract values from a key-value map?

I have a column of type map, where the keys and values change from row to row. I am trying to extract the value and create a new column from it.

Input:

+---------------+
|symbols        |
+---------------+
|[3pea -> 3PEA] |
|[barello -> BA]|
|[]             |
|[]             |
+---------------+

Expected output:

+-------+
|symbols|
+-------+
|3PEA   |
|BA     |
|       |
|       |
+-------+

Here is what I tried so far using a udf:

def map_value=udf((inputMap:Map[String,String])=> {inputMap.map(x=>x._2) 
      })

This fails with:

java.lang.UnsupportedOperationException: Schema for type scala.collection.immutable.Iterable[String] is not supported



Solution 1:[1]

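import org.apache.spark.sql.functions._
import spark.implicits._

// Simulated data: each row is a one-element array holding a "key -> value" string
// (an array of strings, not a true MapType column).
val m = Seq(Array("A -> abc"), Array("B -> 0.11856755943424617"), Array("C -> kqcams"))

val df = m.toDF("map_data")
df.show

// Join the array into one string, split on "-> ", and keep the part after the arrow.
val df2 = df
  .withColumn("xxx", split(concat_ws("", $"map_data"), "-> "))
  .select($"xxx".getItem(1).as("map_val"))
df2.show(false)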

results in:

+--------------------+
|            map_data|
+--------------------+
|          [A -> abc]|
|[B -> 0.118567559...|
|       [C -> kqcams]|
+--------------------+

+-------------------+
|map_val            |
+-------------------+
|abc                |
|0.11856755943424617|
|kqcams             |
+-------------------+

Solution 2:[2]

As of the Spark Scala API v2.3, the SQL API v2.3, or the PySpark API v2.4, you can use the Spark SQL function map_values.

The following is in PySpark; the Scala version would be very similar.
Setup (assuming a working SparkSession named spark):

from pyspark.sql import functions as F

df = (
    spark.read.json(spark.sparkContext.parallelize(["""[
        {"key": ["3pea"],    "value": ["3PEA"] },
        {"key": ["barello"], "value": ["BA"]   }
    ]"""]))
    .select(F.map_from_arrays(F.col("key"), F.col("value")).alias("symbols") )
)

df.printSchema()
df.show()

root
 |-- symbols: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+---------------+
|        symbols|
+---------------+
| [3pea -> 3PEA]|
|[barello -> BA]|
+---------------+

map_values returns an array of the map's values, so take its first element:

df.select((F.map_values(F.col("symbols"))[0]).alias("map_vals")).show()

+--------+
|map_vals|
+--------+
|    3PEA|
|      BA|
+--------+
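
The UDF in the question failed because it returned a collection of values (an Iterable) rather than a single value that Spark can map to a column type. A UDF-based alternative that returns one string, or null for an empty map, might look like the sketch below. The names first_map_value and first_map_value_udf are made up for this illustration, and the sketch assumes each map holds at most one entry, as in the question's input. The helper is plain Python so it can be checked without Spark; the Spark wiring is shown in comments.

```python
from typing import Dict, Optional


def first_map_value(m: Optional[Dict[str, str]]) -> Optional[str]:
    """Return the first value of the map, or None if the map is null or empty."""
    if not m:
        return None
    return next(iter(m.values()))


# Plain-Python check of the helper, independent of Spark.
print(first_map_value({"3pea": "3PEA"}))  # 3PEA
print(first_map_value({}))                # None

# Wrapping it as a Spark UDF (requires pyspark; a MapType value reaches
# a Python UDF as a dict). df is assumed to be the DataFrame built above.
# from pyspark.sql import functions as F
# from pyspark.sql.types import StringType
# first_map_value_udf = F.udf(first_map_value, StringType())
# df.select(first_map_value_udf(F.col("symbols")).alias("symbols")).show()
```

Where the built-in map_values is available it is usually preferable to a Python UDF, since built-in functions avoid the cost of passing rows to a Python worker.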

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
[1] Solution 1: thebluephantom
[2] Solution 2: Clay