'SPARK SQL - case when then
I'm new to SPARK-SQL. Is there an equivalent to "CASE WHEN 'CONDITION' THEN 0 ELSE 1 END" in SPARK SQL ?
select case when 1=1 then 1 else 0 end from table
Thanks Sridhar
Solution 1:[1]
Before Spark 1.2.0
The supported syntax (which I just tried out on Spark 1.0.2) seems to be
SELECT IF(1=1, 1, 0) FROM table
This recent thread http://apache-spark-user-list.1001560.n3.nabble.com/Supported-SQL-syntax-in-Spark-SQL-td9538.html links to the SQL parser source, which may or may not help depending on your comfort with Scala. At the very least the list of keywords starting (at time of writing) on line 70 should help.
Here's the direct link to the source for convenience: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala.
Update for Spark 1.2.0 and beyond
As of Spark 1.2.0, the more traditional syntax is supported, in response to SPARK-3813: search for "CASE WHEN" in the test source. For example:
SELECT CASE WHEN key = 1 THEN 1 ELSE 2 END FROM testData
Update for most recent place to figure out syntax from the SQL Parser
The parser source can now be found here.
Update for more complex examples
In response to a question below, the modern syntax supports complex Boolean conditions.
SELECT
CASE WHEN id = 1 OR id = 2 THEN "OneOrTwo" ELSE "NotOneOrTwo" END AS IdRedux
FROM customer
You can involve multiple columns in the condition.
SELECT
CASE WHEN id = 1 OR state = 'MA'
THEN "OneOrMA"
ELSE "NotOneOrMA" END AS IdRedux
FROM customer
You can also nest CASE WHEN THEN expression.
SELECT
CASE WHEN id = 1
THEN "OneOrMA"
ELSE
CASE WHEN state = 'MA' THEN "OneOrMA" ELSE "NotOneOrMA" END
END AS IdRedux
FROM customer
Solution 2:[2]
For Spark 2.+ Spark when function
From documentation:
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.
// Scala:
people.select(when(col("gender") === "male", 0)
.when(col("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2))
Solution 3:[3]
This syntax worked for me in Databricks:
select
org,
patient_id,
case
when (age is null) then 'Not Available'
when (age < 15) then 'Less than 15'
when (age >= 15 and age < 25) then '15 to 25'
when (age >= 25 and age < 35) then '25 to 35'
when (age >= 35 and age < 45) then '35 to 45'
when (age >= 45) then '45 and Older'
end as age_range
from demo
Solution 4:[4]
The decode() function analog of Oracle SQL for SQL Spark can be implemented as follows:
? case
? ? ? when exp1 in ('a','b','c')
? ? ? ? then element_at(map('a','A','b','B','c','C'), exp1)
? ? ? else exp1
? ? end
Solution 5:[5]
Based on my current production code, this works
val identifierDF =
tempIdentifierDF.select(tempIdentifierDF("t_item_account_id"),
when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_cusip")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_ticker")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_isin")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_sedol")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_valoren")),100)
.otherwise(0)
.alias("identifier_in_description_score")
)
Solution 6:[6]
Spark DataFrame API (Python version) also enable to do next query:
df.selectExpr('time', \
'CASE WHEN (time > 1) THAN time * 1.1 ELSE time END AS updated_time')
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Michael Brenndoerfer |
Solution 3 | John |
Solution 4 | mr.polden2010 |
Solution 5 | |
Solution 6 | Vladimir Gavrysh |