Trigger an if statement only when two Spark DataFrames meet the conditions

I have two Spark DataFrames with the same columns. I am trying to write an if-else statement in one line but couldn't find a good way to do it.

// Pseudocode for the intended logic
if (df1.col1 == df2.col1 && df1.col2 < df2.col2) {
  val final_df = df1.union(df2)
} else {
  println("No Match")
}

What I am trying to do with the two DataFrames: if at least one value in col1 appears in both df1 and df2, and for such a matching row the value in col2 of df2 is larger than the value in col2 of df1, then I union both DataFrames. Otherwise, I print the message "No Match".

The example below illustrates it better:

df1.show()
+----+----------+
|name|version_nb|
+----+----------+
|tony|56        |
|sam |96        |
|john|9         |
+----+----------+

df2.show()
+----+----------+
|name|version_nb|
+----+----------+
|tony|78        |
|mary|12        |
|Rob |2         |
+----+----------+

In the scenario above, the if branch should trigger and union both DataFrames, because the name column has a match in both DataFrames (tony) and the version_nb in df2 is larger than in df1 for that match. If the name matches but the version_nb in df2 is smaller than or equal to the one in df1, it should print the message "No Match" instead.

It would be great if you could give me some ideas or suggestions on how to write this if-else statement.



Solution 1:[1]

You can join the DataFrames on name, filter for rows where version_nb in df2 is greater than version_nb in df1, then check whether any rows remain. For your example:

val df3 = df1.join(df2, "name").filter(df2.col("version_nb") > df1.col("version_nb"))

if (!df3.isEmpty)
...
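For completeness, here is a minimal end-to-end sketch of the same join-then-filter check, wired into the if-else from the question. It is an illustration under assumptions, not part of the original answer: the object name `UnionIfNewer`, the helper `hasNewerMatch`, the aliases `a` and `b`, and the local SparkSession settings are all made up for the example, and version_nb is assumed to be an integer column. `isEmpty` on a DataFrame needs Spark 2.4 or later.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object UnionIfNewer {
  // Hypothetical helper: true when at least one name occurs in both frames
  // and its version_nb in df2 is strictly larger than in df1.
  def hasNewerMatch(df1: DataFrame, df2: DataFrame): Boolean = {
    val matches = df1.as("a")
      .join(df2.as("b"), col("a.name") === col("b.name"))   // inner join on name
      .filter(col("b.version_nb") > col("a.version_nb"))    // keep rows where df2 is newer
    !matches.isEmpty
  }

  def main(args: Array[String]): Unit = {
    // Local session for the example only; settings are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-if-newer")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df1 = Seq(("tony", 56), ("sam", 96), ("john", 9)).toDF("name", "version_nb")
    val df2 = Seq(("tony", 78), ("mary", 12), ("Rob", 2)).toDF("name", "version_nb")

    if (hasNewerMatch(df1, df2)) {
      val finalDf = df1.union(df2)
      finalDf.show()
    } else {
      println("No Match")
    }

    spark.stop()
  }
}
```

With the sample data, tony appears in both DataFrames and 78 > 56, so the union branch is taken. Aliasing the two sides keeps the column references unambiguous even if both DataFrames were derived from the same source.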

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Saining Li