'Modify date (month) in spark date column based on condition
I would like to modify my date column in spark df to subtract 1 month only if certain months appear. I.e. only if date is yyyy-07-31 or date is yyyy-04-30 change it to yyyy-06-31 and yyyy-03-30 respectively. Any ideas how to do it using pyspark functions?
|DateColumn|
|2016-04-30|
|2015-04-30|
|2017-09-03|
|2017-07-31|
|2016-09-01|
|2018-07-31|
...
Expected outcome:
|DateColumn|
|2016-03-30| <- changed
|2015-03-30| <- changed
|2017-09-03|
|2017-06-31| <- changed
|2016-09-01|
|2018-06-31| <- changed
...
Solution 1:[1]
I would recommend using the functions
module and then combining several functionalities:
.when()
and thenotherwise()
.month()
.date_format()
.add_months(date, -1)
For example, it could come down to:
import pyspark.sql.functions as F
df = spark.createDataFrame([{'date': '2022-04-15'}, {'date': '2022-05-17'}])
df \
.withColumn('new_date',
F.when(F.month(F.col('date')).isin([4, 7]),
F.add_months(F.date_format('date', 'yyyy-MM-dd'), -1))
.otherwise(F.col('date'))) \
.show()
and you would then get:
+----------+----------+
| date| new_date|
+----------+----------+
|2022-04-15|2022-03-15|
|2022-05-17|2022-05-17|
+----------+----------+
Update (turned out it's rather substring problem than generic months subtracting, see comments for detail)
Actually, you can stack .when
functionality to do some sort of if-elif-else
in Python. Below code shows how to perform it in the scenario you shared:
from pyspark.sql.functions import when
df \
.withColumn('new_date',
when(F.substring(F.col("date"), -5, 5) == '01-31', F.concat(F.year(F.col("date")), F.lit('-12-31')))
.when(F.substring(F.col("date"), -5, 5) == '04-30', F.concat(F.year(F.col("date")), F.lit('-03-30')))
.when(F.substring(F.col("date"), -5, 5) == '07-31', F.concat(F.year(F.col("date")), F.lit('-06-30')))
.when(F.substring(F.col("date"), -5, 5) == '10-31', F.concat(F.year(F.col("date")), F.lit('-09-30')))
.otherwise(F.col('date'))) \
.show()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |