Spark SQL: Parse date string from dd/mm/yyyy to yyyy/mm/dd
I want to use Spark SQL or PySpark to reformat a date field from 'dd/mm/yyyy' to 'yyyy/mm/dd'. The field type is string:
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType
spark = SparkSession.builder.master("local[1]")\
.appName("date.com")\
.getOrCreate()
my_df = spark.createDataFrame(["13/04/2020", "16/04/2020", "19/04/2020"], StringType()).toDF("date")
expected_df = spark.createDataFrame(["2020/04/13", "2020/04/16", "2020/04/19"], StringType()).toDF("date")
I have tried the following Spark SQL command, but it returns the literal string 'yyyy/MM/dd' rather than '2020/04/13':
select date_format(date, 'dd/MM/yyyy'), 'yyyy/MM/dd' as reformatted_date
FROM my_df
I have also looked at the following documentation but didn't see anything that fits my scenario: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
If it's not possible in Spark SQL then PySpark would work. Any ideas?
Solution 1:[1]
You need to convert to a date type first using to_date:
select date_format(to_date(date, 'dd/MM/yyyy'), 'yyyy/MM/dd') as reformatted_date
from my_df
Solution 2:[2]
df1.select(to_date(date_format(to_date(lit("12/12/2020"), "dd/MM/yyyy"), "yyyy-MM-dd")).as("campo")).show()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | mck
Solution 2 | Cristina Cano Sandin