How do I add a new date column with a constant value to a Spark DataFrame (using PySpark)?

I want to add a column with a default date ('1901-01-01') to an existing DataFrame using PySpark.

I used the code snippet below:

from pyspark.sql.functions import lit

strRecordStartTime = "1970-01-01"
recrodStartTime = hashNonKeyData.withColumn(
    "RECORD_START_DATE_TIME",
    lit(strRecordStartTime).cast("timestamp")
)

It gives me the following error: org.apache.spark.sql.AnalysisException: cannot resolve '1970-01-01'. Any pointer is appreciated.



Solution 1:[1]

Try using Python's native datetime with lit; sorry, I don't have access to a machine right now.

import datetime
from pyspark.sql.functions import lit

recrodStartTime = hashNonKeyData.withColumn('RECORD_START_DATE_TIME', lit(datetime.datetime(1970, 1, 1)))
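
Since hashNonKeyData is not defined anywhere in this post, here is a self-contained sketch of the same idea; the SparkSession setup and the sample DataFrame are stand-ins:

import datetime

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Hypothetical stand-in for hashNonKeyData; any DataFrame works the same way.
hashNonKeyData = spark.createDataFrame([("a",), ("b",)], ["id"])

# lit() on a Python datetime.datetime yields a TimestampType column.
recrodStartTime = hashNonKeyData.withColumn(
    "RECORD_START_DATE_TIME", lit(datetime.datetime(1970, 1, 1))
)

recrodStartTime.printSchema()
# Expected output, roughly:
# root
#  |-- id: string (nullable = true)
#  |-- RECORD_START_DATE_TIME: timestamp (nullable = false)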

Solution 2:[2]

I have created a Spark DataFrame:

from pyspark.sql.types import StringType
df1 = spark.createDataFrame(["Ravi","Gaurav","Ketan","Mahesh"], StringType()).toDF("Name")

Now let's add a new column to the existing DataFrame:

from pyspark.sql.functions import lit
import dateutil.parser

yourdate = dateutil.parser.parse('1901-01-01')
df2 = df1.withColumn('Age', lit(yourdate))  # addition of new column
df2.show()  # print the DataFrame

You can validate your schema using the command below.

df2.printSchema()

Hope that helps.
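
One caveat: dateutil.parser.parse returns a datetime.datetime, so the Age column above comes out as a timestamp. If a plain DateType column is wanted instead, a minimal sketch (not part of the original answer) is to pass a datetime.date literal:

import datetime

from pyspark.sql.functions import lit

# A datetime.date literal maps to a DateType column rather than a timestamp.
df3 = df1.withColumn('DefaultDate', lit(datetime.date(1901, 1, 1)))

df3.printSchema()
# Expected output, roughly:
# root
#  |-- Name: string (nullable = true)
#  |-- DefaultDate: date (nullable = false)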

Solution 3:[3]

from pyspark.sql import functions as F

strRecordStartTime = "1970-01-01"

recrodStartTime = hashNonKeyData.withColumn("RECORD_START_DATE_TIME", F.to_date(F.lit(strRecordStartTime)))
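
F.to_date parses the string literal into a DateType column (rather than the timestamp produced by the cast in the question). A quick self-contained check, using a hypothetical stand-in for hashNonKeyData:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Hypothetical stand-in DataFrame.
df = spark.createDataFrame([(1,), (2,)], ["id"])

out = df.withColumn("RECORD_START_DATE_TIME", F.to_date(F.lit("1970-01-01")))

out.show()
# Expected output, roughly:
# +---+----------------------+
# | id|RECORD_START_DATE_TIME|
# +---+----------------------+
# |  1|            1970-01-01|
# |  2|            1970-01-01|
# +---+----------------------+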

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1
Solution 2
Solution 3: Nurlan Zhangali