How to convert timestamp columns of a Spark DataFrame to string columns

I want to convert all TIMESTAMP columns of a Spark DataFrame into string columns. Could anybody tell me how to do that automatically for each DataFrame?

The position of the timestamp column can change, and the column name can also differ between DataFrames.

For example, in DataFrame1 it can be columnA, but in DataFrame2 it can be columnX.

So I need to somehow use the column-type information of any given table and convert the matching columns to string columns.

Do you have any ideas?



Solution 1:[1]

You can write a method to convert the timestamp to a string and use it wherever required.
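A minimal sketch of such a helper in plain Java, using java.time for the formatting (the class and method names here are hypothetical, not from the original answer):

    import java.sql.Timestamp;
    import java.time.format.DateTimeFormatter;

    public class TimestampConverter {
        // Formats a java.sql.Timestamp (the type Spark maps TimestampType to
        // in Java beans) as a string with the given pattern.
        static String toStringValue(Timestamp ts, String pattern) {
            return ts.toLocalDateTime().format(DateTimeFormatter.ofPattern(pattern));
        }

        public static void main(String[] args) {
            Timestamp ts = Timestamp.valueOf("2023-01-15 10:30:00");
            System.out.println(toStringValue(ts, "yyyy-MM-dd HH:mm:ss"));
            // prints 2023-01-15 10:30:00
        }
    }

Note that a per-value helper like this only works on values you have already collected to the driver; to convert columns inside a DataFrame, use a column expression as in the other solutions below.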

Solution 2:[2]

Take a look at the functions library. date_format looks like what you want.

It converts a timestamp to a String in the specified format.

I use it this way:

    Dataset<Target> newData = spark.createDataset(targets, Encoders.bean(Target.class));

    newData.printSchema();

    newData.withColumn("date-time", functions.date_format(new Column("timestamp"), "yyyy-MM-dd_HH"))
            .write()
            .mode(SaveMode.Append)
            .option("basePath", basePath)
            .partitionBy("date-time")
            .save(outputPath); // outputPath: wherever the partitioned output should land

timestamp is a field in Target of type java.sql.Timestamp.
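As a side note, Spark 3.x interprets date_format patterns with java.time semantics, so the "yyyy-MM-dd_HH" pattern above can be previewed with plain Java (the class and method names here are illustrative):

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public class PatternPreview {
        // Formats a LocalDateTime with the same "yyyy-MM-dd_HH" pattern
        // Solution 2 passes to date_format.
        static String partitionValue(LocalDateTime t) {
            return t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd_HH"));
        }

        public static void main(String[] args) {
            System.out.println(partitionValue(LocalDateTime.of(2023, 1, 15, 10, 30)));
            // prints 2023-01-15_10
        }
    }

Truncating to the hour like this yields one partition value per hour, which is what makes it usable with partitionBy.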

Solution 3:[3]

    val transformedColumns = DF.schema.fields.map { field =>
      if (field.dataType.typeName == "timestamp")
        // date_format already returns a string column, so no extra cast is needed
        date_format(col(field.name), "yyyy-MM-dd HH:mm:ss").alias(field.name)
      else
        col(field.name)
    }
    val transformedDF = DF.select(transformedColumns: _*)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kumar
Solution 2
Solution 3 Aditya