PySpark: read data into a DataFrame, transform in SQL, then save as parquet
New to Spark and Synapse. I need to do some transformations, including adding columns and changing data types. I am reading a CSV into a DataFrame. I'd like to register the DataFrame as a temp view, do my transformations in SQL (using a cell with %%sql), then save the result as a parquet file in another folder.
If I add columns in my temp view, do I need to save the temp view back to another DataFrame? Or does my original DataFrame now include the new columns? If not, how do I create a new DataFrame (that I can write as parquet) from my SQL temp view?
Or is there a link that shows a good approach to this task?
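For concreteness, here is roughly the flow I'm attempting in a single PySpark cell (the paths, view name, and column names below are placeholders):

```python
from pyspark.sql import SparkSession

# In a Synapse notebook a SparkSession named `spark` already exists;
# this line only matters when running outside Synapse.
spark = SparkSession.builder.getOrCreate()

# Placeholder path -- substitute your own storage account/container.
df = spark.read.csv(
    "abfss://container@account.dfs.core.windows.net/input/data.csv",
    header=True,
    inferSchema=True,
)

# Expose the DataFrame to SQL as a temp view.
df.createOrReplaceTempView("my_view")

# Transformation in SQL: add a column and change a data type
# (column names here are made up).
transformed_df = spark.sql("""
    SELECT *,
           CAST(some_column AS INT) AS some_column_int,
           current_date() AS load_date
    FROM my_view
""")

# Write the result as parquet to another folder (placeholder path).
transformed_df.write.mode("overwrite").parquet(
    "abfss://container@account.dfs.core.windows.net/output/"
)
```

What I can't tell is whether I even need the spark.sql() step shown above to capture the result, or whether a %%sql cell operating on the temp view would already be reflected in df.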
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow