pyspark - getting error 'list' object has no attribute 'write' when attempting to write to a delta table
I am attempting to read the first X rows of a delta table into a dataframe, and then write (overwrite) them back to the delta table. Here is the code:
# read from entire delta table into dataframe
revEnrichRef = spark.read.format("delta").load("/mnt/tables/myTable")
# retrieve first 5 rows
dfSubset = revEnrichRef.head(5)
dfSubset.write.format("delta").mode("overwrite").save("/mnt/tables/myTable")
At this point I get the error: 'list' object has no attribute 'write'
I guess that means head returns a list rather than a new dataframe. What I really want is a solution that returns X rows as a dataframe. Alternatively, a way to do this without an intermediary dataframe would be just as good. Any help is appreciated. Thanks
Solution 1:[1]
You can do so with the limit method. This returns a dataframe limited to the number of rows passed as the argument.
dfSubset = revEnrichRef.limit(5)
The head method is an action: it collects the rows to the driver and returns them as a list of Row objects (or a single Row object if n is omitted), which is why the result has no write attribute.
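To cover the "no intermediary dataframe" part of the question, the read, limit, and write can be chained in one helper. This is only a minimal sketch: the function name overwrite_with_first_rows is hypothetical, and it assumes an active SparkSession with Delta Lake configured and the table path from the question. Note that without an explicit orderBy, limit does not guarantee which rows are kept.

```python
def overwrite_with_first_rows(spark, path, n):
    """Hypothetical helper: overwrite the Delta table at `path` with n of its rows."""
    # limit(n) is a transformation, so it returns a DataFrame (not a list)
    # and the result still has a .write attribute.
    subset = spark.read.format("delta").load(path).limit(n)
    # Overwrite the same table with the limited DataFrame.
    subset.write.format("delta").mode("overwrite").save(path)
    return subset

# Usage, assuming `spark` is an active SparkSession with Delta Lake available:
# overwrite_with_first_rows(spark, "/mnt/tables/myTable", 5)
```

The helper takes the session as a parameter rather than creating one, so it works with the spark object that a notebook environment typically provides.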
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ScootCork |