pyspark - getting error 'list' object has no attribute 'write' when attempting to write to a delta table

I am attempting to read the first X rows of a delta table into a dataframe, and then write (overwrite) them back to the same delta table. Here is the code:

# read from entire delta table into dataframe
revEnrichRef = spark.read.format("delta").load("/mnt/tables/myTable")

# retrieve first 5 rows
dfSubset = revEnrichRef.head(5)
dfSubset.write.format("delta").mode("overwrite").save("/mnt/tables/myTable")

At this point I get the error: 'list' object has no attribute 'write'

I guess that means head returns a list rather than a new dataframe. What I really want is a solution that returns X rows as a dataframe. Alternatively, a way to do this without an intermediary dataframe would be just as good. Any help is appreciated. Thanks



Solution 1:[1]

You can do so with the limit method. This returns a dataframe limited to the number of rows passed as the argument.

dfSubset = revEnrichRef.limit(5)

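The head method is an action: head(5) collects 5 rows from your dataframe to the driver and returns them as a list of Row objects, which is why the result has no write attribute. (Called with no argument, head() returns a single Row object instead of a list.)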
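
Putting the two steps together, here is a minimal sketch of the read, limit, overwrite sequence without a named intermediate dataframe. The helper name overwrite_with_first_rows and its parameters are hypothetical and used only for illustration. It assumes spark is an active SparkSession with Delta Lake configured, and that reading a Delta table and overwriting the same path in one job is acceptable in your setup. Note that limit without an explicit ordering does not guarantee which rows are kept.

```python
def overwrite_with_first_rows(spark, path, n):
    """Hypothetical helper: keep only n rows of the Delta table stored at `path`."""
    # limit() is a transformation, so it returns a new DataFrame (not a list)
    subset = spark.read.format("delta").load(path).limit(n)
    # A DataFrame exposes .write, so the overwrite no longer raises AttributeError
    subset.write.format("delta").mode("overwrite").save(path)
```

With the names from the question, the call would be overwrite_with_first_rows(spark, "/mnt/tables/myTable", 5).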

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ScootCork