AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from 'pyspark/cloudpickle/__init__.py'>
I am running the PySpark code below from a script and get the following error when df.show() is called.
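from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Session setup is not shown in the original script; the app name here is a placeholder.
spark = SparkSession.builder.appName("example").getOrCreate()

# Sample rows: firstname, middlename, lastname, id, gender, salary
data = [
    ("James", "", "Smith", "36636", "M", 3000),
    ("Michael", "Rose", "", "40288", "M", 4000),
]

# Explicit schema; every column is nullable
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(truncate=False)  # the AttributeError is raised here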
AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from '/Users/amijha0/Applications/apache-spark/spark-3.1.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/cloudpickle/__init__.py'>
Solution 1:[1]
The issue is caused by a pyspark version mismatch. I checked the installed version using pip freeze:
$ python -m pip freeze | grep pyspark
pyspark==3.0.0
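The path in the traceback shows that Spark 3.1.1 is being used, while the pip-installed package is pyspark==3.0.0. The two versions do not match, and the pyspark.cloudpickle module loaded from the Spark 3.1.1 installation has no "_fill_function" attribute. Hence the AttributeError.
To fix it, I upgraded pyspark to the same version as the Spark installation: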
python -m pip install --upgrade pyspark==3.1.1 --use-feature=2020-resolver
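The version check described above can also be scripted. The following is only a minimal sketch, not part of the original answer: the helper names (spark_version_from_path, installed_pyspark_version, versions_match) are made up for illustration, and it assumes the Spark directory name contains the version in the form spark-X.Y.Z, as it does in the traceback path from the question.

```python
import re
from importlib import metadata
from typing import Optional


def spark_version_from_path(path: str) -> Optional[str]:
    """Extract 'X.Y.Z' from a path containing a 'spark-X.Y.Z...' directory.

    Hypothetical helper: assumes the install directory follows the
    'spark-<version>-bin-...' naming used in the question's traceback.
    """
    match = re.search(r"spark-(\d+\.\d+\.\d+)", path)
    return match.group(1) if match else None


def installed_pyspark_version() -> Optional[str]:
    """Return the pip-installed pyspark version, or None if it is not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def versions_match(pip_version: Optional[str], spark_version: Optional[str]) -> bool:
    """True only when both versions are known and identical."""
    return pip_version is not None and pip_version == spark_version


if __name__ == "__main__":
    # Path copied from the traceback in the question.
    traceback_path = (
        "/Users/amijha0/Applications/apache-spark/"
        "spark-3.1.1-bin-hadoop2.7/python/lib/pyspark.zip"
    )
    spark_version = spark_version_from_path(traceback_path)
    pip_version = installed_pyspark_version()
    if versions_match(pip_version, spark_version):
        print(f"OK: pyspark {pip_version} matches Spark {spark_version}")
    else:
        print(f"Mismatch: pip pyspark is {pip_version}, Spark is {spark_version}")
```

If the two versions differ, pinning pyspark to the Spark installation's version, as in the pip command above, is the fix this answer used.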
Solution 2:[2]
I had the same issue and fixed it by upgrading pyspark.
$ python -m pip install --user --upgrade pyspark
I had pyspark 3.0.0 before; after the upgrade command above it was pyspark 3.2.1, and the problem was solved.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | VirtualLogic |
| Solution 2 | Rajesh Ramachander |