AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from 'pyspark/cloudpickle/__init__.py'>

I am running PySpark code from a script and get the following error when calling df.show().

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Reuse the active session if one exists; otherwise create one.
spark = SparkSession.builder.getOrCreate()

data = [("James", "", "Smith", "36636", "M", 3000),
        ("Michael", "Rose", "", "40288", "M", 4000)]

schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lastname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True)
])

df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(truncate=False)


AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from '/Users/amijha0/Applications/apache-spark/spark-3.1.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/cloudpickle/__init__.py'>



Solution 1:[1]

The issue is caused by a pyspark version mismatch. I checked the installed version using pip freeze:

$ python -m pip freeze | grep pyspark
pyspark==3.0.0

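The path in the error shows that Spark 3.1.1 is being used, while the installed pyspark package is 3.0.0. The two versions ship different copies of the pyspark.cloudpickle module, and the copy loaded from the Spark 3.1.1 path has no "_fill_function" attribute. Hence the AttributeError.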
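The check described above can be sketched in Python. This is a minimal sketch, not an official PySpark utility: the helper names (spark_version_from_path, installed_pyspark_version, versions_match) are hypothetical, it assumes the Spark directory name contains a "spark-X.Y.Z" component as in the error path above, and it assumes an exact version match is what you want.

```python
import re
from importlib import metadata
from typing import Optional


def spark_version_from_path(path: str) -> Optional[str]:
    """Extract 'X.Y.Z' from a path component such as 'spark-3.1.1-bin-hadoop2.7'.

    Assumption: the Spark distribution directory keeps its default name.
    Returns None when no such component is found.
    """
    match = re.search(r"spark-(\d+\.\d+\.\d+)", path)
    return match.group(1) if match else None


def installed_pyspark_version() -> Optional[str]:
    """Return the pip-installed pyspark version, or None if it is not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def versions_match(installed: str, runtime: str) -> bool:
    """Hypothetical policy: treat only identical version strings as a match."""
    return installed == runtime
```

For example, passing the path from the error message to spark_version_from_path and comparing the result with installed_pyspark_version() would flag the 3.0.0 versus 3.1.1 mismatch described in this answer.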

To fix it, I upgraded pyspark to match the Spark version:

$ python -m pip install --upgrade pyspark==3.1.1 --use-feature=2020-resolver

Solution 2:[2]

I had the same issue and fixed it by upgrading pyspark.

$ python -m pip install --user --upgrade pyspark

I had pyspark 3.0.0 installed; after running the upgrade command above, I had pyspark 3.2.1.

Problem solved.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 VirtualLogic
Solution 2 Rajesh Ramachander