Why do I get TypeError: cannot pickle '_thread.RLock' object when using PySpark?
I'm using Spark to process my data, like this:
dataframe_mysql = spark.read.format('jdbc').options(
    url='jdbc:mysql://xxxxxxx',
    driver='com.mysql.cj.jdbc.Driver',
    dbtable='(select * from test_table where id > 100) t',
    user='xxxxxx',
    password='xxxxxx'
).load()
result = spark.sparkContext.parallelize(dataframe_mysql, 1)
But I get this error from Spark:
Traceback (most recent call last):
  File "/private/var/www/http/hawk-scripts/hawk_etl/scripts/spark_rds_to_parquet.py", line 46, in <module>
    process()
  File "/private/var/www/http/hawk-scripts/hawk_etl/scripts/spark_rds_to_parquet.py", line 36, in process
    result = spark.sparkContext.parallelize(dataframe_mysql, 1).map(func)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 574, in parallelize
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 611, in _serialize_to_jvm
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 211, in dump_stream
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 143, in _write_with_length
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 427, in dumps
TypeError: cannot pickle '_thread.RLock' object
Am I using it wrong? How should I use SparkContext.parallelize with a DataFrame?
Solution 1:[1]
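I got it: dataframe_mysql is already a DataFrame. parallelize expects a local Python collection (such as a list) and pickles it to send it to the JVM, which is where the error comes from. If you want an RDD, just use dataframe_mysql.rdd instead of parallelize.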
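To illustrate the point above, here is a minimal sketch, not a definitive implementation. The first part uses only the standard library to reproduce the same kind of TypeError: any object that holds a `threading.RLock` cannot be pickled, and the traceback suggests something reachable from the DataFrame holds one. The second part shows the fix with `.rdd`. The names `can_pickle`, `HoldsLock`, and `demo_with_spark` are hypothetical helpers introduced for illustration, and the small in-memory DataFrame is an assumed stand-in for the JDBC-loaded `dataframe_mysql`.

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if obj can be pickled, False if pickling raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


class HoldsLock:
    """Hypothetical stand-in for an object that keeps a lock internally."""

    def __init__(self):
        self.lock = threading.RLock()


def demo_with_spark():
    """Sketch of the fix; needs a working PySpark install to run."""
    # Imported lazily so the pickling helpers above work without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("rdd-demo").getOrCreate()
    # Assumed stand-in for the JDBC-loaded dataframe_mysql from the question.
    df = spark.createDataFrame([(101, "a"), (102, "b")], ["id", "name"])

    # A DataFrame already exposes its rows as an RDD of Row objects.
    rdd = df.rdd  # instead of spark.sparkContext.parallelize(df, 1)
    ids = rdd.map(lambda row: row["id"]).collect()

    # parallelize is meant for local Python collections, for example
    # rows that were first collected to the driver as plain dicts.
    local_rows = [row.asDict() for row in df.collect()]
    count = spark.sparkContext.parallelize(local_rows, 1).count()

    spark.stop()
    return ids, count
```

`can_pickle([1, 2, 3])` is true while `can_pickle(HoldsLock())` is false, which mirrors what happens when parallelize tries to pickle something that is not a plain local collection. Calling `demo_with_spark()` requires PySpark and a Java runtime.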
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Neptune |