Pipe PySpark OSError: [WinError 87] The parameter is incorrect
I have installed Spark 3.0.0 on a 64-bit Windows machine with Python 3.9.7, using an Anaconda base environment.
I'm trying to run the following code in the pyspark shell to test the RDD pipe method.
myCollection = "Spark the Definitive Guide : Big Data as Made Simple".split(" ")
words = spark.sparkContext.parallelize(myCollection,2)
words.pipe("echo hello").collect()
Then I get the following error from the pipe() call:
File "C:\Users\aitor.hernandez\Spark3\python\lib\pyspark.zip\pyspark\worker.py", line 597, in main
File "C:\Users\aitor.hernandez\Spark3\python\lib\pyspark.zip\pyspark\worker.py", line 587, in process
File "C:\Users\aitor.hernandez\Spark3\python\pyspark\rdd.py", line 425, in func
return f(iterator)
File "C:\Users\aitor.hernandez\Spark3\python\pyspark\rdd.py", line 827, in func
pipe = Popen(
File "C:\Users\aitor.hernandez\Anaconda3\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\aitor.hernandez\Anaconda3\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 87] The parameter is incorrect
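The traceback ends inside subprocess.Popen, so one way to narrow this down is to reproduce a similar call outside Spark. The sketch below is only a diagnostic under stated assumptions: `try_popen` and the probe command are hypothetical names introduced here, and the idea that the environment passed to Popen matters (for example an empty one) is a guess to test, not something the traceback confirms.

```python
import os
import subprocess
import sys


def try_popen(args, env):
    """Start args with the given environment and report what happened."""
    try:
        proc = subprocess.Popen(
            args, env=env, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )
        out, _ = proc.communicate()
        return "ok", out.decode().strip()
    except OSError as exc:
        # On Windows this is where a WinError would surface.
        return "failed", str(exc)


# Hypothetical probe command: the current interpreter printing one word.
probe = [sys.executable, "-c", "print('hello')"]

inherited = try_popen(probe, dict(os.environ))  # copy of the current environment
empty = try_popen(probe, {})                    # assumption under test: empty environment
print("inherited env:", inherited)
print("empty env:", empty)
```

If only the empty-environment case fails on the affected machine, it may be worth passing an explicit environment through the `env` argument that `RDD.pipe` accepts, for example `words.pipe(command, env=dict(os.environ))`. If both cases succeed, the cause is probably elsewhere.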
I've tried changing the "shell" parameter to True when creating the Popen objects.
I've also reviewed similar issues, but most of them are about specific packages, and none of them resolves this case. Does anyone know what is happening and how I can resolve it?
Thank you so much.
Solution 1:[1]
Usually this is a Python version issue. You may be using an older Windows system on which Python 3.8 or 3.9 does not work with PySpark. Try installing Python 3.6 or a nearby version. I had the same issue on my old system: Spark with Scala worked fine but PySpark with Python 3.8 did not, and after I switched to Python 3.6 it worked. Try the same on your end.
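Before switching versions, it can help to confirm which interpreter PySpark is actually using. This is a minimal sketch, not part of the original answer: `describe_python` is a hypothetical helper, and it only reads the standard `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, which may simply be unset.

```python
import os
import sys


def describe_python():
    """Return the running interpreter's version and PySpark's interpreter overrides."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        # None means the variable is not set and PySpark falls back to its default.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": os.environ.get("PYSPARK_DRIVER_PYTHON"),
    }


info = describe_python()
for key, value in info.items():
    print(key, "=", value)
```

Running this inside the pyspark shell shows the driver's interpreter. Pointing `PYSPARK_PYTHON` at a different Python installation before starting the shell is one way to try another version without changing the Anaconda base environment.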
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Venu A Positive |