Category "rdd"

Pipe Pyspark OSError: [WinError 87] The parameter is incorrect

I have installed Spark 3.0.0 on a 64-bit Windows machine with Python 3.9.7, using an Anaconda base environment. I'm trying to execute the following code in the pyspar

Spark partition size greater than the executor memory

I have four questions. Suppose in Spark I have 3 worker nodes. Each worker node has 3 executors, and each executor has 3 cores. Each executor has 5 GB of memory. (T

How did spark RDD map to Cassandra table?

I am new to Spark, and recently I saw code that saves data in RDD format to a Cassandra table. But I am not able to figure out how it does the column mapp

ValueError: RDD is empty-- Pyspark (Windows Standalone)

I am trying to create an RDD, but Spark is not creating it and throws an error, pasted below; data = records.map(lambda r: LabeledPoint(extract_label(r), extract_

Jupyter Notebook PySpark OSError [WinError 123] The filename, directory name, or volume label syntax is incorrect:

System Configuration: Operating System: Windows 10; Python Version: 3.7; Spark Version: 2.4.4; SPARK_HOME: C:\spark\spark-2.4.4-bin-hadoop2.7. Problem: I am using

Spark dataframe transform multiple rows to column

I am new to Spark, and I want to transform the source DataFrame below (loaded from a JSON file): +--+-----+-----+ |A |count|major| +--+-----+-----+ | a| 1| m