Errors when running spark-submit on a local machine with Apache Spark (standalone, single node)
I've installed Apache Spark via Homebrew on my Mac (16 GB of RAM) to test my PySpark code locally on small data sets before running it on a real cluster, and I use spark-submit to run the code. It works, except that sometimes I get a strange, generic error. When I re-run the same code it runs fine, and running it yet again may or may not reproduce the error, so it appears to occur randomly.
Can anyone explain why this is occurring and how to fix it? Is this a memory issue, and if so, is there a setting or parameter that needs to be set, and to what value, and where?
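For reference, the kind of memory setting I am wondering about would look something like this (the values below are just guesses on my part, not something I know to be correct):

```shell
# Hypothetical values -- my guess at the kind of memory settings in question.
# In $SPARK_HOME/conf/spark-defaults.conf:
spark.driver.memory        4g
spark.driver.maxResultSize 2g

# ...or equivalently as spark-submit flags:
# spark-submit --driver-memory 4g --conf spark.driver.maxResultSize=2g ...
```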
Here is how I am running my code:
spark-submit --packages org.apache.spark:spark-avro_2.12:3.2.1 spark-load-avro-1.py
spark-load-avro-1.py is below. It loads 4 Avro files into a DataFrame; the total size of the files is under 100 KB (very small).
from pyspark.sql import functions as psf
from pyspark.sql.types import StringType
from pyspark.sql import SparkSession

# Create the Spark session.
spark = SparkSession.builder.getOrCreate()

# Create the Spark context.
sc = spark.sparkContext

print("Turn OFF optional Spark logging.")
spark.sparkContext.setLogLevel("OFF")

# Load the Avro files into a DataFrame.
df = spark.read.format("avro").load("/data/")

print('Avro file schema.')
df.printSchema()

# Extract the event header and load it into a DataFrame.
event_hdr = (
    df.select('Properties')
      .select(
          psf.col('Properties.ce_id.member3').cast(StringType()).alias('ce_id'),
          psf.col('Properties.ce_time.member3').cast(StringType()).alias('ce_time')))
The code runs fine the first time; here is the output:
<OP>@OP code % spark-submit --packages org.apache.spark:spark-avro_2.12:3.2.1 spark-load-avro-1.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
:: loading settings :: url = jar:file:/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/<OP>/.ivy2/cache
The jars for the packages stored in: /Users/<OP>/.ivy2/jars
org.apache.spark#spark-avro_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-1ddc2b1b-7d27-412f-b0fd-e4cfdf4fdff2;1.0
confs: [default]
found org.apache.spark#spark-avro_2.12;3.2.1 in central
found org.tukaani#xz;1.8 in central
found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 117ms :: artifacts dl 2ms
:: modules in use:
org.apache.spark#spark-avro_2.12;3.2.1 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.tukaani#xz;1.8 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-1ddc2b1b-7d27-412f-b0fd-e4cfdf4fdff2
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/3ms)
22/05/05 11:06:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/05/05 11:06:45 INFO SparkContext: Running Spark version 3.2.1
22/05/05 11:06:45 INFO ResourceUtils: ==============================================================
22/05/05 11:06:45 INFO ResourceUtils: No custom resources configured for spark.driver.
22/05/05 11:06:45 INFO ResourceUtils: ==============================================================
22/05/05 11:06:45 INFO SparkContext: Submitted application: spark-load-avro-1.py
22/05/05 11:06:45 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/05/05 11:06:45 INFO ResourceProfile: Limiting resource is cpu
22/05/05 11:06:45 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/05/05 11:06:45 INFO SecurityManager: Changing view acls to: <OP>
22/05/05 11:06:45 INFO SecurityManager: Changing modify acls to: <OP>
22/05/05 11:06:45 INFO SecurityManager: Changing view acls groups to:
22/05/05 11:06:45 INFO SecurityManager: Changing modify acls groups to:
22/05/05 11:06:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(<OP>); groups with view permissions: Set(); users with modify permissions: Set(<OP>); groups with modify permissions: Set()
22/05/05 11:06:45 INFO Utils: Successfully started service 'sparkDriver' on port 49251.
22/05/05 11:06:45 INFO SparkEnv: Registering MapOutputTracker
22/05/05 11:06:45 INFO SparkEnv: Registering BlockManagerMaster
22/05/05 11:06:45 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/05/05 11:06:45 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/05/05 11:06:45 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/05/05 11:06:45 INFO DiskBlockManager: Created local directory at /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/blockmgr-e7716b36-14aa-4b39-8439-5188165cf96a
22/05/05 11:06:45 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
22/05/05 11:06:45 INFO SparkEnv: Registering OutputCommitCoordinator
22/05/05 11:06:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/05/05 11:06:45 INFO Utils: Successfully started service 'SparkUI' on port 4041.
22/05/05 11:06:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://<ip-reducted>:4041
22/05/05 11:06:45 INFO SparkContext: Added JAR file:///Users/<OP>/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar at spark://<ip-reducted>:49251/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO SparkContext: Added JAR file:///Users/<OP>/.ivy2/jars/org.tukaani_xz-1.8.jar at spark://<ip-reducted>:49251/jars/org.tukaani_xz-1.8.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO SparkContext: Added JAR file:///Users/<OP>/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at spark://<ip-reducted>:49251/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO SparkContext: Added file file:///Users/<OP>/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar at file:///Users/<OP>/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: Copying /Users/<OP>/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.apache.spark_spark-avro_2.12-3.2.1.jar
22/05/05 11:06:45 INFO SparkContext: Added file file:///Users/<OP>/.ivy2/jars/org.tukaani_xz-1.8.jar at file:///Users/<OP>/.ivy2/jars/org.tukaani_xz-1.8.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: Copying /Users/<OP>/.ivy2/jars/org.tukaani_xz-1.8.jar to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.tukaani_xz-1.8.jar
22/05/05 11:06:45 INFO SparkContext: Added file file:///Users/<OP>/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at file:///Users/<OP>/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: Copying /Users/<OP>/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.spark-project.spark_unused-1.0.0.jar
22/05/05 11:06:45 INFO Executor: Starting executor ID driver on host <ip-reducted>
22/05/05 11:06:45 INFO Executor: Fetching file:///Users/<OP>/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: /Users/<OP>/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar has been previously copied to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.spark-project.spark_unused-1.0.0.jar
22/05/05 11:06:45 INFO Executor: Fetching file:///Users/<OP>/.ivy2/jars/org.tukaani_xz-1.8.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: /Users/<OP>/.ivy2/jars/org.tukaani_xz-1.8.jar has been previously copied to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.tukaani_xz-1.8.jar
22/05/05 11:06:45 INFO Executor: Fetching file:///Users/<OP>/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: /Users/<OP>/.ivy2/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar has been previously copied to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.apache.spark_spark-avro_2.12-3.2.1.jar
22/05/05 11:06:45 INFO Executor: Fetching spark://<ip-reducted>:49251/jars/org.tukaani_xz-1.8.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO TransportClientFactory: Successfully created connection to /<ip-reducted>:49251 after 14 ms (0 ms spent in bootstraps)
22/05/05 11:06:45 INFO Utils: Fetching spark://<ip-reducted>:49251/jars/org.tukaani_xz-1.8.jar to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/fetchFileTemp5703672242839882477.tmp
22/05/05 11:06:45 INFO Utils: /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/fetchFileTemp5703672242839882477.tmp has been previously copied to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.tukaani_xz-1.8.jar
22/05/05 11:06:45 INFO Executor: Adding file:/private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.tukaani_xz-1.8.jar to class loader
22/05/05 11:06:45 INFO Executor: Fetching spark://<ip-reducted>:49251/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: Fetching spark://<ip-reducted>:49251/jars/org.apache.spark_spark-avro_2.12-3.2.1.jar to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/fetchFileTemp15119737041244840771.tmp
22/05/05 11:06:45 INFO Utils: /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/fetchFileTemp15119737041244840771.tmp has been previously copied to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.apache.spark_spark-avro_2.12-3.2.1.jar
22/05/05 11:06:45 INFO Executor: Adding file:/private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.apache.spark_spark-avro_2.12-3.2.1.jar to class loader
22/05/05 11:06:45 INFO Executor: Fetching spark://<ip-reducted>:49251/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1651766805237
22/05/05 11:06:45 INFO Utils: Fetching spark://<ip-reducted>:49251/jars/org.spark-project.spark_unused-1.0.0.jar to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/fetchFileTemp2839155158615789924.tmp
22/05/05 11:06:45 INFO Utils: /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/fetchFileTemp2839155158615789924.tmp has been previously copied to /private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.spark-project.spark_unused-1.0.0.jar
22/05/05 11:06:45 INFO Executor: Adding file:/private/var/folders/2w/_hgclycd23lftjxygbswr_yc0000gq/T/spark-29fd3283-a8c1-4295-8b60-f9dfd79793ac/userFiles-edb15b12-a223-4765-bf17-0dde4c7cdf9b/org.spark-project.spark_unused-1.0.0.jar to class loader
22/05/05 11:06:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49254.
22/05/05 11:06:45 INFO NettyBlockTransferService: Server created on <ip-reducted>:49254
22/05/05 11:06:45 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/05/05 11:06:45 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, <ip-reducted>, 49254, None)
22/05/05 11:06:45 INFO BlockManagerMasterEndpoint: Registering block manager <ip-reducted>:49254 with 434.4 MiB RAM, BlockManagerId(driver, <ip-reducted>, 49254, None)
22/05/05 11:06:45 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, <ip-reducted>, 49254, None)
22/05/05 11:06:45 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, <ip-reducted>, 49254, None)
Turn OFF optional Spark logging.
Avro file schema.
root
|-- SequenceNumber: long (nullable = true)
|-- Offset: string (nullable = true)
|-- EnqueuedTimeUtc: string (nullable = true)
|-- SystemProperties: map (nullable = true)
| |-- key: string
| |-- value: struct (valueContainsNull = true)
| | |-- member0: long (nullable = true)
| | |-- member1: double (nullable = true)
| | |-- member2: string (nullable = true)
| | |-- member3: binary (nullable = true)
|-- Properties: map (nullable = true)
| |-- key: string
| |-- value: struct (valueContainsNull = true)
| | |-- member0: long (nullable = true)
| | |-- member1: double (nullable = true)
| | |-- member2: string (nullable = true)
| | |-- member3: binary (nullable = true)
|-- Body: binary (nullable = true)
When I re-run it, I sometimes get the errors below, and if I re-run it again it usually succeeds and produces the good output above. I had initially pinpointed the error to the creation of the "event_hdr" DataFrame in the code shown above, but the Python traceback below actually points at SparkContext initialization in SparkSession.builder.getOrCreate().
<OP>@OP code % spark-submit --packages org.apache.spark:spark-avro_2.12:3.2.1 spark-load-avro-1.py
...
Error start:
java.lang.IllegalArgumentException: Too large frame: 5785721462337832960
at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
22/05/05 11:23:09 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /<ip-reducted>:49399 is closed
22/05/05 11:23:09 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Too large frame: 5785721462337832960
at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
22/05/05 11:23:10 INFO SparkUI: Stopped Spark web UI at http://<ip-reducted>:4041
22/05/05 11:23:10 ERROR Utils: Uncaught exception in thread Thread-4
java.lang.NullPointerException
at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:173)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:927)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2567)
at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2086)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1442)
at org.apache.spark.SparkContext.stop(SparkContext.scala:2086)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:677)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
22/05/05 11:23:10 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/05/05 11:23:10 INFO MemoryStore: MemoryStore cleared
22/05/05 11:23:10 INFO BlockManager: BlockManager stopped
22/05/05 11:23:10 INFO BlockManagerMaster: BlockManagerMaster stopped
22/05/05 11:23:10 WARN MetricsSystem: Stopping a MetricsSystem that is not running
22/05/05 11:23:10 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/05/05 11:23:10 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
File "/Users/<OP>/g-drive/git/michael-okulik/pyspark-test/code/spark-load-avro-1.py", line 7, in <module>
spark = SparkSession.builder.getOrCreate()
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/sql/session.py", line 228, in getOrCreate
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 392, in getOrCreate
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 146, in __init__
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1585, in __call__
File "/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: Too large frame: 5785721462337832960
at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
22/05/05 11:23:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.lang.ExceptionInInitializerError
at org.apache.spark.executor.Executor.stop(Executor.scala:333)
at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2019)
at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException
at org.apache.spark.shuffle.ShuffleBlockPusher$.<init>(ShuffleBlockPusher.scala:465)
at org.apache.spark.shuffle.ShuffleBlockPusher$.<clinit>(ShuffleBlockPusher.scala)
... 16 more
<OP>@OP code %
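One detail that may or may not help (my own curiosity, not anything from the docs): the absurd frame length 5785721462337832960 looks suspicious. Decoding it as 8 raw big-endian bytes suggests the channel received the start of a ZIP/JAR file header where a Spark RPC frame length was expected:

```python
import struct

# The bogus "frame size" from the error message.
frame_size = 5785721462337832960

# Reinterpret the 64-bit length as the 8 raw bytes that arrived on the wire.
raw = struct.pack(">Q", frame_size)
print(raw)                       # b'PK\x03\x04\x14\x00\x08\x00'
print(raw[:4] == b"PK\x03\x04")  # True: "PK\x03\x04" is the ZIP local-file-header magic
```

So whatever connected on that port seems to have been fed zip/jar bytes rather than an RPC frame, which is why the frame decoder rejects it. I don't know if that is the cause or just a symptom.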
I appreciate everyone's help!
Best regards
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow