Spark ETL and Spark Thrift Server
Some details:
- Spark SQL (version 3.2.1)
- Driver: Hive JDBC (version 2.3.9)
ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
The BI tool connects via an ODBC driver.
After starting the Spark Thrift Server, I'm unable to run PySpark scripts via spark-submit, because both processes try to use the same metastore_db.
error:
Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@3acaa384, see the next exception for details.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 140 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /tmp/metastore_db.
I need to be able to run PySpark (Spark ETL) jobs while the Spark Thrift Server is up serving BI tool queries. Is there a workaround for this?
Thanks!
Solution 1:[1]
In my case the solution was to move the metastore from the embedded Derby metastore_db to a database server such as MySQL (in my case) or PostgreSQL. Derby's embedded mode allows only one process to boot the database at a time, which is why the Thrift Server and spark-submit cannot share it.
You will have to configure $SPARK_HOME/conf/hive-site.xml and place your JDBC driver jar in $SPARK_HOME/jars.
hive-site.xml example for MySQL connection
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://XXX.XXX.XXX.XXX:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=FALSE&amp;autoReconnect=true&amp;nullCatalogMeansCurrent=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>YOUR_USER</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>YOUR_PASSWORD</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>http</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.http.endpoint</name>
<value>cliservice</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description/>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>true</value>
</property>
</configuration>
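With the external metastore configured, the Thrift Server and spark-submit jobs no longer contend for a Derby lock and can run concurrently. A minimal sketch of the deployment steps described above; the connector jar version and the script name `my_etl_job.py` are placeholders, adjust them to your installation:

```shell
# 1. Place the MySQL JDBC driver on Spark's classpath
#    (version is an example -- use the one matching your MySQL server).
cp mysql-connector-java-8.0.28.jar "$SPARK_HOME/jars/"

# 2. Start the Thrift Server; it now connects to the shared MySQL metastore
#    defined in $SPARK_HOME/conf/hive-site.xml.
"$SPARK_HOME/sbin/start-thriftserver.sh"

# 3. Spark ETL jobs can run at the same time, since no embedded Derby
#    database is being locked by either process.
"$SPARK_HOME/bin/spark-submit" my_etl_job.py
```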
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Luis Estrada |