'Error to write dataframe in Cassandra table on Amazon Keyspaces

I'm trying to write a dataframe on AWS (Keyspace), but I'm getting the following messages below:

Stack:

dfExploded.write.cassandraFormat(table = "table", keyspace = "hub").mode(SaveMode.Append).save()
21/08/18 21:45:18 WARN DefaultTokenFactoryRegistry: [s0] Unsupported partitioner 'com.amazonaws.cassandra.DefaultPartitioner', token map will be empty.
java.lang.AssertionError: assertion failed: There are no contact points in the given set of hosts
  at scala.Predef$.assert(Predef.scala:223)
  at com.datastax.spark.connector.cql.LocalNodeFirstLoadBalancingPolicy$.determineDataCenter(LocalNodeFirstLoadBalancingPolicy.scala:195)
  at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$dataCenterNodes$1(CassandraConnector.scala:192)
  at scala.Option.getOrElse(Option.scala:189)
  at com.datastax.spark.connector.cql.CassandraConnector$.dataCenterNodes(CassandraConnector.scala:192)
  at com.datastax.spark.connector.cql.CassandraConnector$.alternativeConnectionConfigs(CassandraConnector.scala:207)
  at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$3(CassandraConnector.scala:169)
  at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:34)
  at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
  at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
  at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:89)
  at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
  at com.datastax.spark.connector.datasource.CassandraCatalog$.com$datastax$spark$connector$datasource$CassandraCatalog$$getMetadata(CassandraCatalog.scala:455)
  at com.datastax.spark.connector.datasource.CassandraCatalog$.getTableMetaData(CassandraCatalog.scala:421)
  at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:68)
  at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
  at org.apache.spark.sql.DataFrameWriter.getTable$1(DataFrameWriter.scala:339)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)

SparkSubmit:

spark-submit --deploy-mode cluster --master yarn  \
--conf=spark.cassandra.connection.port="9142" \
--conf=spark.cassandra.connection.host="cassandra.sa-east-1.amazonaws.com" \
--conf=spark.cassandra.auth.username="BUU" \
--conf=spark.cassandra.auth.password="123456789" \
--conf=spark.cassandra.connection.ssl.enabled="true" \
--conf=spark.cassandra.connection.ssl.trustStore.path="cassandra_truststore.jks"
--conf=spark.cassandra.connection.ssl.trustStore.password="123456"

Connection by cqlsh everything ok, but in spark got this error



Solution 1:[1]

To read and write data between Keyspaces and Apache Spark by using the open-source Spark Cassandra Connector all you have to do is update the partitioner for your Keyspaces account.

Docs: https://docs.aws.amazon.com/keyspaces/latest/devguide/spark-integrating.html

Solution 2:[2]

The issue as the error states is that AWS Keyspaces uses a partitioner (com.amazonaws.cassandra.DefaultPartitioner) that isn't supported by the Spark-Cassandra-connector.

There isn't a lot of public docs around what the underlying database is for AWS Keyspaces so I've long-suspected that there's a CQL API engine sitting in front of Keyspaces so it "looks" like Cassandra but it's probably backed by something else like Dynamo DB. I'm more than happy to be corrected by someone here from AWS just so I can put that to bed. ?

The default Cassandra partitioner is Murmur3Partitioner and is the only recommended partitioner. The older partitioners such as RandomPartitioner and ByteOrderedPartitioner are supported only for backward compatibility but should never be used for new clusters.

Finally, we don't test the Spark connector against AWS Keyspaces so be prepared for a lot of surprises there. Cheers!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Arturo Hinojosa
Solution 2 Erick Ramirez