How does a Spark RDD map to a Cassandra table?
I am new to Spark, and I recently saw code that saves an RDD to a Cassandra table, but I cannot figure out how it does the column mapping. It neither uses a case class nor specifies any column names, as in the code below:
rdd
.map(x => (x._1, x._2, x._3)) // x is an (Int, String, Int) tuple here
.repartitionByCassandraReplica(keyspace, tableName)
.saveToCassandra(keyspace, tableName)
Since x is simply an (Int, String, Int) tuple, not a case class, there are no field names to map to the Cassandra table. So is there a definite column order in the Cassandra table that matches the order of the fields we specify in the code?
Solution 1:[1]
This mapping relies on the order of columns in the Cassandra table definition, which is as follows:
- Partition key columns in the specified order
- Clustering columns in the specified order
- All remaining columns, sorted alphabetically by name
The Spark Cassandra Connector expects the fields of the Scala tuple to match these columns in that order. You can see this in the source code of the TupleColumnMapper class.
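To make the ordering rule concrete, here is a minimal sketch in plain Scala that does not depend on Spark or the connector. `TableSketch`, `columnOrder`, `mapByPosition` and the column names `id`, `name` and `age` are all hypothetical names made up for illustration. This is not the connector's actual code, only a model of the order described above.

```scala
// Hypothetical model of a table definition; NOT the connector's own classes.
final case class TableSketch(
  partitionKey: Seq[String],
  clusteringColumns: Seq[String],
  regularColumns: Seq[String]
) {
  // Order described in the answer: partition key columns, then clustering
  // columns, then the remaining columns sorted alphabetically by name.
  def columnOrder: Seq[String] =
    partitionKey ++ clusteringColumns ++ regularColumns.sorted
}

object TupleMappingSketch {
  // Pair each tuple field with a column purely by position.
  def mapByPosition(table: TableSketch, row: Product): Seq[(String, Any)] =
    table.columnOrder.zip(row.productIterator.toSeq)

  def main(args: Array[String]): Unit = {
    // Assumed table: id is the partition key; name and age are regular
    // columns, listed here in a non-alphabetical order on purpose.
    val table = TableSketch(
      partitionKey = Seq("id"),
      clusteringColumns = Seq.empty,
      regularColumns = Seq("name", "age")
    )

    // Under this rule the tuple has to be written as (id, age, name).
    mapByPosition(table, (1, 30, "alice")).foreach { case (col, value) =>
      println(s"$col -> $value")
    }
  }
}
```

In this sketch the regular columns end up as `age` then `name`, regardless of the order in which they were listed, so a tuple written as `(id, name, age)` would pair its fields with the wrong columns.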
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Alex Ott |