Is it good practice to have multiple keyspaces in Cassandra?
I have Cassandra configured on Amazon EC2 with 3 nodes (instances) in a single cluster. I want to give each of my clients some space on Cassandra by creating a separate keyspace for each of them in that cluster. The number of clients will grow day by day, so I can't assume a fixed number of keyspaces.
Will there be a performance issue if I create too many keyspaces in a single cluster?
If it's not good practice, is there another workaround that fits my needs? I don't want to configure multiple instances of Cassandra.
Solution 1:[1]
A small number of separate keyspaces is fine, but a large number of keyspaces will cause performance issues. The problem is not so much the per-keyspace overhead as the large number of tables duplicated in each keyspace. Cassandra has per-table overhead, such as reserving 1 MB of heap for each table. Good advice is not to exceed a few hundred tables.
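To make the multiplication concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the ~1 MB-per-table figure quoted above, a hypothetical 20 tables per client keyspace, and a hypothetical cutoff of 300 tables standing in for "a few hundred". Real overhead varies by Cassandra version and configuration, so treat the numbers as illustrative only.

```python
# Back-of-the-envelope estimate of table count and heap overhead when every
# client gets its own keyspace with an identical set of tables.
# ASSUMPTIONS (hypothetical, for illustration only):
#   - ~1 MB of heap reserved per table, per the figure in the answer
#   - 20 tables per client keyspace
#   - 300 tables as the "a few hundred tables" guideline

PER_TABLE_HEAP_MB = 1.0
TABLES_PER_CLIENT = 20
TABLE_WARNING_THRESHOLD = 300


def total_tables(num_keyspaces: int, tables_per_keyspace: int) -> int:
    """Each keyspace duplicates the same schema, so table counts multiply."""
    return num_keyspaces * tables_per_keyspace


def estimated_heap_mb(num_keyspaces: int, tables_per_keyspace: int) -> float:
    """Heap reserved for per-table overhead on each node, in megabytes."""
    return total_tables(num_keyspaces, tables_per_keyspace) * PER_TABLE_HEAP_MB


if __name__ == "__main__":
    for clients in (10, 100, 1000):
        tables = total_tables(clients, TABLES_PER_CLIENT)
        heap = estimated_heap_mb(clients, TABLES_PER_CLIENT)
        status = "over" if tables > TABLE_WARNING_THRESHOLD else "within"
        print(f"{clients:>5} clients -> {tables:>6} tables, "
              f"~{heap:,.0f} MB heap ({status} guideline)")
```

For example, 100 clients with 20 tables each already means 2,000 tables, well past the few-hundred guideline, and roughly 2,000 MB of heap on each node under this assumption, because the schema (and its memtables) exists on every node.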
How will thousands of tables perform in Cassandra? There are open bug reports indicating that a large number of tables (in the thousands) can cause high CPU utilization (CASSANDRA-10588) and longer startup times (CASSANDRA-794).
Randy Fraden at BlackRock gave an excellent presentation at the 2015 Cassandra Summit, Multi-Tenancy in Cassandra at BlackRock. The usual recommendation for multi-tenancy is to put the tenant_id in the partition key. BlackRock then used custom IAuthenticator and IAuthorizer modules to enforce tenant security at the partition level.
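A minimal Python sketch of the shared-table approach, assuming a hypothetical keyspace `shared` and table `documents` (neither comes from the answer). tenant_id is the partition key, and the helper refuses to build a query that is not scoped to a tenant. The `%s` placeholders follow the DataStax Python driver's style for simple (non-prepared) statements passed to `session.execute(query, params)`. Server-side enforcement, as BlackRock did with custom IAuthorizer modules, is not shown.

```python
# Shared-table alternative to one keyspace per client: tenant_id is the
# partition key, so each client's rows live in their own partition(s)
# of a single table.
# ASSUMPTION: keyspace "shared", table "documents" and its columns are
# hypothetical; adapt them to your own data model.

SHARED_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS shared.documents (
    tenant_id text,
    doc_id    uuid,
    title     text,
    body      text,
    PRIMARY KEY ((tenant_id), doc_id)
)"""


def documents_for_tenant(tenant_id: str) -> tuple[str, tuple[str]]:
    """Return a CQL query and its bound parameters, restricted to one tenant."""
    if not tenant_id:
        # Never build an unscoped query; every read must name a tenant.
        raise ValueError("tenant_id is required")
    query = "SELECT doc_id, title FROM shared.documents WHERE tenant_id = %s"
    return query, (tenant_id,)


if __name__ == "__main__":
    print(SHARED_TABLE_DDL.strip())
    print(documents_for_tenant("acme"))
```

With tenant_id as the only partition key column, a very large tenant ends up in one large partition; adding a bucketing column to the partition key is a common way to spread such a tenant out.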
For situations that require the same tables in multiple keyspaces, there is a feature request for template tables (CASSANDRA-7662), which would add a little syntactic sugar to ease creating similar tables.
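Without template tables, the same effect can be approximated on the client side by rendering one DDL template per keyspace. This Python sketch assumes a hypothetical `events` table and validates keyspace names against what is assumed here to be the unquoted-identifier rule for keyspaces (letters, digits and underscores, starting with a letter, at most 48 characters) before interpolating them into the statement.

```python
import re

# Hypothetical per-client table; {keyspace} is filled in for each client.
EVENTS_TEMPLATE = """CREATE TABLE IF NOT EXISTS {keyspace}.events (
    event_id timeuuid PRIMARY KEY,
    payload  text
)"""

# ASSUMPTION: unquoted keyspace names use letters, digits and underscores,
# start with a letter, and are at most 48 characters long.
_KEYSPACE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,47}")


def render_for_keyspaces(keyspaces: list[str]) -> list[str]:
    """Render the same CREATE TABLE statement once per keyspace."""
    statements = []
    for ks in keyspaces:
        # Validate before interpolating so a bad name cannot alter the DDL.
        if not _KEYSPACE_NAME.fullmatch(ks):
            raise ValueError(f"invalid keyspace name: {ks!r}")
        statements.append(EVENTS_TEMPLATE.format(keyspace=ks))
    return statements


if __name__ == "__main__":
    for stmt in render_for_keyspaces(["client_a", "client_b"]):
        print(stmt, end=";\n\n")
```

This only simplifies schema creation: each rendered table still carries the per-table overhead discussed above.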
Solution 2:[2]
It depends on how many clients you eventually expect to have (are we talking about hundreds or thousands?), how many tables are in each keyspace, and how they are used. More keyspaces × more tables = more memtables kept in memory. Per-table overhead also differs between Cassandra versions. If it's just standard multi-tenancy, you might consider adding a tenant_id column to the partition key.
Also take a look at similar posts asking about the number of tables.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | mmatloka |