'Can relational database scale horizontally

After some googling I have found:

Note from mysql docs:

MySQL Cluster automatically shards (partitions) tables across nodes, enabling databases to scale horizontally on low cost, commodity hardware to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs.

Can relational database be horizontal scaling? Will it be somehow based on NoSQL database?

Do someone have any real world example?

How can I manage sql requests, transactions, and so on in such database?



Solution 1:[1]

It is possible but takes lots of maintenance efforts, Explanation -

Vertical Scaling of data (synonymous to Normalisation in SQL databases) is referred as splitting data column wise into multiple tables in order to reduce space redundancy. Example of user table -

enter image description here

Horizontal Scaling of data (synonymous to sharding) is referred as splitting row wise into multiple tables in order to reduce time taken to fetch data. Example of user table -

enter image description here

Key point to note here is as we can see tables in SQL databases are Normalised into multiple tables of related data. In order to shard data of such table on multiple machines, you would need to shard related normalised data accordingly which in turn would increase maintenance efforts. Like in the example presented above of SQL database,

Customer table which is related as one to many relation with Order table

If you move some rows of customer data onto other machine (referred as sharding) you would also need to move its related order data onto the same machine which would be troublesome task in case of multiple related tables.

Its convenient for NOSQL databases to shard out as they follow flat table structure (data is stored in aggregated form rather than normalised form).

Solution 2:[2]

Thanks for the question and answer. I was trying to explain this to someone like this:

In terms of the CAP theorem, you can't have all three. So when a partition (network or server failure) occurs:

  • A relational database on a single server is giving you C (consistency). So when a P (partition - server/network failure) occurs, you can't have A (availability - db goes down)

  • A nosql datastore if you want A when a P occurs, you can't have C (one or more of your replicated partitions will be out of sync, until the n/w comes back and they all sync up). So it will only be eventually consistent

EDITED #2: to provide more perspective based on the comment below by Manish. My intention is to explain by example, why you cant have all 3. As noted below in the comments, there are other dbs where you can have C when P occurs at the expense of A.

Solution 3:[3]

Google Spanner is an example of a relational database that can scale horizontally. Sharding and replication are done automatically so no need to worry about that. For more information please check out this paper.

Solution 4:[4]

Yes, but it need to migrate when storage increased.

Some open source tools can support the feature, for example: Vitess or Apache ShardingSphere.

Solution 5:[5]

Yes it can. It is called NewSQL.

NewSQL is a new approach to relational databases that wants to combine transactional ACID (atomicity, consistency, isolation, durability) guarantees of good ol’ RDBMSs and the horizontal scalability of NoSQL. Source

Examples for Databases:

  • User-Shared MySQL Cluster
  • Citus (PostgreSQL extension)
  • CockroachDB
  • Azure Cosmos DB
  • Google Spanner
  • NuoDB
  • Vitess
  • Splice Machine (part of Hadoop ecosystem)
  • MemQSL (in memory store)
  • VoltDB (in memory store)

Examples for Data Warehouses:

  • IBM Netezza
  • Oracle
  • Teradata
  • Hive Engine (part of Hadoop ecosystem)
  • Spark SQL (part of Hadoop ecosystem)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 anegru
Solution 4 Liang Zhang
Solution 5 illuminato