'behaviour of balancer in mongodb sharding
I was experimenting with mongo sharding
. The collection has shard key as {policyId,startTime}
.
policyId - java UUID (limited values,lets say 50)
startTime - monotonically increasing time.
After inserting around 30M(32 GB)
documents in the collection : Below is the data distribution:
shard key: { "policyId" : 1, "startDate" : 1 }
unique: false
balancing: true
chunks:
sharda 63
shardb 138
During insertion sh.isBalancerRunning()
was giving 'false' as result. When I stopped inserting more documents, balancer started moving chunks. After that I got even distribution of data.
Below are my concerns / Questions regarding balancer:
1. If insertion in db is stopped, then only balancer is active and started moving chunks. If I insert more data for longer duration which will create more chunks and data will be more skewed. Chunk migration will itself take more time to balance the shards. So how does mongo decide when to migrate chunks
?
2. I was able notice spikes in write latency
if data is getting inserted after 20M
docs. Does it mean balancer is moving some of the chunks intermittently?
3. Count API gives inconsistent result during chunk migration because balancer copies chunks from one shard to another and deletes the old chunk. Should we expect Find API
will also give incorrect result (duplicate docs)?
If is possible could any one share any documentation/blog for mongo balancer for better understanding.
Solution 1:[1]
Assumption is wrong (i.e. If insertion in db is stopped, then only balancer is active and started moving chunks). The balancer process automatically migrates chunks when there is an uneven distribution of a sharded collection’s chunks across the shards.
Migration is not a continuous or steadily process. Automatic migration happens when it is required. for more details refer https://docs.mongodb.com/v3.0/core/sharding-balancing/#sharding-migration-thresholds
Read while migration will not give incorrect result. No duplicates records should come via find API.
For more about balancer refer https://docs.mongodb.com/manual/core/sharding-balancer-administration/
About migration refer https://docs.mongodb.com/v3.0/core/sharding-chunk-migration/
Solution 2:[2]
There are various things to consider
- Default chunk size - 64 MB
- Cardinality - If cardinality is more then and your data over period of time will not cause same value to be more than 64 MB ( assume you store 1 or more years data ) then you don't have to worry. In case not then you probably had to increase the default chunk size
- Suppose you have 2 shards - Cardinality (hash key) is 100 then 50 values data will go to 1 shard and 50% to other. If you have range keys then 0-50 will go to 1 shard and 50-100 in other.
- Now suppose your current chunk with value A to F reaches size 64 MB then this chunk will be split and data will be moved to other shard.
- If your cardinality is low then A value itself can be more than 64 MB and chunk will not be able to split and marked as Jumbo chunk
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ramraj Patel |
Solution 2 | Mahesh Malpani |