Docs Menu
Docs Home
/
MongoDB Manual
/ /

Create Chunks in a Sharded Cluster

In most situations a sharded cluster will create/split and distribute chunks automatically without user intervention. However, in a limited number of cases, MongoDB cannot create enough chunks or distribute data fast enough to support the required throughput.

For example, if you want to ingest a large volume of data into a cluster that is unbalanced, or where the ingestion of data will lead to data imbalance, such as with monotonically increasing or decreasing shard keys. Pre-splitting the chunks of an empty sharded collection can help with the throughput in these cases.

Alternatively, by defining the zones and zone ranges before sharding an empty or a non-existing collection, the shard collection operation creates chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. For more information, see Empty Collection.

Warning

Only pre-split chunks for an empty collection. Manually splitting chunks for a populated collection can lead to unpredictable chunk ranges and sizes as well as inefficient or ineffective balancing behavior.

To split empty chunks manually, you can run the split command:

Example

To create chunks for documents in the myapp.users collection using the email field as the shard key, use the following operation in mongosh:

for ( var x=97; x<97+26; x++ ){
for ( var y=97; y<97+26; y+=6 ) {
var prefix = String.fromCharCode(x) + String.fromCharCode(y);
db.adminCommand( { split: "myapp.users", middle: { email : prefix } } );
}
}

This assumes a collection size of 100 million documents.

  • For information on the initial chunks created and distributed by the sharding command, see Empty Collection.

  • For information on the balancer and automatic distribution of chunks across shards, see Cluster Balancer and Chunk Migration.

  • For information on manually migrating chunks, see Migrate Chunks in a Sharded Cluster.

Tip

See also:

Back

Data Partitioning