Hello everyone,
I run a large platform that provides eCommerce stores for people, similar to Shopify, and I’m having trouble choosing the best shard key for my collections.
Context:
- Platform Overview:
  - The platform allows users to create and manage their own online stores.
  - We have multiple collections, such as `orders`, `products`, `customers`, etc.
  - Each collection includes a `store_id` field to identify the store to which the document belongs.
- Current Considerations:
  - We are considering using `store_id` as the shard key for our collections to ensure that all documents related to the same store are located on the same shard.
  - We aim to achieve data locality, reduce network overhead, and simplify querying across collections related to the same store.
- Questions:
  - Is using `store_id` as the shard key a good approach for our scenario?
  - What are the potential pitfalls of using `store_id` as a shard key?
  - Are there better alternatives or additional strategies we should consider to ensure optimal performance and scalability?
  - How do we handle cases where certain stores have significantly more data than others, potentially causing unbalanced shards?
  - What are the pros and cons of using a hashed shard key versus a ranged shard key for `store_id` in this context?
  - Any best practices for sharding in a multi-tenant eCommerce platform would be highly appreciated.
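
To make the imbalance question concrete, here is a rough toy model in plain Python. It is only a sketch under loud assumptions: the shard count, the store sizes, and the helper names (`shard_for`, `distribution`) are all made up for illustration, and the hash-modulo placement is a simplification, not how MongoDB actually assigns chunks to shards. It compares placing documents by `store_id` alone against placing them by `store_id` combined with `_id`.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size, not our real topology


def shard_for(key: str) -> int:
    """Toy placement: hash the key and take it modulo the shard count.

    Assumption: real MongoDB splits the key space into chunks and
    balances those chunks across shards; this only mimics the effect.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def distribution(docs, key_fn):
    """Count how many documents land on each toy shard."""
    counts = Counter({shard: 0 for shard in range(NUM_SHARDS)})
    for doc in docs:
        counts[shard_for(key_fn(doc))] += 1
    return counts


# Made-up data: one very large store (store_id 0) and 99 small stores.
docs = [{"store_id": 0, "_id": i} for i in range(100_000)]
docs += [
    {"store_id": s, "_id": 100_000 + s * 100 + i}
    for s in range(1, 100)
    for i in range(100)
]

# Key on store_id only: every document of a store shares one key value,
# so the large store cannot be spread over more than one shard.
by_store = distribution(docs, lambda d: str(d["store_id"]))

# Key on store_id plus _id: documents of the same store get different
# key values, so the large store can be spread out (at the cost of locality).
by_store_and_id = distribution(docs, lambda d: f"{d['store_id']}:{d['_id']}")

print("store_id only:  ", dict(by_store))
print("store_id + _id: ", dict(by_store_and_id))
```

In this toy model the large store ends up entirely on one shard when the key is `store_id` alone, which is exactly the skew I am worried about, while the combined key spreads it out but gives up the "one store, one shard" locality we wanted in the first place.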