Hello everyone,

I run a large platform that provides eCommerce stores for people, similar to Shopify, and I'm having trouble choosing the best shard key for my collections.

Context:

  1. Platform Overview:
  • The platform allows users to create and manage their own online stores.
  • We have multiple collections, such as orders, products, customers, etc.
  • Each collection includes a store_id field to identify the store to which the document belongs.
  2. Current Considerations:
  • We are considering using store_id as the shard key for our collections to ensure that all documents related to the same store are located on the same shard.
  • We aim to achieve data locality, reduce network overhead, and simplify querying across collections related to the same store.
  3. Questions:
  • Is using store_id as the shard key a good approach for our scenario?
  • What are the potential pitfalls of using store_id as a shard key?
  • Are there better alternatives or additional strategies we should consider to ensure optimal performance and scalability?
  • How do we handle cases where certain stores have significantly more data than others, potentially causing unbalanced shards?
  • What are the pros and cons of using a hashed shard key versus a ranged shard key for store_id in this context?
  • Any best practices for sharding in a multi-tenant eCommerce platform would be highly appreciated.
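
To make the imbalance concern concrete, here is a minimal sketch in plain Python. It is a toy model, not MongoDB's actual chunk placement or balancer: the shard count, the store sizes, the `shard_for` helper, and the `order_id` field are all made-up assumptions for illustration. It compares placing documents by store_id alone against placing them by store_id combined with a second field.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Toy placement rule: stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(docs, key_fn):
    """Count how many documents land on each shard for a given key function."""
    counts = Counter({shard: 0 for shard in range(NUM_SHARDS)})
    for doc in docs:
        counts[shard_for(key_fn(doc))] += 1
    return counts


# Made-up workload: one very large store and 99 small ones.
docs = [{"store_id": "store_0", "order_id": i} for i in range(100_000)]
for s in range(1, 100):
    docs += [{"store_id": f"store_{s}", "order_id": i} for i in range(100)]

# Key on store_id only: every document of a store lands on the same shard.
by_store = distribution(docs, lambda d: d["store_id"])

# Key on store_id plus a second field: a large store can spread out.
by_store_and_order = distribution(
    docs, lambda d: f'{d["store_id"]}:{d["order_id"]}'
)

print("store_id only:      ", dict(by_store))
print("store_id + order_id:", dict(by_store_and_order))
```

In this toy model, keying on store_id alone keeps the large store's 100,000 documents together on one shard, which is the locality we want but also the hotspot we are worried about. Adding a second field evens out the counts, at the cost that a single store's documents no longer sit on one shard.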