The documentation says, to shard a GridFS collection:
To shard the chunks collection, use either { files_id : 1, n : 1 } or { files_id : 1 } as the shard key index.
But there is no explanation of the advantages or disadvantages of either option. Where can I find the most recent information on this? Is there a knowledge base article somewhere? I have searched and found many old posts that mention the two options, but few that discuss which one is better and under which conditions.
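For concreteness, here is roughly how I understand the two options would be issued. This is only a sketch: the database name `images`, the default bucket prefix `fs`, and the helper names `shard_chunks_command` and `apply_to_cluster` are my own assumptions, not something from the documentation. The second function assumes a pymongo `MongoClient` connected to a mongos.

```python
def shard_chunks_command(db_name, bucket="fs", include_n=True):
    """Build a shardCollection command document for a GridFS chunks collection.

    include_n=True  -> shard key { files_id: 1, n: 1 }
    include_n=False -> shard key { files_id: 1 }
    """
    key = {"files_id": 1}
    if include_n:
        key["n"] = 1  # dicts keep insertion order, so files_id stays first
    return {"shardCollection": f"{db_name}.{bucket}.chunks", "key": key}


def apply_to_cluster(client, command):
    """Run the command against the admin database.

    `client` is assumed to be a pymongo.MongoClient connected to a mongos;
    nothing in this sketch opens a connection by itself.
    """
    return client.admin.command(command)


# The two candidate commands from the documentation, for a hypothetical
# database called "images":
compound = shard_chunks_command("images", include_n=True)
files_only = shard_chunks_command("images", include_n=False)
```

The only difference between the two is whether the chunk sequence number n is part of the key, which is exactly the part I cannot find guidance on.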
Our example: we are going to store a large quantity of images in a five-shard cluster (let’s say at least 60TB). Each individual image is small (less than 16MB). I’m not sure we should even use GridFS, but that decision has already been made. There will be many more writes than reads, so our main performance consideration is write speed. Which shard key is best for this workload?