steevej
(Steeve Juneau)
One idea I got since you have
which may or may not be applicable to the whole use case, is to:
- take 1 node out of the replica set
- make this node a dedicated, non-replicated node to which collection Y is written
My idea is that since you read and write a lot of data on the same cluster, you constantly swap the working set in and out of memory, so your cluster might be overloaded by disk I/O. With the new setup, the replica set is no longer busy replicating all the writes and can serve the reads better.
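To make the split concrete, here is a minimal sketch of how the application could end up with two connection strings: one for the 3-node replica set that serves the reads, one for the standalone node that receives the writes to collection Y. The host names, the replica set name `rs0` and the helper `mongo_uri` are all hypothetical; only the `replicaSet` and `directConnection` URI options are standard.

```python
from urllib.parse import urlencode


def mongo_uri(hosts, **options):
    """Build a mongodb:// connection string (hypothetical helper, for illustration)."""
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Assumption: the three remaining members, in a replica set named "rs0".
READ_URI = mongo_uri(
    [
        "node1.example.net:27017",
        "node2.example.net:27017",
        "node3.example.net:27017",
    ],
    replicaSet="rs0",
)

# Assumption: the fourth node, restarted as a standalone (non-replicated) server.
WRITE_URI = mongo_uri(["node4.example.net:27017"], directConnection="true")

print(READ_URI)
print(WRITE_URI)
```

The script would then open one client per URI (for example with `pymongo.MongoClient(READ_URI)` and `pymongo.MongoClient(WRITE_URI)`), read from the first and write collection Y to the second. Keep in mind that the standalone node has no redundancy, so this only makes sense if collection Y can be regenerated.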
Anyway, 4 nodes is not a recommended configuration: with an even number of voting members you get no more fault tolerance than with 3, so you lose nothing by taking a node out.
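The arithmetic behind that statement is quick to check. This is just a sketch of the majority rule for elections, assuming every member is a voting member; the function names are mine.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: strictly more than half."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be down while a majority is still reachable."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(f"{n} members: majority={majority(n)}, tolerates {fault_tolerance(n)} down")
```

Both 3 and 4 members tolerate a single node being down; you need 5 members to tolerate two.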
Can you give more details about:
- the metrics you are using for the above
- the hardware configuration of the 4 nodes
- the read and write concerns you use
Since the script usually takes 1 hour, it looks like an analytics use case, and having dedicated nodes for analytics is sometimes a good choice.
Have you tried different sizes?
I don’t know, but it should be straightforward to test. But I suspect that it may be better because the documents need to be fetched again, adding a lot of I/O again.