Randomly slow script

Boow · 2024-04-22T13:09:20.166Z

Hi,

We have a script that reads a collection X, does some processing and writes to the collection Y. Usually the script takes ~1 hour but sometimes it’s very very slow. What can affect performance?

I don’t see a big load on my cluster… maybe a lock on my X collection is slowing down my script? Size of my X collection: Long('27776896250').

Thanks for your help.

steevej · 2024-04-22T14:01:24.945Z

Without more information about your exact cluster setup, we cannot really help.

When you write “read a collection X”, with the size of X being 27_776_896_250, do you mean you read all of it and then write all of it back into a collection Y? Where is the script running compared to the server?

Boow · 2024-04-22T15:07:23.916Z

Oh yes sorry.

Replica set of 4 nodes.

Where is the script running compared to the server?

The script runs on another machine but communicates with the cluster over the internal network (1GB/s).

When you write “read a collection X”, with the size of X being 27_776_896_250, do you mean you read all of it and then write all of it back into a collection Y?

I read the data from collection X, do some processing (if needed) and then I apply it to the new collection Y. I work in batches of 300. I create an empty collection, write all the data and then create the index (maybe it’s better to create the index before? I don’t think so, because mongo would have to keep it up-to-date…).

Let me know if you have suggestions / tips / questions.
Thanks.

steevej · 2024-04-22T15:11:08.804Z

Use the aggregation framework with an $out stage rather than

Boow:

do some processing
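
A minimal sketch of this suggestion, assuming the processing can be expressed as aggregation stages. The `$set` stage and the `processed` field are hypothetical placeholders for the real processing, and `source` is assumed to be a PyMongo collection handle for X. Note that `$out` replaces the target collection if it already exists.

```python
from typing import Any


def build_pipeline(target: str) -> list[dict[str, Any]]:
    """Build a pipeline that transforms documents on the server and writes them to `target`."""
    return [
        # Hypothetical stand-in for "do some processing"; replace with the real stages.
        {"$set": {"processed": True}},
        # $out must be the last stage; it replaces `target` if it already exists.
        {"$out": target},
    ]


def copy_with_out(source: Any, target: str) -> None:
    """Run the pipeline on `source` (assumed to be a PyMongo collection for X)."""
    # The documents are read and written by the server, not shipped to the script.
    source.aggregate(build_pipeline(target))
```

With PyMongo this might be called as `copy_with_out(client["mydb"]["X"], "Y")`, where `mydb` is a placeholder database name.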

steevej · 2024-04-26T12:16:21.269Z

One idea I had, since you have

Boow:

Replica set of 4 nodes.

which may or may not be applicable to the whole use case, is to:

take 1 node out of the replica set
make this node a dedicated, non-replicated node where collection Y is written

My idea is that since you read and write a lot of data on the same cluster, you constantly swap the working set in and out. Your cluster might be overloaded by disk I/O. With the new setup, the replica set is not busy replicating all the writes and can serve the reads better.

Anyway, 4 nodes is not a recommended configuration (an odd number of voting members is preferred), so you lose nothing by taking a node out.

Can you give more details about:

Boow:

it’s very very slow

Boow:

I don’t see a big load on my cluster

What metrics are you using for the above?

The hardware configuration of the 4 nodes.

The read and write concerns you use.
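
As an illustration of the kind of metrics meant here, the sketch below pulls a few load indicators out of the `serverStatus` command output. It assumes `db` is a PyMongo database handle; which fields actually matter for this slowdown is not known from the thread, so the selection is only a guess at a starting point.

```python
from typing import Any


def summarize_server_status(status: dict[str, Any]) -> dict[str, Any]:
    """Pick a few load indicators out of a serverStatus document."""
    queue = status.get("globalLock", {}).get("currentQueue", {})
    return {
        # Operations currently queued while waiting for a lock.
        "queued_readers": queue.get("readers", 0),
        "queued_writers": queue.get("writers", 0),
        # Cumulative operation counters since the server started.
        "opcounters": status.get("opcounters", {}),
    }


def sample_load(db: Any) -> dict[str, Any]:
    """Fetch serverStatus from `db` (assumed PyMongo database) and summarize it."""
    return summarize_server_status(db.command("serverStatus"))
```

Sampling this once during a fast run and once during a slow run would give two snapshots to compare.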

Since the script usually takes 1 hour, it looks like an analytics use case, and having dedicated nodes for analytics is sometimes a good choice.

Boow:

I work in batches of 300

Have you tried different sizes?
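
Trying different sizes is easier if the batch size is a parameter. The sketch below is one possible shape, assuming `target` is a PyMongo collection handle for Y and the writes are plain inserts; the use of `ordered=False` is an assumption, not something taken from the original script.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def batched(docs: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch


def write_in_batches(target: Any, docs: Iterable[Any], size: int) -> int:
    """Insert `docs` into `target` in batches of `size`; return the number of batches sent."""
    count = 0
    for batch in batched(docs, size):
        # Assumption: unordered inserts are acceptable, so one failed document
        # does not stop the rest of the batch from being attempted.
        target.insert_many(batch, ordered=False)
        count += 1
    return count
```

Running the same load with, say, 300, 1000 and 5000 and timing each run would show whether the batch size is a factor at all.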

Boow:

maybe it’s better to create the index before

I don’t know, but it should be straightforward to test. I suspect that creating the index first may be better: when the index is built afterwards, all the documents need to be fetched again, adding a lot of I/O.
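
A rough way to run that test is sketched below. `make_target` and `load` are hypothetical callables standing in for "create an empty collection" and "write all the data"; `index_keys` is whatever would be passed to PyMongo's `create_index`. It only measures wall-clock time in the script, so it says nothing about server-side I/O by itself.

```python
import time
from typing import Any, Callable


def timed(label: str, step: Callable[[], Any]) -> float:
    """Run `step`, print and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    step()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return elapsed


def compare_index_order(
    make_target: Callable[[], Any],
    load: Callable[[Any], None],
    index_keys: Any,
) -> dict[str, float]:
    """Time 'index, then load' against 'load, then index' on fresh collections."""

    def index_first() -> None:
        target = make_target()           # hypothetical: returns a fresh empty collection
        target.create_index(index_keys)  # index is maintained during the load
        load(target)

    def index_last() -> None:
        target = make_target()
        load(target)
        target.create_index(index_keys)  # index is built once, after the load

    return {
        "index_first": timed("index first", index_first),
        "index_last": timed("index last", index_last),
    }
```

Running it on a representative subset of X first would keep the experiment short.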

steevej · 2024-05-03T12:57:25.418Z

@Boow, it has been a week since I provided what I think is valuable input.

I would appreciate a follow-up.

Thanks