Hi there!
In my app I have a history collection. I moved old records (older than 1 year) into a DataLake archive and changed the connection string to read data from both the archive and the regular collection.
Is it expected that performance decreased even when reading unarchived data? The query uses an indexed field, and that datetime field is also the key used for archiving.
Hi @Denis_Stogniy,
Welcome to the MongoDB community!
For specific Atlas workload problems, I would suggest opening a support case. Our team has better visibility into your cluster's configuration and logs.
When opening a case, provide run timings and specific cluster details, as well as query explain plans.
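As a rough illustration of gathering those details, here is a minimal sketch. It assumes `collection` behaves like a PyMongo collection, meaning `find(query)` returns an iterable cursor with an `explain()` method. The function name and the shape of the returned dictionary are hypothetical, not part of any MongoDB tooling.

```python
import time


def collect_case_details(collection, query):
    """Time one query and capture its explain plan for a support case.

    `collection` is assumed to be a PyMongo-style collection: find(query)
    returns a cursor that is iterable and has an explain() method.
    """
    start = time.perf_counter()
    # Drain the cursor so the timing covers fetching every document.
    returned = sum(1 for _ in collection.find(query))
    elapsed_ms = (time.perf_counter() - start) * 1000

    # Ask for the plan on a fresh cursor for the same query.
    plan = collection.find(query).explain()
    return {
        "query": query,
        "documents_returned": returned,
        "elapsed_ms": round(elapsed_ms, 1),
        "explain": plan,
    }
```

Running it once against the direct cluster connection and once against the federated connection string would give two records to attach to the case.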
Thanks
Pavel
Hey @Denis_Stogniy,
Thanks for raising this. I can probably shed a bit of light on it, but it also makes sense to open a case if you'd like some deeper analysis.
Regarding the performance on the "federated collection" (i.e. targeting archived and cluster data together), you should expect to see lower performance than connecting directly to your cluster, but the degree of the performance impact depends on the type of query and how you optimized the archive.
One example would be a "streaming query", something like a "find()". We'll start returning data as soon as the underlying storage returns it, so data coming back from the cluster will be immediately returned to you, and then data coming from the archive will be next (most likely). There will be a minor increase in latency as the data has to go from the cluster to the federated endpoint, but it should be minimal.
On the other hand, a "blocking query" like a "sort", which requires all relevant data from the cluster and the archive to be brought together, is going to be as slow as the slowest tier of storage queried. That will most likely be the archive, which can be significantly slower than your cluster.
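To make the streaming versus blocking difference concrete, here is a toy model, not a description of how the federation layer is implemented. The function name and the latency figures are hypothetical, and the model ignores the extra hop through the federated endpoint.

```python
def time_to_first_result(cluster_ms, archive_ms, blocking):
    """Toy model: when does the first document reach the client?

    cluster_ms and archive_ms are hypothetical times for each storage
    tier to deliver its matching data to the federated endpoint.
    """
    if blocking:
        # A sort needs every matching document before it can emit anything,
        # so it waits for the slowest tier.
        return max(cluster_ms, archive_ms)
    # A streaming find() forwards documents as soon as any tier returns them.
    return min(cluster_ms, archive_ms)


# Hypothetical figures: cluster answers in 100 ms, archive in 3000 ms.
print(time_to_first_result(100, 3000, blocking=False))  # 100
print(time_to_first_result(100, 3000, blocking=True))   # 3000
```

Under these assumed numbers the streaming query starts returning at cluster speed, while the sort is gated by the archive.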
The last piece to remember is that when you set up Online Archive you select "Query Fields". Queries that use those fields will have improved performance on the archived data, so a "find" on a field that was identified as a query field should perform better than a find on a field that was not.
I'm the PM for Online Archive and am happy to discuss further if it's helpful. You can reach me at benjamin.flast@mongodb.com.
Best,
Ben
Even without the DataLake archive (I just connect to the cluster through the federation), I get the same performance as with DataLake.
For example, a request through the federation takes 2 sec (without the DataLake archive), while the same request made directly takes 100 ms.
Hey @Max_Virchenko, that is expected behavior. When connecting to your cluster through the federation layer, we see a bit of additional latency due to extra network hops and various other steps that occur. We see somewhere between 1 and 2 seconds of additional latency for any basic query through data federation, and that can go higher when combining data from multiple clusters.
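If you want to measure that overhead for your own query, a sketch along these lines could help. It is only an illustration: `run_direct` and `run_federated` are hypothetical zero-argument callables that you would write yourself, each executing the same query through one of the two connection strings, and the helper names are made up for this example.

```python
import statistics
import time


def median_latency_ms(run_query, repeats=5):
    """Run the callable several times and return the median wall time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def federation_overhead_ms(run_direct, run_federated, repeats=5):
    """Median federated latency minus median direct latency, in ms."""
    return median_latency_ms(run_federated, repeats) - median_latency_ms(
        run_direct, repeats
    )
```

Using the median rather than a single run keeps one slow outlier from skewing the comparison. With the figures quoted above (2 sec federated, 100 ms direct), the difference would be about 1.9 seconds, which sits inside the 1 to 2 second range mentioned.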