Hi there, I’m investigating the behaviour of change streams and I came across a question I’d like some help with.
I work in Go (Golang), so I will refer to that language, but I think that conceptually it doesn’t make any difference.
When watching a collection, the events returned have a structure similar to the following (truncated for space reasons):
{
"_id" : {
"_data" : "8267C867A2000000192B042C0100296E5A1004D3284E04D8914467A2160201812E4C4C463C6F7065726174696F6E54797065003C696E736572740046646F63756D656E744B65790046645F6964006467C867A2FB1CBCD579F4E4DE000004"
},
"operationType" : "insert",
"clusterTime" : Timestamp({ t: 1741186978, i: 25 }),
"wallTime" : ISODate("2025-03-05T16:02:58.871+01:00"),
"fullDocument" : {
"_id" : ObjectId("67c867a2fb1cbcd579f4e4de"),
"_bid" : "20250305DIREZ000451900000",
....
"ns" : {
"db" : "rp2nl",
"coll" : "outbox"
},
"documentKey" : {
"_id" : ObjectId("67c867a2fb1cbcd579f4e4de")
}
}
You get an _id._data value that is a hex string of 188 characters. This string, as the name suggests, is the ID of the oplog entry and is an encoded representation of several pieces of information. It is an opaque structure that could change between versions, but you can find a snippet, also referenced in the Mongo docs, that helps to investigate it.
You can pass the _id above as the value of the ResumeAfter param to restart the watch from the next entry in the oplog. The Mongo API also offers something called ResumeToken(), described as: "ResumeToken returns the last cached resume token for this change stream, or nil if a resume token has not been stored." In the example at the same link it is used for exactly this task. So far so good.
From time to time (same stream, same config, same everything) this API returns strings of a different kind, e.g. 8267C867A3000000012B0429296E1404: shorter and, as you can imagine, with different content. This value can be used as a ResumeAfter param in the same way, and everything works. Among other things, if you decode the longer one and extract the token-type subfield you find the value 128, whereas the same field in the shorter one seems to be 0. This odd "ugly duckling" string seems to have some relation to the oplog entry, but what that relation is remains unclear to me.
Consider a scenario where:
1. I watch a stream,
2. on every event I save the _id._data together with the ResumeToken() API response,
3. I close the stream,
4. I restart it using one of the values saved at step 2,
5. I watch the stream and do the same as in step 2.
I find that in the majority of cases the _id._data is equal to the value returned by the API, but in some cases _id._data is in the long format whereas ResumeToken() returns the short format. Apparently the ResumeToken() API returns something that can be used to resume, but it is not always exactly the _id of the oplog entry.
- In step 5, the entries that had the long format at step 2 keep the same long-format _id._data, equal to the ResumeToken() response,
- whereas events that got a short token from ResumeToken() at step 2 keep getting a short form in step 5, but with a different value (not always, but I spotted a case where it happened).
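As a side check on the two token shapes, here is a minimal Go sketch that decodes what looks like a timestamp at the start of each string. The layout (one leading byte, then 4 bytes of seconds and 4 bytes of increment, big-endian) is an assumption inferred from the two samples in this post, not a documented contract, and decodeTokenTime and tokenTime are hypothetical names of mine, not driver APIs.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// tokenTime holds the two halves of a BSON-style timestamp.
type tokenTime struct {
	T uint32 // seconds since the Unix epoch
	I uint32 // ordinal within that second
}

// decodeTokenTime reads bytes 1..8 of a hex-encoded resume token as a
// big-endian (seconds, increment) pair.
// ASSUMPTION: this layout is inferred from the sample tokens only; the
// token is officially opaque and may change between server versions.
func decodeTokenTime(data string) (tokenTime, error) {
	raw, err := hex.DecodeString(data)
	if err != nil {
		return tokenTime{}, fmt.Errorf("not a hex string: %w", err)
	}
	if len(raw) < 9 {
		return tokenTime{}, errors.New("token too short to hold a timestamp")
	}
	return tokenTime{
		T: binary.BigEndian.Uint32(raw[1:5]),
		I: binary.BigEndian.Uint32(raw[5:9]),
	}, nil
}

func main() {
	// The long _id._data and the short ResumeToken() value quoted above.
	long := "8267C867A2000000192B042C0100296E5A1004D3284E04D8914467A2160201812E4C4C463C6F7065726174696F6E54797065003C696E736572740046646F63756D656E744B65790046645F6964006467C867A2FB1CBCD579F4E4DE000004"
	short := "8267C867A3000000012B0429296E1404"

	for _, tok := range []string{long, short} {
		ts, err := decodeTokenTime(tok)
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Printf("len=%d t=%d i=%d\n", len(tok), ts.T, ts.I)
	}
}
```

With the samples above, the long token gives t=1741186978, i=25, which matches the clusterTime of the insert event, while the short one gives t=1741186979, i=1.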
Now I get to the question:
- The short form seems to be a sort of demarcation resume token, so to speak,
- with some specific use and value.
What is the reason for its existence?
I would appreciate it if someone could take the time to shed some light on the topic. Sincerely yours,
Mario