Backup (full, block-based incremental, and log-based incremental) of a MongoDB database using the WiredTiger utility

We have a sharded, replicated MongoDB database with WiredTiger as the storage engine.
MongoDB tools like mongodump and mongorestore cannot be used to back up the database (on a sharded cluster), as we cannot stop writes to the database.

We understand that we can use the WiredTiger utility wt to back up MongoDB.

As per the WiredTiger documentation, WiredTiger backups are “on-line” or “hot” backups, and applications may continue to read and write the databases while a snapshot is taken.

Is it possible to perform a MongoDB backup using the WiredTiger utility without stopping writes to the database?

What do you mean by this? I don’t think you need to stop writes to use mongodump. There’s even an option (--oplog) to capture oplog entries.

Please find the extracts from the MongoDB docs:
1) Sharded Clusters
mongodump and mongorestore cannot be part of a backup strategy for 4.2+ sharded clusters that have sharded transactions in progress, as backups created with mongodump do not maintain the atomicity guarantees of transactions across shards.

2) Lock the Cluster
The sharded cluster must remain locked during the backup process to protect the database from writes, which may cause inconsistencies in the backup.

So the two extracts above suggest that we cannot use mongodump for production backups.
Is it possible to create an application using WiredTiger to perform hot online backups?
https://source.wiredtiger.com/11.2.0/command_line.html#util_backup

If you don’t use cross-shard transactions, it’s OK to use mongodump. I can’t find where the official MongoDB docs say writes have to be stopped to use the tool. If you have such a reference, please share a link.

OK, I got this link. It says the cluster has to be locked to avoid writes.

That is simply to avoid writes once the backup has started, so that the backups from all your nodes are consistent. It doesn’t mean you have to lock writes to use that tool. As I showed earlier, you can use the oplog option to capture the writes.
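As a rough illustration of the oplog option, here is a minimal Python sketch that only builds the command lines; it does not run them. The host names and backup paths are made-up placeholders. Note that --oplog takes effect when mongodump connects to a replica set member (for example one shard's replica set), not when it goes through mongos, so this would be run per shard.

```python
import shlex
from typing import List


def mongodump_oplog_cmd(shard_uri: str, out_dir: str) -> List[str]:
    """Build a mongodump call that also records oplog entries written during the dump."""
    # --oplog writes an oplog.bson file alongside the dump
    return ["mongodump", "--uri", shard_uri, "--oplog", "--out", out_dir]


def mongorestore_replay_cmd(target_uri: str, dump_dir: str) -> List[str]:
    """Build the matching restore call, replaying the captured oplog entries."""
    return ["mongorestore", "--uri", target_uri, "--oplogReplay", dump_dir]


if __name__ == "__main__":
    # Hypothetical shard member and backup directory, for illustration only.
    dump = mongodump_oplog_cmd("mongodb://shard0-sec.example.net:27018", "/backups/shard0")
    restore = mongorestore_replay_cmd("mongodb://restore-host.example.net:27018", "/backups/shard0")
    print(shlex.join(dump))
    print(shlex.join(restore))
```

Each shard's dump is then consistent to the moment its own dump finished, which still leaves the cross-shard alignment question open.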

If you don’t use cross-shard transactions and you can somehow take the backup on all shards at the same point in time, then your data will be reasonably consistent. :slight_smile:

To call out: the main purpose of a backup is to reduce data loss, not to fully recover from an outage. So the best way to deal with an outage is always to avoid one.

What we currently do is, for each shard, lock a secondary node (the primary still serves writes), take an EBS snapshot, and unlock, then move on to the next shard.
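The per-shard loop described above could be sketched like this in Python. The lock, snapshot, and unlock callables are hypothetical stand-ins (for example a wrapper around db.fsyncLock(), a call that starts an EBS snapshot, and db.fsyncUnlock()); they are not a real API, and the sketch only shows the ordering.

```python
from typing import Callable, Dict, Iterable


def snapshot_each_shard(
    shards: Iterable[str],
    lock: Callable[[str], None],
    snapshot: Callable[[str], str],
    unlock: Callable[[str], None],
) -> Dict[str, str]:
    """Lock, snapshot, and unlock one shard's secondary at a time."""
    snapshot_ids: Dict[str, str] = {}
    for shard in shards:
        lock(shard)  # e.g. db.fsyncLock() on that shard's secondary
        try:
            snapshot_ids[shard] = snapshot(shard)  # e.g. start a volume snapshot
        finally:
            unlock(shard)  # always release the lock, even if the snapshot call fails
    return snapshot_ids


if __name__ == "__main__":
    # Fake callables that only record what would happen.
    events = []
    ids = snapshot_each_shard(
        ["shard0", "shard1"],
        lock=lambda s: events.append(("lock", s)),
        snapshot=lambda s: events.append(("snapshot", s)) or f"snap-{s}",
        unlock=lambda s: events.append(("unlock", s)),
    )
    print(ids)
    print(events)
```

Because the shards are handled one after another, each snapshot reflects a different moment, which is why the result is not fully consistent across shards.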

So is our backup fully consistent? Of course not. And here’s a link to my comment on a related post: Can mongo sharded cluster be recreated from secondary DC nodes (both config and shard)? - #2 by Kobe_W

What I understand is that mongodump is not recommended for production databases. Also, it does not support incremental backups.

Will the WiredTiger utility help here?

Hello, recently I was interested in incremental backups in MongoDB, and I can share what I found.

You can indeed use WiredTiger for backups, but I think it is very complicated, and I never found any documentation on how to do that.

However, I found that you can use LVM2, which is a Linux tool for creating and managing logical volumes.
If you learn more about LVM2, I think you will be happy.
How it works is simple: you log in to the MongoDB shell and pause writes with db.fsyncLock(), then create a snapshot using the LVM2 command in a Linux bash shell. It probably takes just under 1 second or so, even if you have 10 TB of storage, because it is just awesome :smiley:
Then you have to unlock with db.fsyncUnlock(). From then on you can read consistent files from the snapshot, which will not change while you copy the backup to a different disk or server or whatever. And you pause the server for just 1 second! If you want an additional file containing only the incremental changes, you can use the rsync, diff, and patch tools on Linux to find the best solution for you.
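To make the order of steps concrete, here is a small Python sketch that only builds the three commands and prints them. The volume path, snapshot name, snapshot size, and connection string are assumptions for illustration; adjust them to your own setup. In practice it is probably safer to keep a single shell session open between the lock and the unlock rather than making separate calls as shown here.

```python
import shlex
from typing import List


def lvm_snapshot_steps(
    volume: str,
    snap_name: str,
    cow_size: str = "100G",
    mongo_uri: str = "mongodb://localhost:27017",
) -> List[List[str]]:
    """Return the lock, snapshot, and unlock commands in the order they should run."""
    return [
        # 1. Flush pending writes to disk and block new ones.
        ["mongosh", mongo_uri, "--quiet", "--eval", "db.fsyncLock()"],
        # 2. Create the LVM snapshot; cow_size is the space reserved for changed blocks.
        ["lvcreate", "--size", cow_size, "--snapshot", "--name", snap_name, volume],
        # 3. Allow writes again.
        ["mongosh", mongo_uri, "--quiet", "--eval", "db.fsyncUnlock()"],
    ]


if __name__ == "__main__":
    # Hypothetical volume group "vg0" and logical volume "mongodb".
    for step in lvm_snapshot_steps("/dev/vg0/mongodb", "mdb-snap01"):
        print(shlex.join(step))
```

After the unlock, the snapshot volume can be mounted read-only and copied elsewhere at leisure.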

An awesome thing about an LVM snapshot is that it does not use up any storage space until MongoDB changes something in the original /data path.

Hope it was helpful… I just learned LVM2, and it is such an awesome tool that I just wanted to share.
God bless.