Are you using MongoDB Atlas, or self-hosted? It sounds like you’re trying to manage data retention differently on different replicas, which is atypical and can be challenging. MongoDB replication keeps all data in sync across the members of a replica set, so giving one replica a longer retention window requires a little creativity.
Here’s one way to approach this… assuming you’re working with MongoDB on-prem.
1. Use a Hidden Secondary
MongoDB supports hidden replica set members. A hidden secondary replicates the same data as the primary, but it can never become primary and is invisible to clients that route reads by read preference, so it only serves reads when you connect to it directly. (It can still vote in elections.)
Here’s how you might set it up…
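2. Create a Hidden Secondary
- Add a new secondary to the replica set, if you haven’t already. Run this in `mongosh` while connected to the primary:
```javascript
rs.add({
  host: "<your-hidden-replica-host>",
  priority: 0,  // never eligible to become primary
  hidden: true  // excluded from read-preference routing
})
```
`hidden: true` on its own isn’t enough: MongoDB requires hidden members to have `priority: 0`, and it’s the priority setting that keeps the member from ever becoming primary.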
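If you want to script this or double-check an existing config, here’s a small sketch in plain JavaScript (it also pastes into `mongosh`). `hiddenMemberConfig` and `hiddenMemberProblems` are hypothetical helpers, not part of the shell; they just encode the rule that a hidden member needs `priority: 0`.

```javascript
// Hypothetical helpers (not part of mongosh): build and sanity-check the
// member document that rs.add() accepts for a hidden secondary.

function hiddenMemberConfig(host) {
  if (typeof host !== "string" || host.length === 0) {
    throw new Error("host must be a non-empty 'hostname[:port]' string");
  }
  // priority 0 keeps the member from ever being elected primary;
  // hidden: true keeps drivers from routing reads to it.
  return { host, priority: 0, hidden: true };
}

// Returns a list of problems with a member document, e.g. one taken from
// rs.conf().members. An empty list means no problems were found.
function hiddenMemberProblems(member) {
  const problems = [];
  if (member.hidden !== true) {
    problems.push(`${member.host}: hidden is not true`);
  }
  // An unset priority defaults to 1, which is not allowed for hidden members.
  const priority = member.priority ?? 1;
  if (priority !== 0) {
    problems.push(`${member.host}: hidden members need priority 0 (got ${priority})`);
  }
  return problems;
}
```

In `mongosh` you might use them as `rs.add(hiddenMemberConfig("<your-hidden-replica-host>"))`, and later run `rs.conf().members.filter(m => m.hidden).flatMap(hiddenMemberProblems)` to confirm nothing has drifted.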
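3. Disable the TTL Monitor on the Hidden Secondary
TTL indexes exist on every member, because index builds replicate. To try to keep TTL from purging old data on your hidden secondary, you can turn off the TTL background thread on that node.
- Connect directly to the hidden secondary and run:
```javascript
// Runtime setting only: it resets on restart unless also set in the config file
db.adminCommand({ setParameter: 1, ttlMonitorEnabled: false })
```
Be aware that this alone won’t keep the old data. In a replica set, only the primary’s TTL thread deletes expired documents; secondaries, hidden ones included, apply those deletes from the oplog like any other write. Since a `priority: 0` member never becomes primary, its own TTL thread wasn’t going to delete anything anyway, so test this carefully before relying on the hidden replica to retain data the primary has already expired.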
4. Set Up TTL on Primary
To automatically delete older documents, create a TTL index on the primary. Both the index build and the deletes performed by the TTL thread replicate to the secondaries. Let’s assume you have a `createdAt` field in your documents that records (as a BSON date) when they were inserted.
Create a TTL index like this:
```javascript
// Roughly 3 months, in seconds
db.collection.createIndex({ createdAt: 1 }, { expireAfterSeconds: 7889238 })
```
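Where does 7889238 come from? It’s three average Gregorian months (365.2425 / 12 = 30.436875 days each) expressed in seconds. The sketch below, using hypothetical helper names, reproduces that number and shows the cutoff rule: a document becomes eligible for deletion once its `createdAt` is older than now minus `expireAfterSeconds`. The TTL thread runs about once a minute, so actual deletion can lag a little behind the cutoff.

```javascript
// Hypothetical helpers illustrating TTL arithmetic; not a MongoDB API.
const SECONDS_PER_DAY = 86400;
const AVG_DAYS_PER_MONTH = 365.2425 / 12; // average Gregorian month = 30.436875 days

// Convert a number of average months into an integer expireAfterSeconds value.
function monthsToSeconds(months) {
  return Math.round(months * AVG_DAYS_PER_MONTH * SECONDS_PER_DAY);
}

// Documents whose createdAt is before this instant are eligible for deletion.
function ttlCutoff(now, expireAfterSeconds) {
  return new Date(now.getTime() - expireAfterSeconds * 1000);
}

function isEligibleForTtl(createdAt, now, expireAfterSeconds) {
  return createdAt.getTime() < ttlCutoff(now, expireAfterSeconds).getTime();
}
```

For example, `monthsToSeconds(3)` gives the 7889238 used above, and `monthsToSeconds(12)` gives 31556952 if you ever want a one-year TTL instead.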
5. Verify Your Setup
Once the TTL index is in place on your primary and the hidden secondary has TTL disabled, you’ll have:
- The primary and its regular secondaries retaining 3 months of data (thanks to TTL).
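- The hidden secondary retaining the full year’s data, but only if it isn’t applying the TTL deletes replicated from the primary. Secondaries normally apply those deletes from the oplog, so verify this on the hidden node rather than assuming it.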
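For the verification step, `db.collection.getIndexes()` returns each index with its `key` and `name`, plus `expireAfterSeconds` for TTL indexes. A small hypothetical helper like `ttlSummary` below turns that output into retention days; it’s plain JavaScript and simply assumes that output shape.

```javascript
// Hypothetical helper: summarize TTL indexes from the array returned by
// db.collection.getIndexes(). Non-TTL indexes (no expireAfterSeconds) are skipped.
function ttlSummary(indexes) {
  return indexes
    .filter((ix) => typeof ix.expireAfterSeconds === "number")
    .map((ix) => ({
      name: ix.name,
      field: Object.keys(ix.key)[0], // TTL indexes are single-field
      retentionDays: +(ix.expireAfterSeconds / 86400).toFixed(2),
    }));
}
```

Running `ttlSummary(db.collection.getIndexes())` should report about 91.31 days for the index above. Keep in mind the index definition will look the same on every member because it replicates; what you really want to compare is the data, e.g. the oldest document on the hidden node versus the primary via `db.collection.find().sort({ createdAt: 1 }).limit(1)`.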
Another way to approach this … is with MongoDB Atlas Online Archive…
Here’s how you can leverage MongoDB Atlas Online Archive for your scenario:
1. Use Online Archive to Offload Older Data
Instead of keeping older data on a separate replica, you can set up Atlas Online Archive to move data older than 3 months from your cluster into an archive. The archived data is still queryable, so while queries against it will be a bit slower, you won’t lose access to it or the ability to query it.
Here’s how this might work:
- The primary and replica nodes would retain the last 3 months of data.
- Older data (older than 3 months but less than a year, in your case) would automatically move to the Online Archive, which is backed by cloud storage.
- This reduces the size of your main database and keeps performance high without losing access to the old data.
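To make the intended lifecycle concrete, here’s a toy model (not Atlas code) of where a document would live, assuming a 90-day archive rule and a 365-day deletion rule, both measured from `createdAt`. `lifecycleStage` is a hypothetical name, and Atlas moves documents in the background, so in practice they shift tiers some time after crossing a threshold rather than at the exact instant.

```javascript
// Toy model of the retention policy described above (not how Atlas is implemented):
//   younger than archiveAfterDays          -> stays in the cluster
//   archiveAfterDays up to deleteAfterDays -> lives in the Online Archive
//   deleteAfterDays or older               -> purged
const DAY_MS = 24 * 60 * 60 * 1000;

function lifecycleStage(createdAt, now, archiveAfterDays = 90, deleteAfterDays = 365) {
  const ageDays = (now.getTime() - createdAt.getTime()) / DAY_MS;
  if (ageDays < archiveAfterDays) return "cluster";
  if (ageDays < deleteAfterDays) return "archive";
  return "purged";
}
```

Swapping in different thresholds is a quick way to reason about how much data each tier would hold before committing to a rule.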
2. Query Archived Data Seamlessly
When querying, you can still access archived data. Atlas provides a connection string that targets both your cluster and its archive; queries sent through it return results from both, so you can query the two data sets as if they were still in the same collection.
This means:
- You won’t have to worry about managing a separate hidden replica for one-year-old data.
- Queries can fetch both current (3-month) and archived (older) data seamlessly.
3. Set Up Online Archive in Atlas
- Define the Archive Policy: You can create an archive rule based on a field like `createdAt` (or any other field that makes sense for your data). You’ll specify that any document older than 3 months should be archived.
In Atlas:
- Go to **Data Services** -> **Online Archive**.
- Choose the collection you want to archive data from.
- Define the archive rule. For example, "archive any documents older than 3 months based on the `createdAt` field."
- Set Storage Duration: You can configure how long the data should be stored in the archive before it’s deleted entirely (if you want). In your case, you could keep the data for up to 1 year before purging it.
- Query Your Data: Atlas gives you a connection string covering the cluster plus its archive. Queries run through it pull from both the active database and the archive, so you don’t need to change the queries themselves.
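If you’d rather define the rule in code than in the UI, Atlas also exposes Online Archive through its Admin API. The sketch below only builds a request body; the field names (`criteria.type`, `dateField`, `dateFormat`, `expireAfterDays`, `dataExpirationRule`) are my recollection of that API and should be checked against the current Atlas Admin API reference, and `mydb`/`events` are hypothetical names.

```javascript
// Sketch of an Online Archive request body for the Atlas Admin API.
// ASSUMPTION: field names follow the Admin API's Online Archive resource
// as I recall it; verify them against the current API reference.
function onlineArchiveBody({ dbName, collName, dateField, archiveAfterDays, deleteAfterDays }) {
  // Sanity check for this sketch only: purging before archiving makes no sense.
  if (deleteAfterDays !== undefined && deleteAfterDays <= archiveAfterDays) {
    throw new Error("deleteAfterDays must be greater than archiveAfterDays");
  }
  const body = {
    dbName,
    collName,
    criteria: {
      type: "DATE",                      // archive by the age of a date field
      dateField,                         // e.g. "createdAt"
      dateFormat: "ISODATE",             // the field holds BSON dates
      expireAfterDays: archiveAfterDays, // move to the archive after this many days
    },
  };
  if (deleteAfterDays !== undefined) {
    // Optional: delete from the archive after this many days.
    body.dataExpirationRule = { expireAfterDays: deleteAfterDays };
  }
  return body;
}
```

For your case that would be something like `onlineArchiveBody({ dbName: "mydb", collName: "events", dateField: "createdAt", archiveAfterDays: 90, deleteAfterDays: 365 })`, which you’d then send as JSON to the cluster’s Online Archive endpoint with an authenticated request.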
4. Benefits of Using Online Archive
- Cost-Effective: Archiving data to cloud object storage (like S3) is cheaper than keeping it in your high-performance operational DB.
- Performance Boost: Your primary replica set stays lean and performs well with a smaller working data set (3 months).
- Scalability: You don’t need to manage additional infrastructure like hidden secondaries.
- Compliance & Retention: You can easily meet data retention policies without cluttering your primary database.
Hope this helps… let us know how you make out.