Hi,
I have duplicate records in collection invoices. I want to perform following steps on the collection:
- Identify duplicate records based on a key which I am able to find out:
db.invoices.aggregate([{ $group: { _id: "$request.doc_number", count: { $sum: 1 } } }, { $match: { count: { $gt: 50 } } }] )
- Insert duplicate records in new collection:
I’m facing issues with this as they query is inserting only ObjectIds and not full documents.
// Define the name of the new collection
const newCollectionName = "DuplicateInvoices";
// Step 1: Find duplicate doc_number groups
const duplicates = db.Invoices.aggregate([
{
$group: {
_id: "$request.doc_number",
count: { $sum: 1 },
docs: { $push: "$_id" } // Collect all document IDs for each doc_number
}
},
{
$match: {
count: { $gt: 1 } // Include only groups with more than 1 occurrence
}
}
]).toArray();
// Step 2: Insert duplicates into the new collection
duplicates.forEach(group => {
// Remove one document ID from the list of duplicates to keep in the original collection
const [idToKeep, ...idsToMove] = group.docs;
// Insert the duplicates (excluding one record) into the new collection
db[newCollectionName].insertMany(
idsToMove.map(id => ({ _id: id })) // Insert documents by their IDs
);
});
- I want to delete the duplicate invoices from the original collection except 1 original record.
I am yet to try this.
Appreciate any tips.
Best Regards,
Hi @Mohammad_Yusuf_Ghazi , It looks like you are facing a problem because you are doing a group and after that you need to access your full document.
So, let’s break down your requirements.
- Identify duplicate records based on a key.
- Insert duplicate records into a new collection, ensuring full documents are copied.
- Delete duplicate records from the original collection, keeping one original record.
const duplicates = db.invoices.aggregate([
{
$group: {
_id: "$request.doc_number",
count: { $sum: 1 },
docs: { $push: "$$ROOT" } // Collect all documents for each doc_number
}
},
{
$match: {
count: { $gt: 1 } // Include only groups with more than 1 occurrence
}
}
]).toArray();
At this point we need to push for docs on group stage the full document, you are pushing just the id.
On step 2 we have
// Step 2: Insert duplicates into the new collection
duplicates.forEach(group => {
// Remove one document from the list of duplicates to keep in the original collection
const [docToKeep, ...docsToMove] = group.docs;
// Insert the duplicates (excluding one record) into the new collection
db[newCollectionName].insertMany(docsToMove);
});
At this way, the docsToMove is an array with the full documents, not just the ids.
And finally you can perform the delete query.
I hope that it help you.
Best Regards,
1 Like