MongoDB Aggregation
Using $lookup, $group, $project and $unwind to build powerful queries
What's an aggregation pipeline?
An aggregation pipeline is a series of processing stages that you apply one after another to a set of documents.
Each pipeline stage performs some new calculation or manipulation on the documents that are passed to it, and then passes them on to the next stage.
The stages can find, filter, join or manipulate the documents, and there's a pipeline operator for just about everything that you want to do.
That being said, I would estimate that 90% of the aggregation that we do consists of $match, $project, $lookup, $unwind, and $group.
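To make the "one stage after another" idea concrete, here is a minimal plain-JavaScript sketch. It is not how MongoDB is implemented; the runStages helper and the sample documents are hypothetical and exist only to show documents flowing from one stage into the next.

```javascript
// Illustration only: each "stage" is a function that takes an array of
// documents and returns a new array for the next stage.
const runStages = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample documents.
const sampleDocs = [
  { _id: 1, productType: 'shoe', retailPrice: 50 },
  { _id: 2, productType: 'hat', retailPrice: 15 },
  { _id: 3, productType: 'shirt', retailPrice: 20 },
];

const stages = [
  // Roughly what a filtering stage does: keep only some documents.
  (docs) => docs.filter((doc) => doc.retailPrice >= 20),
  // Roughly what a reshaping stage does: keep only some fields.
  (docs) => docs.map(({ _id, productType }) => ({ _id, productType })),
];

const staged = runStages(sampleDocs, stages);
console.log(staged);
// [ { _id: 1, productType: 'shoe' }, { _id: 3, productType: 'shirt' } ]
```

The second stage only ever sees what the first stage let through, which is why the order of stages matters.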
So, let's get into it.
$match: finding and filtering documents
$match does what it says: it passes only the documents that match its query on to the next stage in the pipeline. This works the same as the filter query that you pass to MongoDB's find() method.
There's a whole range of query operators that can be used to make your $match more flexible, for example:
{
  $match: {
    productType: {
      $in: ['shoe', 'shirt']
    }
  }
}
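As a sketch of how conditions can be combined, the stage below adds a second condition to the same $match. It assumes the documents have productType and retailPrice fields; the price threshold and the sample documents are made up for illustration. The plain-JavaScript filter next to it is only an equivalent for these two conditions, not a general $match implementation.

```javascript
// A $match stage combining two conditions; both must hold for a document
// to pass. The price threshold is a hypothetical value.
const matchStage = {
  $match: {
    productType: { $in: ['shoe', 'shirt'] },
    retailPrice: { $gte: 20 },
  },
};

// Plain-JavaScript equivalent of that filter, for illustration only.
const matchesStage = (doc) =>
  ['shoe', 'shirt'].includes(doc.productType) && doc.retailPrice >= 20;

// Hypothetical sample documents.
const matchInput = [
  { _id: 1, productType: 'shoe', retailPrice: 50 },
  { _id: 2, productType: 'hat', retailPrice: 30 },
  { _id: 3, productType: 'shirt', retailPrice: 10 },
];

const matched = matchInput.filter(matchesStage);
console.log(matched.map((doc) => doc._id)); // [ 1 ]
```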
$project: reshaping documents
$project is a way of reshaping the documents that you have at a particular stage in the pipeline. A projection doesn't filter or find any documents, so there will be the same number of documents in the pipeline before and after this stage, but the documents will look different.
You can rename or remove fields, or create new computed fields. This is great for simplifying the documents and making sure that you have only the data that you need.
You can also select the fields that you do want, or the fields that you don't want, by using 1 or 0 as the projection value. Note that when using 1 to keep only certain fields, _id will always be kept unless you specify otherwise!
Keeps only the created and products fields (and _id) of the input docs:
{
  $project: {
    created: 1,
    products: 1
  }
}
Keeps everything except the created and products fields of the input docs:
{
  $project: {
    created: 0,
    products: 0
  }
}
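Renaming and computed fields can be sketched in the same way. The stage below assumes each document has a created value and a products array; the output names orderedAt and productCount are hypothetical. $size returns the length of an array and raises an error if the field is not an array. The reshape function is a plain-JavaScript equivalent for this one projection only.

```javascript
// A $project stage that drops _id, renames a field and adds a computed one.
const projectStage = {
  $project: {
    _id: 0,                               // drop _id explicitly
    orderedAt: '$created',                // rename created to orderedAt
    productCount: { $size: '$products' }, // computed: length of the array
  },
};

// Plain-JavaScript equivalent of that projection, for illustration only.
const reshape = (doc) => ({
  orderedAt: doc.created,
  productCount: doc.products.length,
});

// Hypothetical sample document.
const projectInput = [
  { _id: 1, created: '2021-01-05', products: ['a', 'b'], note: 'gift' },
];

const projected = projectInput.map(reshape);
console.log(projected);
// [ { orderedAt: '2021-01-05', productCount: 2 } ]
```

The number of documents is unchanged; only their shape differs.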
$lookup: pulling in documents from other collections
The capability to look up documents in other collections is one of the most important aspects of aggregation.
Let's say that you have a collection of orders, and you want to see the information about the products relating to each order. Using regular query functions for this is extremely inefficient, because you would need to run a query for every order to get its product information. It's far more efficient to get all the information using one aggregation query:
{
  $lookup: {
    // the collection you want to get docs from
    from: 'Product',
    // the new field on the local docs
    // holding what the lookup finds (can be anything!)
    as: 'products',
    // the field on the current collection
    // used in the search
    localField: 'productIds',
    // the field on the collection you're
    // searching to match the 'localField'
    foreignField: '_id',
  }
}
The $lookup adds a new field (products) containing an array of the documents whose foreignField matches the local document's localField.
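The matching behaviour can be sketched in plain JavaScript. This is a simplified illustration, not MongoDB's implementation: the lookup helper and the sample data are hypothetical, and edge cases such as missing fields are ignored. It does show one useful detail: when localField holds an array, each element is compared against foreignField.

```javascript
// Hypothetical sample data for the two collections.
const orders = [
  { _id: 'o1', productIds: [1, 2] },
  { _id: 'o2', productIds: [3] },
  { _id: 'o3', productIds: [] },
];
const productCollection = [
  { _id: 1, name: 'Boot' },
  { _id: 2, name: 'Tee' },
  { _id: 3, name: 'Cap' },
];

// For each local document, attach every foreign document whose
// foreignField value appears in the local document's localField.
const lookup = (localDocs, foreignDocs, { localField, foreignField, as }) =>
  localDocs.map((doc) => {
    // Treat a scalar localField as a one-element array.
    const wanted = [].concat(doc[localField] ?? []);
    return {
      ...doc,
      [as]: foreignDocs.filter((f) => wanted.includes(f[foreignField])),
    };
  });

const ordersWithProducts = lookup(orders, productCollection, {
  localField: 'productIds',
  foreignField: '_id',
  as: 'products',
});

console.log(ordersWithProducts.map((o) => o.products.map((p) => p.name)));
// [ [ 'Boot', 'Tee' ], [ 'Cap' ], [] ]
```

Every order is kept, even the one with no matching products; it simply gets an empty array.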
$unwind: breaking out of arrays
The $unwind operator takes an array field and produces a set of otherwise identical documents, one for every element in the array.
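A rough plain-JavaScript picture of that behaviour follows. The unwind helper and the sizes field are hypothetical, and the sketch assumes the field is either an array or missing. By default $unwind outputs nothing for a document whose array is empty or missing; the stage's preserveNullAndEmptyArrays option keeps such documents instead.

```javascript
// Illustration only: one output document per array element, with the
// array field replaced by that single element.
const unwind = (docs, field) =>
  docs.flatMap((doc) =>
    (doc[field] ?? []).map((element) => ({ ...doc, [field]: element }))
  );

// Hypothetical sample documents.
const unwindInput = [
  { _id: 1, sizes: ['S', 'M'] },
  { _id: 2, sizes: [] },
];

const unwound = unwind(unwindInput, 'sizes');
console.log(unwound);
// [ { _id: 1, sizes: 'S' }, { _id: 1, sizes: 'M' } ]
```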
Using $unwind with $lookup
It's common to see $lookup used in conjunction with the $unwind pipeline operator. $unwind takes an array property on a document and turns it into a new document for every element in the array.
Let's say that you have a collection of products, and you want to find the suppliers of those products in the Supplier collection.
Product Documents
{ _id: 1, supplierId: 1 },
{ _id: 2, supplierId: 4 },
{ _id: 3, supplierId: 2 },
You can use $lookup to find the supplier like this:
{
  $lookup: {
    from: 'Supplier',
    as: 'supplier',
    localField: 'supplierId',
    foreignField: '_id'
  }
},
The problem here is that a $lookup returns an array of documents:
Product documents with $lookup'd suppliers
{
  _id: 1,
  supplierId: 1,
  supplier: [{
    _id: 1, name: 'Rahul'
  }]
},
{
  _id: 2,
  supplierId: 4,
  supplier: [{
    _id: 4, name: 'Sumit'
  }]
},
{
  _id: 3,
  supplierId: 2,
  supplier: [{
    _id: 2, name: 'Manish'
  }]
},
Because we know that _id is a unique field, we also know that the array created by the lookup will never have more than one element, so we can unwind the documents.
{ $unwind: '$supplier' },
Note: You need to prefix the field that you want to unwind with a '$'.
This unwinds the arrays and leaves us with what we wanted:
Unwound product documents with suppliers
{
  _id: 1,
  supplierId: 1,
  supplier: {
    _id: 1, name: 'Rahul'
  }
},
{
  _id: 2,
  supplierId: 4,
  supplier: {
    _id: 4, name: 'Sumit'
  }
},
{
  _id: 3,
  supplierId: 2,
  supplier: {
    _id: 2, name: 'Manish'
  }
},
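Putting the two stages together, a pipeline like this could be run from application code. The sketch below assumes the official MongoDB Node.js driver and an already connected Db handle; the function name fetchProductsWithSuppliers is hypothetical, and the collection names follow the examples in this section.

```javascript
// The $lookup and $unwind stages from this section, in pipeline order.
const productSupplierPipeline = [
  {
    $lookup: {
      from: 'Supplier',
      as: 'supplier',
      localField: 'supplierId',
      foreignField: '_id',
    },
  },
  { $unwind: '$supplier' },
];

// `db` is assumed to be a connected Db handle from the Node.js driver.
// aggregate() returns a cursor; toArray() resolves to the result documents.
async function fetchProductsWithSuppliers(db) {
  return db
    .collection('Product')
    .aggregate(productSupplierPipeline)
    .toArray();
}
```

Keeping the pipeline as a plain array makes it easy to log, reuse, or extend with further stages.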
$group: collecting documents into groups
You can use $group to bunch documents together based on a field value that they have in common. It can also be used for useful things like counting documents or summing all of the values of a specific field.
Count the documents for each product type:
{
  $group: {
    _id: '$productType',
    count: { $sum: 1 }
  }
}
Sum the retail prices for all documents in the pipeline:
{
  $group: {
    _id: null,
    totalRetailPrice: { $sum: '$retailPrice' }
  }
}
NOTE: If you want to group all of the documents in the pipeline into one document at the point the $group is executed, you can set the group's _id to null.
The $group only passes along the values that you specify within the stage. In the previous examples, the only fields on the documents after the group stage would be _id and count in the first, or _id and totalRetailPrice in the second.
If you want to keep the other fields on the documents, you need to decide how the group should deal with them. There are several ways to accumulate the fields together:
{
  $group: {
    _id: '$productType',
    // Keep the highest value
    maxReference: { $max: '$reference' },
    // Make an array of distinct values for material
    materials: { $addToSet: '$material' },
    // Keep the last value (useful if documents are sorted)
    mostRecentlyCreated: { $last: '$created' },
  }
}
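What those three accumulators compute can be sketched in plain JavaScript. The sample documents are hypothetical and the loop is only an illustration of the results, not of how MongoDB evaluates $group. In MongoDB itself, the order of the output groups and of the values collected by $addToSet is not guaranteed.

```javascript
// Hypothetical sample documents, already sorted by created.
const groupInput = [
  { productType: 'shoe', reference: 7, material: 'leather', created: '2021-01-01' },
  { productType: 'shoe', reference: 9, material: 'canvas', created: '2021-02-01' },
  { productType: 'shirt', reference: 3, material: 'cotton', created: '2021-01-15' },
  { productType: 'shoe', reference: 4, material: 'leather', created: '2021-03-01' },
];

const groups = new Map();
for (const doc of groupInput) {
  // Start a new group the first time a productType is seen.
  const group = groups.get(doc.productType) ?? {
    _id: doc.productType,
    maxReference: -Infinity,
    materials: new Set(),
    mostRecentlyCreated: undefined,
  };
  group.maxReference = Math.max(group.maxReference, doc.reference); // like $max
  group.materials.add(doc.material);                                // like $addToSet
  group.mostRecentlyCreated = doc.created;                          // like $last
  groups.set(doc.productType, group);
}

// Convert each Set to an array so the result looks like grouped documents.
const grouped = [...groups.values()].map((g) => ({
  ...g,
  materials: [...g.materials],
}));

console.log(grouped);
// [
//   { _id: 'shoe', maxReference: 9, materials: [ 'leather', 'canvas' ],
//     mostRecentlyCreated: '2021-03-01' },
//   { _id: 'shirt', maxReference: 3, materials: [ 'cotton' ],
//     mostRecentlyCreated: '2021-01-15' }
// ]
```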
Conclusion
This is just the start. With the various operators, expressions, accumulators, and all the other tools that MongoDB provides, you can retrieve your data on your terms.