Pipeline operators in MongoDB ($group, $unwind, $sort, $limit, $skip)

One, $group

1. Basic operation

$groupIt can be used to group documents. For example, I want to group orders by city and count the number of orders in each city:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",count:{$sum:1}}})

We pass the field to be grouped to the _id field of the $group function, and then whenever we find one, we add 1 to the count, so that we can count the number of orders in each city.

2. Arithmetic Operators

Through arithmetic operators, we can sum or average the grouped documents. For example, I want to calculate the total shipping cost of each city order as follows:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",totalFreight:{$sum:"$freight"}}})

Group by address first, and then sum. Some of the query results are posted here, as follows:

{
    "_id" : "HaiKou",
    "totalFreight" : 20.0
}
{
    "_id" : "HangZhou",
    "totalFreight" : 10.0
}

You can also calculate the average freight for each city, as follows:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",avgFreight:{$avg:"$freight"}}})

First group by address, and then calculate the average.

3. Extreme value operators

The extreme value operator is used to obtain the marginal value of the grouped data set, for example, to obtain the most expensive freight for each city, as follows:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",maxFreight:{$max:"$freight"}}})

Query the cheapest shipping cost in each city:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",minFreight:{$min:"$freight"}}})

After grouping by city, get the first freight bill for that city:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",firstFreight:{$first:"$freight"}}})

Get the last freight bill after grouping:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",lastFreight:{$last:"$freight"}}})

Data operator

$addToSetYou can put a certain field after grouping into an array, but the repeated elements will only appear once, and the order of adding the elements to the array is irregular, for example, put the freight of each city after grouping into an array In, as follows:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",freights:{$addToSet:"$freight"}}})

Duplicate freight will not be added.

$push does not impose restrictions on repeated data, and can be added as follows:

db.sang_collect.aggregate({$group:{_id:"$orderAddressL",freights:{$push:"$freight"}}})

Two, $unwind

$unwindUsed to split the document, you can split the values ​​in the document into separate documents, for example, my data is as follows:

{
    "_id" : ObjectId("59f93c8b8523cfae4cf4ba86"),
    "name" : "鲁迅",
    "books" : [ 
        {
            "name" : "呐喊",
            "publisher" : "花城出版社"
        }, 
        {
            "name" : "彷徨",
            "publisher" : "南海出版出"
        }
    ]
}

Use the $unwindcommand to split it into independent documents, as follows:

db.sang_books.aggregate({$unwind:"$books"})

The results of the split are as follows:

{
    "_id" : ObjectId("59f93c8b8523cfae4cf4ba86"),
    "name" : "鲁迅",
    "books" : {
        "name" : "呐喊",
        "publisher" : "花城出版社"
    }
}
{
    "_id" : ObjectId("59f93c8b8523cfae4cf4ba86"),
    "name" : "鲁迅",
    "books" : {
        "name" : "彷徨",
        "publisher" : "南海出版出"
    }
}

Three, other operators: sort/limit/skip

$sortThe operation can sort the documents as follows:

db.sang_collect.aggregate({$sort:{orderAddressL:1}})

The usage is the same as the normal search we introduced earlier. It can be sorted by existing fields, or sorted by renamed fields, as follows:

db.sang_collect.aggregate({$project:{oa:"$orderAddressL"}},{$sort:{oa:-1}})

1 means ascending order, -1 means descending order.

$limitReturn the first n documents in the result, as the first three documents in the returned result:

db.sang_collect.aggregate({$project:{oa:"$orderAddressL"}},{$limit:3})

$skipRepresents skipping the first n documents, for example, skipping the first 5 documents, as follows:

db.sang_collect.aggregate({$project:{oa:"$orderAddressL"}},{$skip:5})

$skipThe efficiency is low, so use it with caution.

Four, summary

At the beginning of the pipeline, as much data as possible is filtered out. This has two advantages:

Indexes are only used when querying directly from the collection. Performing filtering as early as possible can make the index work; after
the filtered data is filtered out, the execution pressure of the subsequent pipelines can also be reduced. In addition, MongoDB does not allow an aggregation operation to occupy too much memory. If an aggregation operation occupies more than 20% of the memory, an error will be reported directly.