-
Notifications
You must be signed in to change notification settings - Fork 358
SPL assists MongoDB:Find multiple latest by filter criteria
The data for a collection (named category_time) in the MongoDB database is as follows:
[
{
"_id": 1,
"category": "FIRE",
"time": "2024-05-11T07:11:00Z"
},
{
"_id": 2,
"category": "FIRE",
"time": "2024-05-11T08:11:00Z"
},
{
"_id": 3,
"category": "FIRE",
"time": "2024-05-11T09:11:00Z"
},
{
"_id": 4,
"category": "POLICE",
"time": "2024-05-11T07:22:00Z"
},
{
"_id": 5,
"category": "POLICE",
"time": "2024-05-11T08:22:00Z"
},
{
"_id": 6,
"category": "POLICE",
"time": "2024-05-11T09:22:00Z"
},
{
"_id": 7,
"category": "AMBULANCE",
"time": "2024-05-11T07:33:00Z"
},
{
"_id": 8,
"category": "AMBULANCE",
"time": "2024-05-11T08:33:00Z"
},
{
"_id": 9,
"category": "AMBULANCE",
"time": "2024-05-11T09:33:00Z"
}
]
Now input two collect parameters, corresponding to the category collect and time collect, such as ["FIRE", "AMBULANCE"] and ["2024-05-11T08:15:00Z", "2024-05-11T09:00:00Z"]. Please combine the members of these two collects into multiple sets of parameters, that is, cross product. In this example, there are four sets, each consisting of a category and a time. Then use 4 sets of parameters to traverse and search the document, each time finding the record with category equal to the parameter and time earlier than the parameter but closest.
[
{
"category": "FIRE",
"time": "2024-05-11T08:15:00Z",
"last_entry_on_or_before": {
"_id": 2,
"category": "FIRE",
"time": "2024-05-11T08:11:00Z"
}
},
{
"category": "FIRE",
"time": "2024-05-11T09:00:00Z",
"last_entry_on_or_before": {
"_id": 2,
"category": "FIRE",
"time": "2024-05-11T08:11:00Z"
}
},
{
"category": "AMBULANCE",
"time": "2024-05-11T08:15:00Z",
"last_entry_on_or_before": {
"_id": 7,
"category": "AMBULANCE",
"time": "2024-05-11T07:33:00Z"
}
},
{
"category": "AMBULANCE",
"time": "2024-05-11T09:00:00Z",
"last_entry_on_or_before": {
"_id": 8,
"category": "AMBULANCE",
"time": "2024-05-11T08:33:00Z"
}
}
]
The difficulty of this question lies in querying each set of parameters once and selecting the record with the closest time from each query result. I have tried many methods using MongoDB query, but the results are not correct. One nearly-correct way is as follows:
db.category_time.aggregate([
{
"$match": {
"_id": {
"$exists": false
}
}
},
{
"$unionWith": {
"coll": "collection",
"pipeline": [
{
"$documents": [
{
"category": //your input category array here
["FIRE",
"AMBULANCE"
],
//your input time array here
"time": [
"2024-05-11T08:15:00Z",
"2024-05-11T09:00:00Z"
]
}
]
},
{
"$unwind": "$category"
},
{
"$unwind": "$time"
}
]
}
},
{
"$lookup": {
"from": "collection",
"localField": "category",
"foreignField": "category",
"let": {
ts: "$time"
},
"pipeline": [
{
"$match": {
$expr: {
$lte: [
"$time",
"$$ts"
]
}
}
},
{
"$sort": {
"time": -1
}
},
{
"$limit": 1
}
],
"as": "last_entry_on_or_before"
}
},
{
"$unwind": "$last_entry_on_or_before"
}
])
SPL provides the maxp function, which makes it easy to find the closest record in records with time less than the parameter:
| A | |
|---|---|
| 1 | =mongo_open@d("mongodb://127.0.0.1:27017/local") |
| 2 | =mongo_shell@d(A1, "{'find':'category_time'}") |
| 3 | =mongo_close(A1) |
| 4 | ["FIRE", "AMBULANCE"] |
| 5 | ["2024-05-11T08:15:00Z", "2024-05-11T09:00:00Z"] |
| 6 | >A5.run(~=datetime(~,"yyyy-MM-dd'T'HH:mm:ss'Z'")) |
| 7 | =A4.conj(A5.new(~:time,A4.~:category)).sort(category,time) |
| 8 | >A2.run('time'=datetime('time',"yyyy-MM-dd'T'HH:mm:ss'Z'")) |
| 9 | =A7.new(category,time,A2.select(category==A7.category && time<=A7.time).maxp(time):last_entry_on_or_before) |
| 10 | =json(A9) |
Question source:https://stackoverflow.com/questions/78529669/mongodb-find-multiple-latest-by-filter-criteria
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code