For fear you might be interested, a short tutorial on using MongoDB.
Note:
This is me blabbing away
> This is a command I typed in the mongo shell (which is a Javascript interpreter)
This is the response in the MongoShell, it basically echoes the value returned by the command
Right, so lets say we are making a simple application to maintain invoices. Obviously this is a web application, and we will have a JSON object that describes our invoice in the web browser and we would post that object over to the application server which would figure out how to store that into the database.
So lets create a sample JSON object to describe an invoice and store it into a variable called invoice.
> invoice = {number: 13214, date: new Date(), lines: [{prod: "Prod A", qty: 1}, {prod: "Prod B", qty: 3}, {prod: "Prod A", qty: 5}]}
{
"number" : 13214,
"date" : ISODate("2019-10-29T21:36:43.052Z"),
"lines" : [
{
"prod" : "Prod A",
"qty" : 1
},
{
"prod" : "Prod B",
"qty" : 3
},
{
"prod" : "Prod A",
"qty" : 5
}
]
}
Now in a SQL world, I'd need to create 2 tables for this, an invoice table, and a line item table. I would need to map the JSON fields to the database columns and insert the invoice, loop over the lines array, and write each of those into the line items table. In case something fails along the way, I'd need to wrap the whole thing into a transaction and either commit or rollback the change. Not a trivial exercise by any means.
Now lets look at what that looks like in MongoDB. Note in Mongo terminology, tables are called collections, rows are called documents. Here, I am connecting to a fresh mongo DB, NOTHING exists yet.
> db.invs.insertOne(invoice)
{
"acknowledged" : true,
"insertedId" : ObjectId("5db8b1435dae31b910b676d4")
}
Voila! We are done. db.invs refers to a collection, the collection does not exist so Mongo created it for me. I did not supply a unique ID, so mongo gave me one. I've stored my invoice and I can retrieve it from the database.
> db.invs.find()
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "number" : 13214, "date" : ISODate("2019-10-29T21:36:43.052Z"), "lines" : [ { "prod" : "Prod A", "qty" : 1 }, { "prod" : "Prod B", "qty" : 3 }, { "prod" : "Prod A", "qty" : 5 } ] }
Life could not be much easier than that now, could it? There are lots of tutorials out there that will show you how to do CRUD on that, what I am interested in is the relationships.
I've only one collection holding invoices (invs) you and I know, that whilst it was a pain in the ass in the SQL world doing the whole object relational mapping crap, having that invoice line items was pretty essential, it is one of the tables we query most often in the database. Should I actually model the invoice and the line items seperately, so that I can work effectively with the line items?
Mongo will let me search on the line items:
> db.invs.find({"lines.prod": "Prod A"})
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "number" : 13214, "date" : ISODate("2019-10-29T21:36:43.052Z"), "lines" : [ { "prod" : "Prod A", "qty" : 1 }, { "prod" : "Prod B", "qty" : 3 }, { "prod" : "Prod A", "qty" : 5 } ] }
But that is returning all of the invoice, it is not just returning the line items for "Prod A", that is not what I am looking for, I just want the line items for Prod A.
Mongo allows us to create indexes, and it can create indexes on the "Embedded Documents", i.e. our line items.
> db.invs.createIndex({"lines.prod": 1, "lines.qty": 1, "number": 1, "date": 1} )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
Now under the covers, it has created an index with 3 entries in it, one for each of our line items. I can now speedily query line items relating to a certain product. I've also included a number of superfluous fields in the index, superfluous from a searching / sorting perspective, however because I have included them, if I search on that index, if all of the data I need is contained in the index, it won't even need to go back to the invoice for that data. It's almost like I have a virtual line item table. However, alas, I cannot query it the way I want.
> db.invs.find({"lines.prod": "Prod A"})
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "number" : 13214, "date" : ISODate("2019-10-29T21:36:43.052Z"), "lines" : [ { "prod" : "Prod A", "qty" : 1 }, { "prod" : "Prod B", "qty" : 3 }, { "prod" : "Prod A", "qty" : 5 } ] }
A find is still returning one invoice with three line items in it. What I wanted back as just the 2 line items. Even if I limit my query to only return line item data
> db.invs.find({"lines.prod": "Prod A"},{"lines.prod": 1, "lines.qty": 1, "_id": 0} )
{ "lines" : [ { "prod" : "Prod A", "qty" : 1 }, { "prod" : "Prod B", "qty" : 3 }, { "prod" : "Prod A", "qty" : 5 } ] }
It is still not giving me back the 2 line items that I wanted to see. To get what I want, I have to do a far fancier query to see each of the embedded documents seperately in it's own object.
> db.invs.aggregate({$project:{"lines.prod": 1, "lines.qty": 1} },{$unwind: '$lines'} )
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod A", "qty" : 1 } }
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod B", "qty" : 3 } }
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod A", "qty" : 5 } }
Very cool, so now I can see the line items I wanted, but it is a bit messy, what I would really like is a sort of virtual table with the line items. So I create a "view" called lineItems using the same fancy query.
> db.createView("lineItems", "invs",[ {$project:{"lines.prod": 1, "lines.qty": 1} },{$unwind: '$lines'}] )
{ "ok" : 1 }
And now I can
> db.lineItems.find()
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod A", "qty" : 1 } }
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod B", "qty" : 3 } }
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod A", "qty" : 5 } }
Very cool, that is more like it, now I can also query on just the line items that I want. As I add invoice documents, my virtual line item table will show the new line items for me. I can query and get just the line items for one product.
db.lineItems.find({"lines.prod": "Prod A"})
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod A", "qty" : 1 } }
{ "_id" : ObjectId("5db8b1435dae31b910b676d4"), "lines" : { "prod" : "Prod A", "qty" : 5 } }
Fantastic! So I can do all my analysis stuff, like checking how many of each product I have sold, just as easily as if I had gone through the whole ORM crap to put it into a relational database.
> db.lineItems.aggregate({$group: { _id: "$lines.prod", totalQty: {$sum: "$lines.qty"} } })
{ "_id" : "Prod A", "totalQty" : 6 }
{ "_id" : "Prod B", "totalQty" : 3 }
That seems to me like a way better method of working than sticking with the old SQL world, not to mention if my application goes viral, I know that I can scale out to having thousands, or millions or even a billion records in it.
Ok, maybe it wasn't so short, but hopefully it was vaguely interesting!