[Databases] MongoDB – document-oriented NoSQL storage

MongoDB is an open-source NoSQL database written in C++. It manages collections of documents stored in BSON (Binary JSON, a binary-encoded extension of JSON). In this article I want to show the basic CRUD (Create, Read, Update, Delete) operations and an example of a simple MapReduce job.

Installing MongoDB.

Before we start using MongoDB, it has to be installed. Just download it from http://www.mongodb.org/downloads and start mongod, the MongoDB daemon. Instructions for the Ubuntu operating system:
1. Download the Linux build for your architecture (32-bit or 64-bit).
2. Type in the console:

sudo mkdir -p /data/db
sudo chown `id -u` /data/db

This is the default data directory for MongoDB storage.
3. Start the daemon:

./mongod

The daemon should be in the bin directory of the downloaded archive.

CRUD operations

Now connect to the running server using the mongo console:

./mongo

Create a database:

> use csmdb;
switched to db csmdb

Magic touch creates the database ;). Strictly speaking, it only comes into existence once the first document is inserted.

Create some articles:

> db.articles.insert({type: "computer", text : "very interesting article about computers..."})
> db.articles.insert({type: "car", text : "something about cars?"})
> db.articles.insert({type: "car", text: "I like cars very much! I think I can share with you my passion!"})

You don’t have to declare a schema. Since MongoDB is schemaless, you can insert whatever you want into a collection.

Read all articles:

> db.articles.find()
{ "_id" : ObjectId("4f464146bbb36bc4d1135dd9"), "type" : "computer", "text" : "very interesting article about computers..." }
{ "_id" : ObjectId("4f46414cbbb36bc4d1135dda"), "type" : "car", "text" : "something about cars?" }
{ "_id" : ObjectId("4f464152bbb36bc4d1135ddb"), "type" : "car", "text" : "I like cars very much! I think I can share with you my passion!" }

Read only computer articles:

> db.articles.find({type: "computer"})
{ "_id" : ObjectId("4f464146bbb36bc4d1135dd9"), "type" : "computer", "text" : "very interesting article about computers..." }

Queries are written as documents instead of SQL: the query document lists the field values that a matching article must have.
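To make the idea of a query document more concrete, here is a minimal plain-JavaScript sketch of equality matching against an in-memory array. It only models the simplest case used above (every field in the query must equal the same field in the document). The `matches` helper and the `articles` array are assumptions made for this illustration; this is not how MongoDB evaluates queries internally.

```javascript
// Illustrative model only: MongoDB evaluates queries on the server, not like this.
const articles = [
  { type: "computer", text: "very interesting article about computers..." },
  { type: "car", text: "something about cars?" },
  { type: "car", text: "I like cars very much! I think I can share with you my passion!" }
];

// Hypothetical helper: true when every field of the query equals the document's field.
function matches(doc, query) {
  return Object.keys(query).every(function (field) {
    return doc[field] === query[field];
  });
}

// Rough equivalent of db.articles.find({type: "computer"})
const computerArticles = articles.filter(function (doc) {
  return matches(doc, { type: "computer" });
});

// An empty query document matches everything, like db.articles.find()
const allArticles = articles.filter(function (doc) {
  return matches(doc, {});
});

console.log(computerArticles.length, allArticles.length);
```

The real query language offers much more (operators, nested fields, indexes), but the basic shape is the same: you describe the documents you want with another document.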

Update one of the articles:

> db.articles.update({_id: ObjectId("4f464146bbb36bc4d1135dd9")}, {"type" : "computer", "text" : "Short article  about computers..."})
> db.articles.find({_id: ObjectId("4f464146bbb36bc4d1135dd9")})
{ "_id" : ObjectId("4f464146bbb36bc4d1135dd9"), "type" : "computer", "text" : "Short article  about computers..." }
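Note that the update above passes a complete document, so the stored article is replaced as a whole (only `_id` is kept). To change a single field, the shell also accepts the `$set` operator, for example `{$set: {text: "..."}}`. The plain-JavaScript sketch below models the difference between the two styles. `applyUpdate` is a hypothetical helper written for this example and handles only these two cases.

```javascript
// Illustrative model of two update styles; not MongoDB's implementation.
// Hypothetical helper: returns the document as it would look after the update.
function applyUpdate(doc, update) {
  if (Object.prototype.hasOwnProperty.call(update, "$set")) {
    // $set: copy the document and overwrite only the listed fields.
    return Object.assign({}, doc, update.$set);
  }
  // Plain document: replace everything except _id.
  return Object.assign({ _id: doc._id }, update);
}

const stored = {
  _id: 1, // simplified stand-in for an ObjectId
  type: "computer",
  text: "very interesting article about computers..."
};

// Replacement that forgets "type": the field is lost.
const replaced = applyUpdate(stored, { text: "Short article about computers..." });

// $set: "type" survives, only "text" changes.
const patched = applyUpdate(stored, { $set: { text: "Short article about computers..." } });

console.log(replaced);
console.log(patched);
```

This is why the update in the article repeats the `type` field: leaving it out of a replacement document would drop it.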

Delete one of the articles:

> db.articles.remove({_id: ObjectId("4f464146bbb36bc4d1135dd9")})

Simple example of MapReduce job

As you see, MongoDB is very simple to use. Let’s do some MapReduce jobs!
For example: you want to check which article type has the most words. First you have to count the words in each article, and then sum the counts per type. In MongoDB you can use MapReduce for this:

First, create the map function, which counts the words in a given article:

> map = function () {
...     var result = this.text.split(" ").length;
...     emit(this.type, result);
... }

A very simple JavaScript function: it splits the text on spaces and emits a key-value pair (article type, word count).
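One caveat of `split(" ")`: consecutive spaces produce empty strings, and those are counted as words too. A slightly more robust variant, sketched here in plain JavaScript so it can be tried outside the shell, splits on runs of whitespace instead. `countWordsNaive` and `countWords` are names introduced for this illustration.

```javascript
// Naive count, the same expression the map function uses.
function countWordsNaive(text) {
  return text.split(" ").length;
}

// Hypothetical alternative: trim, then split on any run of whitespace.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const sample = "Short article  about computers..."; // note the double space

console.log(countWordsNaive(sample)); // 5: the empty string between the two spaces is counted
console.log(countWords(sample));      // 4
```

For the three sample articles both versions give the same counts, so the simple version is good enough for this walkthrough.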

Second, create the reduce function, which sums the word counts:

> reduce = function (key, values) {
...     var result = 0;
...     for (var i = 0; i < values.length; i++) {
...         result += values[i];
...     }
...     return result;
... }

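The key is the article type; values is an array of integers with the word counts of the articles of that type. Finally, fetch the results (the output below was produced with all three original articles in the collection, that is, before the update and remove shown earlier):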

> db.articles.mapReduce(map, reduce, {out: {inline: 1}});
{
        "results" : [
                {
                        "_id" : "car",
                        "value" : 17
                },
                {
                        "_id" : "computer",
                        "value" : 5
                }
        ],
        "timeMillis" : 1,
        "counts" : {
                "input" : 3,
                "emit" : 3,
                "reduce" : 1,
                "output" : 2
        },
        "ok" : 1,
}

Results and statistics. Who wants more? The mapReduce function takes the map and reduce functions as its first two arguments. The third argument controls the output: in this example we don’t save the results to a collection, we get them inline.
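To see what happens between map and reduce, the job can be imitated in plain JavaScript on an in-memory array. This is only a sketch of the data flow (map, group by key, reduce), assuming the three original articles. `runMapReduce` and the local `emit` are stand-ins written for this example, not MongoDB's engine, and the reduce function simply sums the values as described in the article.

```javascript
// The three original articles (without _id, which is not needed here).
const docs = [
  { type: "computer", text: "very interesting article about computers..." },
  { type: "car", text: "something about cars?" },
  { type: "car", text: "I like cars very much! I think I can share with you my passion!" }
];

// map: emits (article type, word count) for the document bound to `this`.
function map() {
  emit(this.type, this.text.split(" ").length);
}

// reduce: sums the word counts emitted for one key.
function reduce(key, values) {
  var result = 0;
  for (var i = 0; i < values.length; i++) {
    result += values[i];
  }
  return result;
}

// Stand-in for the emit() that MongoDB provides inside map:
// it collects the emitted values per key.
let groups = {};
function emit(key, value) {
  if (!groups[key]) groups[key] = [];
  groups[key].push(value);
}

// Hypothetical driver: map every document, then reduce each group.
function runMapReduce(input, mapFn, reduceFn) {
  groups = {};
  input.forEach(function (doc) {
    mapFn.call(doc);
  });
  return Object.keys(groups).sort().map(function (key) {
    const values = groups[key];
    // Like MongoDB, skip reduce when a key has only one value.
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    return { _id: key, value: value };
  });
}

const results = runMapReduce(docs, map, reduce);
console.log(results); // car: 3 + 14 = 17, computer: 5
```

The single-value shortcut explains the statistics in the output: three documents produce three emits, but reduce runs only once, for "car". MongoDB may also call reduce several times for the same key and feed earlier results back in, so reduce has to return the same kind of value it receives; a plain sum satisfies that.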

Unfortunately, MongoDB also has some drawbacks:
- a maximum of 2 GB of storage on 32-bit systems (64-bit systems don’t have this limitation),
- a maximum of 800 bytes for an index key (the sum of the lengths of the indexed values for one document).

I worked with MongoDB in my last project and found it easy to start developing with. I liked it mostly for its MapReduce engine, index support and schemalessness.
