Map-Reduce Using MongoDB

Srasthy Chaudhary
Sep 7, 2021

What is MongoDB?

MongoDB is an open-source document database and leading NoSQL database. MongoDB is written in C++.

MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB works on the concepts of collections and documents.

Database

The database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.

Collection

A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema, so documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.

Document

A document is a set of key-value pairs. Documents have a dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

_id is a 12-byte value, usually displayed as a 24-character hexadecimal string, that guarantees the uniqueness of every document. You can provide _id yourself when inserting a document. If you do not, MongoDB generates a unique ObjectId for the document. In current MongoDB versions the 12 bytes consist of a 4-byte timestamp, a 5-byte random value generated once per process, and a 3-byte incrementing counter. Older versions split the middle 5 bytes into a 3-byte machine id and a 2-byte process id.
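Because the first 4 bytes hold a timestamp, the creation time can be read back from the hexadecimal form of an id. Here is a minimal sketch in plain JavaScript. The helper name objectIdTimestamp and the sample id are made up for illustration and are not part of MongoDB.

```javascript
// Reads the creation time from the first 4 bytes (8 hex characters) of an ObjectId.
// objectIdTimestamp is a hypothetical helper, not a MongoDB function.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hexadecimal string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return new Date(seconds * 1000);
}

// Hypothetical id, made up for this example.
const exampleId = "6137e2800000000000000000";
console.log(objectIdTimestamp(exampleId).toISOString()); // 2021-09-07T22:06:56.000Z
```

In the MongoDB shell, an ObjectId exposes the same information through its getTimestamp() method.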

What is the aggregation framework?

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and perform various operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.

Map-Reduce

An aggregation pipeline provides better performance and usability than a map-reduce operation.

Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group, $merge, and others.

For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. Use these operators to define custom aggregation expressions in JavaScript.
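As a rough illustration of $accumulator, the sketch below computes an average age per gender with custom JavaScript functions. It assumes documents with a gender field and a nested dob.age field, as in the Persons example of this article. The names avgAge and pipeline are made up for this sketch.

```javascript
// Hypothetical custom accumulator: average age per gender.
// The field names (gender, dob.age) are assumptions about the documents.
const avgAge = {
  init: function () {                   // starting state for every group
    return { count: 0, sum: 0 };
  },
  accumulate: function (state, age) {   // called once per document
    return { count: state.count + 1, sum: state.sum + age };
  },
  accumulateArgs: ["$dob.age"],         // values passed to accumulate()
  merge: function (a, b) {              // combines two partial states
    return { count: a.count + b.count, sum: a.sum + b.sum };
  },
  finalize: function (state) {          // turns the final state into the result
    return state.count === 0 ? null : state.sum / state.count;
  },
  lang: "js"
};

const pipeline = [
  { $group: { _id: "$gender", avgAge: { $accumulator: avgAge } } }
];

// In the MongoDB shell (version 4.4 or later) this would be run as:
//   db.peoples.aggregate(pipeline)
```

For a plain average the built-in $avg operator is the simpler choice. The custom functions are only used here to show the shape of an $accumulator definition.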

Step 1: Installing MongoDB

Download the Windows installer from here:

https://fastdl.mongodb.org/windows/mongodb-windows-x86_64-4.4.6-signed.msi

Step 2: Installing MongoDB tools

Download the Windows installer from here:

https://fastdl.mongodb.org/tools/db/mongodb-database-tools-windows-x86_64-100.3.1.msi

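Step 3: Setting up the PATH environment variable to use the MongoDB server and database tools from the CLI

In the Environment Variables dialog, select the Path variable and click Edit, choose New, paste the path of the folder that contains the MongoDB binaries, click Apply, and then click OK.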

Step 4: Using map-reduce function in MongoDB

MapReduce consists of two programs:

I) The mapper program filters and sorts the data and emits it as key-value pairs.

II) The reducer program performs summary operations, such as counting the number of words. It is used for reducing large volumes of raw data into meaningful aggregated results.
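The two phases can be sketched in a few lines of plain JavaScript. This is an in-memory illustration of the idea, not how MongoDB implements it. The function mapReduce and the sample lines are made up for this example.

```javascript
// In-memory sketch of map-reduce (not MongoDB's implementation).
function mapReduce(records, mapper, reducer) {
  const groups = new Map();
  // Map phase: the mapper emits (key, value) pairs for each record.
  for (const record of records) {
    mapper(record, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: all values of one key are collapsed into a single result.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reducer(key, values);
  }
  return result;
}

// Word count: emit (word, 1) for each word, then add up the ones.
const lines = ["map reduce", "reduce data", "map map"];
const wordCounts = mapReduce(
  lines,
  (line, emit) => line.split(" ").forEach((word) => emit(word, 1)),
  (word, ones) => ones.reduce((a, b) => a + b, 0)
);
console.log(wordCounts); // { map: 3, reduce: 2, data: 1 }
```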

👩‍💻 GitHub link for the data sets used:

https://github.com/srasthychaudhary/MapReduce-using-MongoDB

For this task I use two small examples:

  1. Persons
  2. Deck Of Cards

Let’s get started…

Example 1: Persons

Step 1: Importing data into the database

>>mongoimport persons.json -d Persons -c peoples --jsonArray

Here, -d specifies the database name and -c the collection name.

If you get the following error while importing:

!! Failed: Error reading separator after document #1: bad JSON Array format - found no opening bracket '[' in the input source.

remove the --jsonArray flag from the command above. The flag tells mongoimport that the file contains a single JSON array. If the file holds a single document, or one document per line without enclosing square brackets, the import only works without the flag.
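The difference between the two file layouts can be shown with a small JavaScript sketch. The sample strings and the helpers looksLikeJsonArray and parseDocuments are made up for illustration and only approximate the idea. They are not what mongoimport does internally.

```javascript
// A JSON array: all documents wrapped in [ ... ] (the layout --jsonArray expects).
const asArray = '[{"name":"a"},{"name":"b"}]';
// Newline-delimited JSON: one document per line, no surrounding brackets.
const asLines = '{"name":"a"}\n{"name":"b"}';

// Hypothetical check: does the text start with an opening bracket?
function looksLikeJsonArray(text) {
  return text.trimStart().startsWith("[");
}

// Hypothetical parser that accepts both layouts.
function parseDocuments(text) {
  if (looksLikeJsonArray(text)) {
    return JSON.parse(text);
  }
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

console.log(looksLikeJsonArray(asArray), parseDocuments(asArray).length); // true 2
console.log(looksLikeJsonArray(asLines), parseDocuments(asLines).length); // false 2
```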

DB & Collection Created

Step 2: Aggregation Pipeline

>>db.peoples.aggregate([{$match:{gender:"male"}},{$group:{_id:{dob:"$dob.age"},males:{$sum:1}}},{$sort:{males:1}}])

Here:

$match: Takes a document that specifies the query conditions. Only documents that satisfy them are passed on to the next stage.

$group: Groups input documents by the specified _id expression and outputs one document for each distinct group. The _id field of each output document contains the unique group-by value.

$sum: With the value 1, it adds 1 for every document in the group, so it counts the documents per group.

$sort: Sorts the documents by the given field.

>>To sort in ascending order use: {$sort:{field:1}}

>>To sort in descending order use: {$sort:{field:-1}}

Aggregation Pipeline

Therefore, the aggregation pipeline keeps only the male persons, counts them per age (dob.age), and sorts the resulting groups by that count in ascending order.
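To make the three stages concrete, here is a plain JavaScript sketch that mimics them on a handful of in-memory documents. The sample data is made up, and the code only illustrates what each stage computes. It is not how MongoDB executes the pipeline.

```javascript
// Hypothetical sample documents; only gender and dob.age matter here.
const people = [
  { gender: "male", dob: { age: 30 } },
  { gender: "female", dob: { age: 30 } },
  { gender: "male", dob: { age: 45 } },
  { gender: "male", dob: { age: 30 } },
  { gender: "male", dob: { age: 45 } },
  { gender: "male", dob: { age: 52 } },
  { gender: "male", dob: { age: 30 } }
];

// $match: keep only the male persons.
const males = people.filter((p) => p.gender === "male");

// $group with $sum: 1 -> count the documents per age.
const countsByAge = new Map();
for (const p of males) {
  countsByAge.set(p.dob.age, (countsByAge.get(p.dob.age) || 0) + 1);
}

// $sort: { males: 1 } -> ascending by the count.
const grouped = [...countsByAge]
  .map(([age, count]) => ({ _id: { dob: age }, males: count }))
  .sort((a, b) => a.males - b.males);

console.log(grouped);
```

With this sample data the output lists age 52 with one male, then age 45 with two, then age 30 with three.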

Step 3: Mapper and Reducer Program

###Map Function###

>>var mapFunc2=function(){emit(this.gender,1);};

Here,

this: Refers to the document that the map-reduce operation is processing.

emit: Outputs a key-value pair. Here the key is the person's gender and the value is 1, so that the reducer can count the persons per gender.

var: Declares a variable, here the one that holds the function.

###Reduce Function###

var reduceFunc2=function(keyGender,valuesCount){return Array.sum(valuesCount);};

MongoDB can call the reduce function more than once for the same key and pass earlier results back in, so the function adds up the emitted values instead of returning the length of the array.

###MapReduce Function###

db.peoples.mapReduce(mapFunc2,reduceFunc2,{out:"map_reduced"})

Map-Reduce Program

Hence, the map-reduce job stores one key-value pair per gender in the map_reduced collection: the gender as the key and the number of persons as the value.
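One detail matters when counting with map-reduce: MongoDB may call the reduce function several times for the same key and feed earlier results back in as values. The sketch below simulates this in plain JavaScript to show why a reducer that sums emitted ones stays correct while one that returns the array length does not. The function reduceInBatches and the batch size are made up for the simulation.

```javascript
// Values emitted by a map function for one key, e.g. "male": a 1 per document.
const emitted = [1, 1, 1, 1, 1];

// Two candidate reducers for counting documents.
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
const lengthReduce = (key, values) => values.length;

// Simulated re-reduce: the values are reduced in batches and the partial
// results are then reduced again.
function reduceInBatches(reduce, key, values, batchSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += batchSize) {
    partials.push(reduce(key, values.slice(i, i + batchSize)));
  }
  return reduce(key, partials);
}

console.log(reduceInBatches(sumReduce, "male", emitted, 2));    // 5
console.log(reduceInBatches(lengthReduce, "male", emitted, 2)); // 3
```

The length-based reducer counts the three partial results instead of the five documents, which is why the emitted value and the reducer's return value should have the same meaning.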

Example 2: Deck Of Cards


Step 1: Importing data into the database

>>mongoimport cards.json -d deck_of_cards -c cards --jsonArray

Data Imported in DB

Step 2: Aggregation Pipeline

>>db.cards.aggregate([{$match:{value:{$gte:1}}},{$group:{_id:"$value",cards:{$push:"$suit"}}},{$sort:{"_id":1}}])

Aggregation Pipeline

Here:

$match: Keeps only the documents whose value field is greater than or equal to 1 ($gte).

$group: Groups the documents by their value field. $push returns an array of all values that result from applying an expression to each document in a group of documents that share the same group-by key. Here it collects the suits of all cards with the same value.

$sort: Sorts _id in ascending order.

Therefore, the pipeline categorized all the cards according to their values.
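The effect of $group with $push can be imitated in plain JavaScript on a few in-memory cards. The sample cards and suit names below are made up. The real data set contains one document per card.

```javascript
// Hypothetical sample cards; field names follow the pipeline (suit, value).
const cards = [
  { suit: "hearts", value: 2 },
  { suit: "spades", value: 1 },
  { suit: "clubs", value: 2 },
  { suit: "diamonds", value: 1 }
];

// $match: keep the cards whose value is greater than or equal to 1.
const matched = cards.filter((c) => c.value >= 1);

// $group with $push: collect the suits of all cards sharing a value.
const suitsByValue = new Map();
for (const c of matched) {
  if (!suitsByValue.has(c.value)) suitsByValue.set(c.value, []);
  suitsByValue.get(c.value).push(c.suit);
}

// $sort: { _id: 1 } -> ascending by the grouped value.
const groupedCards = [...suitsByValue]
  .map(([value, suits]) => ({ _id: value, cards: suits }))
  .sort((a, b) => a._id - b._id);

console.log(groupedCards);
```

For the four sample cards this yields one group for value 1 (spades, diamonds) and one for value 2 (hearts, clubs).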

Tip: If you also struggle with matching opening and closing braces, I suggest using an editor such as Notepad++ or VS Code.

The figure below shows how the data is processed behind the scenes:

Step 3: Mapper & Reducer Program

###Map function###

var mapFunctn2 = function() {
  // emit one (suit, value) pair per card
  emit(this.suit, this.value);
};

###Reduce Function###

var reduceFunctn2 = function(keySuit, valuesValue) {
  // add up the values of all cards of the same suit
  return Array.sum(valuesValue);
};

###Map-Reduce Function###

db.cards.mapReduce(
  mapFunctn2,
  reduceFunctn2,
  { out: "map_example" }
)

###Query the results from the collection named map_example###

db.map_example.find().sort( { _id: 1 } )

Hence, the map-reduce job groups the cards by suit, one group per suit, and stores the sum of the card values of each suit in the map_example collection.
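Since map-reduce operations can be rewritten with aggregation pipeline operators, the same sum per suit could also be expressed with $group and $sum. The following is a sketch under that assumption. The variable name sumPerSuit and the output collection name map_example_agg are made up for this example.

```javascript
// Aggregation-pipeline counterpart of the map-reduce job (a sketch).
const sumPerSuit = [
  // group the cards by suit and add up their values
  { $group: { _id: "$suit", value: { $sum: "$value" } } },
  // write the result to a collection; the name is hypothetical
  { $out: "map_example_agg" }
];

// In the MongoDB shell this would be run as:
//   db.cards.aggregate(sumPerSuit)
```

The output documents have the same shape as the map-reduce result: the suit in _id and the sum in value.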

Thank you
