Map-Reduce Using MongoDB
What is MongoDB?
MongoDB is an open-source document database and leading NoSQL database. MongoDB is written in C++.
MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB works on the concepts of collections and documents.
Database
The database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.
Collection
A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection are of similar or related purposes.
Document
A document is a set of key-value pairs. Documents have a dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
_id is a 12-byte value, usually shown as a 24-character hexadecimal string, which assures the uniqueness of every document. You can provide _id while inserting the document. If you don't provide one, MongoDB generates a unique id for every document. Of these 12 bytes, the first 4 bytes hold the current timestamp, the next 3 bytes a machine id, the next 2 bytes the process id of the MongoDB server, and the remaining 3 bytes a simple incrementing counter. In current MongoDB versions, the middle 5 bytes are a random value generated once per process rather than separate machine and process ids.
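Since the first 4 bytes of a generated _id hold a timestamp, the creation time can be read back from the id itself. Below is a minimal sketch in plain JavaScript, assuming the id is given as its 24-character hexadecimal string. The helper name objectIdTimestamp and the sample id are made up for illustration. In the mongo shell, the built-in getTimestamp() method of an ObjectId gives the same information.

```javascript
// Sketch: read the creation time out of an ObjectId's 24-character hex form.
// objectIdTimestamp is a hypothetical helper, not part of any MongoDB API.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  // The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Made-up id whose first 8 hex characters are 60000000.
const created = objectIdTimestamp("60000000a1b2c3d4e5f60718");
console.log(created.toISOString());
```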
What is the aggregation framework?
Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and perform various operations on the grouped data to return a single result. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.
Map-Reduce
An aggregation pipeline provides better performance and usability than a map-reduce operation.
Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group, $merge, and others.
For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4. Use these operators to define custom aggregation expressions in JavaScript.
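To make the $accumulator idea concrete, here is a hedged sketch of a pipeline that computes an average age per gender with custom JavaScript. The collection name people and the fields gender and age are assumptions made for this illustration only. The pipeline is built as a plain array so that the accumulator functions can also be tried by hand without a server.

```javascript
// Sketch of a $group stage with a custom $accumulator (MongoDB 4.4+).
// The field names "gender" and "age" are assumed for illustration.
const averageAgeByGender = [
  {
    $group: {
      _id: "$gender",
      averageAge: {
        $accumulator: {
          // Starting state for each group.
          init: function () { return { count: 0, sum: 0 }; },
          // Called once per document with the values from accumulateArgs.
          accumulate: function (state, age) {
            return { count: state.count + 1, sum: state.sum + age };
          },
          accumulateArgs: ["$age"],
          // Combines two partial states of the same group.
          merge: function (a, b) {
            return { count: a.count + b.count, sum: a.sum + b.sum };
          },
          // Turns the final state into the output value.
          finalize: function (state) { return state.sum / state.count; },
          lang: "js"
        }
      }
    }
  }
];
// In the mongo shell (hypothetical collection): db.people.aggregate(averageAgeByGender)
```

For a plain count or sum, built-in operators such as $sum remain the simpler choice; $accumulator is only worth it when no built-in operator fits.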
Step 1: Installing MongoDB
Download the Windows installer from here:
https://fastdl.mongodb.org/windows/mongodb-windows-x86_64-4.4.6-signed.msi
Step 2: Installing MongoDB tools
Download the Windows installer from here:
https://fastdl.mongodb.org/tools/db/mongodb-database-tools-windows-x86_64-100.3.1.msi
Step 3: Setting up the PATH environment variable to use the MongoDB server and database tools from the CLI
Click Edit, choose the New option, paste the path, apply, and then click OK.


Step 4: Using map-reduce function in MongoDB
MapReduce consists of two programs:
I) The mapper program performs filtering and sorting of the data.
II) The reducer program performs summary operations, such as counting the number of words. It is used for reducing large volumes of raw data into meaningful aggregated results.
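The two programs can be sketched in a few lines of plain JavaScript, with no MongoDB involved. The helper runMapReduce and the sample lines are made up for this illustration; the example counts words, as mentioned above.

```javascript
// Minimal in-memory sketch of the map-reduce flow (no MongoDB involved).
// runMapReduce is a hypothetical helper written for this illustration.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  // Map phase: each document may emit any number of (key, value) pairs.
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: collapse the values collected under each key into one result.
  const results = {};
  for (const [key, values] of groups) {
    results[key] = reduceFn(key, values);
  }
  return results;
}

const lines = [{ text: "to be or not to be" }, { text: "to do" }];
const wordCounts = runMapReduce(
  lines,
  // Mapper: emit (word, 1) for every word in the line.
  (doc, emit) => doc.text.split(" ").forEach((word) => emit(word, 1)),
  // Reducer: add up the 1s emitted for each word.
  (word, counts) => counts.reduce((a, b) => a + b, 0)
);
console.log(wordCounts);
```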

👩‍💻 GitHub link for the datasets used:
https://github.com/srasthychaudhary/MapReduce-using-MongoDB
For this task I have taken two small examples:
- Persons
- Deck Of Cards
Let’s get started…
Example 1: Persons

Step 1: Importing data into the database
>>mongoimport persons.json -d Persons -c peoples --jsonArray
Here, -d is the database name and -c is the collection name.
While importing, you may get this error:
!! Failed: Error reading separator after document #1: bad JSON Array format - found no opening bracket '[' in the input source.
To solve this error, all you need to do is remove the --jsonArray flag from the above command. The flag tells mongoimport to expect a JSON array, so the error appears when the file holds a document that is not wrapped in [ ]; without the flag, the import goes through.
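Whether the flag is needed depends on how the file starts. A small sketch of that check in plain JavaScript follows; the helper looksLikeJsonArray and both sample strings are hypothetical and only illustrate the rule from the error message.

```javascript
// Sketch: decide whether an import file holds one JSON array or separate documents.
// looksLikeJsonArray is a hypothetical helper, not part of mongoimport.
function looksLikeJsonArray(text) {
  // The error message complains about a missing opening bracket "[".
  return text.trimStart().startsWith("[");
}

const arrayFile = '[ {"name": "A"}, {"name": "B"} ]';
const documentsFile = '{"name": "A"}\n{"name": "B"}';

console.log(looksLikeJsonArray(arrayFile));     // true: import with --jsonArray
console.log(looksLikeJsonArray(documentsFile)); // false: import without --jsonArray
```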


DB & Collection Created
Step 2: Aggregation Pipeline
>>db.peoples.aggregate([{$match:{gender:"male"}},{$group:{_id:{dob:"$dob.age"},males:{$sum:1}}},{$sort:{males:1}}])
Here:
$match: Takes a document that specifies the query conditions; only the documents that satisfy them pass on to the next stage.
$group: Groups input documents by the specified _id expression and outputs one document for each distinct grouping. The _id field of each output document contains the unique group-by value.
$sum: With the value 1, it adds 1 for every document in the group, so it counts the documents.
$sort: Orders the documents by the given field.
>>To sort in ascending order use: {$sort:{males:1}}
>>To sort in descending order use: {$sort:{males:-1}}

Aggregation Pipeline
Therefore, the aggregation pipeline selected all the males in the collection and counted them according to their age (the dob.age field).
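To see what each stage does, the same three steps can be imitated in plain JavaScript on a handful of documents. This is only an in-memory sketch: the four sample documents are made up, and only the gender and dob.age fields mirror the pipeline.

```javascript
// In-memory sketch of the $match -> $group -> $sort stages (no MongoDB needed).
// The sample documents are made up; only gender and dob.age mirror the pipeline.
const people = [
  { gender: "male", dob: { age: 30 } },
  { gender: "female", dob: { age: 30 } },
  { gender: "male", dob: { age: 45 } },
  { gender: "male", dob: { age: 30 } },
];

// $match: keep only the male documents.
const matched = people.filter((p) => p.gender === "male");

// $group: one bucket per distinct dob.age, with $sum: 1 counting its documents.
const counts = new Map();
for (const p of matched) {
  counts.set(p.dob.age, (counts.get(p.dob.age) || 0) + 1);
}

// $sort: { males: 1 } orders the buckets by ascending count.
const malesByAge = [...counts]
  .map(([age, males]) => ({ _id: { dob: age }, males }))
  .sort((a, b) => a.males - b.males);

console.log(malesByAge);
```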
Step 3: Mapper and Reducer Program
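###Map Function###
>>var mapFunc2=function(){emit(this.gender,this.dob.age);};
Here,
this: Refers to the document that the map-reduce operation is processing.
emit: Outputs a key-value pair; here the key is the gender and the value is the age (the same dob.age field used in the pipeline).
var: Declares a variable, here the one that holds the map function.
Note: $split divides a string into an array of substrings based on a delimiter, but it is an aggregation pipeline operator. It has no effect inside a JavaScript map function, so it is left out here.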
###Reduce Function###
var reduceFunc2=function(keyGender,valuesAge){return valuesAge.length;};
###MapReduce Function###
db.peoples.mapReduce(mapFunc2,reduceFunc2,{out:"map_reduced"})

Map-Reduce Program
Hence, this program map-reduced the collection into key: value form, with the gender (male or female) as the key and the number of ages emitted for that gender as the value.
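One caveat about counting with valuesAge.length: MongoDB may call the reduce function more than once for the same key and feed earlier results back in, so a length-based count can go wrong on larger collections. A common alternative is to emit 1 per document and sum the values. The plain JavaScript sketch below imitates that situation with two made-up batches; it is an illustration, not the server's actual batching.

```javascript
// Sketch: why counting with emit(key, 1) plus a sum survives re-reduction.
// Plain JavaScript; the split into two batches imitates MongoDB calling
// reduce more than once for the same key.
const sum = (values) => values.reduce((a, b) => a + b, 0);

const reduceByLength = (key, values) => values.length; // length-based count
const reduceBySum = (key, values) => sum(values);      // values are emitted 1s

// Five "male" documents, reduced in two batches and then combined.
const agesBatch1 = [30, 45, 52];
const agesBatch2 = [28, 61];
const lengthResult = reduceByLength("male", [
  reduceByLength("male", agesBatch1),
  reduceByLength("male", agesBatch2),
]);

const onesBatch1 = [1, 1, 1];
const onesBatch2 = [1, 1];
const sumResult = reduceBySum("male", [
  reduceBySum("male", onesBatch1),
  reduceBySum("male", onesBatch2),
]);

console.log(lengthResult); // 2: the number of batches, not of documents
console.log(sumResult);    // 5: the number of documents
```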
Example 2: Deck Of Cards

Deck Of Cards
Step 1: Importing data into the database
>>mongoimport cards.json -d deck_of_cards -c cards --jsonArray

Data Imported in DB
Step 2: Aggregation Pipeline
>>db.cards.aggregate([{$match:{value:{$gte:1}}},{$group:{_id:"$value",cards:{$push:"$suit"}}},{$sort:{"_id":1}}])

Aggregation Pipeline
Here:
$match: Selects the documents whose value is $gte 1, i.e. greater than or equal to 1.
$group: Groups the documents by their value. Within each group, $push returns an array of all values that result from applying an expression (here "$suit") to each document in a group of documents that share the same group-by key.
$sort: Sorts by _id in ascending order.
Therefore, the pipeline categorized all the cards according to their values.
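The effect of $push is easiest to see on a tiny input. The sketch below imitates the three stages in plain JavaScript; the four sample cards are made up for illustration and are not taken from the dataset.

```javascript
// In-memory sketch of $match -> $group with $push -> $sort (no MongoDB needed).
// The four sample cards are made up for illustration.
const cards = [
  { suit: "hearts", value: 2 },
  { suit: "spades", value: 1 },
  { suit: "clubs", value: 2 },
  { suit: "hearts", value: 1 },
];

const suitsByValue = new Map();
for (const card of cards) {
  if (!(card.value >= 1)) continue;             // $match: { value: { $gte: 1 } }
  if (!suitsByValue.has(card.value)) suitsByValue.set(card.value, []);
  suitsByValue.get(card.value).push(card.suit); // $push: "$suit"
}

const grouped = [...suitsByValue]
  .map(([value, suits]) => ({ _id: value, cards: suits }))
  .sort((a, b) => a._id - b._id);               // $sort: { _id: 1 }

console.log(JSON.stringify(grouped));
```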

Tip: If you also have trouble matching opening and closing braces, my suggestion is to use either Notepad++ or VS Code.
Let's have a look at how the data is processed behind the scenes, using the figure below:

Step 3: Mapper & Reducer Program
###Map function###
var mapFunctn2 = function() {
  // Emit one (suit, value) pair per card
  emit(this.suit, this.value);
};
###Reduce Function###
var reduceFunctn2 = function(keySuit, valuesValue) {
  // Add up all the values emitted for this suit
  return Array.sum(valuesValue);
};
###Map-Reduce Function###
db.cards.mapReduce(
  mapFunctn2,
  reduceFunctn2,
  { out: "map_example" }
)
###Query the results from the collection named map_example###
db.map_example.find().sort( { _id: 1 } )

Hence, map-reduce divided the collection into 4 groups, one per suit (the key emitted by the map function), and summed the card values within each group.
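As noted in the Map-Reduce section, the same kind of result can usually be produced with the aggregation pipeline. The following is a sketch of such a rewrite, assuming every card has a suit field and a numeric value field. The output collection name agg_example is made up, and the in-memory part uses three made-up cards only to show the shape of the totals.

```javascript
// Sketch: an aggregation pipeline that should match the map-reduce above,
// assuming every card has a "suit" and a numeric "value" field.
const sumBySuitPipeline = [
  { $group: { _id: "$suit", value: { $sum: "$value" } } },
  { $out: "agg_example" }, // hypothetical output collection name
];
// In the mongo shell: db.cards.aggregate(sumBySuitPipeline)

// The same grouping done in memory on made-up cards, to show the expected shape.
const sampleCards = [
  { suit: "hearts", value: 2 },
  { suit: "spades", value: 10 },
  { suit: "hearts", value: 5 },
];
const totals = {};
for (const card of sampleCards) {
  totals[card.suit] = (totals[card.suit] || 0) + card.value;
}
console.log(totals);
```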
Thank you