MapReduce is a framework designed by Google [1]. It is loosely based on the Map and Reduce programming constructs of functional languages like Lisp [2]. Google’s MapReduce is used in a distributed computing, scalable platform. It was designed for data-intensive applications which need to process huge amounts of data.
Various open source implementations exist. The most popular one is Apache’s Hadoop [3]. Hadoop is a scalable distributed computing platform that includes a file system (HDFS) to store massive data, and a Java MapReduce implementation to process that data.
Other MapReduce implementations exist. QT Concurrent has a MapReduce implementation for multi-core processors.
This module provides an introduction to parallel programming with MapReduce, using QT Concurrent.
Programming with MapReduce
When we decide to solve a programming problem using MapReduce, we provide an implementation for a mapper and for a reducer. N mappers are executed simultaneously, executing the same task for different input data. One or more reducers receive the output generated by the mappers, and apply a reducing phase yielding a final result.
A simple program to count word occurrences in a text corpus is described in [4]:
map(String input_key, String input_value):
// input_key: document name
// input_value: document contents
for each word w in input_value:
EmitIntermediate(w, "1");
reduce(String output_key, Iterator intermediate_values):
// output_key: a word
// output_values: a list of counts
int result = 0;
for each v in intermediate_values:
result += ParseInt(v);
Emit(AsString(result));
MapReduce pseudocode that counts word occurrences in a text corpus.
As shown in the code above, a map function receives an input key and an input value (<key, value>), and generates one or more intermediate output <key,value(s)> pairs. A reduce function receives intermediate keys and values that were the ouput by the mappers, processes them in some way, and generates one or more final key-value(s) pairs (<key,value(s)).
You can find a Hadoop implementation of the word count problem at [5], and a QT concurrent implementation in the QT Concurrent package (path: qtconcurrent/examples/wordcount) which can be checked out with subversion.









