Map Reduce - Parallel Processing At Scale (Part 1/3)

Map-Reduce is a framework/library/programming model for running parallel data processing in a distributed environment. Originally developed by Google in 2004 ( https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf ), Map-Reduce was created to avoid writing one-off software for distributing complex queries and computations, and to provide a more generalized way to process datasets over commodity servers. Since Map-Reduce's origin, other programming models/frameworks have also come into this space, the most prominent being Apache Spark (yup, my cat is named after this).


This article is part of a 3 part series I am very excited to write, this being the first of the three. This article will give a high level overview, the second will be an implementation of the framework as I have understood it from the paper, and the third will go more into details such as performance and fault tolerance.



High Level Overview:


In order to use map reduce, the programmer is expected to write their software in 2 phases: a map phase and a reduce phase. One may loop through maps and reduces multiple times, but every program needs these 2 distinct steps.


The Map function has the following signature: map (k1,v1) → list(k2,v2)

The paper describes the map function in the following words:

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

After looking at a few examples, the way the map function makes the most sense to me is the following: given a filename (k1) and the content of the file (v1), produce a new list/set of intermediate key/value pairs (k2, v2) that can then be aggregated/reduced later on to reach your final output. These intermediate key/value pairs are written to disk, partitioned into N files, where N dictates the number of reduce tasks to run. The files are persisted on a distributed file system accessible to all the nodes in our cluster.

This is initially confusing because not all processing tasks can be broken down in that manner, at least not intuitively. Map Reduce forces you to adopt this sort of mental model for solving your problem, which is what makes writing map reduce jobs hard.
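To make this concrete, here is the canonical word count example from the paper, with the map function sketched in Java. Returning the emitted pairs as a list is a simplification of my own; the paper's pseudocode emits each pair through an Emit call provided by the library.

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

class WordCountMap {
  // Word count map: k1 = file name, v1 = file contents.
  // For every word in the file we emit the intermediate pair (word, "1").
  static List<Entry<String, String>> map(String fileName, String contents) {
    List<Entry<String, String>> intermediate = new ArrayList<>();
    for (String word : contents.split("\\s+")) {
      if (!word.isEmpty()) {
        intermediate.add(new SimpleEntry<>(word, "1"));
      }
    }
    return intermediate;
  }
}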


The other phase is known as the Reduce Phase.

The Reduce function has the following signature: reduce (k2,list(v2)) → list(v2)

The paper describes the reduce function in the following words:

The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation.

The reduce function is, in my opinion, slightly easier to understand. It is a transformation we write that, for each intermediate key and its set of values, does some further aggregation/reduction on them to produce a final output. Usually the number of reduce tasks dictates the number of final output files.
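Continuing the word count example, the matching reduce function in Java might look like the sketch below. The framework hands it a word and every "1" emitted for that word, and it returns the total count (again, the paper emits the result through the library rather than returning it).

import java.util.List;

class WordCountReduce {
  // Word count reduce: k2 = a word, values = all the "1"s emitted for it.
  static String reduce(String word, List<String> values) {
    int sum = 0;
    for (String value : values) {
      sum += Integer.parseInt(value);
    }
    return Integer.toString(sum);
  }
}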



Under the Hood:


Map-Reduce was created to run on a cluster of commodity servers that share access to a distributed file system. On top of the cluster we would have an abstraction where the user can define certain configuration, such as directories and file naming, and kick off a map reduce job. A user could also look into the status of the job. The paper gives an example in C++ on the last page that makes this more clear.
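As a rough Java analogue of that C++ example, the configuration a user hands to the framework might look something like this. The JobSpec class and its field names are hypothetical, a sketch of my own rather than any real API.

class JobSpec {
  String inputDirectory;   // where the input files live on the shared file system
  String outputDirectory;  // where the final output files should be written
  String outputFilePrefix; // naming convention for the output files
  int numReduceTasks;      // how many reduce tasks (and thus output files) to run
}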

On a cluster of 100 servers running the software, there will be one special server called the "Master" and 99 servers called workers. It is the job of the Master to distribute tasks to the workers. A worker asks for a task, executes it if it receives one, and informs the master upon completion. The Master is responsible for maintaining the state of which map tasks are to be completed, usually starting with one for each input file. The master also maintains the state for which reduce tasks need to be finished and which are running. Outside of distributing tasks, the master also keeps track of worker failures and checks whether a worker has completed the task it was given. Workers can execute maps or reduces; as a matter of fact, the same worker may execute a map task and follow it up with a reduce task if the master asks it to do so. The master makes sure all map tasks finish before workers can start consuming reduce tasks.

Figure 1 in the paper really helps in clarifying the execution cycle.
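A minimal sketch of the bookkeeping the master keeps per task, as I understand it from the paper (the names and structure here are my own):

enum TaskType { MAP, REDUCE }
enum TaskState { IDLE, IN_PROGRESS, COMPLETED }

class Task {
  TaskType type;
  TaskState state;
  String inputFile;      // an input file for a map task, an intermediate file for a reduce task
  String assignedWorker; // the worker currently executing this task, if any
}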


In my implementation, the master is implemented as a web server. Upon initialization it loads the files from a defined directory and creates the corresponding map tasks and reduce tasks that have to be executed. This is handled by services and repositories that persist state between requests; for now the state will be persisted in memory. The server then exposes 3 endpoints:

  • GET /status: retrieves the status of the job that has been kicked off.
  • GET /task: called by workers, this endpoint finds an unfinished task, if any, and sends it to the worker to execute (a sketch of this hand-off follows the list).
  • POST /task: called by workers, this endpoint accepts a request body that tells the master which task was completed.
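Here is roughly how I picture the GET /task hand-off on the master's side, reusing the Task sketch from above. TaskRepository is a hypothetical name for whatever stores the task state; HTTP routing and serialization are omitted.

interface TaskRepository {
  Task findIdleMapTask();
  Task findIdleReduceTask();
  boolean allMapTasksCompleted();
}

class TaskHandler {
  TaskRepository repository;

  // Hands out an idle task to a worker. Map tasks go first; reduce tasks
  // are only handed out once every map task has completed.
  Task getTask(String workerId) {
    Task task = repository.findIdleMapTask();
    if (task == null && repository.allMapTasksCompleted()) {
      task = repository.findIdleReduceTask();
    }
    if (task != null) {
      task.state = TaskState.IN_PROGRESS;
      task.assignedWorker = workerId;
    }
    return task; // null means no work is available right now
  }
}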

The workers perform the following infinite loop:

  • Upon initialization they load in a user defined map function and a user defined reduce function.
while true:
  request task from master
  if no task returned:
    sleep 5 seconds
  else:
    if task is map task:
      run map task and write output to intermediate files (reads file name, and content, and passes to user defined func)
    else:
      run reduce task and write output to output files (reads intermediate file content, and runs user defined function)
    inform master of task completion
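In Java I expect the worker loop to look roughly like the sketch below. MasterClient is a hypothetical stand-in for the HTTP calls to the master's two task endpoints, and the actual execution of the user defined functions is elided.

interface MasterClient {
  Task requestTask();               // GET /task
  void reportCompletion(Task task); // POST /task
}

class Worker {
  MasterClient master;

  // Poll the master for work, execute it, report back, and back off
  // briefly whenever no work is available.
  void run() throws InterruptedException {
    while (true) {
      Task task = master.requestTask();
      if (task == null) {
        Thread.sleep(5000); // no task returned, sleep and retry
        continue;
      }
      if (task.type == TaskType.MAP) {
        runMapTask(task);    // run the user defined map, write intermediate files
      } else {
        runReduceTask(task); // run the user defined reduce, write output files
      }
      master.reportCompletion(task);
    }
  }

  void runMapTask(Task task) { /* elided */ }
  void runReduceTask(Task task) { /* elided */ }
}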


Where Next?

I have finished implementing the master web server and will be starting on the workers this week. The code is written in Java and will be hosted publicly. My next article in this series will be more of a code walkthrough! Stay tuned :)

  1. Map Reduce - Implementing Master/Coordinator (Part 2/3)
  2. Map Reduce - Implementing Worker Nodes (Part 3/3)



Comments

Commenter: Anon

For the love of god fix the formatting 😂

Commenter: Aa

are the map nodes and reduce nodes running different software?

Commenter: Hk

Reply to Aa: No, the master and workers run somewhat different programs, but the workers all run the same program. Each worker can run both maps and reduces.
