Design

gRPC framework for master-worker communication is set up by adapting the example https://github.com/grpc/grpc/tree/master/examples/cpp/helloworld
Master communicates with workers asynchronously. Dedicated threads are created to receive responses from each worker
A job id is assigned to each mapper/reducer task. The ID is used to create unique file names as well as to identify slow/dead workers
Number of reducer workers is communicated with mapper workers. This information helps mapper to name intermediate files and to assign a key to a reducer
Number of intermediate files created by a mapper worker is equal to the number of reducer workers. i.e. a mapper worker would have an intermediate file dedicated to each reducer worker
Intermediate file name created by mapper worker comprises of both mapper job id and index of reducer for which the intermediate file is targeted for
RPC message sent to reducer would comprise of completed mapper job IDs. This allows reducer worker to construct intermediate file names
Each key value is assigned to a reducer. A key is associated with a reducer by the relation reducer_index = (sum of ASCII values of character values in the key % number of reducer workers). Mapper worker identifies the appropriate intermediate file for a 'key, value' pair using this relation

Handling slow worker and worker failure

Master maintains a database with job details and tracks the workers assigned to finish each job. The database helps the master to identify free workers to assign the next job to. As per design all the mapper jobs are scheduled first. Then the master waits until all the mapper jobs are completed. Subsequently all the reducer jobs are scheduled and the master waits for their completion. Since map and reduce operations follow job issue and wait sequence, master uses the same approach (given below) to identify slow/failed workers during both phases

Assign all the jobs to next available thread one after the other
Once all the jobs are issued, check if all jobs are completed, in intervals of 100 milliseconds. Continue checking for a maximum duration of WAIT_PERIOD. This is the checkpoint to start re-assigning jobs.
After WAIT_PERIOD, assign each incomplete job to the next available worker using a new job ID. When all the incomplete jobs are re-assigned, check for their completion in intervals of 100 milliseconds for a maximum duration of WAIT_PERIOD
Repeat the above step until all the jobs are completed
WAIT_PERIOD is a configuration parameter, which is currently configured as 5 sec to prevent very long jobs to be mistaken as slow job
Any mapper job completed before the checkpoint by a dead or slow worker is not re-assigned as the intermediate files corresponding to them are available in the file system. Any intermediate files created by a slow mapper after the checkpoint are ignored by the master. Any output file created by a slow reducer after the checkpoint is to be discarded. Master has the information of correct output files and can be shared with any other application as needed

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src		src
README.md		README.md
metadata.yml		metadata.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Design

Handling slow worker and worker failure

About

Releases

Packages

Languages

monvin/Map-Reduce-CS-6210-

Folders and files

Latest commit

History

Repository files navigation

Design

Handling slow worker and worker failure

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages