Big Data and Hadoop: MapReduce Job Execution

Job Submission > Job Initialization > Task Assignment > Task Execution

Job Submission
The JobSubmitter is started and performs these steps:
1. It checks whether the output directory already exists; the job is rejected if it does.
2. It checks whether the input files exist.
3. It obtains a new job ID from the JobTracker.
4. It creates a directory named after the job ID; this directory is created on the JobTracker machine.
5. It uploads the job-related JAR and XML files to this directory.
6. It creates multiple copies of job.jar (the job JAR's replication factor defaults to 10) and places them in different locations in HDFS, along with the shared directory.
7. It computes the number of input splits from the block information returned by the NameNode.
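The split computation in step 7 can be illustrated with a rough sketch. It assumes the split size equals the HDFS block size and that input files are non-empty; the class name, method names, and the 64 MB block size are hypothetical and are not Hadoop's own code. Real input formats apply further rules.

```java
import java.util.List;

// Hypothetical sketch, not Hadoop's implementation.
// Assumption: one split per block, so splits = ceil(fileLength / blockSize).
final class SplitCountSketch {

    // Number of splits for a single file of the given length in bytes.
    static long splitsForFile(long fileLength, long blockSize) {
        if (fileLength <= 0) {
            return 0; // simplification: empty files contribute no splits here
        }
        return (fileLength + blockSize - 1) / blockSize; // ceiling division
    }

    // Total number of splits (and therefore map tasks) for a set of input files.
    static long totalSplits(List<Long> fileLengths, long blockSize) {
        long total = 0;
        for (long length : fileLengths) {
            total += splitsForFile(length, blockSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        List<Long> files = List.of(200L * 1024 * 1024, 10L * 1024 * 1024);
        System.out.println("splits = " + totalSplits(files, blockSize));
    }
}
```

With these assumed sizes, the 200 MB file spans four blocks and the 10 MB file spans one, so the job would get five map tasks.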

Job Initialization
1. A job is nothing but a set of map tasks and reduce tasks.
2. The JobTracker (JT) creates four kinds of tasks: map tasks, reduce tasks, a setup task, and a cleanup task.
3. Number of map tasks = number of input splits.
4. Number of reduce tasks = configurable, for example setNumReduceTasks(3).
5. The jobs are placed in the job queue.
6. These jobs are consumed by a scheduler: the FIFO Scheduler, the Fair Scheduler, or the Capacity Scheduler.
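The initialization steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the classes JobQueueSketch and Job are hypothetical, the queue models FIFO ordering only (not the Fair or Capacity schedulers), and the task count assumes exactly one setup task and one cleanup task per job.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of job initialization and a FIFO job queue.
final class JobQueueSketch {

    static final class Job {
        final String id;
        final int mapTasks;    // one map task per input split
        final int reduceTasks; // configurable by the user

        Job(String id, int numSplits, int numReduceTasks) {
            this.id = id;
            this.mapTasks = numSplits;
            this.reduceTasks = numReduceTasks;
        }

        // Map tasks + reduce tasks + one setup task + one cleanup task (assumed).
        int totalTasks() {
            return mapTasks + reduceTasks + 2;
        }
    }

    private final Queue<Job> queue = new ArrayDeque<>();

    // Initialized jobs are placed in the job queue.
    void submit(Job job) {
        queue.add(job);
    }

    // A FIFO scheduler takes the oldest job first; returns null if the queue is empty.
    Job next() {
        return queue.poll();
    }

    public static void main(String[] args) {
        JobQueueSketch jobs = new JobQueueSketch();
        jobs.submit(new Job("job_1", 5, 3));
        jobs.submit(new Job("job_2", 2, 1));
        Job first = jobs.next();
        System.out.println(first.id + " has " + first.totalTasks() + " tasks");
    }
}
```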

Task Assignment
1. The scheduler picks a job from the queue and hands it to the JobTracker.
2. The JT now has to know which TaskTrackers (TTs) are available.
3. It is the TT's responsibility to tell the JT that it is available, which it does through its heartbeat.
4. In the heartbeat the TT sends one more piece of information: whether it is ready to take a new task or not.
5. Based on this information, the JT assigns a new task to the TT.
6. The TT learns that a new task has been assigned from the return value of the heartbeat.
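The heartbeat exchange described above can be sketched roughly as follows. The class, field, and method names are hypothetical and do not reproduce Hadoop's actual RPC interface; the sketch only shows the idea that the heartbeat carries a readiness flag and that the task assignment travels back in the heartbeat's return value.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of heartbeat-based task assignment.
final class HeartbeatSketch {

    // What a TaskTracker reports in each heartbeat (assumed fields).
    static final class Heartbeat {
        final String trackerName;
        final boolean readyForTask;

        Heartbeat(String trackerName, boolean readyForTask) {
            this.trackerName = trackerName;
            this.readyForTask = readyForTask;
        }
    }

    static final class JobTracker {
        private final Queue<String> pendingTasks = new ArrayDeque<>();

        void addTask(String taskId) {
            pendingTasks.add(taskId);
        }

        // Returns the ID of a newly assigned task, or null when the tracker
        // is busy or no task is pending. The tracker learns about its
        // assignment only from this return value.
        String heartbeat(Heartbeat hb) {
            if (!hb.readyForTask) {
                return null;
            }
            return pendingTasks.poll();
        }
    }

    public static void main(String[] args) {
        JobTracker jt = new JobTracker();
        jt.addTask("map_0");
        System.out.println(jt.heartbeat(new Heartbeat("tt1", false))); // busy tracker
        System.out.println(jt.heartbeat(new Heartbeat("tt2", true)));  // ready tracker
    }
}
```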

Task Execution
7. Setup task: it creates a temporary directory local to the TT and pulls the JAR files from the shared directory of the JobTracker.
8. The setup task creates the OutputCommitter.
9. Through the OutputCommitter, the map task knows where its output should go once it has finished executing.
10. The reduce task likewise uses the OutputCommitter to determine where its output should go.
11. The map task starts executing and produces output, which is consumed by the OutputCommitter.
12. The reduce task starts executing and produces output, which is consumed by the OutputCommitter.
13. The cleanup task executes and removes the temporary directory; any local data is also removed.

Task Execution:

Map task / reduce task: the TT unpacks the JAR and loads the files into memory, then starts executing the map task or reduce task. The task can also read local files copied from the JT's shared directory.

The JT gathers progress information from the map and reduce tasks (the TTs relay it in their heartbeats), and the client obtains the job's progress by polling the JT.
The JT also copies this progress information to the shared directory.

Friday, January 29, 2016