ABSTRACT
Hadoop is a large-scale distributed processing infrastructure designed to handle data-intensive applications. In a
commercial large-scale cluster framework, a scheduler distributes user jobs evenly among the cluster resources. Hadoop's
fair scheduler queues jobs for execution in a fine-grained manner using task-level scheduling. The proposed work enhances
this scheduler by additionally allowing backfilling of the jobs submitted to it, so that both job-level and task-level
scheduling are enabled. Jobs are scheduled fairly across users, pools and priorities. As a result, short, narrow jobs are
executed in the available slots when sufficient resources are not available for larger jobs. Shorter jobs therefore
complete faster than under the existing fair scheduling policy, which schedules tasks based on the fairness of their
remaining execution time. This approach also prevents starvation of smaller jobs when sufficient resources are available.
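The backfilling idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the job model (a `width` in slots that must be free at once, plus an estimated runtime), the fairness ordering (least-served pool first, then higher priority, then submission order), and the names `Job`, `fair_order` and `schedule` are hypothetical assumptions made for illustration. It uses simple greedy backfilling with no reservation for the waiting wide job, so, unlike reservation-based schemes such as EASY backfilling, it does not guarantee that the large job is never delayed.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job model (assumption, not from the paper):
    # `width` = slots the job needs at once, `est_runtime` = estimated runtime
    # used only to prefer shorter jobs when backfilling.
    name: str
    pool: str
    priority: int  # higher value = more important
    width: int
    est_runtime: float
    submit_order: int


def fair_order(waiting, running_slots_per_pool):
    """Order waiting jobs: least-served pool first, then higher priority, then FIFO."""
    return sorted(
        waiting,
        key=lambda j: (running_slots_per_pool.get(j.pool, 0), -j.priority, j.submit_order),
    )


def schedule(queue, free_slots, running_slots_per_pool):
    """Return the jobs to launch now, backfilling narrow jobs when the head job does not fit."""
    launched = []
    running = dict(running_slots_per_pool)  # copy so the caller's dict is not mutated
    waiting = list(queue)
    while waiting and free_slots > 0:
        ordered = fair_order(waiting, running)
        head = ordered[0]
        if head.width <= free_slots:
            chosen = head  # the fairest job fits: launch it normally
        else:
            # Head job is too wide for the free slots: backfill with the
            # shortest remaining job that fits instead of leaving slots idle.
            fits = [j for j in ordered[1:] if j.width <= free_slots]
            if not fits:
                break
            chosen = min(fits, key=lambda j: (j.est_runtime, j.submit_order))
        launched.append(chosen)
        waiting.remove(chosen)
        free_slots -= chosen.width
        running[chosen.pool] = running.get(chosen.pool, 0) + chosen.width
    return launched


# Usage: an 8-slot job is at the head of the queue but only 4 slots are free.
queue = [
    Job("big", "pool_a", 1, width=8, est_runtime=600, submit_order=0),
    Job("small1", "pool_b", 1, width=2, est_runtime=30, submit_order=1),
    Job("small2", "pool_b", 1, width=1, est_runtime=10, submit_order=2),
]
print([j.name for j in schedule(queue, free_slots=4, running_slots_per_pool={})])
# -> ['small2', 'small1']
```

With 4 free slots the wide job cannot start, so the two narrow jobs are backfilled (shortest first) rather than leaving the slots idle; with 10 free slots the wide job would be launched first.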
Keywords: hadoop, scheduling, fair share scheduler, backfilling