International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 6, Issue 3, March (2015), pp. 12-23 © IAEME
WORKLOAD ANALYSIS, SECURITY ASPECTS AND
OPTIMIZATION OF WORKLOAD IN HADOOP
CLUSTERS
Atul U. Patil 1, T.I. Bagban 2, B.S. Patil 3, R.U. Patil 4, S.A. Gondil 5
1 M.E. CSE, ADCET / Shivaji University, Kolhapur, India
2 Assoc. Prof., DKTE, Ichalkaranji / Shivaji University, Kolhapur, India
3 Assoc. Prof., PVPIT, Budhgaon / Shivaji University, Kolhapur, India
4 Asst. Prof., BVCOE, Kolhapur / Shivaji University, Kolhapur, India
5 Asst. Prof., Bharati Vidyapeeth, Palus, Pune, India
ABSTRACT
This paper proposes a cloud system that combines on-demand allocation of resources
with improved utilization, opportunistically provisioning cycles from idle cloud nodes to other
processes. Providing all of the demanded services to cloud customers is extremely difficult, and
satisfying cloud consumers' needs is a significant issue. Hence, an on-demand cloud infrastructure
using a Hadoop configuration with improved CPU utilization and improved storage utilization is
proposed, based on the Fair4S job scheduling algorithm. All cloud nodes that would otherwise
remain idle are put to use; security challenges are addressed; load balancing is achieved; and large
data sets are processed quickly, whether the jobs are large or small. We compare the GFS read/write
algorithm and the Fair4S job scheduling algorithm for file uploading and downloading, and enhance
CPU and storage utilization. Cloud computing moves application software and databases to large
data centres, where the management of the data and services may not be fully trustworthy. This
security problem is addressed by encrypting the data with an encryption/decryption algorithm, while
the Fair4S job scheduling algorithm solves the problem of utilizing idle cloud nodes for larger data sets.
Keywords: CPU Utilization, Encryption/Decryption Algorithm, Fair4S Job Scheduling Algorithm,
GFS, Storage Utilization.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING &
TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 6, Issue 3, March (2015), pp. 12-23
© IAEME: www.iaeme.com/IJCET.asp
Journal Impact Factor (2015): 8.9958 (Calculated by GISI)
www.jifactor.com
IJCET
© I A E M E
are helpful for Hadoop operators in identifying system bottlenecks and working out
solutions for optimizing performance. Many previous efforts have been made in various areas,
including network systems [06], and a cloud infrastructure that combines on-demand allocation of
resources with opportunistic provisioning of cycles from idle cloud nodes to other processes by
deploying backfill virtual machines (VMs) [21]. A model for securing Map/Reduce computation
in the cloud uses a language-based security approach to enforce information-flow policies that vary
dynamically owing to a restricted, revocable delegation of access rights between principals; the
decentralized label model (DLM) is employed to express these policies [18]. A new security
architecture, Split Clouds, protects the data held in a cloud while letting each organization keep
direct security controls over its data, rather than surrendering them to cloud providers. The main
components of the model are real-time data summaries, an in-line security gateway and a third-party
auditor. By combining the three solutions, the architecture can prevent malicious activities
performed even by security administrators inside the cloud providers [20]. Several studies [19],
[20], [21] have been conducted on workload analysis in grid environments and parallel computer
systems. They proposed various methods for analysing and modelling workload traces. However,
job characteristics and scheduling policies in grids differ considerably from those in a Hadoop system.
III. THE PROPOSED SYSTEM
Fig.1 System Architecture
Cloud computing has become a viable, mainstream solution for data processing, storage and
distribution, but moving large amounts of data into and out of the cloud has presented a serious
challenge [4]. Cloud computing is a very successful paradigm of service-oriented computing and has
revolutionized the way computing infrastructure is abstracted and used. The three most popular
cloud paradigms are:
1. Infrastructure as a Service (IaaS)
2. Platform as a Service (PaaS)
3. Software as a Service (SaaS)
The concept can also be extended to Database as a Service or Storage as a Service. Scalable
database management systems (DBMSs), both for update-intensive application workloads and for
decision-support systems, are an important part of the cloud infrastructure. Initial designs include
distributed databases for update-intensive workloads and parallel database systems for analytical
workloads. Changes in the data-access patterns of applications, and the need to scale out to
thousands of commodity machines, led to the birth of a new class of systems called key-value
stores [11]. In the domain of data analysis, we adopt the MapReduce paradigm and its open-source
implementation Hadoop, chosen for its usability and performance.
The system has six modules:
1. Hadoop Configuration (Cloud Server Setup)
2. Login and Registration
3. Cloud Service Provider (CSP)
4. Fair4S Job Scheduling Algorithm
5. Encryption/Decryption Module
6. Administration of Files (Third-Party Auditor)
3.1 Hadoop Configuration (Cloud Server Setup)
Apache Hadoop is a framework that allows for the distributed processing of large data
sets across clusters of computers using simple programming models. It is designed to
scale up from single servers to many thousands of nodes, providing massive computation and
storage capacity. Rather than relying on the underlying hardware to deliver high availability, the
framework itself is designed to detect and handle failures at the application layer, delivering a highly
available service on top of a cluster of nodes, each of which may be prone to failure [6].
Hadoop implements MapReduce on top of HDFS. The Hadoop Distributed File System gives users
a single available namespace, spread across many hundreds or thousands of servers, creating
one large file system. Hadoop has been demonstrated on clusters with more than two thousand
nodes; the current design target is ten-thousand-node clusters.
Hadoop was inspired by MapReduce, a framework in which an application is decomposed
into numerous small parts. Any of these parts (also called fragments or blocks) can be
run on any node in the cluster. The current Hadoop system consists of the Hadoop architecture,
MapReduce, and the Hadoop Distributed File System (HDFS).
Fig.2 Architecture of Hadoop
The JobTracker is the daemon service for submitting and tracking MapReduce jobs in
Hadoop. There is only one JobTracker process running on any Hadoop cluster, in its own
JVM process; in a typical production cluster it runs on a separate machine. Every slave
node is configured with the JobTracker node location. The JobTracker is a single point of failure for
the Hadoop MapReduce service: if it goes down, all running jobs are halted. The JobTracker
performs scheduling and assigns submitted jobs to the TaskTrackers [9].
A TaskTracker is a slave-node daemon in the cluster that accepts tasks (Map, Reduce and
Shuffle operations) from a JobTracker. There is only one TaskTracker process running on any
Hadoop slave node, in its own JVM process. Each TaskTracker is configured with a set of
slots, which indicate the number of tasks it can accept. The TaskTracker starts a separate JVM
process to do the actual work (called a Task Instance); this ensures that a process failure does not
take down the TaskTracker [10].
The NameNode stores the entire file-system namespace. Information such as last-modified time,
creation time, file size, owner and permissions is stored in the NameNode [10]. The current Apache
Hadoop ecosystem consists of the Hadoop kernel, MapReduce, and the Hadoop distributed file
system (HDFS).
The Hadoop Distributed File System (HDFS)
HDFS is a fault-tolerant, self-healing distributed file system designed to turn a cluster of
industry-standard servers into a massively scalable pool of storage. Developed specifically
for large-scale processing workloads where reliability, flexibility and throughput are critical,
HDFS accepts data in any format regardless of schema, optimizes for high-bandwidth
streaming, and scales to proven deployments of 100 PB and beyond [8].
3.2 Login and Registration
This module provides the login interface. A client can upload files to and download files from
the cloud and obtain a detailed summary of his or her account. Security is provided to the client
through a user name and password, stored in a database on the main server. For every upload and
download, a log record captures the activity, which can be used for later audit trails. This facility
ensures sufficient security for the client, and data stored on the cloud servers can be changed only
by that client.
3.3 Cloud Service Provider (Administrator)
This module administers users and data. The cloud service provider has the authority to add
and remove clients and ensures adequate security for client data stored on the cloud servers.
Only registered and authorized clients, whose activities are captured in log records, can access the
services; these per-client log records help improve security.
3.4 Job Scheduling Algorithm
MapReduce is a distributed processing model and implementation for processing and
generating large datasets that is amenable to a broad variety of real-world tasks. Clients specify the
workload computation in terms of a map and a reduce function: the user-defined map function
processes a key/value pair to generate a set of intermediate key/value pairs, and the reduce function
merges all intermediate values associated with the same intermediate key. Programs written in this
functional style are automatically parallelized and executed on a large cluster of commodity
machines. The run-time system takes care of the details of partitioning the input data, scheduling the
program's execution across a set of machines, handling machine failures, and managing the required
inter-machine communication. This allows programmers without any experience with parallel
and distributed systems to easily utilize the resources of a large distributed system [7].
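The map/reduce contract described above can be sketched as a toy, single-process Python simulation of the map, shuffle and reduce phases (illustrative only; Hadoop's real API is Java and runs distributed):

```python
from collections import defaultdict

def map_fn(_, line):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in line.split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce: merge all intermediate values for one intermediate key."""
    yield key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # Shuffle phase: group intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    # Reduce phase: one reduce call per distinct intermediate key.
    return dict(kv for k, vs in sorted(groups.items())
                   for kv in reduce_fn(k, vs))

counts = run_mapreduce(enumerate(["big data", "big cluster"]), map_fn, reduce_fn)
# counts == {"big": 2, "cluster": 1, "data": 1}
```

The run-time framework, not the user, owns the shuffle step; the user supplies only the two functions.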
Our implementation of the Fair4S job scheduling algorithm runs on a large cluster of
commodity machines and is highly scalable. MapReduce itself was popularized by the open-source
Hadoop project. Our Fair4S job scheduling algorithm processes large files by dividing them
into a number of chunks and assigning the tasks to the cluster nodes in a Hadoop multi-node
configuration. In this way the proposed Fair4S job scheduling algorithm improves the
utilization of the cluster nodes with respect to parameters such as time, CPU, and storage.
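The chunk-splitting and task-assignment step just described can be illustrated with a short Python sketch (a single-process simplification; the chunk size, node addresses and function names are our own assumptions, not the paper's implementation):

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Split a file's bytes into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def assign_chunks(chunks, nodes):
    """Assign chunks to the live cluster nodes round-robin."""
    plan = {node: [] for node in nodes}
    for i, chunk in enumerate(chunks):
        plan[nodes[i % len(nodes)]].append(chunk)
    return plan

chunks = split_into_chunks(b"x" * 10, chunk_size=4)   # 3 chunks: 4 + 4 + 2 bytes
plan = assign_chunks(chunks, ["10.0.0.1", "10.0.0.2"])
# node 10.0.0.1 receives chunks 0 and 2; node 10.0.0.2 receives chunk 1
```

In the real system the chunk size would follow from the file size and the number of live nodes, as the text states.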
3.4.1 Features of Fair4S
The extended functionalities of the Fair4S scheduling algorithm, listed below, make it more
workload-efficient than the GFS read/write algorithm; these functionalities allow the algorithm to
deliver efficient performance when processing heavy workloads from different clients.
1. Setting slot quotas for pools. All jobs are divided into several pools, and every job belongs to
exactly one pool. In Fair4S, every pool is configured with a maximum slot occupancy. All jobs
belonging to the same pool share that slot quota, and the number of slots used by these jobs at any
time is limited to the maximum slot occupancy of their pool. The upper limit on the slot occupancy
of user groups makes slot assignment more flexible and adjustable, and ensures slot-occupancy
isolation across different user groups. Even if some slots are occupied by large jobs, the influence is
confined to the local pool.
2. Setting slot quotas for individual users. In Fair4S, every user is configured with a maximum slot
occupancy. Given a user, no matter how many jobs he or she submits, the total number of occupied
slots will not exceed the quota. This per-user constraint prevents a user from submitting too many
jobs whose tasks would occupy too many slots.
3. Assigning slots based on pool weight. In Fair4S, every pool is configured with a weight. All
pools waiting for more slots form a queue of pools. Given a pool, the number of times it occurs in
the queue is linear in the weight of the pool. Therefore, a pool with a high weight is allocated more
slots. Because the pool weight is configurable, the weight-based slot assignment policy decreases
small jobs' waiting time (for slots) effectively.
4. Extending job priorities. Fair4S introduces a detailed, quantified priority for every job, described
by an integer ranging from zero to one thousand. Generally, within a pool, a job with a higher
priority can preempt the slots used by another job
with a lower priority. A quantified job priority helps differentiate the priorities of small jobs
in different user groups.
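Features 1 and 2 amount to two admission checks that must both pass before a job receives a slot. A minimal Python sketch of that bookkeeping (pool names and quota values are our own illustration):

```python
class Fair4SQuotas:
    """Toy slot-quota bookkeeping for pools and users (features 1 and 2)."""
    def __init__(self, pool_quota, user_quota):
        self.pool_quota = pool_quota      # pool -> maximum slot occupancy
        self.user_quota = user_quota      # user -> maximum slot occupancy
        self.pool_used = {p: 0 for p in pool_quota}
        self.user_used = {u: 0 for u in user_quota}

    def try_acquire(self, pool, user):
        """Grant one slot only if neither the pool nor the user quota is exceeded."""
        if self.pool_used[pool] >= self.pool_quota[pool]:
            return False                  # pool at its maximum slot occupancy
        if self.user_used[user] >= self.user_quota[user]:
            return False                  # user at his/her maximum slot occupancy
        self.pool_used[pool] += 1
        self.user_used[user] += 1
        return True

q = Fair4SQuotas(pool_quota={"analytics": 2}, user_quota={"alice": 1})
grants = [q.try_acquire("analytics", "alice") for _ in range(3)]
# grants == [True, False, False]: the per-user quota caps alice at one slot
```

Note how the per-user check fires before the pool quota is exhausted, which is exactly the isolation feature 2 describes.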
3.4.2 Fair4s Job Scheduling Algorithm
Fair4S is a job scheduling algorithm modelled to be biased towards small jobs. In many
workloads, small jobs account for the majority of the load, and many of them require instant
responses, which is an important factor in production Hadoop systems. The inefficiency of the
Hadoop fair scheduler and the GFS read/write algorithm in handling small jobs motivates us to use
and analyse Fair4S, which introduces pool weights and extends job priorities to guarantee rapid
responses for small jobs [1]. In this scenario, a client uploads or downloads a file via the main
server, where the Fair4S job scheduling algorithm executes. On the main server, the mapper
function provides the list of available cluster IP addresses to which tasks are assigned, so that the
file-splitting tasks are assigned to every live cluster node. The Fair4S job scheduling algorithm
splits each file according to its size and the available cluster nodes.
3.4.3 Procedure of Slots Allocation
1. The first step is to allocate slots to job pools. Every job pool is configured with two parameters:
a maximum slot quota and a pool weight. In no case will the number of slots allocated to a job pool
exceed its maximum slot quota. If the slot demand of a job pool varies, the maximum slot quota is
adjusted manually by Hadoop operators. When a job pool requests additional slots, the scheduler
first checks whether the slot occupancy of the pool would exceed the quota. If not, the pool is
appended to the queue to wait for slot allocation. The scheduler allocates slots by a round-robin
algorithm; probabilistically, a pool with a high allocation weight is more likely to be allocated slots.
2. The second step is to allocate slots to individual jobs. Every job is configured with a job-priority
parameter, a value between zero and one thousand. The job priority and its deficit are combined
into a weight for the job. Within a job pool, idle slots are allocated to the jobs with
the highest weight.
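Step 1's weighted queue and round-robin pass can be sketched in Python as follows (pool names, weights and the deterministic queue order are our own simplifying assumptions; the real scheduler's choice is probabilistic):

```python
from itertools import cycle

def build_weighted_queue(pool_weights):
    """Each pool occurs in the queue a number of times linear in its weight."""
    queue = []
    for pool, weight in sorted(pool_weights.items()):
        queue.extend([pool] * weight)
    return queue

def allocate_slots(pool_weights, free_slots):
    """Hand out free slots round-robin over the weighted queue."""
    allocation = {pool: 0 for pool in pool_weights}
    rr = cycle(build_weighted_queue(pool_weights))
    for _ in range(free_slots):
        allocation[next(rr)] += 1
    return allocation

alloc = allocate_slots({"small-jobs": 3, "batch": 1}, free_slots=8)
# alloc == {"small-jobs": 6, "batch": 2}: the weight-3 pool gets 3x the slots
```

Because a high-weight pool appears more often in the queue, a plain round-robin pass already yields the weight-proportional allocation the text describes.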
3.5 Encryption/decryption
In this module, files are encrypted and decrypted using the RSA algorithm, which uses a
public-key/private-key pair for the encryption and decryption of data. The client uploads the file
together with a secret/public key, from which the private key is generated and the file is encrypted.
In the reverse process, the file is decrypted using the public/private key pair and downloaded. For
example, the client uploads the file with the public key, and the file name is used to generate the
unique private key used for encrypting the file. In this way the uploaded file is encrypted and stored
on the main servers, and the file is then split using the Fair4S scheduling algorithm, which provides
a distinctive security feature for cloud data. In the reverse process of downloading data from the
cloud servers, the file name and public key are used to regenerate the secret key and combine all the
parts of the file, after which the data are decrypted and downloaded, ensuring a high degree of
security for cloud data.
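The RSA key generation, encryption and decryption used here can be sketched as textbook RSA in Python (deliberately tiny primes for illustration only; a real deployment would use a vetted cryptographic library, 2048-bit or larger keys, and padding such as OAEP):

```python
# Textbook RSA with tiny primes; never use key sizes like this in production.
p, q = 61, 53
n = p * q                      # public modulus, 3233
phi = (p - 1) * (q - 1)        # Euler's totient, 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse of e mod phi

def encrypt(m, e, n):
    """Encrypt an integer message m < n with the public key (e, n)."""
    return pow(m, e, n)

def decrypt(c, d, n):
    """Decrypt a ciphertext c with the private key (d, n)."""
    return pow(c, d, n)

cipher = encrypt(65, e, n)
plain = decrypt(cipher, d, n)
# plain == 65: decryption recovers the original message
```

The three-argument `pow` performs modular exponentiation efficiently, and `pow(e, -1, phi)` (Python 3.8+) computes the modular inverse used for key generation.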
Fig.3 RSA encryption/decryption
3.6 Administration of Client Files (Third-Party Auditor)
This module provides a facility for auditing all client files across the various activities
performed by clients. Log records are created and stored on the main server: for every registered
client, a log record captures the operations (upload/download) performed by that client, together
with the time and date of each activity. These log records serve both the security of client data and
auditing purposes. A log-record facility is also provided for the administrator, recording the log
information of all registered clients, so that the administrator has control over all the data stored on
the cloud servers. The administrator can view per-client log records, which helps detect fraudulent
access if a fake user attempts to access the data stored on the cloud servers.
Registered client log records:
Fig.4 List of Log records of clients.
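A per-client log record of the kind kept by the auditor might be sketched as follows (the field names and helper function are our own illustration, not the system's actual schema):

```python
from datetime import datetime, timezone

def make_log_record(client, operation, filename):
    """One append-only audit entry: who did what to which file, and when."""
    assert operation in ("upload", "download")
    return {
        "client": client,
        "operation": operation,
        "file": filename,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

audit_trail = []   # append-only: entries are never modified after the fact
audit_trail.append(make_log_record("alice", "upload", "report.pdf"))
audit_trail.append(make_log_record("alice", "download", "report.pdf"))
# the trail now holds two records for client "alice"
```

Keeping the trail append-only is what makes it usable as audit evidence: a fraudulent access leaves a record that the fake user cannot remove.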
IV. RESULTS
The results of the project are best explained through experiments carried out with a number
of clients, one main server, and three to five secondary servers. The results are based on three
parameters:
1) Time
2) CPU utilization
3) Storage utilization
Our evaluation examines the improved utilization of the cluster nodes (the secondary servers) when
uploading and downloading files using the Fair4S scheduling algorithm versus the GFS read/write
algorithm, from three perspectives: improved time utilization, improved CPU utilization, and a
substantially improved storage utilization.
4.1 Results for time utilization
Fig.5 Time Utilization Graph For Uploading Files
Fig. 5 shows the time taken by the GFS and Fair4S algorithms to upload files. These are:

Uploading file size (KB)   Time (in millisec) for GFS   Time (in millisec) for Fair4S
1742936                    1720                         107
4734113                     928                         170
6938669                    1473                         117
11527296                   1857                         704
3057917                     253                          38
17385800                   1859                         839
Fig.06 Time Utilization Graph for Downloading Files
Fig. 06 shows the time taken by GFS and Fair4S to download files. These are:

Number of files   Time (in millisec) for GFS   Time (in millisec) for Fair4S
5                  840                          795
7                 1937                         1852
9                 4814                         3698
11                5143                         4111
4.2 Results for CPU utilization
Fig.07 CPU Utilization Graph for GFS Files
Fig. 07 shows the CPU utilization for GFS files on a number of cluster nodes.
Fig.08 CPU Utilization Graph for the Fair4S Algorithm on a number of cluster nodes in Hadoop
V. CONCLUSION
We have proposed an improved cloud architecture that combines on-demand scheduling of
infrastructure resources with optimized utilization, opportunistically provisioning cycles from idle
nodes to other processes. A cloud infrastructure using a Hadoop configuration with improved
processor utilization and storage-space utilization is proposed based on the Fair4S job scheduling
algorithm. All nodes that would otherwise remain idle are utilized; security problems are largely
mitigated; load balancing is achieved; and large data sets are processed in less time. We compared
the GFS read/write algorithm and the Fair4S scheduling algorithm for file uploading and
downloading, and optimized processor utilization and storage-space use. In this paper we also
reviewed some of the techniques implemented to protect data and proposed an architecture to
protect data in the cloud: data are stored in encrypted form using the RSA technique, which relies
on encryption and decryption of the data. Until now, many proposed systems have configured
Hadoop for a cloud infrastructure, yet cloud nodes still remain idle, and no comparable work has
compared CPU utilization and storage utilization for the GFS read/write algorithm versus the
Fair4S scheduling algorithm.
We give a backfill solution using an on-demand user workload on a cloud structure based on
Hadoop, and we demonstrate an increase in processor utilization and time utilization between GFS
and Fair4S. In our work all cloud nodes are fully utilized and none remains idle; files are processed
at a faster rate, so tasks are completed in less time, which is a significant advantage and further
improves utilization. We also implement the RSA algorithm to secure the data, improving security.
VI. REFERENCES
1. Z. Ren and J. Wan, ‘‘Workload Analysis, Implications, and Optimization on a Production
Hadoop Cluster: A Case Study on Taobao,’’ IEEE Transactions on Services
Computing, vol. 7, no. 2, Apr.-June 2014.
2. M. Zaharia, D. Borthakur, J.S. Sarma, S. Shenker, and I. Stoica, ‘‘Job Scheduling for Multi-
User Mapreduce Clusters,’’ (Univ.California, Berkeley, CA, USA, Tech. Rep. No.
UCB/EECS-2009-55, Apr. 2009).
3. Y. Chen, S. Alspaugh, and R.H. Katz, ‘‘Interactive Analytical Processing in Big Data
Systems: A Cross-Industry Study of Mapreduce Workloads,’’ Proc. VLDB Endowment, vol.
5, no. 12, Aug. 2012
4. D. Agrawal et al., ‘‘Big Data and Cloud Computing: Current State and Future
Opportunities,’’ EDBT, pp. 22-24, March 2011.
5. Z. Ren, X. Xu, J. Wan, W. Shi, and M. Zhou, ‘‘Workload Characterization on a Production
Hadoop Cluster: A Case Study on Taobao,’’ in Proc. IEEE IISWC, 2012, pp. 3-13.
6. J. Dean and S. Ghemawat, ‘‘MapReduce: Simplified Data Processing on Large Clusters,’’
Communications of the ACM, vol. 51, no. 1, pp. 107-113, Jan. 2008.
7. Y. Chen, S. Alspaugh, D. Borthakur, and R.H. Katz, ‘‘Energy Efficiency for Large-Scale
Mapreduce Workloads with Significant Interactive Analysis,’’ in Proc. EuroSys, 2012,
pp. 43-56.
8. Stack Overflow (2014). ‘‘Hadoop architecture internals: use of job and task trackers’’
[Online]. Available: http://stackoverflow.com/questions/11263187/hadoop-architecture-internals-use-of-job-and-task-trackers
9. S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, ‘‘An Analysis of Traces from a
Production Mapreduce Cluster,’’ in Proc. CCGRID, 2010, pp. 94-103.
10. J. Dean et al.,“MapReduce: a flexible data processing tool”,In CACM, Jan 2010.
11. M. Stonebraker et al., “MapReduce and parallel DBMSs: friends or foes?” In CACM. Jan
2010.
12. X. Liu, J. Han, Y. Zhong, C. Han, and X. He, ‘‘Implementing WebGIS on Hadoop: A Case
Study of Improving Small File I/O Performance on HDFS,’’ in Proc. CLUSTER, 2009, pp. 1-
8.
13. A. Abouzeid et al., “HadoopDB: An Architectural Hybrid of MapReduce and DBMS
Technologies for Analytical Workloads”, In VLDB 2009.
14. S. Das et al., ‘‘Ricardo: Integrating R and Hadoop,’’ in SIGMOD, 2010.
15. J. Cohen et al., ‘‘MAD Skills: New Analysis Practices for Big Data,’’ in VLDB, 2009.
16. Gaizhen Yang et al., “The Application of SaaS-Based Cloud Computing in the University
Research and Teaching Platform”, ISIE, pp. 210-213, 2011.
17. P. Marshall et al., ‘‘Improving Utilization of Infrastructure Clouds,’’ IEEE/ACM
International Symposium, pp. 205-214, 2011.
18. F. Wang, Q. Xin, B. Hong, S.A. Brandt, E.L. Miller, D.D.E. Long, and T.T. Mclarty, ‘‘File
System Workload Analysis for Large Scale Scientific Computing Applications,’’ in Proc.
MSST, 2004, pp. 139-152.
19. M. Zaharia, D. Borthakur, J.S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, ‘‘Delay
Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster
Scheduling,’’ in Proc. EuroSys, 2010, pp. 265-278.
20. E. Medernach, ‘‘Workload Analysis of a Cluster in a Grid Environment,’’ in Proc. Job
Scheduling Strategies for Parallel Processing, 2005, pp. 36-61.
21. K. Christodoulopoulos, V. Gkamas, and E.A. Varvarigos, ‘‘Statistical Analysis and Modeling
of Jobs in a Grid Environment,’’ J. Grid Computing, vol. 6, no. 1, 2008.
22. Gandhali Upadhye and Asst. Prof. Trupti Dange, “Nephele: Efficient Data Processing Using
Hadoop” International journal of Computer Engineering & Technology (IJCET), Volume 5,
Issue 7, 2014, pp. 11 - 16, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
23. Suhas V. Ambade and Prof. Priya Deshpande, “Hadoop Block Placement Policy For
Different File Formats” International journal of Computer Engineering & Technology
(IJCET), Volume 5, Issue 12, 2014, pp. 249 - 256, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.