
dc.contributor.advisor: Jamali, Nadeem
dc.creator: Sedighi Gilani, Mohammad Hossein
dc.date.accessioned: 2016-05-25T03:11:59Z
dc.date.available: 2016-05-25T03:11:59Z
dc.date.created: 2016-04
dc.date.issued: 2016-05-16
dc.date.submitted: April 2016
dc.identifier.uri: http://hdl.handle.net/10388/ETD-2016-04-2560
dc.description.abstract: The falling cost of cluster computing has significantly increased its use in the last decade. As a result, the number of users, the size of clusters, and the diversity of jobs submitted to clusters have all grown. These changes call for a redesign of cluster resource management systems. The growth in the number of users and in the size of clusters requires a more scalable approach to resource management. Moreover, the ever-increasing use of clusters for a diverse range of computations demands fault-tolerant and highly available cluster management systems. Last but not least, serving highly parallel and interactive jobs in a cluster with hundreds of nodes requires high-throughput scheduling with a very short service time. This research presents MACRM, a multi-agent cluster resource management system. MACRM is an adaptive distributed/centralized resource management system that addresses the requirements of scalability, fault tolerance, high availability, and high-throughput scheduling. It breaks up resource management responsibilities and delegates them to different agents so that it scales along several dimensions. The modularity of MACRM's design also increases fault tolerance, because components are replicable and recoverable. Furthermore, MACRM has a very short service time under different loads: it can maintain an average service time of less than 15 ms by adaptively switching between centralized and distributed decision making based on the cluster's load. Comparing MACRM with representative centralized and distributed systems (YARN [67] and Sparrow [52]) shows several advantages. We show that MACRM scales better when the number of resources, users, or jobs in a cluster increases. In addition, MACRM has faster and less expensive failure recovery mechanisms than the two other systems. Finally, our experiments show that MACRM's average service time is shorter than that of the other systems, particularly under high loads.
dc.language.iso: eng
dc.subject: Cluster computing
dc.subject: Resource management
dc.subject: Multi-agent systems
dc.title: MACRM: A Multi-agent Cluster Resource Management System
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)
dc.type.material: text
dc.type.genre: Thesis
dc.contributor.committeeMember: Roy, Chanchal
dc.contributor.committeeMember: Stanely, Kevin
dc.contributor.committeeMember: Gokaraju, Ramakrishna

