A Design Framework for Efficient Distributed Analytics on Structured Big Data

Date
2021-08-10
Author
Orensa, Noah
ORCID
0000-0002-9049-2074
Type
Thesis
Degree Level
Masters
Abstract
Distributed analytics architectures often comprise two elements: a compute engine and a storage system. Conventional distributed storage systems usually store data as files or key-value pairs. This abstraction simplifies how an application developer accesses and reasons about the data. However, separating the compute and storage systems makes costly disk and network operations difficult to optimize: by design, the storage system is isolated from the workload and its performance requirements, such as block co-location and replication. Furthermore, fine-grained data access requests are difficult to optimize because the storage layer is hidden behind such abstractions.
Using a clean-slate approach, this thesis proposes a modular distributed analytics system design centered on a unified interface for distributed data objects, named the DDO. The interface couples the key mechanisms that use storage, memory, and compute resources. This coupling makes the design well suited to optimizing data access requests across all levels of the memory hierarchy with respect to the workload and its performance requirements. Complementing the DDO, a DDO controller implementation controls the logical view of DDOs, their replication, and their distribution across the cluster. A proof-of-concept implementation improves mean query time by 3-6x on the TPC-H and TPC-DS benchmarks, and by more than an order of magnitude in many cases.
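To make the controller's role more concrete, the following is a minimal, hypothetical Python sketch. The abstract does not specify the DDO interface or the controller's placement policy, so the class names `DDO` and `DDOController`, the methods `register` and `locate`, and the round-robin replica placement are all illustrative assumptions rather than the thesis's design. The sketch shows one way a controller could hold the logical view (a catalog of DDOs), replicate each partition, and distribute the replicas across nodes so that same-numbered partitions of different DDOs are co-located.

```python
from dataclasses import dataclass, field


@dataclass
class DDO:
    """Hypothetical distributed data object: a named, partitioned dataset."""
    name: str
    partitions: int


@dataclass
class DDOController:
    """Hypothetical controller that keeps the logical view of DDOs and
    decides where each partition's replicas live. Round-robin placement
    is an assumption made for illustration only."""
    nodes: list[str]
    replication: int = 2
    catalog: dict[str, DDO] = field(default_factory=dict)
    placement: dict[tuple[str, int], list[str]] = field(default_factory=dict)

    def register(self, ddo: DDO) -> None:
        # Each replica must sit on a distinct node.
        if self.replication > len(self.nodes):
            raise ValueError("replication factor exceeds cluster size")
        self.catalog[ddo.name] = ddo  # logical view: name -> DDO
        for p in range(ddo.partitions):
            # Replicas of partition p go to consecutive nodes starting at
            # p mod N, so placement depends only on the partition index.
            self.placement[(ddo.name, p)] = [
                self.nodes[(p + r) % len(self.nodes)]
                for r in range(self.replication)
            ]

    def locate(self, name: str, partition: int) -> list[str]:
        """Return the nodes holding replicas of one partition of a DDO."""
        return self.placement[(name, partition)]


ctrl = DDOController(nodes=["n0", "n1", "n2"], replication=2)
ctrl.register(DDO("lineitem", 4))
ctrl.register(DDO("orders", 4))
print(ctrl.locate("lineitem", 3), ctrl.locate("orders", 3))
```

Because placement depends only on the partition index, partition 3 of `lineitem` and partition 3 of `orders` land on the same nodes. This is the kind of block co-location that a storage layer isolated from the workload cannot easily provide; a real controller would also need to weigh load, capacity, and node failures.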
Degree
Master of Science (M.Sc.)
Department
Computer Science
Program
Computer Science
Supervisor
Makaroff, Dwight J; Eager, Derek L
Committee
Stakhanova, Natalia; Jamali, Nadeem; Berscheid, Brian M
Copyright Date
July 2021
Subject
Distributed systems
Data warehousing
Reliable and fault-tolerant systems
High performance computing
Software engineering