University of SaskatchewanHARVEST
  • Login
  • Submit Your Work
  • About
    • About HARVEST
    • Guidelines
    • Browse
      • All of HARVEST
      • Communities & Collections
      • By Issue Date
      • Authors
      • Titles
      • Subjects
      • This Collection
      • By Issue Date
      • Authors
      • Titles
      • Subjects
    • My Account
      • Login
      JavaScript is disabled for your browser. Some features of this site may not work without it.
      View Item 
      • HARVEST
      • Electronic Theses and Dissertations
      • Graduate Theses and Dissertations
      • View Item
      • HARVEST
      • Electronic Theses and Dissertations
      • Graduate Theses and Dissertations
      • View Item

      A Design Framework for Efficient Distributed Analytics on Structured Big Data

      Thumbnail
      View/Open
      ORENSA-THESIS-2021.pdf (622.5Kb)
      Date
      2021-08-10
      Author
      Orensa, Noah
      ORCID
      0000-0002-9049-2074
      Type
      Thesis
      Degree Level
      Masters
      Metadata
      Show full item record
      Abstract
      Distributed analytics architectures are often comprised of two elements: a compute engine and a storage system. Conventional distributed storage systems usually store data in the form of files or key-value pairs. This abstraction simplifies how the data is accessed and reasoned about by an application developer. However, the separation of compute and storage systems makes it difficult to optimize costly disk and network operations. By design the storage system is isolated from the workload and its performance requirements such as block co-location and replication. Furthermore, optimizing fine-grained data access requests becomes difficult as the storage layer is hidden away behind such abstractions. Using a clean slate approach, this thesis proposes a modular distributed analytics system design which is centered around a unified interface for distributed data objects named the DDO. The interface couples key mechanisms that utilize storage, memory, and compute resources. This coupling makes it ideal to optimize data access requests across all memory hierarchy levels, with respect to the workload and its performance requirements. In addition to the DDO, a complementary DDO controller implementation controls the logical view of DDOs, their replication, and distribution across the cluster. A proof-of-concept implementation shows improvement in mean query time by 3-6x on the TPC-H and TPC-DS benchmarks, and more than an order of magnitude improvement in many cases.
      Degree
      Master of Science (M.Sc.)
      Department
      Computer Science
      Program
      Computer Science
      Supervisor
      Makaroff, Dwight J; Eager, Derek L
      Committee
      Stakhanova, Natalia; Jamali, Nadeem; Berscheid, Brian M
      Copyright Date
      July 2021
      URI
      https://hdl.handle.net/10388/13511
      Subject
      Distributed systems
      Data warehousing
      Reliable and fault-tolerant systems
      High performance computing
      Software engineering
      Collections
      • Graduate Theses and Dissertations
      University of Saskatchewan

      University Library

      © University of Saskatchewan
      Contact Us | Disclaimer | Privacy