University of Saskatchewan HARVEST

      Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism

      File
      JIN-THESIS.pdf (1.382 MB)
      Date
      2012-08-17
      Author
      Jin, Xingxing
      Type
      Thesis
      Degree Level
      Masters
      Abstract
      High single instruction, multiple data (SIMD) efficiency and low power consumption have made graphics processing units (GPUs) an ideal platform for many complex computational applications. Programmers can create thousands of threads, which are grouped into fixed-size SIMD batches known as warps. High throughput is then achieved by executing these warps concurrently with minimal control overhead. However, when a branch instruction sends different threads down different paths, the warp is broken into multiple warps that must be executed serially, which reduces the efficiency advantage of SIMD. In this thesis, the contemporary fixed-size warp design is abandoned and a hybrid warp size (HWS) mechanism is proposed. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. Once a branch divergence occurs, split warps are squeezed according to the proposed algorithm, and warp sizes are downscaled wherever applicable. Based on the updated warp sizes, warp schedulers calculate the number of cycles the current warp needs and issue the next warp accordingly. As a result, hybrid warps are pushed into the pipelines as soon as possible and more pipeline stages are overlapped. Simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general-purpose GPU applications. This work also integrates HWS with dynamic warp formation (DWF), a well-known branch handling mechanism that aims to improve SIMD utilization by forming new warps out of split warps in real time. The warp forming policy is modified to better tolerate warp conflicts, and squeeze operations are added before a warp merges with other warps. Simulation results show that the combination of DWF and HWS yields an average speedup of 1.27 over the DWF-only platform for the same set of GPU benchmarks.
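      The issue-cycle argument in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the squeeze algorithm or scheduler from the thesis: the 8-lane SIMD width, the permitted warp sizes (8, 16, 32), and the names `split_on_branch`, `squeezed_size`, `issue_cycles`, and `compare` are all hypothetical and chosen for illustration. It compares how many issue cycles a diverged warp would need if every sub-warp kept the full size with how many it would need if each sub-warp were downscaled to the smallest permitted size that holds its active threads.

```python
from math import ceil

# All constants below are illustrative assumptions, not values from the thesis.
SIMD_WIDTH = 8             # assumed number of physical SIMD lanes
FULL_WARP = 32             # assumed fixed warp size of the baseline
WARP_SIZES = (8, 16, 32)   # assumed set of permitted hybrid warp sizes


def split_on_branch(active, taken):
    """Split the active thread IDs into taken / not-taken sub-warps.

    Empty sub-warps are dropped, so a non-divergent branch yields one warp.
    """
    taken_ids = [t for t in active if taken(t)]
    not_taken_ids = [t for t in active if not taken(t)]
    return [w for w in (taken_ids, not_taken_ids) if w]


def squeezed_size(num_active):
    """Smallest permitted warp size that can hold all active threads."""
    for size in WARP_SIZES:
        if num_active <= size:
            return size
    raise ValueError("more active threads than the largest warp size")


def issue_cycles(warp_size):
    """Cycles needed to push a warp of this size through the SIMD lanes."""
    return ceil(warp_size / SIMD_WIDTH)


def compare(active, taken):
    """Return (fixed_size_cycles, hybrid_cycles) for one branch."""
    subs = split_on_branch(active, taken)
    fixed = len(subs) * issue_cycles(FULL_WARP)
    hybrid = sum(issue_cycles(squeezed_size(len(s))) for s in subs)
    return fixed, hybrid


if __name__ == "__main__":
    # Every fourth thread takes the branch: 8 taken, 24 not taken.
    print(compare(range(32), lambda t: t % 4 == 0))
```

      In this toy model, the 8-thread sub-warp is downscaled to size 8 and issues in one cycle instead of four, while the 24-thread sub-warp still needs the full size. A non-divergent branch costs the same in both schemes.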
      Degree
      Master of Science (M.Sc.)
      Department
      Electrical and Computer Engineering
      Program
      Electrical Engineering
      Supervisor
      Ko, Seok-Bum; Daku, Brian
      Committee
      Wahid, Khan; Karki, Rajesh; Oguocha, Ikechukwuka
      Copyright Date
      June 2012
      URI
      http://hdl.handle.net/10388/ETD-2012-06-527
      Subject
      SIMD, GPU, Warp, Branch Divergence
      Collections
      • Graduate Theses and Dissertations