Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism

dc.contributor.advisor | Ko, Seok-Bum | en_US
dc.contributor.advisor | Daku, Brian | en_US
dc.contributor.committeeMember | Wahid, Khan | en_US
dc.contributor.committeeMember | Karki, Rajesh | en_US
dc.contributor.committeeMember | Ikechukwuka, Oguocha | en_US
dc.creator | Jin, Xingxing | en_US
dc.date.accessioned | 2013-01-03T22:31:53Z
dc.date.available | 2013-01-03T22:31:53Z
dc.date.created | 2012-06 | en_US
dc.date.issued | 2012-08-17 | en_US
dc.date.submitted | June 2012 | en_US
dc.description.abstract | High single instruction multiple data (SIMD) efficiency and low power consumption have made graphics processing units (GPUs) an ideal platform for many complex computational applications. Thousands of threads can be created by programmers and grouped into fixed-size SIMD batches, known as warps. High throughput is then achieved by concurrently executing such warps with minimal control overhead. However, when a branch instruction assigns different paths to different threads, the warp is broken into multiple warps that have to be executed serially, reducing the efficiency advantage of SIMD. In this thesis, the contemporary fixed-size warp design is abandoned and a hybrid warp size (HWS) mechanism is proposed. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. Once a branch divergence occurs, split warps are squeezed according to the proposed algorithm, and warp sizes are downscaled wherever applicable. Based on the updated warp sizes, warp schedulers calculate the number of cycles the current warp needs and issue the next warp accordingly. As a result, hybrid warps are pushed into pipelines as soon as possible and more pipeline stages are overlapped. The simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general-purpose GPU applications. This work also integrates HWS with dynamic warp formation (DWF), a well-known branch handling mechanism that aims to improve SIMD utilization by forming new warps out of split warps in real time. The warp forming policy is modified to better tolerate warp conflicts, and squeeze operations are added before a warp merges with other warps. The simulation shows that the combination of DWF and HWS generates an average speedup of 1.27 over the DWF-only platform for the same set of GPU benchmarks. | en_US
dc.identifier.uri | http://hdl.handle.net/10388/ETD-2012-06-527 | en_US
dc.language.iso | eng | en_US
dc.subject | SIMD, GPU, Warp, Branch Divergence | en_US
dc.title | Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism | en_US
dc.type.genre | Thesis | en_US
dc.type.material | text | en_US
thesis.degree.department | Electrical and Computer Engineering | en_US
thesis.degree.discipline | Electrical Engineering | en_US
thesis.degree.grantor | University of Saskatchewan | en_US
thesis.degree.level | Masters | en_US
thesis.degree.name | Master of Science (M.Sc.) | en_US
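
Illustrative note on the abstract: the record does not give the warp size, the SIMD width, or the details of the squeeze algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a nominal warp of 32 threads issued over an 8-lane SIMD unit (both values are hypothetical), and it models "squeezing" as simply packing the active threads of a split warp into the lowest lane slots so the scheduler can count issue cycles from the reduced size. The function names `squeeze`, `issue_cycles`, `fixed_size_cycles`, and `hybrid_cycles` are invented for this illustration and are not taken from the thesis.

```python
from math import ceil

WARP_SIZE = 32   # assumed nominal warp size (not stated in the abstract)
SIMD_WIDTH = 8   # assumed number of lanes issued per cycle (hypothetical)


def squeeze(active_mask):
    """Pack the IDs of active threads together, dropping idle slots.

    This is a stand-in for the thesis's squeeze algorithm, whose actual
    rules are not described in the abstract.
    """
    return [tid for tid, active in enumerate(active_mask) if active]


def issue_cycles(num_threads, simd_width=SIMD_WIDTH):
    """Cycles needed to issue `num_threads` threads over `simd_width` lanes."""
    return ceil(num_threads / simd_width)


def fixed_size_cycles(paths):
    """Each non-empty split warp still occupies a full fixed-size warp."""
    return sum(issue_cycles(WARP_SIZE) for mask in paths if any(mask))


def hybrid_cycles(paths):
    """Each split warp is squeezed first, then issued at its reduced size."""
    return sum(issue_cycles(len(squeeze(mask))) for mask in paths)


# A divergent branch: every fourth thread takes one path, the rest the other.
taken = [tid % 4 == 0 for tid in range(WARP_SIZE)]
not_taken = [not t for t in taken]
```

In this toy case the two split warps hold 8 and 24 active threads, so `fixed_size_cycles([taken, not_taken])` counts 8 issue cycles while `hybrid_cycles([taken, not_taken])` counts 4. These numbers only illustrate the cycle-counting idea; they say nothing about the speedups reported in the thesis, which come from its own simulations.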

Files

Original bundle
Name: JIN-THESIS.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format