Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism

Jin, Xingxing

dc.contributor.advisor	Ko, Seok-Bum	en_US
dc.contributor.advisor	Daku, Brian	en_US
dc.contributor.committeeMember	Wahid, Khan	en_US
dc.contributor.committeeMember	Karki, Rajesh	en_US
dc.contributor.committeeMember	Oguocha, Ikechukwuka	en_US
dc.creator	Jin, Xingxing	en_US
dc.date.accessioned	2013-01-03T22:31:53Z
dc.date.available	2013-01-03T22:31:53Z
dc.date.created	2012-06	en_US
dc.date.issued	2012-08-17	en_US
dc.date.submitted	June 2012	en_US
dc.description.abstract	High single instruction multiple data (SIMD) efficiency and low power consumption have made graphics processing units (GPUs) an ideal platform for many complex computational applications. Programmers can create thousands of threads, which are grouped into fixed-size SIMD batches known as warps. High throughput is then achieved by executing such warps concurrently with minimal control overhead. However, when a branch instruction assigns different paths to different threads, the warp is broken into multiple warps that must be executed serially, which reduces the efficiency advantage of SIMD. In this thesis, the contemporary fixed-size warp design is abandoned and a hybrid warp size (HWS) mechanism is proposed. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. Once a branch divergence occurs, split warps are squeezed according to the proposed algorithm, and warp sizes are downscaled wherever applicable. Based on the updated warp sizes, warp schedulers calculate the number of cycles the current warp needs and issue the next warp accordingly. As a result, hybrid warps are pushed into the pipelines as soon as possible and more pipeline stages are overlapped. The simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general-purpose GPU applications. This work also integrates HWS with dynamic warp formation (DWF), a well-known branch handling mechanism that aims to improve SIMD utilization by forming new warps out of split warps in real time. The warp forming policy is modified to better tolerate warp conflicts, and squeeze operations are added before a warp merges with other warps. The simulation shows that the combination of DWF and HWS generates an average speedup of 1.27 over the DWF-only platform for the same set of GPU benchmarks.	en_US
dc.identifier.uri	http://hdl.handle.net/10388/ETD-2012-06-527	en_US
dc.language.iso	eng	en_US
dc.subject	SIMD, GPU, Warp, Branch Divergence	en_US
dc.title	Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism	en_US
dc.type.genre	Thesis	en_US
dc.type.material	text	en_US
thesis.degree.department	Electrical and Computer Engineering	en_US
thesis.degree.discipline	Electrical Engineering	en_US
thesis.degree.grantor	University of Saskatchewan	en_US
thesis.degree.level	Masters	en_US
thesis.degree.name	Master of Science (M.Sc.)	en_US
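
The squeeze-and-downscale idea described in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm, whose details are not given in this record. It assumes a 32-thread baseline warp issued over 8 SIMD lanes (4 cycles per warp), and the names `squeeze`, `hybrid_warp_size`, and `issue_cycles` are hypothetical.

```python
from math import ceil

SIMD_LANES = 8   # assumed number of SIMD lanes (hypothetical)
FULL_WARP = 32   # assumed baseline warp size (hypothetical)


def squeeze(active_mask):
    """Compact the active threads of a split warp into contiguous slots.

    Returns the ids of the active threads in their original order.
    """
    return [tid for tid, active in enumerate(active_mask) if active]


def hybrid_warp_size(num_active, lanes=SIMD_LANES):
    """Downscale the warp to the smallest multiple of the lane count
    that still holds every active thread."""
    if num_active <= 0:
        raise ValueError("a warp with no active threads is not issued")
    return ceil(num_active / lanes) * lanes


def issue_cycles(warp_size, lanes=SIMD_LANES):
    """Cycles the scheduler must wait before issuing the next warp."""
    return warp_size // lanes


# A branch taken only by every fourth thread splits the warp in two.
taken = [tid % 4 == 0 for tid in range(FULL_WARP)]
not_taken = [not t for t in taken]

# Fixed-size design: each split warp still occupies a full warp slot.
baseline_cycles = 2 * issue_cycles(FULL_WARP)

# Hybrid design: squeeze each split warp, then downscale its size.
hybrid_cycles = sum(
    issue_cycles(hybrid_warp_size(len(squeeze(mask))))
    for mask in (taken, not_taken)
)
```

Under these assumptions the two split warps shrink to 8 and 24 threads, so together they occupy the issue slot for 4 cycles instead of 8. Real gains depend on the pipeline and on the squeeze policy, which this sketch does not model.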

Files

Original bundle

Name: JIN-THESIS.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format

Collections

Graduate Theses and Dissertations