An in-depth performance analysis of irregular workloads on VLIW APU

dc.contributor.author: Doerksen, Matthew
dc.contributor.examiningcommittee [en_US]: Domaratzki, Michael (Computer Science); Thomas, Gabriel (Electrical and Computer Engineering)
dc.contributor.supervisor [en_US]: Thulasiraman, Parimala (Computer Science)
dc.date.accessioned: 2014-05-26T13:25:41Z
dc.date.available: 2014-05-26T13:25:41Z
dc.date.issued: 2014-05-26
dc.degree.discipline [en_US]: Computer Science
dc.degree.level [en_US]: Master of Science (M.Sc.)
dc.description.abstract [en_US]: Heterogeneous multi-core architectures have a higher performance/power ratio than traditional homogeneous architectures. Because of their heterogeneity, these architectures support diverse applications, but developing parallel algorithms for them can be difficult. Implementing algorithms for heterogeneous systems often requires proprietary languages, which limits portability. Although general-purpose graphics processing units (GPUs) have shown great promise in accelerating throughput-computing applications, they are still limited by the memory wall. The memory wall can greatly affect performance for problems that exhibit amorphous parallelism or irregular workloads. AMD's Fusion series of Accelerated Processing Units (APUs) attempts to address this problem. The APU is a radical departure from the traditional systems of a few years ago: it gives consumers a capable CPU integrated on the same die as a powerful, compute-capable GPU based on a Very Long Instruction Word (VLIW) architecture. In this thesis, I examine how well irregular-workload problems suit APU architectures. I consider four scientific computing problems with varying characteristics and map them onto the architectural features of the APU. For each problem I develop several software optimizations that make effective use of VLIW static scheduling through techniques such as loop unrolling and vectorization. Using AMD's OpenCL profiler, I analyze the execution of the various optimizations and provide an in-depth performance analysis using metrics such as kernel occupancy, ALUFetchRatio, ALUBusy percentage and ALUPacking. Finally, I show the effect of register pressure caused by vectorization and the limitations of the APU architecture for irregular workloads.
dc.description.note [en_US]: October 2014
dc.identifier.uri: http://hdl.handle.net/1993/23593
dc.language.iso [en_US]: eng
dc.rights [en_US]: open access
dc.subject [en_US]: APU
dc.subject [en_US]: OpenCL
dc.subject [en_US]: irregular
dc.subject [en_US]: GPU
dc.title [en_US]: An in-depth performance analysis of irregular workloads on VLIW APU
dc.type [en_US]: master thesis
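The abstract credits loop unrolling with helping the compiler exploit VLIW static scheduling. What follows is a hedged illustration in plain C, not code from the thesis (which targets OpenCL kernels); the SAXPY operation and the function names `saxpy_scalar` and `saxpy_unrolled4` are hypothetical choices made only for this example. Unrolling by four gives each iteration four multiply-adds that do not depend on one another, which is the kind of independent work a VLIW compiler can pack into the slots of a single instruction bundle. How well that packing succeeds on real hardware is what a profiler counter such as ALUPacking is meant to report.

```c
#include <stddef.h>

/* Baseline SAXPY (y = a*x + y): one multiply-add per loop iteration. */
void saxpy_scalar(size_t n, float a, const float *restrict x,
                  float *restrict y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Same computation, manually unrolled by 4 (illustrative sketch).
 * The four statements in the loop body touch different elements and
 * do not depend on each other, so a compiler targeting a VLIW machine
 * may schedule them into the same instruction bundle. `restrict`
 * promises that x and y do not overlap, which leaves the compiler free
 * to reorder these loads and stores. */
void saxpy_unrolled4(size_t n, float a, const float *restrict x,
                     float *restrict y) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result; for example, with `x[i] = i`, `y[i] = 1` and `a = 2`, every output element becomes `2*i + 1`. In an OpenCL kernel the same idea would apply within each work-item, and vectorization (for instance, operating on OpenCL's built-in `float4` type) is a related way to expose this parallelism, at the cost of the higher register pressure the abstract mentions.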
Files
Original bundle
Name: doerksen_matthew.pdf
Size: 4.65 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.25 KB
Format: Item-specific license agreed to upon submission