Exploiting parallelism of irregular problems and performance evaluation on heterogeneous multi-core architectures

Loading...
Thumbnail Image
Date
2012-10-04
Authors
Xu, Meilian
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
In this thesis, we design, develop and implement parallel algorithms for irregular problems on heterogeneous multi-core architectures. Irregular problems exhibit random and unpredictable memory access patterns, poor spatial locality and input dependent control flow. Heterogeneous multi-core processors vary in: clock frequency, power dissipation, programming model (MIMD vs. SIMD), memory design and computing units, scalar versus vector units. The heterogeneity of the processors makes designing efficient parallel algorithms for irregular problems on heterogeneous multicore processors challenging. Techniques of mapping tasks or data on traditional parallel computers can not be used as is on heterogeneous multi-core processors due to the varying hardware. In an attempt to understand the efficiency of futuristic heterogeneous multi-core architectures on applications we study several computation and bandwidth oriented irregular problems on one heterogeneous multi-core architecture, the IBM Cell Broadband Engine (Cell BE). The Cell BE consists of a general processor and eight specialized processors and addresses vector/data-level parallelism and instruction-level parallelism simultaneously. Through these studies on the Cell BE, we provide some discussions and insight on the performance of the applications on heterogeneous multi-core architectures. Verifying these experimental results require some performance modeling. Due to the diversity of heterogeneous multi-core architectures, theoretical performance models used for homogeneous multi-core architectures do not provide accurate results. Therefore, in this thesis we propose an analytical performance prediction model that considers the multitude architectural features of heterogeneous multi-cores (such as DMA transfers, number of instructions and operations, the processor frequency and DMA bandwidth). We show that the execution time from our prediction model is comparable to the execution time of the experimental results for a complex medical imaging application.
Description
Keywords
Heterogeneous multi-core architectures, irregular problems, IBM Cell Broadband Engine, iterative reconstruction technique, OS-SART (ordered subset simultaneous algebraic reconstruction technique, Microwave Tomography, Performance Prediction Model, FFT
Citation