Distributed cluster computing on high-speed switched LANs
In high-performance computing, there is an ongoing technological convergence toward distributed computing on networked workstation clusters. Two developments are driving this shift. High-performance workstations now offer high availability at relatively low cost, and high-speed switched networks have recently advanced. As a result, a new computing paradigm is becoming a commonplace high-performance computing infrastructure. It centers on distributed clusters of workstations (and/or PCs) interconnected by low-latency, high-bandwidth networks such as ATM and switched Fast or Gigabit Ethernet. Parallel programming on these compute clusters with message-passing tools such as PVM or MPI offers many advantages, including superior price-performance, scalability, and very large aggregate processing power and memory capacity. This paradigm has the potential to satisfy the computational demands of many large scientific and engineering applications that historically could be met only by traditional supercomputers or MPP (massively parallel processing) systems.

This thesis explores the challenges that must be met to bring the performance of cluster systems close to that of traditional parallel machines. In particular, the key metrics of network communication performance, bandwidth and latency, need to be benchmarked, because they are crucial for understanding the overall performance of distributed applications. The thesis provides a characterization and systematic analysis of the end-to-end and collective communication performance of three cluster interconnects, viz. switched Ethernet, ATM, and switched Fast Ethernet. Further, two parallel applications with significantly different communication and computation patterns, viz. a matrix multiplication algorithm and a large fluid flow simulation problem, were implemented to serve as benchmarks for evaluating overall system performance.
The results of the thesis experiments are reported and specific conclusions are drawn from them.
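Bandwidth and latency, the two communication metrics benchmarked above, are often summarized with a simple linear model. Here T(n) = t0 + n / r is the one-way time for an n-byte message, t0 is the zero-byte latency, and r is the asymptotic bandwidth. The sketch below is illustrative only. It assumes one-way times are taken as half of measured ping-pong round-trip times, and it uses ordinary least squares to fit t0 and r. The helper names `fit_latency_bandwidth` and `half_power_size` are hypothetical and not taken from the thesis, and the thesis may use a different estimation method. The sketch also computes n_1/2 = t0 * r, the message size at which the effective bandwidth n / T(n) reaches half of r.

```python
"""Sketch: fit the linear model T(n) = t0 + n / r to ping-pong timings.

Assumption (illustrative): `times` are one-way times in seconds, e.g. half of
measured round-trip times, and `sizes` are message sizes in bytes.
"""


def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of one-way time against message size.

    Returns (t0, r): t0 is the zero-byte latency in seconds and r is the
    asymptotic bandwidth in bytes per second (reciprocal of the slope).
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx              # seconds per byte
    if slope <= 0:
        raise ValueError("non-positive slope: bandwidth is undefined")
    t0 = mean_y - slope * mean_x   # intercept = zero-byte latency
    return t0, 1.0 / slope


def half_power_size(t0, r):
    """Message size n_1/2 at which n / (t0 + n / r) equals r / 2."""
    return t0 * r


if __name__ == "__main__":
    # Hypothetical, noise-free timings generated from t0 = 100 us and
    # r = 10 MB/s; these are NOT measurements from the thesis.
    sizes = [0, 1024, 4096, 16384, 65536]
    times = [100e-6 + s / 10e6 for s in sizes]
    t0, r = fit_latency_bandwidth(sizes, times)
    print(f"latency = {t0 * 1e6:.1f} us, bandwidth = {r / 1e6:.2f} MB/s, "
          f"n_1/2 = {half_power_size(t0, r):.0f} bytes")
```

The linear model ignores effects such as protocol packetization and switch contention, so real ping-pong curves often deviate from it, particularly at small and very large message sizes. It mainly serves to condense a bandwidth/latency curve into two comparable numbers per interconnect.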