An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations
Electronic Theses of Indian Institute of Science
Field | Value | |
Title |
An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations
|
|
Creator |
Pananilath, Irshad Muhammed
|
|
Subject |
Lattice-Boltzmann Computations
Computational Fluid Dynamics
Tiling
Stencil Computations
Single Instruction Multiple Data (SIMD)
Parallel Computers
Parallel Processing
Loop Transformations
Lattice-Boltzmann Method (LBM)
Lattice Boltzmann Method
Lattice-Boltzmann Equation
Computer Science |
|
Description |
The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, a user still has to write the program manually using the library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective of achieving high performance on modern architectures. Tiling is a very important loop transformation used to improve the performance of stencil computations by exploiting locality and parallelism. In the first part of the work, we explore diamond tiling, a new tiling technique that exploits the inherent ability of most stencils to allow tile-wise concurrent start. This enables perfect load balance during execution and reduces the frequency of synchronization required. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and, in turn, time tiling for LBM. Besides polyhedral transformations, we also describe a number of other complementary transformations and post-processing steps necessary to obtain good parallel and SIMD performance on modern architectures. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations such as Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos, on average by 3x, while running on 16 cores of an Intel Xeon Sandy Bridge system. We also obtain a very significant improvement of 2.47x over the native production compiler on the SPEC LBM benchmark. |
|
Contributor |
Bondhugula, Uday
|
|
Date |
2018-03-09T06:54:29Z
2018-03-09
2014 |
|
Type |
Thesis
|
|
Identifier |
http://hdl.handle.net/2005/3259
http://etd.ncsi.iisc.ernet.in/abstracts/4120/G26635-Abs.pdf |
|
Language |
en_US
|
|
Relation |
G26635
|
|
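
The description above mentions characterizing LBM performance with the Roofline model. As a rough illustration of what that model computes, the sketch below takes attainable performance to be the smaller of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity. The function names and the machine numbers are hypothetical and chosen only for illustration; they are not measurements or code from the thesis.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable GFLOP/s under the basic Roofline model.

    peak_gflops    -- peak compute rate of the machine (GFLOP/s)
    bandwidth_gbs  -- sustained memory bandwidth (GB/s)
    flops_per_byte -- arithmetic intensity of the kernel (FLOP/byte)
    """
    if min(peak_gflops, bandwidth_gbs, flops_per_byte) <= 0:
        raise ValueError("all inputs must be positive")
    # A kernel is limited either by compute or by memory traffic.
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: 100 GFLOP/s peak, 40 GB/s bandwidth (illustrative only).
PEAK, BW = 100.0, 40.0

low_intensity = roofline_gflops(PEAK, BW, 0.5)   # memory-bound case
high_intensity = roofline_gflops(PEAK, BW, 4.0)  # compute-bound case
```

On this made-up machine, a kernel with intensity below the ridge point of 2.5 FLOP/byte is bandwidth-limited. Optimizations such as time tiling raise a kernel's effective intensity by reusing data across time steps while it is still in cache.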