Record Details

An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations

Electronic Theses of Indian Institute of Science

 
 
Field Value
 
Title An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations
 
Creator Pananilath, Irshad Muhammed
 
Subject Lattice-Boltzmann Computations
Computational Fluid Dynamics
Tiling Stencil Computations
Single Instruction Multiple Data (SIMD)
Parallel Computers
Parallel Processing
Loop Transformations
Lattice-Boltzmann Method (LBM)
Lattice Boltzmann Method
Lattice-Boltzmann Equation
Computer Science
 
Description The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, users still have to write their programs manually using the library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective of achieving high performance on modern architectures.
Tiling is an important loop transformation that improves the performance of stencil computations by exploiting locality and parallelism. In the first part of the work, we explore diamond tiling, a new tiling technique that exploits the inherent ability of most stencils to allow tile-wise concurrent start. This enables perfect load balance during execution and reduces the frequency of synchronization required.
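The idea of tile-wise concurrent start can be illustrated with a small sketch. It is not the thesis's code generator: it assumes a 1D three-point Jacobi stencil with fixed boundary values, and the names `jacobi_reference`, `jacobi_diamond`, and the tile size `B` are hypothetical. Tiles are cut along the two hyperplanes t+i and t-i, which gives diamond shapes in the (t, i) iteration space; all tiles with the same a+b form one wavefront and do not depend on each other, so they could run in parallel.

```python
def jacobi_reference(u0, steps):
    """Plain time-stepping of a 3-point averaging stencil (fixed boundaries)."""
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        new = u[:]
        for i in range(1, n - 1):
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = new
    return u


def jacobi_diamond(u0, steps, B):
    """Same computation, executed tile by tile over diamond-shaped tiles.

    Assumes len(u0) >= 2. The full (steps + 1) x n space-time grid is stored
    for clarity; a real implementation would keep only two time levels.
    """
    n = len(u0)
    grid = [list(u0)]
    for _ in range(steps):
        # Interior values start as None, so reading a value before it has
        # been computed raises a TypeError instead of passing silently.
        grid.append([u0[0]] + [None] * (n - 2) + [u0[-1]])

    # Assign each iteration point (t, i) to the tile (a, b) with
    # a = floor((t + i) / B) and b = floor((t - i) / B).
    tiles = {}
    for t in range(1, steps + 1):
        for i in range(1, n - 1):
            tiles.setdefault(((t + i) // B, (t - i) // B), []).append((t, i))

    # A tile (a, b) only needs values from tiles (a-1, b), (a, b-1) and
    # (a-1, b-1), so tiles with equal a + b are mutually independent.
    # Here the wavefronts are simply visited one after another.
    for key in sorted(tiles, key=lambda ab: (ab[0] + ab[1], ab[0])):
        # Points inside a tile were appended in increasing t, which
        # respects the dependences within the tile.
        for t, i in tiles[key]:
            prev = grid[t - 1]
            grid[t][i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return grid[steps]
```

Because every point performs the same arithmetic in both versions, the tiled execution returns exactly the values of the reference loop; only the order in which points are visited changes.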
Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and, in turn, time tiling for LBM. Besides polyhedral transformations, we also describe a number of other complementary transformations and post-processing steps necessary to obtain good parallel and SIMD performance on modern architectures. We also characterize the performance of LBM with the Roofline performance model.
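The similarity between LBM and stencils can be seen in a minimal sketch of one fused collide-and-stream update. It assumes a D1Q3 lattice, BGK collision, and periodic boundaries, none of which are stated in the abstract; the function names and the relaxation time `tau` are likewise illustrative. Written in "pull" form, each new value at site x reads only from sites x-1, x, and x+1 of the previous time step, which is the access pattern of a three-point stencil.

```python
# D1Q3 lattice: rest, right-moving and left-moving populations.
C = (0, 1, -1)                           # discrete velocities
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)    # lattice weights


def equilibrium(rho, u):
    """Second-order equilibrium distribution for density rho and velocity u."""
    return [w * rho * (1.0 + 3.0 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
            for c, w in zip(C, W)]


def collide(f_site, tau):
    """BGK collision: relax the populations of one site toward equilibrium."""
    rho = sum(f_site)
    u = (f_site[1] - f_site[2]) / rho
    feq = equilibrium(rho, u)
    return [f - (f - fe) / tau for f, fe in zip(f_site, feq)]


def lbm_step(f, tau):
    """One fused collide-and-stream step in pull form (periodic domain).

    Population q at site x is pulled from the post-collision state of the
    upstream site x - C[q]. The collision is recomputed per pulled value
    to keep the sketch short; it is not meant to be efficient.
    """
    n = len(f)
    new = []
    for x in range(n):
        site = []
        for q, c in enumerate(C):
            src = (x - c) % n
            site.append(collide(f[src], tau)[q])
        new.append(site)
    return new


def total_mass(f):
    """Sum of all populations; conserved by collision and streaming."""
    return sum(sum(site) for site in f)
```

Since `lbm_step` has the same neighbour-only dependence pattern as a stencil sweep, the time loop around it is a candidate for the same kind of time tiling.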
Experimental results for standard LBM simulations such as Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos, on average by 3x, while running on 16 cores of an Intel Xeon Sandy Bridge system. We also obtain a significant improvement of 2.47x over the native production compiler on the SPEC LBM benchmark.
 
Contributor Bondhugula, Uday
 
Date 2018-03-09T06:54:29Z
2018-03-09
2014
 
Type Thesis
 
Identifier http://hdl.handle.net/2005/3259
http://etd.ncsi.iisc.ernet.in/abstracts/4120/G26635-Abs.pdf
 
Language en_US
 
Relation G26635