FPGA based high performance double-precision matrix multiplication
DSpace at IIT Bombay
Field | Value
Title | FPGA based high performance double-precision matrix multiplication
Creator | KUMAR, VBY; JOSHI, S; PATKAR, SB; NARAYANAN, H
Subject | high performance computing; matrix multiplication; rank-1 scheme; FPGA implementation; memory-bandwidth trade-off; scalability
Description | We present two designs (I and II) for IEEE 754 double-precision floating-point matrix multiplication, optimized for implementation on high-end FPGAs. Matrix multiplication forms the kernel of many important tile-based BLAS algorithms, which makes it an excellent candidate for acceleration. Both designs are based on the rank-1 update scheme, handle arbitrary matrix sizes, and sustain their peak performance except during an initial latency period. Through these designs we demonstrate the trade-off between local memory and bandwidth in an FPGA implementation, and we present an analysis for the optimal choice of design parameters. Implemented on a Virtex-5 SX240T FPGA, the designs scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs at 373 MHz, a sustained performance of 29.8 GFLOPS is possible, with a bandwidth requirement of 750 MB/s for design II and 5.9 GB/s for design I. This compares favourably with both related work and general-purpose CPU implementations.
Publisher | SPRINGER/PLENUM PUBLISHERS
Date | 2011-10-22T07:22:36Z; 2011-12-15T09:10:48Z; 2010
Type | Article; Proceedings Paper
Identifier | INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, vol. 38, pp. 322-338; ISSN 0885-7458; http://dx.doi.org/10.1007/s10766-010-0131-8; http://dspace.library.iitb.ac.in/xmlui/handle/10054/14840; http://hdl.handle.net/100/1667
Source | 22nd International Conference on VLSI Design / 8th International Conference on Embedded Systems, New Delhi, India, Jan. 2009
Language | English
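The description states that both designs are built on the rank-1 update scheme. As a rough software illustration only (not the authors' hardware design), the product C = A·B can be accumulated as a sum of outer products, C += A[:, k] ⊗ B[k, :] for each k, so every streamed element of a column of A and a row of B feeds many multiply-adds. The Python sketch below assumes hypothetical helper names `rank1_matmul` and `peak_gflops`. The throughput helper further assumes that each PE completes one multiply and one add (2 FLOPs) per clock cycle. Under that assumption it reproduces the quoted figure: 40 PEs × 2 FLOPs × 373 MHz ≈ 29.8 GFLOPS.

```python
from typing import List

Matrix = List[List[float]]


def rank1_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compute C = A @ B as a sum of rank-1 (outer-product) updates.

    Hypothetical illustration: for each k, column k of A and row k of B
    form an outer product that is accumulated into C.
    """
    m, kdim = len(a), len(a[0])
    if len(b) != kdim:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k in range(kdim):
        b_row = b[k]  # row k of B, reused for every row i of C
        for i in range(m):
            a_ik = a[i][k]  # element of column k of A, reused across j
            c_row = c[i]
            for j in range(n):
                c_row[j] += a_ik * b_row[j]  # one multiply-add per entry
    return c


def peak_gflops(num_pes: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Peak throughput, ASSUMING each PE does one multiply-add per cycle."""
    return num_pes * flops_per_cycle * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(rank1_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
    print(round(peak_gflops(40, 373), 2))  # 29.84
```

The loop order (k outermost) is what makes this a rank-1 update formulation rather than the usual inner-product ordering; the arithmetic result is identical, only the data reuse pattern changes.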