Record Details

FPGA based high performance double-precision matrix multiplication

DSpace at IIT Bombay


Field	Value

Title	FPGA based high performance double-precision matrix multiplication

Creator	KUMAR, VBY; JOSHI, S; PATKAR, SB; NARAYANAN, H

Description	We present two designs (I and II) for IEEE 754 double-precision floating-point matrix multiplication, an important kernel in many tile-based BLAS algorithms, optimized for implementation on high-end FPGAs. The designs, both based on the rank-1 update scheme, can handle arbitrary matrix sizes and sustain their peak performance except during an initial latency period. Through these designs, the trade-offs between local memory and bandwidth for an FPGA implementation are demonstrated, and an analysis is presented for the optimal choice of design parameters. The designs, implemented on a Virtex-5 SX240T FPGA, scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs and a design speed of 373 MHz, a sustained performance of 29.8 GFLOPS is possible with a bandwidth requirement of 750 MB/s for design-II and 5.9 GB/s for design-I.

Publisher	IEEE COMPUTER SOC

Date	2011-10-25T04:33:56Z 2011-12-15T09:11:26Z 2009

Type	Proceedings Paper

Identifier	22ND INTERNATIONAL CONFERENCE ON VLSI DESIGN HELD JOINTLY WITH 8TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, PROCEEDINGS, 341-346; 978-0-7695-3506-7; 1063-9667; http://dx.doi.org/10.1109/VLSI.Design.2009.13; http://dspace.library.iitb.ac.in/xmlui/handle/10054/15603; http://hdl.handle.net/100/2071

Source	22nd International Conference on VLSI Design held with 8th International Conference on Embedded Systems, New Delhi, INDIA, JAN 05-09, 2009

Language	English
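
Note on the Description field: the rank-1 update scheme named in the abstract computes C = A x B by accumulating one outer product (a column of A times a row of B) per step. The sketch below illustrates that idea in plain Python and checks the quoted 29.8 GFLOPS figure. It is not the authors' hardware design. The function names are hypothetical, and the figure of 2 floating-point operations per PE per cycle (one multiply plus one add) is an assumption, chosen because it reproduces the number in the abstract.

```python
def matmul_rank1(A, B):
    """Compute C = A x B as a sum of rank-1 (outer-product) updates.

    A is n x k and B is k x m, both given as lists of rows.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for p in range(k):
        col = [A[i][p] for i in range(n)]  # p-th column of A
        row = B[p]                         # p-th row of B
        # Rank-1 update: C += col * row^T
        for i in range(n):
            for j in range(m):
                C[i][j] += col[i] * row[j]
    return C


def peak_gflops(num_pes, freq_mhz, flops_per_pe_per_cycle=2):
    """Peak throughput in GFLOPS.

    Assumption (not stated in the record): each PE completes one
    multiply and one add per clock cycle, i.e. 2 FLOPs per cycle.
    """
    return num_pes * flops_per_pe_per_cycle * freq_mhz / 1000.0


if __name__ == "__main__":
    print(matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(peak_gflops(40, 373))
```

Under that assumption, 40 PEs x 2 FLOPs x 373 MHz gives 29.84 GFLOPS, which matches the sustained figure reported in the abstract.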

