FPU generator for design space exploration

Sameh Galal, Ofer Shacham, John S. Brunhaver, Jing Pu, Artem Vassiliev, Mark Horowitz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Scopus citations


FPUs have been a topic of research for almost a century, leading to thousands of papers and books. Each advance focuses on the virtues of some specific new technique. This paper compares the energy efficiency of both throughput-optimized and latency-sensitive designs, each employing an array of optimization techniques, using a fair "apples to apples" methodology. This comparison required us to build many optimized FP units. We accomplished this by creating a highly parameterized FP generator that hierarchically encompasses lower-level generators for summation trees, Booth encoders, adders, etc. Having constructed this generator, we quickly relearned a number of low-level issues that are critical yet often neglected in published work. By exploring cascade and fused multiply-add architectures across a variety of bit widths, summation trees, Booth encoders, pipelining techniques, and pipe depths, we found that for most throughput-based designs, a Booth-3 fused multiply-add architecture with a Wallace combining tree is optimal. For latency-sensitive designs, we found that Booth-2 cascade multiply-add architectures are better. As we describe in the paper, Wallace is not always the optimal combining network because of wire delay and track count, and the precise way the carry-save adders (CSAs) are connected in the tree can make a larger difference than the type of tree used.
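The abstract's comparison turns on how a multiplier's partial products are generated (Booth-2 vs Booth-3 recoding) and then summed (a tree of carry-save adders). The following is a minimal, hypothetical Python sketch of those two steps for unsigned integer mantissas, not the paper's generator: the names `booth_digits`, `csa`, `wallace_reduce` and `booth_multiply` are illustrative, the reduction is a simplified word-level Wallace-style schedule rather than a bit-level one, and it models none of the wire delay or track count that the abstract says can make Wallace suboptimal.

```python
# Hypothetical sketch: radix-2**k Booth recoding plus a word-level,
# Wallace-style 3:2 (carry-save) reduction. Not the paper's FP generator.


def booth_digits(y: int, n: int, k: int) -> list[int]:
    """Recode an unsigned n-bit multiplier y into radix-2**k Booth digits.

    Digit i = -2**(k-1)*y[k*i+k-1] + sum_{j<k-1} 2**j*y[k*i+j] + y[k*i-1],
    with y[-1] = 0. k=2 is Booth-2 (digits -2..2); k=3 is Booth-3
    (digits -4..4, which needs the "hard" multiple 3x in hardware).
    """
    def bit(pos: int) -> int:
        return (y >> pos) & 1 if pos >= 0 else 0

    m = n // k + 1  # enough digits that y's (zero) sign bit is covered
    digits = []
    for i in range(m):
        base = k * i
        d = -(1 << (k - 1)) * bit(base + k - 1) + bit(base - 1)
        for j in range(k - 1):
            d += (1 << j) * bit(base + j)
        digits.append(d)
    return digits


def csa(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """3:2 carry-save adder: a + b + c == sum + carry (mod 2**W)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask


def wallace_reduce(operands: list[int], mask: int) -> tuple[int, int]:
    """Reduce operands to two with layers of parallel CSAs.

    Returns (final sum, number of CSA levels used).
    """
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2], mask))
        nxt.extend(ops[full:])  # 1 or 2 leftovers pass to the next level
        ops = nxt
        levels += 1
    # Final carry-propagate add: the only slow adder in the tree.
    return sum(ops) & mask, levels


def booth_multiply(x: int, y: int, n: int, k: int = 2) -> tuple[int, int, int]:
    """Multiply unsigned n-bit x and y.

    Returns (product, number of partial products, CSA levels).
    """
    width = 2 * n + k + 2  # two's-complement width for signed partial products
    mask = (1 << width) - 1
    pps = [((d * x) << (k * i)) & mask
           for i, d in enumerate(booth_digits(y, n, k))]
    product, levels = wallace_reduce(pps, mask)
    return product, len(pps), levels


for k in (2, 3):
    p, rows, levels = booth_multiply(0xABCD, 0x1234, n=16, k=k)
    assert p == 0xABCD * 0x1234
    print(f"Booth-{k}: {rows} partial products, {levels} CSA levels")
# prints:
# Booth-2: 9 partial products, 4 CSA levels
# Booth-3: 6 partial products, 3 CSA levels
```

For 16-bit operands, Booth-3 cuts the partial products from 9 to 6 and the tree depth from 4 to 3 CSA levels. A real Booth-3 unit pays for this by precomputing the hard multiple 3x with a carry-propagate adder, which the sketch gets for free from Python's `*`.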

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE 21st Symposium on Computer Arithmetic, ARITH 2013
Number of pages: 10
State: Published - Aug 13 2013
Externally published: Yes
Event: 21st Symposium on Computer Arithmetic, ARITH 2013 - Austin, TX, United States
Duration: Apr 7 2013 - Apr 10 2013

Publication series

Name: Proceedings - Symposium on Computer Arithmetic


Other: 21st Symposium on Computer Arithmetic, ARITH 2013
Country/Territory: United States
City: Austin, TX


Keywords

  • Fused multiply add
  • floating point
  • multipliers
  • power efficiency

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

