A Run-time Tapered Floating-Point Adder/Subtractor Supporting Vectorization
Date
2022
Abstract
In this era of widespread embedded computing, energy efficiency has become the new performance criterion; as a result, accelerator-rich multiprocessor systems-on-chip are widely used in embedded computing hardware. Because computational capacity is abundant and inexpensive, computationally intensive machine learning applications have gained considerable traction and are now used in a wide range of application domains. Furthermore, there is an increasing trend toward developing hardware accelerators for machine learning applications on embedded edge devices, where performance and energy efficiency are critical. Although floating-point operations are frequently used for accuracy in these hardware accelerators, reduced-width floating-point formats are also used to reduce hardware complexity, and thus power consumption, while preserving accuracy. Mixed-precision DNNs, vectorization techniques, and any-precision DNN concepts have also been shown to improve performance, energy efficiency, and memory bandwidth. In this paper, we propose the design of a vectorized floating-point adder/subtractor that can handle arbitrary-length floating-point formats with varying exponent and mantissa widths. The aim is to bring flexibility to the arithmetic operations of each layer in a DNN model: depending on the computational requirements of each layer, the exponent width and the floating-point format are chosen dynamically. In comparison to existing designs in the literature, the proposed design is 1.69× more area-efficient and 1.61× more power-efficient, and it supports true vectorization with no restrictions on exponent and mantissa widths. © 2022 IEEE.
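To make the idea of run-time selectable exponent and mantissa widths concrete, the following is a minimal software sketch, not the hardware design proposed in the paper. It assumes an IEEE-754-like layout (sign bit, biased exponent, fraction with a hidden one, subnormals when the exponent field is zero) and does not model infinities or NaNs. The function names `decode`, `split_lanes`, and `vector_add` are hypothetical, and results are returned as exact rationals, so the rounding and re-encoding that a real adder performs are deliberately left out.

```python
from fractions import Fraction


def decode(bits, exp_w, man_w):
    """Decode one (1 + exp_w + man_w)-bit pattern into an exact rational.

    Assumes an IEEE-754-like layout; inf/NaN encodings are NOT modelled,
    so an all-ones exponent is treated as an ordinary normal number.
    """
    sign = (bits >> (exp_w + man_w)) & 1
    exp = (bits >> man_w) & ((1 << exp_w) - 1)
    man = bits & ((1 << man_w) - 1)
    bias = (1 << (exp_w - 1)) - 1
    frac = Fraction(man, 1 << man_w)
    if exp == 0:
        # Subnormal: no hidden one, exponent fixed at 1 - bias.
        mag = frac * Fraction(2) ** (1 - bias)
    else:
        # Normal: hidden leading one.
        mag = (1 + frac) * Fraction(2) ** (exp - bias)
    return -mag if sign else mag


def split_lanes(word, lanes, exp_w, man_w):
    """Split a packed word into `lanes` equal-width fields, lane 0 in the low bits."""
    width = 1 + exp_w + man_w
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(lanes)]


def vector_add(word_a, word_b, lanes, exp_w, man_w, subtract=False):
    """Lane-wise exact add (or subtract) of two packed words.

    The format (exp_w, man_w) and the lane count are ordinary run-time
    arguments, mirroring the idea of choosing the format per DNN layer.
    """
    a = split_lanes(word_a, lanes, exp_w, man_w)
    b = split_lanes(word_b, lanes, exp_w, man_w)
    sgn = -1 if subtract else 1
    return [decode(x, exp_w, man_w) + sgn * decode(y, exp_w, man_w)
            for x, y in zip(a, b)]
```

With this sketch, the same 32-bit operand can be read as two 16-bit lanes (`exp_w=5, man_w=10`) or as four 8-bit lanes (`exp_w=4, man_w=3`) simply by changing the arguments. For example, `vector_add(0x40003C00, 0x3C003800, 2, 5, 10)` adds the lane pairs (1.0, 0.5) and (2.0, 1.0).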
Keywords
Digital Circuits; Floating-point Hardware; Hardware Accelerators; Intellectual property (IP); SIMD
Citation
0