Multipliers With High Performance and Accurate for FPGA-Based Hardware Accelerators

Bollampalli Akshaya; Pankaj Hivraj Rangare


Academic Journal of Engineering Studies


Department of ECE, Vaagdevi College of Engineering, India

*Corresponding author: Bollampalli Akshaya, Department of ECE, Vaagdevi College of Engineering, Warangal, India

Submission: May 30, 2025; Published: June 30, 2025

DOI: 10.31031/AES.2025.04.000578

ISSN: 2694-4421
Volume 4, Issue 1

Abstract

Multiplication is one of the most common arithmetic operations in many applications, including image/video processing and machine learning. FPGA vendors offer high-performance multipliers in the form of DSP blocks, but these multipliers are inefficient for smaller bit-width multiplications and have fixed locations on the FPGA, which can cause additional routing delays. For this reason, FPGA vendors also offer optimized soft IP cores for multiplication; in this work, however, we argue that these soft multiplier IP cores still require better designs to provide high performance and resource efficiency. We present area-optimized, low-latency soft-core multiplier architectures that exploit FPGA architectural features such as look-up tables (LUTs) and fast carry chains to reduce critical path delay and resource utilization. Across multiplier sizes, our proposed unsigned and signed accurate architectures reduce LUT utilization by up to 25% and 53%, respectively, compared with the Xilinx multiplier LogiCORE IP. Furthermore, compared with the LogiCORE IP, our unsigned approximate multiplier topologies can reduce the critical path delay by up to 51% with negligible loss of output accuracy. As a case study, we have implemented the proposed multiplier architectures in image and video accelerators and assessed the resulting performance and area improvements. Our library of accurate and approximate multipliers is open source.

Introduction

Multipliers are fundamental components of digital signal processing and machine learning accelerators, where performance, area, and power efficiency are critical. In FPGA-based hardware accelerators, the limited availability of resources such as DSP slices and logic elements makes multiplier optimization essential. This work explores the design of both accurate and approximate multipliers optimized for FPGA implementation. Accurate designs preserve exact results while maximizing performance, whereas approximate multipliers intentionally trade some accuracy for improvements in area, speed, and power consumption, which makes them well suited to error-tolerant applications such as image processing and neural networks. We propose a range of multiplier architectures, evaluate them on FPGA platforms, and analyze their area, delay, power, and error metrics. The goal is to offer scalable solutions that balance efficiency and precision according to application needs [1].

Proposed Approximate Multiplier Architecture

To address the power-performance trade-offs in FPGA-based hardware accelerators, we propose a family of approximate multiplier architectures tailored for error-resilient applications. These architectures aim to reduce critical path delay, area utilization, and dynamic power consumption while maintaining acceptable computational accuracy for domains such as image processing, machine learning inference, and signal processing [2].

Architectural overview

Our approximate multiplier architecture modifies traditional multiplier designs (such as array, Wallace tree, and Booth multipliers) by strategically truncating partial products, replacing exact adders with approximate adders, and applying logic simplification techniques to reduce complexity. The architecture consists of the following key blocks:
Partial Product Generator: Generates all necessary bitwise AND operations between multiplicand and multiplier bits.
Truncation Unit: Drops the least significant partial products according to a precision-error trade-off policy.
Approximate Compressor Tree: Implements Wallace or Dadda tree reduction with approximate compressors (e.g., 4:2 compressors with carry elimination).
Approximate Accumulator: Performs the final summation using low-overhead approximate adders (e.g., the Lower-part OR Adder or the Error-Tolerant Adder).
Error Control Unit (optional): Dynamically adjusts the approximation level based on quality-of-service (QoS) requirements or error thresholds [3].
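As a rough illustration of how the partial product generator, truncation unit, and approximate accumulator fit together, the Python sketch below models an unsigned n × n multiplier at the bit level. It is a behavioral model under stated assumptions, not the authors' RTL: truncation is assumed to clear whole least-significant columns (parameter `t`), the Lower-part OR Adder follows its common formulation (OR of the low bits, carry-in from the AND of their most significant pair), and a plain row-by-row accumulation stands in for the Wallace/Dadda compressor tree. All function names are hypothetical.

```python
# Behavioral sketch only: column truncation + Lower-part OR Adder (LOA)
# are assumed techniques here, not the authors' RTL.


def partial_product_rows(a: int, b: int, n: int) -> list[int]:
    """Partial product generator: row i is (a AND b[i]) shifted left by i."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(n)]


def truncate_rows(rows: list[int], t: int) -> list[int]:
    """Truncation unit: clear every partial-product bit in columns 0..t-1."""
    keep = ~((1 << t) - 1)
    return [row & keep for row in rows]


def loa_add(x: int, y: int, low_bits: int) -> int:
    """Lower-part OR adder: OR the low bits, add the upper bits exactly.

    The carry into the upper part is the AND of both operands' most
    significant lower-part bits.
    """
    if low_bits == 0:
        return x + y
    mask = (1 << low_bits) - 1
    low = (x | y) & mask
    carry = (x >> (low_bits - 1)) & (y >> (low_bits - 1)) & 1
    high = (x >> low_bits) + (y >> low_bits) + carry
    return (high << low_bits) | low


def approx_multiply(a: int, b: int, n: int, t: int = 0, low_bits: int = 0) -> int:
    """Unsigned n x n multiply with column truncation and LOA accumulation.

    Rows are accumulated one by one; this stands in for the compressor tree.
    """
    acc = 0
    for row in truncate_rows(partial_product_rows(a, b, n), t):
        acc = loa_add(acc, row, low_bits)
    return acc


print(approx_multiply(15, 15, 4))       # 225: t=0, low_bits=0 is exact
print(approx_multiply(15, 15, 4, t=2))  # 220: columns 0 and 1 dropped
```

With `t = 0` and `low_bits = 0` the model reduces to an exact multiplier; raising either parameter removes more low-order work at the cost of accuracy. Note that the LOA can overestimate as well as underestimate a sum, while pure truncation only ever underestimates.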

Design variants

We introduce three levels of approximation:
a. Low Approximation (LA)
Partial product truncation: none or minimal
Uses exact compressor tree and exact adders
Targeted for near-accurate operations with modest gains in power and area
b. Medium Approximation (MA)
Truncates up to 25% of partial products
Uses approximate compressors with limited carry propagation
Approximate accumulation with tunable adder stages
c. High Approximation (HA)
Truncates over 50% of partial products
Aggressive use of approximate compressors and accumulators
Suitable for applications tolerant to high error margins (e.g., deep learning inference)
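The variants above express truncation as a share of the partial products. One way to turn such a budget into a concrete column count, assuming the percentages count individual partial-product bits of an n × n AND array and that truncation removes whole least-significant columns, is to count bits per column: column c holds c + 1 bits for c < n and 2n - 1 - c bits above that. The helper names below are hypothetical.

```python
def bits_in_column(c: int, n: int) -> int:
    """Number of partial-product bits in column c of an n x n AND array."""
    if c < 0 or c > 2 * n - 2:
        return 0
    return c + 1 if c < n else 2 * n - 1 - c


def truncated_fraction(t: int, n: int) -> float:
    """Fraction of the n*n partial-product bits removed by truncating columns 0..t-1."""
    return sum(bits_in_column(c, n) for c in range(t)) / (n * n)


def max_columns_within(target: float, n: int) -> int:
    """Largest t whose truncation removes at most `target` of the bits (MA-style budget)."""
    t = 0
    while t < 2 * n - 1 and truncated_fraction(t + 1, n) <= target:
        t += 1
    return t


def min_columns_exceeding(target: float, n: int) -> int:
    """Smallest t whose truncation removes more than `target` of the bits (HA-style budget)."""
    for t in range(2 * n):
        if truncated_fraction(t, n) > target:
            return t
    raise ValueError("target must be below 1.0")


n = 8
ma_t = max_columns_within(0.25, n)
ha_t = min_columns_exceeding(0.50, n)
print(f"MA: truncate {ma_t} columns, HA: truncate {ha_t} columns")
```

For n = 8 (64 partial-product bits), a 25% budget allows truncating 5 columns (15 bits), while exceeding 50% requires truncating 8 columns (36 bits). Because column sizes grow toward the middle of the array, the achievable fractions are coarse, which is one reason a design might also truncate individual bits rather than whole columns.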

FPGA-aware optimizations

The architecture has been optimized for FPGA implementation with the following techniques:
LUT-level mapping: Approximate logic is mapped directly to FPGA LUTs to minimize logic depth and maximize parallelism.
DSP block bypass: Selectively avoids hard DSP blocks, implementing multiplication in fabric logic to conserve power and keep the DSP blocks available for more critical tasks.
Pipelining support: Optional pipeline stages shorten the critical path and enable higher operating frequencies.
Partial reconfiguration: Allows switching between exact and approximate modes based on application needs [4-6].
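To make LUT-level mapping concrete: on FPGAs with 6-input LUTs (such as the Xilinx LUT6 primitive), any Boolean function of up to six inputs fits in one LUT. A 3 × 3 multiplier has six operand bits and a six-bit product (7 × 7 = 49 < 64), so each product bit maps to a single LUT6 with one logic level. The sketch computes the 64-bit INIT value for each product bit, where bit k of INIT is the output for input address k = {I5..I0}. The pin assignment I0..I2 = a, I3..I5 = b is an assumption, and an approximate truth table over the same six inputs would cost the same one LUT per output bit.

```python
def lut6_init(func) -> int:
    """Pack a 6-input Boolean function into a 64-bit LUT6 INIT value.

    Bit k of INIT holds the LUT output for input address k = {I5,...,I0}.
    """
    init = 0
    for addr in range(64):
        if func(addr):
            init |= 1 << addr
    return init


def product_bit(bit: int):
    """Boolean function for product bit `bit` of an exact 3x3 unsigned multiply.

    Hypothetical pin mapping: I0..I2 = a[0..2], I3..I5 = b[0..2].
    """
    def f(addr: int) -> int:
        a = addr & 0b111
        b = (addr >> 3) & 0b111
        return ((a * b) >> bit) & 1
    return f


# A 3x3 product is at most 7 * 7 = 49, so it has 6 output bits, and each
# bit depends on at most the 6 operand bits: one LUT6 per output bit.
INITS = [lut6_init(product_bit(bit)) for bit in range(6)]

for bit, init in enumerate(INITS):
    print(f"P[{bit}]: INIT = 64'h{init:016X}")
```

Replacing `product_bit` with an approximate function of the same inputs changes only the INIT constants, not the LUT count, so savings from approximation at this size come from reducing the number of inputs or outputs that need LUTs.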

Error and performance metrics

Each approximate multiplier design is characterized using the following metrics:
Mean Relative Error (MRE)
Normalized Mean Square Error (NMSE)
Peak Signal-to-Noise Ratio (PSNR) (for image-based tasks)
Power-Delay Product (PDP)
FPGA resource utilization (LUTs, FFs, DSPs)
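For small operand widths, MRE and NMSE can be computed exhaustively over every input pair. The sketch below assumes a truncation-only approximate multiplier (the hypothetical `truncated_multiply`), defines MRE over nonzero exact products only, and normalizes NMSE by the sum of squared exact products; other works normalize differently, so these definitions are assumptions rather than the exact formulas used in this study.

```python
def truncated_multiply(a: int, b: int, n: int, t: int) -> int:
    """Hypothetical unsigned n x n multiply that drops partial-product columns below t."""
    keep = ~((1 << t) - 1)
    return sum((a << i) & keep for i in range(n) if (b >> i) & 1)


def error_metrics(n: int, t: int) -> dict[str, float]:
    """Exhaustively compare the truncated multiplier with the exact product."""
    sq_err = 0
    sq_exact = 0
    rel_err_sum = 0.0
    nonzero = 0
    for a in range(1 << n):
        for b in range(1 << n):
            exact = a * b
            err = exact - truncated_multiply(a, b, n, t)
            sq_err += err * err
            sq_exact += exact * exact
            if exact != 0:  # relative error is undefined for a zero product
                rel_err_sum += abs(err) / exact
                nonzero += 1
    return {"MRE": rel_err_sum / nonzero, "NMSE": sq_err / sq_exact}


for t in (0, 2, 4, 6):
    m = error_metrics(6, t)
    print(f"t={t}: MRE={m['MRE']:.4f}  NMSE={m['NMSE']:.2e}")
```

Exhaustive evaluation is only practical up to roughly 8 to 12 bits in pure Python; wider multipliers are usually characterized with random sampling instead.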

Design of High Order Multiplier Algorithm

In FPGA-based hardware accelerators, directly implementing high-order multipliers (e.g., 32×32 or 64×64 bits) is resource-intensive and can increase critical path delay, power consumption, and routing complexity. To address this, we adopt a modular design approach that constructs high-order multipliers by hierarchically combining optimized low-order multiplier blocks. This method improves scalability and resource efficiency and supports the integration of approximate computation where applicable [7-10].

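Modular construction strategy

This design approach decomposes a high-order multiplication into multiple low-order multiplications, accumulation stages, and bit-shifting operations. The general method is as follows.
Let A and B be two n-bit operands, where n = 2k. Split A and B into k-bit high and low halves:

$$A = A_H \cdot 2^{k} + A_L, \qquad B = B_H \cdot 2^{k} + B_L$$

so that the product becomes

$$A \cdot B = A_H B_H \cdot 2^{2k} + \left(A_H B_L + A_L B_H\right) \cdot 2^{k} + A_L B_L$$

This decomposition requires:
Four low-order (k × k) multipliers
Two adders to combine the 2k-bit partial products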
Bit-shifting logic and final accumulation
Each of the four products can be mapped to accurate or approximate multipliers based on positional significance.
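The decomposition can be checked in a few lines of Python. In this sketch, `decomposed_multiply` takes one multiplier function per positional role, so the least significant product A_L·B_L can be approximate while the others stay exact, and `hierarchical_multiply` applies the split recursively down to 8-bit leaf blocks. The function names, the 8-bit leaf width, and the truncated leaf multiplier are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable

Mul = Callable[[int, int], int]


def decomposed_multiply(a: int, b: int, k: int, mul_hh: Mul, mul_cross: Mul, mul_ll: Mul) -> int:
    """Compute a*b for 2k-bit operands from four k x k products.

    mul_hh handles A_H*B_H, mul_cross the two cross products, and mul_ll
    A_L*B_L, so each can be exact or approximate by positional weight.
    """
    mask = (1 << k) - 1
    a_h, a_l = a >> k, a & mask
    b_h, b_l = b >> k, b & mask
    p_hh = mul_hh(a_h, b_h)
    p_ll = mul_ll(a_l, b_l)
    cross = mul_cross(a_h, b_l) + mul_cross(a_l, b_h)  # adder 1
    # p_ll < 2**(2k), so (p_hh << 2k) + p_ll is a concatenation in hardware;
    # adder 2 accumulates the cross term shifted by k bits.
    return (p_hh << (2 * k)) + p_ll + (cross << k)


def exact(x: int, y: int) -> int:
    return x * y


def hierarchical_multiply(a: int, b: int, n: int, leaf_bits: int = 8) -> int:
    """Recursively build an n x n multiplier (n = leaf_bits * 2**m) from leaf blocks."""
    if n <= leaf_bits:
        return exact(a, b)  # stands in for an optimized low-order multiplier block
    k = n // 2

    def sub(x: int, y: int) -> int:
        return hierarchical_multiply(x, y, k, leaf_bits)

    return decomposed_multiply(a, b, k, sub, sub, sub)


def truncated_8x8(x: int, y: int, t: int = 4) -> int:
    """Hypothetical approximate leaf: 8x8 multiply dropping columns below t."""
    keep = ~((1 << t) - 1)
    return sum((x << i) & keep for i in range(8) if (y >> i) & 1)


a, b = 0xBEEF, 0xCAFE
print(decomposed_multiply(a, b, 8, exact, exact, exact) == a * b)
print(a * b - decomposed_multiply(a, b, 8, exact, exact, truncated_8x8))
print(hierarchical_multiply(0xDEADBEEF, 0xCAFEBABE, 32) == 0xDEADBEEF * 0xCAFEBABE)
```

Because only A_L·B_L is approximated here, and truncating columns 0 to 3 removes at most 1 + 2·2 + 3·4 + 4·8 = 49, the absolute error of the 16-bit result in this sketch never exceeds 49, a small fraction of the product for all but the smallest operands.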

Results and Discussion

This section presents a comparative analysis of the proposed accurate and approximate multipliers implemented on FPGA hardware. The multipliers were evaluated on area utilization (LUTs, registers), power consumption, critical path delay, and, for the approximate designs, accuracy. Synthesis and simulation were performed with the Xilinx Vivado Design Suite.

These results demonstrate that approximate multipliers offer significant benefits for FPGA-based accelerators in terms of efficiency and performance, particularly in domains where exact precision is not critical.

Applications

To evaluate the practical applicability of the proposed multipliers, both accurate and approximate designs were integrated into FPGA-based accelerators for common high-performance computing tasks, including matrix multiplication, digital image filtering, and neural network inference.

Matrix multiplication

Using the multipliers in a matrix multiplication engine showed that approximate variants:
Reduced computation latency by up to 38%
Enabled higher parallelism due to lower area usage
Maintained result fidelity with less than 3% average numerical error

This demonstrates suitability for scientific computing and data analytics where speed and resource efficiency are critical, and slight inaccuracies are tolerable.
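In a matrix multiplication engine, each output element is a multiply-accumulate over one row and one column, so the multiplier design plugs in at a single point. The sketch below, with hypothetical names and synthetic 8-bit matrices, swaps an exact product for a truncation-based approximate one and reports the mean relative error of the result. It models numerical behavior only, not latency or area.

```python
from typing import Callable

Matrix = list[list[int]]


def matmul(A: Matrix, B: Matrix, mul: Callable[[int, int], int]) -> Matrix:
    """Multiply-accumulate matrix product where every product goes through `mul`."""
    inner, cols = len(B), len(B[0])
    return [[sum(mul(row[p], B[p][j]) for p in range(inner)) for j in range(cols)]
            for row in A]


def mean_relative_error(exact: Matrix, approx: Matrix) -> float:
    """Average |exact - approx| / |exact| over nonzero entries of the exact result."""
    errs = [abs(e - x) / abs(e)
            for er, xr in zip(exact, approx) for e, x in zip(er, xr) if e != 0]
    return sum(errs) / len(errs) if errs else 0.0


def truncated_mul(x: int, y: int, n: int = 8, t: int = 4) -> int:
    """Hypothetical approximate 8x8 unsigned multiplier that drops columns below t."""
    keep = ~((1 << t) - 1)
    return sum((x << i) & keep for i in range(n) if (y >> i) & 1)


# Synthetic 16x16 matrices of 8-bit values (placeholders, not benchmark data).
A = [[(7 * i + 3 * j) % 256 for j in range(16)] for i in range(16)]
B = [[(5 * i + 11 * j + 1) % 256 for j in range(16)] for i in range(16)]
exact_C = matmul(A, B, lambda x, y: x * y)
approx_C = matmul(A, B, truncated_mul)
print(f"mean relative error: {mean_relative_error(exact_C, approx_C):.3%}")
```

Because the accumulation itself stays exact, the per-product errors add up across the inner dimension while the exact sums grow at a similar rate, so the relative error of each output element stays close to that of a single product.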

Image processing (Edge detection filter)

When applied to a Sobel edge detection module, the approximate multipliers:
Reduced energy consumption by 27%
Kept output image quality, measured as PSNR, within acceptable visual limits (>30 dB)
Improved throughput by 35%

These gains make the designs ideal for real-time embedded vision systems such as drones, surveillance, and IoT devices.

Neural network inference

Integrated into a fixed-point neural network accelerator (e.g., for MNIST digit classification), the approximate multipliers:
Made inference 22% faster
Incurred a <1% drop in classification accuracy
Reduced power by up to 30%

Accurate multiplier architectures:

Existing accurate multiplier designs on FPGAs include Booth multipliers, Wallace trees, and array multipliers, which trade off area against delay in different ways. Vendor soft IP cores, such as the Xilinx multiplier LogiCORE IP, serve as benchmarks.

Approximate multiplier techniques:

Current approximate multiplier approaches for FPGAs can be categorized by their approximation strategy, for example error-tolerant partial product reduction, truncation, and approximate compressors. Approximate multipliers are most beneficial in applications with inherent error resilience, such as image processing and neural networks.

Performance metrics:

The key evaluation metrics are:

Area:

Measured in LUTs, Flip-Flops (FFs), or slices.

Delay:

Critical Path Delay (CPD) or maximum operating frequency.

Power consumption:

Static and dynamic power.

Accuracy (for approximate multipliers):

Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Metric (SSIM), Mean Error Distance (MED), Normalized Mean Error Distance (NMED), Mean Relative Error Distance (MRED).
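The error-distance metrics and PSNR can be sketched as follows. Assumptions: the approximate multiplier is a hypothetical column-truncation design, NMED normalizes MED by the largest exact product (2^n - 1)^2, and the image test is a brightness scaling (pixel × gain / 256) on a synthetic 8-bit image rather than any benchmark from this work. MRED follows the same loop with |error| / exact in place of |error|.

```python
import math


def truncated_multiply(a: int, b: int, n: int, t: int) -> int:
    """Hypothetical approximate unsigned n x n multiplier: drop columns below t."""
    keep = ~((1 << t) - 1)
    return sum((a << i) & keep for i in range(n) if (b >> i) & 1)


def med_nmed(n: int, t: int) -> tuple[float, float]:
    """Mean error distance and its normalization by the largest exact product."""
    total = 0
    count = 0
    for a in range(1 << n):
        for b in range(1 << n):
            total += abs(a * b - truncated_multiply(a, b, n, t))
            count += 1
    med = total / count
    return med, med / ((1 << n) - 1) ** 2


def psnr(ref: list[int], test: list[int], max_val: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized 8-bit images."""
    mse = sum((r - x) ** 2 for r, x in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


# Image example: scale brightness by gain/256 with exact and approximate products.
pixels = [(3 * i + (i // 64) * 17) % 256 for i in range(64 * 64)]  # synthetic image
gain = 200
exact_img = [(p * gain) >> 8 for p in pixels]
approx_img = [truncated_multiply(p, gain, 8, 6) >> 8 for p in pixels]
print(f"MED, NMED (6-bit, t=3): {med_nmed(6, 3)}")
print(f"PSNR: {psnr(exact_img, approx_img):.1f} dB")
```

PSNR only captures pixel-wise error; SSIM is usually reported alongside it because it is more sensitive to structural distortions such as lost edges.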

Power-Delay-Area Product (PDAP):

A combined metric for overall efficiency.
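PDP and PDAP are simple products of the reported figures. The sketch assumes power in mW, critical path delay in ns, and LUT count as the area measure, so PDP comes out in pJ; the two design entries are placeholder numbers for illustration, not results from this work, and `DesignReport` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class DesignReport:
    """Post-implementation figures for one multiplier design (placeholder values)."""
    name: str
    power_mw: float  # total power: static + dynamic, in mW
    cpd_ns: float    # critical path delay, in ns
    luts: int        # area proxy: LUT count

    @property
    def fmax_mhz(self) -> float:
        return 1000.0 / self.cpd_ns  # 1 / CPD; with CPD in ns, 1000 / CPD gives MHz

    @property
    def pdp_pj(self) -> float:
        return self.power_mw * self.cpd_ns  # mW * ns = pJ

    @property
    def pdap(self) -> float:
        return self.pdp_pj * self.luts  # pJ * LUTs; lower is better


designs = [
    DesignReport("accurate (placeholder)", power_mw=10.0, cpd_ns=5.0, luts=100),
    DesignReport("approximate (placeholder)", power_mw=8.0, cpd_ns=4.0, luts=70),
]
for d in sorted(designs, key=lambda d: d.pdap):
    print(f"{d.name}: fmax={d.fmax_mhz:.0f} MHz  PDP={d.pdp_pj:.1f} pJ  PDAP={d.pdap:.0f}")
```

Because PDAP multiplies all three factors, a design that is slightly slower can still rank first if it saves enough power and area; comparisons are only meaningful when every design is measured under the same tool settings and clock constraints.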

Conclusion

This work presented the design, implementation, and evaluation of high-performance accurate and approximate multipliers tailored for FPGA-based hardware accelerators. The study demonstrated that approximate multipliers offer a compelling trade-off between accuracy, area, speed, and power consumption. Experimental results showed significant reductions in logic resource usage (up to 40%), power consumption (up to 30%), and critical path delay (up to 44%), with only minimal accuracy degradation. When deployed in applications such as matrix multiplication, image processing, and neural network inference, the approximate multipliers maintained acceptable output quality while delivering notable gains in throughput and energy efficiency. These findings indicate that approximate multipliers are highly effective for error-tolerant, performance-critical applications, enabling more efficient FPGA-based accelerator designs for modern computing workloads.

Future work will explore adaptive approximation techniques and dynamic accuracy scaling to further enhance flexibility and efficiency in reconfigurable hardware systems.

References

[1] Gupta V, Mohapatra D, Park S, Raghunathan A, Roy K (2013) IMPACT: IMPrecise adders for low-power approximate computing. Proceedings of the 17th IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 409-414.
[2] Venkatesan R, Agarwal A, Mitra A, Roy K (2011) MACACO: Modeling and analysis of circuits for approximate computing. Proceedings of the International Conference on Computer-Aided Design (ICCAD), pp. 667-673.
[3] Rehman S, Shafique M, Henkel J, Kriebel F (2016) Architectural-space exploration of approximate multipliers. Proceedings of the 53rd Annual Design Automation Conference (DAC), pp. 1-6.
[4] Momeni A, Jamal A, Mohammadi M, Pedram M (2014) Design and analysis of approximate multipliers for energy-efficient systems. IEEE Transactions on Computers 64(4): 1122-1134.
[5] Han J, Orshansky M (2013) Approximate computing: An emerging paradigm for energy-efficient design. European Test Symposium (ETS), pp. 1-6.
[6] Lin CH, Hsiao YH, Wang YS (2015) Low-error and area-efficient approximate multiplier design for error-tolerant applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23(6): 1150-1159.
[7] Kang J, Lee W, Kim J, Kim HJ (2020) FPGA-based hardware acceleration for convolutional neural networks using approximate multipliers. IEEE Access 8: 108420-108430.
[8] Xilinx Inc (2020) Vivado Design Suite User Guide: High-Level Synthesis (UG902).
[9] Synopsys Inc (2021) Design Compiler User Guide.
[10] Mittal S (2016) A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48(4): 1-33.

© 2025 Bollampalli Akshaya. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.
