Inproceedings,

Full-Stack Optimization for CAM-Only DNN Inference

J. de Lima, A. Khan, L. Carro, and J. Castrillon.
Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE), pages 1-6. IEEE (March 2024)

Abstract

The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von-Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to overcome this, but trade-offs involving accuracy, hardware reliability, and scalability for large models remain a challenge. This is because, even in CIM systems, data movement and processing still require considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow to optimize convolutions on APs by reducing the arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data transfers within the memory while addressing accuracy, energy efficiency, and reliability concerns. Concretely, our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators while retaining software accuracy.

BibTeX key: delima_date24
entry type: inproceedings
booktitle: Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE)
year: 2024
month: mar
pages: 1-6
publisher: IEEE
series: DATE'24
location: Valencia, Spain
url: https://ieeexplore.ieee.org/document/10546805

Cite this publication

@inproceedings{delima_date24,
  abstract = {The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von-Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to overcome this, but trade-offs involving accuracy, hardware reliability, and scalability for large models remain a challenge. This is because, even in CIM systems, data movement and processing still require considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow to optimize convolutions on APs by reducing the arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data transfers within the memory while addressing accuracy, energy efficiency, and reliability concerns. Concretely, our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators while retaining software accuracy},
  added-at = {2025-01-02T10:32:49.000+0100},
  author = {de Lima, João Paulo C and Khan, Asif Ali and Carro, Luigi and Castrillon, Jeronimo},
  biburl = {https://puma.scadsai.uni-leipzig.de/bibtex/2fae45901f64615d1a2cd198c6bcc60f0/joca354e},
  booktitle = {Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE)},
  editor = {IEEE},
  interhash = {f614d8cd066cf1bc525bc852b4b4f03e},
  intrahash = {fae45901f64615d1a2cd198c6bcc60f0},
  keywords = {NN accelerator computing-in-memory myown},
  location = {Valencia, Spain},
  month = mar,
  pages = {1-6},
  publisher = {IEEE},
  series = {DATE'24},
  timestamp = {2025-01-02T10:38:29.000+0100},
  title = {Full-Stack Optimization for CAM-Only DNN Inference},
  url = {https://ieeexplore.ieee.org/document/10546805},
  year = 2024
}
