Abstract
The speed of modern computing systems has improved significantly thanks to advances in CMOS technology. However, DRAM has not kept pace with these improvements in bandwidth, latency, or energy consumption, a gap known as the memory wall [1]. FPGAs with high-bandwidth memory (HBM) offer significantly better performance on memory-intensive tasks such as graph processing and machine learning. Leveraging 3D-stacked DRAM on FPGAs makes it possible to realize the Near-Memory Computing (NMC) paradigm, in which selected kernels are offloaded and processed close to the memory. Although NMC accelerators have been studied extensively, there is no established method for determining which application kernels are suitable for execution near the HBM. To fully realize the potential of FPGA-HBM architectures, offloading candidates must be identified without relying on programmers' knowledge, which is a non-trivial task given the complexity of modern applications. To address this issue, we propose a compiler-assisted tool-flow for the automatic selection of kernels to be offloaded.