Abstract
We present a C++14 library for performance portability of scientific computing codes across CPU and GPU architectures. Our library combines generic data structures like vectors, multi-dimensional arrays, maps, graphs, and sparse grids with basic, reusable algorithms like convolutions, sorting, prefix sum, reductions, and scan. The memory layout of the data structures is adapted at compile time using tuples with optional memory mirroring between CPU and GPU. We combine this transparent memory mapping with generic algorithms under two alternative programming interfaces: a CUDA-like kernel interface for multi-core CPUs, Nvidia GPUs, and AMD GPUs, as well as a lambda interface. We validate and benchmark the presented library using micro-benchmarks, showing that the abstractions introduce negligible performance overhead, and we compare performance against the current state of the art.
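To illustrate what a tuple-based, compile-time memory layout could look like, the following is a minimal C++14 sketch and not the library's actual interface. All names in it (`layout_aos`, `layout_soa`, `tuple_vector`, `sum_property`, `fill_and_sum`) are hypothetical and chosen for illustration only. The sketch assumes a container whose storage is either a vector of tuples (array of structures) or a tuple of vectors (structure of arrays), selected by a template tag, so that the same generic algorithm runs unchanged on both layouts. CPU/GPU memory mirroring and the kernel and lambda interfaces are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical layout tags; the names are illustrative assumptions.
struct layout_aos {};  // array of structures
struct layout_soa {};  // structure of arrays

template <typename Layout, typename... Ts>
class tuple_vector;

// Array of structures: one vector whose elements are tuples.
template <typename... Ts>
class tuple_vector<layout_aos, Ts...> {
    std::vector<std::tuple<Ts...>> data_;
public:
    explicit tuple_vector(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }

    // Access property P of element i.
    template <std::size_t P>
    auto& get(std::size_t i) {
        assert(i < size());
        return std::get<P>(data_[i]);
    }
};

// Structure of arrays: one tuple holding a separate vector per property.
template <typename... Ts>
class tuple_vector<layout_soa, Ts...> {
    std::tuple<std::vector<Ts>...> data_;
public:
    explicit tuple_vector(std::size_t n) : data_(std::vector<Ts>(n)...) {}
    std::size_t size() const { return std::get<0>(data_).size(); }

    // Same access syntax as the AoS specialization.
    template <std::size_t P>
    auto& get(std::size_t i) {
        assert(i < size());
        return std::get<P>(data_)[i];
    }
};

// Generic reduction over property P; it does not depend on the layout.
template <std::size_t P, typename Container>
double sum_property(Container& c) {
    double s = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        s += c.template get<P>(i);
    }
    return s;
}

// Fill n elements (property 0 = index as double, property 1 = index as int)
// and return the sum of property 0 for the requested layout.
template <typename Layout>
double fill_and_sum(std::size_t n) {
    tuple_vector<Layout, double, int> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.template get<0>(i) = static_cast<double>(i);
        v.template get<1>(i) = static_cast<int>(i);
    }
    return sum_property<0>(v);
}
```

In this sketch, switching between `fill_and_sum<layout_aos>` and `fill_and_sum<layout_soa>` changes only the storage order in memory; the calling code and the reduction stay the same, which is the sense in which the layout choice is transparent to the algorithm.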