Abstract
We present a design and implementation of distributed sparse block grids that transparently scale from a single CPU to multi-GPU clusters. We support dynamic sparse grids such as those that occur in computer graphics with complex deforming geometries and in multi-resolution numerical simulations. We describe the data structures and algorithms of our approach, focusing on the optimizations required to make them computationally efficient on CPUs and GPUs alike. We provide a scalable implementation in the OpenFPM software library for HPC. We benchmark our implementation on up to 16 Nvidia GTX 1080 GPUs and up to 64 Nvidia A100 GPUs, showing state-of-the-art scalability (68% to 96% parallel efficiency) on three benchmark problems. On a single GPU, our implementation is 14- to 140-fold faster than on a multi-core CPU.