The available benchmarks are:
- Sparse Matrix-Vector Products: Compares the performance of ViennaCL with CUSPARSE and CUSP for a collection of different sparse matrices.
- Sparse Matrix-Matrix Products: Compares the performance of ViennaCL against CUSPARSE, CUSP, and INTEL's MKL library.
- Algebraic Multigrid: A performance overview of algebraic multigrid preconditioners for different hardware.
- Parallel ILU: Conventional incomplete LU factorization preconditioners are sequential by nature and thus hard to parallelize. ViennaCL now includes a recently proposed parallel variant of ILU, which maps much better to massively parallel hardware than conventional ILU factorization preconditioners.
- Pipelined Solvers: Compares iterative solvers without preconditioners against implementations in the libraries CUSP, MAGMA, and PARALUTION.
Benchmark results were obtained on Linux-based machines using ViennaCL 1.7.0. The devices under consideration are a dual-socket INTEL Xeon system with E5-2670 v3 CPUs, a system equipped with an AMD FirePro W9100, a system with an NVIDIA Tesla K20m, and a system equipped with an INTEL Xeon Phi 7120. The OpenMP backend was used for the INTEL devices, the CUDA backend for the NVIDIA GPU, and the OpenCL backend for the AMD GPU. On NVIDIA GPUs, the OpenCL backend and the CUDA backend perform about the same, so the results obtained for NVIDIA GPUs largely reflect the performance of both backends.
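
For readers unfamiliar with the operation measured in the sparse matrix-vector product benchmark, the following is a minimal sketch of y = A * x for a matrix in compressed sparse row (CSR) storage. It is not ViennaCL's implementation: the `CsrMatrix` struct and the `spmv` function are hypothetical names introduced only for illustration, and CSR is assumed as the storage format. The OpenMP pragma merely indicates where a CPU backend could distribute rows across threads; it is ignored if the code is compiled without OpenMP support.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container for illustration only
// (this is NOT ViennaCL's sparse matrix type).
struct CsrMatrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i+1])
  std::vector<std::size_t> col_idx;  // column index of each stored entry
  std::vector<double> values;        // value of each stored entry
};

// Computes y = A * x for a CSR matrix A.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
  assert(A.row_ptr.size() == A.rows + 1);
  assert(x.size() == A.cols);

  std::vector<double> y(A.rows, 0.0);

  // Each row is independent of the others, so rows can be processed in parallel.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(A.rows); ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      sum += A.values[k] * x[A.col_idx[k]];
    }
    y[i] = sum;
  }
  return y;
}
```

As a usage example, the 2-by-3 matrix with rows (1, 0, 2) and (0, 3, 0) is stored as `row_ptr = {0, 2, 3}`, `col_idx = {0, 2, 1}`, `values = {1.0, 2.0, 3.0}`; multiplying it by a vector of ones yields (3, 3). Each stored entry costs one multiply-add but requires loading a value, a column index, and an entry of `x`, which is why this kernel is typically limited by memory bandwidth rather than arithmetic throughput.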