Sparse Matrix-Vector Products
Sparse matrix-vector products are the most important operations for sparse matrices. In particular, iterative solvers rely extensively on fast sparse matrix-vector products for good performance, whether they are unpreconditioned or preconditioned.
Let us first compare the performance of sparse matrix-vector products in double precision from ViennaCL in the standard compressed sparse row (CSR) format. The respective sparse matrix type in ViennaCL is compressed_matrix<T>. The comparison pits the CUDA backend of ViennaCL against the vendor-tuned cuSPARSE library from the CUDA 7 SDK and the CUSP 0.5.0 library, using different sparse matrices from the Florida Sparse Matrix Collection with about one to ten million nonzero entries.
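For readers unfamiliar with the CSR layout behind these measurements, the operation being benchmarked can be sketched on the CPU as follows. This is a minimal illustration only: the struct CsrMatrix and the function csr_spmv are hypothetical names introduced here, and the code is neither ViennaCL's compressed_matrix<T> nor its GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container for illustration
// (not ViennaCL's compressed_matrix<T>).
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // rows + 1 entries; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Computes y = A * x with one sequential pass over the stored nonzeros.
std::vector<double> csr_spmv(const CsrMatrix& A, const std::vector<double>& x) {
    assert(A.row_ptr.size() == A.rows + 1);
    assert(A.col_idx.size() == A.values.size());
    assert(x.size() == A.cols);

    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Each stored nonzero contributes one multiply-add, while the reads from x are indirect through col_idx, which is one reason why the nonzero pattern influences the achieved performance.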
All benchmarks have been carried out on Linux-based machines equipped with an NVIDIA GeForce GTX 750 Ti and an NVIDIA Tesla K20m, respectively. The upper nine matrices have a more regular pattern, whereas the lower eleven matrices have a more irregular pattern. In the geometric mean, ViennaCL's implementation on the GeForce GTX 750 Ti is 33 percent faster than the implementations in cuSPARSE and CUSP. Similarly, ViennaCL's implementation on the Tesla K20m is 25 percent faster than cuSPARSE and 32 percent faster than CUSP.
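A geometric mean over per-matrix results can be computed as in the following sketch. It assumes that each input is a per-matrix speedup ratio (for example, the throughput of one library divided by that of another), so that a result of 1.33 would correspond to "33 percent faster"; the text does not state exactly how the percentages were derived, and the function name geometric_mean is introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of strictly positive ratios, computed via logarithms
// to avoid overflow or underflow of the running product.
double geometric_mean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // the geometric mean is only defined for positive values
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

Unlike the arithmetic mean, the geometric mean treats a speedup of 2 on one matrix and a slowdown to 0.5 on another as cancelling out, which makes it a common choice for averaging ratios across a benchmark suite.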
ViennaCL provides several sparse matrix formats, each best suited to a different nonzero pattern. It is generally hard to predict upfront which sparse matrix format will perform best. Our overall recommendation is to start with the CSR format (compressed_matrix), as this is the most versatile format. To illustrate the differences, sparse matrix-vector products were benchmarked for different sparse matrix types on an AMD FirePro W9100 (Hawaii) and an NVIDIA Tesla K20m.
The results clearly show that there is no universal answer for sparse matrices: each of the sparse matrix formats works best for at least one sparse matrix. Also, the AMD GPU works well for some sparse matrices, and the NVIDIA GPU for others. This emphasizes that the multiple backends and different sparse matrix formats in ViennaCL are a powerful tool for achieving the best possible performance on the devices available on the market.
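One way to see why the nonzero pattern matters for the choice of format is to estimate the storage overhead of a padded layout. The sketch below assumes an ELLPACK-style format, used here only as a familiar example of a format that pads every row to the length of the longest row; the text above does not list which formats were benchmarked. The function name ell_padding_factor is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Given CSR row pointers, returns (rows * longest_row) / nnz, i.e. how many
// entries an ELLPACK-style layout would store per actual nonzero.
// A value of 1.0 means no padding at all.
double ell_padding_factor(const std::vector<std::size_t>& row_ptr) {
    assert(row_ptr.size() >= 2);
    const std::size_t rows = row_ptr.size() - 1;

    std::size_t longest_row = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        longest_row = std::max(longest_row, row_ptr[i + 1] - row_ptr[i]);
    }

    const std::size_t nnz = row_ptr.back() - row_ptr.front();
    assert(nnz > 0);
    return static_cast<double>(rows * longest_row) / static_cast<double>(nnz);
}
```

For a regular pattern with two nonzeros in each of three rows (row pointers 0, 2, 4, 6) the factor is 1.0, whereas a single long row (row pointers 0, 1, 2, 8) raises it to 2.25. A padded format therefore tends to suit regular patterns, while CSR stores only the actual nonzeros regardless of the pattern.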