Pipelined Iterative Solvers
ViennaCL provides pipelined implementations of the CG, BiCGStab, and GMRES iterative solvers that make use of kernel fusion. The main benefit is better performance for smaller matrices, because these implementations require less communication across the PCI-Express bus than conventional implementations.
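To make the idea of pipelining with kernel fusion more concrete, the following is a minimal CPU-only sketch of a pipelined CG iteration. It is not ViennaCL's implementation: the type CsrMatrix and the functions poisson_1d and pipelined_cg are hypothetical names introduced for illustration only. The sketch assumes a symmetric positive definite matrix and uses the identity <r_{i+1}, r_{i+1}> = alpha_i^2 <Ap_i, Ap_i> - <r_i, r_i>, so that all inner products of one iteration can be accumulated in the same sweep as the matrix-vector product. Each of the two loops per iteration stands in for one fused GPU kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal CSR matrix (hypothetical helper type, not a ViennaCL type).
struct CsrMatrix {
  std::size_t rows = 0;
  std::vector<std::size_t> row_ptr;
  std::vector<std::size_t> col_idx;
  std::vector<double> values;
};

// Tridiagonal [-1 2 -1] test matrix (hypothetical helper for illustration).
CsrMatrix poisson_1d(std::size_t n) {
  CsrMatrix A;
  A.rows = n;
  A.row_ptr.push_back(0);
  for (std::size_t i = 0; i < n; ++i) {
    if (i > 0) { A.col_idx.push_back(i - 1); A.values.push_back(-1.0); }
    A.col_idx.push_back(i); A.values.push_back(2.0);
    if (i + 1 < n) { A.col_idx.push_back(i + 1); A.values.push_back(-1.0); }
    A.row_ptr.push_back(A.col_idx.size());
  }
  return A;
}

// Pipelined CG sketch for symmetric positive definite A.
// x holds the initial guess on entry and the approximate solution on exit.
// Returns the number of iterations performed (max_iters if not converged earlier).
std::size_t pipelined_cg(const CsrMatrix& A, const std::vector<double>& b,
                         std::vector<double>& x, double rel_tol,
                         std::size_t max_iters) {
  const std::size_t n = A.rows;
  assert(b.size() == n && x.size() == n);
  std::vector<double> r(n), p(n), Ap(n);
  double rr = 0.0, pAp = 0.0, ApAp = 0.0;

  // "Fused kernel 2": Ap = A*p and all three inner products in one sweep.
  auto spmv_and_dots = [&]() {
    rr = pAp = ApAp = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      double sum = 0.0;
      for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
        sum += A.values[k] * p[A.col_idx[k]];
      Ap[i] = sum;
      ApAp += sum * sum;
      pAp += p[i] * sum;
      rr += r[i] * r[i];
    }
  };

  // Setup: r = b - A*x, p = r.
  for (std::size_t i = 0; i < n; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      sum += A.values[k] * x[A.col_idx[k]];
    r[i] = b[i] - sum;
    p[i] = r[i];
  }
  spmv_and_dots();
  const double rr0 = rr;
  if (rr0 == 0.0) return 0;  // initial guess already solves the system
  double alpha = rr / pAp;
  double beta = (alpha * alpha * ApAp - rr) / rr;  // predicted <r_new,r_new>/<r,r>

  for (std::size_t it = 1; it <= max_iters; ++it) {
    // "Fused kernel 1": all three vector updates in one sweep.
    for (std::size_t i = 0; i < n; ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
      p[i] = r[i] + beta * p[i];
    }
    spmv_and_dots();
    if (std::sqrt(rr) <= rel_tol * std::sqrt(rr0)) return it;
    alpha = rr / pAp;
    beta = (alpha * alpha * ApAp - rr) / rr;
  }
  return max_iters;
}
```

In this formulation only the scalars rr, pAp, and ApAp would have to reach the host once per iteration, whereas a textbook CG implementation needs separate reductions for alpha and beta with vector updates in between. On a GPU the reductions would typically be split into per-workgroup partial sums; that detail is omitted here.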
Performance vs. Problem Size
In the following we present results for finite-element discretizations of the Poisson equation on the unit square using unstructured triangular meshes. The comparison also includes recent versions of CUSP, MAGMA, and PARALUTION. PETSc serves as the baseline for an MPI-based CPU solution on a shared-memory machine and illustrates that CPUs are the best pick for small system sizes.
Performance for Large Systems
The finite-element example in the previous section reflects only one particular sparse matrix pattern. In the following we compare the solver performance per iteration for a selection of sparse matrices from the Florida Sparse Matrix Collection. These matrices are large enough to hide any kernel launch latencies, and the results demonstrate that the pipelined iterative solvers with kernel fusion in ViennaCL are also very competitive for large problem sizes.
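Why large matrices hide kernel launch latencies can be illustrated with a deliberately crude cost model, in which the time per iteration is the sum of a launch term and a memory-traffic term. This model is an assumption made here for illustration and is not the methodology behind the measurements; IterationCost and estimate_iteration_cost are hypothetical names, and all parameters are inputs supplied by the caller rather than measured values.

```cpp
#include <cstddef>

// Crude per-iteration cost estimate (illustrative model, not a measurement).
struct IterationCost {
  double launch_overhead_s;  // number of kernel launches * latency per launch
  double data_traffic_s;     // bytes moved per iteration / memory bandwidth
  double total_s() const { return launch_overhead_s + data_traffic_s; }
  double overhead_fraction() const { return launch_overhead_s / total_s(); }
};

// All arguments are assumptions supplied by the caller.
IterationCost estimate_iteration_cost(std::size_t kernel_launches,
                                      double launch_latency_s,
                                      double bytes_per_iteration,
                                      double bandwidth_bytes_per_s) {
  IterationCost c;
  c.launch_overhead_s = static_cast<double>(kernel_launches) * launch_latency_s;
  c.data_traffic_s = bytes_per_iteration / bandwidth_bytes_per_s;
  return c;
}
```

Under this model the launch term is independent of the system size, while the traffic term grows with the number of nonzeros. Fusing kernels lowers the launch term, which matters most when the traffic term is small, and for sufficiently large matrices the traffic term dominates regardless of the number of launches.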