The basic types were introduced in the previous chapter, so we now move on to the basic BLAS operations. Almost all operations supported by Boost.uBLAS are available, including element-wise operations on vectors. Thus, the Boost.uBLAS documentation also serves as a reference.
ViennaCL provides all vector-vector operations defined at level 1 of BLAS as well as the most important element-wise operations:
Verbal | Mathematics | ViennaCL Code |
---|---|---|
swap | $x \leftrightarrow y$ | `swap(x,y);` |
stretch | $x \leftarrow \alpha x$ | `x *= alpha;` |
assignment | $y \leftarrow x$ | `y = x;` |
multiply add | $y \leftarrow y + \alpha x$ | `y += alpha * x;` |
multiply subtract | $y \leftarrow y - \alpha x$ | `y -= alpha * x;` |
inner dot product | $\alpha \leftarrow x^{\mathrm{T}} y$ | `alpha = inner_prod(x,y);` |
$L^1$ norm | $\alpha \leftarrow \lVert x \rVert_1$ | `alpha = norm_1(x);` |
$L^2$ norm | $\alpha \leftarrow \lVert x \rVert_2$ | `alpha = norm_2(x);` |
$L^\infty$ norm | $\alpha \leftarrow \lVert x \rVert_\infty$ | `alpha = norm_inf(x);` |
index of $L^\infty$ norm | $i \leftarrow \operatorname{argmax}_j \lvert x_j \rvert$ | `i = index_norm_inf(x);` |
plane rotation | $(x, y) \leftarrow (\alpha x + \beta y,\; -\beta x + \alpha y)$ | `plane_rotation(a, b, x, y);` |
elementwise product | $y_i \leftarrow x_i \cdot z_i$ | `y = element_prod(x,z);` |
elementwise division | $y_i \leftarrow x_i / z_i$ | `y = element_div(x,z);` |
elementwise power | $y_i \leftarrow x_i^{z_i}$ | `y = element_pow(x,z);` |
elementwise modulus (ints) | $y_i \leftarrow \lvert x_i \rvert$ | `y = element_abs(x);` |
elementwise modulus (floats) | $y_i \leftarrow \lvert x_i \rvert$ | `y = element_fabs(x);` |
elementwise acos | $y_i \leftarrow \arccos(x_i)$ | `y = element_acos(x);` |
elementwise asin | $y_i \leftarrow \arcsin(x_i)$ | `y = element_asin(x);` |
elementwise atan | $y_i \leftarrow \arctan(x_i)$ | `y = element_atan(x);` |
elementwise ceil | $y_i \leftarrow \lceil x_i \rceil$ | `y = element_ceil(x);` |
elementwise cos | $y_i \leftarrow \cos(x_i)$ | `y = element_cos(x);` |
elementwise cosh | $y_i \leftarrow \cosh(x_i)$ | `y = element_cosh(x);` |
elementwise exp | $y_i \leftarrow \exp(x_i)$ | `y = element_exp(x);` |
elementwise floor | $y_i \leftarrow \lfloor x_i \rfloor$ | `y = element_floor(x);` |
elementwise log (base e) | $y_i \leftarrow \ln(x_i)$ | `y = element_log(x);` |
elementwise log (base 10) | $y_i \leftarrow \log_{10}(x_i)$ | `y = element_log10(x);` |
elementwise sin | $y_i \leftarrow \sin(x_i)$ | `y = element_sin(x);` |
elementwise sinh | $y_i \leftarrow \sinh(x_i)$ | `y = element_sinh(x);` |
elementwise sqrt | $y_i \leftarrow \sqrt{x_i}$ | `y = element_sqrt(x);` |
elementwise tan | $y_i \leftarrow \tan(x_i)$ | `y = element_tan(x);` |
elementwise tanh | $y_i \leftarrow \tanh(x_i)$ | `y = element_tanh(x);` |
BLAS level 1 routines mapped to ViennaCL. Note that the free functions reside in namespace `viennacl::linalg`.
The function interface is compatible with Boost.uBLAS, thus allowing quick code migration for Boost.uBLAS users. Element-wise operations and standard operator overloads are available for dense matrices as well. The only dense matrix norm provided is `norm_frobenius()` for the Frobenius norm.
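As a minimal sketch, the following program combines a few of the level 1 operations listed above; the vector sizes and fill values are arbitrary choices for illustration:

```cpp
#include <cstddef>

#include "viennacl/vector.hpp"
#include "viennacl/linalg/inner_prod.hpp"
#include "viennacl/linalg/norm_2.hpp"

int main()
{
  std::size_t N = 10;
  viennacl::vector<float> x = viennacl::scalar_vector<float>(N, 1.0f);
  viennacl::vector<float> y = viennacl::scalar_vector<float>(N, 2.0f);

  float alpha = 1.5f;

  y += alpha * x;                                  // multiply add
  float s   = viennacl::linalg::inner_prod(x, y);  // inner dot product
  float nrm = viennacl::linalg::norm_2(x);         // L2-norm
  y = viennacl::linalg::element_prod(x, y);        // elementwise product

  (void)s; (void)nrm;                              // silence unused-variable warnings
}
```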
Because ViennaCL operations return expression templates rather than plain vector or matrix objects, avoid the use of `auto` for the result type, as this might result in unexpected performance regressions or dangling references.

The interface for level 2 BLAS functions in ViennaCL is similar to that of Boost.uBLAS:
Verbal | Mathematics | ViennaCL Code |
---|---|---|
matrix-vector product | $y \leftarrow A x$ | `y = prod(A, x);` |
matrix-vector product | $y \leftarrow A^{\mathrm{T}} x$ | `y = prod(trans(A), x);` |
inplace mv product | $x \leftarrow A x$ | `x = prod(A, x);` |
inplace mv product | $x \leftarrow A^{\mathrm{T}} x$ | `x = prod(trans(A), x);` |
scaled product add | $y \leftarrow \alpha A x + \beta y$ | `y = alpha * prod(A, x) + beta * y;` |
scaled product add | $y \leftarrow \alpha A^{\mathrm{T}} x + \beta y$ | `y = alpha * prod(trans(A), x) + beta * y;` |
tri. matrix solve | $y \leftarrow A^{-1} x$ | `y = solve(A, x, tag);` |
tri. matrix solve | $y \leftarrow A^{-\mathrm{T}} x$ | `y = solve(trans(A), x, tag);` |
inplace solve | $x \leftarrow A^{-1} x$ | `inplace_solve(A, x, tag);` |
inplace solve | $x \leftarrow A^{-\mathrm{T}} x$ | `inplace_solve(trans(A), x, tag);` |
rank 1 update | $A \leftarrow A + \alpha x y^{\mathrm{T}}$ | `A += alpha * outer_prod(x,y);` |
symm. rank 1 update | $A \leftarrow A + \alpha x x^{\mathrm{T}}$ | `A += alpha * outer_prod(x,x);` |
rank 2 update | $A \leftarrow A + \alpha (x y^{\mathrm{T}} + y x^{\mathrm{T}})$ | `A += alpha * outer_prod(x,y); A += alpha * outer_prod(y,x);` |
BLAS level 2 routines mapped to ViennaCL. Note that the free functions reside in namespace `viennacl::linalg`.

Here, `tag` is one out of `lower_tag`, `unit_lower_tag`, `upper_tag`, and `unit_upper_tag`.
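As an illustration, the following sketch shows typical level 2 calls; the wrapper function and the assumption that `A` is square (and upper triangular for the solve) are ours:

```cpp
#include "viennacl/matrix.hpp"
#include "viennacl/vector.hpp"
#include "viennacl/linalg/prod.hpp"
#include "viennacl/linalg/direct_solve.hpp"

// Sketch: A is assumed square; for the triangular solve it must be upper triangular.
void blas2_examples(viennacl::matrix<float> & A,
                    viennacl::vector<float> & x,
                    viennacl::vector<float> & y,
                    float alpha, float beta)
{
  namespace vl = viennacl::linalg;

  y = vl::prod(A, x);                                      // y = A x
  y = alpha * vl::prod(viennacl::trans(A), x) + beta * y;  // scaled product add
  y = vl::solve(A, x, vl::upper_tag());                    // triangular solve
}
```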
Again, avoid the use of `auto` for the result type, as this might result in unexpected performance regressions or dangling references.

While BLAS levels 1 and 2 are mostly memory-bandwidth-limited, BLAS level 3 is mostly limited by the available computational power of the respective device. Hence, matrix-matrix products regularly show impressive performance gains on mid- to high-end GPUs when compared to a single CPU core (which is admittedly not a fair comparison...).
Again, the ViennaCL API is identical to that of Boost.uBLAS, so comparisons can be carried out immediately, as shown in the tutorial located in `examples/tutorial/blas3.cpp`.
As for performance, ViennaCL yields decent performance gains at BLAS level 3 on mid- to high-end GPUs. However, the highest performance is usually obtained only with careful tuning to the respective target device. ViennaCL provides device-specific kernels that achieve high efficiency without sacrificing the portability of OpenCL. Best performance is usually obtained with the OpenCL backend. The CUDA backend provides good performance for NVIDIA GPUs. The OpenMP backend, however, currently provides only moderate performance and is not recommended for BLAS level 3 operations (or algorithms depending on them).
Verbal | Mathematics | ViennaCL Code |
---|---|---|
matrix-matrix product | $C \leftarrow A B$ | `C = prod(A, B);` |
matrix-matrix product | $C \leftarrow A B^{\mathrm{T}}$ | `C = prod(A, trans(B));` |
matrix-matrix product | $C \leftarrow A^{\mathrm{T}} B$ | `C = prod(trans(A), B);` |
matrix-matrix product | $C \leftarrow A^{\mathrm{T}} B^{\mathrm{T}}$ | `C = prod(trans(A), trans(B));` |
tri. matrix solve | $C \leftarrow A^{-1} B$ | `C = solve(A, B, tag);` |
tri. matrix solve | $C \leftarrow A^{-\mathrm{T}} B$ | `C = solve(trans(A), B, tag);` |
tri. matrix solve | $C \leftarrow A^{-1} B^{\mathrm{T}}$ | `C = solve(A, trans(B), tag);` |
tri. matrix solve | $C \leftarrow A^{-\mathrm{T}} B^{\mathrm{T}}$ | `C = solve(trans(A), trans(B), tag);` |
inplace solve | $B \leftarrow A^{-1} B$ | `inplace_solve(A, B, tag);` |
inplace solve | $B \leftarrow A^{-\mathrm{T}} B$ | `inplace_solve(trans(A), B, tag);` |
inplace solve | $B \leftarrow A^{-1} B^{\mathrm{T}}$ | `inplace_solve(A, trans(B), tag);` |
inplace solve | $B \leftarrow A^{-\mathrm{T}} B^{\mathrm{T}}$ | `inplace_solve(trans(A), trans(B), tag);` |
BLAS level 3 routines mapped to ViennaCL. Note that the free functions reside in namespace `viennacl::linalg`.
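A sketch of typical level 3 calls; the wrapper function and the assumption that `A` is lower triangular for the solve are ours:

```cpp
#include "viennacl/matrix.hpp"
#include "viennacl/linalg/prod.hpp"
#include "viennacl/linalg/direct_solve.hpp"

// Sketch: dense matrices of matching dimensions; A is lower triangular for the solve.
void blas3_examples(viennacl::matrix<float> & A,
                    viennacl::matrix<float> & B,
                    viennacl::matrix<float> & C)
{
  C = viennacl::linalg::prod(A, B);                                  // C = A B
  C = viennacl::linalg::prod(viennacl::trans(A), B);                 // C = A^T B
  C = viennacl::linalg::solve(A, B, viennacl::linalg::lower_tag());  // solve with multiple right-hand sides
}
```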
As before, avoid the use of `auto` for the result type, as this might result in unexpected performance regressions or dangling references.

The most important operation on sparse matrices is the sparse matrix-vector product, which is available for all sparse matrix types in ViennaCL. An example for `compressed_matrix<T>` is as follows:
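A minimal sketch; the matrix entries and sizes are arbitrary placeholders:

```cpp
#include <cstddef>
#include <map>
#include <vector>

#include "viennacl/compressed_matrix.hpp"
#include "viennacl/vector.hpp"
#include "viennacl/linalg/prod.hpp"

int main()
{
  std::size_t N = 5;

  // Assemble a sparse matrix on the host (arbitrary entries for illustration) ...
  std::vector<std::map<unsigned int, float> > host_A(N);
  for (unsigned int i = 0; i < N; ++i)
    host_A[i][i] = 2.0f;

  // ... and copy it over to the device
  viennacl::compressed_matrix<float> A(N, N);
  viennacl::copy(host_A, A);

  viennacl::vector<float> x = viennacl::scalar_vector<float>(N, 1.0f);
  viennacl::vector<float> y(N);

  y = viennacl::linalg::prod(A, x);   // sparse matrix-vector product
}
```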
For best performance we recommend `compressed_matrix<T>`, `hyb_matrix<T>`, or `sliced_ell_matrix<T>`. Unfortunately, it is not possible to predict the fastest matrix type in general, so a certain amount of trial and error by the library user is required.
Have a look at the ViennaCL Benchmark Section for a performance comparison of sparse matrix-vector products for different storage formats.
Several graph algorithms and multigrid methods require the multiplication of two sparse matrices. ViennaCL provides an implementation based on the row-merge algorithm developed by Gremse et al. for NVIDIA GPUs [15]. We extended the row-merge algorithm and derived high-performance implementations for CPUs, OpenCL-enabled GPUs from AMD, and Intel's Xeon Phi. The calling code, however, remains as simple as possible:
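For instance, assuming `A` and `B` are already assembled `compressed_matrix<T>` objects of compatible dimensions, the product reduces to a single call; the wrapper function is ours:

```cpp
#include "viennacl/compressed_matrix.hpp"
#include "viennacl/linalg/prod.hpp"

// Sketch: A and B are assembled compressed matrices of compatible dimensions.
void spgemm_example(viennacl::compressed_matrix<float> const & A,
                    viennacl::compressed_matrix<float> const & B,
                    viennacl::compressed_matrix<float>       & C)
{
  C = viennacl::linalg::prod(A, B);   // sparse matrix-matrix product
}
```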
Have a look at the ViennaCL Benchmark Section for a performance comparison of sparse matrix-matrix products.
Note that sparse matrix-matrix products are only available for `compressed_matrix<T>`; other sparse matrix formats do not allow for a high-performance implementation.

For many algorithms it is of interest to extract a single row or column of a dense matrix, or to access the matrix diagonal. This is provided in the same way as for Boost.uBLAS through the free functions `row()`, `column()`, and `diag()`:
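A sketch of their use; the wrapper function and the assumption that `A` is at least 4-by-4 are ours, and the free functions are assumed to reside in namespace `viennacl`:

```cpp
#include "viennacl/matrix.hpp"
#include "viennacl/vector.hpp"

// Sketch: extract rows, columns, and the diagonal of a dense matrix A
// (A is assumed to be at least 4x4).
void extraction_examples(viennacl::matrix<float> const & A)
{
  viennacl::vector<float> r = viennacl::row(A, 3);     // fourth row of A
  viennacl::vector<float> c = viennacl::column(A, 0);  // first column of A
  viennacl::vector<float> d = viennacl::diag(A);       // main diagonal of A
}
```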
The function `diag()` can also be used to create a matrix which has the provided vector entries on an off-diagonal:
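A sketch under the assumption that `diag()` takes the diagonal offset as an optional second argument, analogous to MATLAB:

```cpp
#include "viennacl/matrix.hpp"
#include "viennacl/vector.hpp"

int main()
{
  // Place a vector of length 4 on the first superdiagonal (offset +1),
  // yielding a 5x5 matrix, analogous to MATLAB's diag(v, 1).
  viennacl::vector<float> v = viennacl::scalar_vector<float>(4, 1.0f);
  viennacl::matrix<float> M = viennacl::diag(v, 1);
}
```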
This is similar to MATLAB's `diag()` function.
Operations such as the vector operation `x = y + z;` in ViennaCL require that the numeric type (i.e. the first template parameter of the type) is the same for all operands. Thus, it is not possible to, for example, mix single-precision floating point vectors with integer vectors within the same operation. However, it is possible to convert vectors and dense matrices by direct assignment. An example code snippet for vectors is as follows:
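A minimal sketch of such a conversion; the sizes and values are arbitrary:

```cpp
#include <cstddef>

#include "viennacl/vector.hpp"

int main()
{
  std::size_t N = 10;

  viennacl::vector<float>  x_float = viennacl::scalar_vector<float>(N, 1.0f);
  viennacl::vector<double> x_double(N);

  x_double = x_float;   // converts each entry from float to double
}
```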