README_linalg
Linear algebra operations form the backbone of most computational components in any machine learning library. However, writing all of the required linear algebra operations from scratch is redundant and undesirable, especially when excellent open source alternatives exist. In Shogun, we prefer:

- Eigen3 for its speed and simplicity at the usage level,
- ViennaCL version 1.5 for GPU-powered linear algebra operations.
For Shogun maintainers, however, relying on different external libraries for different operations can become painful.

- For example, consider part of an algorithm originally written with the Eigen3 API. A Shogun user may wish to run that algorithm with ViennaCL instead, hoping for better performance on a GPU-powered platform. There is no way to do so without the developers rewriting the algorithm using ViennaCL, which leads to duplicated code and effort.
- There is also no way for developers to compare the performance of different external linear algebra libraries on the same algorithm in Shogun code.
- Moreover, a new developer has to invest a significant amount of time and effort to learn each of these external APIs just to add a new algorithm to Shogun.
Shogun's internal linear algebra library (referred to as `linalg` hereinafter) is a work-in-progress attempt to overcome these issues. We designed `linalg` as a modularized internal library in order to

- provide a uniform API that lets Shogun developers choose any supported backend without worrying about syntactical differences between the external libraries' operations,
- fix the backend for each operation at compile time (for lower runtime overhead), which is why `linalg` is intended to be used internally by Shogun developers,
- allow Shogun developers to add new linear algebra backend plug-ins easily.
Users can switch between `linalg` backends via the global variable `sg_linalg`.

- Shogun uses the Eigen3 backend as the default linear algebra backend.
- Enabling the GPU backend allows data transfer between CPU and GPU, as well as operations on the GPU. The ViennaCL (GPU) backend can be enabled by assigning a new ViennaCL backend instance to `sg_linalg`, or disabled again by passing `nullptr`:
```cpp
// Enable the ViennaCL GPU backend
sg_linalg->set_gpu_backend(new LinalgBackendViennaCL());
// Disable the GPU backend again
sg_linalg->set_gpu_backend(nullptr);
```
- Though backends can be extended, only one CPU backend and one GPU backend may be registered at a time.
The `linalg` library works with both `SGVector` and `SGMatrix` objects. The operations can be called as follows:
```cpp
#include <shogun/mathematics/linalg/LinalgNamespace.h>

shogun::linalg::operation(args)
```
- To use `linalg` operations on GPU data (vectors or matrices) and to transfer data between CPU and GPU, one can call the `to_gpu` and `from_gpu` methods. Users should pre-allocate memory to complete the transfer.
```cpp
// Pre-allocate SGVectors a, b, and c
SGVector<int32_t> a(size), b, c;

// Initialize SGVector a
// (SGVectors and SGMatrices are initialized on the CPU by default)
a.range_fill(0);

// Copy values from the CPU SGVector (a) to the GPU SGVector (b) using the registered GPU backend
// If SGVector a is already on the GPU: no operation is performed
// If no GPU backend is available: a is shallow-copied to b
to_gpu(a, b);

// Copy values from the GPU SGVector (b) back to the CPU SGVector (c) using the registered GPU backend
// If SGVector b is already on the CPU: no operation is performed
// If no GPU backend is available: an error is raised
from_gpu(b, c);

// Transfer values from CPU to GPU in place
to_gpu(a);
```
- The `to_gpu` and `from_gpu` methods are atomic.
- An operation is carried out on the GPU only if the data passed to it is on the GPU and a GPU backend is registered (i.e. `sg_linalg->get_gpu_backend()` returns a backend). If the data is on the CPU, the `linalg` operation is carried out on the CPU.
- `linalg` reports an error if the data is on the GPU but no GPU backend is available anymore. Errors also occur when an operation requires multiple inputs that are not on the same backend.
- The location of the data can be checked with `data.on_gpu()`: `true` means the data is on the GPU and `false` means it is on the CPU.
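As a minimal sketch of how these rules combine in practice, the snippet below transfers a vector to the GPU, scales it, and uses `on_gpu()` to decide whether the result still needs to be copied back. Only calls shown elsewhere in this document are used; the variable names are illustrative.

```cpp
#include <shogun/lib/SGVector.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>

using namespace shogun;

SGVector<float32_t> a(5), a_dev;
a.range_fill(0);

// Shallow copy if no GPU backend is registered; real transfer otherwise
linalg::to_gpu(a, a_dev);

// Runs on the GPU only if a_dev actually lives on the GPU
auto scaled = linalg::scale(a_dev, 2.0);

SGVector<float32_t> result;
if (scaled.on_gpu())
    linalg::from_gpu(scaled, result); // result was computed on the GPU
else
    result = scaled;                  // everything stayed on the CPU
```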
Here we show how to compute a vector dot product with `linalg` operations on the CPU and the GPU.
```cpp
// CPU dot operation
#include <shogun/lib/SGVector.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>

using namespace shogun;

// Create SGVectors
const index_t size = 3;
SGVector<int32_t> a(size), b(size);
a.range_fill(0);
b.range_fill(0);

auto result = linalg::dot(a, b);
```
```cpp
// GPU dot operation
#include <shogun/lib/SGVector.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>
#include <shogun/mathematics/linalg/LinalgBackendViennaCL.h>

using namespace shogun;

// Set the GPU backend
sg_linalg->set_gpu_backend(new LinalgBackendViennaCL());

// Create SGVectors
const index_t size = 3;
SGVector<int32_t> a(size), b(size), a_gpu, b_gpu;
a.range_fill(0);
b.range_fill(0);

// Transfer the vectors to the GPU
linalg::to_gpu(a, a_gpu);
linalg::to_gpu(b, b_gpu);

// Run the dot operation
auto result = linalg::dot(a_gpu, b_gpu);
```
If the result is a vector or matrix, it needs to be transferred back to the CPU:
```cpp
#include <shogun/lib/SGVector.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>
#include <shogun/mathematics/linalg/LinalgBackendViennaCL.h>

using namespace shogun;

// Set the GPU backend
sg_linalg->set_gpu_backend(new LinalgBackendViennaCL());

// Create an SGVector
SGVector<float32_t> a(5), a_gpu;
a.range_fill(0);

// Transfer the vector to the GPU
linalg::to_gpu(a, a_gpu);

// Run the scale operation and transfer the result back to the CPU
auto result_gpu = linalg::scale(a_gpu, 0.3);
SGVector<float32_t> result;
linalg::from_gpu(result_gpu, result);
```
`linalg` consists of three groups of components:

- the interface that decides which backend to use for each operation (`LinalgNamespace.h`),
- the structures that serve as the interface to GPU memory libraries (`GPUMemory*.h`),
- the operation implementations for each backend (`LinalgBackend*.h`).
- `LinalgNamespace.h` defines the `linalg` operation interfaces in the `linalg` namespace. Each operation calls the `infer_backend()` method on its inputs to decide which backend to dispatch to.
- `GPUMemoryBase` is a generic base class serving as the GPU memory library interface. GPU data is referred to through a `GPUMemoryBase` pointer once it has been created by the `to_gpu()` method, and is cast back to the specific GPU memory type during operations.
- `GPUMemoryViennaCL` is the ViennaCL-specific GPU memory interface, which defines the operations for accessing and manipulating data on the GPU with ViennaCL.
- `LinalgBackendBase` is the base class for the operations of all backends. The macros in the `LinalgBackendBase` class define the `linalg` operations and data transfer operations that are available in at least one backend.
- `LinalgBackendGPUBase` has two pure virtual methods: `to_gpu()` and `from_gpu()`. `LinalgBackendViennaCL` and other user-defined GPU backend classes must derive from `LinalgBackendGPUBase`, and therefore must implement the GPU transfer methods.
- The `LinalgBackendEigen` and `LinalgBackendViennaCL` classes provide the concrete implementations of the linear algebra operations with the Eigen3 and ViennaCL libraries, respectively.
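To make the dispatch concrete, here is a simplified, purely illustrative sketch of how a `LinalgNamespace.h`-style operation forwards work to the backend chosen by `infer_backend()`. The actual Shogun code uses macros and additional checks, so the signatures below are assumptions, not the real API.

```cpp
// Conceptual sketch only -- not Shogun's actual implementation.
// A namespace-level operation picks a backend based on where its inputs live,
// then forwards the call to that backend.
template <typename T>
T dot(const SGVector<T>& a, const SGVector<T>& b)
{
    // Assumed behaviour of infer_backend(): return the registered GPU backend
    // if both inputs are on the GPU, otherwise the CPU backend.
    LinalgBackendBase* backend = infer_backend(a, b);
    return backend->dot(a, b);
}
```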
The current `linalg` framework allows easy addition of external linear algebra libraries. To add a CPU-based linear algebra library, users only need to derive from `LinalgBackendBase` and re-implement its methods with the new library. For GPU-based libraries, users need to add a new class derived from `LinalgBackendGPUBase`, as well as a GPU memory interface class derived from `GPUMemoryBase`.
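As a rough sketch of such an extension (a minimal outline, assuming a per-type virtual `dot` method similar to what the `LinalgBackendBase` macros generate; the class name `LinalgBackendMyLib` is hypothetical), a new CPU backend could look like this:

```cpp
#include <shogun/lib/SGVector.h>
#include <shogun/mathematics/linalg/LinalgBackendBase.h>

using namespace shogun;

// Hypothetical CPU backend that would delegate to some other linear algebra library.
// Only a dot() overload is sketched; a real backend would implement the operations
// declared by the LinalgBackendBase macros for every supported type.
class LinalgBackendMyLib : public LinalgBackendBase
{
public:
    virtual float32_t dot(const SGVector<float32_t>& a,
                          const SGVector<float32_t>& b) const
    {
        // Call into the new library here; a plain loop stands in for it.
        float32_t result = 0;
        for (index_t i = 0; i < a.vlen; ++i)
            result += a[i] * b[i];
        return result;
    }
};
```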