
Faster matmul for case sparse.coo * np.ndarray #653

Open
HatPdotS opened this issue Mar 22, 2024 · 1 comment
Labels
enhancement Indicates new feature requests

Comments


HatPdotS commented Mar 22, 2024

I am using this module quite heavily in a recent project and found that sparse.matmul could be faster.
I have implemented a faster version for the case matmul(sparse.coo, np.ndarray) using numba, currently for 2D arrays (it could be generalized). I also implemented a parallelized version. The single-threaded version is roughly 3 times faster than sparse.matmul in this case and scales linearly with the number of entries in the sparse matrix. The multi-threaded version has some overhead but scales well for very large matrices (10,000,000+ entries).

Here is the code:

import numba
import numpy as np

@numba.njit()
def matmul_fast(dim1: np.ndarray, dim2: np.ndarray, V: np.ndarray, Arr: np.ndarray, out: np.ndarray):
    # Scatter-accumulate: out[r] += v * Arr[c] for each stored entry (r, c, v).
    for i in range(V.shape[0]):
        out[dim1[i]] += V[i] * Arr[dim2[i]]
    return out

@numba.njit(parallel=True, fastmath=True, boundscheck=False)
def matmul_fast_parallel(dim1: np.ndarray, dim2: np.ndarray, V: np.ndarray, Arr: np.ndarray, out: np.ndarray, n: np.int64 = 100000):
    # Process the entries in chunks of n: each chunk accumulates into a
    # thread-private buffer, and numba reduces the buffers into out.
    f = V.shape[0]
    s = out.shape
    for i_0 in numba.prange(0, f // n + 1):
        i = i_0 * n
        out_loc = np.zeros(s, dtype=np.float64)
        for g in range(i, min(i + n, f)):
            out_loc[dim1[g]] += V[g] * Arr[dim2[g]]
        out += out_loc
    return out

These functions can be driven by a non-numba wrapper that unpacks the sparse array object into coordinates, data, and shape.

def matmul_sparse(Arr_sparse, Arr):
    # Unpack the COO object into row/column coordinates and data,
    # then dispatch to the compiled kernel.
    out = np.zeros((Arr_sparse.shape[0], Arr.shape[1]))
    return matmul_fast_parallel(Arr_sparse.coords[0], Arr_sparse.coords[1], Arr_sparse.data, Arr, out)
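For reference, the same scatter-accumulate product can be sketched in pure NumPy (no numba) with np.add.at, which is handy for checking correctness against a dense matmul. The names here (coo_matmul_dense, rows, cols, vals) are illustrative and not part of the library:

```python
import numpy as np

def coo_matmul_dense(rows, cols, vals, B, n_rows):
    """Dense result of a COO matrix (rows, cols, vals) times dense B.

    Same scatter-accumulate as the kernels above:
    out[r] += v * B[c] for every stored entry (r, c, v).
    """
    out = np.zeros((n_rows, B.shape[1]))
    # np.add.at performs an unbuffered add, so repeated row indices
    # accumulate correctly (plain fancy-index += would drop duplicates).
    np.add.at(out, rows, vals[:, None] * B[cols])
    return out

# Small correctness check against an ordinary dense matmul.
rows = np.array([0, 1, 3, 3])
cols = np.array([2, 0, 4, 4])           # note the duplicate (3, 4) entry
vals = np.array([1.5, -2.0, 0.5, 1.0])
A = np.zeros((4, 5))
np.add.at(A, (rows, cols), vals)        # dense equivalent of the COO data
B = np.random.default_rng(0).standard_normal((5, 3))
assert np.allclose(coo_matmul_dense(rows, cols, vals, B, 4), A @ B)
```

This is much slower than the numba kernels for large inputs, but it gives a readable specification of what they compute.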

Are there any planned optimizations that would improve the performance of sparse.matmul?
If not, would there be interest in introducing specialized solutions like this as an option in the current implementation?

@HatPdotS HatPdotS added the enhancement Indicates new feature requests label Mar 22, 2024
@hameerabbasi (Collaborator)

Hello, there's no precedent for multithreaded ops in this library, but please don't hesitate to make a PR for the single-threaded version. I'd be quite happy to accept it.
