python - Improving Performance of Multiplication of Scipy Sparse Matrices -
given scipy csc sparse matrix "sm" dimensions (170k x 170k) 440 million non-null points , sparse csc vector "v" (170k x 1) few non-null points, there can done improve performance of operation:
resul = sm.dot(v)
?
currently it's taking 1 second. initializing matrices csr increased time 3 seconds, csc performed better.
sm matrix of similarities between products , v vector represents products user bought or clicked on. every user sm same.
i'm using ubuntu 13.04, intel i3 @3.4ghz, 4 cores.
researching on read ablas package. typed terminal:
~$ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so
which resulted in:
linux-vdso.so.1 => (0x00007fff56a88000) libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f888137f000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8880fb7000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8880cb1000) /lib64/ld-linux-x86-64.so.2 (0x00007f888183c000)
and understood means i'm using high performance package ablas. i'm still not sure though if package implements parallel computing looks doesn't.
could multi-core processing boost performance? if so, there library helpful in python?
i considering idea of implementing in cython don't know if lead results.
thanks in advance.
the sparse matrix multiplication routines directly coded in c++, , far quick @ source reveals, there doesn't seem hook optimized library. furthermore, doesn't seem taking advantage of fact second matrix vector minimize calculations. can speed things quite bit accessing guts of sparse matrix, , customizing multiplication algorithm. following code in pure python/numpy, , if vector has "a few non-null points" matches speed of scipy's c++ code: if implemented in cython, speed increase should noticeable:
def sparse_col_vec_dot(csc_mat, csc_vec): # row numbers of vector non-zero entries v_rows = csc_vec.indices v_data = csc_vec.data # matrix description arrays m_dat = csc_mat.data m_ind = csc_mat.indices m_ptr = csc_mat.indptr # output arrays sizes = m_ptr.take(v_rows+1) - m_ptr.take(v_rows) sizes = np.concatenate(([0], np.cumsum(sizes))) data = np.empty((sizes[-1],), dtype=csc_mat.dtype) indices = np.empty((sizes[-1],), dtype=np.intp) indptr = np.zeros((2,), dtype=np.intp) j in range(len(sizes)-1): slice_ = slice(*m_ptr[[v_rows[j] ,v_rows[j]+1]]) np.multiply(m_dat[slice_], v_data[j], out=data[sizes[j]:sizes[j+1]]) indices[sizes[j]:sizes[j+1]] = m_ind[slice_] indptr[-1] = len(data) ret = sps.csc_matrix((data, indices, indptr), shape=csc_vec.shape) ret.sum_duplicates() return ret
a quick explanation of going on: csc matrix defined in 3 linear arrays:
data
contains non-zero entries, stored in column major order.indices
contains rows of non-zero entries.indptr
has 1 entry more number of columns of matrix, , items in columnj
found indata[indptr[j]:indptr[j+1]]
, in rowsindices[indptr[j]:indptr[j+1]]
.
so multiply sparse column vector, can iterate on data
, indices
of column vector, , each (d, r)
pair, extract corresponding column of matrix , multiply d
, i.e. data[indptr[r]:indptr[r+1]] * d
, indices[indptr[r]:indptr[r+1]]
.
Comments
Post a Comment