Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LinAlgError: Singular matrix error when running sctransform on multi-sample data #351

Open
aadimator opened this issue Nov 7, 2024 · 0 comments

Comments

@aadimator
Copy link
Contributor

Hi Team, I have multi-sample data, and previously I was trying to run sctransform on that. It was giving out-of-memory error on the server, even when I allocated 500GB memory to it. I also noticed around that time, that I haven't filtered out the non-protein-coding genes. So I proceeded with that first, and filtered out those genes, which reduced my gene count from 26k to 17k. Now, when I'm running the scTransform on it again, I'm getting the following error:

# Let's read the saved data
file_name = "01-msdata_filtered_protein_coding.h5ms"
ms_data = st.io.read_h5ms(here() / config.data_processed / file_name)

# set scope and mode
ms_data.tl.set_scope_and_mode(
    scope=slice_generator[:],
    mode='integrate'
)

ms_data.tl.sctransform()

---

{
	"name": "LinAlgError",
	"message": "Singular matrix",
	"stack": "---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
\"\"\"
Traceback (most recent call last):
  File \"/N/u/mraadam/BigRed200/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/_utils.py\", line 72, in __call__
    return self.func(**kwargs)
  File \"/N/u/mraadam/BigRed200/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py\", line 598, in __call__
    return [func(*args, **kwargs)
  File \"/N/u/mraadam/BigRed200/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py\", line 598, in <listcomp>
    return [func(*args, **kwargs)
  File \"<__array_function__ internals>\", line 180, in inv
  File \"/N/u/mraadam/BigRed200/.conda/envs/stereo_pdac/lib/python3.8/site-packages/numpy/linalg/linalg.py\", line 552, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
  File \"/N/u/mraadam/BigRed200/.conda/envs/stereo_pdac/lib/python3.8/site-packages/numpy/linalg/linalg.py\", line 89, in _raise_linalgerror_singular
    raise LinAlgError(\"Singular matrix\")
numpy.linalg.LinAlgError: Singular matrix
\"\"\"

The above exception was the direct cause of the following exception:

LinAlgError                               Traceback (most recent call last)
Cell In[6], line 7
      1 # set scope and mode
      2 ms_data.tl.set_scope_and_mode(
      3     scope=slice_generator[:],
      4     mode='integrate'
      5 )
----> 7 ms_data.tl.sctransform()

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:223, in MSDataPipeLine.__getattr__.<locals>.temp(*args, **kwargs)
    220     kwargs[\"mode\"] = self.__mode
    222 if kwargs[\"mode\"] == \"integrate\":
--> 223     return self._use_integrate_method(item, *args, **kwargs)
    224 elif kwargs[\"mode\"] == \"isolated\":
    225     self._run_isolated_method(item, *args, **kwargs)

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:136, in MSDataPipeLine._use_integrate_method(self, item, *args, **kwargs)
    133             return new_attr(*args, **kwargs)
    135 logger.info(f'data_obj(idx=0) in ms_data start to run {item}')
--> 136 return new_attr(
    137     ms_data_view.merged_data.__getattribute__(self.__class__.ATTR_NAME),
    138     *args,
    139     **kwargs
    140 )

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/core/st_pipeline.py:43, in logit.<locals>.wrapped(*args, **kwargs)
     41 logger.info('start to run {}...'.format(func.__name__))
     42 tk = tc.start()
---> 43 res = func(*args, **kwargs)
     44 logger.info('{} end, consume time {:.4f}s.'.format(func.__name__, tc.get_time_consumed(key=tk, restart=False)))
     45 return res

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/core/st_pipeline.py:583, in StPipeline.sctransform(self, n_cells, n_genes, filter_hvgs, var_features_n, inplace, res_key, exp_matrix_key, seed_use, filter_raw, **kwargs)
    581 from ..preprocess.filter import filter_genes
    582 data = self.data if inplace else copy.deepcopy(self.data)
--> 583 self.result[res_key] = sc_transform(data, n_cells, n_genes, filter_hvgs, var_features_n,
    584                                     exp_matrix_key=exp_matrix_key, seed_use=seed_use, **kwargs)
    585 key = 'sct'
    586 self.reset_key_record(key, res_key)

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/preprocess/sc_transform.py:25, in sc_transform(data, n_cells, n_genes, filter_hvgs, var_features_n, do_correct_umi, exp_matrix_key, seed_use, **kwargs)
     22     data.exp_matrix = csr_matrix(data.exp_matrix)
     24 # set do_correct_umi as False for less memory cost
---> 25 res = SCTransform(
     26     data.exp_matrix.T.tocsr(),
     27     data.gene_names,
     28     data.cell_names,
     29     n_genes=n_genes,
     30     n_cells=n_cells,
     31     do_correct_umi=do_correct_umi,
     32     return_only_var_genes=filter_hvgs,
     33     variable_features_n=var_features_n,
     34     seed_use=seed_use,
     35     **kwargs
     36 )
     37 new_exp_matrix = res[0][exp_matrix_key]
     38 if issparse(new_exp_matrix):
     39     # data.sub_by_index(gene_index=res[1]['umi_genes'])

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/algorithm/sctransform/sctransform.py:133, in SCTransform(umi, genes, cells, reference_sct_model, do_correct_umi, n_cells, residual_features, variable_features_n, variable_features_rv_th, vars_to_regress, do_scale, do_center, clip_range, conserve_memory, return_only_var_genes, seed_use, **kwargs)
    130     sct_method = 'default'
    132 if sct_method == 'default':
--> 133     vst_out = vst(**vst_args)
    134 else:
    135     raise NotImplementedError

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/algorithm/sctransform/vst.py:132, in vst(umi, genes, cells, latent_var, batch_var, latent_var_nonreg, n_genes, n_cells, method, do_regularize, theta_regularization, res_clip_range, bin_size, min_cells, residual_type, return_cell_attr, return_gene_attr, return_corrected_umi, min_variance, bw_adjust, gmean_eps, theta_estimation_fun, theta_given, exclude_poisson, use_geometric_mean, use_geometric_mean_offset, fix_intercept, fix_slope, scale_factor, vst_flavor, seed_use)
    130 genes_step1_bool_list = np.isin(genes, genes_step1)
    131 start_time = time.time()
--> 132 model_pars = get_model_pars(
    133     genes_step1, bin_size, umi[genes_step1_bool_list,][:, cells_step1_bool_list],
    134     model_str, cells_step1, method, data_step1, theta_given,
    135     theta_estimation_fun, exclude_poisson, fix_intercept, fix_slope, use_geometric_mean,
    136     use_geometric_mean_offset)
    137 logger.info(f'get_model_pars finished, cost {time.time() - start_time} seconds')
    139 min_theta = 1e-07

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/algorithm/sctransform/vst.py:235, in get_model_pars(genes_step1, bin_size, umi, model_str, cells_step1, method, data_step1, theta_given, theta_estimation_fun, exclude_poisson, fix_intercept, fix_slope, use_geometric_mean, use_geometric_mean_offset)
    233     raise NotImplementedError
    234 if method == 'poisson':
--> 235     model_pars = fit_poisson(umi=umi, model_str=model_str, data=data_step1,
    236                              theta_estimation_fun=theta_estimation_fun)
    237 else:
    238     raise NotImplementedError

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/stereo/algorithm/sctransform/utils.py:68, in fit_poisson(umi, model_str, data, theta_estimation_fun)
     65 def fit_poisson(umi, model_str, data, theta_estimation_fun=\"theta.ml\") -> pd.DataFrame:
     66     # TODO: ignore `theta_estimation_fun`
     67     regressor_data = dmatrix(\"~log_umi\", data, return_type='dataframe')
---> 68     results = Parallel(n_jobs=cpu_count(), backend=\"threading\")(
     69         delayed(one_row_fit_poission)(regressor_data, y.toarray()[0], theta_estimation_fun)
     70         for y in umi
     71     )
     72     return pd.DataFrame(results, columns=[\"theta\", \"Intercept\", \"log_umi\"])

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py:2007, in Parallel.__call__(self, iterable)
   2001 # The first item from the output is blank, but it makes the interpreter
   2002 # progress until it enters the Try/Except block of the generator and
   2003 # reaches the first `yield` statement. This starts the asynchronous
   2004 # dispatch of the tasks to the workers.
   2005 next(output)
-> 2007 return output if self.return_generator else list(output)

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py:1650, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1647     yield
   1649     with self._backend.retrieval_context():
-> 1650         yield from self._retrieve()
   1652 except GeneratorExit:
   1653     # The generator has been garbage collected before being fully
   1654     # consumed. This aborts the remaining tasks if possible and warn
   1655     # the user if necessary.
   1656     self._exception = True

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py:1754, in Parallel._retrieve(self)
   1747 while self._wait_retrieval():
   1748 
   1749     # If the callback thread of a worker has signaled that its task
   1750     # triggered an exception, or if the retrieval loop has raised an
   1751     # exception (e.g. `GeneratorExit`), exit the loop and surface the
   1752     # worker traceback.
   1753     if self._aborting:
-> 1754         self._raise_error_fast()
   1755         break
   1757     # If the next job is not ready for retrieval yet, we just wait for
   1758     # async callbacks to progress.

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py:1789, in Parallel._raise_error_fast(self)
   1785 # If this error job exists, immediately raise the error by
   1786 # calling get_result. This job might not exists if abort has been
   1787 # called directly or if the generator is gc'ed.
   1788 if error_job is not None:
-> 1789     error_job.get_result(self.timeout)

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py:745, in BatchCompletionCallBack.get_result(self, timeout)
    739 backend = self.parallel._backend
    741 if backend.supports_retrieve_callback:
    742     # We assume that the result has already been retrieved by the
    743     # callback thread, and is stored internally. It's just waiting to
    744     # be returned.
--> 745     return self._return_or_raise()
    747 # For other backends, the main thread needs to run the retrieval step.
    748 try:

File ~/.conda/envs/stereo_pdac/lib/python3.8/site-packages/joblib/parallel.py:763, in BatchCompletionCallBack._return_or_raise(self)
    761 try:
    762     if self.status == TASK_ERROR:
--> 763         raise self._result
    764     return self._result
    765 finally:

LinAlgError: Singular matrix"
}

Before filtering, this step was working fine, and the OOM error was thrown later on. But now, it's throwing this error. Am I missing some step here, or is this a bug? I'd really appreciate it if someone can tell me a fix for this.

Thanks, and best regards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant