Python bindings for the SparseHC distance matrix clustering algorithm

mdimura/sparsehc-dm

Sparsehc-dm is a Python wrapper for the SparseHC distance matrix clustering algorithm, integrated with STXXL for on-disk sorting. SparseHC is a memory-efficient hierarchical agglomerative clustering implementation. Its memory complexity is close to linear, which enables clustering of ~900000 structures/points with 32 GB of RAM.
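
To put the ~900000 figure in perspective, here is a rough back-of-the-envelope sketch of how large the full set of pairwise distances would be. The 4 bytes per distance is an assumption made for illustration only (index storage is ignored), not a statement about how sparsehc-dm or STXXL store data; `pair_count` is a hypothetical helper.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs (i, j) with i < j among n points."""
    return n * (n - 1) // 2


n_points = 900_000
bytes_per_distance = 4  # assumption: one 32-bit float per distance, no indices

n_pairs = pair_count(n_points)
full_matrix_bytes = n_pairs * bytes_per_distance

print(f"pairs: {n_pairs:,}")
print(f"upper triangle at 4 bytes each: {full_matrix_bytes / 1e12:.2f} TB")
```

Under these assumptions the upper triangle alone is far larger than 32 GB, which is why on-disk sorting matters at this scale.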

Usage example:

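import mdtraj as md
from sparsehc_dm import sparsehc_dm

traj_filename = 'traj.nc'
top_filename = 'top.pdb'

# Load the trajectory together with its topology
traj = md.load(traj_filename, top=top_filename)

# Fill the distance matrix with the upper triangle of pairwise RMSDs
m = sparsehc_dm.InMatrix()
N = traj.n_frames
for i in range(N - 1):
    # RMSD of every frame to frame i
    rmsds = md.rmsd(traj, traj, i)
    for j in range(i + 1, N):
        m.push(i, j, float(rmsds[j]))

# Run complete-linkage hierarchical clustering on the matrix
Z = sparsehc_dm.linkage(m, "complete")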
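
The loop in the usage example pushes each unordered pair of frames exactly once, with i < j. The sketch below reproduces that pattern on a few toy 2D points without requiring mdtraj or sparsehc-dm. `PairCollector` and `fill_upper_triangle` are hypothetical names introduced here; `PairCollector` only mimics the `push(i, j, distance)` call shown above and is not part of the library.

```python
import math


class PairCollector:
    """Hypothetical stand-in for sparsehc_dm.InMatrix: it only records pushes."""

    def __init__(self):
        self.entries = []

    def push(self, i, j, distance):
        self.entries.append((i, j, distance))


def fill_upper_triangle(points, matrix):
    """Push the distance for every pair (i, j) with i < j, once each."""
    n = len(points)
    for i in range(n - 1):
        for j in range(i + 1, n):
            matrix.push(i, j, float(math.dist(points[i], points[j])))


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
collector = PairCollector()
fill_upper_triangle(points, collector)

# 4 points give 4 * 3 / 2 = 6 pairs
print(len(collector.entries))
```

Swapping the Euclidean distance for another pairwise measure (such as the RMSD values in the example above) leaves the loop structure unchanged.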

Installation

Prerequisites: the Boost Graph and STXXL libraries

sudo apt-get install libboost-graph-dev libstxxl-dev libstxxl1

Building:

git clone https://github.com/Burning-Daylight/sparsehc-dm.git sparsehc-dm
cd sparsehc-dm
mkdir build
cd build
cmake ..
make
sudo make install
