sparsehc-dm is a Python wrapper for the SparseHC distance-matrix clustering algorithm, integrated with STXXL for on-disk sorting. SparseHC is a memory-efficient implementation of hierarchical agglomerative clustering. Its memory complexity is close to linear, which makes it possible to cluster ~900000 structures/points with 32 GB of RAM.
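To put the ~900000 figure in perspective, the short sketch below counts the pairwise distances for N points and estimates the size of a dense matrix holding them. The 8 bytes per distance is an assumption made for illustration only (one double-precision value), not a statement about how sparsehc-dm or STXXL store entries; the helper names are hypothetical.

```python
def n_pairs(n):
    """Number of unordered pairs (i < j) among n points: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def dense_matrix_bytes(n, bytes_per_distance=8):
    """Size of all pairwise distances, assuming a fixed size per value.

    bytes_per_distance=8 is an illustrative assumption (one float64),
    not the on-disk format used by sparsehc-dm.
    """
    return n_pairs(n) * bytes_per_distance


if __name__ == "__main__":
    n = 900_000
    print("pairs:", n_pairs(n))
    print("terabytes at 8 bytes each:", dense_matrix_bytes(n) / 1e12)
```

The number of pairs grows quadratically with N, so at this scale the full set of distances is far larger than 32 GB under this assumption.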
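# Example: cluster the frames of an MD trajectory by pairwise RMSD.
import mdtraj as md
from sparsehc_dm import sparsehc_dm

traj_filename = 'traj.nc'
top_filename = 'top.pdb'
traj = md.load(traj_filename, top=top_filename)

m = sparsehc_dm.InMatrix()
n_frames = traj.n_frames
for i in range(n_frames - 1):
    # RMSD of every frame to frame i
    rmsds = md.rmsd(traj, traj, i)
    # Push only the upper triangle (j > i), so each pair is stored once
    for j in range(i + 1, n_frames):
        m.push(i, j, float(rmsds[j]))

# Complete-linkage clustering of the pushed distances
Z = sparsehc_dm.linkage(m, "complete")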
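For readers unfamiliar with the "complete" option, the following is a minimal in-memory sketch of complete-linkage clustering over the same kind of (i, j, distance) triples that are pushed into the matrix. It is only a naive reference for tiny inputs and is not how SparseHC is implemented. The function names and the layout of the returned merge list are hypothetical and are not claimed to match the format of Z returned by sparsehc_dm.linkage.

```python
from itertools import combinations


def pair_triples(values):
    """Yield (i, j, distance) for every i < j over a list of 1-D points."""
    for i, j in combinations(range(len(values)), 2):
        yield i, j, float(abs(values[i] - values[j]))


def complete_linkage(n, triples):
    """Naive complete-linkage clustering; needs a distance for every pair.

    Returns a list of merges (cluster_a, cluster_b, distance, new_size).
    Merged clusters get ids n, n + 1, ... in the order they are created.
    """
    dist = {frozenset((i, j)): d for i, j, d in triples}
    sizes = {i: 1 for i in range(n)}  # active cluster id -> member count
    merges = []
    next_id = n
    while len(sizes) > 1:
        # Pick the closest pair of active clusters
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: dist[frozenset(pair)])
        d = dist[frozenset((a, b))]
        new_size = sizes.pop(a) + sizes.pop(b)
        # Complete linkage: the distance from the merged cluster to any
        # other cluster is the larger of the two previous distances
        for c in sizes:
            dist[frozenset((next_id, c))] = max(dist[frozenset((a, c))],
                                                dist[frozenset((b, c))])
        sizes[next_id] = new_size
        merges.append((a, b, d, new_size))
        next_id += 1
    return merges


if __name__ == "__main__":
    points = [0, 1, 5, 6]
    for merge in complete_linkage(len(points), pair_triples(points)):
        print(merge)
```

This sketch keeps every distance in a dictionary, so it only suits small inputs.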
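# Install the Boost Graph and STXXL dependencies (Debian/Ubuntu)
sudo apt-get install libboost-graph-dev libstxxl-dev libstxxl1
# Fetch the sources, then build and install with CMake
git clone https://github.com/Burning-Daylight/sparsehc-dm.git sparsehc-dm
cd sparsehc-dm
mkdir build
cd build
cmake ..
make
sudo make install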
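After installation, a quick way to check that Python can locate the module is sketched below. It assumes the package is importable under the top-level name sparsehc_dm, as in the usage example; the helper is_installed is hypothetical and uses only the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "sparsehc_dm"  # assumed top-level module name
    print(name, "found" if is_installed(name) else "not found")
```

If the module is reported as not found, the install location may not be on the search path of the Python interpreter in use.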