-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bias potential matrix #11
Comments
Hi @andt88, thanks for opening the issue. I'm sorry for the slow response, I have been travelling :). Yes, you're right about the bias matrix. The cell A0 should contain:
For umbrella sampling, the easiest thing is to define the bias energy of the bin/micro state via the center/median of the bin/micro state, as you have done. See also the following paragraph from the paper on DHAM (which preceded DHAMed) https://pubs.acs.org/doi/full/10.1021/ct500719p
Does you bias matrix have 3 rows and 4 columns, for your 3 states and 4 windows? Do you get a sensible free energy profile? |
Hi Lukas, thanks for your response. My matrix is actually larger. I am trying to get this to run and provide feedback. |
I am now running into a zero division error: dhamed.py:20: RuntimeWarning: divide by zero encountered in log
og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
58
loglike-start nan
nan
Traceback (most recent call last):
File "dhamed.py", line 20, in <module>
og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 182, in run_dhamed
numerical_gradients=numerical_gradients, **kwargs)
File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 220, in min_dhamed_bfgs
fprime=fprime, **kwargs)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 943, in _minimize_bfgs
gfk = myfprime(x0)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
return function(*(wrapper_args + args))
File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 95, in grad_dhamed_likelihood_ref_0
grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero |
Hi @andt88, |
I am using Amber, so the unit of the force constant should be kcal/(mol*A²) and thus that of the potential kcal/mol. |
Could you provide a minimum example which shows the division by zero error? I'd love to get to the bottom of this and a minimum example for debugging would help a lot! |
Hi Lukas, the error appears when I try to calculate the uncertainty. Here's a test case: import numpy as np
import matplotlib.pyplot as plt
from pydhamed import count_matrix
from pydhamed import run_dhamed
import count_transitions as count_transitions
num_umbrella_windows=4
bin_list=[[25.79675, 26.041050000000002], [26.04105, 26.28535], [26.285349999999998, 26.52965], [26.52965, 26.773950000000003], [26.77395, 27.018250000000002], [27.01825, 27.26255], [27.262549999999997, 27.50685], [27.506849999999996, 27.75115], [27.75115, 27.99545], [27.995449999999998, 28.23975]]
n_states=10
bias_mat=[[2.00000000e+00 3.91720050e+00 6.47352162e+00 9.66856338e+00]
[1.14216498e+00 2.66897408e+00 4.83480608e+00 7.63935872e+00]
[5.23059920e-01 1.65947762e+00 3.43482050e+00 5.84888402e+00]
[1.42684820e-01 8.88711120e-01 2.27356488e+00 4.29713928e+00]
[1.03968000e-03 3.56674580e-01 1.35103922e+00 2.98412450e+00]
[9.81245000e-02 6.33680000e-02 6.67243520e-01 1.90983968e+00]
[4.33939280e-01 8.79138000e-03 2.22177780e-01 1.07428482e+00]
[1.00848402e+00 1.92944720e-01 1.58420000e-02 4.77459920e-01]
[1.82175872e+00 6.15828020e-01 4.82361800e-02 1.19364980e-01]
[2.87376338e+00 1.27744128e+00 3.19360320e-01 0.00000000e+00]]
states_list=[[2, 3, 2, 3, 3], [6, 6, 5, 8, 7], [8, 9, 9, 9], [9, 9, 8]]
c_l=[]
print(bin_list)
print(n_states)
print(bias_mat)
print(states_list)
for i in states_list:
traj=np.array(i)
c_ = count_matrix(traj,n_states=n_states)
c_l.append(c_)
v_ar=bias_mat.reshape(n_states,num_umbrella_windows)
og = run_dhamed(c_l, -np.log(v_ar), numerical_gradients=False, jit_gradient=True)
n_blocks = 3
bl_l = []
for bl in np.split(traj[1:],n_blocks):
c_l = [count_matrix(bl,n_states=9)]
bl_g = run_dhamed(c_l, -np.log(v_ar))
bl_l.append(np.exp(-bl_g) / np.sum(np.exp(-bl_g)))
bl_ar = np.column_stack((bl_l))
bl_ln_ar = np.log(bl_ar)
# minmum and maximum values of the PMF (from the five reconstructions) to indicate the uncertainty.
min_pmf = np.min( bl_ln_ar - bl_ln_ar[-1], axis=1)
max_pmf = np.max( bl_ln_ar - bl_ln_ar[-1], axis=1)
bl_ar.shape
dh_log_err = np.std(-np.log(bl_ar), axis=1) / np.sqrt(n_blocks-1)
dh_log_err.shape
fig, ax = plt.subplots(figsize=(5,3))
# error bar is barely visible
#plt.errorbar(rc, rna_pmf, yerr=dh_log_err, fmt=".", capthick=1,capsize=4
rc = np.arange(0,9)
rna_pmf = og*-1
rna_pmf -= rna_pmf[-1]
plt.plot(rc, rna_pmf, ".", label="US DHAMed")
plt.fill_between(rc, min_pmf, max_pmf, alpha=0.4, label=r"min-max PMF $\Delta t=10^4 \, \mathrm{MC}$ ")
ax.set_xlabel("Distance")
ax.set_ylabel("PMF ($\mathrm{k_BT}$)")
ax.set_ylim(-1,13)
ax.set_xlim(-0.4,8.4)
ax.set_yticks([0,5,10])
ax.set_xticks([0,2,4,6,8])
ax.tick_params(direction='out')
plt.legend(borderaxespad=0.1)
plt.tight_layout()
plt.savefig("block_pmf.pdf")``` |
Fantastic, I'm looking into it right now :-) |
Hey, I think the error comes from some frames lying outside of the most left and most right bin. I fixed this by changing the bin distribution. Now I get another errror:
|
Thanks for the update! Two questions 1 You are loop over your umbrella windows, right? for i in states_list:
traj=np.array(i)
c_ = count_matrix(traj,n_states=n_states)
c_l.append(c_) The list of the count matrices grows, while you're iterating. For umbrella sampling, I'd expect one list of count matrices, with one entry for each umbrella window, for instance, for the analysis of umbrella sampling of ion permeation through the GLIC channel, I had: len(transition_count_l) # one transition count matrix for each of the 153 umbrella windows
153 See https://github.com/bio-phys/PyDHAMed/blob/master/pydhamed/glic-ion-channel/glic_ion_channel_permeation.ipynb for reference. I'd thus run the optimization once for all four umbrella windows, i.e., I'm not sure whether 2 Are you sure you have to the take the log of your bias array? The RNA duplex formation example is a bit misleading in that respect. There the bias returned by the simulations was w_i^a = e^(-beta u_i^a). Thus I have to take the log to get u_i^a= - ln(w_i^\a) k_BT$. Note to myself, I need to highlight this in the RNA example. In the GLIC ion channel example, I can directly input the bias u_i^a=k^a*(x_i-x_0^a)^2. |
Thanks for your feedback! I ran an umbrella sampling to calculate the free energy of binding for a ligand to a protein. for i in states_list:
traj=np.array(i)
c_ = count_matrix(traj,n_states=n_states)
c_l.append(c_)
v_ar=bias_mat.reshape(n_states,num_umbrella_windows)
og = run_dhamed(c_l, v_ar, numerical_gradients=False, jit_gradient=True)
n_blocks = 2
bl_l = []
for bl in np.split(traj[1:],n_blocks):
c_l = [count_matrix(bl,n_states=n_states)]
bl_g = run_dhamed(c_l, v_ar)
bl_l.append(np.exp(-bl_g) / np.sum(np.exp(-bl_g)))
bl_ar = np.column_stack((bl_l))
bl_ln_ar = np.log(bl_ar) Traceback (most recent call last):
File "dhamed_block.py", line 25, in <module>
og = run_dhamed(c_l, v_ar, numerical_gradients=False, jit_gradient=True)
File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 182, in run_dhamed
numerical_gradients=numerical_gradients, **kwargs)
File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 220, in min_dhamed_bfgs
fprime=fprime, **kwargs)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 889, in fmin_bfgs
res = _minimize_bfgs(f, x0, args, fprime, callback=callback, **opts)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 964, in _minimize_bfgs
old_fval, old_old_fval, amin=1e-100, amax=1e100)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 783, in _line_search_wolfe12
**kwargs)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 101, in line_search_wolfe1
c1=c1, c2=c2, amax=amax, amin=amin, xtol=xtol)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 175, in scalar_search_wolfe1
derphi1 = derphi(stp)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/linesearch.py", line 90, in derphi
gval[0] = fprime(xk + s*pk, *newargs)
File "/usr/lib64/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
return function(*(wrapper_args + args))
File "/storage/home/local/andtos-loc/Dropbox_2/PhD/IFN/IFN_2018_20/DHAMed/pydhamed/optimize_dhamed.py", line 95, in grad_dhamed_likelihood_ref_0
grad = _loop_grad_dhamed_likelihood_0_jit(grad,g, ip, jp, ti, tj, vi, vj, nijp)
ZeroDivisionError: division by zero |
I think you have to divide your bias array by RT to convert from kcal/mol to kBT. N.B., assuming that your temperature was 300 K and dividing by RT, lets Do you still get the division by zero error on your full data set? Comparison to WHAM provides a quick test, whether your DHAMed results are sensible. You'd expect that many/some features of the WHAM PMF are reproduced by the DHAMed PMF. |
I am currently running it on the full data set. I'll let you know once it's finished. |
Hi Lukas I am still getting the division by zero error on the full data set. Should I send you the complete input? |
Yes, this would be great! I'm very keen to get to the bottom of this problem :) |
I attached a file that contains: |
Hi Lukas, did the file work for you? Do you prefer another format? |
The file is hard to parse for us. Could you please store the arrays using a native numpy function. np.savez('dhamed.npz', bias_matrix=bias_matrix, bin_list=bin_list) This is much easier for us to read again into a python process. The number of states is 1000, right? |
Here it is. I had to label it txt to upload it but the format is npz. Thanks a lot!! |
Thanks a lot! I can open your file and I'm trying to figure out what's going on right now. |
I managed to get DHAMed to work for your data. I made a trivial mistake in my code that affects the setup of calculations when there are empty bins/states. I'm fixing this right now and I will try to update the repository later today. |
d42f2a6 should fix the problem- the problem was a trivial mistake in the handling of empty bins. Have a look at the Jupyter notebook (had to rename it to .txt), whether the result makes sense to you. |
Thank you so much for fixing this! I am running it right now and will let you know then. |
I'm still getting the same error. I downloaded the Master Branch yesterday. Did I get the wrong version? |
Did you update your installation? Just in case you can run |
I'm not sure what it is I am doing wrong but I still get the error. I cloned the master branch and ran |
I'll look into that :) Thanks a lot for letting me know! |
Is there a possibility to check whether the correct version is installed and called? |
You could use ?? to for instance on For nk = check_total_transition_counts(n_out, n_in, paired_ar, n_actual) ` |
I checked thoroughly that I have the correct version. Here's the error I get (the "bin exclusion" seems new to me, so I assume this is from the changes in the code):
|
I noticed that I get different results when running from the jupyter notebook than from the command line:
Jupyter Notebook this:
|
Thanks for the update :) To check, I also ran the notebook on my laptop installing the package again from GitHub git pull origin Output Number of transition pairs 169579 Check that I'm running the correct version. prepare_dhamed.generate_dhamed_input?? |
I realized that I get the error when dividing the bias potential matrix by RT (0.0019). Can you reproduce the error? |
I'll try to have a look at this tmr morning :) |
Where is the 0.0019 coming from? If your energies are in kcal/mol, I'd expect something like 1.98x10^-3 kcal K^-1 mol^-1 x 300 K ~ 0.596 kcal/mol for the factor to divide the bias energies. Do tell me what, I'm missing. |
Thanks for your comment! Looks to me as if I forgot to multipy by T=300K. I will look at it later.
|
Hi @andt88, can I close the issue or are there still problems? |
Sorry I was on vacation until just now. It's working in principle but the PMF looks quite different from the one I get using a TI approach. I will have a closer look in the coming days and keep you posted. Thanks a lot for your help so far!! |
Let me know how it goes :) One thing that would be interesting would be to compare the PMF from DHAMed to WHAM. In our paper we write
|
That looks pretty good to me, thanks for the update! In recent (unpublished) work, I have also found scatter at the ends of DHAMed profiles. I found that all the structures outside the studied range of the reaction coordinate/order parameter to the would be mapped to the structural states/bins with the smallest and largest value of the order parameter. To fix that I added one more state on either end of the PMF, included the state in the DHAMed or WHAM calculations, but left out the state in the final plot (plotting over just the range of interest). |
I tried following your advice, but I'm not sure what it is I am doing wrong. I therefore defined the bins between 21 to 54 Angstrom. As expected, each structure is now assigned to a bin. However the scattering at the edges of the plot remains unchanged. |
Hm, I need to think about this in more detail. At the edge of your reaction coordinate you may have fewer events, which may be one reason for the scattering. |
Thank a lot for your update! That looks very good :) It seems that the padding did not help in your case, which is interesting for the further development of DHAMed. |
Thanks a lot to you and @kain88-de for your help!! |
Hi there again :) |
Hi, I'm sorry for the slow response. pyDHAMed does not include such functions yet, but I'm working on including them.
|
Thanks a lot! Let me know if I can help with testing. |
Hi,
I wanted to make sure that I understand the definition of the bias potential matrix correctly.
Assuming an umbrella sampling experiment with 4 windows A, B, C, D and 3 states 0, 1, 2. The restraining potential defined as k*(x-x0)^2. Should the cell A0 contain the following?
x=median value of bin 0,
x0=target value of window A
The text was updated successfully, but these errors were encountered: