
Running into memory issues when simulating many models with different initial conditions #1225

Open
dalbabur opened this issue May 31, 2024 · 10 comments


@dalbabur

I'm running simulations in parallel on Hyak (UW's HPC cluster) using ipyparallel. I'm able to load and simulate models on many engines, but eventually I run out of memory. After a couple of tests, I believe the memory leak is related to roadrunner and not ipyparallel.

Here is what I'm seeing:

initial load: [screenshot of per-engine memory usage]

and the load eventually after some iterations: [screenshot showing much higher memory usage]

I'm doing something like this in a loop with different parameter sets:

for r in many_r:  # iterate through models
    # first, set parameters (models have ~1k parameters, this takes a while)
    for l, v in zip(parameter_labels, parameter_values):
        r.setValue('init(' + l + ')', v)

    # now, iterate over conditions and set variables;
    # across conditions, models have the same structure but just a couple
    # of different variables (~20 variables)
    rb = r.saveStateS()  # convenient for keeping the newly set parameters,
                         # instead of resetting the model to how it was first loaded
    results = {}
    for condition in conditions:
        r2 = RoadRunner()
        r2.loadStateS(rb)  # this has the new parameters
        # set the condition-specific variables
        for l, v in zip(variable_labels, variable_values):
            r2.setValue('init(' + l + ')', v)
        results[condition] = r2.simulate()
    all_results.append(results)

    # what I've tried to deal with the memory issues
    r2.clearModel()
    del r2, rb

return all_results

And I have these config flags:

from roadrunner import Config, RoadRunner

# keep Python dynamic properties enabled
Config.setValue(Config.ROADRUNNER_DISABLE_PYTHON_DYNAMIC_PROPERTIES, False)
# reuse the cached compiled model instead of recompiling on every load
Config.setValue(Config.LOADSBMLOPTIONS_RECOMPILE, False)
# LLJIT optimization level (higher = more aggressive)
Config.setValue(Config.LLJIT_OPTIMIZATION_LEVEL, 4)
@hsauro

hsauro commented Jun 3, 2024 via email

@dalbabur

dalbabur commented Jun 3, 2024

I'm using libroadrunner 2.5.0.

I haven't tried running it locally yet.

@luciansmith

I can say that the saveState/loadState functions had a bug that was fixed in 2.7.0. It was causing crashes rather than memory leaks, though. But it can't hurt to try the latest version, at least?
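If it helps, a quick way to confirm which build an environment actually picked up (useful on a cluster with several installs; this assumes the package exposes __version__, as recent releases do):

import roadrunner
print(roadrunner.__version__)  # should report 2.7.0 or newer after upgrading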

@dalbabur

dalbabur commented Jun 3, 2024

Sure, I'll update and report back.

@dalbabur

dalbabur commented Jun 4, 2024

Same thing with 2.7.0...

@hsauro

hsauro commented Jun 4, 2024 via email

@luciansmith

Thanks for checking!

I just ran all of roadrunner's C-based tests through valgrind and there were no errors/leaks there, so the problem must lie either in Python directly or in the Python bindings. If you could manage to get something that illustrated the problem and could be run locally, that would be ideal.

@dalbabur

dalbabur commented Jun 5, 2024

Thanks for checking that, Lucian. Working on a minimal example that will show the issue locally...

What are some ways I could check for memory leaks in Python or in the bindings?

@luciansmith

It's possible to run valgrind on Python, but that's going to find issues in even the blandest of scripts. It should also find the leak we're looking for, though. The main thing I can think of is to run the exact same script as on Hyak, but locally (and maybe simplified), and watch it eat memory.
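For example, a minimal local sketch along those lines (the toy model is hypothetical, standing in for the real one, and psutil is assumed installed): compare Python-level allocations against total process RSS. If RSS climbs while the tracemalloc number stays flat, the growth is likely on the native side of the bindings rather than in Python objects.

import os
import tracemalloc
import psutil
import tellurium as te

# toy stand-in model; swap in the real SBML/Antimony model
r = te.loada('S1 -> S2; k1*S1; k1 = 0.1; S1 = 10')
proc = psutil.Process(os.getpid())
tracemalloc.start()

for i in range(2000):
    r.resetAll()
    r.simulate(0, 100, 100)
    if i % 200 == 0:
        py_mb = tracemalloc.get_traced_memory()[0] / 1e6  # Python-side allocations
        rss_mb = proc.memory_info().rss / 1e6             # total process memory
        print(f'iter {i}: python={py_mb:.1f} MB, rss={rss_mb:.1f} MB')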

@VivianeKlingel

I've also had this (or a similar) problem when simulating many times in parallel. My model is of a population of cells, so each model simulation contains ~500 simulations of individual cells. For parameter optimization this is then simulated again 10,000 times. I have this issue on my local machine but also on a cluster (where I first noticed the problem, because it used up all the memory, ~90 GB). The libroadrunner version is 2.7.0.

The only way I found to prevent this memory leak was to use joblib to dump the loaded model and load it only within each child process (short of reloading it every time, which just takes way too long). It's been some time since I looked into it, but I tested many different ways of resetting/clearing the model, changing simulation settings, and different multiprocessing setups, and none worked. At most, the memory went down after one process was done, but went up again immediately when the next simulation started.

I made two minimal examples, one with and one without using joblib, and examined and plotted the memory usage with Memory Profiler.

mprof run --multiprocess --include-children -o "./mprofile_$(date +"%F-%H%M").dat" ./RunMinMemTest.py
mprof plot -o  "./MemPlot_Standard_$(date +"%F-%H%M").png"

Standard parallel simulation:
[memory profile plot: MemPlot_Standard_2024-09-04-1317]

Simulation where the loaded model is dumped and loaded in the child process:
[memory profile plot: MemPlot_Dump_2024-09-04-1337]

Code:

Classic Simulation

from memory_profiler import profile
from roadrunner.tests import TestModelFactory as tmf
import time
import tellurium as te
from concurrent.futures import ProcessPoolExecutor, as_completed


def SimModel(m):
    m.resetAll()
    start_t = 0
    end_t = 250
    steps = 250 * 10
    result = m.simulate(start_t, end_t, steps)
    return


def pSim(r):  # population simulation: many individual simulations
    nSims = 480
    executor = ProcessPoolExecutor()
    Results = []
    futures = (executor.submit(SimModel, r) for n in range(nSims))
    for future in as_completed(futures):
        Results.append(future.result())
        # drop references in an attempt to free memory
        future = []
        Results = []
    return


@profile
def run_sim(r):  # represents optimization with many simulations of the model
    for i in range(10):
        pSim(r)


def main():
    sbml = tmf.Brown2004().str()

    r = te.loadSBMLModel(sbml)
    t1 = time.perf_counter()
    run_sim(r)
    elapsed_time = time.perf_counter() - t1
    print('Time:', elapsed_time, 'sec')


if __name__ == '__main__':
    main()

Simulation With Dumping

from memory_profiler import profile
from roadrunner.tests import TestModelFactory as tmf
from joblib import dump, load
import time
import tellurium as te
from concurrent.futures import ProcessPoolExecutor, as_completed


def SimModel(r_loc):
    m = load(r_loc)  # load the dumped model inside the child process
    start_t = 0
    end_t = 250
    steps = 250 * 10
    result = m.simulate(start_t, end_t, steps)
    return


def pSim(r_loc):  # population simulation: many individual simulations
    nSims = 480
    executor = ProcessPoolExecutor()
    Results = []
    futures = (executor.submit(SimModel, r_loc) for n in range(nSims))
    for future in as_completed(futures):
        Results.append(future.result())
        # drop references in an attempt to free memory
        future = []
        Results = []
    return


@profile
def run_sim(r_loc):  # represents optimization with many simulations of the model
    for i in range(10):
        pSim(r_loc)


def main():
    sbml = tmf.Brown2004().str()

    r = te.loadSBMLModel(sbml)
    r_loc = 'rrmodel_test.joblib'
    dump(r, r_loc)  # dump the loaded model to disk
    r = []          # drop the parent-process reference
    t1 = time.perf_counter()

    run_sim(r_loc)
    elapsed_time = time.perf_counter() - t1
    print('Time:', elapsed_time, 'sec')


if __name__ == '__main__':
    main()
