
Running into memory issues when simulating many models with different initial conditions #1225

Open
dalbabur opened this issue May 31, 2024 · 10 comments


@dalbabur

I'm running simulations in parallel on Hyak (UW's HPC cluster) using ipyparallel. I'm able to load and simulate models on many engines, but eventually I run out of memory. After a couple of tests, I believe the memory leak is related to roadrunner and not ipyparallel.

Here is what I'm seeing:

initial load: [screenshot of per-engine memory usage]

and the load eventually after some iterations: [screenshot showing much higher memory usage]

I'm doing something like this in a loop with different parameter sets:

for r in many_r:  # iterate through models
    # first, set parameters (models have ~1k parameters, this takes a while)
    for l, v in zip(parameter_labels, parameter_values):
        r.setValue('init(' + l + ')', v)

    # now, iterate over conditions and set variables;
    # across conditions, models have the same structure but just a couple
    # of different variables (~20 variables)
    rb = r.saveStateS()  # convenient for keeping the newly set parameters,
                         # instead of resetting the model to how it was first loaded
    results = {}
    for condition in conditions:
        r2 = RoadRunner()
        r2.loadStateS(rb)  # this has the new parameters
        # set the condition-specific variables
        for l, v in zip(variable_labels, variable_values):
            r2.setValue('init(' + l + ')', v)
        results[condition] = r2.simulate()
    all_results.append(results)

    # what I've tried to deal with the memory issues
    r2.clearModel()
    del r2, rb

return all_results

And I have these config flags:

from roadrunner import Config, RoadRunner

# keep Python dynamic properties enabled
Config.setValue(Config.ROADRUNNER_DISABLE_PYTHON_DYNAMIC_PROPERTIES, False)
# reuse the cached compiled model instead of recompiling on every load
Config.setValue(Config.LOADSBMLOPTIONS_RECOMPILE, False)
# LLJIT optimization level (higher = more aggressive)
Config.setValue(Config.LLJIT_OPTIMIZATION_LEVEL, 4)
@hsauro

hsauro commented Jun 3, 2024 via email

@dalbabur

dalbabur commented Jun 3, 2024

I'm using libroadrunner 2.5.0.

I haven't tried running it locally yet.

@luciansmith

I can say that the saveState/loadState functions had a bug that was fixed in 2.7.0. It was causing crashes rather than memory leaks, though. But it can't hurt to try the latest version, at least?
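If it helps, a quick way to confirm which build an environment actually picked up (useful on a cluster with several installs; this assumes the package exposes __version__, as recent releases do):

import roadrunner
print(roadrunner.__version__)  # should report 2.7.0 or newer after upgrading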

@dalbabur

dalbabur commented Jun 3, 2024

Sure, I'll update and report back.

@dalbabur

dalbabur commented Jun 4, 2024

Same thing with 2.7.0...

@hsauro

hsauro commented Jun 4, 2024 via email

@luciansmith

Thanks for checking!

I just ran all of roadrunner's C-based tests through valgrind and there were no errors/leaks there, so the problem must lie either in Python directly or in the Python bindings. If you could manage to get something that illustrated the problem and could be run locally, that would be ideal.

@dalbabur

dalbabur commented Jun 5, 2024

Thanks for checking that, Lucian. Working on a minimal example that will show the issue locally...

What are some ways I could check for memory leaks in Python or in the bindings?

@luciansmith

It's possible to run valgrind on Python, but that's going to find issues in even the blandest of scripts. It should also find the leak we're looking for, though. The main thing I can think of is to run the exact same script as on Hyak, but locally (and maybe simplified), and watch it eat memory.
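For example, a minimal local sketch along those lines (the toy model is hypothetical, standing in for the real one, and psutil is assumed installed): compare Python-level allocations against total process RSS. If RSS climbs while the tracemalloc number stays flat, the growth is likely on the native side of the bindings rather than in Python objects.

import os
import tracemalloc
import psutil
import tellurium as te

# toy stand-in model; swap in the real SBML/Antimony model
r = te.loada('S1 -> S2; k1*S1; k1 = 0.1; S1 = 10')
proc = psutil.Process(os.getpid())
tracemalloc.start()

for i in range(2000):
    r.resetAll()
    r.simulate(0, 100, 100)
    if i % 200 == 0:
        py_mb = tracemalloc.get_traced_memory()[0] / 1e6  # Python-side allocations
        rss_mb = proc.memory_info().rss / 1e6             # total process memory
        print(f'iter {i}: python={py_mb:.1f} MB, rss={rss_mb:.1f} MB')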

@VivianeKlingel

I've also had this (or a similar) problem when simulating many times in parallel. My model is of a population of cells, so each model simulation contains ~500 simulations of individual cells. For parameter optimization this is then simulated again 10,000 times. I have this issue on my local machine but also on a cluster (where I first noticed the problem, because it used up all the memory, ~90 GB). The libroadrunner version is 2.7.0.

The only way I found to prevent this memory leak was to use joblib to dump the loaded model and load it only within each child process (short of reloading it every time, which just takes way too long). It's been some time since I looked into it, but I tested many different ways of resetting/clearing the model, changing simulation settings, and different multiprocessing setups, and none worked. At most, the memory went down after one process was done, but went up again immediately when the next simulation started.

I made two minimal examples, one with and one without using joblib, and examined and plotted the memory usage with Memory Profiler.

mprof run --multiprocess --include-children -o "./mprofile_$(date +"%F-%H%M").dat" ./RunMinMemTest.py
mprof plot -o  "./MemPlot_Standard_$(date +"%F-%H%M").png"

Standard parallel simulation:
[memory profile plot: MemPlot_Standard_2024-09-04-1317]

Simulation where the loaded model is dumped and loaded in the child process:
[memory profile plot: MemPlot_Dump_2024-09-04-1337]

Code:

Classic Simulation

from memory_profiler import profile
from roadrunner.tests import TestModelFactory as tmf
import time
import tellurium as te
from concurrent.futures import ProcessPoolExecutor, as_completed


def SimModel(m):
    m.resetAll()
    start_t = 0
    end_t = 250
    steps = 250 * 10
    result = m.simulate(start_t, end_t, steps)
    return


def pSim(r):  # population simulation: many individual simulations
    nSims = 480
    executor = ProcessPoolExecutor()
    Results = []
    futures = (executor.submit(SimModel, r) for n in range(nSims))
    for future in as_completed(futures):
        Results.append(future.result())
        # drop references in an attempt to free memory
        future = []
        Results = []
    return


@profile
def run_sim(r):  # represents optimization with many simulations of the model
    for i in range(10):
        pSim(r)


def main():
    sbml = tmf.Brown2004().str()

    r = te.loadSBMLModel(sbml)
    t1 = time.perf_counter()
    run_sim(r)
    elapsed_time = time.perf_counter() - t1
    print('Time:', elapsed_time, 'sec')


if __name__ == '__main__':
    main()

Simulation With Dumping

from memory_profiler import profile
from roadrunner.tests import TestModelFactory as tmf
from joblib import dump, load
import time
import tellurium as te
from concurrent.futures import ProcessPoolExecutor, as_completed


def SimModel(r_loc):
    m = load(r_loc)  # load the dumped model inside the child process
    start_t = 0
    end_t = 250
    steps = 250 * 10
    result = m.simulate(start_t, end_t, steps)
    return


def pSim(r_loc):  # population simulation: many individual simulations
    nSims = 480
    executor = ProcessPoolExecutor()
    Results = []
    futures = (executor.submit(SimModel, r_loc) for n in range(nSims))
    for future in as_completed(futures):
        Results.append(future.result())
        # drop references in an attempt to free memory
        future = []
        Results = []
    return


@profile
def run_sim(r_loc):  # represents optimization with many simulations of the model
    for i in range(10):
        pSim(r_loc)


def main():
    sbml = tmf.Brown2004().str()

    r = te.loadSBMLModel(sbml)
    r_loc = 'rrmodel_test.joblib'
    dump(r, r_loc)  # dump the loaded model to disk
    r = []          # drop the parent-process reference
    t1 = time.perf_counter()

    run_sim(r_loc)
    elapsed_time = time.perf_counter() - t1
    print('Time:', elapsed_time, 'sec')


if __name__ == '__main__':
    main()
