Numerical structural identifiability analysis #234

formersbach · 2022-09-22T09:55:02Z

formersbach
Sep 22, 2022
Collaborator

I believe identifiability analysis is a very useful tool in model development and experiment design!
Based on the work of Joubert and Stigter I have implemented the numerical part of their algorithm in a proof of concept script.
Structural identifiability is a structural property of any model. To quote Joubert et al:
"One way of characterizing structural identifiability is to say that at least 1 parameter has a confidence interval that spans the interval negative infinity to positive infinity. Any form of unidentifiability [...] calls into question the predictive power of a model and urges its user to interpret all results with caution"
The idea of their algorithm is to analyze the so called parametric output sensitivity matrix (OSM). Full rank of the OSM is a sufficient condition for structural identifiability. Rank deficiency indicates non-influence by parameters or correlation between parameters.
The rank of the OSM is calculated using singular value decomposition. Since all calculations are numerical and non-exhaustive there is a non-zero possibility of false negatives (identifiable parameter is deemed unidentifiable).

In principle no user input is required. A user defined observable matrix can be supplied however.

The script is ~ 100 lines long and requires the sympy package for symbolic calculations.

The only information required is a Text2Model object.
Where in Pasmopy/Biomass do you think this should go? Should I just provide you the proof of concept script?

himoto · 2022-09-22T10:37:30Z

himoto
Sep 22, 2022
Maintainer

Hi @formersbach, sounds fantastic to me!
Regarding implementation, it would be nice to add your script to text2model.py as a method of Text2Model class.
Or you can create something like identifiability_analysis.py in construction/ folder and implement as follows:

from .text2model import Text2Model

# Example
class IdentifiabilityAnalysis(Text2Model):
    ...

Anyways it depends on the usage.
If you need sympy, please add it to requirements.txt.
Can you please show me a usage example of it in the reply of this or Show and tell in a Discussion?

0 replies

formersbach · 2022-09-28T09:42:49Z

formersbach
Sep 28, 2022
Collaborator Author

Hey @himoto !
This is the proof of concept script for identifiability:

from biomass import Text2Model
import sympy as sym
import numpy as np
from scipy.integrate import solve_ivp
import os
import re
import matplotlib.pyplot as plt

class ident_analysis:
    def __init__(self, model_file) -> None:
        self.model = Text2Model(model_file)
        self.model.convert(overwrite=True)
        self.symbol_list = self.create_symbols()
        self.v = self.get_v()
        self.N = sym.Matrix(self.model.stoichiometry_matrix.toarray())
        self.dydt = self.N * self.v

    def create_symbols(self):
        #Creates list of sympy symbols for integration
        temp_text = "import sympy\nsymbol_list=[\n"
        for species in self.model.species:
            temp_text += f"\tsympy.Symbol(\"{species}\"),\n"
        for parameter in self.model.parameters:
            temp_text += f"\tsympy.Symbol(\"{parameter}\"),\n"
        temp_text += "]"
        with open("tmp_symbols.py", "w") as tmp:
            tmp.writelines(temp_text)
        try:
            from tmp_symbols import symbol_list
        except ImportError:
            raise ImportError("The temporary symbols file could not be created.")
        os.remove("tmp_symbols.py")
        return symbol_list

    def get_v(self):
        #Turn reaction velocities to sympy equation vector
        v = []
        for reaction in self.model.kinetics:
            v += [sym.parsing.sympy_parser.parse_expr(reaction.rate)]
        return sym.Matrix(v)

    def get_values(self, perturbation, t_steps=100, perturb_species=None):
        #Performs the integration by substituting numerical values into the equations
        params, y0 = [], []
        for symbol in self.symbol_list:
            value=1
            default_values = self.model.param_info + self.model.init_info
            for default in default_values:
                if str(symbol) in default:
                    value = float(default.split("=")[1].strip())
            if str(symbol) == perturb_species:
                value += perturbation
            if str(symbol) in self.model.species:
                y0.append(value)
            elif str(symbol) in self.model.parameters:
                params.append([symbol, value])
        system = self.dydt.subs(params)
        def int_step(t, y):
            assert len(y) == len(self.model.species)
            replace = list(zip(self.symbol_list[:len(self.model.species)], y))
            return np.array(system.subs(replace)).flatten().astype(np.float64)
        t_eval = np.linspace(0, 100, num=t_steps, endpoint=False)
        integrated = solve_ivp(int_step, t_span=(0, 100), y0=y0, t_eval=t_eval)
        assert integrated.y.shape == (len(self.model.species), len(t_eval))
        return integrated

    def default_output_matrix(self):
        # Generates a default output matrix from the observable description
        observables = self.model.obs_desc
        output = np.zeros((len(observables), len(self.model.species)))
        indx_dict = {species: indx for indx, species in enumerate(self.model.species)}
        for i, observable in enumerate(observables):
            assert len(observable) == 2
            _, equation = observable
            re_find_u = r"(?<=u\[)(.+?)(?=\])"
            summands = re.split(r"\+|\-",equation)
            for summand in summands:
                spec = re.findall(re_find_u, summand)
                assert len(spec) == 1, "Observable equation is non-linear."
                spec = spec[0]
                output[i, indx_dict[spec]] = 1
        return output

    def apply_output(self, raw_sim, output_matrix=None):
        # Drops the non-observed species concentrations
        if output_matrix is None:
            output_matrix = self.default_output_matrix()
        return np.matmul(output_matrix, raw_sim.y)

    def run_perturbation(self, output_matrix=None, init_cond=True, const=None, perturbation=0.001, t_steps=100):
        # Performs unperturbed integration, perturbed integration and calculates sensitivities
        if init_cond:
            perturb = self.model.species + self.model.parameters
        else:
            perturb = self.model.parameters
        if const:
            perturb = [element for element in perturb if element not in const]
        base = self.apply_output(self.get_values(perturbation=perturbation, t_steps=t_steps))
        if output_matrix:
            num_obs = output_matrix.shape[0]
        else:
            num_obs = self.default_output_matrix().shape[0]
        sensitivity = np.zeros((num_obs*t_steps, len(perturb)))
        for i, param in enumerate(perturb):
            res = self.apply_output(self.get_values(perturbation, t_steps=t_steps, perturb_species=param))
            sensitivity[:, i] = ((res - base)/perturbation).flatten()
        assert sensitivity.shape == (num_obs * t_steps, len(perturb))
        return sensitivity

    def analyze_sensitivities(self):
        # Singular value decomposition of sensitivity matrix
        sensitivity = self.run_perturbation()
        U, E, Vstar = np.linalg.svd(sensitivity)
        return U, E, Vstar.T

def plot_sing_values(E):
    # Plots log10 transformed singular values
    log10E = np.log10(E)
    x = list(range(1, len(E) + 1))
    plt.stem(x, log10E, use_line_collection=True)
    plt.savefig("Singular_values")
    return

def plot_unid_components(V, index, ident_analys):
    # Plots given column of V
    column_vec = V[:, index]
    x = [str(symb) for symb in ident_analys.symbol_list]
    plt.stem(x, column_vec, use_line_collection=True)
    plt.savefig("Column_vector")

if __name__ == "__main__":
    import time
    start = time.time()
    test = ident_analysis("test.txt")
    U, E, V = test.analyze_sensitivities()
    end = time.time()
    print(f"Analysis took {end - start} seconds.")
    # plot_sing_values(E)
    #or
    # plot_unid_components(V, 7, test)

You can try it out with this simple unidentifiable text model:

A --> B
B --> P
A --> C
C --> P

@obs A: u[A]
@obs P: u[P]

If you add an observation for C the unidentifiability is resolved.

0 replies

formersbach · 2023-01-20T10:06:54Z

formersbach
Jan 20, 2023
Collaborator Author

Hey @himoto!
I would like to close this issue, since it's currently not planned to add this to biomass right?
In any case discussions is more fitting for this, sorry for opening it as an issue!

0 replies

himoto · 2023-01-20T10:24:43Z

himoto
Jan 20, 2023
Maintainer

No problem.
New idea to improve BioMASS is always more than welcome!
I would like to transfer this issue to Discussions.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Numerical structural identifiability analysis #234

{{title}}

Replies: 4 comments

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Numerical structural identifiability analysis #234

formersbach Sep 22, 2022 Collaborator

Replies: 4 comments

himoto Sep 22, 2022 Maintainer

formersbach Sep 28, 2022 Collaborator Author

formersbach Jan 20, 2023 Collaborator Author

himoto Jan 20, 2023 Maintainer

formersbach
Sep 22, 2022
Collaborator

himoto
Sep 22, 2022
Maintainer

formersbach
Sep 28, 2022
Collaborator Author

formersbach
Jan 20, 2023
Collaborator Author

himoto
Jan 20, 2023
Maintainer