GEMbench: Benchmark of Omics Data and Integration Methods

Integration Methods used in this Study

Since Covert and Palsson published their study on transcriptional regulation in constraint-based metabolic models in 2002, more than fifty methods have been developed to investigate how integrating gene expression data affects the content and predictive accuracy of a GEM. In 2004, Akesson et al. used gene expression data as an additional constraint on the metabolic fluxes in yeast. Many algorithms followed, e.g., GIMME, E-Flux, MBA, Moxley, MADE, RELATCH, INIT, mCADRE, tINIT, CORDA, E-Flux2, SPOT and FASTCORE. These algorithms differ in their assumptions and mathematical formulations, and they have been classified in several ways:

In 2014, Machado and Herrgård classified the methods according to three main criteria:
- the gene expression levels (discrete vs continuous, absolute vs relative),
- the model functionality (e.g. predicting the growth rate), and
- the ability to build models.
In 2014, Kim and Lun distinguished between algorithms that require
- multiple gene expression datasets as input,
- a gene expression threshold,
- an objective function, and
- validation of predicted fluxes.
In 2015, Estévez and Nikoloski grouped the methods into:
- GIMME- and GIM3E-like methods: two-step approaches. First, a metabolic functionality (objective function) is optimized through FBA. Then, the optimal value obtained is used while minimizing the discrepancies between fluxes and data.
- iMAT- and INIT-like methods: methods that determine the binary status of each reaction (i.e., active or inactive) that best agrees with the associated data.
- MBA-, mCADRE- and fastCORE-like methods: methods that define a core set of active reactions and then find a minimal set of additional reactions needed to satisfy the model consistency condition.

In this study we focussed on the following integration methods:

FASTCORE

Requires a core set of reactions.

The FASTCORE algorithm starts with a core set of reactions that are forced to be active in the final model. Then, FASTCORE finds the minimum number of additional reactions needed to support this core. Every iteration of the algorithm computes a new sparse flux mode that aims to maximize the support of the mode inside the core set and to minimize it outside the core set.
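
The notion of a mode's support inside and outside the core set can be illustrated with a few lines of Python. This is only a sketch of the bookkeeping, not of FASTCORE itself: the LP that computes each sparse mode is omitted, and the reaction names, flux values, and the activity tolerance `eps` are assumptions made up for the example.

```python
def support(fluxes, eps=1e-4):
    """Reactions whose absolute flux is at least eps (assumed activity tolerance)."""
    return {rxn for rxn, v in fluxes.items() if abs(v) >= eps}


def split_support(fluxes, core, eps=1e-4):
    """Split the support of a flux mode into its core and non-core parts."""
    active = support(fluxes, eps)
    return active & core, active - core


# Hypothetical toy data (not from the study).
core = {"R1", "R2"}
mode = {"R1": 1.0, "R2": 0.0, "R3": 0.5, "R4": 0.0}

inside, outside = split_support(mode, core)
uncovered_core = core - inside  # core reactions that a later mode still has to activate
```

In this toy case the mode activates one core reaction at the cost of one non-core reaction; an iteration would prefer modes with a larger `inside` set and a smaller `outside` set.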

GIMME

Requires one set of expression data
Requires thresholding
Requires objective function

The GIMME algorithm first runs FBA to calculate the maximum possible flux through the stated functionalities (e.g., growth rate). Then, GIMME removes reactions whose mRNA transcript levels are below a given threshold. If the resulting model is not functional (i.e., it cannot achieve the required objective value), GIMME adds sets of the removed reactions back into the model so that the deviation from the expression data is minimized.
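
The "deviation from the expression data" is commonly expressed as a penalty-weighted sum of fluxes through below-threshold reactions. The sketch below shows only that scoring step, under stated assumptions: the reaction names, expression values, threshold, and candidate flux distributions are invented, and the LP that searches for the minimizing flux distribution is left out.

```python
def gimme_penalties(expression, threshold):
    """Penalty per reaction: how far its expression falls below the threshold."""
    return {rxn: max(0.0, threshold - level) for rxn, level in expression.items()}


def inconsistency_score(fluxes, penalties):
    """Penalty-weighted sum of absolute fluxes; lower means better agreement."""
    return sum(penalties.get(rxn, 0.0) * abs(v) for rxn, v in fluxes.items())


# Hypothetical toy data (not from the study).
expression = {"R1": 12.0, "R2": 3.0, "R3": 8.0}
penalties = gimme_penalties(expression, threshold=10.0)

# Two candidate flux distributions assumed to meet the required objective flux.
route_a = {"R1": 5.0, "R2": 5.0, "R3": 0.0}  # relies on lowly expressed R2
route_b = {"R1": 5.0, "R2": 0.0, "R3": 5.0}  # relies on R3 instead

score_a = inconsistency_score(route_a, penalties)
score_b = inconsistency_score(route_b, penalties)
```

Here `route_b` scores lower than `route_a`, so it would be preferred: it routes flux through a reaction that is closer to the expression threshold.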

INIT

Requires discretization of data (lowly and highly expressed)
Optionally takes a minimum flux threshold for "expressed" reactions

The INIT algorithm maximizes the activity of reactions associated with highly expressed genes while minimizing the use of reactions associated with absent proteins. One of the new features of INIT is the relaxation of the steady-state condition to allow a net accumulation of internal metabolites. This accumulation prevents the removal of reactions that are essential for the synthesis of those metabolites.
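
The difference between the strict steady-state condition and the relaxed one can be shown on a toy network. This is a minimal sketch under assumptions, not INIT's actual formulation: the two-metabolite network, the flux values, the function names, and the tolerance are all hypothetical.

```python
def net_production(stoichiometry, fluxes):
    """Net production rate of each metabolite: sum of coefficient * flux."""
    return {
        met: sum(coeff * fluxes[rxn] for rxn, coeff in row.items())
        for met, row in stoichiometry.items()
    }


def satisfies_balance(net, allow_accumulation=False, tol=1e-9):
    """Strict mode requires zero net rates; relaxed mode allows non-negative rates."""
    if allow_accumulation:
        return all(rate >= -tol for rate in net.values())
    return all(abs(rate) <= tol for rate in net.values())


# Hypothetical toy network: R_in -> A, R1: A -> B, R_out: B ->
stoichiometry = {
    "A": {"R_in": 1, "R1": -1},
    "B": {"R1": 1, "R_out": -1},
}
fluxes = {"R_in": 2.0, "R1": 2.0, "R_out": 1.5}

net = net_production(stoichiometry, fluxes)
strict_ok = satisfies_balance(net)
relaxed_ok = satisfies_balance(net, allow_accumulation=True)
```

Metabolite B accumulates in this flux distribution, so it violates the strict balance but is acceptable once accumulation is allowed.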

iMAT

Requires one set of expression data
Requires discretization of data (lowly, moderately, and highly expressed)
Does not require an objective function

The iMAT algorithm categorizes gene expression data into three classes: highly, moderately, and lowly expressed genes. Then, iMAT solves a mixed-integer linear programming (MILP) problem to maximize the number of active reactions associated with highly expressed genes and to minimize the number of active reactions associated with lowly expressed genes. Like GIMME, iMAT can keep reactions that the data do not support when they are needed to obtain a functional model. However, unlike GIMME, iMAT does not need an objective function. Instead, iMAT requires that reactions associated with highly expressed genes carry a minimum flux.
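
The three-class discretization and the quantity the MILP tries to maximize can be sketched as follows. This is an illustration under assumptions rather than the iMAT implementation: the cutoffs, reaction names, flux values, and the `min_flux` default are invented, and the MILP solve itself is omitted.

```python
def discretize(expression, low_cutoff, high_cutoff):
    """Map each reaction's expression to -1 (low), 0 (moderate) or 1 (high)."""
    states = {}
    for rxn, level in expression.items():
        if level >= high_cutoff:
            states[rxn] = 1
        elif level <= low_cutoff:
            states[rxn] = -1
        else:
            states[rxn] = 0
    return states


def agreement(states, fluxes, min_flux=1.0):
    """Count high reactions carrying at least min_flux plus low reactions with zero flux."""
    active_high = sum(
        1 for rxn, s in states.items() if s == 1 and abs(fluxes[rxn]) >= min_flux
    )
    inactive_low = sum(
        1 for rxn, s in states.items() if s == -1 and fluxes[rxn] == 0
    )
    return active_high + inactive_low


# Hypothetical toy data (not from the study).
expression = {"R1": 12.0, "R2": 3.0, "R3": 8.0, "R4": 15.0}
states = discretize(expression, low_cutoff=5.0, high_cutoff=10.0)
fluxes = {"R1": 2.0, "R2": 0.0, "R3": 1.0, "R4": 0.5}
score = agreement(states, fluxes)
```

R4 is highly expressed but carries less than the minimum flux, so it does not count toward the score; moderately expressed R3 is ignored either way.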

In a Nutshell:	FASTCORE	GIMME	INIT	iMAT
Optimization	LP	LP	MILP	MILP
Function required
Omics required
Computational cost