Comparison of Methods for Data Integration

Information about the Metrics

The AFR score represents the ratio of glycolytic to oxidative ATP flux and was proposed by Yizhak et al. in 2014. AFR measures the extent of the Warburg effect, which has been found to be strongly positively associated with cancer cell migration.
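As a minimal sketch, the AFR can be computed from an FBA flux vector by summing the glycolytic ATP-producing fluxes and dividing by the oxidative (ATP synthase) flux. The reaction IDs (`PGK`, `PYK`, `ATPS4mi`) and the use of absolute flux values are assumptions for illustration, not the exact reaction set used by Yizhak et al. or by this tool.

```python
import math

# Hypothetical reaction IDs; the exact reaction set behind AFR is an assumption here.
GLYCOLYTIC_ATP_REACTIONS = ["PGK", "PYK"]  # substrate-level ATP production in glycolysis
OXIDATIVE_ATP_REACTIONS = ["ATPS4mi"]      # mitochondrial ATP synthase


def atp_flux_ratio(fluxes: dict[str, float]) -> float:
    """Ratio of glycolytic to oxidative ATP flux (AFR).

    Absolute values are used so that reactions written in the reverse
    direction in the model still count as ATP production (a simplification).
    """
    glycolytic = sum(abs(fluxes.get(r, 0.0)) for r in GLYCOLYTIC_ATP_REACTIONS)
    oxidative = sum(abs(fluxes.get(r, 0.0)) for r in OXIDATIVE_ATP_REACTIONS)
    if oxidative == 0.0:
        # Division by zero gives an infinite ratio (or an undefined one if both are zero).
        return math.inf if glycolytic > 0.0 else math.nan
    return glycolytic / oxidative


fluxes = {"PGK": -10.0, "PYK": 10.0, "ATPS4mi": 5.0}
print(atp_flux_ratio(fluxes))  # 4.0
```

Note that a model with zero oxidative ATP flux yields an infinite ratio under this sketch.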

The EOR score represents the ratio of glycolytic to oxidative capacity and was also proposed by Yizhak et al. in 2014. It is computed as the extracellular acidification rate (i.e., the lactate secretion flux) divided by the oxygen consumption rate (i.e., the oxygen consumption flux).
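A minimal sketch of the EOR calculation from exchange fluxes follows. It assumes the common exchange-reaction sign convention (secretion positive, uptake negative); the exchange reaction IDs `EX_lac_L(e)` and `EX_o2(e)` are hypothetical placeholders, not necessarily the IDs used by this tool.

```python
import math


def eor_score(fluxes: dict[str, float],
              lactate_exchange: str = "EX_lac_L(e)",
              oxygen_exchange: str = "EX_o2(e)") -> float:
    """EOR: lactate secretion flux divided by oxygen consumption flux.

    Assumes exchange fluxes are positive for secretion and negative for
    uptake. The default reaction IDs are hypothetical placeholders.
    """
    lactate_secretion = max(fluxes.get(lactate_exchange, 0.0), 0.0)
    oxygen_consumption = max(-fluxes.get(oxygen_exchange, 0.0), 0.0)
    if oxygen_consumption == 0.0:
        return math.inf if lactate_secretion > 0.0 else math.nan
    return lactate_secretion / oxygen_consumption


print(eor_score({"EX_lac_L(e)": 8.0, "EX_o2(e)": -4.0}))  # 2.0
```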

The Hallmark score is based on a set of genes whose presence is considered a signature of cancer: a collection of 197 genes proposed by Liberzon et al. in 2015. Accordingly, a Hallmark score of 144 means that the integration method used reports 144 of these genes as active (i.e., the reactions associated with these genes carry a non-zero flux).
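The count can be sketched as follows. The rule that a gene is active if at least one of its associated reactions carries a non-zero flux (above a small tolerance) is an assumption; the gene symbols, gene-to-reaction mapping, and reaction IDs in the example are hypothetical.

```python
def hallmark_score(hallmark_genes: set[str],
                   gene_to_reactions: dict[str, list[str]],
                   fluxes: dict[str, float],
                   tol: float = 1e-9) -> int:
    """Count Hallmark genes with at least one associated reaction carrying flux.

    'Active' is taken to mean |flux| > tol for at least one reaction
    associated with the gene (an assumption; the tool may use another rule).
    """
    score = 0
    for gene in hallmark_genes:
        reactions = gene_to_reactions.get(gene, [])
        if any(abs(fluxes.get(r, 0.0)) > tol for r in reactions):
            score += 1
    return score


# Hypothetical example data.
genes = {"HK2", "PKM", "LDHA"}
gpr = {"HK2": ["HEX1"], "PKM": ["PYK"], "LDHA": ["LDH_L"]}
fluxes = {"HEX1": 1.0, "PYK": 0.0, "LDH_L": -2.0}
print(hallmark_score(genes, gpr, fluxes))  # 2
```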

The Bland-Altman method assesses the (dis)agreement between two quantitative measurements. Here, the BlandAltman value is the p-value for the hypothesis that two models report the same fluxes. Thus, it quantifies the disagreement between the flux distributions of two models: if the value for two models is below 0.05, the two models can be considered in disagreement.
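The text does not state which statistical test produces the p-value. The sketch below computes the standard Bland-Altman bias and 95% limits of agreement, and, as an assumption, obtains the p-value from a normal-approximation test of whether the mean flux difference is zero.

```python
import math
from statistics import NormalDist, mean, stdev


def bland_altman(fluxes_a: list[float], fluxes_b: list[float]):
    """Bland-Altman bias, 95% limits of agreement, and an approximate p-value.

    The p-value tests whether the mean difference is zero using a normal
    approximation (an assumed test; the tool's exact test may differ).
    Pairs containing non-finite values are skipped. Needs >= 2 valid pairs.
    """
    diffs = [a - b for a, b in zip(fluxes_a, fluxes_b)
             if math.isfinite(a) and math.isfinite(b)]
    n = len(diffs)
    bias = mean(diffs)   # mean difference between the two models
    sd = stdev(diffs)    # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    if sd == 0.0:
        p_value = 1.0 if bias == 0.0 else 0.0
    else:
        z = bias / (sd / math.sqrt(n))
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return bias, limits, p_value


_, _, p = bland_altman([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.05, 3.95])
print(p > 0.05)  # True: no evidence of disagreement
```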

The Jaccard index measures the similarity between finite sample sets. It is defined as the size of the intersection divided by the size of the union of the sample sets (see Fuxman Bass et al., 2013).
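The sets being compared are not specified in the text; one plausible use, assumed here, is comparing the sets of reactions that carry non-zero flux in two models. The tolerance and the reaction IDs below are hypothetical.

```python
def active_reactions(fluxes: dict[str, float], tol: float = 1e-9) -> set[str]:
    """Reactions whose absolute flux exceeds a small tolerance (assumed threshold)."""
    return {r for r, v in fluxes.items() if abs(v) > tol}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (a convention)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


model_1 = {"PGK": 2.0, "PYK": 1.5, "LDH_L": 1.0, "ATPS4mi": 0.0}
model_2 = {"PGK": 1.0, "PYK": 1.0, "LDH_L": 0.0, "ATPS4mi": 3.0}
print(jaccard(active_reactions(model_1), active_reactions(model_2)))  # 0.5
```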

The clusterability analysis assesses the similarity of CGEMs: can we distinguish groups of similar CGEMs? The plots below place the CGEMs according to the first two principal components obtained by applying PCA to the FBA results. We used random forests to identify reactions whose fluxes differ significantly among the clusters and grouped these reactions by cellular subsystem, as shown in the colored barcodes.
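The PCA step described above can be sketched with NumPy's singular value decomposition. Centering each reaction's flux without scaling is an assumption, and the random forest step is not shown. The flux matrix is toy data.

```python
import numpy as np


def pca_first_two(flux_matrix):
    """Project models onto the first two principal components.

    flux_matrix: rows = CGEMs, columns = reactions (FBA flux values).
    Columns are mean-centered but not scaled (an assumption).
    Returns (scores, explained_variance_ratio) for PC1 and PC2.
    """
    X = np.asarray(flux_matrix, dtype=float)
    X = X - X.mean(axis=0)                   # center each reaction's flux
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :2] * S[:2]                # coordinates of each model in PC space
    explained = S**2 / np.sum(S**2)          # fraction of variance per component
    return scores, explained[:2]


# Toy data: two pairs of models with different flux patterns.
fluxes = np.array([
    [10.0, 0.0, 1.0],
    [11.0, 0.0, 1.0],
    [0.0, 10.0, 1.0],
    [0.0, 11.0, 1.0],
])
scores, ratio = pca_first_two(fluxes)
print(np.round(scores[:, 0], 2))  # PC1 separates rows 0-1 from rows 2-3
```

The sign of each principal component is arbitrary, so only the relative positions of the models are meaningful.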

Disclaimer: The boxplots were computed while neglecting infinite values. A wavy line indicates many outliers; drawing them all would be torture for your browser and probably would not help much. Click any box to get a list of outliers.


The barcodes below the plots represent the metabolic subsystems of Recon3. Colored segments show subsystems with significantly different fluxes among the clusters of CGEMs, as reported by the random forest analyses. The segments are in the following order:

The width of a colored segment in a barcode corresponds to the number of reactions with different fluxes. Grey segments represent subsystems that do not show significant changes between clusters of CGEMs.
Click one of the barcodes to get more details about the random forest results.
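The encoding described above can be sketched as a list of (subsystem, width, color) segments. The subsystem names, reaction IDs, and the fixed width of 1 for grey segments are hypothetical choices for illustration; the text does not state how wide grey segments are drawn.

```python
from collections import Counter


def barcode_segments(subsystem_order, significant_reactions, reaction_subsystem):
    """Build (subsystem, width, color) segments for one barcode.

    Colored segments are as wide as the number of significant reactions in
    the subsystem; grey segments (no significant reactions) get a
    hypothetical fixed width of 1.
    """
    counts = Counter(reaction_subsystem[r] for r in significant_reactions)
    segments = []
    for subsystem in subsystem_order:
        n = counts.get(subsystem, 0)
        if n > 0:
            segments.append((subsystem, n, "colored"))
        else:
            segments.append((subsystem, 1, "grey"))
    return segments


# Illustrative subsystems and reaction IDs.
order = ["Glycolysis/gluconeogenesis", "Citric acid cycle", "Fatty acid oxidation"]
mapping = {"PGK": "Glycolysis/gluconeogenesis", "PYK": "Glycolysis/gluconeogenesis",
           "CSm": "Citric acid cycle", "FAOXC160": "Fatty acid oxidation"}
print(barcode_segments(order, {"PGK", "PYK", "CSm"}, mapping))
```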

Correlation of

Click the title to get a larger version of the plot. Infinite values were not considered in the correlation plots. The color scale ranges from green to green over yellow; NaNs are shown in white.
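The text does not say which correlation coefficient is plotted; the sketch below assumes Pearson correlation and drops any pair in which either value is infinite. NaN inputs are deliberately not filtered, so they propagate to a NaN result, which would correspond to a white cell.

```python
import math


def correlation_ignoring_inf(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation (assumed) over pairs where neither value is infinite.

    NaN inputs are not filtered, so they propagate to a NaN result; NaN is
    also returned when fewer than two pairs remain or either series is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isinf(x) or math.isinf(y))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


print(correlation_ignoring_inf([1.0, 2.0, 3.0, math.inf], [2.0, 4.0, 6.0, 1.0]))  # 1.0
```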