23.3: Metabolic Flux Analysis - Biology

Metabolic flux analysis (MFA) is a way of computing the distribution of reaction fluxes that is possible in a given metabolic network at steady state. Once again, this analysis is independent of the particular biology of the system; rather, it will only depend on the (universal) stoichiometries of the reactions in question.

Mathematical Representation

Consider a system with m metabolites and n reactions. Let \(x_i\) be the concentration of substrate i, so that the rate of change of the substrate concentration is given by the time derivative of \(x_i\). Let x be the column vector (with m components) with elements \(x_i\). For simplicity, we consider a system with m = 4 metabolites A, B, C, and D. This system will consist of many reactions between these metabolites, resulting in a complicated balance between these compounds.

Once again, consider the simple reaction A + 2B \(\rightarrow\) 3C. We can represent this reaction in vector form as (-1 -2 3 0). Note that the first two metabolites (A and B) have negative signs, since they are consumed in the reaction. Moreover, the elements of the vector are determined by the stoichiometry of the reaction, as in Section 2.1. We repeat this procedure for each reaction in the system. These vectors become the columns of the stoichiometric matrix S. If the system has m metabolites and n reactions, S will be an m × n matrix. Therefore, if we define v to be the n-component column vector of fluxes in each reaction, the vector Sv describes the rate of change of the concentration of each metabolite. Mathematically, this can be represented as the fundamental equation of metabolic flux analysis:

\[\frac{d x}{d t}=S v \nonumber\]
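As a minimal sketch of this equation, the snippet below builds a small stoichiometric matrix with NumPy and evaluates Sv. The first column is the reaction A + 2B → 3C from the text; the second reaction (C → D) and the flux values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Rows are metabolites in the order A, B, C, D (m = 4).
# Column 0 is the reaction from the text, A + 2B -> 3C.
# Column 1 is a hypothetical second reaction, C -> D, added only for illustration.
S = np.array([
    [-1.0,  0.0],   # A
    [-2.0,  0.0],   # B
    [ 3.0, -1.0],   # C
    [ 0.0,  1.0],   # D
])

# Hypothetical flux through each reaction (n = 2), in arbitrary units.
v = np.array([2.0, 6.0])

# Rate of change of each metabolite concentration: dx/dt = S v.
dxdt = S @ v
print(dict(zip("ABCD", dxdt.tolist())))
```

With these assumed fluxes, C is balanced (produced and consumed at the same rate), while A and B are depleted and D accumulates, so this particular v is not a steady-state flux vector.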

The matrix S is an extraordinarily powerful data structure that can represent a variety of possible scenarios in biological systems. For example, if two columns c and d of S have the property that c = -d, the columns represent the forward and reverse directions of a reversible reaction. Moreover, if a column has the property that only one component is nonzero, it represents an exchange reaction, in which there is a flux into (or from) a supposedly infinite sink (or source), depending on the sign of the nonzero component.
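The two column patterns described above can be detected mechanically. The following is a rough sketch, not a standard library routine: the helper names `exchange_columns` and `reversible_pairs` and the example matrix are made up for illustration.

```python
import numpy as np

def exchange_columns(S):
    """Indices of columns with exactly one nonzero entry (exchange reactions)."""
    S = np.asarray(S)
    return [j for j in range(S.shape[1]) if np.count_nonzero(S[:, j]) == 1]

def reversible_pairs(S):
    """Pairs (i, j), i < j, of nonzero columns with S[:, i] == -S[:, j]."""
    S = np.asarray(S)
    n = S.shape[1]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if np.any(S[:, i]) and np.array_equal(S[:, i], -S[:, j])]

# Hypothetical network on metabolites A, B, C, D (rows):
#   col 0: A + 2B -> 3C      col 1: 3C -> A + 2B (reverse of col 0)
#   col 2: source -> A       col 3: D -> sink
S_example = np.array([
    [-1,  1, 1,  0],
    [-2,  2, 0,  0],
    [ 3, -3, 0,  0],
    [ 0,  0, 0, -1],
])

print(exchange_columns(S_example))   # columns 2 and 3
print(reversible_pairs(S_example))   # columns 0 and 1 form a reversible pair
```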

We now impose the steady state assumption, which says that the left side of the above equation is identically zero. Therefore, we need to find vectors v that satisfy the criterion Sv = 0. Solutions to this equation will determine feasible fluxes for this system.

Null Space of S

The feasible flux space of the reactions in the model system is defined by the null space of S, as seen above. Recall from elementary linear algebra that the null space of a matrix is a vector space; that is, given two vectors y and z in the null space, the vector ay + bz (for real numbers a, b) is also in the null space. Since the null space is a vector space, there exists a basis \(b_i\), a set of vectors that is linearly independent and spans the null space. The basis has the property that for any flux v in the null space of S, there exist real numbers \(\alpha_i\) such that

\[v=\sum_{i} \alpha_{i} b_{i} \nonumber\]

How do we find a basis for the null space of a matrix? A useful tool is the singular value decomposition (SVD) [4]. The singular value decomposition of a matrix S is a factorization \(S = U \Sigma V^{*}\), where U is an m × m unitary matrix, V is an n × n unitary matrix, and \(\Sigma\) is an m × n diagonal matrix holding the (necessarily non-negative) singular values of S in descending order. (Recall that a unitary matrix is a square matrix with orthonormal columns and rows, i.e. \(U^{*} U = U U^{*} = I\), the identity matrix.) It can be shown that any matrix has an SVD. Note that the SVD can be rearranged into the equation \(S v=\sigma u\), where u and v are corresponding columns of the matrices U and V and \(\sigma\) is the associated singular value. Therefore, if \(\sigma = 0\), v belongs to the null space of S. Indeed, the columns of V that correspond to zero singular values (together with, when n > m, the columns of V beyond the first m, which have no associated singular value) form an orthonormal basis for the null space of S. In this manner, the SVD allows us to completely characterize the possible fluxes for the system.
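The SVD recipe above can be sketched in a few lines of NumPy. The helper name `null_space_basis`, the tolerance, and the toy pathway are illustrative assumptions; `numpy.linalg.svd` returns the conjugate transpose of V, so the basis vectors are read off its trailing rows.

```python
import numpy as np

def null_space_basis(S, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of S, via the SVD."""
    S = np.asarray(S, dtype=float)
    # Full SVD: Vh is the n x n matrix V*, singular values come in descending order.
    U, sigma, Vh = np.linalg.svd(S)
    rank = int(np.sum(sigma > tol))
    # Rows of Vh past the numerical rank correspond to zero (or absent) singular values.
    return Vh[rank:].T

# Hypothetical linear pathway with 2 metabolites and 3 reactions:
#   source -> A,  A -> B,  B -> sink
S_toy = np.array([
    [1.0, -1.0,  0.0],   # A
    [0.0,  1.0, -1.0],   # B
])

N = null_space_basis(S_toy)
print(N.shape)                    # one basis vector with three components
print(np.allclose(S_toy @ N, 0))  # every basis vector satisfies S v = 0
```

For this toy pathway the null space is one-dimensional and spanned by equal fluxes through all three reactions, which matches the intuition that a linear chain at steady state carries the same flux at every step.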

Constraining the Flux Space

The first constraint mentioned above is that all steady-state flux vectors must lie in the null space of S. In addition, when every reaction is written in the direction in which it proceeds, negative fluxes are not thermodynamically possible. Therefore a fundamental constraint is that all fluxes must be non-negative. (Within this framework we represent a reversible reaction as two separate reactions in the stoichiometric matrix S, one per direction, each with its own unidirectional flux.)

These two key constraints form a system that can be solved by convex analysis. The solution region can be described by a unique set of Extreme Pathways. In this region, steady state flux vectors v can be described as a non-negative linear combination of these extreme pathways. The Extreme Pathways, represented in slide 25 as vectors \(b_i\), circumscribe a convex flux cone. Each dimension is a rate for some reaction. In slide 25, the z-dimension represents the rate of reaction for \(v_3\). We can recognize that at any point in time, the organism is living at a point in the flux cone, i.e. is demonstrating one particular flux distribution. Every point in the flux cone corresponds to a possible steady state flux vector, while points outside the cone do not.

One problem is that the flux cone goes out to infinity, while infinite fluxes are not physically possible. Therefore an additional constraint is capping the flux cone by determining the maximum fluxes of any of our reactions (these values correspond to our Vmax parameters). Since many metabolic reactions are interior to the cell, there is no need to set a cap for every flux. These caps can be determined experimentally by measuring maximal fluxes, or calculated using mathematical tools such as diffusivity rules.

We can also add input and output fluxes that represent transport into and out of our cells (Vin and Vout). These are often much easier to measure than internal fluxes and can thus serve to help us to generate a more biologically relevant flux space. An example of an algorithm for solving this problem is the simplex algorithm [1]. Slides 24-27 demonstrate how constraints on the fluxes change the geometry of the flux cone. In reality, we are dealing with problems in higher dimensional spaces.

Linear Programming

Linear programming is a general method for solving optimization problems that have a linear objective and linear constraints. Such problems can be represented in a few different forms.

Canonical Form :

• Maximize: \(c^{T} x\)
• Subject to: \(A x \leq b\)

Standard Form :

• Maximize \(\sum_{i} c_{i} x_{i}\)
• Subject to \(\sum_{j} a_{i j} x_{j} \leq b_{i} \text{ for all } i\)
• Non-negativity constraint: \(x_{i} \geq 0\)
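To make the forms above concrete, here is a small sketch that solves a made-up two-variable problem with `scipy.optimize.linprog`. The numbers in c, A, and b are hypothetical. Because `linprog` minimizes, the objective is negated to obtain a maximization; the `method="highs"` option assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical problem: maximize 3*x1 + 2*x2
# subject to  x1 +   x2 <= 4
#             x1 + 3*x2 <= 9
#             x1        <= 3
#             x1, x2 >= 0
c = np.array([3.0, 2.0])
A = np.array([
    [1.0, 1.0],
    [1.0, 3.0],
    [1.0, 0.0],
])
b = np.array([4.0, 9.0, 3.0])

# linprog minimizes c^T x, so pass -c and flip the sign of the optimum.
res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)], method="highs")

x_opt = res.x
best = -res.fun
print(res.success, x_opt, best)
```

The optimum of this toy problem sits at the corner where the first and third constraints meet, x = (3, 1), with objective value 11.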

A concise and clear introduction to linear programming is available here: www.purplemath.com/modules/linprog.htm. The constraints described throughout section 3 give us the linear programming problem described in lecture. Linear programming can be considered a first approximation and is a classic problem in optimization. To narrow down the feasible flux space, we assume that there exists a fitness function that is a linear combination of any number of the fluxes in the system. Linear programming (or linear optimization) involves maximizing or minimizing a linear function over a convex polyhedron specified by linear and non-negativity constraints.

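We solve this problem by identifying the flux distribution that maximizes an objective function. In the notation of this section, that means maximizing a linear function \(c^{T} v\) of the fluxes subject to the steady-state condition \(S v = 0\) and the bounds on the individual fluxes.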

The key point in linear programming is that optimal solutions lie on the boundary of the permissible flux space; the set of optimal solutions may be a single corner point or an entire edge (or face). In either case, if an optimal solution exists, at least one optimal solution lies at a corner point (vertex) of the permissible flux space. This concept is demonstrated on slide 30. In that slide, A is the stoichiometric matrix, x is the vector of fluxes, and b is a vector of maximal permissible fluxes.

Linear programs, when solved by hand, are generally done by the Simplex method. The simplex method sets up the problem in a matrix and performs a series of pivots, based on the basic variables of the problem statement. In the worst case, however, this can run in exponential time. Luckily, if a computer is available, two other algorithms are available. The ellipsoid algorithm and Interior Point methods are both capable of solving any linear program in polynomial time. It is interesting to note that many seemingly difficult problems can be modeled as linear programs and solved efficiently (or as efficiently as a generic solution can solve a specific problem).

In microbes such as E. coli, this objective function is often a combination of fluxes that contributes to biomass, as seen in slide 31. However, this function need not be completely biologically meaningful.

For example, we might simulate the maximization of mycolates in M. tuberculosis, even though this isn't happening biologically. It would give us meaningful predictions about what perturbations could be performed in vitro that would perturb mycolate synthesis even in the absence of the maximization of the production of those metabolites.

Flux balance analysis (FBA) was pioneered by Palsson's group at UCSD and has since been applied to E. coli, M. tuberculosis, and the human red blood cell [? ].
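Putting the pieces of this section together, a flux balance calculation can be sketched as a linear program with the steady-state condition Sv = 0 as an equality constraint, flux caps as bounds, and one flux chosen as the objective. The five-reaction network, the uptake cap of 10, and the choice of the "biomass" reaction below are all invented for illustration and do not come from any published model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical network on internal metabolites A, B, C (rows):
#   v1: source -> A (uptake)     v2: A -> B     v3: A -> C
#   v4: B -> biomass (objective) v5: C -> byproduct (export)
S = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],   # A
    [0.0,  1.0,  0.0, -1.0,  0.0],   # B
    [0.0,  0.0,  1.0,  0.0, -1.0],   # C
])

# All fluxes are non-negative; only the uptake flux v1 is capped (assumed Vmax = 10).
bounds = [(0, 10), (0, None), (0, None), (0, None), (0, None)]

# Maximize v4. linprog minimizes, so the objective coefficient is -1.
c = np.array([0.0, 0.0, 0.0, -1.0, 0.0])

res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")

fluxes = res.x
print(res.success, fluxes)
```

In this toy case the optimizer routes all of the capped uptake through the branch that feeds the objective and leaves the byproduct branch at zero, which illustrates how a single measured transport limit can pin down the internal flux distribution.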

Studying metabolic flux adaptations in cancer through integrated experimental-computational approaches

The study of tumorigenic rewiring of metabolic flux is at the heart of cancer metabolic research. Here, we review two widely used computational flux inference approaches: isotope tracing coupled with Metabolic Flux Analysis (13C-MFA) and COnstraint-Based Reconstruction and Analysis (COBRA). We describe the applications of these complementary modeling techniques for studying metabolic adaptations in cancer cells due to genetic mutations and the tumor microenvironment, as well as for identifying novel enzymatic targets for anti-cancer drugs. We further highlight the advantages and limitations of COBRA and 13C-MFA and the main challenges ahead.

Effect of feed and bleed rate on hybridoma cells in an acoustic perfusion bioreactor: Metabolic analysis

In: Biotechnology Progress , Vol. 23, No. 3, 2007, p. 560-569.

Research output : Contribution to journal › Article › Academic › peer-review

For the development of optimal perfusion processes, insight into the effect of feed and bleed rate on cell growth, productivity, and metabolism is essential. In the study presented here, the effect of the feed and bleed rate on cell metabolism was investigated using metabolic flux analysis. Under all tested feed and bleed rates the biomass concentration as calculated from the nitrogen balance (biomass-nitrogen) increased linearly with an increase in feed rate, as would be expected. However, depending on the size of the feed and bleed rate, this increase was attained in two different ways. At low feed and bleed rates (Region I) the increase was obtained through an increase in viable-cell concentration, while the cellular-nitrogen content remained constant. At high feed and bleed rates (Region II) the increase was attained through an increase in cellular-nitrogen content, while the cell concentration remained constant. Per gram biomass-nitrogen, the specific consumption and production rates of the majority of the nutrients and products were identical in both regions, as were most of the fluxes. The major difference between the two regions was an increased flux from pyruvate to lactate and a decreased flux of pyruvate toward citrate in region II. The decreased in-flux at the level of citrate can either be balanced by a decreased out-flux toward lipid biosynthesis leading to a lower fraction of lipids in the cell, by a decreased out-flux toward the citric acid cycle resulting in a decreased energy generation, or by a combination of these. Finally, the specific productivity increases less than the nitrogen content per cell in region II, which implies that for obtaining maximum production rates it is important to increase the cell density and not only the biomass density.


Genome sequencing, assembly, and annotation

The DNA for genome sequencing of wintersweet was obtained from an accession planted on the campus of Huazhong Agricultural University. DNA was extracted and sequenced by combining three different sequencing methods: Illumina HiSeq, 10X Genomics, and PacBio SMRT sequencing. A total of 76.96 Gb of PacBio long reads were obtained (Additional file 1: Table S1), giving approximately 98.83-fold high-quality sequence coverage of the 778.71 Mb genome (size estimated by k-mer frequency analysis) (Additional file 2: Fig. S1a and Additional file 1: Table S2). Flow cytometry determined an estimated haploid genome size of 805.88 Mb (Additional file 2: Fig. S1b), which was consistent with the k-mer method. After interactive error correction, the PacBio reads were assembled into primary contigs using FALCON [18]. The primary contigs were then polished with Quiver, yielding 1623 contigs with an N50 length of 2.19 Mb (Table 1). Sequence error correction of the final contigs was performed with Pilon [19] using 36.48 Gb (46.85X) of Illumina short reads. The consensus sequences were further scaffolded by integrating 156.26 Gb (200.67X) of 10X Genomics linked reads (Additional file 1: Table S1). The final assembly consists of 1259 scaffolds totalling 695.31 Mb with a scaffold N50 size of 4.49 Mb, covering 89.2% of the genome size estimated by genome survey (Table 1 and Additional file 1: Table S3). In order to improve the assembly, we used 93 × Hi-C data to assist the assembly correction and anchored 1027 of 1259 scaffolds into 763 super-scaffolds (Additional file 1: Table S4 and S5). All the super-scaffolds were accurately clustered and ordered into 11 pseudochromosomes (Additional file 2: Fig. S2), covering 99.42% of the original 695.31 Mb assembly, with a super-scaffold N50 of 65.35 Mb and a maximum scaffold length of 85.71 Mb (Additional file 1: Table S5).
The number of groups corresponded well with the experimentally determined number of chromosomes in somatic cells (2n = 22) (Additional file 2: Fig. S3). In addition, 185.93 Gb of Illumina sequence data was also generated and used to assemble the genome of Calycanthus chinensis (a close relative of wintersweet belonging to the same family) (Additional file 1: Table S1). The size of the assembled C. chinensis draft genome was 767.4 Mb, representing 92.78% of the estimated genome size (Additional file 1: Table S2), with 291,991 contigs (N50 = 38.7 kb) and 241,923 scaffolds (N50 = 20.34 Mb) respectively (Additional file 1: Table S3).
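The assembly statistics quoted above rest on two simple calculations: fold coverage (sequenced bases divided by genome size) and N50 (the length at which the sorted contigs first cover half of the assembly). The sketch below is a generic illustration of both, not the pipeline used in the study; the function name `n50` and the toy contig lengths are made up, while the 76.96 Gb and 778.71 Mb figures are taken from the text.

```python
def n50(lengths):
    """Length L such that sequences of length >= L make up at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Toy contig lengths (arbitrary units), for illustration only.
toy_lengths = [10, 6, 5, 4, 3, 2]
toy_n50 = n50(toy_lengths)

# Fold coverage from the reported figures: 76.96 Gb of reads over a 778.71 Mb genome.
pacbio_bases_mb = 76.96 * 1000.0
genome_size_mb = 778.71
coverage = pacbio_bases_mb / genome_size_mb

print(toy_n50, round(coverage, 2))
```

The coverage computed this way agrees with the approximately 98.83-fold value reported for the PacBio reads.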

To assess the genome assembly quality, we performed BUSCO and CEGMA analysis and found that 95% and 92.74% complete eukaryotic conserved genes were identified in wintersweet genome respectively (Additional file 1: Table S6), suggesting a high degree of completeness of the final assembly. In addition, the high-quality short reads generated from Illumina were mapped to the assembled genome, which exhibits excellent alignments with a mapping rate of 99.95%. Taken together, the above results indicate a high degree of contiguity and completeness of the wintersweet genome.

Based on de novo and homology-based predictions and transcriptome data, a total of 23,591 protein-coding genes were predicted, with an average gene length of 9017 bp and an average CDS length of 1250 bp, comparable to those in Amborella and Lotus (Additional file 1: Table S7). The spatial distribution of these protein-coding genes along the chromosomes was uneven, with higher densities located at the ends of the chromosomal arms (Fig. 1b). A total of 21,940 (93.1%) of the predicted protein-coding genes had functional annotations (Additional file 1: Table S8). A total of 2749 non-coding RNAs (ncRNAs), including 245 ribosomal RNAs (rRNAs), 567 transfer RNAs (tRNAs), 909 microRNAs, and 1028 small nuclear RNAs (snRNAs), were also identified (Additional file 1: Table S9).

Comparative evolutionary analyses of wintersweet and other typical flowering plant species

The expansion or contraction of gene families has a profound role in driving phenotypic diversity and adaptive evolution in flowering plants [20]. In comparison with gene families in its relative species C. chinensis, wintersweet exhibited significant enrichment and reduction of 12 and 45 gene families respectively (Fig. 2a). KEGG functional enrichment analysis of the expanded gene families demonstrates that they were mainly assigned in “Sesquiterpenoid and triterpenoid biosynthesis,” “Monoterpenoid biosynthesis,” “Flavonoid biosynthesis,” and “Phenylpropanoid biosynthesis” pathways (Additional file 2: Fig. S4a and Table S10), which are responsible for the major trait (strong fragrance) specific to wintersweet.

Evolution of the wintersweet genome and gene families. a Phylogenetic tree of 17 plant species. The blue numbers denote divergence time of each node (MYA, million years ago), and those in brackets are 95% confidence intervals for the time of divergence between different clades. The red numbers on the branch represent bootstrap value. The pie diagram on each branch of the tree represents the proportion of gene families undergoing gain (red) or loss (green) events. The numbers below the pie diagram denote the total number of expansion and contraction gene families. Basal angiosperm (Ba). b The distribution of single-copy, multiple-copy, unique, and other orthologs in the 17 plant species. c Venn diagram represents the shared and unique gene families among five closely related magnoliids (C. praecox, C. kanehirae, L. chinense, P. nigrum, and C. chinensis). Each number represents the number of gene families

Defining the relationship of gene families among flowering plant species has been a powerful approach in investigating the genetic basis of plant evolution. Based on pairwise sequence similarities, we applied the predicted proteomes of wintersweet and 16 other sequenced species to identify putative orthologous gene clusters. A total of 37,137 orthologous gene families composed of 554,042 genes were identified from 17 plant species, of which 5339 clusters of genes were shared by all investigated species, representing ancestral gene families (Fig. 2b). On the other hand, 8733 gene families were present across wintersweet, C. chinensis, L. chinensis, and C. kanehirae, which most likely represent the “core” proteome of the magnoliids (Fig. 2c). There are 339 gene families containing 507 proteins specific to the wintersweet genome (Additional file 1: Table S11). Gene Ontology (GO) term enrichment analyses of wintersweet-specific genes revealed that the functional categories termed “oxidoreductase activity” and “pectinesterase activity” involved in metabolism were enriched (Additional file 2: Fig. S4b).

Repetitive content and recent burst of LTR retrotransposons

In the wintersweet genome, repetitive elements occupied 45.73% of the genome, of which 96.69% were annotated as transposable elements (TEs) (Additional file 1: Table S12). Long terminal repeat retrotransposons (LTRs) were the major class of TEs that accounts for 36.2% of the assembly. Among the LTRs, the LTR/Gypsy elements were the most abundant, composing 23.3% of the genome, followed by LTR/Copia elements (8.6%, Additional file 1: Table S12). Besides the main groups of LTR elements, 3.65% of the genome was annotated as DNA elements and 3.45% as long interspersed nuclear elements, whereas the rest were assigned to other repeat families or could not be assigned (Additional file 1: Table S12). Transposable elements are unevenly distributed across the chromosomes and found to be particularly abundant in centromeric regions (Fig. 1b). Further comparative analysis of the distribution of TEs indicated a higher proportion in intergenic regions (79.19%) when compared to genic regions (16.04%) and regions adjoining genes (4.77%) (Additional file 2: Fig. S5a). Within genic regions, the TEs exhibited unequal distribution between exons and introns. 98.98% of TEs in the genic regions occurred in introns and constituted 25% of the total length of introns (Additional file 2: Fig. S5b). Comparison of gene structure with other species revealed that the average length and number of exons is similar, while the average length of introns is slightly longer and to some extent can be attributed to repeat accumulation. Moreover, the time of the LTR-RT burst in wintersweet was estimated using the 8812 putative complete LTR-RTs and revealed a peak substitution rate at around 0.03 (Additional file 2: Fig. S6). We assumed a mutation rate of 1.51 × 10⁻⁹ per base per year [14], resulting in an insertion time of approximately 9.9 Ma.
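The dating step at the end of this paragraph can be reproduced with a one-line calculation. The text does not state the formula used; the sketch below assumes the commonly used relation T = K / (2r), where K is the divergence between the two LTRs of an element and r is the substitution rate per site per year (the factor of 2 reflects that both LTRs accumulate substitutions independently). The function name is hypothetical.

```python
def ltr_insertion_time_years(divergence, rate_per_site_per_year):
    """Estimated insertion time T = K / (2 r); assumed formula, not stated in the text."""
    return divergence / (2.0 * rate_per_site_per_year)

# Values reported in the text: peak divergence ~0.03, rate 1.51e-9 per base per year.
t_years = ltr_insertion_time_years(0.03, 1.51e-9)
t_ma = t_years / 1e6

print(round(t_ma, 1))
```

Under this assumption the result rounds to the approximately 9.9 Ma insertion time quoted above.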

In order to investigate the evolution of TEs in Magnoliids, phylogenetic trees of domains in reverse transcriptase genes were constructed for both Ty1/Copia and Ty3/Gypsy superfamily. In the phylogenetic tree of Ty3/Gypsy superfamily, the majority of LTR-RTs from wintersweet were clustered into the tork clade (Additional file 2: Fig. S7a). Compared with L. chinensis and C kanehirae, the LTR-RTs in wintersweet and C. chinensis exhibited higher diversity and abundance within the tork clade, indicating greater expansion and divergence in wintersweet and C. chinensis genome. The Copia superfamily displayed a different pattern, with four major clades consisting of elements from all these four species (Additional file 2: Fig. S7b), suggesting a conserved evolution pattern of the Copia superfamily, as described previously [21, 22].

Phylogenomic placement of Magnoliids sister to eudicots

The phylogenetic relationships of Magnoliids, monocots, and eudicots have been somewhat controversial in plant taxonomy. In an effort to infer the phylogenetic position of the Magnoliids relative to monocots and eudicots, a set of 213 evaluated single-copy ortholog sets (OSCG) were first identified with OrthoMCL [23] using genome data from 17 flowering plant species that includes 5 monocots, 6 eudicots, 5 magnoliids, and 1 basal angiosperm. We applied both coalescent and concatenation approaches to reconstruct phylogenetic trees using the 213-gene dataset. Both coalescent and concatenation analyses yielded an identical highly supported topology with magnoliids as a sister group to eudicots after their divergence from monocots (Fig. 2a and Additional file 2: Fig. S8a). To avoid the potential errors in ortholog identification, we also used SonicParanoid [24] to extract single-copy genes (SSCG) from the 17 plant genomes described above. Only those genes sampled from at least 14 species were utilized for the construction of phylogenetic trees. On the basis of 216 single-copy genes, the phylogenetic trees were then similarly inferred by both coalescent and concatenation methods as those described above. The resulting species trees were topologically identical to the phylogenetic findings revealed by OrthoMCL described above (Additional file 2: Fig. S8b).

Although the same set of phylogenetic relationships among Magnoliids, monocots, and eudicots was consistently recovered, the topological conflicts were also observed among coalescent-based gene trees (Additional file 2: Fig. S8c). To estimate the discordance among gene trees in OSCG and SSCG datasets, we took advantage of the quartet score in ASTRAL [25] to display the proportions of gene trees in support of three different branching orders for Magnoliids, monocots, and eudicots (Additional file 2: Fig. S8d) and found that the percentages of gene trees supporting Magnoliids and eudicots together forming a sister group with monocots is higher than the other two topologies (Additional file 2: Fig. S8c). However, in the phylogenetic analyses of a concatenated sequence alignment of 38 chloroplast single-copy genes for 26 taxa, the magnoliids were placed as a sister group to the clade consisting of eudicots and monocots (Additional file 2: Fig. S9). Furthermore, the short phylogenetic branches among magnoliids, eudicots, and monocots clades, representing rapid speciation events, were also observed in these phylogenetic genes. The phylogenetic incongruence between nuclear and plastid genomes may be caused by incomplete lineage sorting (ILS), which appears more frequently during the rapid divergence of early mesangiosperms. As the inadequate taxon sampling could result in incongruent phylogeny, we improved the taxon sampling by adding additional genome data from 11 phylogenetically pivotal species and a transcriptome data set of chloranthales to reconstruct the phylogenetic tree. This approach recovered the same phylogenetic relationships among magnoliids, eudicots, and monocots (Additional file 2: Fig. S10). Thus, from these results, we believe the phylogenetic relationship proposed in our study is relatively accurate under the current dataset. 
Based on the high-confidence phylogenetic tree and calibration points selected from articles and TimeTree website, the divergence time between the magnoliids and the eudicots were estimated to be 113.0–153.1 Ma (95% confidence interval) (Fig. 2a), which overlaps with three recent estimates (114.6–164 Ma, 117–189 Ma, and 136.0–209.4 Ma) [13, 25, 26].

Whole-genome duplication and genome evolution analysis

Whole-genome duplication (WGD) has long been regarded as the major driving force in plant evolution [27]. To investigate WGD events during the evolutionary course of wintersweet, we first searched for genome-wide duplications and assigned them into four different modes with MCScanX analysis (Additional file 2: Fig. S11). The WGD/segmental duplication was identified as the dominant type that includes 4511 paralogous gene pairs in 265 syntenic blocks. Among these syntenic blocks, 36.09% were found to share relationships with three other blocks across the genome (Additional file 2: Fig. S12). The widespread synteny and well-maintained one-versus-three syntenic blocks suggest that two WGD events might have occurred during wintersweet genome evolution. It is well accepted that Amborella is a single living species that is the sister lineage to all other groups within the angiosperms, and there is no evidence of lineage-specific polyploidy after it diverged from the last common ancestor of angiosperms [28]. Collinearity and synteny analysis between the wintersweet and Amborella genome also provided clear structural evidence for two WGDs in wintersweet with a 1:4 syntenic depth ratio in Amborella-wintersweet comparison (Fig. 3a). To further elucidate the polyploidy of wintersweet genomes, we performed a comparative genomic analysis of wintersweet with C. kanehirae and L. chinensis. Syntenic depth ratios of 4:4 and 2:4 were inferred in the wintersweet-Cinnamomum and wintersweet-Liriodendron comparisons, respectively (Fig. 3b). Based on the syntenic relationships between and within each species, our analyses collectively indicate that wintersweet underwent two WGD events.

Comparative genomics analyses. a Synteny patterns between genomic regions from wintersweet and Amborella. This pattern shows that some typical ancestral regions in the basal angiosperm Amborella have four corresponding copy regions in wintersweet. This collinear relationship is highlighted by one syntenic set shown in red and green colors. b Syntenic blocks between genomes. Dot plots of orthologs show a 4–4 chromosomal relationship between the wintersweet genome and C. kanehirae genome, and 2–4 chromosomal relationship between wintersweet genome and L. chinense genome. c Distribution of synonymous substitution levels (Ks) of syntenic orthologous (dashed curves) and paralogous genes (solid curves) after evolutionary rate correction. d Evolutionary model of the Laurales genomes. The Laurales ancestral chromosomes are represented by ten colors. Polyploidization events are shown by 3 dots of different colors, along with the chromosome fusions (Fu) and fissions (Fi). The modern structure of the Laurales genomes is illustrated at the bottom of the figure. In some regions, we could not determine which ancestral chromosome they derived from, and those regions were represented as white spaces

To estimate the timing of the two WGD events in the wintersweet genome, we characterized synonymous substitutions on synonymous nucleotide sites (Ks) between collinear homoeologs within or between wintersweet and three other species from Magnoliids: C. chinensis, Cinnamomum kanehirae, and Liriodendron chinensis. The Ks distributions of one-to-one orthologs identified between Amborella and the other four species show different Ks peaks, suggesting divergent evolutionary rates among these four species (Additional file 2: Fig. S13). After correction for evolutionary rate [29], a rate of 4.21 × 10⁻⁹ synonymous substitutions per site per year was calculated for Laurales using the mean Ks values of syntenic blocks, placing the two WGD events at approximately 77.8 million and 112.1 million years ago (Ma), respectively (Fig. 3c). Previous analysis of the genome of Cinnamomum suggested that the ancient WGD event seems shared by Magnoliales and Laurales [13], and the absolute dating of the identified WGD events in Liriodendron tulipifera also supported this hypothesis [14]. In our study, we also detected two and one polyploidization events in Cinnamomum and Liriodendron respectively, but no common WGD event was shared by these two species. Furthermore, the wintersweet genome shares an ancient WGD event with Cinnamomum but not with Liriodendron. Moreover, the trees of the syntenic gene groups of wintersweet and Liriodendron vs. Amborella indicated that wintersweet and Liriodendron each experienced a WGD event after their divergence from a common ancestor (Additional file 2: Fig. S14 and Additional file 3: Supplementary Note 3). Thus, from these results, we conclude that the ancient wintersweet WGD event occurred before the divergence of Calycanthaceae and Lauraceae but after the divergence of Calycanthaceae and Magnoliaceae.

We also used orthologous and paralogous genes derived from the intergenomic and intragenomic analysis of the wintersweet and C. chinensis as well as C. kanehirae genomes to construct a putative ancestral genome of the Laurales, and proposed an evolutionary scenario where these three lineages were derived from a putative ancestor (Fig. 3d and Additional file 2: Fig. S15), which consisted of ten chromosomes and 4216 genes. This ancestor went through a WGD event to reach a 20-chromosome intermediate and then experienced chromosomal rearrangements to form present-day karyotypes. In wintersweet, all the chromosomes underwent rearrangements and every chromosome came from at least two ancient chromosomes. A minimum of 49 chromosomal fissions and 48 chromosomal fusions were predicted to have occurred in wintersweet to reach its current structure of 11 chromosomes (Fig. 3d).

Genetic basis of floral transition, floral organ specification, and early blooming in winter

Wintersweet is one of the perennial trees that bloom in deep winter. C. praecox takes approximately 10 months to complete its reproductive development. To investigate the whole process of flower development, which may influence the final flowering time, we first performed a systematic study of floral ontogeny and developmental patterns by observing paraffin sections. The results indicated that the floral bud was initiated in April; floral patterning and floral organ specification occurred from April to July; growth was slow in summer; the male and female gametophytes were formed in October and December, respectively; the flower bud then transitioned into dormancy, which was broken in December; and the flower bloomed in deep winter (Fig. 4a). To investigate the molecular mechanisms underlying the critical flower developmental stages, we generated and analyzed RNA-seq data for representative flower developmental stages from floral initiation to maturation.

Schematic depiction of the key developmental stages of the flower bud and the analysis of floral organ identity and flowering-time related genes. a Flower developmental stages including floral transition, meristem specification, floral organ specification, floral bud dormancy and release, and blooming. Abbreviations for the flower bud developmental stages: undifferentiated flower bud stage (FBS1); flower primordium formation stage (FBS2); tepal primordium formation stage (FBS3); stamen primordium formation stage (FBS4); pistil primordium formation stage (FBS5); flower organ development and differentiation stage (FBS6); slow growth stage (FBS7); ovule appearance stage (FBS8); pollen formation stage (FBS9); blooming in winter. ap, shoot apex; fp, floral primordium; t, tepals; s, stamens; p, pistil; a, anther; o, ovule; po, pollen; es, embryonic sac. b The expression patterns of genes related to flowering time and floral organ identity at different flower developmental stages. c The phylogenetic tree of MADS-box genes and gene expression patterns of B-/C-function genes from various floral organs. The MADS-box proteins in wintersweet are marked by the yellow box

Floral initiation is controlled by the spatial and temporal expression of flowering-time-related genes in multiple pathways [11]. Many genes from these pathways have been identified and characterized in various herbaceous and perennial species and reported to have conserved functions [30,31,32]. A database of flowering-time gene networks was recently constructed for Arabidopsis thaliana [30]. Taking advantage of this database, we identified 594 flowering-time genes in eight pathways (Additional file 1: Table S13). Analysis of RNA-seq data showed that, during the floral transition, the flowering-time genes related to gibberellin biosynthesis and signal transduction were significantly activated (padj < 0.01), and the expression of some genes in the photoperiodic and circadian clock pathways was also upregulated (Fig. 4b and Additional file 1: Table S14), suggesting that an endogenous hormone (gibberellin) and an environmental factor (photoperiod) may play major roles in the switch from vegetative to reproductive growth in spring.

After floral patterning and floral organ specification from April to July, floral organ development proceeds slowly from summer to autumn, when temperatures are very high and the maximum can reach 39 °C (Fig. 4a and Additional file 1: Table S15). Temperature may therefore be a key factor in the slow development of the flower organs. A direct reflection of this is the significantly increased expression of heat shock protein genes (Fig. 4b). In addition, compared with other developmental stages, many genes associated with cell division were significantly downregulated (Fig. 4b and Additional file 1: Table S16). Heat stress transcription factors (HSFs) and heat-responsive genes play an essential role in the heat stress response [33]. We found 21 members of the HSF family in C. praecox. Six HsfA1s, which serve as the master transcriptional regulators of the heat stress response, were identified (Fig. 4b and Additional file 1: Table S17). DEHYDRATION-RESPONSIVE ELEMENT BINDING PROTEIN2A (DREB2A) is regulated by the HsfA1s. Both HsfA1-1 and DREB2A-1 displayed an expression pattern opposite to that of the genes related to cell division (Fig. 4b and Additional file 1: Table S15). The DRE sequence (CTAGA motif), which is recognized by DREB2A, was also detected in the promoters of the cell division genes (Additional file 2: Fig. S16a). These results suggest that heat signals may be integrated into transcriptional regulatory networks by the HSFs, which then regulate the expression of genes related to cell division, finally resulting in the slow growth of flower organs.

In total, 58 MADS-box genes were identified in the wintersweet genome, 31 of which are MIKC-type MADS-box genes (Additional file 1: Table S13). Phylogenetic and collinearity analyses of these genes indicated that the homologs of the ABCE model prototype genes, except for AP1/FUL, were all duplicated (Fig. 4c). Four AGL6 genes, generated by WGD events, were identified in the wintersweet assembly, more than in Arabidopsis and rice. Among these genes, CpAGL6a has been reported to promote flowering when overexpressed in Arabidopsis [34]. Meanwhile, FLOWERING LOCUS C (FLC), which serves as a flowering repressor [35], has been lost in wintersweet (Fig. 4c). The selective expansion of flowering promoters and the loss of a flowering repressor among the flowering-time-related genes may be associated with the early flowering of wintersweet. The inner and outer perianth of wintersweet differ only slightly in morphology, as in some basal eudicots (Ranunculus and Aquilegia). The “sliding boundary” model was proposed to explain this morphology in Ranunculus; in this model, the B-function homologs are expressed in those whorls that produce petaloid organs [36]. The B-function homologs in wintersweet displayed a broad expression pattern, with AP3b preferentially expressed in the outer perianth whorl (Fig. 4c), which supports the “sliding boundary” model and may account for the absence of a clear morphological distinction between sepals and petals in wintersweet. The strong expression of C-function homologs is limited to the stamen and carpel, which suggests that these genes have a conserved function in stamen and carpel specification. The expression profiles of the wintersweet ABCE homologs during flower development largely agree with the gradual formation of the floral organs they specify (Fig. 4b).

The relatively early flowering time in winter suggests a shorter chilling requirement for dormancy release and earlier bud break in wintersweet. Members of the SHORT VEGETATIVE PHASE (SVP) clade of the MADS-box gene family, including SVP and DAM genes, are well known to be associated with dormancy release and bud break [37, 38]. Two homologs of SVP genes were identified in the wintersweet genome. Phylogenetic analysis of SVPs revealed that these two genes cluster close to PtSVL (Additional file 2: Fig. S16b), which has been characterized as a repressor in the genetic network of temperature-mediated vegetative bud break in hybrid aspen [39]. The downstream genes of PtSVL in this network, including TEOSINTE BRANCHED1, CYCLOIDEA, PCF/BRANCHED1 (TCP18/BRC1) and FLOWERING LOCUS T (FT), which function as negative and positive regulators, respectively, of the temperature-mediated control of bud break, were also identified in the wintersweet genome (Additional file 2: Fig. S16c). During the transition from endodormancy to the flush stage, an increase in the expression of CpFT1 and downregulation of CpTCP18/BRC1 and CpSVL1 were noted (Additional file 2: Fig. S16d). Gibberellic acid (GA) and abscisic acid (ABA) act as positive and negative regulators of bud break, respectively [40], and their contents are associated to some extent with the expression levels of their biosynthesis and catabolism genes. Increased expression of GA biosynthesis genes such as GA20 oxidase and decreased expression of ABA biosynthesis genes such as NCED (Additional file 2: Fig. S16d) were also observed at the bud break stage, consistent with their roles in bud break.

Genetic basis of strong cold resistance

Wintersweet is one of the perennial trees that bloom in deep winter, during which the temperature always falls below the freezing point. Wintersweet has therefore evolved a systematic mechanism to withstand cold stress. Glycosylation is a common modification of plant volatile compounds and plays an important role in the response to abiotic stress in plants [41]. Recent studies revealed that volatile terpene glucosylation mediated by UDP-glycosyltransferases (UGTs) is involved in modulating cold stress tolerance in tea plants [42]. In wintersweet, abundant volatiles were present in glycosidically bound forms, such as linalool glucoside, benzaldehyde benzyl alcohol (Additional file 2: Fig. S17). The considerable expansion of the UGT family (Additional file 1: Table S10) and the abundant terpene glycosides in wintersweet flowers lead to the hypothesis that the strong cold tolerance of wintersweet is, to some extent, related to volatile glucosylation.

Evolution of terpene biosynthesis and regulation-related genes

Monoterpenes are the major components of the floral volatile organic compounds (VOCs) in wintersweet, especially linalool, which accounts for more than half of the floral scent [5]. In plants, monoterpenes/diterpenes and sesquiterpenes are usually generated via the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway and the mevalonate (MVA) pathway, respectively [43]. A total of 46 genes in these two pathways were identified (Additional file 1: Table S18). The key genes of the MEP pathway, such as 1-deoxy-D-xylulose 5-phosphate synthase (DXS), 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR), and isopentenyl diphosphate isomerase (IDI), were generated through WGD events (Fig. 5a). The high rate of paralog generation in these genes could increase the efficiency of the catalytic reactions through dosage effects, thereby increasing the metabolic flux through the MEP pathway. Terpene synthases (TPSs) are the enzymes responsible for the last catalytic reaction in the MVA and MEP pathways to generate terpenoid compounds. With the aid of the assembled genome, a total of 52 complete CpTPSs were identified (Additional file 1: Table S19), approximately double the number detected by transcriptomics in our previous study [5]. Phylogenetic analysis of TPSs from four species revealed that the CpTPSs clustered into five of the six subfamilies described for land plants (Fig. 5b). The majority of CpTPSs were placed in the TPS-a (18) and TPS-b (24) subfamilies, which are predominantly composed of angiosperm-specific sesquiterpene and monoterpene synthases, respectively [44]. Comparative genomics analysis revealed that the TPS genes are significantly expanded, especially in the TPS-b subfamily (Fig. 5b and Additional file 1: Table S10). These lineage-specific gene expansions in the TPS-b subfamily may contribute to the monoterpene accumulation in floral VOCs in wintersweet.

Terpene biosynthesis in wintersweet. a Expression profiles of genes encoding enzymes possibly involved in monoterpene and sesquiterpene biosynthesis. Abbreviations for enzymes in each catalytic step are shown in bold. The gradient color for each gene represents the gene expression levels in three floral developmental stages (S1: bud stage; S4: full open flower stage; S5: senescence stage). Homologous genes are represented by equally colored horizontal stripes and are numbered from top to bottom in Arabic numerals. The full names of enzymes are listed in Additional file 1: Table S18. The genes boxed in green were generated by a WGD event. The non-boxed genes did not undergo this event. b Phylogeny of TPS proteins identified in wintersweet and 5 other sequenced plant genomes showing the subfamilies from a–g. c The expression of CpTPS genes in three different floral development stages. These genes were expressed in at least one of the three developmental stages. d The chromosomes with more than one CpTPS gene. The red diamonds represent the functionally characterized genes. The green diamonds represent the genes generated by tandem duplication events. A curve with an arrow links two duplicated genes. e Identification of enzymatic products after incubating recombinant CpTPS proteins with geranyl diphosphate (GPP)/farnesyl diphosphate (FPP). The volatile terpenes were analyzed by GC-MS and compared with authentic standards. f Increased linalool biosynthesis in CpTPS4-overexpressing tobacco compared with wild type (WT) and empty vector control (EV). Data represent the mean ± SD of three biological replicates

Expression analysis of the 52 CpTPS genes by RNA-seq revealed that six genes displayed expression patterns similar to the emission pattern of the major monoterpenes (Fig. 5c and Additional file 2: Fig. S18). Based on the expression patterns and phylogenetic analysis, we selected three genes from the TPS-b/g subfamilies for functional characterization and found that all of them encoded versatile enzymes with multiple products (Fig. 5e). Subcellular localization analysis showed that CpTPS4 and CpTPS9 were localized to the plastid, whereas CpTPS42 was targeted to the cytosol (Additional file 2: Fig. S19). CpTPS42 was shown to be a sesquiterpene synthase that mainly catalyzed the formation of nerolidol, together with other sesquiterpenes (Fig. 5e). CpTPS4 and CpTPS9 are both monoterpene synthases and produce β-pinene and linalool as their main products, respectively (Fig. 5e). To further understand the function of CpTPS4, we also overexpressed the gene in tobacco. Enhanced levels of monoterpenes, including linalool, limonene, β-ocimene, and trans-β-ocimene, were found in transgenic tobacco leaves compared with the wild-type control (Fig. 5f). These results indicate that CpTPS4 plays a primary role in the biosynthesis of linalool, the main component of the floral scent.

The available genome assembly allows CpTPSs to be localized to chromosome or scaffold positions and considered in their genomic context. The CpTPS genes are not uniformly distributed throughout the chromosomes: 44 genes are located on six chromosomes and eight genes on seven scaffolds (Fig. 5d). Fourteen of the 52 CpTPS genes have at least two copies, and each duplicated gene copy is located adjacent to the other. For example, CpTPS4 is located on scaffold662 and has three copies: CpTPS17, CpTPS18, and CpTPS19. These three genes are arranged as a tandem array on chromosome 6 and are highly expressed at the full open flower stage; they may have a function similar to that of CpTPS4 and contribute equally to linalool production.

Terpenoid formation depends not only on the biochemical properties of the enzymes encoded by CpTPS genes but also on transcription factors (TFs). A total of 1313 TFs that show differential expression during petal development were identified. Among these, 99 display a positive correlation with the emission of terpenes and are predominantly distributed in the MYB, bHLH, WRKY, and bZIP families (Additional file 2: Fig. S20 and Additional file 1: Table S20). The transcriptional control of terpene biosynthesis genes correlates with the presence of cis-elements in their promoter regions, which are recognized and bound by specific transcription factors. When screening the 2000-bp regions upstream of the 52 CpTPS genes, we found several defense and stress responsive elements to be significantly enriched, such as bHLH- and MYB-binding elements (Additional file 2: Fig. S21). These results indicate that the MYB/bHLH transcription factors may serve as key factors in regulating CpTPS gene expression and provide a starting point for further studies of the cross-talk between the regulation of plant secondary metabolites and stress responses.
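The promoter screening step can be pictured as a motif scan over each upstream region. The sketch below is a minimal illustration in Python and is not the pipeline used in the study: the motif set is an assumption (the E-box consensus CANNTG, which bHLH factors are known to bind, and its G-box variant CACGTG), the gene name and sequence are hypothetical, and no enrichment statistic is computed.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes above.
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")

# Illustrative motifs only; the motif set used in the study is not listed here.
EXAMPLE_MOTIFS = {"E-box": "CANNTG", "G-box": "CACGTG"}


def reverse_complement(motif: str) -> str:
    """Reverse complement of a motif written in IUPAC codes."""
    return motif.translate(COMPLEMENT)[::-1]


def count_hits(promoter: str, motif: str) -> int:
    """Count motif occurrences on both strands, allowing overlaps."""
    promoter = promoter.upper()
    # A set collapses palindromic motifs so they are not counted twice.
    variants = {motif, reverse_complement(motif)}
    total = 0
    for variant in variants:
        # The lookahead makes findall report overlapping start positions.
        pattern = "(?=" + "".join(IUPAC[base] for base in variant) + ")"
        total += len(re.findall(pattern, promoter))
    return total


def scan_promoters(promoters: dict, motifs: dict) -> dict:
    """Return {gene: {motif_name: hit_count}} for every promoter."""
    return {
        gene: {name: count_hits(seq, m) for name, m in motifs.items()}
        for gene, seq in promoters.items()
    }


if __name__ == "__main__":
    # Hypothetical short sequence standing in for a 2000-bp upstream region.
    toy = {"hypothetical_gene": "TTCACGTGAACAGCTGTT"}
    print(scan_promoters(toy, EXAMPLE_MOTIFS))
```

Testing for enrichment would additionally require a background set of promoters and a statistical test, which this sketch leaves out.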

Evolution of benzenoid/phenylpropanoid biosynthesis-related genes

Benzenoids/phenylpropanoids, the second largest group of floral VOCs in wintersweet, are derived from the aromatic amino acid phenylalanine. Phenylalanine is synthesized via two pathways (the phenylpyruvate pathway and the arogenate pathway) [45], both of which branch from the plastidial shikimate pathway [46]. The genes involved in the shikimate pathway (20), phenylpyruvate pathway (7), and arogenate pathway (6) were identified, as shown in Fig. 6a. In the wintersweet genome, both WGD and tandem duplication events have considerably affected the upstream genes in the phenylpropanoid pathway as well as the downstream genes involved in specific benzenoid (benzyl acetate and methyl salicylate) biosynthesis (Fig. 6a,b), leading to a high rate of paralog formation in 15 gene families (Additional file 1: Table S21).

Evolution and expression of key genes involved in benzenoid/phenylpropanoid biosynthesis. a Expression profiles of genes encoding enzymes possibly involved in the shikimate/benzenoid pathway in wintersweet. Abbreviations for enzymes in each catalytic step are shown in bold. The gradient color for each gene represents the gene expression levels in three petal developmental stages in wintersweet (S1: bud stage; S4: full open flower stage; S5: senescence stage). Homologous genes are represented by equally colored horizontal stripes and are numbered from top to bottom in Arabic numerals. The full names of enzymes are listed in Additional file 1: Table S21. The genes boxed in black and red were generated by WGD and tandem duplication events, respectively. The non-boxed genes did not undergo these events. b Schematic representation of the wintersweet chromosomes together with the positions of key genes involved in benzenoid/phenylpropanoid biosynthesis. The genes marked in brown and red were generated by WGD and tandem duplication events, respectively. c Expression profiles of the 21 BEAT homologous genes in three different stages (S1, S4, and S5). These genes were expressed in at least one of the three developmental stages

Benzyl acetate, the dominant compound of the floral scent in wintersweet, is synthesized from benzyl alcohol via an acetyl-CoA-dependent reaction catalyzed by acetyl-CoA:benzyl alcohol acetyltransferase (BEAT) [47]. Comparative genomic analysis revealed that the wintersweet genome harbors 33 BEAT homologous genes (Additional file 1: Table S21), a number comparable to that in Prunus mume, in which benzyl acetate is also the major component of the floral scent. As in P. mume, the expansion of the BEAT homologous genes was mainly attributed to tandem duplication and WGD events [47]. Of the 33 BEAT homologous genes found in the wintersweet genome, 8 were derived from the WGD event and 14 were amplified via tandem duplication. Transcriptome and metabolite correlation analysis showed that the expression pattern of 4 CpBEATs coincided with benzyl acetate emission (Fig. 6c and Additional file 2: Fig. S18). These genes might be responsible for benzyl acetate biosynthesis in the wintersweet flower. Methyl salicylate is also a major component of the floral VOCs in wintersweet. Three tandem duplication-derived salicylic acid methyltransferases (SAMTs) were identified in wintersweet (Additional file 2: Fig. S22), two of which were highly expressed in the flower, with expression patterns that correlated with methyl salicylate emission, suggesting that these two genes may be primarily responsible for methyl salicylate biosynthesis (Fig. 6a). These observations suggest that the expansion of specific genes and their selective expression in the flower could heighten the activity of the corresponding enzymes, resulting in the abundant characteristic aroma of wintersweet flowers.


Tyr-Asp inhibition of GAPC activity is associated with the shift of the glycolytic flux toward the PPP and increased NADPH/NADP+ ratio

The main aim of our work was to understand the biological significance of our recently reported in vitro interaction between the dipeptide Tyr-Asp and GAPC (Veyel et al, 2018). We began with the most straightforward hypothesis and tested GAPC activity in the absence and presence of Tyr-Asp. Indeed, application of Tyr-Asp (100 µM) inhibited GAPC enzymatic activity, unlike treatment with the single amino acids (Tyr and Asp) or a chemically unrelated dipeptide (Ile-Glu) (Fig 1A). The reduction of 23% may appear modest, but it is important to note that the GAPC activity assay accounts not only for cytosolic GAPC but also for GAPCp and, moreover, for GAPA/B, which has substantial activity with NAD+ (Falini et al, 2003). Whereas GAPCp activity is negligible in crude extracts from Arabidopsis rosettes harvested in the light, that is not the case for GAPA/B (Munoz-Bertomeu et al, 2009). To differentiate between GAPC and GAPA/B activities, we introduced the gapc1 gapc2 double mutant, which is entirely devoid of cytosolic GAPC activity (Guo et al, 2012). By doing so, we could demonstrate that the reduction in GAPC activity produced by the addition of Tyr-Asp is similar to that observed in the gapc1 gapc2 double mutant in the absence of the dipeptide (Fig 1B). Moreover, the gapc1 gapc2 double mutant (Guo et al, 2012) was insensitive to Tyr-Asp inhibition of GAPC activity. Based on these results, we conclude that GAPC1 and GAPC2 are the primary targets of Tyr-Asp action and that a 100 µM concentration of Tyr-Asp is sufficient to completely inhibit the activity of the glycolytic GAPC. To further substantiate our results, we tested the activity of GAPA/B and GAPN in crude extracts from wild-type plants and the gapc1 gapc2 double mutant (Guo et al, 2012). Fig 1C and D shows that neither GAPA/B nor GAPN activity was affected by Tyr-Asp.

Tyr-Asp (100 µM) was sufficient to inhibit GAPC activity. In comparison, the amount of Tyr-Asp measured in Arabidopsis seedlings (control conditions) varied from 0.23 to 2.7 nmol g⁻¹ FW⁻¹, with an average of 0.62 nmol g⁻¹ FW⁻¹ (n = 20; Appendix Fig S1). We then used data from Koffler et al (2013) to estimate the concentration of Tyr-Asp in planta. If we assume an equal distribution of Tyr-Asp across all compartments, the intracellular concentration of Tyr-Asp would be approximately 1 µM, whereas it would rise to 26.5 µM if all Tyr-Asp were located exclusively in the cytosol.

Figure 1. Tyr-Asp inhibition of GAPC activity is associated with the shift of the glycolytic flux toward the PPP and increased NADPH/NADP+ ratio

  • A. GAPC enzymatic activity was measured in wild-type crude extracts treated with H2O (mock), and 100 µM of Tyr-Asp, Tyr, Asp, and Ile-Glu.
  • B–D. The enzymatic activities of GAPC (B), GAPA/B (C), and GAPN (D) were measured in the wild-type and double k.o. mutant gapc1 gapc2 treated with H2O (mock), and 100 µM of Tyr-Asp.
  • E. Schematic representation of glycolysis and pentose phosphate pathways.
  • F. Glucose-labeled C6 and C1 flux experiment in 10-day-old Arabidopsis seedlings. C6/C1 ratio for control (water treated) and Tyr-Asp (100 µM)-treated wild-type seedlings after 35, 70, and 145 min is shown.
  • G. NADPH/NADP+ ratio.
  • H. NADH/NAD+ ratio. Ratios were calculated from the NAD+, NADH, NADP+, and NADPH measurements.

Data information: Data are mean ± SEM of n = 4 (A–D; four technical replicates), n = 3 (F; three independent flasks), and n = 5–6 (G–H; five to six independent seedling flasks). For (A–D, F), significance was assessed using unpaired two-tailed Student’s t-test. For (G–H), significance was assessed using two-way ANOVA (P ≤ 0.05; letters show significance associated with the Tyr-Asp treatment). G6P: glucose 6-phosphate; F6P: fructose 6-phosphate; FBP: fructose 1,6-bisphosphate; DHAP: dihydroxyacetone-phosphate; G3P: glyceraldehyde 3-phosphate; 1,3bisPGA: 1,3bis-phosphoglycerate; 3PGA: 3-phosphoglycerate; PPP: pentose phosphate pathway; R5P: ribose 5-phosphate. In (G) and (H), (C) stands for control; n.s.: not significant.

In animal and yeast cells, oxidative inactivation of the glycolytic GAPDH by redox modification of the catalytic cysteine was shown to increase the NADPH/NADP+ ratio by redirecting the glycolytic flux to the PPP (Fig 1E) (Ralser et al, 2007). Moreover, the Arabidopsis gapc1 gapc2 double knock-out mutant is characterized by an increase in the NADPH/NADP+ ratio (Guo et al, 2014). To test whether Tyr-Asp treatment of Arabidopsis plants would have similar effects and shift glycolytic intermediates toward the PPP through inhibition of GAPC, we exploited the proven ability of Arabidopsis roots to take up dipeptides from the growth media (Komarova et al, 2008). Seedlings were fed with ¹⁴C glucose labeled at position C1 or C6 in the absence or presence of 100 μM Tyr-Asp (Nunes-Nesi et al, 2005). We compared the rates at which ¹⁴CO2 was released from carbons 1 (C1) and 6 (C6) at 35, 70, and 145 min following the addition of the ¹⁴C glucose (Fig 1F; Appendix Fig S2A). ¹⁴CO2 released from C1 reflects the activity of both the PPP and glycolysis, whereas ¹⁴CO2 released from C6 reflects only glycolysis/TCA; consequently, a lower C6-to-C1 ratio indicates a more active PPP (Appendix Fig S2A). A significant decrease in the C6-to-C1 ratio measured in the Tyr-Asp-treated versus control seedlings argues for G6P being redirected from glycolysis toward the PPP (Fig 1F). To investigate whether the observed changes in PPP activity result in an altered NADPH/NADP+ ratio, NAD+, NADH, NADP+, and NADPH cellular concentrations were measured at 1 and 6 h in mock (H2O-treated control) and Tyr-Asp-treated 10-day-old Arabidopsis seedlings grown in liquid culture, using a targeted enzymatic assay (Appendix Fig S2B–F). Tyr-Asp supplementation produced a significant reduction in NAD+, NADH, and NADP+ levels (Appendix Fig S2B–F), resulting in a significant increase in the deduced NADPH/NADP+ ratio (Fig 1G), but an unchanged NADH/NAD+ ratio (Fig 1H).
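The two readouts used above are simple ratios, and a small Python sketch can make their interpretation concrete. All numbers below are hypothetical values in arbitrary units chosen only to show the direction of the effects; they are not measurements from the study, and the function names are illustrative.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Generic ratio with a guard against a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def c6_c1_ratio(co2_from_c6: float, co2_from_c1: float) -> float:
    """14CO2 released from C6-labeled glucose relative to C1-labeled glucose.

    C1 is released by both the PPP and glycolysis/TCA, C6 only by
    glycolysis/TCA, so a lower value points to a more active PPP.
    """
    return ratio(co2_from_c6, co2_from_c1)


if __name__ == "__main__":
    # Hypothetical 14CO2 release values (arbitrary units).
    control = c6_c1_ratio(co2_from_c6=40.0, co2_from_c1=100.0)
    treated = c6_c1_ratio(co2_from_c6=30.0, co2_from_c1=100.0)
    print("C6/C1 lower after treatment:", treated < control)

    # Hypothetical cofactor pools: NADP+ falls while NADPH stays constant,
    # so NADPH/NADP+ rises; NADH and NAD+ fall in proportion, so NADH/NAD+
    # stays the same.
    print("NADPH/NADP+:", ratio(10.0, 20.0), "->", ratio(10.0, 15.0))
    print("NADH/NAD+:", ratio(5.0, 50.0), "->", ratio(4.0, 40.0))
```

The second part shows why a drop in NADP+ alone is enough to raise the deduced NADPH/NADP+ ratio, while proportional decreases in NADH and NAD+ leave the NADH/NAD+ ratio unchanged.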

Finally, we used YFP and GFP marker lines of GAPC1 and GAPCp1/2 (Munoz-Bertomeu et al, 2009; Guo et al, 2012) to determine GAPDH subcellular localization upon Tyr-Asp treatment, and therefore its moonlighting activities (Zaffagnini et al, 2013; Schneider et al, 2018). However, at least under our experimental conditions, Tyr-Asp treatment did not change GAPDH subcellular localization (Appendix Figs S3 and S4).

Tyr-Asp supplementation confers resistance to oxidative stress in Arabidopsis and tobacco seedlings

Inactivation of GAPDH improves oxidative stress tolerance in both yeast and animal cells by supplying NADPH (Ralser et al, 2007). Here, we examined whether the reduction in GAPDH activity measured in the Tyr-Asp-treated plants would also confer an advantage under oxidative stress conditions. To test our hypothesis, we used two agents known to induce an oxidative stress response: hydrogen peroxide (H2O2; 50 mM) and catechin (0.175 mM) (Scarpeci et al, 2008; Kaushik et al, 2010). To be consistent with the Tyr-Asp feeding experiments, we used plants grown on synthetic MS media and then transferred them to liquid MS media for the oxidative stress and Tyr-Asp treatments (see Methods section). Moreover, we tested both Arabidopsis and tobacco seedlings (in several experiments; Dataset EV1), the latter characterized by more homogeneous growth under normal conditions. Plants were germinated on solid MS media and transferred to a 24-well plate at 12 days after stratification (DAS). First, seedlings were incubated with either 100 µM Tyr-Asp or mock (H2O) for 1 h before the stress was applied. Fresh weight, as a proxy for growth, was measured after two and four days of catechin treatment in Arabidopsis and tobacco, respectively, and after 36 h and 2 days of H2O2 treatment in Arabidopsis and tobacco, respectively. While Tyr-Asp supplementation did not affect overall plant growth under control conditions, it did lead to increased fresh weight under both oxidative stress regimes in both Arabidopsis and tobacco seedlings (Fig 2A–C and Appendix Fig S5A–D). The increase in fresh weight was further corroborated by measurements of leaf area in tobacco plants treated with catechin (Fig 2C). Importantly, neither the combination of Tyr and Asp nor the two other tested dipeptides, Ser-Leu and Gly-Pro, exhibited the bioactivity of Tyr-Asp (Fig 2A and Appendix Fig S5A–D).

Figure 2. Tyr-Asp treatment improves growth performance of Arabidopsis and tobacco plants exposed to oxidative stress

  • A. Fresh weight quantification of Arabidopsis and tobacco seedlings. Seedlings were pretreated with mock (water), Tyr-Asp (100 μM), Tyr and Asp (100 μM), Ser-Leu (100 μM), or Gly-Pro (100 μM) for 1 h prior to oxidative stress (catechin or H2O2). Each treatment was compared with its respective control, corresponding to plants grown in one 24-well plate (represented by the adjoined bars).
  • B. Catechin- (left upper panel) and (catechin-) Tyr-Asp-treated tobacco plants (right upper panel) grown for four days in liquid media. H2O2- (left upper panel) and (H2O2-) Tyr-Asp-treated tobacco plants (right upper panel) grown for two days in liquid media. All plants were 2 weeks old before starting the stress treatment.
  • C. Leaf series prepared from tobacco seedlings pretreated (1 h) with either mock or Tyr-Asp and exposed to catechin for 4 days.

Data information: Data are mean ± SEM of n = 10–12 (seedlings). Unpaired two-tailed Student’s t-test was performed to compare treatments with control. Scale bar: 10 cm.

To complement the oxidative stress regimes, we tested the effect of Tyr-Asp supplementation on plant performance under salt stress. As described above, 12-day-old tobacco seedlings were incubated with either 100 µM Tyr-Asp or mock (H2O) for 1 h before the salt stress (50 mM NaCl) was applied. Plant performance was measured after 6 days of salt treatment by assessing the fresh weight of the plants. Again, Tyr-Asp treatment improved plant performance under stress conditions, whereas neither the combination of amino acids nor the two other tested dipeptides did (Appendix Fig S5E and F).

To test whether the improved stress tolerance is restricted to the use of liquid media, and thus to dipeptide uptake via both roots and shoots, we performed catechin and salt experiments using tobacco plants grown on nylon mesh overlaying solid MS media (see Methods section). Two-week-old plants were transferred first to plates containing Tyr-Asp and afterward to plates containing a combination of Tyr-Asp, salt and/or catechin. Fresh weight measurements were taken after one week of treatment (Appendix Fig S6A). The addition of Tyr-Asp had no effect on plant growth under control conditions, but it resulted in overall higher biomass in plants subjected to the catechin and salt stresses (Appendix Fig S6B).

Finally, to assess the contribution of the Tyr-Asp inhibition of GAPC activity to the improved stress tolerance associated with the Tyr-Asp treatment, we used the gapc1 gapc2 double knock-out mutant (Guo et al, 2012), which showed a reduction in total GAPC activity similar to that observed for the Tyr-Asp treatment (Fig 1B). Arabidopsis wild-type and gapc1 gapc2 plants were subjected to catechin stress as described above. As observed previously, Tyr-Asp supplementation increased the biomass of catechin-treated wild-type seedlings (Fig 3A and D; Appendix Fig S7A). By contrast, no such improvement was observed for the gapc1 gapc2 mutant line (Fig 3B and E, and Appendix Fig S7B), arguing for the Tyr-Asp-associated stress tolerance being dependent on the inhibition of the GAPC1 and GAPC2 activities. Moreover, whereas under control conditions the gapc1 gapc2 mutant line was significantly smaller than the wild type, the opposite was true under catechin treatment (Fig 3C and F; Appendix Fig S7C). Our results are in line with the previously reported improved drought tolerance of the gapc1 gapc2 mutant (Guo et al, 2012).

Figure 3. Tyr-Asp improvement of Arabidopsis stress tolerance is associated with the GAPC1/2 inhibition

  • A–C. Fresh weight measurements of Arabidopsis wild-type and gapc1 gapc2 double k.o. plants subjected to the oxidative stress (e.g., catechin) and Tyr-Asp treatments. Each treatment was compared with its respective control, corresponding to plants grown in one 24-well plate (represented by the adjoined bars).
  • D–F. Representative Arabidopsis wild-type and gapc1 gapc2 double k.o. seedlings from the different treatments. All plants were 10 days old at the stress onset. The seedlings were exposed to catechin for 3 days.

Data information: Data are mean ± SEM of n = 10–12 (seedlings). Unpaired two-tailed Student’s t-test was performed to compare treatments with control. Scale bar: 10 cm. Data from an independent experiment are shown in Appendix Fig S7.

In silico prediction of Tyr-Asp binding to the Arabidopsis GAPC1 reveals two spatially close sites

Taking advantage of the available crystal structure of the Arabidopsis GAPC1-NAD+ protein (PDB-ID 4z0h) (Zaffagnini et al, 2016), we performed an in silico docking analysis. Tyr-Asp was predicted to bind to two spatially close sites near the NAD+: at the catalytic site surrounded by amino acid residues “SCT”, “H”, “TAT”, and “R” (positions 148-150, 176, 179-181, and 231 of chain R), and close to the adenosine moiety of NAD+. In the first pocket, Tyr-Asp binds with an associated Kd of 28.6 µM (Fig 4A–C, pocket 1). In the second pocket, lined by amino acid residues “KTVDGP” (sequence positions 183-188 of chain O) and “F”, “A”, and “Q” (positions 34, 179, and 181 of chain R), in close proximity to the ribonucleotide and nicotinamide moieties of NAD+, Tyr-Asp binds with an associated Kd of 23.3 µM (Fig 4A–C, pocket 2). In addition, we found an alternative binding pocket with lower, but still appreciable, binding efficiency (Kd of 82.5 µM), relatively distant from the previous two pockets and lined by the amino acid residues “VGD” (sequence positions 284-286 of chain O) and “R”, “HGQ”, “K”, and “W” (positions 17, 49-51, 53, and 315 of chain R’) (Fig 4A, pocket 3). We subsequently removed the NAD+ molecule from the crystal structure (PDB-ID 4z0h), which resulted in Tyr-Asp binding at the position of the removed adenosine of NAD+, contacting the amino acid residues “I”, “SAP”, “ASC”, “T”, “R”, “NE”, and “Y” (positions 11, 119-121, 147-149, 179, 231, 313-314, and 317), with a Kd of 5.1 µM (Fig 4B). Docking Tyr-Asp to the S-glutathionylated GAPC1-NAD+ structure (PDB-ID 6quq) (Berman et al, 2000; Zaffagnini et al, 2019) detected no binding pocket at the catalytic site, and hence no binding was predicted. The oxidized form of GAPC1-NAD+ (PDB-ID 6qun) (Zaffagnini et al, 2019) showed a 13-fold lower affinity for binding to the catalytic pocket, with a Kd of 304.3 µM.
Finally, we compared the predicted Tyr-Asp binding sites with the published structure of the Arabidopsis GAPA protein in complex with the small chloroplast protein Cp12-2 (PDB-ID 3qv1; Fig 4C) (Marri et al, 2008; Fermani et al, 2012). The Tyr-Asp motif present at the C-terminus of Cp12-2 was previously reported to be involved in the stabilization of the GAPA/Cp12 interaction, with Tyr76 forming a hydrogen bond with the phosphate group of NAD+. The C-terminal Tyr-Asp motif of Cp12 overlaps with the computationally docked Tyr-Asp at the catalytic site (Fig 4C, pocket 1). The predicted binding position of Tyr-Asp denoted as the second pocket (Fig 4A and C, pocket 2) overlaps with the glutamic acid (sequence position 12) of Cp12-2. Hence, Tyr-Asp may interfere with Cp12 binding, an interesting hypothesis to be tested in the future.
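
To compare the predicted dissociation constants on a common energy scale, the sketch below converts each Kd quoted above into a standard binding free energy using ΔG° = RT ln(Kd / 1 M). The temperature of 298.15 K and the function name are assumptions made for illustration; the scoring function of the docking software is not reproduced here.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1
T = 298.15          # assumed temperature in K (not stated in the text)

def kd_to_delta_g(kd_micromolar):
    """Standard binding free energy (kJ/mol) from a Kd given in micromolar."""
    kd_molar = kd_micromolar * 1e-6
    return R * T * math.log(kd_molar)  # relative to a 1 M standard state

# Kd values quoted in the text (micromolar)
predicted_kd = {
    "pocket 1 (catalytic site)": 28.6,
    "pocket 2": 23.3,
    "pocket 3": 82.5,
    "NAD+ removed (4z0h)": 5.1,
    "oxidized GAPC1 (6qun)": 304.3,
}

for site, kd in predicted_kd.items():
    print(f"{site}: Kd = {kd} uM, dG = {kd_to_delta_g(kd):.1f} kJ/mol")
```

A lower Kd gives a more negative ΔG°, so on this scale the NAD+-free structure is the most favorable predicted site and the oxidized form the least favorable.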

Figure 4. In silico prediction of Tyr-Asp binding to GAPC

  • A. Surface representation of the GAPC1 tetramer (dimer of O-R-dimer) with colors indicating the different chains and respective sequence identity (O = O' and R = R') and highlighting the conformations of Tyr-Asp (blue) docked to the identified pockets, labeled 1-3, as well as the catalytic site (Cys 149 and His 176 in red), and NAD+ (pink).
  • B. Predicted binding conformation of Tyr-Asp docked to GAPC1 without NAD+ (NAD+ is shown for reference, taken from 4z0h, purple).
  • C. Zoom-in on pockets 1 and 2 with overlaid Cp12-2 (taken from PDB-ID 3qv1, cartoon gray) with highlighted glutamic acid in sequence position 12 and C-terminal Tyr-Asp motif (both in stick representation), which sterically interfere with Tyr-Asp binding to pockets 1 and 2.

Proteome characterization of the Tyr-Asp feeding experiment revealed changes in protein and redox metabolism consistent with the Tyr-Asp protein interactions beyond that with GAPC

Tyr-Asp inhibition of the GAPC activity and the associated change in the NADPH/NADP+ ratio provide an explanation for the improved stress resistance of the Tyr-Asp-supplemented seedlings. To explore alternative mechanisms, we followed two additional experimental strategies. First, we characterized proteome and metabolome changes associated with the Tyr-Asp treatment. Specifically, 10-day-old Arabidopsis seedlings grown in liquid culture were supplemented with 100 µM Tyr-Asp and harvested at five different time-points (15 min, 30 min, 1 h, 6 h, and 24 h) prior to untargeted mass spectrometry-based metabolomic and proteomic analyses. As before, mock (water)-treated seedlings were used as control. Statistical analysis of the 5257 proteins and 201 metabolites revealed that a total of 212 proteins, but only three metabolites (Tyr-Asp, Tyr, and Asp-Pro), were significantly affected by Tyr-Asp (two-way ANOVA, FDR-corrected P ≤ 0.05) (Fig 5A; Dataset EV2). The Tyr-Asp content measured in the treated Arabidopsis seedlings was elevated within 15 min of Tyr-Asp supplementation, peaking at 1 h, followed by a decrease, and reduced accumulation at 24 h (Dataset EV2). Not surprisingly, considering the rapid turnover of the supplemented Tyr-Asp, tyrosine levels also increased upon Tyr-Asp feeding (Dataset EV2). Altogether, 164 of the 212 proteins affected by Tyr-Asp treatment (38 up- and 126 down-regulated) could be assigned to 29 functional MapMan bins (Fig 5B; Dataset EV3), revealing changes in protein and redox metabolism. While ribosomal subunits were enriched among up-regulated proteins, chaperones (e.g., HSP70, HSP90.7, HSP90.1, ROF1, and CPN10) and proteases (e.g., cysteine proteases RD21 and cathepsin) were down-regulated by the Tyr-Asp treatment. Decreased abundance was also measured for enzymes involved in redox metabolism (e.g., thioredoxin H1 and H3, glutathione reductase, ferredoxin-thioredoxin reductase, and ascorbate oxidase).
In fact, 22 of the differential proteins bind NADP(H), including glutathione reductase, which is involved in replenishing the pool of reduced glutathione, a small molecule indispensable for the oxidative stress response (Noctor et al, 2018). Further down-regulated proteins included enzymes from amino acid and carbon metabolism, notably plastidial G6PDH, abscisic acid receptors PYR1 and PYL1, and cell wall-associated proteins.
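
The text reports FDR-corrected P values but does not name the correction procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure and applies it to made-up raw p-values; the function name and the numbers are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five features (not data from the study)
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
print(adj, significant)
```

Note that features with raw p-values below 0.05 (here 0.039 and 0.041) can fail the corrected threshold, which is why the number of differential features shrinks after correction.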

Figure 5. Tyr-Asp affects redox and protein metabolism

  • A. Proteins and metabolites differentially accumulating in response to the Tyr-Asp feeding were delineated using two-way ANOVA embedded in MeV software (Howe et al, 2011), followed by false discovery rate (FDR) correction. Heat map representation of the 212 proteins and three metabolites that were differential (two-way ANOVA, FDR-corrected P ≤ 0.05) across five time-points. Data are presented as log2 fold change between control and treatment. Red indicates down-regulation and blue up-regulation. 1: log2 fold change (FC) 15 min; 2: log2 FC 30 min; 3: log2 FC 1 h; 4: log2 FC 6 h; 5: log2 FC 24 h; M: median log2 FC of all time-points.
  • B. Median of the log2 fold change between control and treatment calculated from the five time-points was used for MapMan visualization (Thimm et al, 2004) of the differential proteins (squares) and metabolites (circles). Blue indicates up-regulation and red down-regulation. MapMan automatically assigned bincodes, and proteins were classified into different cellular functions/processes. Bind.: binding; CHO: carbohydrates; Misc.: miscellaneous; Modif.: modification; N: nitrogen; OPP: oxidative pentose phosphate pathway; Reg.: regulation; S: sulfur; Targ.: targeting; TCA: tricarboxylic acid cycle.
  • C. Venn diagram comparison of putative Tyr-Asp interactors identified in the AP experiments using different sources of starting material.
  • D. Venn diagram comparison of putative Tyr-Asp interactors identified in the AP, TPP, and PROMIS experiments.
  • E. Cytoscape (Shannon et al, 2003) visualization of the Tyr-Asp interactome. Nodes represent Tyr-Asp interactors, and edges were imported from the STRING database (Szklarczyk et al, 2017) using experimental, database, and literature evidence (score set at 0.4). Functionality was assigned based on UniProt (Apweiler et al, 2004). Disconnected nodes were removed from the network.

In parallel to the omics analysis of the Tyr-Asp feeding experiment, we revisited the Tyr-Asp protein interactome. Previously, agarose beads coupled to Tyr-Asp were used to capture protein binders from the native cellular lysate prepared from Arabidopsis cell cultures (Veyel et al, 2018). Here, we performed two additional affinity purification (AP) experiments, this time using Arabidopsis and tobacco leaves. A total of 13 and 29 proteins were identified in the tobacco and Arabidopsis AP experiments, respectively, in comparison with the 108 previously identified using cell cultures (Fig 5C; Datasets EV4 and EV5). The overlap of the three lists in a Venn diagram contained just one protein, cytosolic GAPC1 (Fig 5C), supporting a conserved role of Tyr-Asp in the regulation of the NAD+-dependent GAPDH activity.

To complement the AP experiments, we investigated the Tyr-Asp interactome by an independent small-molecule-centered approach, namely thermal proteome profiling (TPP) (Savitski et al, 2014). In the TPP experiment, putative protein targets are delineated by monitoring changes in protein thermal stability caused by small-molecule binding. Arabidopsis native cellular lysate (prepared from the cell suspension culture grown in the light) was treated with either mock or 10 µM Tyr-Asp, followed by exposure to a temperature gradient and proteomic analysis. Tyr-Asp treatment altered the thermal stability of 177 out of 3092 quantified proteins (Dataset EV6). The list of putative Tyr-Asp interactors was enriched in proteins associated with protein metabolism and abiotic stress response: chaperones (e.g., CPN10, CPN20, CPN60A), ribosomal proteins (e.g., RPL12-C, RPL10, RPS24/35), proteases and peptidases (e.g., CLPR3, peptidase C15, DEG1, peptidase M20/M25/M40), and enzymes involved in oxidative stress defense (glutaredoxins and thioredoxins; Appendix Fig S8).

In a final step, we compared the lists of putative Tyr-Asp interactors from the TPP and AP experiments. We also included a third list obtained from the published PROMIS experiment and comprising proteins co-separating with Tyr-Asp in size-based fractionation (Veyel et al, 2018). We refer to the 47 proteins identified by at least two independent approaches as high-confidence putative Tyr-Asp interactors (Fig 5D; Dataset EV7). By querying the STRING database for reported protein–protein interactions (Szklarczyk et al, 2017) and using Cytoscape as a visualization tool (Shannon et al, 2003), we built the final Tyr-Asp interactome (network) comprising 36 interconnected proteins. Proteins were grouped into four functional classes: (i) carbon metabolism (GAPC1, GAPC2, TKL1, and TKL2), (ii) chaperones (e.g., CPN10, CPN20), (iii) translation and ATP production (e.g., RRF, RPL12-C), and (iv) protein degradation (e.g., RAD23B, DSK2) (Fig 5E).
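
The selection rule described above (a protein counts as high confidence when at least two independent approaches report it) can be sketched as a simple counting step. The function name and the toy memberships below are hypothetical; they reuse protein names from the text but do not reproduce the actual Datasets EV4 to EV7.

```python
from collections import Counter

def high_confidence(*interactor_sets, min_support=2):
    """Proteins reported by at least `min_support` of the given sets."""
    counts = Counter(p for s in interactor_sets for p in set(s))
    return {p for p, n in counts.items() if n >= min_support}

# Toy lists; memberships are illustrative only
ap = {"GAPC1", "GAPC2", "CPN10"}
tpp = {"GAPC1", "CPN10", "CPN20", "RPL12-C"}
promis = {"GAPC1", "GAPC2", "RRF"}

print(sorted(high_confidence(ap, tpp, promis)))
```

Setting `min_support=3` would instead return only the proteins shared by all approaches, which is the stricter criterion corresponding to the center of a three-way Venn diagram.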

AthPEPCK1 activity is inhibited by the branched-chain and polar amino acid dipeptides

Next, we wondered whether GAPC is the only glycolytic/gluconeogenic enzyme targeted by proteogenic dipeptides. The PROMIS dataset (Veyel et al, 2018), used in the identification of the Tyr-Asp–GAPDH interaction, contains 92 additional proteogenic dipeptides, spanning the whole protein separation range (Dataset EV8). To test the likelihood of proteogenic dipeptides binding to other glycolytic/gluconeogenic enzymes, we analyzed co-elution of all dipeptides and all glycolytic/gluconeogenic enzymes present in the PROMIS dataset (Veyel et al, 2018). Our analysis identified numerous putative interactions (Appendix Fig S9; Dataset EV8). We focused our attention on one of the co-elution clusters containing the gluconeogenic enzyme phosphoenolpyruvate carboxykinase 1 (AthPEPCK1) and a group of six dipeptides characterized by the presence of either polar or branched-chain amino acids (Ile-Gln, Ile-Ala, Phe-Gln, Leu-Thr, Ser-Tyr, and Ser-Val). The activity of recombinant AthPEPCK1 (Rojas et al, 2019) was analyzed at increasing concentrations of the selected dipeptides (Fig 6A). Remarkably, all six tested dipeptides inhibited AthPEPCK1 activity with I0.5 values ranging from mid to high micromolar, while no effect was observed for Tyr-Asp (Fig 6B). The Ala-Ile dipeptide was the most potent inhibitor, followed by Ile-Gln, Ser-Tyr, Phe-Gln, Ser-Val, and Leu-Thr. In contrast, none of the tested amino acids affected AthPEPCK1 activity (Appendix Fig S10). To validate the inhibition of recombinant AthPEPCK1, we measured PEPCK activity in a protein lysate from Arabidopsis rosettes, using a fluorometric assay optimized ad hoc (Appendix Fig S11). In line with the results obtained for the recombinant enzyme, all six dipeptides (Ile-Gln, Ile-Ala, Phe-Gln, Leu-Thr, Ser-Tyr, and Ser-Val) inhibited PEPCK activity in crude extracts, with Ala-Ile showing the most potent inhibitory effect, whereas no effect was observed with Tyr-Asp (Fig 6C).

Figure 6. AthPEPCK1 activity is inhibited by dipeptides

  • A, B. Recombinant AthPEPCK1 activity was measured at increasing concentrations of six different H–P dipeptides (A) or Tyr-Asp (B). I0.5 values are indicated on each graph. Measurements were performed in the decarboxylation direction, as described in the Methods section. Data were adjusted to a modified Hill equation. The value of 1 corresponds to the activity of 4.3 ± 0.2 U mg−1. Data are mean ± SEM of 4 independent measurements.
  • C. PEPCK activity was measured in a protein lysate prepared from 3-week-old Arabidopsis leaves in the absence (control, C) or presence of 500 µM of the different dipeptides. Measurements were performed in the carboxylation direction, as described in the Methods section. Data are mean ± SE of 4 independent lysate preparations. Unpaired two-tailed Student’s t-tests were performed to compare dipeptide treatments with the control.
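
The caption states that the inhibition data were adjusted to a modified Hill equation, but the exact form is not given here. As a hedged illustration, the sketch below assumes the common inhibitory form v/v0 = 1 / (1 + (I / I0.5)^n) and recovers I0.5 and n from synthetic, noise-free data by linear regression on the linearized equation. All names and numbers are hypothetical.

```python
import math

def hill_inhibition(conc, i_half, n):
    """Relative activity v/v0 for an assumed inhibitory Hill model."""
    return 1.0 / (1.0 + (conc / i_half) ** n)

def fit_hill(concs, rel_activity):
    """Estimate (i_half, n) by least squares on the linearized form
    log((1 - y) / y) = n * log(I) - n * log(I0.5); requires 0 < y < 1."""
    xs = [math.log(c) for c in concs]
    ys = [math.log((1.0 - y) / y) for y in rel_activity]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(-intercept / slope), slope

# Synthetic data generated with I0.5 = 200 (micromolar) and n = 1.5
concs = [25, 50, 100, 200, 400, 800]
activity = [hill_inhibition(c, 200.0, 1.5) for c in concs]
i_half, n = fit_hill(concs, activity)
print(f"I0.5 = {i_half:.1f} uM, n = {n:.2f}")
```

By construction, the relative activity equals 0.5 when the inhibitor concentration equals I0.5, which is how the I0.5 values in the figure should be read. Real data with noise would normally be fitted by nonlinear regression rather than this linearization.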


In this paper, we present COSMOS, an analysis pipeline to systematically generate mechanistic hypotheses by integrating multi-omics datasets with a broad range of curated resources of interactions between proteins, transcripts, and metabolites.

We first showed how TF, kinase, and phosphatase activities could be coherently estimated from transcriptomics and phosphoproteomics datasets using footprint-based analysis. This is a critical step before further mechanistic exploration. Indeed, transcript and phosphosite abundances usually offer limited functional insights by themselves, as their relationship with the corresponding protein activity is usually not well characterized. Yet, they can provide information on the activity of the upstream proteins regulating their abundances. Thus, the functional state of kinases, phosphatases, and TFs is estimated from the observed abundance change of their known targets, i.e., their molecular footprint. Thanks to this approach, we could simultaneously characterize protein functional states in tumors at the level of signaling pathways and transcriptional regulation. Key actors of the hypoxia response, inflammation pathways, and oncogenic genes were found to have especially strong alterations of their functional states, such as HIF1A, EPAS1, STAT1/2, MYC, and CDK2. Loss of VHL is a hallmark of ccRCC and is directly linked to the stability of the HIF (HIF1A and EPAS1) proteins found deregulated by our analysis (Maxwell et al, 1999; Ivan et al, 2001; Jaakkola et al, 2001). Finding these established signatures of ccRCC to be deregulated in our analysis is a confirmation of the validity of this approach.
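
The footprint idea described above can be illustrated with a deliberately crude score: the activity of a regulator is taken as the signed mean of the measured changes of its known targets. This is a simplified sketch and not the statistic used in the study; the function name, the regulon, and the values are hypothetical.

```python
def footprint_activity(targets, changes):
    """Signed mean of target changes as a crude regulator activity score.

    targets: {target: mode}, with mode +1 if the regulator increases the
             target and -1 if it decreases it
    changes: {target: measured change, e.g. a t-value or log2 fold change}
    """
    measured = [(mode, changes[t]) for t, mode in targets.items() if t in changes]
    if not measured:
        return None  # no footprint observed, so no activity estimate
    return sum(mode * value for mode, value in measured) / len(measured)

# Hypothetical regulon and measurements (names and values are made up)
regulon = {"geneA": +1, "geneB": +1, "geneC": -1, "geneD": +1}
observed = {"geneA": 2.0, "geneB": 1.0, "geneC": -1.5}
score = footprint_activity(regulon, observed)
print(score)
```

A positive score suggests an activated regulator because induced targets go up and repressed targets go down. Regulators without any measured target return `None`, which mirrors the limitation that activity can only be estimated where known targets are detected.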

We then applied COSMOS with a novel meta causal Prior Knowledge Network (PKN) spanning signaling, transcription, and metabolism to systematically find potential mechanisms linking deregulated protein activities and metabolite concentrations. To the best of our knowledge, this is the first attempt to integrate these three omics layers together in a systematic manner using causal reasoning. Previous methods studying signaling pathways with multi-omics quantitative datasets (Drake et al, 2016) connected TFs with kinases, but they were limited by the preselected locally coherent subnetwork of the TieDIE algorithm. Introducing global causality along with metabolomics data allows us to obtain a direct mechanistic interpretation of links between proteins at different regulatory levels and metabolites. The goal of our approach is to find a coherent set of such mechanisms connecting as many of the observed deregulated protein activities and metabolite concentrations as possible. Using COSMOS is particularly interesting because all the proposed mechanisms between pairs of molecules (proteins and metabolites) have to be plausible not only in the context of their own pairwise interaction but also with respect to all other molecules that we wish to include in the model. For example, the proposed activation of MYC by NFKB1 and MAPK1 is further supported by STAT3 activation, because MAPK1 is also known to activate STAT3. Thus, we developed COSMOS to scale this type of reasoning up to the entire PKN with all significantly deregulated protein activities and metabolites. We relied on an ILP optimization through the CARNIVAL R package (Liu et al, 2019b) to contextualize this PKN with our data. We refined the optimization procedure to handle this very large PKN and built an R package to make it easier for others to use it with their own data.
Given a set of deregulated TFs, kinases/phosphatases, or metabolites, COSMOS provides the users with a set of coherent mechanistic hypotheses to explain changes observed in a given omics layer with upstream regulators from other omics layers. Thus, its aim is to integrate measured data with prior knowledge in a consistent and systematic manner, not to explicitly predict the outcome of new experiments.
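
The kind of coherence reasoning described above can be illustrated with a small sign-consistency check on a signed, directed network. The sketch below only tests whether given node states agree with edge signs; it does not perform the ILP optimization carried out through CARNIVAL. The edges follow the MYC/NFKB1/MAPK1/STAT3 example in the text, while the node states and function name are assumptions.

```python
def inconsistent_edges(edges, states):
    """Return edges whose sign disagrees with the node states.

    edges:  (source, target, sign) with sign +1 (activates) or -1 (inhibits)
    states: {node: +1 (up) or -1 (down)}
    """
    return [
        (src, dst, sign)
        for src, dst, sign in edges
        if src in states and dst in states and states[dst] != sign * states[src]
    ]

# Activating edges from the example in the text
edges = [("NFKB1", "MYC", +1), ("MAPK1", "MYC", +1), ("MAPK1", "STAT3", +1)]
# Assumed states: all four proteins activated
states = {"NFKB1": +1, "MAPK1": +1, "MYC": +1, "STAT3": +1}

assert inconsistent_edges(edges, states) == []
# Flipping STAT3 to "down" would contradict MAPK1 -> STAT3
states["STAT3"] = -1
print(inconsistent_edges(edges, states))
```

With STAT3 observed as activated, an active MAPK1 explains both STAT3 and MYC without contradiction, which is why the STAT3 measurement lends extra support to the MAPK1 to MYC hypothesis.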

Since the interferon gamma response pathway was the most over-represented cancer hallmark in the COSMOS network solution, we further investigated the relevance of the mechanistic hypotheses connecting members of this pathway. The network showed that the crosstalk between MAPK1, NFKB1, MYC, HIF1A, and YY1 could explain the deregulation in glutamine and reduced glutathione metabolism, as well as inosine, hypoxanthine, and adenine. These were particularly relevant as they were important interactions in ccRCC. MYC and glutamine metabolism appear to be an interesting therapeutic target of ccRCC (Shroff et al, 2015). YY1 is a known indirect inhibitor of MYC involved in cancer development (Austen et al, 1998). The COSMOS network showed that YY1 could also potentially have a role in the down-regulation of the ADA and PNP metabolic enzyme activities. Coherently, PNP has been shown to be non-essential in ccRCC cell lines, which is expected from down-regulated metabolic enzymes (Gatto et al, 2015). In addition, the link shown by COSMOS between NFKB1 and MYC can have implications for the treatment of ccRCC, due to its pivotal role in arsenite (a drug used in chemotherapy) treatment of cancer (Huang et al, 2014). Furthermore, the activation of the NFKB1-MYC link in FBW7-deficient cells seems to sensitize them to sorafenib (a MEK-Raf inhibitor), a drug used in the treatment of primary kidney cancer (Huang et al, 2014). In addition, NFKB1 and MYC are both promising ccRCC treatment targets (Peri et al, 2013; Bailey et al, 2017). The link shown by COSMOS between KMT2A and adenosine is interesting, because KMT2A mutations have been reported in a number of ccRCC patients (Yan et al, 2019), suggesting that this enzyme might play a functional role in ccRCC development. Moreover, it has been proposed, at least in vitro, that ccRCC cell lines with low basal levels of phospho-AKT were sensitive to treatment with an adenosine analog (Kearney et al, 2015).
The link between YES1, MAPK1, and SMAD4 in the COSMOS network is especially relevant considering that YES1 is a known targetable oncogene (Hamanaka et al, 2019 ). These examples illustrate the ability of COSMOS to extract mechanistic hypotheses to understand and potentially improve treatment of cancer by integration of multiple omics data and prior knowledge.

However, it is important to mention that COSMOS is only aimed at providing hypotheses to further explore experimentally. COSMOS does not aim at recapitulating all the molecular interactions that may be happening in a given context. Currently, COSMOS simply provides a large set of coherent mechanistic hypotheses, given the data and prior knowledge available. We argue that this facilitates the interpretation of a complex multi-omics dataset and guides the exploration of biological questions.

We assessed the performance and robustness of our approach. We computed a tumor-specific correlation network of TF and kinase activities and compared it to the co-regulation predicted by COSMOS. This yielded encouraging, though imperfect, results, underscoring the fact that the mechanisms proposed by COSMOS, like those proposed by any similar tool, are hypotheses. It also highlighted that integrating more omics data makes it possible to generate more hypotheses and connect them together, but does not necessarily improve their predictive performance.

There are three main known limits to the predictions of COSMOS. First, the input data are incomplete. Only a limited fraction of all potential phosphosites and metabolites are detected by mass spectrometry. This means that we have no information on a significant part of the PKN; part of the unmeasured network is kept in the analyses, and the values are estimated as intermediate “hidden values”. Second, not all regulatory events between TFs, kinases, and phosphatases and their targets are known, and activity estimation is based only on the known regulatory relationships. Thus, many TFs, kinases, and phosphatases are not included because they have no curated regulatory interactions or no detected substrates in the data. Third, and conversely, COSMOS will find putative explanations within the existing prior knowledge that may not be the true mechanism.

These problems mainly originate from the importance that is given to prior knowledge in this method. Since prior knowledge is by essence incomplete, the next steps of improvement could consist of finding ways to extract more knowledge from the observed data to weight the contribution of prior knowledge. For instance, one could use the correlations between transcripts, phosphosites, and metabolites to quantify the interactions available in databases such as OmniPath. Importantly, any other omics that relate to active molecules (such as miRNAs or metabolic enzyme fluxes) or that can be used to estimate protein activities through footprint approaches (such as DNA accessibility or PTMs other than phosphorylation) can be seamlessly integrated (as we showed with the fluxomic data of the breast cancer dataset). Moreover, COSMOS was designed to work with bulk omics datasets, and it will be very exciting to find ways of applying this approach to single-cell datasets. Encouragingly, the footprint methods that bring data into COSMOS seem fairly robust to the characteristics of single-cell RNA data such as dropouts (Holland et al, 2020). Related to the importance of prior knowledge, the PKN can also depend on how we interpret the information we have about molecular interactions. In particular, we converted the reaction network of Recon3D into a causal network where metabolite reactants “activate” metabolic enzymes, and metabolic enzymes “activate” metabolite products. This first approximation assumes that metabolite abundances are only driven by their production rates. We plan to refine this in the future to account for the fact that metabolite abundances can also change as a result of consumption. Finally, we expect that in the future, data generation technologies will increase coverage and our prior knowledge will become more complete, reducing the mentioned limitations.
In the meantime, we believe that COSMOS is already a useful tool to extract causal mechanistic insights from multi-omics studies.
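
The conversion of a reaction network into a causal network, in which reactants "activate" enzymes and enzymes "activate" products, can be sketched as follows. This is a minimal illustration of the rule as stated in the text, not the actual COSMOS preprocessing code; the function name is hypothetical, and the example reaction is simplified to its main substrate and product.

```python
def reaction_to_causal_edges(reactants, enzymes, products):
    """Expand one reaction into signed causal edges (+1 means "activates")."""
    edges = []
    for enzyme in enzymes:
        # Each reactant is treated as an upstream activator of the enzyme
        edges += [(metabolite, enzyme, +1) for metabolite in reactants]
        # The enzyme is treated as an upstream activator of each product
        edges += [(enzyme, metabolite, +1) for metabolite in products]
    return edges

# Illustrative reaction: adenosine deaminase (ADA) converts adenosine to inosine
edges = reaction_to_causal_edges(
    reactants=["adenosine"], enzymes=["ADA"], products=["inosine"]
)
print(edges)
```

Because every edge is an activation, a drop in a reactant can only be propagated as a drop in its products. This is the approximation noted above: consumption of a metabolite by a more active enzyme is not represented.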

Improvement of fermentation processes for ABE fermentation

Apart from modifications at the microbial level, engineering the fermentation process itself is another effective strategy for alleviating butanol toxicity and enhancing butanol production. At present, various fermentation processes, which determine capital investment in the upstream and downstream processes, feedstock consumption, and energy requirements, have been developed to further improve the efficiency of ABE fermentation.

Optimization of culture conditions

Effects of exogenous additives on ABE fermentation

According to the metabolic pathway and physiological characteristics, various organic acids (such as acetate, butyrate, amino acids, and lactic acid) could serve as alternative substrates for butanol production and help maintain robust expression of the associated enzymes during solventogenesis [169, 170]. Interestingly, when butyrate was the sole carbon source, only 0.2 g/L of butanol was produced, but the production of butanol significantly increased to 10 g/L when both butyric acid and glucose were present, suggesting that butyric acid may be an important factor triggering solvent production [171]. More importantly, after optimization of the glucose concentration, butyric acid addition, and C/N ratio, the amounts of butanol and ABE produced were further increased to 17 g/L and 21.71 g/L, respectively, in the scale-up fermentation of C. acetobutylicum YM1 [172, 173]. Similarly, with the addition of 30 mM ammonium acetate, the fermentation time of C. acetobutylicum EA was shortened by about 12 h, and the yield of butanol increased from 8.3 to 13 g/L [174].

Generally, depending on the critical micelle concentration, surfactants can self-assemble into micelles and then relieve butanol toxicity to microorganisms by entrapping the butanol in the micelles. To this end, surfactant addition has been used to significantly improve the performance of ABE fermentation. Because the butanol is sequestered by the surfactant, its toxicity to microorganisms is reduced and butanol production is enhanced. For instance, with the addition of a non-ionic surfactant [3% (v/v) L62], the amounts of butanol and total solvents produced increased to 15.3 g/L and 21 g/L, which were 43% (w/w) and 55% (w/w) higher than the control, respectively. More interestingly, when the surfactant was added at 9 h, the productivities of butanol and total solvent further increased to 0.31 g/L/h and 0.39 g/L/h from 0.13 g/L/h and 0.17 g/L/h, respectively [175]. Likewise, a combined zinc-supplemented/magnesium-starved fermentation medium could also effectively improve central carbon metabolism through multi-level modulation, e.g., up-regulation of the glycolytic pathway, up-regulation of thiolase, butyraldehyde dehydrogenase, and butanol dehydrogenase, and down-regulation of alcohol dehydrogenase. This enhanced glucose utilization, reduced ethanol production, and induced solventogenesis earlier, increasing the production of butanol from 11.83 to 19.18 g/L [176].

Mixed cultivation for ABE fermentation

To further enhance the economic feasibility of ABE fermentation, mixed cultures with different microorganisms were developed to: (1) enlarge the range of substrates [177], (2) increase the availability of intermediates [178, 179], and (3) decrease the cost of maintaining strict anaerobic conditions [180]. For instance, to eliminate the costly enzymatic hydrolysis step, a mixed culture of C. thermocellum and C. acetobutylicum was used for ABE fermentation, in which 40 g/L cellulose was directly utilized to produce 5.8 g/L butanol [181]. Similarly, when a mixed culture of C. acetobutylicum and B. subtilis without anaerobic treatment was used to reduce the application of costly reducing agents, 14.9 g/L of butanol was produced, which was 21.1% higher than that from a pure culture of C. acetobutylicum [182]. However, the higher risk of bacteriophage infection as the number of transfer/sub-culturing steps increases has become the major drawback, restricting the industrial application of co-culture systems.

Regulating the oxidation–reduction potential (ORP) for ABE fermentation

As shown in Fig. 1, a high level of NADH enhances butanol production at the expense of acetone formation, suggesting that manipulating the intracellular redox balance, either at the molecular level or at the level of the fermentation process, may be an effective way to drive more carbon flux and energy towards butanol production [183, 184].

Intracellular regulators for redox balance

For clostridia, the electron flow is primarily regulated at the ferredoxin/hydrogenase node, so that reducing hydrogenase activities could effectively inhibit molecular hydrogen formation, driving electron flow towards butanol due to the regeneration of the NAD(P)+ pool [185]. Therefore, overexpressing NADH-dependent adhE2 could effectively restore the redox balance and subsequently further improve butanol production [80]. Similarly, Rex, a redox-sensing protein and transcriptional regulator, could effectively regulate the expression of genes involved in butanol pathways in response to the intracellular NADH/NAD+ shift. As a result, a Rex-negative mutant produced greater amounts of ethanol and butanol with less hydrogen and acetone as by-products [186].

Bioprocess engineering

Compared to the tedious task of genetic modification, bioprocess engineering (such as adding an electron carrier to strengthen NADH synthesis or aerating with CO to repress hydrogenases) could be explored with an immediate impact on environmental and intracellular ORP [187]. To this end, some artificial electron carriers (such as methyl viologen and neutral red) were added to drive the carbon flow from acids to alcohols: adding 2 g/L of Na2SO4 (an electron acceptor) significantly increased butanol production, which reached 12.96 g/L, 34.8% higher than that of the control [188]. Likewise, when a mixture of 85% N2 and 15% CO was sparged during the ABE fermentation, hydrogenase activity and electron transfer were effectively suppressed, but the cellular NAD(P)H pool was significantly increased, improving the production of butanol from 4.8 to 7.8 g/L [189]. More interestingly, when the ORP of ABE fermentation was regulated at −290 mV, solvent production could be initiated earlier, increasing solvent productivity by 35%, but the butanol yield was only slightly increased compared with that without ORP control [190].

Optimization of the fermentation process

High-cell-density fermentation

Compared to aerobic fermentations, fermentations with clostridia have excellent specific carbon fluxes [103], but suffer from low cell density [a maximum absorbance of around 10–11 at 600 nm (A600)] due to product inhibition, some unknown quorum-sensing mechanism, or unsuitable bioreactor operation [191,192,193]. To this end, various fermentation processes (e.g., immobilized cell, batch, fed-batch, and continuous) have been developed to realize a high-cell-density fermentation by alleviating substrate and product inhibitions [33, 93] (Table 4).

Compared to fed-batch and batch fermentations (characterized by product inhibition and considerable down time) and despite a few disadvantages (such as high capital cost, phage contamination, and flocculation of bacterial growth), continuous processes (using free cells, immobilized cells, and cell recycling) offer a more attractive and productive alternative for commercial industrial ABE production. Their advantages include reductions in sterilization and re-inoculation times, superior productivity, and less substrate and butanol inhibition [194]. For example, with the help of a membrane cell-recycle bioreactor, a high-cell-density continuous ABE fermentation of C. acetobutylicum BKM19 was carried out to produce butanol and ABE with volumetric productivities of 10.7 and 21.1 g/L/h, titers of 11.9 and 23.5 g/L, and yields of 0.17 and 0.34 g/g, respectively, under the optimal operational condition [195]. Generally, a productivity of > 10 g/L/h, a titer of > 10 g/L butanol, and a yield of up to 0.44 g/g could be achieved from a high-cell-density fermentation, a great success in ABE fermentation by Clostridium [196, 197].

In situ product recovery (ISPR) techniques

To further alleviate butanol toxicity, several in situ product recovery (ISPR) techniques, including pervaporation, adsorption, liquid–liquid extraction, and gas stripping, were also developed to integrate with the fermentation process for higher butanol productivity [198,199,200,201,202] (Table 5). For example, with the combination of fed-batch fermentation and gas stripping, the production, productivity, and yield of total solvent significantly increased to 233 g, 1.16 g/L/h, and 0.47 g/g, respectively [203]. However, each recovery system for butanol production has its own advantages and disadvantages (Table 5). Reverse osmosis seemed to be the most preferable recovery technique from an economic point of view, but it is prone to membrane clogging or fouling [204]. Recently, a series of novel composite membranes (such as FAS cross-linked PDMS [205], PDMS-based pervaporation membranes [206], modified graphene oxide with ionic liquid [207], and mixed matrix membranes [208]) were fabricated and applied to ISPR techniques, and displayed a more stable performance during long-term continuous operation of ABE fermentation [24]. More importantly, an ideal integrated recovery process should minimize energy consumption and concentrate butanol with high selectivity, so hybrid integrated recovery processes were also developed to compensate for the respective disadvantages of individual processes, and showed good potential for industrial ABE fermentation [191, 209].

Materials and methods

Foraging performance

The research was carried out in southern Ontario, Canada from early June to early July 2004. The average (± s.e.m.) daily high temperature was 23.2±0.65°C. Forage during this period was abundant. The empty honeycomb we placed in the observation hive at the start of the experiment was 100% full 29 days later. Assuming a full frame mass of 4.5 kg (Sammataro, 1998), this corresponds to an average daily increase in frame mass of 155 g. We marked newly eclosed bees (Apis mellifera L.) with individually numbered tags and introduced them into a two-frame observation hive containing approximately 2000 bees. We made four introductions of 80 bees 3 days apart in order to have bees commencing foraging over several days.

Two weeks after introducing the first bee cohort, we removed a few bees that had already initiated foraging and began data recording. All bees departing and entering the hive travelled through a transparent Plexiglas tunnel. The marked bees were diverted into a side tunnel, caged and weighed on an analytical balance with precision of 0.1 mg. The balance reported the bee weight and time of day to the computer, and we added the bee identity, her travel direction and whether she carried pollen. We recorded data from the start of bee activity in the morning until 18:00 h on a total of 20 successive days, skipping a single rainy day with no foraging activity. At the end of the experiment, we edited the data set to include only trips longer than 5 min. We omitted all shorter trips, assuming they were orientation trips by bees about to initiate foraging (Capaldi et al., 2000; Dukas and Visscher, 1994; Ribbands, 1953). For each foraging trip, we calculated the trip duration in min, the mass of forage in mg and the food delivery rate, defined as the mass of forage over trip duration.
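The trip filter and the delivery-rate definition above can be sketched in Python as follows. This is an illustration only. The `Trip` record, its field names and the example values are hypothetical, and taking forage mass as the difference between returning and departing bee mass is an assumption, not a detail stated in the text.

```python
from dataclasses import dataclass

MIN_TRIP_MIN = 5.0  # trips of 5 min or less are treated as orientation trips


@dataclass
class Trip:
    bee_id: str          # hypothetical tag identifier
    out_mass_mg: float   # bee mass when leaving the hive
    in_mass_mg: float    # bee mass when returning
    duration_min: float  # time between departure and return


def delivery_rate(trip: Trip) -> float:
    """Food delivery rate (mg/min): forage mass divided by trip duration."""
    # Assumption: forage mass = returning mass minus departing mass.
    forage_mg = trip.in_mass_mg - trip.out_mass_mg
    return forage_mg / trip.duration_min


def foraging_trips(trips):
    """Keep only trips longer than the 5 min cut-off."""
    return [t for t in trips if t.duration_min > MIN_TRIP_MIN]


# Made-up example values, for illustration only.
trips = [
    Trip("A12", 90.0, 120.0, 30.0),  # foraging trip
    Trip("A12", 90.0, 90.5, 3.0),    # short orientation trip, dropped
]
rates = [delivery_rate(t) for t in foraging_trips(trips)]
```

Applying the filter before computing rates keeps short orientation flights from pulling the mean delivery rate towards zero.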

Concurrently, entries and exits were monitored on a separate set of marked honeybees, which were used for proteomic and enzymatic analyses. These bees were collected at four different life stages: hive bees (11-15 days old), young foragers (2 days of foraging experience), mature foragers (4-11 days of foraging experience) and old foragers (≥12 days of foraging experience). Upon collection, bees were placed on ice and dissected so that only their thoraxes were kept. Thoraxes were immediately frozen in liquid nitrogen and stored at -80°C for future analyses.

It was essential to compare the behaviour of the same individual bees throughout their life to control for the possibility of a positive correlation between foraging performance and lifespan. Hence, following published methods (Dukas and Visscher, 1994), the main statistical analysis involved repeated measures ANOVA on the data set of food delivery rates over the first 7 days by the 24 bees that foraged for at least 7 days. Only four of these 24 bees collected pollen on at least half their foraging trips, precluding a detailed analysis of pollen foragers. To evaluate the effect of senescence, we conducted a second analysis of the performance of the 14 bees that foraged for at least 12 days. Sample sizes were insufficient for analyses beyond 12 days of foraging experience.

Two-dimensional electrophoresis and proteome analysis

Similar sized thoracic sections from nine hive bees and nine mature foragers [mass=27.0±0.6 mg and 26.3±0.1 mg (mean ± s.e.m.), respectively] were individually homogenised in 500 μl of ice cold two-dimensional gel lysis buffer [as detailed elsewhere (Smith et al., 2005)] using a motorised dounce homogeniser. The homogenate was clarified by centrifugation (18 000 g, for 5 min at 4°C) and the supernatant desalted using commercially available protein desalting columns (Pierce, Rockford, IL, USA).

From each homogenate 200 μg total protein was resolved by two-dimensional (2D) gel electrophoresis. All 2D electrophoresis was carried out using the Investigator™ electrophoresis system (Genomic Solutions, Ann Arbor, MI, USA) according to the manufacturer's instructions and using the pre-made rehydration/solubilisation, equilibration and running buffers. Briefly, the first dimension was resolved on pHIash (pH 3-10) immobilised pH gradient (IPG) strips, using the pre-programmed ramped voltage regimen of the pHaser isoelectric focussing apparatus, for a total of 100 000 volt-hours, and the second dimension was resolved on tricine chemistry/10% duracryl slab gels run on the Investigator™ (Ann Arbor, MI, USA) 2D casting and running apparatus, again using the pre-programmed voltage regimen. After electrophoresis the gels were fixed with water, methanol and acetic acid (in accordance with the instructions provided with the gel stain) and then stained with SYPRO-ruby stain (Genomic Solutions). Imaging of the stained gels was carried out using the Perkin Elmer Pro-Express gel imaging system. The gel images were then analysed using Phoretix 2D™ analytical software version 2004 (Nonlinear Dynamics).

Proteome changes thought to be associated with the transition from hive activity to foraging were selected according to criteria similar to those used by Smith et al. (2005): the protein spot had to be present on all hive bee and all foraging bee gels (i.e. the protein was consistently resolved), and there had to be a significant change between hive and foraging gels in mean normalized spot volume, a parameter offered by the Phoretix analytical software that combines spot area and intensity to give an overall index of expression.

Selected protein spots (see Results) were cut from the gel using the Perkin Elmer Pro-Pick robotic work station and the gel plugs preserved in 2% glycerol at 4°C until they were subjected to in-gel trypsin digestion and peptide analysis.

In-gel tryptic digestion and nano electrospray quadrupole time-of-flight mass spectrometry analysis

The gel plugs, containing the protein spots of interest (see above), were destained with 50 mmol l⁻¹ ammonium bicarbonate containing 50% acetonitrile and air dried. The proteins were then reduced by adding 30 μl of 10 mmol l⁻¹ dithiothreitol (DTT) in 25 mmol l⁻¹ ammonium bicarbonate to each gel plug and incubating for 1 h at 56°C. After cooling to room temperature, the DTT solution was removed and the gel plugs treated with the same volume of 100 mmol l⁻¹ iodoacetamide in 50 mmol l⁻¹ ammonium bicarbonate. After 60 min incubation at ambient temperature, in the dark, the gel plugs were washed with 30 μl of 25 mmol l⁻¹ ammonium bicarbonate for 15 min and then dehydrated with 100% acetonitrile. After 10 min the liquid phase was removed, and the gel plugs were completely dried in air. The proteins were then subjected to in-gel digestion: 0.015 μg trypsin in 30 μl of 50 mmol l⁻¹ ammonium bicarbonate solution containing 10% acetonitrile was added to each gel plug and these were incubated at 37°C overnight. The digested proteins were desalted and concentrated using a Millipore C18 ZipTip prior to MS analysis and the peptides were finally eluted in 8 μl of 50% aqueous acetonitrile containing 0.2% formic acid. All protein digests were analyzed by a Q-TOF Global Ultima (Micromass Waters, Manchester, UK) with a nanoES source. Capillary voltage was typically 1.2-1.6 kV, cone voltage was 50-100 V and the voltage was 100 V. Mass spectra in time of flight mass spectroscopy (TOF MS) and MSMS mode were in a mass range 50-1800 m/e with a resolution of 8000 full width at half maximum height (FWHM). Argon was used as collision gas.

Normalized spot volumes from hive bee and mature forager 2D gels were compared by Student's t-test (Statistix analytical software).

Enzyme activities

Thoraxes from each life stage (hive bees, young foragers, mature foragers, old foragers; mean ± s.e.m. thorax mass = 27.8±0.8 mg, 27.1±0.5 mg, 26.5±0.3 mg and 26.6±0.6 mg, respectively) were powdered using a liquid N2-cooled mortar and pestle. Thorax weight did not differ among life stages (ANOVA, F3,39=1.0, P=0.41). Whole thoraxes were then homogenized on ice using a glass on glass homogenizer for 1 min in 20 volumes of extraction buffer consisting of 75 mmol l⁻¹ potassium phosphate (pH 7.3) and 10 mg ml⁻¹ Lubrol® (Suarez et al., 1996). All enzymes were measured at 37°C in a Spectromax Plus 384, 96-well microplate reader (Molecular Devices, Sunnyvale, CA, USA). Assays were performed in triplicate and control rates without substrate were determined for each assay.

Enzyme activities of cytochrome c oxidase (COx), phosphofructokinase (PFK) and hexokinase (HK) were measured on fresh thorax homogenates. Enzyme activities of pyruvate kinase (PK) and citrate synthase (CS) were measured after homogenates had been frozen and thawed once and twice, respectively. Nine to eleven thoraxes were used for each life stage. Assay conditions were as follows. COx: 50 mmol l⁻¹ potassium phosphate (pH 7.5), 50 μmol l⁻¹ cytochrome c; PFK: 10 mmol l⁻¹ fructose 6-phosphate (F6P) (omitted in control), 1 mmol l⁻¹ ATP, 0.15 mmol l⁻¹ NADH, 2 mmol l⁻¹ AMP, 10 mmol l⁻¹ MgCl2, 100 mmol l⁻¹ KCl, 5 mmol l⁻¹ DTT, 1 U aldolase, 5 U triose phosphate isomerase and 5 U α-glycerophosphate dehydrogenase in 50 mmol l⁻¹ imidazole (pH 7.4); HK: 5 mmol l⁻¹ D-glucose (omitted in control), 4 mmol l⁻¹ ATP, 10 mmol l⁻¹ MgCl2, 100 mmol l⁻¹ KCl, 0.5 mmol l⁻¹ NADP, 5 mmol l⁻¹ DTT, 1 U glucose-6-phosphate dehydrogenase, 50 mmol l⁻¹ Hepes (pH 7.4); PK: 5 mmol l⁻¹ phosphoenol pyruvate (PEP; omitted in control), 50 mmol l⁻¹ imidazole (pH 7.4), 5 mmol l⁻¹ ADP, 2.5 mmol l⁻¹ MgCl2, 0.15 mmol l⁻¹ NADH, 10 mmol l⁻¹ fructose 1,6-phosphate and 9.25 U lactate dehydrogenase (LDH); CS: 0.5 mmol l⁻¹ oxaloacetate (omitted in control), 0.09 mmol l⁻¹ acetyl-CoA, and 0.1 mmol l⁻¹ dithiobisnitrobenzoic acid (DTNB) in 20 mmol l⁻¹ Tris (pH 8.0).

For each enzyme, we tested whether enzyme activity increased from hive bees to young foragers, mature foragers and old foragers by performing an analysis of variance linear contrast (ANOVA linear contrast; SPSS version 12.0, SPSS Inc.). Post hoc analysis was performed using the Dunn-Sidak test.
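For readers unfamiliar with the Dunn-Sidak correction, a minimal Python sketch of the standard formula follows. The function names are illustrative, and the example assumes that all six pairwise comparisons among the four life stages were tested, which the text does not state.

```python
def sidak_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under the Dunn-Sidak correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)


def sidak_adjusted_p(p: float, n_comparisons: int) -> float:
    """Adjusted p-value: chance of at least one p this small in m independent tests."""
    return 1.0 - (1.0 - p) ** n_comparisons


# Assumed example: 4 life stages give 4 * 3 / 2 = 6 pairwise comparisons.
alpha_per_test = sidak_alpha(0.05, 6)  # roughly 0.0085
```

The correction is slightly less conservative than Bonferroni (0.05 / 6 ≈ 0.0083) while still holding the family-wise error rate at 0.05 for independent tests.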

Author information


Faculty of Computer Science, Technion, Haifa, Israel

Shoval Lagziel & Tomer Shlomi

Faculty of Biology, Technion, Haifa, Israel

Won Dong Lee & Tomer Shlomi

Lokey Center for Life Science and Engineering, Technion, Haifa, Israel


SL and TS designed the study. SL performed the computational analysis. All authors interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.


Expression analysis

Normalized preprocessed data was obtained from GEO (GSE59097) [49]. Probes on the microarray platform GPL18893 were annotated using NCBI’s stand-alone BLAST, correcting the gene labels for 647 probes. Only top hits were used; specifically, hits with greater than 95% identity, no gaps, and a score of over 100 were used (Additional file 3: Table S8). The R package limma was used to compare artemisinin sensitive and resistant samples collected from Cambodia and Vietnam [127]. Samples with predominantly ring-stage parasites with no detectable gametocytes were used. Resistant parasites were defined as both having at least one mutant Kelch13 allele and a parasite clearance half-life of greater than 5 h (Fig. 1) [49, 128]. Sensitive parasites were defined as having no mutant Kelch13 alleles and a parasite clearance half-life of less than 5 h. Random Forest classifiers were built using the R package randomForest, using all ring-stage samples [129]. The metadata classifier used the variables listed in Additional file 2: Figure S2, as outlined in the original study [49]. Cambodian and Vietnamese ring-stage transcriptomes were compared separately to ensure patterns associated with resistance status were reproducible across phylogenies. These countries were chosen for their large number of isolates and prevalence of resistance. Microarray probes were screened to remove non-metabolic genes and to keep only one probe per gene (consistent with standard practice). Multiple testing correction was conducted using a false discovery rate [130, 131].
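The two rule-based definitions above (BLAST hit filtering and resistance status) can be written as simple predicates. The following Python sketch is illustrative only: the function and argument names are hypothetical, and returning "unclassified" for samples that meet neither definition is an assumption about how such samples would be handled.

```python
HALF_LIFE_CUTOFF_H = 5.0  # parasite clearance half-life threshold from the text


def keep_blast_hit(identity_pct: float, gaps: int, score: float) -> bool:
    """Hit filter: >95% identity, no gaps, score over 100."""
    return identity_pct > 95.0 and gaps == 0 and score > 100


def classify_sample(n_mutant_kelch13: int, half_life_h: float) -> str:
    """Label a sample from its Kelch13 genotype and clearance half-life."""
    if n_mutant_kelch13 >= 1 and half_life_h > HALF_LIFE_CUTOFF_H:
        return "resistant"
    if n_mutant_kelch13 == 0 and half_life_h < HALF_LIFE_CUTOFF_H:
        return "sensitive"
    # Assumption: discordant samples (e.g. mutant allele but fast clearance)
    # fall outside both definitions.
    return "unclassified"
```

Requiring both the genotype and the phenotype to agree excludes discordant samples from either class, which keeps the sensitive-versus-resistant comparison cleaner at the cost of sample size.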

Gene expression data with calculations of fold changes and associated adjusted p-values were incorporated into our curated model using the Metabolic Adjustment for Differential Expression (MADE) algorithm. MADE utilizes statistical significance of gene expression changes along with network context to assign binary gene states (‘on’/‘off’) to each metabolic gene. This constrains the network by limiting flux through reactions mapped to ‘off’ genes while maintaining growth, or a similar objective. An 80% growth threshold was used given that there is no reported evidence that resistant and sensitive parasites produce variable biomass as measured by the size of ring-stage parasites; while varying this threshold affects sensitive parasite biomass yield, it does not affect essentiality predictions (data not shown). Essential genes were predicted for the resultant condition-specific models (Fig. 3) by conducting single gene and reaction deletions with established algorithms [132]. Consensus lethal gene and reaction deletions from the Cambodian and Vietnamese parasite models were used.

Flux analysis and metabolic tasks

Flux balance analysis (FBA) is an approach to explore metabolic phenotypes in silico [133]. FBA simulates steady-state flux values for each of the network’s reactions that maximize subsequent flux through an objective function given a set of constraints. We chose biomass production as the objective reaction, consistent with previous studies interrogating gene essentiality [50, 53, 79, 134], and permitted flux through all transport reactions. Constraints on the system include conservation of mass, reversibility of reactions, and reaction localization. Flux variability analysis (FVA) uses a related approach to find the range of fluxes permissible given system constraints [135].
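As an illustration of how FBA reduces to a linear program, the sketch below maximizes a biomass flux subject to the steady-state constraint S v = 0 and flux bounds. It is a toy example, not the model used in the study: the four-reaction network, the reaction names and the bounds are invented, and SciPy's `linprog` stands in for the COBRA Toolbox solver.

```python
from scipy.optimize import linprog

# Toy network (hypothetical). Rows: metabolites A, B. Columns: reactions.
#   uptake:  -> A
#   convert: A -> B
#   biomass: B ->      (objective reaction)
#   leak:    A ->      (by-product secretion)
reactions = ["uptake", "convert", "biomass", "leak"]
S = [
    [1, -1, 0, -1],  # mass balance for A
    [0, 1, -1, 0],   # mass balance for B
]

# All reactions irreversible; uptake is capped at 10 flux units.
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

# linprog minimizes, so maximizing biomass means minimizing its negative.
c = [0, 0, -1, 0]

result = linprog(c, A_eq=S, b_eq=[0, 0], bounds=bounds, method="highs")
if not result.success:
    raise RuntimeError(result.message)

fluxes = dict(zip(reactions, result.x))
max_biomass = -result.fun
```

Here the optimum is limited by the uptake bound, so all imported A is routed to biomass. Flux variability analysis would repeat the optimization for each reaction in turn, minimizing and maximizing its flux while holding biomass at or near this optimum.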

We simulated in vitro experiments and in vivo data to evaluate the model; these are the metabolic ‘tasks’ that the reconstruction should pass. We simulated in vitro growth requirements by modifying media components or access to particular metabolites. Metabolite import or production was eliminated from the reconstruction, and subsequent biomass production was observed. Effects of enzyme inhibition, gene knockouts, and metabolite production were also used to evaluate the model. Lethal modifications were defined as changes that resulted in no production of biomass; growth-reducing modifications were defined as producing less than 90% of the unconstrained flux value [81, 134].
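The lethal and growth-reducing definitions above amount to a threshold rule on the simulated biomass flux. A minimal Python sketch follows. The function name and the "no effect" label are illustrative, and the numerical tolerance used to decide that biomass is zero is an assumption.

```python
def classify_modification(biomass_flux: float,
                          unconstrained_flux: float,
                          reduce_fraction: float = 0.9,
                          tol: float = 1e-9) -> str:
    """Classify a model modification from its simulated biomass flux."""
    # Assumption: fluxes at or below `tol` count as zero (solver round-off).
    if biomass_flux <= tol:
        return "lethal"
    if biomass_flux < reduce_fraction * unconstrained_flux:
        return "growth-reducing"
    return "no effect"


# Example with made-up flux values and an unconstrained biomass flux of 10.
labels = [classify_modification(v, 10.0) for v in (0.0, 4.0, 9.5)]
```

A small tolerance matters in practice because linear programming solvers often return values such as 1e-12 rather than an exact zero.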

The COBRA Toolbox 2015, Tiger Toolbox (version 1.3.1), and MATLAB R2013b were used for model generation and flux simulations.


Manual curation of an existing P. falciparum metabolic network reconstruction [50] was conducted by a literature review and reference to generic and Plasmodium-specific databases (KEGG, Expasy, PlasmoDB, and MPMP) [43, 136,137,138]. Data obtained from these sources were used to evaluate the inclusion of reactions as well as their stoichiometry, reversibility, localization, and gene annotations. Genetically and biochemically supported reactions were kept and new reactions were added. Reactions were removed if they were (1) explicitly determined to be false or (2) nonfunctional and not supported biochemically or genetically. Spontaneous reactions (reactions that occur without enzymes) are noted to differentiate them from orphan reactions (reactions with unknown enzyme catalysts).

In order to assess gene essentiality, we used a biomass reaction as the modeling objective function. Thus, flux through this reaction, simulating cellular growth, was maximized for all in silico experimental procedures. We used the biomass reaction from a previous study [50] with modifications. Curation of the biomass reaction was informed by metabolites detected in metabolomics studies [28, 56,57,58]; where possible, metabolite ratios were predicted from metabolomics data. We curated the biomass reaction with consideration of published essentiality data; metabolites detected in metabolomics experiments with no known catabolism or import pathways were excluded from the biomass reaction.

Essentiality studies

We predicted essentiality by performing single deletion studies with both genes and reactions, and double gene deletion studies, in our curated model and in each of the expression-constrained sensitive and resistant models. All simulations were performed in an in silico red blood cell environment (Additional file 3: Table S9). Gene deletions were simulated by removing the gene of interest from the model. This change results in the inhibition of flux through all reactions that require that gene to function. If the model could not produce biomass with these constraints, the gene was deemed essential. Growth-reducing phenotypes were also observed and noted. For reaction deletion studies, we removed reactions sequentially. Subsequent growth effects were used to determine reaction essentiality. Consensus results for resistant or sensitive models are discussed.
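The step in which a gene deletion inhibits "all reactions that require that gene" depends on gene-protein-reaction (GPR) rules. The Python sketch below shows one way to evaluate such rules. The gene names, reaction names and the list-of-sets encoding are all hypothetical, and it is not the COBRA Toolbox implementation.

```python
# GPR rules in disjunctive form: a reaction can carry flux if ANY listed
# gene set (an enzyme complex) is fully present. All names are made up.
gpr = {
    "R1": [{"geneA"}],             # single gene
    "R2": [{"geneA"}, {"geneB"}],  # isozymes: geneA OR geneB
    "R3": [{"geneB", "geneC"}],    # complex: geneB AND geneC
    "R4": [],                      # no gene rule (e.g. spontaneous): never blocked
}


def blocked_reactions(gpr, deleted_genes):
    """Return the reactions that lose every enzyme option after the deletion."""
    deleted = set(deleted_genes)
    blocked = []
    for rxn, complexes in gpr.items():
        if not complexes:
            continue  # reactions without gene rules are unaffected
        # A complex is broken if it contains at least one deleted gene.
        if all(members & deleted for members in complexes):
            blocked.append(rxn)
    return sorted(blocked)


single_a = blocked_reactions(gpr, ["geneA"])           # ['R1']
double_ab = blocked_reactions(gpr, ["geneA", "geneB"])  # ['R1', 'R2', 'R3']
```

In a full deletion study, the flux bounds of the blocked reactions would be set to zero and biomass flux re-optimized; the gene is called essential if no biomass can be produced. The isozyme case (R2) shows why a double deletion can be lethal even when neither single deletion is.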

