Book Recommendation: Complex Traits and Complex Genetic Architecture

I am looking for a book (or any good source of information) that offers an in-depth discussion and models about the evolution and analysis of complex traits and complex genetic architecture. Do you have any suggestions?

I'd define complex traits as traits whose variance in the population is explained by many genetic loci together with environmental factors.


Evolutionary Genetics: Concepts and Case Studies

Edited by Charles W. Fox (Department of Entomology, University of Kentucky, Lexington) and Jason B. Wolf (Lecturer, Faculty of Life Sciences, University of Manchester)

The mathematical theory of selection, recombination, and mutation

by R Bürger (as suggested by @rg255 in the comments)

Lynch and Walsh's Genetics and Analysis of Quantitative Traits

(as suggested by @kmm in the comments)

PS: Do tell us if you found a better one.

Epigenomics and genotype-phenotype association analyses reveal conserved genetic architecture of complex traits in cattle and human

Lack of comprehensive functional annotations across a wide range of tissues and cell types severely hinders the biological interpretations of phenotypic variation, adaptive evolution, and domestication in livestock. Here we used a combination of comparative epigenomics, genome-wide association study (GWAS), and selection signature analysis, to shed light on potential adaptive evolution in cattle.


We cross-mapped 8 histone marks of 1300 samples from human to cattle, covering 178 unique tissues/cell types. By uniformly analyzing 723 RNA-seq and 40 whole genome bisulfite sequencing (WGBS) datasets in cattle, we validated that cross-mapped histone marks captured tissue-specific expression and methylation, reflecting tissue-relevant biology. Through integrating cross-mapped tissue-specific histone marks with large-scale GWAS and selection signature results, we for the first time detected relevant tissues and cell types for 45 economically important traits and artificial selection in cattle. For instance, immune tissues are significantly associated with health and reproduction traits, multiple tissues for milk production and body conformation traits (reflecting their highly polygenic architecture), and thyroid for the different selection between beef and dairy cattle. Similarly, we detected relevant tissues for 58 complex traits and diseases in humans and observed that immune and fertility traits in humans significantly correlated with those in cattle in terms of relevant tissues, which facilitated the identification of causal genes for such traits. For instance, PIK3CG, a gene highly specifically expressed in mononuclear cells, was significantly associated with both age-at-menopause in human and daughter-still-birth in cattle. ICAM, a T cell-specific gene, was significantly associated with both allergic diseases in human and metritis in cattle.


Collectively, our results highlighted that comparative epigenomics in conjunction with GWAS and selection signature analyses could provide biological insights into the phenotypic variation and adaptive evolution. Cattle may serve as a model for human complex traits, by providing additional information beyond laboratory model organisms, particularly when more novel phenotypes become available in the near future.

Modeling the Genetic Architecture of Complex Traits With Molecular Markers

Author(s): Rongling Wu, Wei Hou, Yuehua Cui, Hongying Li, Tian Liu, Song Wu, Chang-Xing Ma, Yanru Zeng Department of Statistics, University of Florida, Gainesville, FL 32611 USA.


Journal Name: Recent Patents on Nanotechnology

Volume 1 , Issue 1 , 2007


Understanding the genetic control of quantitatively inherited traits is fundamental to agricultural, evolutionary and biomedical genetic research. A detailed picture of the genetic architecture of quantitative traits can be elucidated with a well-saturated genetic map of molecular markers. The parameters that quantify the genetic architecture of a trait include the number of individual quantitative trait loci (QTL), their genomic positions, their genetic actions and interactions, and their responsiveness to biotic or abiotic factors. A variety of genetic designs and statistical models have been developed to estimate and test these architecture-modeling parameters. With the availability of very highly abundant single nucleotide polymorphism markers, DNA sequence variants, i.e., quantitative trait nucleotides (QTNs), which contribute to quantitative variation can be identified. A newly emerging active area - functional mapping, has shown its value to unravel the genetic machinery of dynamic traits at the QTL or QTN level that change their phenotypes with time or other variables. Functional mapping provides a quantitative framework for testing the interplay between genetic effects and trait formation and development and, thus, appeals to push statistical genetic analysis and modeling into the context of developmental biology. Some of the statistical methods for genetic mapping have been patented.

2 How Should We Understand Causation?

Much of the classic philosophical literature on causation has focused on the basic question of what counts as a cause, that is, what conditions some relationship must satisfy in order to qualify as a cause-and-effect relation. Central concerns include whether causation can be analyzed noncircularly, in terms of notions that do not presuppose causation, or what the relata of causation are. [ 2 ] Most of this work is not immediately useful for specific scientific problems such as the ones with which this collection of papers is concerned.

In recent decades, however, several novel frameworks for thinking about causation have been developed that are directly applicable in our context. One key example is the causal modeling approach developed in Spirtes et al., [ 3 ] which provides formal tools for statistically inferring causal models from data. A related approach derives from structural equation modeling, [ 4 ] and includes the interventionist framework of Woodward [ 5 ] (see also ref. [ 6 ]). We will focus on Woodward's approach here.

The interventionist approach is based on the idea that causal relationships, unlike mere statistical correlations, can be exploited for purposes of manipulation and control. For two variables X and Y to be causally related, there has to exist some intervention on X that changes the value of Y under a range of background conditions. [ 5 ] An intervention on a gene (via overexpression, knock-down, or knock-out perturbations) often results in a corresponding phenotypic change. As a classic example from developmental genetics, if we were to remove one copy of the brachyury gene from the genome of a mouse, it would result in reduced tail length and defects in the sacral vertebrae; if we were to remove both copies, it would lead to embryonic lethality. [ 7 ] Such “if-then” statements regarding what would happen under various possible conditions are called counterfactual conditionals, or counterfactuals.

Interventionism is considered a type of “counterfactual theory” of causation, [ 8-10 ] because relations between variables that are manipulable in the interventionist sense generate true counterfactual statements. Counterfactual views of causation construe causes as difference-makers. [ 8 ] The following counterfactual—if X had not had the value x1, then Y would not have had the value y1—is stating that the value of X makes a difference to the value of Y. In biology, counterfactual theories have been more productive than earlier accounts of causation based on regularities/laws, [ 11, 12 ] or the transfer of matter or energy. [ 13-15 ] There are few, if any, strict laws in biology, and it is neither practical nor desirable to always trace complex biological processes in terms of flows in underlying physical units.


Genetic Map Reconstruction and Seed QTL for M × N RILs

A recombinant inbred line (RIL) population derived from soybean parental genotypes Minsoy and Noir 1 was utilized in this study. Previous studies involving this M × N RIL population relied on genetic maps constructed from simple sequence repeat and restriction fragment length polymorphism marker data ( Lark et al., 1995; Mansur et al., 1993 ). To increase the accuracy of this mapping study, we obtained SNP marker data from 1536 loci across the sequenced soybean genome ( Schmutz et al., 2010 ) and reconstructed a 2500 cM soybean genetic map from 557 markers (Suppl. Table S1) that were found to segregate within the M × N RIL population. Twenty-four linkage groups (LGs) with an average intermarker distance of 4.7 cM (Suppl. Table S2) formed the genetic map for this population, using the soybean genome sequence ( Schmutz et al., 2010 ) for alignment of reference markers.
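The reported map statistics can be checked against each other with a short calculation. The sketch below assumes that every linkage group contains at least one marker and that the quoted 2500 cM is a rounded total; the function name is illustrative and not taken from the study.

```python
# Figures quoted in the paragraph above.
MAP_LENGTH_CM = 2500
N_MARKERS = 557
N_LINKAGE_GROUPS = 24


def average_intermarker_distance(map_length_cm, n_markers, n_groups):
    """Mean spacing between adjacent markers on a genetic map.

    A linkage group with k markers has k - 1 intervals, so the whole map
    has n_markers - n_groups intervals in total.
    """
    n_intervals = n_markers - n_groups
    if n_intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / n_intervals


avg_spacing = average_intermarker_distance(
    MAP_LENGTH_CM, N_MARKERS, N_LINKAGE_GROUPS
)
print(round(avg_spacing, 1))  # 4.7
```

With 557 markers on 24 linkage groups there are 533 intervals, and 2500 / 533 rounds to the 4.7 cM quoted in the text.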

Multiple QTL for seed composition have been identified in the M × N RILs ( Lark et al., 1994; Mansur et al., 1993; Orf et al., 1999 ). Soybean plants of the Minsoy genotype typically produce yellow seed of higher oil content and lower protein content than the black-seeded Noir 1 genotype. In field trials from 1992 to 2002, the average seed oil content was 17.43 ± 1.03% from Minsoy and 15.29 ± 0.34% from Noir 1. For these same trials, the average seed protein content was 35.16 ± 0.15% from Minsoy and 37.35 ± 0.84% from Noir 1. Within the M × N RILs, the average seed oil content ranged from 13.64 to 19.20%. The average seed protein content in the M × N RILs ranged from 31.14 to 38.24%. We calculated QTL positions based on the newly refined genetic map using seed oil and protein data collected over the span of two decades (Suppl. Table S3).

The combined seed QTL analyses for the M × N RILs highlighted one QTL for seed oil on Chr 8 and two QTL for seed protein on Chr 4 and 6 that were calculated to explain 10 to 38% of observed trait variation in the population. The existence of coincident seed oil and protein QTL with opposite allelic effects (Suppl. Table S3) for the QTL on Chr 6 and 8 also confirms the inverse correlation that has been observed between these traits ( Brim and Burton, 1979 ). Based on the consensus genetic map (, verified 13 Dec. 2013) and corresponding SNP marker positions, all three QTL appear to coincide with prior evidence for a seed oil or protein QTL in soybean. The seed protein QTL interval on Chr 4 coincides with Prot 19-1 ( Stombaugh et al., 2004 ) and Prot 7-2 ( Orf et al., 1999 ) seed protein QTL. Likewise, the seed protein and oil QTL on Chr 6 colocalizes with Oil 23-1 ( Hyten et al., 2004 ), Oil 4-6, and Protein 21-3 ( Kabelka et al., 2004 ) seed QTL. The seed oil and protein QTL on Chr 8 coincides with Oil 1-1 ( Mansur et al., 1993 ) and Prot 17–4 ( Tajuddin et al., 2003 ) seed QTL.

Genome-Wide Gene Transcript Accumulation during Early Seed Maturation

As an initial assessment of gene expression trait variation, Minsoy and Noir 1 were evaluated for transcript accumulation profiles of 30,681 genes in early- and mid-maturation stages of the developing soybean seed using total RNA processed for hybridization to the Affymetrix Soy GeneChip. A total of 200 genes (Suppl. Table S4) were found to be significantly differentially expressed between Minsoy and Noir 1 at an FDR of 5% and at a fold-change ratio of two or more. Transcripts for genes involved in lipid transport function were among those that were differentially expressed between the two genotypes.
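A filter of this kind (an FDR cut-off combined with a minimum fold change) can be sketched as follows. The text does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the gene names, expression values, and p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differentially_expressed(genes, fdr=0.05, min_fold=2.0):
    """genes: list of (name, mean_parent_a, mean_parent_b, pvalue).

    Means are on a linear (not log) scale and must be positive.
    """
    adjusted = benjamini_hochberg([g[3] for g in genes])
    hits = []
    for (name, mean_a, mean_b, _), q in zip(genes, adjusted):
        fold = max(mean_a, mean_b) / min(mean_a, mean_b)
        if q <= fdr and fold >= min_fold:
            hits.append(name)
    return hits


# Hypothetical input: (gene, Minsoy mean, Noir 1 mean, raw p-value).
toy_genes = [
    ("geneA", 100.0, 450.0, 0.0004),
    ("geneB", 200.0, 260.0, 0.001),
    ("geneC", 80.0, 30.0, 0.03),
    ("geneD", 50.0, 55.0, 0.6),
]
de_hits = differentially_expressed(toy_genes)
print(de_hits)  # ['geneA', 'geneC']
```

In this toy input, geneB passes the FDR cut-off but not the two-fold requirement, which is why both conditions are applied together.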

Transcript accumulation data was collected from the immature green seed stage (Suppl. Fig. S1, Suppl. Table S5) corresponding to early seed maturation at the onset of reserve accumulation from 93 members of the M × N RIL population. It is of interest to note that transcript accumulation data on over 16,344 genes supported nonadditive genetic variation through transgressive segregation where gene transcript accumulation values for the RIL population extended beyond the range of the parental values. Previous studies reported similar phenomena in other species ( Brem and Kruglyak, 2005; Hammond et al., 2011; Potokina et al., 2008; West et al., 2007 ). Within the M × N RIL population, transgressive segregation described the transcript accumulation patterns of more than 50% of gene expression traits evaluated within the population. Transgressive variation was previously observed for reproductive, morphological, and seed traits, including seed yield, evaluated within the M × N RIL population ( Mansur et al., 1993 ). These results were in marked contrast to the number of gene transcripts (200) that were found to be differentially expressed between the parents Minsoy and Noir 1.
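The idea of transgressive segregation, with RIL values falling outside the parental range, can be expressed as a simple check. This is a minimal sketch on invented numbers: it flags any value outside the parental range, whereas a real analysis would presumably require the excess to be statistically supported. The function and gene names are hypothetical.

```python
def is_transgressive(parent_a, parent_b, ril_values):
    """True if any RIL value lies outside the range spanned by the parents."""
    low, high = sorted((parent_a, parent_b))
    return any(v < low or v > high for v in ril_values)


def transgressive_fraction(traits):
    """traits: mapping of gene id -> (parent_a, parent_b, list of RIL values).

    Returns the flagged gene ids and the fraction of traits flagged.
    """
    flagged = [
        gene for gene, (a, b, rils) in traits.items()
        if is_transgressive(a, b, rils)
    ]
    return flagged, len(flagged) / len(traits)


# Hypothetical expression values (arbitrary units).
toy_traits = {
    "gene_1": (8.0, 8.2, [7.1, 8.1, 9.4]),  # parents alike, RILs spread out
    "gene_2": (5.0, 9.0, [5.5, 7.0, 8.8]),  # parents differ, RILs in between
}
flagged_genes, fraction = transgressive_fraction(toy_traits)
print(flagged_genes, fraction)  # ['gene_1'] 0.5
```

The first toy gene shows how a trait can vary widely among the RILs even when the two parents barely differ, which is the contrast the paragraph draws with the 200 parental differences.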

Detection of Global eQTLs in the Immature Soybean Seed

Composite interval mapping identified 28,470 eQTL for 15,568 unique genes expressed in the immature seed (Suppl. Fig. S1) of the M × N RIL population (Fig. 1A). Zero to six eQTL were mapped for each gene expression trait, and the LOD scores for these eQTL ranged from 3.38 to 89.37. These eQTL were found to explain anywhere from a few percent to almost 100% of the variation seen in a given gene expression trait; however, the range of attributed variation was found to lie predominantly between 10 and 20%. A small portion of the mapped eQTL (3824 or 13.4%) was categorized as cis-acting regulators based on the proximity of the physical gene location to the eQTL genetic position, with the remainder of the eQTL categorized as trans-acting regulators. Because these eQTL have yet to be demonstrated to act in cis or trans, they may be more appropriately termed local or distant regulators ( Rockman and Kruglyak, 2006 ), respectively, but the more familiar terms cis-acting and trans-acting are used for clarity. In this study, the eQTL of a gene was defined as cis-acting if the gene mapped to the same chromosome and was located within 1.575 Mb of the physical location of the SNP marker near the eQTL. This distance was based on the average intermarker spacing of the SNP markers used for the genetic map. Increasing the allowed distance to 5 Mb or 10 Mb extended the number of candidate cis-acting eQTL to approximately 19 or 21% of the total number of eQTL. The proportion of eQTL on any given chromosome that was found to be cis-acting varied from 10.5%, on Chr 7, to 49.6%, on Chr 3.
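The cis/trans rule stated above translates directly into a small classifier. In this sketch the gene is represented by a single coordinate (an assumption; the text does not say whether gene start, end, or midpoint was used), and the chromosome labels and positions in the example calls are made up.

```python
CIS_WINDOW_BP = 1_575_000  # 1.575 Mb, the window quoted in the text


def classify_eqtl(gene_chr, gene_pos_bp, marker_chr, marker_pos_bp,
                  window_bp=CIS_WINDOW_BP):
    """Label an eQTL 'cis' if gene and nearest SNP marker share a chromosome
    and lie within window_bp of each other; otherwise label it 'trans'."""
    same_chromosome = gene_chr == marker_chr
    if same_chromosome and abs(gene_pos_bp - marker_pos_bp) <= window_bp:
        return "cis"
    return "trans"


# Hypothetical coordinates.
print(classify_eqtl("Chr08", 10_000_000, "Chr08", 11_000_000))  # cis
print(classify_eqtl("Chr08", 10_000_000, "Chr08", 12_000_000))  # trans
print(classify_eqtl("Chr08", 10_000_000, "Chr20", 10_000_000))  # trans

# Share of cis-acting eQTL from the counts reported above.
cis_share = round(100 * 3824 / 28470, 1)
print(cis_share)  # 13.4
```

Passing a larger `window_bp` (for example 5 Mb or 10 Mb) reproduces the kind of sensitivity check described at the end of the paragraph.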

Fig. 1. Genome-wide expression quantitative trait loci (eQTL) in the immature seed of the Minsoy × Noir 1 recombinant inbred lines soybean population. (A) An eQTL scatter plot of physical gene location in total megabases versus eQTL genetic location in total centimorgans. Each eQTL point is color-coded to represent transcript accumulation upregulated by the Minsoy (red) allele or the Noir 1 (blue) allele. Arrows indicate specific examples of allelic bias. (B) The frequency of eQTL mapped to each genetic location is graphed along the 20 soybean chromosomes (Chr). A dashed yellow line denotes the threshold for eQTL hotspots.

Patterns of Gene Regulation in the Immature Soybean Seed

A number of genetic loci were observed to contain significantly more eQTL than expected to appear by chance distribution (Fig. 1B). A threshold was calculated based on the 95th percentile of the maximum number of eQTL detected at any given locus when 28,470 eQTL were randomly distributed across each interval. Positions where the number of mapped loci peaked above the threshold of 39 eQTL per genetic locus were identified and represent putative regulatory hotspots of gene transcription. Many of these mapped to adjoining positions that likely represent the same hotspot, and after accounting for the number of genes per interval, 54 hotspots were considered enriched for eQTL.
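One way to obtain a threshold of this kind is to simulate the null distribution of the largest per-locus count. The sketch below assumes eQTL are scattered uniformly over equally sized bins and uses a nearest-rank percentile; the number of bins is not given in the text, so it is left as a parameter, and the demonstration call uses deliberately small, invented numbers so that it runs quickly.

```python
import math
import random
from collections import Counter


def hotspot_threshold(n_eqtl, n_bins, n_permutations=200, percentile=95,
                      seed=0):
    """Estimate a hotspot threshold by random placement of eQTL.

    For each permutation, n_eqtl eQTL are dropped uniformly at random into
    n_bins loci and the largest per-bin count is recorded. The threshold is
    the requested percentile of those maxima.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_permutations):
        counts = Counter(rng.randrange(n_bins) for _ in range(n_eqtl))
        maxima.append(max(counts.values()))
    maxima.sort()
    rank = max(1, math.ceil(percentile / 100 * len(maxima)))
    return maxima[rank - 1]


# Small demonstration; the study itself placed 28,470 eQTL.
demo_threshold = hotspot_threshold(n_eqtl=2000, n_bins=100)
print(demo_threshold)
```

Loci whose observed eQTL count exceeds the returned value would be treated as candidate hotspots. Uniform placement ignores differences in gene density between intervals, which the paragraph above says were accounted for separately.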

The physical gene position and genetic position of the genome-wide eQTLs for each gene are depicted on an eQTL scatter plot (Fig. 1A). Cis-acting eQTL are represented by the diagonal formed across the eQTL scatter plot. All other points on the scatter plot represent trans-acting eQTL. The average LOD of a cis-acting eQTL was higher (11.65) than the average LOD of a trans-acting eQTL (5.65) overall. This result confirms previous reports that cis-acting eQTL are typically of greater effect (higher LOD) than trans-eQTL in genome-wide studies ( Brem and Kruglyak, 2005; Brem et al., 2002; Drost et al., 2010; Keurentjes et al., 2007; Kirst et al., 2005; Morley et al., 2004; Schadt et al., 2003; Vuylsteke et al., 2005; West et al., 2007 ).

Overall, the numbers of eQTL with allelic effects influencing transcript accumulation in either direction were approximately equal, with 50.18% attributed to the Minsoy allele and 49.82% to the Noir 1 allele. It was therefore remarkable that a number of eQTL hotspots displayed strong directional bias for allelic effects from a single parent. This directional bias is shown by the color codes for allelic effects in Fig. 1A. On Chr 12, for example, a vertical line of 147 mapped eQTL (red = Minsoy) above genetic position 1554 cM shows directional bias for the Minsoy allele (136 Minsoy, 11 Noir 1). In the opposite direction, at another eQTL hotspot on Chr 8 at 1023 cM, a directional bias exists for the Noir 1 allele (105 Noir 1, 20 Minsoy). Examples of such directional bias were found on every chromosome.
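To give a sense of how unusual such splits are when the genome-wide ratio is close to 50:50, the counts can be put through an exact two-sided sign test. This is an illustration only: the text does not state that the authors ran this test, and treating the eQTL at a hotspot as independent draws is a simplifying assumption.

```python
from math import comb


def two_sided_sign_test(k, n):
    """Exact two-sided binomial test of k successes in n trials, p = 0.5.

    Because the null distribution is symmetric, the two-sided p-value is
    twice the tail probability of the more extreme side, capped at 1.
    """
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts quoted in the paragraph above.
p_chr12 = two_sided_sign_test(136, 136 + 11)   # Minsoy-biased hotspot
p_chr8 = two_sided_sign_test(105, 105 + 20)    # Noir 1-biased hotspot
print(p_chr12, p_chr8)
```

Both p-values come out many orders of magnitude below conventional significance levels, in line with the description of these hotspots as strongly biased.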

Validation of the Global eQTL Dataset through eQTL Mapping of Genes Involved in Flavonoid Biosynthesis

Transcriptional regulation of the well-studied flavonoid biosynthesis pathway takes place in the immature seed. Upon mining the eQTL dataset for all genes that annotate to the flavonoid biosynthesis pathway Gene Ontology category, we found that >20% of the eQTL identified for genes in this pathway (adjusted P value = 2.06 × 10⁻⁴) mapped to a Chr 8 interval (∼904 total cM, Fig. 2A, Suppl. Table S6). Moreover, the eQTL for flavonoid biosynthesis pathway genes all possessed additive effects with the Noir 1 allele, the genotype with black (versus yellow) seed color. Of these genes, the eQTL for only one gene candidate, the CHS1 gene (Glyma08g11610), was identified as a cis-acting regulator. Using a quantitative measurement of seed coat pigmentation and the genetic map assembled for the M × N RIL population, a seed coat pigmentation QTL was also identified over the Chr 8 interval (Suppl. Table S3) and accounted for over 77% of the seed coat pigmentation trait (LOD > 40) (Fig. 2B). The position of this seed coat pigmentation QTL is consistent with the genomic location of a repetitive cluster of CHS genes that controls seed coat pigmentation through generation of small RNAs that downregulate all CHS gene family members ( Tuteja et al., 2009 ).

Fig. 2. (A) Gene location vs. expression quantitative trait loci (eQTL) genetic location scatter plot of eQTL for genes annotating to the flavonoid biosynthesis pathway. Eight genes involved in flavonoid biosynthesis, particularly chalcone synthase (CHS) genes, mapped to the chromosome (Chr) 8 locus ∼904 cM. The Chr 8 cis-acting eQTL for CHS1 was identified at the Inhibitor locus for seed coat color. Each eQTL point is color-coded to represent transcript accumulation upregulated by the Minsoy (red) allele or the Noir 1 (blue) allele. (B) The seed coat pigmentation QTL maps to Chr 8 and colocalizes with the flavonoid biosynthesis eQTL hotspot.

Transcriptional Regulation of Seed-Specific Genes

To identify eQTL hotspots specific to regulation of seed genes, genes that accumulate transcripts in soybean seed tissue alone were identified. The RNA sequencing (RNA-seq) data was obtained from soybean tissues including seed, pod shell, leaf, flower, root, and nodules previously described in a soybean gene expression atlas ( Severin et al., 2010b ) and combined with unpublished RNA-seq data from a related near-isogenic line (sequence read archive data under BioProject PRJNA208048). Differential gene transcript accumulation from the near-isogenic line pair (HiPro and LoPro) was described for four seed stages ( Bolon et al., 2010 ). HiPro is a high seed protein and low seed oil line that is nearly identical in genotype to the low seed protein and high seed oil line LoPro except for introgression regions ( Severin et al., 2010a ) that include the major LG I seed protein QTL on Chr 20 ( Bolon et al., 2010 ). Here, the data from all 14 soybean tissues (leaf, flower, pod, shell [2 stages], seed [7 stages], root, and nodule) was utilized for both genotypes to identify genes with seed-specific expression. The eQTL for these seed-specific genes were highlighted from the global eQTL dataset (Suppl. Table S7). Clusters of seed-specific gene eQTL were found at hotspots on Chr 20 (2498 total cM, Fig. 3A) and Chr 13 (1627 total cM, Fig. 3A). The location of these seed-specific eQTL hotspots did not correspond to the eQTL hotspot with the greatest number of eQTL (Chr 7, Fig. 1B) or to the eQTL hotspot for flavonoid biosynthesis (Chr 8, Fig. 2A).

Fig. 3. Regulation of seed-specific and seed pathway genes at three expression quantitative trait loci (eQTL) hotspots. Each eQTL point is color-coded to represent transcript accumulation upregulated by the Minsoy (red) allele or the Noir 1 (blue) allele. Colored arrows highlight the eQTL hotspots indicated. (A) An eQTL scatter plot for seed-specific genes shows an enrichment of seed-specific eQTL that colocalize to an eQTL hotspot on chromosome (Chr) 20 and another on Chr 13. (B) An eQTL scatter plot for photosynthesis genes reveals eQTL hotspots on Chr 7, 13, and 20. (C) An eQTL scatter plot for genes that annotate to fatty acid biosynthesis shows two main loci enriched in fatty acid biosynthesis gene eQTLs on Chr 20 and Chr 7. (D) Mapping of eQTL for oleosin genes shows that the majority of oleosin genes are influenced at one of two loci, Chr 20 or Chr 13, also with directional effects. The eQTL hotspot on Chr 20 is common to all of the above. Transcript accumulation of genes with eQTL in these categories is predominantly upregulated with the Noir 1 allele at the Chr 7 and Chr 13 eQTL hotspots, whereas it is upregulated with the Minsoy allele at the Chr 20 eQTL hotspot.

Transcriptional Regulation of Specific Seed Functional Pathways

Genes with eQTL at the seed-specific eQTL hotspot (2498 total cM) on Chr 20 were examined for enrichment in specific functional categories. Based on Gene Ontology annotations, the most highly enriched categories at the Chr 20 hotspot were for photosynthesis (adjusted P value = 4.65 × 10⁻¹⁶) and fatty acid biosynthetic process (adjusted P value = 7.1 × 10⁻⁸) (Suppl. Fig. S2). The eQTL for all genes with either photosynthesis or fatty acid biosynthetic process annotations were subsequently highlighted. Photosynthesis gene eQTL were found to cluster to three regions of the genome, including hotspots on Chr 7, 20, and 13 (Fig. 3B, Suppl. Table S8). Hotspots for eQTL of fatty acid biosynthesis genes were found on Chr 20 and 7 (Fig. 3C, Suppl. Table S9 and S10). Examination of the hotspot on Chr 7 also showed that it was enriched in photosynthesis gene eQTLs (adjusted P value = 3.5 × 10⁻³⁰) (Suppl. Fig. S3). It is noteworthy that the majority of seed-specific, photosynthesis, and fatty acid biosynthesis genes with eQTL that mapped to the hotspot on Chr 20 (∼2498 cM) showed upregulation of transcript accumulation with the presence of the Minsoy allele (Fig. 3A–3C, Suppl. Fig. S4–S5, Suppl. Tables S7–S9) despite the existence of more eQTL of Noir 1 effect (137 versus 132) at this locus. Surprisingly, eight eQTL for oleosin genes also specifically mapped to this interval on Chr 20 (Fig. 3D, Suppl. Table S9). The expression of these oleosin genes was upregulated with the presence of the Minsoy allele, the genotype with higher seed oil ( Orf et al., 1999 ), consistent with evidence that seeds with higher oil content possess more oleosins ( Parthibane et al., 2012a; Siloto et al., 2006 ).

Multifaceted Regulation of Seed Gene Expression and Relationships with eQTL Hotspots

Complex patterns of directional bias were observed for genes regulated by multiple eQTL. We created a network (Fig. 4, Suppl. Table S11) from the eQTL data to observe the connections among the three eQTL hotspots on Chr 20, 7, and 13 (Fig. 3) with genes for specific seed functional pathways clustered at these hotspots. The majority of these genes with eQTL at the Chr 20 hotspot were found to be upregulated with the Minsoy allele at the Chr 20 hotspot (Fig. 4, Suppl. Table S11). In contrast, the majority of these genes with eQTL at the Chr 7 and 13 hotspots were found to be upregulated with the Noir 1 allele at the respective loci. It is noteworthy that patterns of opposing directional bias were observed for some genes regulated by more than one eQTL. One example of this phenomenon involved a subset of genes with eQTL mapping to the ∼2493 to 2498 total cM interval on Chr 20 and ∼797 cM on Chr 7 (Fig. 4, Suppl. Fig. S5, Suppl. Table S12). This network showed that transcript levels for a subset of genes are upregulated with the presence of the Minsoy allele at the 2493 locus, and transcript levels of the same genes were upregulated with the presence of the Noir 1 allele at the 797 locus. Genes with upregulated transcript levels with the Minsoy allele included a number of photosynthesis-related genes including genes for plastocyanins, PsaN (Photosystem I reaction center subunit PSI-N), PsaF (Photosystem I subunit F), and LHCB3 and LHCB5 (light-harvesting chlorophyll-binding proteins). Upregulation of transcript levels for fatty acid biosynthesis-related genes, including FAH1 (Fatty Acid Hydroxylase 1) and MOD1 (Mosaic Death 1), an enoyl-acyl carrier protein (ACP) reductase subunit of a complex that catalyzes de novo synthesis of fatty acids ( Mou et al., 2000 ) also corresponded to the presence of the Minsoy allele. 
In addition, all 12 eQTL that mapped to the 2493 to 2498 cM interval for fatty acid biosynthesis gene transcripts were upregulated with the presence of the Minsoy allele, while all nine eQTL that mapped to the 797 cM locus for fatty acid biosynthesis genes were upregulated with the presence of the Noir 1 allele. These findings are consistent with the presence of higher seed oil content in the Minsoy parent versus the Noir 1 parent and the role of photosynthesis and fatty acid biosynthesis in seed oil accumulation.

Fig. 4. Connections among three major expression quantitative trait loci (eQTL) hotspots involved in major seed and seed-specific processes. A network representation depicts interactions among photosynthesis, fatty acid (FA) biosynthesis, oleosin, and other seed-specific genes with shared eQTL depicted separately in Fig. 3. Supplemental Table S8 displays the gene and eQTL data represented in this diagram. Gray nodes at the top represent the three major eQTL hotspots at loci on chromosomes (Chr) 7 (∼797 total cM), 13 (∼1627 total cM), and 20 (2493–2498 total cM). Green nodes = photosynthesis genes. Pink nodes = fatty acid biosynthesis genes. Orange nodes = oleosin genes. Yellow nodes = seed-specific genes other than the oleosin genes. Connections for genes with transcript accumulation upregulated by the presence of the Minsoy allele are shown with red lines. Connections for genes with transcript accumulation upregulated by the presence of the Noir 1 allele are shown with blue lines. Waved lines indicate genes that are within cis-acting distances from the hotspot locus.

Regulatory Gene Candidates at the Chromosome 20 Seed eQTL Hotspot

Two eQTL were found to be candidate cis-acting regulators out of the 269 eQTL at the 2498 locus (Suppl. Table S9): one of the oleosin genes (Glyma20g33850) and a BME3 (Blue Micropylar End 3) GATA (DNA binding motif) transcription factor gene (Glyma20g32050). Although several oleosin genes exist in soybean, the oleosin gene Glyma20g33850 aligned with the greatest homology to the OLE3 (Oleosin 3) gene in peanut (Arachis hypogaea L.). Oleosin 3 from the immature peanut seed was recently shown to possess diacylglycerol biosynthesis and phosphatidylcholine hydrolysis enzymatic activity that provides evidence for a direct role in increasing oil content through biosynthesis of triacylglycerol from monoacylglycerol ( Parthibane et al., 2012a ). Sequencing of the BME3 soybean gene in Minsoy versus Noir 1 genotypes revealed sequence polymorphisms corresponding to missense mutations in four amino acids (T→A, Q→L, V→F, S→T) conserved between soybean and Arabidopsis BME3 transcription factors. The binding motif for BME3, WGATAR, was also found in the promoter regions of 251 out of 257 genes with eQTL mapping to the 2498 locus.
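A scan for the WGATAR motif can be sketched with a regular expression. In IUPAC notation W stands for A or T and R for A or G, so the motif expands to `[AT]GATA[AG]`. The text does not describe how promoters were defined or scanned, so checking both strands is an assumption here, and the promoter names and sequences are invented.

```python
import re

# IUPAC codes: W = A or T, R = A or G, so WGATAR becomes [AT]GATA[AG].
WGATAR = re.compile(r"[AT]GATA[AG]")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_wgatar(promoter_seq):
    """True if the motif occurs on either strand of the sequence."""
    seq = promoter_seq.upper()
    return bool(WGATAR.search(seq) or WGATAR.search(reverse_complement(seq)))


# Hypothetical promoter fragments.
toy_promoters = {
    "promoter_1": "CCAGATAGCC",   # motif on the forward strand
    "promoter_2": "CCCTATCTCC",   # motif on the reverse strand
    "promoter_3": "CCCCGGGGCC",   # no motif
}
motif_hits = [name for name, seq in toy_promoters.items() if has_wgatar(seq)]
print(motif_hits)  # ['promoter_1', 'promoter_2']
```

Because a six-base degenerate motif is expected to occur fairly often by chance, a count such as 251 out of 257 would normally be compared against a background set of promoters before being interpreted.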

Examination of all 319 genes located at the Chr 20 eQTL hotspot showed that there were only eight seed-specific genes and 34 transcription factor genes at this location. Gene transcript accumulation data from RNA-seq profiles compiled from near-isogenic lines HiPro and LoPro with contrasting seed protein and oil content ( Bolon et al., 2010; Severin et al., 2010b ) show that the seed-specific gene with the highest expression at this locus is the oleosin gene Glyma20g33850 (Fig. 5A, Suppl. Table S13). Moreover, gene transcript accumulation was highest in the genotype with higher seed oil content (LoPro, Fig. 5B, Suppl. Table S13). Among the transcription factor genes at this locus, transcript accumulation for the BME3 transcription factor gene Glyma20g32050 was also the highest, with slightly higher overall transcript accumulation in the LoPro genotype (Fig. 5C, Suppl. Table S13). High levels of gene expression highlighted these two genes from among the 319 genes shown to reside at the Chr 20 eQTL hotspot in the reference soybean genome. Moreover, these same two genes (Glyma20g33850 and Glyma20g32050) were the only genes at this locus with seed eQTL in the M × N RILs that mapped back to the Chr 20 eQTL locus.

Fig. 5. Ribonucleic acid sequencing (RNA-seq) evidence for genes located at the expression quantitative trait loci (eQTL) hotspot on chromosome (Chr) 20. (A) The y axis shows RNA-seq RPKM (reads per kilobase of transcript per million mapped reads) counts for each tissue represented as a colored bar. Seed-specific genes at the locus are shown across the x axis with colored bars representing different seed stages. Eight seed-specific genes reside at this locus, and the seed-specific gene with the highest evidence of transcript accumulation is the oleosin gene Glyma20g33850. (B) The y axis shows RNA-seq RPKM counts for each genotype (LoPro = Red, HiPro = Blue) represented as a colored bar. The x axis shows a timeline of seed developmental stages. Transcript accumulation of Glyma20g33850 is higher in LoPro than HiPro during seed development. (C) The y axis shows RNA-seq RPKM counts for each tissue represented as a colored bar (leaf, flower, pod, shell [stages –1 and –2], seed [stages –2, –1, 0, 1, 2, 3, 4], root, and nodule tissues). Transcription factor genes at the locus are shown across the x axis. The transcription factor gene at this locus with the greatest evidence for transcript accumulation is the GATA (DNA binding motif) gene Glyma20g32050. Supplemental Table S13 shows the RNA-seq read evidence for genes in this region in LoPro vs. HiPro across this range of tissues.

eQTL that Colocalize with Seed Phenotypic QTL

To identify other potential genes and pathways that correlate with seed oil and protein accumulation in the immature seed, we also examined eQTL for colocalization with seed oil and protein QTL locations mapped in the M × N RIL population. Expression QTL for 1598 unique genes were found to map to seed oil and protein QTL intervals in this population (Suppl. Fig. S6, purple bars represent seed QTL intervals from Suppl. Table S3). A list of 129 unique genes with cis-acting eQTL that colocalized to the region of a seed protein or oil QTL was compiled and included genes with lipid-associated annotations (Suppl. Table S14). A cis-acting eQTL that overlaps an oil QTL on Chr 8 was also mapped for a gene with homology to the COMATOSE (CTS) ATP-binding cassette (ABC) transporter gene in Arabidopsis.

Mapping trigenic interactions quantitatively and surveying the global trigenic landscape

To explore the trigenic interaction landscape, we designed query strains that sampled three key quantitative features of our global digenic interaction network (7). We designed query strains carrying mutations in two genes spanning a range of the following features: (1) digenic interaction strength, (2) number of digenic interactions (average digenic interaction degree), and (3) digenic interaction profile similarity (Fig. 1A and table S1). Gene pairs were selected to fill bins of varying digenic interaction attributes and to cover all major biological processes in the cell, thus producing a sample that would provide a diverse survey of the trigenic interaction landscape. We largely focused on unambiguous singletons because duplicated genes represent a relatively small subset of genes and thus can only represent a small fraction of the global trigenic interaction network. For this survey, we constructed 151 double-mutant query strains and 302 single-mutant strains, encompassing 47 temperature-sensitive alleles of different essential genes and 255 deletion alleles of nonessential genes. The query strains in this set were selected to span the different digenic attribute bins according to predefined thresholds (table S1). An additional 31 double-mutant queries fell outside of the defined thresholds but were included for validation and comparison purposes (data S1 to S3) (16). The fitness of the resulting query strains was measured using a quantitative growth assay, and the behavior of the single- and double-mutant query strains showed strong agreement with our previously published data set (figs. S1 and S2 and data S4) (7, 15).

(A) Criteria for selecting query strains for sampling trigenic interaction landscape of singleton genes in yeast. The gene pairs were grouped into three general categories based on a range of features: (1) Digenic interaction strength. Gene pairs were directly connected by zero to very weak (digenic interaction score: 0 to –0.08, n = 74 strains), weak (–0.08 to –0.1, n = 32), or moderate (<–0.1, n = 45) negative digenic interactions. (2) Number of digenic interactions. Gene pairs had a low (10 to 45 interactions, n = 50), intermediate (46 to 70, n = 53), or high (>71, n = 48) average digenic interaction degree (denoted by the number of black edges of each node). (3) Digenic interaction profile similarity. Gene pairs had low (score: –0.02 to 0.03, n = 46 represented by genes A and B, which show a relatively low overlap of genetic interactions with genes K to R), intermediate (0.03 to 0.1, n = 59 represented by genes C and D, which display an intermediate overlap of genetic interactions), or high (>0.1, n = 46, represented by genes E and F, which display a relatively high level of overlap of genetic interactions) functional similarity, as measured by digenic interaction profile similarity and coannotation to the same GO term(s). Query mutant genes were either nonessential deletion mutant alleles (Δ) or conditional temperature-sensitive (ts) alleles of essential genes. (B) Diagram illustrating the triple-mutant SGA experimental strategy. To quantify a trigenic interaction, three types of screens are conducted in parallel. To estimate triple-mutant fitness, a double-mutant query strain carrying two desired mutated genes of interest (red and blue filled circles) is crossed into a diagnostic array of single mutants (black filled circle). Meiosis is induced in heterozygous triple mutants, and haploid triple-mutant progeny is selected in sequential replica pinning steps. 
In parallel, single-mutant control query strains are used to generate double mutants for fitness analysis. (C) Triple-mutant SGA quantitative scoring strategy. The top equation shows the quantification of a digenic interaction, where εij is the digenic interaction score, ƒij is the observed double-mutant fitness, and the expected double-mutant fitness is expressed as the product of single-mutant fitness estimates ƒiƒj. In the bottom equation, the trigenic interaction score (τijk) is derived from the digenic interaction score, where ƒijk is the observed triple-mutant fitness and ƒiƒjƒk is the triple-mutant fitness expectation expressed as the product of three single-mutant fitness estimates. The influence of digenic interactions is subtracted from the expectation, and each digenic interaction is scaled by the fitness of the third mutation.
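The scoring strategy described in panel C can be sketched in a few lines. This is one reading of the caption, not the authors' implementation: the digenic score is observed minus expected double-mutant fitness, and the trigenic score subtracts both the three-way multiplicative expectation and each digenic score scaled by the fitness of the third mutant. The function names and fitness values are hypothetical.

```python
def digenic_score(f_ij, f_i, f_j):
    """epsilon_ij: observed double-mutant fitness minus the product expectation."""
    return f_ij - f_i * f_j


def trigenic_score(f_ijk, f_i, f_j, f_k, e_ij, e_ik, e_jk):
    """tau_ijk: triple-mutant fitness not explained by single or digenic effects."""
    expected = f_i * f_j * f_k
    # Each digenic interaction is scaled by the fitness of the third mutation.
    digenic_part = e_ij * f_k + e_ik * f_j + e_jk * f_i
    return f_ijk - expected - digenic_part


# Hypothetical fitness estimates, for illustration only.
f_i, f_j, f_k = 0.9, 0.8, 1.0
e_ij = digenic_score(0.62, f_i, f_j)  # negative interaction between i and j
e_ik = digenic_score(0.90, f_i, f_k)  # no interaction
e_jk = digenic_score(0.80, f_j, f_k)  # no interaction
tau = trigenic_score(0.40, f_i, f_j, f_k, e_ij, e_ik, e_jk)
print(round(e_ij, 3), round(tau, 3))
```

With these numbers the triple mutant is sicker than the i-j digenic interaction alone predicts, so the trigenic score is negative.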

Trigenic interaction screening required development and implementation of three operational components. First, synthetic genetic array (SGA) analysis, an automated form of yeast genetics that is often used to cross a query gene mutation into an array of single mutants to generate a defined set of haploid double mutants (6), was adapted such that a double-mutant query strain could be crossed into an array of single mutants to generate triple mutants for trigenic interaction analysis (Fig. 1B). Because the identification of a trigenic interaction requires comparison with the corresponding double mutants, we also conducted screens in which the individual mutants of the query gene pair were scored for digenic interactions (Fig. 1B). Second, for experimental feasibility, we assembled a diagnostic array of 1182 strains, comprising 990 nonessential gene deletion mutants and 192 essential gene mutants carrying temperature-sensitive alleles, which combine to span 20% of the yeast genome (data S5). The diagnostic array was designed to be highly representative of the rest of the genome in terms of exhibited genetic interaction profiles (fig. S3). Briefly, array strains were selected from a larger genetic interaction data set for their ability to represent different regions of the global network in a minimally redundant way. This was accomplished by iteratively selecting strains to maximize the performance of profile similarities when predicting coannotations to a functional gold standard (17). Third, we developed a scoring method, the τ-SGA score, which combines double- and triple-mutant fitness estimates derived from colony size measurements to identify trigenic interactions quantitatively (Fig. 1C). The τ-SGA score differs from the MinDC score reported previously (18), because it accounts for all cases in which two of the genes are not independent, resulting in an expectation that contains digenic interaction effects scaled by the fitness of the noninteracting genes (fig. S4) (16). The final trigenic τ-SGA interaction score then accounts for digenic effects but also enables detection of trigenic interactions in which digenic effects of insufficient explanatory power can be found.

We focused exclusively on the analysis of deleterious negative trigenic interactions for two reasons. First, quantitative scoring of negative genetic interactions is often more accurate than that for positive interactions because there is a greater signal-to-noise ratio for negative genetic interactions. Hence, negative genetic interactions are associated with lower false-positive and false-negative rates than positive interactions (8), a feature that is important for the robust statistical analysis necessary to differentiate true trigenic interactions from the extensive background digenic network. Second, negative digenic interactions are generally more functionally informative than positive digenic interactions (8), and thus the large-scale mapping of a negative trigenic interaction network is expected to provide the most mechanistic insight into gene function and pathway wiring.

The Jones Lab

We study the evolution of complex traits by simulating phenotypes determined by multiple loci and environmental effects. These traits harbor genetic variation, which is critical for evolution to occur. The genetic variance is often summarized in a matrix, called the G-matrix, which contains additive genetic variances and additive genetic covariances. The G-matrix describes the genetic architecture of complex traits. The Jones Lab, in collaboration with Steve Arnold (Oregon State University) and Reinhard Bürger (University of Vienna), has been using individual-based simulations to study the evolution of the genetic architecture (and other related issues).
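To make the G-matrix concrete, here is a minimal sketch that computes it as the variance-covariance matrix of breeding values across individuals. It assumes each individual's additive genetic (breeding) value for each trait is already known, as it would be inside a simulation; the function name and the data are hypothetical, and this is not code from the G-matrix simulator packages.

```python
def g_matrix(breeding_values):
    """Sample variance-covariance matrix of breeding values.

    breeding_values: one list per individual, one breeding value per trait.
    Diagonal entries are additive genetic variances; off-diagonal entries
    are additive genetic covariances between traits.
    """
    n = len(breeding_values)
    if n < 2:
        raise ValueError("need at least two individuals")
    k = len(breeding_values[0])
    means = [sum(ind[t] for ind in breeding_values) / n for t in range(k)]
    g = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            g[a][b] = sum(
                (ind[a] - means[a]) * (ind[b] - means[b]) for ind in breeding_values
            ) / (n - 1)
    return g


# Hypothetical population of three individuals and two traits.
population = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
G = g_matrix(population)
print(G)
```

In this toy population the two traits are perfectly genetically correlated, so the covariance equals the geometric mean of the two variances.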

The software we have used for our papers is freely available on GitHub. To learn more about these packages, consult our program note:

Jones, A. G., R. Bürger, and S. J. Arnold. 2018. The G-matrix simulator family: software for research and teaching. Journal of Heredity 109:825-829.

To write your own G-matrix simulation software, consult Adam Jones’ book:

C++ for Biologists: Evolutionary Models

(A free pdf is available here, or a hard copy can be purchased from Amazon)

Here is a summary of the available software packages:

G-matrix Simulator 2014: A Windows-based simulator that contains the code to produce the results reported by Jones et al. 2003, 2004 and 2012. This simulator has a graphical user interface and only works on Windows-based machines. Relevant papers:

Jones, A. G., S. J. Arnold, and R. Bürger. 2003. Stability of the G-matrix in a population experiencing pleiotropic mutation, stabilizing selection, and genetic drift. Evolution 57:1747-1760.

Jones, A. G., S. J. Arnold, and R. Bürger. 2004. Evolution and stability of the G-matrix on a landscape with a moving optimum. Evolution 58:1639-1654.

Jones, A. G., R. Bürger, S. J. Arnold, P. A. Hohenlohe, and J. C. Uyeda. 2012. The effects of stochastic and episodic movement of the optimum on the evolution of the G-matrix and the response of the mean to selection. Journal of Evolutionary Biology 25:2210-2231.

G-matrix Home Version: This version of the simulator is similar to the 2014 version under the hood, but it has been streamlined to make it more usable in an instructional setting. This version is used in the Evolutionary Quantitative Genetics Workshop offered by Steve Arnold and Joe Felsenstein every summer.

G-matrix Command Line: A version of the simulator without a graphical user interface. The source code should compile for any operating system with a standard C++ compiler.

Local Adaptation and Epistasis: With this simulator, we explored the effects of pleiotropy and epistasis on the evolution of local adaptation. We also investigated the feasibility of detecting the loci affecting the trait in genome-wide scans of population differentiation. The results of this study are reported in the following paper:

Jones, A. G., S. J. Arnold, and R. Bürger. 2019. The effects of epistasis and pleiotropy on genome-wide scans for adaptive outlier loci. Journal of Heredity, in press.

The research interest of our group is to understand the genetic architecture of complex quantitative traits important to modern agriculture, environmental quality, evolutionary biology, and biomedicine. We study fundamental genetic processes that underlie phenotypic variation and evolution across time and space scales and functional signals. The extraordinarily high complexity of such genetic processes has prompted and intrigued us to develop powerful experimental designs and statistical models for dissecting their underlying mechanisms with the aid of analytical tools from other disciplines, such as control theory and game theory.

In nature, no cell or organism can grow in isolation. Instead, they exist with other interacting members in a community or ecosystem. We have incorporated evolutionary game theory to model how such ecological interactions affect the phenotypic formation of any individual in a population. Mathematical equations have been established to quantify the independent growth of an individual (i.e., growth by assuming that it is in isolation) and its interactive growth with other conspecifics in the shared environment. Our group is interested in developing game-theoretic models for mapping ecological QTLs that mediate independent growth vs. interactive growth through competition and collaboration. These models are being generalized to ecological interactions that take place at a wide range of levels of organization from proteins and RNA to cells to complex organisms.

We have always enjoyed generating bold ideas for our genetic and genomic research. On one hand, we have developed new theory, designs, and methods stimulated by the latest discoveries in quantitative genetics and biology, which other researchers use for next-level hypothesis tests. On the other hand, we use established theoretical ideas to design new experiments and collect new data from which to gain insight into various genetic processes.

The Liu Lab

Statistical Genetics: We are interested in developing novel statistical methods and computational tools for analyzing large scale genomic datasets. We have developed methods for analyzing rare variant association studies using sequence data, approaches for integrating multiple omics datasets, as well as efficient tools for large scale data analysis. Our methods and tools are being actively applied in hundreds of genetic studies worldwide.

Addiction Genetics: We are actively pursuing a better understanding of the genetic basis of nicotine addiction. To do so, we seek to aggregate very large datasets on tobacco use phenotypes (in collaboration with the GSCAN consortium), integrate phenotypes of nicotine metabolites, smoking topography and tobacco use (in collaboration with TCORS at Penn State), and develop powerful and scalable methods that enable these analyses.

Functional Biology of X-inactivation: We aim to understand the genetic regulation of X-inactivation using an integrative genomics approach and apply these methods to study lupus genetics.


Core Ideas

  • Functional mapping uncovers the genetic architecture of shoot growth dynamics.
  • Gibberellic acid is an underlying component for natural variation for shoot growth dynamics in rice.
  • Genomic prediction is effective for improving early growth dynamics.

Early vigor is an important trait for many rice (Oryza sativa L.)-growing environments. However, genetic characterization and improvement for early vigor is hindered by the temporal nature of the trait and strong genotype × environment effects. We explored the genetic architecture of shoot growth dynamics during the early and active tillering stages by applying a functional modeling and genomewide association (GWAS) mapping approach on a diversity panel of ∼360 rice accessions. Multiple loci with small effects on shoot growth trajectory were identified, indicating a complex polygenic architecture. Natural variation for shoot growth dynamics was assessed in a subset of 31 accessions using RNA sequencing and hormone quantification. These analyses yielded a gibberellic acid (GA) catabolic gene, OsGA2ox7, which could influence GA levels to regulate vigor in the early tillering stage. Given the complex genetic architecture of shoot growth dynamics, the potential of genomic selection (GS) for improving early vigor was explored using all 36,901 single-nucleotide polymorphisms (SNPs) as well as several subsets of the most significant SNPs from GWAS. Shoot growth trajectories could be predicted with reasonable accuracy using the 50 most significant SNPs from GWAS (0.37–0.53); however, the accuracy of prediction was improved by including more markers, which indicates that GS may be an effective strategy for improving shoot growth dynamics during the vegetative growth stage. This study provides insights into the complex genetic architecture and molecular mechanisms underlying early shoot growth dynamics and provides a foundation for improving this complex trait in rice.


Early vigor, defined as a plant's ability to accumulate shoot biomass rapidly during early developmental stages, is critical for stand establishment, resource acquisition, and, ultimately, yield. The rapid emergence of leaves leads to early canopy closure, which reduces soil evaporation, thereby improving seasonal water use efficiency and conserving water for later vegetative growth and grain production. In rice, early vigor is a particularly important trait for regions where rice is direct seeded (Mahender et al., 2015). As the cost of labor rises, a shift from the labor-intensive practice of transplanting rice to direct seeding is expected (Mahender et al., 2015).

Several studies have examined seedling vigor in rice and elucidated the underlying genetic basis using conventional phenotyping strategies under field and greenhouse conditions (Redoña and Mackill, 1996; Lu et al., 2007; Cairns et al., 2009; Rebolledo et al., 2012a, 2012b, 2015; Liu et al., 2014). In a recent study by Rebolledo et al. (2015), multiple vigor-related traits such as plant morphology and nonstructural carbohydrates were quantified in a rice diversity panel of 123 japonica varieties. The authors integrated multiple phenotypic metrics in a functional–structural plant model, called Ecomeristem, and performed GWAS mapping using phenotypic metrics and model parameters as trait values (Luquet et al., 2012; Rebolledo et al., 2015). Such multitrait approaches provide a more comprehensive understanding of the biochemical and genetic basis of early vigor than conventional single trait approaches.

Early vigor is a function of time. The timing of developmental switches that initiate tiller formation and rapid exponential growth is a crucial component of this trait. However, despite this temporal dimension, most studies have assessed the genetic basis of early vigor at one or a few discrete time points (Redoña and Mackill, 1996; Lu et al., 2007; Cairns et al., 2009; Rebolledo et al., 2012a, 2012b, 2015; Liu et al., 2014). Such approaches are overly simplistic and may only provide a snapshot of the genetic determinants that cumulatively influence the final biomass. Yet sampling for biomass at high frequencies over a developmental window for mapping populations using conventional destructive phenotyping approaches would require tens to hundreds of thousands of plants and be highly labor-intensive. With the advent of high-throughput image-based phenomic platforms, plants can be phenotyped nondestructively more frequently throughout their growth cycle to examine the temporal dynamics of physiological and morphological traits (Berger et al., 2010; Golzarian et al., 2011; Busemeyer et al., 2013; Topp et al., 2013; Moore et al., 2013; Würschum et al., 2014; Hairmansis et al., 2014; Slovak et al., 2014; Yang et al., 2014; Honsdorf et al., 2014; Chen et al., 2014; Bac-Molenaar et al., 2015).

Mathematical equations that describe a developmental or physiological process can be applied to this high-resolution temporal data to describe temporal growth trajectories using mathematical parameters. Several models, such as logistic, exponential, and power-law functions, have been used to describe plant growth (Paine et al., 2012). These approaches enhance the temporal resolution of phenotyping and, when combined with association or linkage mapping, improve the power to detect genetic associations for complex traits compared with traditional cross-sectional approaches (Wu and Lin, 2006; Xu et al., 2014; Campbell et al., 2015). However, despite the recent advances in phenotyping technologies, the genetic basis of early growth dynamics in rice or other cereals remains largely unexplored.

Multiple and sometimes uncorrelated phenotypes determine the rate and extent of vegetative growth in crops. We hypothesize that capturing growth dynamics at a higher temporal resolution can help elucidate the genetic basis of this trait. To this end, we sought to examine the genetic architecture of temporal shoot growth dynamics during the early and active tillering stages (8–27 d after transplanting [DAT] and 19–41 DAT) in rice. A panel of ∼360 diverse rice accessions was phenotyped using a nondestructive image-based platform and temporal trends in shoot growth were modeled with a power-law function (Zhao et al., 2011). We provide insights into the genetic basis of shoot growth by using GWAS analysis. The underlying molecular mechanisms were explored using RNA sequencing on a subset of the diversity panel during the early tillering stage. Genomic selection of the model parameters and daily estimates of shoot biomass suggest that GS may be an effective strategy for improving early vigor in rice.
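Modeling a growth trajectory with a power-law function reduces a time series to two parameters that can then serve as trait values. The sketch below assumes the form y = a * t^b and fits it by ordinary least squares on log-transformed data; the text does not state the fitting procedure the authors used (nonlinear least squares is another common choice), and the function name and data are hypothetical.

```python
import math


def fit_power_law(times, sizes):
    """Fit y = a * t**b by linear regression of log(y) on log(t).

    Returns (a, b). All times and sizes must be positive.
    """
    if len(times) != len(sizes) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    if any(t <= 0 for t in times) or any(s <= 0 for s in sizes):
        raise ValueError("times and sizes must be positive for a log-log fit")
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)   # intercept = log of the scale parameter
    return a, b


# Hypothetical noise-free trajectory: projected shoot size on five imaging days.
days = [8, 12, 16, 20, 27]
shoot_size = [2.0 * d ** 1.5 for d in days]
a_hat, b_hat = fit_power_law(days, shoot_size)
print(round(a_hat, 3), round(b_hat, 3))
```

Fitting one (a, b) pair per accession yields two parameter traits per line, which is the kind of input a GWAS on model parameters would use.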

How does genetic variation impact phenotypic traits, both at the organismal and cellular level (including an emphasis on gene regulation)? What are the molecular pathways from genetic variation to cellular and organismal phenotypes? Why does so much of the genome contribute to the genetic basis of complex traits?

Our lab includes people trained in a variety of different fields using computational and experimental approaches to tackle these problems. We often work on problems where there are no off-the-shelf statistical methods. Thus, an important part of our work is in developing appropriate statistical and computational approaches that can yield new insights into biological data.

Lab News

And welcome to our new PhD students Alyssa Fortier and Roshni Patel and new postdocs: Sahin Naqvi, Jeff Spence, Hakhamanesh Mostafavi and Clemens Weiss!

May: New papers include our latest work on understanding genetic architecture of complex traits (the "Omnigenic 2" paper) with Xuanyao Liu and Yang Li: [Link] and our paper with the Criswell, Marson and Greenleaf labs on the chromatin structure of immune cells (led by Diego in our lab, with Michelle Nguyen and Anja Mezger): [Link] Congratulations to Diego, Xuanyao, Yang and the rest of the teams!

January: Welcome to Shaila Musharoff who is joining us this month!

December: Farewell to David and Emily! They will both be greatly missed. David will be starting his own lab at NY Genome Center, and Emily is taking her genomics expertise into the consulting world.

9/18/2018. News roundup:

Comings and goings:
August: Welcome to our new postdocs Jake Freimer and Yuval Simons!

Eilon took a job as one of the first scientists at the new firm Insitro, founded by Daphne Koller. Bon Voyage to Eilon!

Emily defended her thesis! Congratulations Emily! She will stay in the lab through the end of the year to finish papers.

Congratulations to Natalie and Ben, both lab alumni, on their lovely wedding.

July: Lab alumna Alexis Battle was awarded early tenure in Biomedical Engineering at Johns Hopkins. Wonderful news!

June: Hannah M received the Robert Baynard Textor award from the Department of Anthropology, for 'creativity in anthropology'. Well done, and well-deserved, Hannah!

May: Arbel graduated and moved to a postdoc position in Molly Przeworski's lab at Columbia. In August he and his wife Maya welcomed twin baby boys! Congratulations! This is the third twin pair in the lab in the last 7 years. Is this a significant enrichment?

Jonathan's plenary talk from PEQG on the Omnigenic model (with Evan, Yang and Xuanyao) is online here.

March: Natalie graduated and moved to a position as a staff scientist at Ancestry. Congratulations Natalie! We miss you!

January: Kelley has moved to start her own lab in the Department of Genome Sciences at University of Washington. We are excited to follow her progress in Washington!

10/25/2017. News roundup:

October: ASHG: Natalie gave a well-received plenary talk to 7000 people on her side-project on conference gender dynamics and Nasa, Yang, Emily and David also gave very nice platform presentations. Well done everyone!

Update: Nasa won ASHG's best student talk (for work with Melina Claussnitzer). Well done!!

We have several papers out or about to come out in journals: Eilon on using cfDNA to measure transplant rejection; Yang and David on LeafCutter (splicing); Diego on estimating cell types driving complex traits; Arbel on NAGC in gene duplicates; Jessica on triclosan and microbiomes (working with Ami's lab).

Congratulations to Ziyue on her wedding!

September: Yang and Xun moved on to start their own labs. Bon voyage!!

Congratulations to Harold who has been awarded a prestigious Hannah Gray postdoc-faculty fellowship from HHMI.

June: Welcome Nasa, Hannah and Margaret to the lab!

6/22/2017. News roundup for the last few months:

June: release of our perspective piece on the 'omnigenic' model of complex traits, with lead authors Evan Boyle and Yang Li [PDF Link]. Our paper stimulated a lot of discussion on social media and elsewhere, including Ed Yong's article in the Atlantic and in Stanford news.

May: Congratulations to 3 of our postdocs who have accepted assistant professor faculty positions for the coming academic year: Kelley Harris (U Washington, Genome Sciences), Yang Li (U Chicago, Genetic Medicine), and Xun Lan (Tsinghua U, Basic Medical Sciences).

And Kelley also received a prestigious faculty transition award through the CASI program at Burroughs Wellcome Fund.

Natalie and Arbel received graduate fellowships from CEHG. Well done! And Jonathan will become co-director of CEHG starting in June.

Diego's paper on inferring cell types that drive disease is up on bioRxiv.

April: Kelley had a very nice paper building on striking results from her PhD work: Rapid evolution of the human mutation spectrum, out this month in eLife [Link].

Harold, Evan and David each have new papers out about their work with other labs: Sleuth [Harold] and Cas9 Binding [Evan] and ASE for detecting GxE [David].

March: Anand has moved to a data scientist position at Facebook. We'll miss his amazing technical expertise and informal contributions to many projects in the lab! Bon Voyage, Anand!

December 2016. Arbel and Anand's work on how recurrent mutation alters the site frequency spectrum in large samples is now out in PLOS Genetics: [Link]. Congratulations!

Natalie had a fun article in which she applied computational methods to study the history of the journal Genetics since 1917 [Link]. [Timelapse plot of author locations]

November. Yair, Evan, and Natalie's paper on SDS and polygenic adaptation: Detection of human adaptation during the past 2000 years is out now in Science [Link]. Congratulations!

9/20/2016. Welcome to Harold Pimentel who joined the lab last month!

Also, Eilon has a paper out in Nature Genetics showing trans-interactions (i.e., trans-eQTLs) between MHC protein alleles and the expression of T cell receptor genes. With help from Leah Sibener and Chris Garcia we were able to interpret these in terms of physical interactions in the protein structure [Link]

6/1/2016. Lots of exciting news to report from April and May:

Anil's paper on using ribosome profiling data to detect novel open reading frames is now online at eLife [Link].

Xun's paper on the evolution of gene duplicates was published in Science [Link]. This received some nice attention, including in a blog post by Francis Collins.

Yair, Evan and Natalie's paper on very recent polygenic adaptation is now out on bioRxiv [Link]. This got lots of attention, including news pieces in Nature and Science.

Yair and Yang both gave talks at Cold Spring's Biology of Genomes Meeting. It was also great to see talks from lab alumni Alexis Battle, Joe Pickrell and Dan Gaffney, and to get to hang out with many other colleagues, friends, and other lab alumni.

Here you can see a pair of great sketches of Yang and Yair, drawn by Alex Cagan.

4/28/2016. Well done Yang, whose paper RNA splicing is a primary link between genetic variation and disease is out today [Link]. This paper uses data from 7 years of joint projects between our lab and Yoav's lab to provide a detailed accounting of the links between genetic variation, variation in gene regulation and disease.

3/29/2016. Congrats to Evan (NSF predoc!). Also to Yang and David K on their Leafcutter paper on quantification of RNA splicing, out now on bioRxiv.

1/4/2016. Farewell and best wishes to Bryce (postdoc, Alkes Price lab, Harvard) and to Graham and Kyle, starting their own labs at the Salk Institute and UCSD, respectively.

11/20/2015. Congratulations to Bryce for his masterful PhD defense in Chicago. It was a bittersweet occasion, marking the end of the U Chicago era.

9/14/2015. Bryce and Graham's WASP paper (QTL mapping with allele-specific reads) is out today in Nature Methods.

8/18/2015. We enjoyed a farewell dinner at the Counter on Friday for Audrey and Towfique (starting their own labs at U of Idaho and Mt Sinai, respectively). Bon voyage!

A big welcome to Kelley Harris and Ziyue Gao who are joining us as new postdocs from Rasmus Nielsen and Molly Przeworski's labs!

Welcome to short term visitor Dan Lawson and also Alexis Battle who is back to visit us briefly this month.

Congrats to Eilon and Michal on birth of their son Yuval!

In the grants department, well done to Kelley (NRSA), David G (EMBO/LLHF), Yang (CEHG), Jessica (NSF predoc)! And Natalie and Evan got a pair of internal research grants, one from Genetics and one from Systems Biology. Kudos.

5/22/2015. JKP nearly done teaching. Catching up on news: Congratulations to Graham who is taking a faculty job at the Salk Institute!!

And congratulations to Towfique who is taking a faculty job at Mt Sinai.

And more congrats to Yang who has been awarded a CEHG postdoc fellowship for next year!

And Xun's paper on evolution of gene duplications is out on bioRxiv.

4/2/2015. Congratulations to David G. who has won a Dan David Prize scholarship!

3/31/2015. Congratulations to former lab member Melissa Hubisz, now at Cornell, for winning an NSF predoc award! Also well done to Jessica, Emily and Evan for honorable mentions.

12/18/2014. Alexis, Zia and Sidney's paper on the relationship between genetic variation and phenotypic variation in RNA, ribosomes, and proteins is out today. Congrats!

12/16/2014. Congratulations to Cristina who completed her Ph.D defense in CS yesterday!!

12/04/2014. Anil and Heejung's paper on using multi-scale approaches to model DNase data at TF binding sites is out on bioRxiv.

11/11/2014. Our new website SciReader for scientific recommendations is out in beta release. Well done Priya, Natalie, Yonggan!

11/11/2014. Bryce and Graham's paper and software on unbiased allele-specific read mapping and powerful QTL mapping is out on bioRxiv and the corresponding WASP software is on GitHub.

9/22/2014. Lots of new arrivals this month: David, Yang, Anand and Emily; visiting scholars: Audrey, Towfique and Kyle. Welcome!

8/07/2014. This week we are moving into our long-term lab space in the Clark Center! We are very happy to be in the new space. In other news, Anand has been awarded a CEHG fellowship for next year. Well done, Anand!

6/24/2014. Alexis will be starting her own lab next month at Johns Hopkins in the departments of CS and Biostats [Link]. We wish her all the very best in her new position!!

6/24/2014. Well done to Eilon on winning the prestigious EMBO fellowship! David Golan, who will be joining in September has been awarded the Rothschild and Fulbright fellowships. Congrats to both!!

6/24/2014. Xun and Nick's paper on genetic influences on DNA methylation is out on bioRxiv [Link].

4/29/2014. Anil's fastSTRUCTURE paper is now out in Genetics: Link.

Alexis will be speaking next week at Biology of Genomes about her work with Zia and Sidney on understanding the effects of genetic variation on mRNA, translation and proteins.

4/2/2014. Darren Cusanovich's paper on knockdown of TFs in LCLs (with Yoav's lab) is out now in PLOS Genetics: Link. Choongwon Jeong's paper on adaptive introgression of high altitude adaptations from Sherpa into Tibetans (collaboration with Anna Di Rienzo's lab), is out now in Nature Communications: Link.

4/1/2014. Bon voyage to Heejung Kim; she spent the winter term visiting us from Matthew's lab in Chicago.

3/1/2014. Welcome to Yair, who has now arrived from Chicago with his family!

2/10/2014. Our paper on genetic load in human populations, joint with Guy Sella's lab, is out now in Nature Genetics. Well done to Yuval Simons (in Guy's lab) and Michael Turchin (now in Matthew Stephens' lab)! Link.

2/10/2014. Welcome to our new postdoc Eilon Sharon, who has moved here from Eran Segal's lab. Eilon will be joint with Hunter Fraser's lab. Welcome also to this term's rotation students: Peyton Greenside, Natalie Telis, Diego Calderon, Arbel Harpak and Emily Glassberg!

12/6/2013. Joe Davis has written a great blog post about Graham and Bryce's recent paper on genetic variation and histone modification.

12/4/2013. fastSTRUCTURE is out! Links to Anil's manuscript and beta-release software are here.

12/4/2013. Thanks to Christine Vogel for her perspective on Zia's evolution of mRNA/protein paper.

11/12/2013. Welcome to Yonggan and Priya who are joining the lab this month!

11/12/2013. Congrats to Jack/Athma/Roger whose paper on DNase QTLs was chosen as one of the top papers of 2012 in Regulatory and Systems Genomics at the RECOMB/ISCB meeting.

11/1/2013. There's a nice perspective in NRG by Hannah Storey on 4 recent papers--including one by Graham and Bryce--that studied the effects of genetic variation on histone mods.

10/30/2013. Kudos to Shyam for winning the prestigious Charles Epstein Trainee Research Award (postdoc division) for his talk at ASHG on historical inference for African populations. Graham Coop is a previous winner from our lab (in 2007).

10/22/2013. Congratulations to lab alum Joe Pickrell who has just accepted a position as one of the first faculty at the New York Genome Center. In addition, Zia Khan is now in transit to his first faculty position--in CS at U. Maryland. Good luck to both!

10/22/2013. Darren's paper on knockdown experiments targeting 59 TFs is out on arXiv.

10/17/2013. Zia and Graham/Bryce have a pair of papers out in Science today: evolution of protein expression in primates and effects of genetic variation on histone modifications. Congrats to Zia, Graham and Bryce!

10/17/2013. Ben Voight's 2006 paper on selection was highlighted in a nice blog article by Emma Ganley as part of the 10th anniversary celebrations at PLOS Biology.

10/11/2013. Welcome to our first two Stanford rotation students: Ilana Arbisser from the Biology department and Michael Sikora from Genetics!

8/1/2013. We are delighted to be moving to Stanford University. This will be a fantastic academic environment and a great place to live. That said, we will miss our many friends at the University of Chicago, where the lab was based for 12 years.

8/1/2013. Welcome to our newest postdoc Alexis Battle! It's great to have her onboard.

8/1/2013. Welcome to Anil, Stoyan and Xun, who are arriving at Stanford this month. The rest of the lab will follow soon or work from Chicago during the transitional period.

Hello World. The lab moves to Stanford on 8/1/2013, after 12 years at the University of Chicago.