Statistical Genetics

 

The field of statistical genetics is concerned with understanding the genetics of phenotypes by inferring the identities, effects, and interactions of causal genetic variants. While relatively few causal variants have been precisely identified, the analysis of genome-wide genetic data has yielded candidate locations, candidate genes, and effect-size estimates for genetic variants that affect a vast array of phenotypes, from diseases to many other measurable traits. The core statistical techniques applied to genome-wide genotype data are association analyses, which seek to identify the genomic positions and other properties of the causal variants that affect a phenotype.
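To make the idea of an association analysis concrete, here is a minimal sketch of a single-marker additive test. It assumes genotypes coded as allele counts (0, 1, 2) and a quantitative phenotype. The function names `marker_association`, `simulate`, and `genome_scan`, and the simulated data, are hypothetical illustrations rather than the lab's methods. Each marker is tested with ordinary least squares, the p-value uses a large-sample normal approximation, and significance is judged against a Bonferroni threshold. The sketch deliberately ignores covariates, relatedness, and population structure, which the generalized linear mixed models (GLMMs) used for real association studies are designed to handle.

```python
import math
import random


def marker_association(genotypes, phenotype):
    """Additive single-marker test: least-squares regression of phenotype on
    allele count (0/1/2). Returns (beta, standard_error, p_value).
    The two-sided p-value uses a normal approximation to the t distribution,
    which is only reasonable for large sample sizes."""
    n = len(phenotype)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        # Monomorphic marker: no genotype variation, so nothing to test.
        return 0.0, math.inf, 1.0
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Degenerate perfect fit: significant only if the slope is nonzero.
        return beta, se, (0.0 if beta != 0 else 1.0)
    z = abs(beta / se)
    p_value = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return beta, se, p_value


def simulate(n_individuals=2000, n_markers=20, causal_index=3, effect=0.5, seed=1):
    """Hypothetical toy data: independent markers in Hardy-Weinberg proportions,
    with one causal marker adding `effect` to the phenotype per allele."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.5) for _ in range(n_markers)]
    geno = [
        [sum(rng.random() < f for _ in range(2)) for f in freqs]  # two allele draws
        for _ in range(n_individuals)
    ]
    pheno = [effect * row[causal_index] + rng.gauss(0.0, 1.0) for row in geno]
    return geno, pheno


def genome_scan(geno, pheno):
    """Test every marker (column of `geno`) one at a time."""
    return [
        marker_association([row[j] for row in geno], pheno)
        for j in range(len(geno[0]))
    ]


geno, pheno = simulate()
results = genome_scan(geno, pheno)
threshold = 0.05 / len(results)  # Bonferroni correction for the number of tests
hits = [j for j, (_, _, p) in enumerate(results) if p < threshold]
print("markers passing the Bonferroni threshold:", hits)
```

Because the simulated markers are independent, the causal marker stands out on its own. In real genome-wide data, nearby markers in linkage disequilibrium also show association, which is one reason association analyses typically yield candidate locations rather than the causal variant itself. Real scans also test hundreds of thousands to millions of markers, which is why the conventional genome-wide significance level is as strict as 5 × 10^-8.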

In the MezeyLab, our expertise lies in understanding what works and what does not when applying statistical genetic analyses to a given system and dataset, and in recognizing when a more complex analysis methodology has the potential to extract additional insights. Our recent methodology work has developed computational statistics and machine learning methods for homogeneous group identification, optimal covariate model building, and epistasis feature identification that extend the inference capabilities of the generalized linear mixed models (GLMMs) used for association analysis. Our recent collaborations include work with scientists analyzing model systems, agricultural traits, and human diseases ranging from rare to common, including cancer. Their objectives range from developing model systems for disease and identifying the causal variant for a Mendelian disease to identifying genetic influences on molecular traits, from the transcriptome to pharmacogenomic traits, in order to assess drug targets and predict drug response. Representative examples of our previously published and current work in this area include:

  • Development of computational statistics and machine learning methods for inferring latent variables in the analysis of genome-wide genotype data and multivariate Molecular Phenotypes, where our collaborative work has included analysis of expression Quantitative Trait Loci (eQTL), metabolic QTL (mQTL), and immune infiltrates in humans and other systems

  • Analysis of variants impacting traits of Agricultural Crop Species from genome-wide data, where our collaborative work has included analysis of environmental robustness and yield traits in rice, and of genetic engineering targets in other species

  • Analysis of variants impacting diseases and agricultural traits in Veterinary and Animal Systems from genome-wide genotype data, including the identification of causal variants and loci impacting rare diseases in dogs and horses

  • Development of Bayesian linkage analysis methods for pedigree-based identification of variants impacting Rare Human Diseases

  • Multi-family analyses of whole-exome sequencing data to map loci responsible for Extreme Mendelian Diseases and to identify drug targets

  • Bioinformatic analysis of rare variants segregating in populations measured by whole-exome sequencing to predict the population prevalence of Mendelian and Pharmacogenomic Diseases

  • Collaborative work analyzing Genotype by Environment interactions involving a known causal variant and their impact on neuroanatomical features extracted from functional MRI (fMRI)

  • Development of regularized generalized linear models and covariate selection approaches for computationally efficient mixed-model analysis of Genome-wide Association Study (GWAS) data, and their collaborative application to identify loci impacting Complex Human Diseases from microarray and next-generation sequencing genotype data

  • Development of epistasis analysis methods that increase discovery and predictive power when analyzing Complex Phenotypes with exome and whole-genome next-generation sequencing data

  • Development of Bayesian hierarchical models for identifying misspecification and for learning Composite Clinical Traits amenable to genetic association analysis from combined genome-wide genotype and clinical data