About HiGAP

The HiGAP initiative

Charting heterogeneity in the analysis and biological interpretation of GWAS results

Why this project?

Genome-wide association studies (GWAS) have become a standard tool in the discovery of disease-related genetic variants. Since the first published GWAS in 2005, many tools and pipelines have been published to facilitate the analysis of GWAS datasets. While the main aim of GWAS was initially to detect statistical associations between genetic variants and (disease) traits, increasing sample sizes and the realization that many traits are highly polygenic have shifted the aim towards biological interpretation of the detected associations. A parallel increase in the availability of external biological resources, such as eQTL information from GTEx or single-cell RNA (scRNA) information from the Allen Brain Institute, helps to optimize such biological interpretation. Functional experiments are needed to follow up on biological interpretations of GWAS results. These experiments are costly and labour-intensive, and they benefit greatly from carefully formulated mechanistic hypotheses.

Biological interpretation of GWAS results, however, is not straightforward. Many different tools and pipelines exist, and different labs may have different preferences and standards for analyzing and interpreting GWAS data and results. While tools have generally been benchmarked individually and compared to tools with similar goals, there is currently no information on how heterogeneity in the initial quality control, the genetic association analysis, and the choice of post-GWAS in silico analyses may influence the formulation of a mechanistic hypothesis.

The aim of HiGAP is twofold. The first aim is to quantify the heterogeneity in quality control procedures and its effect on downstream analyses. The second aim is to chart heterogeneity in the formulation of mechanistic hypotheses based on GWAS results. We would like to gain insight into whether different analysis pipelines ultimately lead to conclusions about putative underlying disease mechanisms that are the same, complementary or conflicting.

How will we work?

The project will be conducted in two parts, one for each of the two aims of HiGAP. (1) Teams will receive summary statistics, based on real data, for an undisclosed heritable disease with a polygenic architecture similar to that of a psychiatric trait such as schizophrenia or major depression. Analysts can choose any tools and strategy they like to interpret the genetic associations and form hypotheses about the most likely biological mechanism. (2) In addition, teams will receive raw genotype data that need to be cleaned and prepared for a subsequent polygenic risk score analysis.
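To illustrate why the quality control in part (2) matters for the subsequent analysis, the sketch below computes a polygenic risk score in its simplest common form: a weighted sum of effect-allele dosages, with the weights taken from GWAS effect sizes. It is not the HiGAP pipeline. The function name, variant identifiers, weights and dosages are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def polygenic_risk_score(
    dosages: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> float:
    """Sum effect size times effect-allele dosage over the variants in `weights`.

    Variants that are missing for this individual (absent or None) are skipped.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:  # missing genotype call
            continue
        score += beta * dosage
    return score


# Hypothetical effect sizes (e.g. log odds ratios) for three made-up variants.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.125}

# Hypothetical individual: dosage = number of effect alleles (0, 1 or 2).
person = {"rs1": 2, "rs2": 1, "rs3": None}

score_all = polygenic_risk_score(person, weights)

# Assumed scenario: a stricter quality control step removes variant rs2.
weights_after_qc = {v: b for v, b in weights.items() if v != "rs2"}
score_after_qc = polygenic_risk_score(person, weights_after_qc)

print(score_all, score_after_qc)
```

In this toy example, the same individual receives a different score depending on which variants survive quality control, which is the kind of downstream effect the first aim sets out to quantify.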

We will advertise our project on social media to solicit teams of analysts. Participating teams will have to sign a confidentiality agreement and are not allowed to discuss any details of the project with other participating teams until all results have been submitted. Each team is asked to report on several interim results as well as the final conclusion, following a pre-defined report form. In addition, each team is asked to provide a brief report on applied tools and parameter settings.

Results need to be reported back before xx/xx/2021. After this date, a separate team of analysts (not involved in any of the participating teams) will analyse the reported results.

Preliminary timeline:

  • September 2020: HiGAP launch

  • Soliciting analysis teams

  • Teams receive access to data

  • Teams submit results

  • Heterogeneity analysis

  • Publication

We realize that every trait is unique with its own genetic architecture and causal biological mechanisms, and that these idiosyncrasies may be differentially sensitive to heterogeneity in analysis pipelines.