Integration of GWAS Summary Statistics and Gene Expression Reveals Target Cell Types Underlying Kidney Function Traits

Background

Genetic variants identified in genome-wide association studies (GWAS) are often not specific enough to reveal the complex physiology underlying a trait. Novel computational methods that integrate RNA-seq data with GWAS summary statistics allow unbiased identification of trait-relevant tissues and cell types.


Methods

The CKDGen consortium provided GWAS summary statistics for estimated glomerular filtration rate (eGFR), urinary albumin-to-creatinine ratio (UACR), blood urea nitrogen (BUN), and serum urate. Genotype-Tissue Expression Project (GTEx) RNA-seq data were used to identify the top 10% most specifically expressed genes in each of 53 tissues, followed by linkage disequilibrium (LD) score–based enrichment testing of each gene set for each trait. Similar procedures were applied to five kidney single-cell RNA-seq datasets from humans and mice and to a microdissected tubule RNA-seq dataset from rats. Gene set enrichment analyses were also conducted for genes implicated in Mendelian kidney diseases.
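The abstract does not state how gene specificity was scored. As a minimal sketch, assuming a per-gene Welch t-statistic that compares samples from one tissue against samples from all other tissues (one common choice for this kind of analysis), selecting the top 10% most specifically expressed genes could look like the code below. The function names, the dict-of-lists input format and any toy data are hypothetical; the resulting gene set would then feed into LD score–based enrichment testing, which is not shown here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for mean(a) - mean(b); each list needs >= 2 values."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group variance: return 0 if means match, else signed infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_specific_genes(expr, labels, tissue, fraction=0.10):
    """Return the top `fraction` of genes ranked by specificity for `tissue`.

    Hypothetical inputs (assumptions, not the study's data format):
      expr   -- dict mapping gene name -> list of expression values, one per sample
      labels -- list of tissue labels, aligned with each expression list
    Specificity is scored as Welch's t for `tissue` samples vs. all other samples.
    """
    in_idx = [i for i, t in enumerate(labels) if t == tissue]
    out_idx = [i for i, t in enumerate(labels) if t != tissue]
    scores = {}
    for gene, values in expr.items():
        scores[gene] = welch_t([values[i] for i in in_idx],
                               [values[i] for i in out_idx])
    # Highest t-statistic = most specifically (over)expressed in the target tissue.
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:k]
```

Ranking by a t-statistic rather than by raw fold change rewards genes whose tissue-specific expression is both large and consistent across samples.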


Results

Across 53 tissues, genes in kidney function–associated GWAS loci were enriched in kidney (P=9.1E-8 for eGFR; P=1.2E-5 for urate) and liver (P=6.8E-5 for eGFR). Within the kidney, proximal tubule was the enriched cell type in humans (P=8.5E-5 for eGFR; P=7.8E-6 for urate) and mice (P=0.0003 for eGFR; P=0.0002 for urate), and it was confirmed as the primary cell type in microdissected tubules and organoids. Gene set enrichment analysis supported the proximal tubule finding and additionally showed enrichment of genes implicated in monogenic glomerular diseases in podocytes. A systematic approach generated a comprehensive list of GWAS genes prioritized by cell type–specific expression.
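The abstract does not name the statistical test behind the gene set enrichment analyses. One common option, shown here only as a hedged sketch, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) of whether a set of Mendelian disease genes overlaps a cell type's specifically expressed genes more than expected by chance within a shared background. The function names and inputs are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked genes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability mass for every overlap size from k up to the maximum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def set_enrichment(query, target, background):
    """Overlap size and one-sided P value for `query` genes within `target` genes.

    Hypothetical usage: `query` = Mendelian disease genes, `target` = genes
    specific to one cell type, `background` = all genes tested. Both sets are
    restricted to the background before counting.
    """
    background = set(background)
    query = set(query) & background
    target = set(target) & background
    k = len(query & target)
    return k, hypergeom_sf(k, len(background), len(target), len(query))
```

For example, with a background of 4 genes, a 2-gene cell-type set and a 2-gene disease set that overlap completely, `set_enrichment` returns an overlap of 2 with P = 1/6.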


Conclusions

Integrating GWAS summary statistics for kidney function traits with gene expression data identified trait-relevant tissues and cell types, providing a basis for further mechanistic studies of GWAS loci.