Like many research fields, genomics is grappling with a lack of diversity in its research workforce. Students from under-represented groups still make up too small a share of STEM trainees and an even smaller share of senior researchers in STEM fields.
But the field of genomics has another diversity problem: the woeful under-representation of minority genomes and genomic data in genomic research programs.
The decreasing cost of sequencing has made it possible to sequence thousands of human genomes, generating knowledge that benefits human health. These efforts have produced catalogs of common genetic variants, reference genomic datasets for expanded study, and knowledge of sequence variants that cause disease. Genome-wide association studies have also produced polygenic risk scores, which can predict a person’s disease predisposition based on the total number of disease-associated alleles in their genome.
But the vast majority of the genomic datasets – and thus the knowledge gained – focus on individuals of white European ancestry.
This bias in data means that we are missing base knowledge for other groups of people [1]. Because allele frequencies differ across human populations, variants that are common in non-European populations, such as indigenous groups, people of African descent, and others, remain unknown [2]. In fact, the current build of the human reference genome is missing sequence found in Africans [3], meaning that read mapping approaches will completely miss sequence from unrepresented populations. This lack of knowledge makes it difficult to identify rare variants that cause disease in non-European populations.
But another problem is that genotype-phenotype relationships are different across populations. Knowledge gained from studying one group of individuals doesn’t necessarily apply to other populations.
Polygenic risk scores (PRS) are a prime example. PRS are already used in health care to indicate disease predisposition based on a person’s genotype – but they are effective mostly for the populations whose genomic data were used to generate the PRS knowledge.
A recent study showed that PRS are five times less predictive for individuals of recent African ancestry than for those of European descent [4].
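To make the idea of a PRS concrete, here is a minimal sketch in Python. It assumes the simplest form described above, a count of disease-associated alleles, and optionally a weighted sum in which each allele is scaled by an effect size. The function name, the variant IDs, and the weights are all hypothetical and chosen only for illustration; real scores are built from thousands of variants with more involved statistical modeling.

```python
def polygenic_risk_score(allele_counts, effect_sizes=None):
    """Illustrative PRS: sum of risk-allele counts, optionally weighted.

    allele_counts: dict mapping a variant ID to the number of risk
        alleles a person carries at that variant (0, 1, or 2).
    effect_sizes: optional dict mapping a variant ID to a per-allele
        weight (hypothetical values here). Variants with no weight are
        skipped, which mirrors how variants absent from the original
        study population cannot contribute to the score.
    """
    if effect_sizes is None:
        # Unweighted score: total number of risk alleles carried.
        return float(sum(allele_counts.values()))
    return sum(
        effect_sizes[variant] * count
        for variant, count in allele_counts.items()
        if variant in effect_sizes
    )


# Hypothetical genotype: variant IDs and counts are made up.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0, "variant_d": 2}

# Hypothetical weights; "variant_d" was never studied, so it has none.
weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": 1.0}

unweighted = polygenic_risk_score(person)          # 5.0
weighted = polygenic_risk_score(person, weights)   # 1.25
```

In this toy example the person carries two risk alleles at `variant_d`, but because that variant has no weight it adds nothing to the weighted score. That is one simple way to picture why a score built from one population's data can lose information when applied to another.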
An important end result of these disparities is that individuals from under-represented groups are missing out on the promise of genomics and genome-enabled health care decisions. The disparities are almost certainly due in part to the lack of minority voices in research leadership positions, and they need to be addressed.
One bit of good news is that researchers, educators, and funding agencies are making intentional strides to address the problem. Genomic consortia are consciously expanding their focus to diverse populations, and funding agencies are demanding it. Many research groups are intentional in recruiting diverse participants, including through outreach efforts to listen to those communities and rebuild trust with medical establishments [5]. Training programs across the country are recruiting and supporting more diverse trainees, and are also seeking mechanisms to empower students at earlier stages in their education and to address systemic biases in the training pipeline. The National Human Genome Research Institute (NHGRI) within NIH recently released its roadmap for increasing diversity in the genomics workforce, which can serve as a useful guideline for action.
There is a crucial need for diversity in genomic data and the genomics workforce. We see this as a call to action for our campus and community. The G&T newsletter is committed to highlighting these issues and efforts at UW-Madison to improve the equity and inclusivity of genomic science.
1. Weinberger, D.R., et al. Neuron. 107, 407-411 (2020)
2. Bergstrom, A., et al. Science. 367 (2020)
3. Sherman, R.M., et al. Nature Genetics. 51, 30-35 (2019)
4. Martin, A.R., et al. Nature Genetics. 51, 584-591 (2019)
5. Lewis, K.L., et al. American Journal of Human Genetics. 108, 894-902 (2021)