By Will Dunham
WASHINGTON (Reuters) – Scientists on Wednesday unveiled a new accounting of the human genome that improves on its predecessor by including a rich diversity of people to better reflect the global population – a boost to ongoing efforts to identify genetic underpinnings of diseases and new ways to treat them.
This “pangenome” achievement was announced two decades after the first sequencing of the human genome, a feat that transformed biomedical research by giving scientists a reference map to analyze DNA for clues about disease-related mutations.
The new genome rundown may help clarify the contribution of genetic variation to health and disease, improve genetic testing, and guide drug discovery. It could be of particular value in understanding neurodevelopmental disorders such as schizophrenia, autism, macrocephaly, and microcephaly, as well as drug metabolism.
The work, led by the international Human Pangenome Reference Consortium of scientists funded by the U.S. government’s National Human Genome Research Institute (NHGRI), was essentially a reboot of the prior effort and addressed a key deficiency: a failure to represent the genetic variations present among the world’s 8 billion people.
The previous work had significant gaps and was based largely on a single person’s DNA. The new work is a collection of nearly perfect genome assemblies for 47 people of diverse ancestries and an alignment of those individual genomes to show which parts match and which differ. Calling this a first draft, the researchers intend to increase the number of people reflected in the data to 350 by mid-2024.
“A pangenome is not just one reference genome, but a whole collection of diverse genomes. By comparing those genomes we can then build a map of not just one individual, but a whole population of variation,” said University of California, Santa Cruz genomicist Benedict Paten, co-leader of the consortium and senior author of the main research paper published in the journal Nature.
This collection comprised genomes of people of African, East Asian, South Asian, European, North American, South American, and Caribbean ancestry, though not yet people from Oceania.
“Bottom line – what we’re doing is retooling genomics to create a diverse, inclusive representation of human variation as the fundamental reference structure, and so mitigating bias. This is important if we want our research to benefit everyone equally,” Paten said.
A genome is an organism’s genetic blueprint, in this case a human’s, and contains the information needed for development and growth. But each person’s genome varies slightly from other people’s, by about 0.4% on average. These genetic differences can shed light on a person’s health and help doctors diagnose disease, craft treatments and forecast medical outcomes.
“By building very high quality, almost complete references we’re getting a better picture for how some of the most complex regions of the genome vary. Until now, the composition of these fast-evolving regions has been largely invisible to us,” Paten said.
Researchers in 2003 unveiled what was billed as the complete sequence of the human genome, though about 8% of it had not been fully deciphered. That reference genome was a mosaic drawn from about 20 people, with 70% of it coming from one individual of mixed European and African ancestry. The first complete human genome, based on a single European individual, was published last year after scientists filled in the gaps.
Our species Homo sapiens arose in Africa roughly 300,000 years ago and later spread worldwide.
“Human ancestry is incredibly complex, and we’re all related to each other through our common history,” said Ira Hall, director of the Yale Center for Genomic Health and one of the research leaders. “And so by sampling broadly across the genetic tree of humanity, it benefits everybody. Even if some specific group isn’t explicitly included, it still is representing our common origins and provides common benefits.”
The cost of supporting the consortium will be about $40 million over five years, NHGRI said, less than the multibillion-dollar expenditure for the 2003 genome project thanks to technological advances.
(Reporting by Will Dunham, Editing by Rosalba O’Brien)