Saturday, March 25

The ‘alphabet of life’ sees the light of day: the most complete sequence of a human genome has been published

Six scientific articles published simultaneously this Thursday in the journal ‘Science’ present the most complete version to date of the human genome: the set of genetic material that makes Homo sapiens different from a mouse or a mosquito. The announcement comes two decades after the Human Genome Project produced a first draft of the chemical alphabet that makes up our genetic material. Genes are segments of DNA, the acid that contains the basic units of genetic inheritance, and carry information on specific characteristics of living organisms, their functioning and their development.

“These parts of the human genome that we hadn’t been able to study for more than 20 years are important for understanding genome function, genetic disease, and human diversity and evolution,” says Karen Miga, associate professor of biomolecular engineering at the University of California at Santa Cruz and one of the leaders of the project now reaching its culmination, in a press release.

In 2000, the announcement of what was really nothing more than a draft of the human genome became a political event. The ceremony, held at the end of June and broadcast via satellite, starred the then US president, Bill Clinton, and the British prime minister, Tony Blair, and that day the media were filled with grandiloquent phrases. “We are learning the language in which God created life,” Clinton said.

At that time, the scientific baton was shared by the geneticist Francis Collins and the biologist Craig Venter (the former from the public sector, the latter from his company, Celera), who even then warned of the preliminary nature of the results and the enormous task that lay ahead. In July 2021, another battery of scientific papers announced this improved version of the monumental project, the results of which are made public today. Researchers around the world will finally be able to study the 8% of the genome that had remained undeciphered.

3 billion DNA bases

The well-known ‘double helix’ of DNA is made up of two strands of bases (a kind of chemical letter, held together by hydrogen bonds) that twist like a corkscrew to form the arms of chromosomes, the X-shaped structures located in the nucleus of cells. The human genome consists of approximately 3 billion base pairs of DNA. According to the researchers in the press release, the availability of a more complete, gapless genome sequence is essential to understand “the full spectrum of human genomic variation and the genetic conditioning of certain diseases.”

The work has been carried out by the Telomere-to-Telomere (T2T) consortium, formed in 2019 by members of, among others, the National Human Genome Research Institute (NHGRI), which in turn is part of the US National Institutes of Health; the aforementioned University of California, Santa Cruz; and the University of Washington, Seattle. The NHGRI has funded most of the project.

Two thousand genes to study

The new reference genome, called T2T-CHM13, adds nearly 200 million base pairs of new DNA sequences, including 99 genes that “probably” code for proteins and almost 2,000 genes that still need further study. It also corrects thousands of structural errors in what was previously the reference sequence.

The gaps now covered by the new sequence encompass all of the short arms of five human chromosomes and cover some of the most complex regions of the genome. These include repetitive DNA sequences found in important chromosomal structures such as telomeres (the ends of chromosomes) and centromeres (the narrowing of the chromosome that separates a short arm from a long arm) that coordinate the separation of replicated chromosomes during cell division.

The new sequence also reveals segmental duplications that had previously gone undetected, often because the technology available at the time was not capable of resolving them. These are long stretches of DNA that are duplicated and that play important roles both in evolution and in the development of diseases. “Many of the newly revealed regions have important functions in the genome even though they do not include active genes,” the researchers say.

“There is a profound advantage to seeing the entire genome as a complete system. It puts us in a position to unravel how that system works,” says David Haussler, director of the Genomics Institute at the University of California at Santa Cruz. “We have gained a tremendous understanding of human biology and disease by having about 90% of the human genome, but there were many important aspects that remained hidden, out of sight of science, because we did not have the technology to read those parts of the genome. Now we can stand on top of the mountain and see the whole landscape below and get a complete picture of our human genetic heritage.”

The new T2T reference genome will complement the standard human reference genome, known as Genome Reference Consortium build 38 (GRCh38), which had its origins in the publicly funded Human Genome Project and has been continuously updated since the first draft in 2000 (announced at that solemn Clinton-Blair political ceremony).

Pangenomics: the new goal

“We are adding a second complete genome, and then there will be more,” explains Haussler. “The next phase is to think that the reference of the genome of humanity is not a single genomic sequence. This is a profound transition, the harbinger of a new era in which we will end up capturing human diversity in an unbiased way.”

The T2T Consortium has now joined the Human Pangenome Reference Consortium, whose goal is to create a new “human pangenome reference” based on the complete genome sequences of 350 individuals. The pangenome is the collection of all the genes of a species, both those that all individuals have in common and those that vary from one individual to another.

“Pangenomics is about capturing the diversity of the human population, and also making sure that we’ve captured the entire genome adequately,” says Benedict Paten, associate professor of biomolecular engineering at the University of California, a co-author of the T2T work and leader of the pangenomics project. “If we don’t have a map of these difficult-to-sequence regions of the genome in multiple individuals, we are missing out on a lot of variation present in our population.”