Friday, March 29

A huge family tree to understand the human family


The sequencing of the human genome over the last two decades has led to a deeper understanding of our evolutionary past. Along the way, genomic data have been generated for hundreds of thousands of individuals, including thousands of prehistoric people.

A team of scientists has applied a new non-parametric tree-recording method to ancient and modern human genomes, allowing them to infer a unified human genealogy.

“A non-parametric method means that we had to make very few assumptions about the nature of human migrations. For example, we don’t need to conjecture whether there was one, or only a few, migrations out of Africa, or that they occurred in a certain way at a given moment. Our goal is to let the data speak for itself,” Yan Wong, from the Li Ka Shing Center for Health Information and Discovery at the University of Oxford (United Kingdom) and co-author of the study, published in the journal Science, tells SINC.

With this technique, the common ancestor of two individuals is placed at the geographical midpoint between the locations of its two descendants, and the locations of earlier ancestors are then determined in the same way, further and further back in time. “Although we know that this method is imperfect, it seems to recapitulate many of the known human movements well. Perhaps the surprise is, in fact, that it works reasonably well,” says Aida Andrés, from the Institute of Genetics at University College London and author of a commentary on this work published in the same journal.
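The midpoint rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the study's implementation: it assumes the midpoint is taken along the great circle between two latitude/longitude points, and the node names, the `children` mapping and the coordinates are all hypothetical.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def geographic_midpoint(a, b):
    """Great-circle midpoint of two (lat, lon) points, in degrees."""
    ax, ay, az = to_unit_vector(*a)
    bx, by, bz = to_unit_vector(*b)
    x, y, z = ax + bx, ay + by, az + bz
    norm = math.sqrt(x * x + y * y + z * z)
    if norm < 1e-12:
        raise ValueError("antipodal points have no unique midpoint")
    # Clamp to guard against tiny floating-point overshoot before asin.
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    lon = math.degrees(math.atan2(y, x))
    return (lat, lon)

def locate(node, children, locations):
    """Return the location of `node`, placing each ancestor at the
    midpoint of its two children. Results are cached in `locations`."""
    if node in locations:
        return locations[node]
    left, right = children[node]
    loc = geographic_midpoint(locate(left, children, locations),
                              locate(right, children, locations))
    locations[node] = loc
    return loc

# Hypothetical toy genealogy: three sampled genomes and two ancestors.
children = {"anc_1": ("sample_A", "sample_B"),
            "root": ("anc_1", "sample_C")}
locations = {"sample_A": (0.0, 0.0),
             "sample_B": (0.0, 90.0),
             "sample_C": (60.0, 45.0)}
root_location = locate("root", children, locations)
```

Because every ancestor sits between its descendants, no ancestor can be placed outside the region spanned by the sampled genomes, which is one reason the researchers themselves describe the approach as very simple.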

To date, thousands of human genomes have been collected, each containing segments inherited from many different ancestors of different ages. Consequently, building a comprehensive picture of genealogy and genomic variation throughout human history represents a technical challenge.

Now, Wong and his team have managed to build a huge family tree for all of humanity. “By treating all of our ancestry as a single network, we can estimate general characteristics of common ancestors in the human family tree, such as their age and even potentially their location. Our method will potentially scale to millions of genomes,” says Wong.

The story that spawned all our genetic variation

Because each individual genomic region is inherited from only one parent, the ancestry of each point in the genome can be thought of as a tree. The set of trees, known as a ‘tree sequence’ or ‘ancestral recombination graph’, links genetic regions through time to the ancestors in which the genetic variation first appeared.
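To make the idea of one tree per genomic region concrete, here is a minimal Python sketch. It is a toy data structure written for illustration, not the format used in the study; the class name `LocalTree`, the node labels and the genomic coordinates are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class LocalTree:
    """Ancestry of one genomic interval, stored as child -> parent links."""
    left: int     # start of the interval (inclusive)
    right: int    # end of the interval (exclusive)
    parent: dict  # maps each node to its parent; the root has no entry

def tree_at(trees, position):
    """Return the local tree covering a genomic position."""
    for tree in trees:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError(f"no tree covers position {position}")

def lineage(tree, node):
    """List the node and all of its ancestors, up to the root."""
    path = [node]
    while path[-1] in tree.parent:
        path.append(tree.parent[path[-1]])
    return path

def mrca(tree, a, b):
    """Most recent common ancestor of two nodes in one local tree."""
    ancestors_of_a = set(lineage(tree, a))
    for node in lineage(tree, b):
        if node in ancestors_of_a:
            return node
    return None

# Hypothetical example: three sampled genomes (s0, s1, s2) whose ancestry
# differs on either side of a recombination breakpoint at position 500.
trees = [
    LocalTree(0, 500, {"s0": "a", "s1": "a", "a": "b", "s2": "b"}),
    LocalTree(500, 1000, {"s1": "c", "s2": "c", "c": "b", "s0": "b"}),
]
left_mrca = mrca(tree_at(trees, 100), "s0", "s1")
right_mrca = mrca(tree_at(trees, 700), "s0", "s1")
```

In this toy example the same pair of genomes shares a recent ancestor (`a`) in the first region but only the older ancestor `b` in the second, which is the kind of region-by-region variation a tree sequence records.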

In total, eight different datasets comprising 3,609 individual genome sequences from 215 populations were used. The ancient genomes included samples found around the world, ranging in age from 1,000 to more than 100,000 years. The algorithms predicted where common ancestors needed to be present in the evolutionary trees to explain the observed patterns of genetic variation. The resulting network contained almost 27 million ancestors.

Anthony Wilder Wohns, who carried out the research at the University of Oxford and is now a postdoctoral researcher at the Broad Institute of MIT and Harvard (USA), tells SINC: “Although this genealogy includes an enormous amount of detail, looking at the average ages and locations of ancestors provides great insight into broad features of human history. Sometimes we can even pull all of this data together to reveal important patterns.”

After adding location data from these sample genomes, the authors used the network to estimate where common ancestors had lived. The results successfully recapitulated key events in human evolutionary history, including migration out of Africa.

“It has long been known that there was a dispersal out of this continent, perhaps about 100,000 years ago. The signs of this event are found in parts of the genome such as the mitochondrial DNA, the Y chromosome and several other genes. However, our genealogy shows, for the first time, that the signal for this event is clearly present throughout the genome,” says Wong.

The researchers looked at signs of very deep ancestral lineages in Africa, the out-of-Africa event, and archaic introgression (the incorporation of genes from archaic humans) in Oceania.

The mutations that give us the clues

This method also accounts for missing and erroneous data, and uses fragmented ancient genomes to help pinpoint the timing of alleles’ appearance.

“These are genetic variants. They appear by mutation at some point, and if they become established, from that moment on that genomic position will be variable in the population. Some chromosomes will have one allele, others the other. We can infer the age of those alleles using modern genomes, but it is a difficult problem and we do it with little resolution,” explains Andrés.

What the authors have done in this study is use low-quality ancient genomes to help determine the age of these alleles. For each allele, they ask when it is first seen. If, for example, an allele appears in a genome from 5,000 years ago, the mutation that produced it must be at least that old.
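The reasoning in the paragraph above amounts to a lower bound on each allele's age. The following Python sketch shows that logic in isolation; it is a simplification under stated assumptions, not the dating procedure used in the study, and the allele identifiers, sample ages and the `constrain_estimate` helper are hypothetical.

```python
def allele_age_lower_bounds(ancient_samples):
    """Given (age_in_years, alleles_carried) pairs for dated ancient
    genomes, return for each allele the age of the oldest sample that
    carries it. The allele must be at least that old."""
    bounds = {}
    for age, alleles in ancient_samples:
        for allele in alleles:
            bounds[allele] = max(bounds.get(allele, 0), age)
    return bounds

def constrain_estimate(estimated_age, lower_bound):
    """Raise an age estimate from modern genomes to the lower bound
    implied by ancient samples, if the estimate falls below it."""
    return max(estimated_age, lower_bound)

# Hypothetical data: two dated ancient genomes and the alleles seen in each.
ancient_samples = [
    (5_000, {"allele_1", "allele_2"}),
    (12_000, {"allele_2"}),
]
bounds = allele_age_lower_bounds(ancient_samples)
adjusted = constrain_estimate(3_000, bounds["allele_1"])
```

Here an estimate of 3,000 years for `allele_1` would be raised to 5,000, because the allele is already present in a genome of that age.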

“This helps to improve the models that allow us to infer human demographic history. But, in addition, if that allele has important effects (for example, if it allows us to digest lactose, or if it increases the risk of a disease), improving the inference of the age of the allele helps us understand the history of that phenotype or that disease,” continues the University College London researcher.

Gil McVean, another co-author from the University of Oxford, tells SINC: “The genomic datasets we use have been constructed from many different sources and using different methods. Certain types of error inevitably occur. Our approach helps identify them. We estimate the rate to be small, less than 0.5% of variant locations in the genome, but removing them creates a more accurate and complete picture of human genomic variation.”

Although the study focuses on humans, the method is valid for most living things.

Study limitations

Wilder Wohns, for his part, explains that one of the main limitations of this work is that they use a very simple method to estimate the location of our ancestors.

“Much more work could be done in this field. Furthermore, our estimates of the location of ancestors are ultimately limited by the locations of the genomes that have been sequenced. For example, the accuracy of our reconstruction of migrations of indigenous peoples to the Americas is hampered by the relative paucity of samples from northeastern Siberia and northwestern North America,” he adds.

Furthermore, if large historical migrations occurred that left no local descendants, the accuracy of such estimates of the location of ancestors would be diminished.

Finally, the method takes into account the errors of the genetic data sets used, but it does not do so perfectly. This can also affect estimates of the age and location of our ancestors.

“In my view, one of the biggest limitations of any study using the large existing genome databases, including this one, is that we don’t have an adequate representation of all human populations. The databases are biased toward the most-studied populations, for example European ones. But this happens not only in this research but in most current genomic work, and it will be solved only by sequencing more genomes from more of the world’s populations,” concludes Andrés.



www.eldiario.es