New algorithm generates synthetic human genetic code

According to new research, fragments of artificial genomes with human characteristics can be created on a computer. In recent years, thanks to increasingly complex algorithms, Artificial Intelligence has been able to replicate complex models derived from the real world and even generate high-quality synthetic data such as credible images of works of art, newspaper articles and people's faces. Now research has gone a step further, this time in the field of biology.

An international team of researchers, which includes the Universities of Tartu, Paris and Padua, has managed to create fragments of artificial genomes with realistic characteristics on a computer, starting from an existing genomic database. In essence, the researchers have developed an algorithm that can generate the genetic code of people who do not exist.

The two basic approaches

Scientists used two basic approaches to create artificial genomes. First, they trained a type of Artificial Intelligence called a Generative Adversarial Network (GAN) on real data taken from a genomic database. Given a training set, this kind of AI "learns" to generate new data with the same statistics as the training set. They then used a restricted Boltzmann machine (RBM), a probabilistic graphical model with a set of parameters that, once fitted to a distribution of data, is able to provide a representation of that data.
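To make the second approach more concrete, here is a minimal sketch of a restricted Boltzmann machine in Python with NumPy. It is not the researchers' implementation: the class name `TinyRBM`, the toy data, the one-step contrastive divergence training rule and all parameter values are assumptions chosen for illustration. Each "genome fragment" is simply a row of 0s and 1s, where 1 stands for the presence of a variant at a given position.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    """Hypothetical Bernoulli RBM trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible (genomic position) biases
        self.b_h = np.zeros(n_hidden)   # hidden unit biases

    def sample_hidden(self, v):
        p = sigmoid(v @ self.W + self.b_h)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def sample_visible(self, h):
        p = sigmoid(h @ self.W.T + self.b_v)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def fit(self, data, epochs=200, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden activations driven by the real data.
            p_h0, h0 = self.sample_hidden(data)
            # Negative phase: one Gibbs step gives a "reconstruction".
            _, v1 = self.sample_visible(h0)
            p_h1, _ = self.sample_hidden(v1)
            # Move the parameters towards the data and away from the model.
            self.W += lr * (data.T @ p_h0 - v1.T @ p_h1) / n
            self.b_v += lr * (data - v1).mean(axis=0)
            self.b_h += lr * (p_h0 - p_h1).mean(axis=0)
        return self

    def generate(self, n_samples, gibbs_steps=50):
        # Start from random bits and let the Gibbs chain run.
        v = (self.rng.random((n_samples, self.W.shape[0])) < 0.5).astype(float)
        for _ in range(gibbs_steps):
            _, h = self.sample_hidden(v)
            _, v = self.sample_visible(h)
        return v.astype(int)

# Toy "database": 300 fragments, 20 positions, made-up variant frequencies.
rng = np.random.default_rng(42)
site_freqs = np.linspace(0.1, 0.9, 20)
toy_haplotypes = (rng.random((300, 20)) < site_freqs).astype(float)

rbm = TinyRBM(n_visible=20, n_hidden=8).fit(toy_haplotypes)
synthetic = rbm.generate(100)
print(synthetic.shape)
```

The generated rows belong to no one in the toy database; they are samples drawn from the model's learned distribution, which is the general idea the article describes.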

The researchers carried out several analyses to compare the characteristics of the artificial genomes with those of the real genomes and, as they progressed, they refined their work. In this way they managed to produce realistic human genomes that are almost indistinguishable from real ones. "As surprising as it may seem, these artificial genomes, which emerge from data packets initially randomly picked up and then modeled, mimic the complexities we can observe within real human populations and, for most of their properties, are indistinguishable from the other biobank genomes we used to train our algorithm, except for one detail: they don't belong to any human donor," said Luca Pagani, one of the study's senior authors and professor at the University of Padova.
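As a rough illustration of what comparing artificial and real genomes can look like, the sketch below computes two simple measures in Python. Both are assumptions chosen for illustration and are not taken from the study: the correlation of per-position variant frequencies between the two sets, and the smallest number of differing positions between each artificial fragment and any real one (a value of 0 would mean a real fragment was copied outright). The function names and the toy data are hypothetical.

```python
import numpy as np

def allele_frequencies(haplotypes):
    # Fraction of fragments carrying the variant at each position.
    return haplotypes.mean(axis=0)

def frequency_correlation(real, synthetic):
    # Pearson correlation between real and synthetic variant frequencies.
    freqs_real = allele_frequencies(real)
    freqs_syn = allele_frequencies(synthetic)
    return float(np.corrcoef(freqs_real, freqs_syn)[0, 1])

def min_hamming_to_real(real, synthetic):
    # For each synthetic fragment, count differing positions against
    # every real fragment and keep the smallest count.
    diffs = (synthetic[:, None, :] != real[None, :, :]).sum(axis=2)
    return diffs.min(axis=1)

# Toy data: both sets are drawn from the same made-up frequencies.
rng = np.random.default_rng(1)
true_freqs = rng.uniform(0.05, 0.95, size=50)
real_set = (rng.random((200, 50)) < true_freqs).astype(int)
synthetic_set = (rng.random((200, 50)) < true_freqs).astype(int)

print(frequency_correlation(real_set, synthetic_set))
print(min_hamming_to_real(real_set, synthetic_set).min())
```

A high correlation suggests the artificial set reproduces the population-level statistics, while nearest-neighbour distances well above zero suggest that no individual donor has simply been copied.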

The publication

In the article published in the journal PLOS Genetics, the team of scientists who developed the artificial genome project stated that their artificial genetic sequences have real value as a tool for geneticists. According to the researchers, these DNA codes could support further genetic research without compromising the privacy of real people, who would otherwise have to give up their genetic data.

The real question is whether these GAN-created genomes can be assembled into a functional human genetic structure.

“My initial take is that it is interesting, but I’m not sure I see real practical implications for research right now,” said Deanna Church, vice president of the Mammalian Business Area and Software Strategy at the biotech company Inscripta. “I definitely think the work is interesting, but I don’t see practical applications of it right now,” Church added. “Of course, I could be missing something.”

The authors of the paper didn’t respond to the request for comment. But in that regard, Church says, she’s not too optimistic.

“While detecting privacy issues across thousands of genomes might seem like searching for a needle in a haystack, the combination of multiple statistical measures has allowed us to overcome this important problem as much as possible.

“We think that our effort will bring improvements in the evaluation and design of the generative model and will fuel the field of machine learning,” concluded Flora Jay, study coordinator and CNRS researcher in the interdisciplinary computer science laboratory LRI / LISN of the Université Paris-Saclay.

