Researcher says he recovered gene sequences after a Chinese scientist asked that they be removed from government archive
Chinese researchers directed the U.S. National Institutes of Health to delete gene sequences of early Covid-19 cases from a key scientific database, raising concerns that scientists studying the origin of the pandemic may lack access to key pieces of information.
The NIH confirmed that it deleted the sequences after receiving a request from a Chinese researcher who had submitted them three months earlier.
“Submitting investigators hold the rights to their data and can request withdrawal of the data,” the NIH said in a statement.
The removal of the sequencing data is described in a new paper posted online Tuesday by Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center in Seattle. The paper, which hasn’t been peer reviewed, says the missing data include sequences from virus samples collected in the Chinese city of Wuhan in January and February of 2020 from patients hospitalized with or suspected of having Covid-19.
Some of the deleted information is still available in a paper that was published in a specialized journal, but scientists typically look for gene sequences in major databases like the one the NIH maintains, Dr. Bloom said. He said he found the deleted data after searching for them elsewhere online.
The missing sequences are unlikely to change researchers’ current understanding of the early weeks of the Covid-19 pandemic in Wuhan. But Dr. Bloom said their removal sows doubts about China’s transparency in the continuing investigation into the origin of the pandemic.
Some other scientists agreed.
“It makes us wonder if there are other sequences like these that have been purged,” said Vaughn S. Cooper, a University of Pittsburgh evolutionary biologist who wasn’t involved in the new paper and said he hasn’t studied the deleted sequences himself.
To pursue the origin of the pandemic, scientists need access to information that could shed light on how the virus emerged into the human population and began spreading. Removing information from a database can make it harder for them to find, potentially slowing their research, as can a lack of access to other research. An international team led by the World Health Organization, as well as other scientists, is investigating how the pandemic began.
According to the NIH statement, the scientist who submitted the sequences requested in June 2020 that they be deleted because they had been updated and were to be posted to another, unspecified database. The investigator said they wanted the older version to be removed to avoid confusion, according to the NIH.
Chinese researchers initially submitted the sequences to the NIH database in March 2020 and published information about them in a paper on a preprint server, according to the NIH. The paper described the use of an advanced sequencing technology to detect SARS-CoV-2, the virus that causes Covid-19. The researchers didn’t immediately respond to a request for comment.
China’s National Health Commission didn’t immediately respond to a request for comment.
One challenge for scientists studying the origin of the virus is the paucity of data from early cases in Wuhan, Dr. Bloom says in the paper. Those data, he says, are mostly limited to virus sequences obtained in December 2019 from a dozen patients connected to the city’s Huanan Seafood Market, the site of the first known outbreak of Covid-19, and a small additional number of sequences collected before late January 2020.
The removal of the sequences yielded “a somewhat skewed picture of viruses circulating in Wuhan early on,” Dr. Bloom said. “It suggests possibly one reason why we haven’t seen more of these sequences is perhaps there hasn’t been a wholehearted effort to get them out there.”
The publication of Dr. Bloom’s paper could reinforce calls for greater collaboration from China in the global effort to pinpoint the source of SARS-CoV-2.
A WHO official working with the international team that prepared the organization’s March report on the origins of the virus said Dr. Bloom’s paper didn’t radically alter the team’s understanding of the early pandemic but did bolster the case for more analysis of the earliest Covid-19 infections.
Dr. Bloom is a co-author of a letter published in May in the journal Science that criticized the WHO report and called for a deeper investigation into two leading hypotheses of the origin of Covid-19: that the pandemic virus entered the human population after escaping from a lab, or that it jumped to humans naturally from infected animals.
He said he realized that sequences had been removed from NIH’s Sequence Read Archive database when he read an analysis by other investigators and tried to find the sequences himself.
Following the discovery, he spent mornings and weekends scouring the internet for other sources of the deleted sequences—and ultimately was able to obtain and download them. Dr. Bloom then contacted the NIH to ask why the sequences were removed.
Dr. Cooper, the University of Pittsburgh evolutionary biologist, said the deleted sequences don’t resolve a continuing debate over whether the pandemic emerged from a lab accident or animal spillover into humans. “You could still argue it both ways,” he said.
But Dr. Bloom’s paper suggests that other early sequence data might still emerge, said Sergei Pond, a Temple University biology professor with expertise on the evolution of viral pathogens.
“If more sequences came to light, especially from early time points, or archival samples elsewhere, everything could change once again,” he said. “I think this is likely to happen.”
Stephen Goldstein, a University of Utah evolutionary virologist who wasn’t involved in Dr. Bloom’s research, said it was unclear if any new insights could be gleaned from the deleted sequences. “From a scientific standpoint, I don’t think they point to anything nefarious,” he said, adding that he had not made his own analysis of the sequences.
The deleted sequences are fragments, and “it’s the full genome sequences that have typically been the most informative,” said Joel Wertheim, an evolutionary biologist at the University of California, San Diego and an author of a recent paper on the early pandemic.
Dr. Bloom says in his paper that even if there is no further international investigation, the approach he took could be used to learn more about the origin or early spread of the coronavirus.
“We really need to look hard and see if there is other early information about sequences that hasn’t been found,” he said. “I intend to go through every early preprint I can find about SARS-CoV-2 and see if it describes any data that isn’t in the databases.”