Researchers at MIT have developed a framework to ensure genetic data can be shared freely without putting individuals’ privacy at risk. The approach, which applies a statistical technique also used by Apple ($AAPL), adds carefully calibrated random noise to query results to prevent the identification of individuals while still delivering answers that are good enough for research purposes.
Two researchers from the Computer Science and Artificial Intelligence Laboratory at MIT worked on the approach with a collaborator at Indiana University, leading to a publication in Cell Systems. The paper describes the adaptation of differential privacy to genome-wide association studies (GWAS). Differential privacy, a mathematical framework with roots in cryptography that Apple has adopted, uses techniques including the injection of random noise into query results so that a researcher can learn as much as possible about a dataset as a whole without being able to learn anything meaningful about the individuals in it.
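To make the idea of injecting noise concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a simple count query, such as how many people in a cohort carry a given variant. It is an illustration only: it is not the method from the Cell Systems paper, and the function names, the `epsilon` values and the toy cohort are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(carrier_flags, epsilon, rng):
    """Return a noisy count of True entries (hypothetical helper).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon. Smaller epsilon
    means more noise and stronger privacy.
    """
    true_count = sum(1 for flag in carrier_flags if flag)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy cohort (made up): 1,000 people, 120 of whom carry the variant.
    cohort = [True] * 120 + [False] * 880
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(private_count(cohort, eps, rng), 2))
```

Any single answer is blurred enough that one person's presence or absence is hard to infer, while the noise averages out to zero, so aggregate statistics stay close to the truth.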
This concept holds an obvious appeal for GWAS, a field torn between demands to protect the privacy of individuals and the need to realize the health benefits of widespread, large-scale analyses. As it stands, the field tries to strike that balance through anonymization, a method that carries the risk of being reversed, and through the use of gatekeepers. As the Cell Systems authors see it, the gatekeeper approach is slowing down the rate at which science advances.
“Right now, what a lot of people do, including the NIH, for a long time, is take all their data–including, often, aggregate data, the statistics we’re interested in protecting–and put them into repositories,” Sean Simmons, an MIT postdoc and first author on the paper, said in a statement. “And you have to go through a time-consuming process to get access to them.” MIT’s Bonnie Berger, a coauthor of the paper, said it can take months to gain access to a repository.
Whether differential privacy is the answer to these problems remains to be seen. The authors see it as being suitable for use in situations “in which privacy concerns would make alternative approaches cumbersome or impossible.” Figuring out which situations the approach is best suited to will require experimentation, something at least one researcher not involved with the paper is keen to see happen.
“Hopefully, this will encourage the biomedical community to test this promising approach at large scale and, if it’s successful, define best practices and develop related tools,” Jean-Pierre Hubaux, a professor of computer science at the École Polytechnique Fédérale de Lausanne, said.