Deep dive for dark matter may aid all of data science

National Science Foundation backs Rice-led effort to create science-aware artificial intelligence

A Rice University scientist and his colleagues are turning their search for dark matter into a study they hope will enhance all of data science.

Rice astroparticle physicist Christopher Tunnell and his team have received a $1 million National Science Foundation (NSF) grant to reimagine data science techniques and help push data-intensive physical sciences past the tipping point to discovery.

Experiments in the physical sciences are starting to produce thousands of terabytes of data, Tunnell said. "These datasets are fundamentally different from large datasets of everyday photos, text or video," he said. "Ours relate to experiences of the natural world that only highly specialized instruments and sensors can 'see.'"

In tackling this class of problem, the two-year project aims to influence the way data scientists use machine and deep learning in bioinformatics, computational biology, materials science and environmental sciences. Tunnell said the goal is to support these physical science communities through a "domain-enhanced" data science institute.

"In large astroparticle data sets, we often look for the faintest signals that anyone has ever attempted to measure," said Tunnell, an assistant professor of physics and astronomy and computer science and lead investigator on the project.

"Science is incremental," he said, explaining the domain-enhanced approach. "We have spent decades building up mankind's most precise physical theories, which provide the foundation for these measurements. When using machine learning in this realm, the machine has to learn through its own 'Phys 101.' But the great artificial intelligence advancements of the last decade have been mostly in computer vision and natural language processing with a muted impact in physical sciences."

Tunnell's co-investigators are Waheed Bajwa, an associate professor of electrical and computer engineering at Rutgers University, and Hagit Shatkay, a professor of computer and information sciences at the University of Delaware. The team formed at an Ideas Lab run by the NSF and Knowinnovation that brought together scientists and engineers to foster novel data science ideas that did not fit any disciplinary mold.

The researchers argue that particle physics can serve as a driver for technological advances that are later used by other sciences in the same way that data-handling needs at the European Organization for Nuclear Research (known as CERN) led to the development of the World Wide Web.

"Our proposal focuses on one scientific application -- in this case astroparticle physics -- to test out multiple novel methods," Tunnell said. "We are searching for solutions to a real-world problem rather than problems that fit our solution. That, in my view, is what interdisciplinary science is about."

For the dark matter search, they need data science and machine-learning algorithms that improve measurements of particle interactions in their detectors. "This will simultaneously increase the ability to measure faint dark-matter signals while improving the precision of energy measurements," Tunnell said. "It will help the experiment be sensitive to neutrinoless double-beta decay, a process that sheds light on the nature of neutrino mass and, potentially, why our universe is made of matter."

He said they will employ probabilistic graphical models that allow them to encode their knowledge of science, as well as inverse problem formulations that teach machine-learning routines enough that they can learn the rest on their own.

Tunnell has already gained a foothold in the search for dark matter, even if the matter itself is not at hand. Earlier this year, he and colleagues at the XENON1T experiment announced in Nature that they had found the first physical evidence of the material with the longest half-life ever measured. The sophisticated detector under a mountain in Italy discovered that xenon-124 has a half-life of 18 sextillion years, demonstrating that the experiment and subsequent data science can measure exotic physical signals.

He noted the grant incorporates funds for educational outreach and training of data scientists in the techniques under development.

Tunnell's group was formed as part of Rice’s Data Science Initiative, with additional seed funding for research from two Rice Creative Venture grants. "This work has already led to one discovery: a strong, friendly interdisciplinary team interested in trying something new," he said.