On September 27–28, Computation hosted a multidisciplinary workshop entitled Combating Cancer: Big Data Challenges for Precision Medicine. Participants included the Norwegian Consul General and Consul as well as the Director and representatives of the Cancer Registry of Norway (CRN). Other attendees came from Microsoft, RAND Corporation, and the software company Crayon. Held at the Livermore Computing Complex, the event addressed a range of challenges in data collection, management, analysis, and decision making.
Organized by Livermore computer scientist Ghaleb Abdulla and CRN researchers Jan Nygård and Mari Nygård, the workshop advanced one of several pilot projects within the national Cancer Moonshot effort that leverage Livermore’s high performance computing (HPC) capabilities. The Cancer Moonshot was initiated in 2016 by former U.S. Vice President Joe Biden to improve cancer prevention, diagnosis, treatment, and care by doubling the rate of progress.
According to Jim Brase, Computation’s Deputy Associate Director for Programs, health security is a matter of national security and, therefore, an important facet of the Laboratory’s mission. “The Department of Energy [DOE] invests in biomedical research to help solve important problems. As biomedical analytics become more data-centric, our sophisticated computing capabilities position us to use machine learning–based analytics to challenge theory and simulations with experiments,” he says.
While Livermore brings computing resources to the Combating Cancer project, CRN brings a large data set. Norway’s public health care system maintains registries of health histories, and CRN oversees a national database of millions of cervical cancer screening results. The project team is analyzing 25 years of CRN’s data (1991–2015) to help develop personalized cancer prevention and treatment strategies. Pattern recognition, machine learning, and time-series statistics are part of the strategy to enhance precision medicine and improve outcomes for patients affected by cervical cancer.
Figure 1. Jan Nygård, of CRN’s Registry Informatics Department, explains the project’s multifaceted approach to data privacy. Nygård and another CRN colleague, Mari Nygård, are working onsite at Livermore for a 6-month period of close collaboration.
A New Phase
The workshop came at a crucial phase in the project, when collaborators are expanding research activities and supporting a new funding stream from the Laboratory Directed Research and Development (LDRD) Program. “We are at a stage to explore new domains necessary for health data,” says Abdulla. “For instance, how do you define health care policies that come from interpreting data?”
The workshop was divided into sessions examining the landscape of precision medicine, data privacy and secure computing, cancer screening modeling, and future opportunities. Livermore Computing staff presented guests with a video tour of the Laboratory’s supercomputing ecosystem, including machine rooms.
Using patient data means using it responsibly. Several workshop discussions centered on strategies for protecting sensitive information and establishing a secure analytic environment, including perimeter security (such as a firewall), encryption (of network and data storage), and data restrictions (such as how much information the user can view at one time).
Encryption is computationally expensive, hence the need for Livermore’s powerful HPC resources. A major project objective involves determining the best way to encrypt, decrypt, and re-encrypt patient data at different stages of processing while still ensuring its integrity. Although cloud-based data storage scales easily, it also makes encryption more challenging. Workshop participants considered ways to reduce external risks in the cloud environment, such as performing computation on encrypted data without decrypting it, an approach known as homomorphic encryption.
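To make the idea of computing on encrypted data concrete, the following is a minimal sketch of an additively homomorphic scheme (textbook Paillier) in Python. It is an illustration only, not the method used by the project or by HPCrypt: the primes are tiny and insecure, the function names are hypothetical, and the screening counts are made-up values. It shows two encrypted counts being summed by a party that never sees the plaintext.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

# Hypothetical example: two sites each encrypt a count of screening results.
n, priv = keygen(1009, 1013)
c_site_a = encrypt(n, 120)
c_site_b = encrypt(n, 45)

# An untrusted server combines the ciphertexts without decrypting them.
c_total = add_encrypted(n, c_site_a, c_site_b)

# Only the key holder can read the total.
print(decrypt(n, priv, c_total))  # 165
```

Paillier supports only addition on ciphertexts; fully homomorphic schemes allow richer computation at a much higher computational cost, which is part of why HPC resources matter here.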
Figure 2. Livermore computer scientist Kyle Halliday describes a new software program called HPCrypt, which is designed to enforce data encryption in the HPC environment. Halliday’s team is developing data-access permissions while running HPCrypt in the Catalyst test environment. A long-term goal is to make the software available as open source.
Machine learning, another workshop topic, plays a role in crunching large data sets. A series of presentations considered the improvements machine learning can bring to cancer screening—in other words, making the difficult work of modeling data easier—as well as the logistical challenges of implementing predictive analytics in clinical practice.
For example, cytology and histology results offer only part of a patient’s health history. A lifestyle survey, such as one asking about a patient’s social habits including smoking, might be administered only once. Workshop attendees explored solutions for a tricky question: How can researchers create data points over time for such survey results, then integrate that information into laboratory test data?
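One simple baseline for this kind of integration is to carry the most recent survey answer forward to each later laboratory test date. The sketch below, in Python, assumes that approach purely for illustration; it is not described in the workshop, the function name `carry_forward` is hypothetical, and the records are made-up values. A real analysis would also need to handle answers that change or become stale over time.

```python
from bisect import bisect_right
from datetime import date

def carry_forward(survey, tests):
    """Attach to each test the latest survey answer given on or before its date.

    survey: list of (date, answer) tuples
    tests:  list of (date, result) tuples
    Returns a list of (date, result, answer_or_None), sorted by test date.
    """
    survey = sorted(survey)
    survey_dates = [d for d, _ in survey]
    merged = []
    for test_date, result in sorted(tests):
        # Index of the first survey entry strictly after the test date.
        i = bisect_right(survey_dates, test_date)
        answer = survey[i - 1][1] if i > 0 else None
        merged.append((test_date, result, answer))
    return merged

# Made-up records: one lifestyle survey and three screening tests.
survey = [(date(2005, 6, 1), "smoker")]
tests = [
    (date(2003, 3, 15), "normal"),
    (date(2006, 4, 2), "ASC-US"),
    (date(2009, 5, 20), "normal"),
]

for row in carry_forward(survey, tests):
    print(row)
```

Tests taken before the survey get no answer (`None`) rather than a back-filled one, which keeps the merged series from implying knowledge that was not available at the time.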
Ultimately, Abdulla explains, “the goal of our model is to predict the cervical cancer screening interval for a woman using her data, not just her age. We mathematicians and computer scientists can build models, but other questions arise when models are put into practical use. We want policymakers to be aware of what the data can tell them—not only to decrease costs, but to reduce the patient’s anxiety about cancer screening and prevent unneeded tests.”
The workshop’s final session described prototypes in progress and summarized related work, including the Laboratory’s other Cancer Moonshot efforts. In addition to the new LDRD project, Livermore teams are developing computational analyses of cancer genetics and simulations that can predict cancer growth.
The Combating Cancer project team’s plans to reconvene in Norway in 2018 reinforce the value of face-to-face interaction. “At this workshop, we connected new team members, such as those working on the new LDRD effort, with our partners,” states Abdulla, who emphasizes the importance of maintaining trust between international collaborators. “The CRN has delivered Norwegian citizens’ health data to another country’s researchers. Data is the ‘gold’ of the digital age, and our work must continue to safeguard it as we strive for more effective cancer screening.”
Figure 3. Laboratory computer scientist Jonathan Allen provides an overview of the Candle effort (CANcer Distributed Learning Environment), which is a joint project among four DOE laboratories, the National Cancer Institute, and commercial partners. The Candle team is applying a deep learning framework to large-scale data processing.