Scientists must synthesize a large body of evidence to decide which experiments to perform next, yet the evidence available today in many ways exceeds what any individual researcher can digest. Although statistical methods quantify the significance of experimental results, no comparable quantitative methods exist either to justify hypotheses or to select experimental designs. In this project, we are formulating a new graphical paradigm that formalizes this activity by representing experiments from the literature and providing quantitative metrics for selecting the next experiment. Initially targeted at biologists, our proposed methods can: 1) integrate multiple forms of causal evidence from both primary data and the literature; 2) compute all possible causal graph structures consistent with this evidence; and 3) propose new experiments based on formal metrics of causal uncertainty. We hypothesize that representing causal information, both qualitative and quantitative, with causal graphical models will provide an analytic basis for quantifying causal uncertainty and guiding experiment selection.
The objectives of this work are: 1) to translate experimental results from biological data and literature into the formal language of causal graphical models using constraint programming; and 2) to develop methods that automatically rank experiments by their ability to reduce causal uncertainty. To derive causal graphs from annotated literature, we are implementing a state-of-the-art causal discovery algorithm that processes constraints on causal structure expressed as logical propositions. These propositions let us represent a range of constraints, from those computed statistically from data to those conveyed only qualitatively in free text. Given such constraints, a generalized constraint solver computes the causal structures that are consistent with the evidence. With formal causal graphs in hand, we will define metrics of causal uncertainty that both quantify the underdetermination of a causal structure and identify which new experiments would most effectively reduce this uncertainty. Experiment selection will be framed as a search for the specific experiments that eliminate the most causal structures from consideration.
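The pipeline described above, encoding evidence as logical propositions over causal structure, computing the set of consistent graphs, and ranking candidate experiments by how many structures they would eliminate, can be sketched in miniature. The three variables and the two constraints below are hypothetical placeholders, and brute-force enumeration over a tiny graph stands in for the generalized constraint solver:

```python
from itertools import product

# Hypothetical three-variable system; names and constraints are
# illustrative, not findings from the NS literature.
VARS = ("Ras", "ERK", "CREB")
EDGES = [(a, b) for a in VARS for b in VARS if a != b]

def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be
    removed in topological order."""
    indeg = {v: 0 for v in VARS}
    for _, b in edges:
        indeg[b] += 1
    queue = [v for v in VARS if indeg[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for a, b in edges:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return removed == len(VARS)

def all_dags():
    """Enumerate every directed acyclic graph over VARS."""
    for bits in product((0, 1), repeat=len(EDGES)):
        edges = frozenset(e for e, b in zip(EDGES, bits) if b)
        if is_acyclic(edges):
            yield edges

# Evidence as propositions over causal structure (hypothetical):
# an intervention showed Ras directly affects ERK, and the
# literature reports no regulation of Ras by CREB.
constraints = [
    lambda g: ("Ras", "ERK") in g,
    lambda g: ("CREB", "Ras") not in g,
]

consistent = [g for g in all_dags() if all(c(g) for c in constraints)]

def score(edge):
    """Worst-case number of structures an edge-testing experiment
    eliminates: whichever outcome occurs, at least
    min(present, absent) candidate graphs are ruled out."""
    present = sum(1 for g in consistent if edge in g)
    return min(present, len(consistent) - present)

best = max(EDGES, key=score)
print(len(consistent), best, score(best))
```

In this toy run the two constraints narrow 25 possible DAGs to 6, and testing the Ras-to-CREB edge is the best next experiment because either outcome eliminates 3 of the 6 remaining structures. The real system replaces enumeration with constraint solving, which scales to graphs where explicit enumeration is infeasible.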
These methods will be evaluated in a variety of domains, starting with Noonan syndrome (NS). We have a longstanding collaboration with Dr. Alcino Silva’s laboratory in the Department of Neurobiology, which studies learning and memory. This project will produce a formal causal graphical model of NS, derived from both the literature and primary data. We will release these tools as part of an open-source, freely available application called ResearchMaps.org. We hypothesize that our experiments will show how informally integrating evidence introduces biases, and how computational methods can free us from such biases to consider the entire combinatorial space of causal interpretations.