Hsu, W, Speier, W, Taira, RK
AMIA Annu Symp Proc. 2012. p. 350-9.
Publication year: 2012

Randomized controlled trials are an important source of evidence for guiding clinical decisions when treating a patient. However, given the large number of studies and their variability in quality, determining how to summarize reported results and formalize them as part of practice guidelines continues to be a challenge. We have developed a set of information extraction and annotation tools to automate the identification of key information from papers related to the hypothesis, sample size, statistical test, confidence interval, significance level, and conclusions. We adapted the Automated Sequence Annotation Pipeline to map extracted phrases to relevant knowledge sources. We trained and tested our system on a corpus of 42 full-text articles related to chemotherapy of non-small cell lung cancer. On our test set of 7 papers, we obtained an overall precision of 86%, recall of 78%, and an F-score of 0.82 for classifying sentences. This work represents our efforts towards utilizing this information for quality assessment, meta-analysis, and modeling.