Singleton, KW, Hsu, W, Bui, AAT
In AMIA Annu Symp Proc. 2012. p. 1385-92.
Publication year: 2012

The growing amount of electronic data collected from patient care and clinical trials is motivating the creation of national repositories where multiple institutions share data about their patient cohorts. Such efforts aim to provide sufficient sample sizes for data mining and predictive modeling, ultimately improving treatment recommendations and patient outcome prediction. While these repositories offer the potential to improve our understanding of a disease, potential issues need to be addressed to ensure that multi-site data and resultant predictive models are useful to non-contributing institutions. In this paper we examine the challenges of utilizing National Cancer Institute datasets for modeling glioblastoma multiforme. We created several types of prognostic models and compared their results against models generated using data solely from our institution. While overall model performance between the data sources was similar, different variables were selected during model generation, suggesting that mapping data resources between models is not a straightforward issue.