HID-Genetics: A Federated BIRN-enabled Data Management System for Clinical, Imaging, and Genome-Wide Association Studies

David Keator (University of California, Irvine), Jinran Chen (University of California, Irvine), Naveen Ashish (University of California, Irvine), Federica Torri (University of California, Irvine), Anita Lakatos (University of California, Irvine), Steven Potkin (University of California, Irvine), Fabio Macciardi (University of California, Irvine), Dingying Wei (University of California, Irvine)

In the past 20 years, enormous strides have been made in imaging the human brain’s structure and function. Comparable progress has been made in understanding the human genome and its role in disease. Complex behavioral and neurodevelopmental or neurodegenerative diseases such as schizophrenia and Alzheimer’s disease appear to involve the combined effects of multiple genes and important interactions with the external and internal environment. Given the known importance of both genetics and environment in brain function, and the role of neuroimaging in revealing brain dysfunction, the capability to integrate genetics with brain imaging data in a single data resource is needed. Currently, there are no open-source data management systems that support federated storage and retrieval of neuroimaging, clinical, and genetics data. The HID-Genetics component of the Human Imaging Database (HID; Ozyurt 2010) bridges the gap between the support for federated neuroimaging and clinical data already included in the HID and the genetics data and annotation support needed for today’s imaging-genetics association studies.

The HID is an open-source, extensible database schema and associated three-tier J2EE application environment for the storage and retrieval of biomedical data, designed to operate in a federated database environment. The database contains an extensible framework for the definition and storage of clinical assessment and demographic data. The HID environment also contains 1) an intuitive web-based user interface for the entry and management of subject data, a core component of which is the management of behavioral and/or clinical data through modules that streamline the development of online forms for the entry and maintenance of large numbers of measures; and 2) a data integration engine, built on top of the BIRN data integration environment, that allows multiple sites running the HID to form a federated database, so that these sites can be queried as a single database resource from the web-based user interface.

The HID-Genetics extensions have been designed to enable storage of genotype data in a federated environment and to integrate the genetics data with the extensive clinical and imaging data collected on the same individuals. The core HID implementation has been augmented with additional tables for storing the Single Nucleotide Polymorphism (SNP) RS number, strand information, chromosome, GC score, allele, and additional metrics and metadata from the genotyping platform. The system supports simple summary statistics to help check the quality of the genetics data. Genetics annotation data can be imported from the genotyping platform and from online sources such as the UCSC Genome Browser (genome.ucsc.edu; Fujita 2011). We are also developing capabilities for integrating external annotation data from the UCSC Genome Browser and other sources through two modalities: (i) direct (remote) database access and (ii) local materialization. Modeled after the SNPLims data management system for genome-wide association studies (Orro 2008), the HID-Genetics component allows import of multiple common genetics data formats and export of data to files in the formats expected by analysis tools such as PLINK and EIGENSTRAT. The existing HID query interface has been redesigned to support genetics-based queries and filtering of human subject imaging data by genotype, and to provide the user with an intuitive interface for constructing queries across these different data types.
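The export and quality-check steps described above can be illustrated with a minimal Python sketch. It is not the HID-Genetics implementation: the `GenotypeCall` record, its field names, and the three helper functions are hypothetical stand-ins for the stored fields (RS number, chromosome, allele, GC score). The sketch assumes the standard PLINK text layout, in which a .map line holds chromosome, SNP identifier, genetic distance, and base-pair position, a .ped line holds six leading columns followed by two alleles per SNP, and "0" marks a missing allele. The per-SNP call rate is used here as one plausible example of a simple summary statistic.

```python
from collections import defaultdict
from dataclasses import dataclass

MISSING = "0"  # PLINK's default code for a missing allele


@dataclass(frozen=True)
class GenotypeCall:
    """Hypothetical record for one genotype call; field names are assumptions."""
    subject_id: str
    rs_id: str
    chromosome: str
    position: int
    allele1: str
    allele2: str
    gc_score: float


def ordered_snps(calls):
    """Return unique (chromosome, position, rs_id) tuples in genomic order."""
    snps = {(c.chromosome, c.position, c.rs_id) for c in calls}
    # Numeric chromosomes sort numerically; others (X, Y, MT) sort after them.
    return sorted(
        snps,
        key=lambda s: (int(s[0]) if s[0].isdigit() else 99, s[0], s[1]),
    )


def snp_call_rates(calls):
    """Fraction of non-missing genotype calls per SNP (a simple QC summary)."""
    total = defaultdict(int)
    called = defaultdict(int)
    for c in calls:
        total[c.rs_id] += 1
        if c.allele1 != MISSING and c.allele2 != MISSING:
            called[c.rs_id] += 1
    return {rs: called[rs] / total[rs] for rs in total}


def to_map_lines(calls):
    """PLINK .map lines: chromosome, SNP id, genetic distance (0), position."""
    return [f"{chrom}\t{rs}\t0\t{pos}" for chrom, pos, rs in ordered_snps(calls)]


def to_ped_lines(calls):
    """PLINK .ped lines, one per subject, with SNPs in .map order.

    Family ID is set to the subject ID, parents and sex to 0 (unknown), and
    phenotype to -9 (missing); these are placeholders, not stored values.
    """
    order = ordered_snps(calls)
    by_subject = defaultdict(dict)
    for c in calls:
        by_subject[c.subject_id][c.rs_id] = (c.allele1, c.allele2)
    lines = []
    for sid in sorted(by_subject):
        genotypes = []
        for _, _, rs in order:
            a1, a2 = by_subject[sid].get(rs, (MISSING, MISSING))
            genotypes.append(f"{a1} {a2}")
        lines.append(" ".join([sid, sid, "0", "0", "0", "-9"] + genotypes))
    return lines
```

Writing the lists returned by `to_map_lines` and `to_ped_lines` to a pair of .map and .ped files gives input in the text format PLINK reads. A low value from `snp_call_rates` flags a SNP that may need review before analysis.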

The system is being tested and evaluated using the Function Biomedical Informatics Research Network (FBIRN) test-bed. The FBIRN consortium is actively collecting clinical, imaging, and genetics data across ten geographically distributed sites, and so provides a unique environment in which to test and evaluate the performance and design of the HID-Genetics components.

 This work was supported in part by the NIH through the following NCRR grant: the Biomedical Informatics Research Network (1 U24 RR025736-01).

Ozyurt IB, Keator D, Wei D, Fennema-Notestine C, Pease K, Bockholt B, Grethe J. Federated Web-accessible Clinical Data Management within an Extensible NeuroImaging Database. Neuroinformatics. 2010;23(1):98-106.

Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2011 Jan;39(Database issue):D876-82. Epub 2010 Oct 18.

Orro A, Guffanti G, Salvi E, Macciardi F, Milanesi L. SNPLims: a data management system for genome wide association studies. BMC Bioinformatics. 2008;9 Suppl 2:S13.

Preferred presentation format: Poster
Topic: Genomics and genetics
