Keeping it Simple: Sharing 2631+ Resting State fMRI Datasets Through the FCP, INDI and ADHD-200 Sample

Maarten Mennes (NYU Langone Medical Center, New York, NY, USA), F. Xavier Castellanos (NYU Langone Medical Center, New York, NY, USA & Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA), Michael P. Milham (NYU Langone Medical Center, New York, NY, USA & Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA)

The era of discovery science for brain function was consolidated by the launch of the 1000 Functional Connectomes Project (FCP) in December 2009. The FCP publicly released over 1300 resting state fMRI (R-fMRI) datasets independently collected worldwide. Enthusiastically received in terms of pageviews and downloads, the FCP represented an initial step towards addressing the challenge of providing access to large-scale imaging samples to a broad scientific community. In the next iteration, the International Neuroimaging Data-sharing Initiative (INDI) was launched in October 2010 to facilitate sharing of imaging data with corresponding phenotypic data, and to foster a shift to prospective data sharing. Since its launch, 14 neuroimaging groups have agreed to unrestricted, regularly scheduled sharing of datasets, regardless of publication status; 295 datasets have been shared to date. In addition, 1009 previously published datasets were shared retrospectively, including data from 285 children with ADHD and 491 controls through the ADHD-200 Sample. In total, 2631 resting state fMRI datasets are currently available for download via the FCP website at the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC), a number that is growing weekly.

While embraced by users, sharing data is not trivial for the provider. For data sharing to succeed, shared datasets should be abundant, easily accessible in a readily usable format, and continuously monitored and maintained. Otherwise, users will lose time, interest and trust. The easiest roadblock to tackle was persuading researchers to openly share their data: the movement towards unrestricted data sharing is gaining momentum, encouraged by funding agencies. Preparing the data for successful sharing, however, required considerable effort to systematize idiosyncrasies, as each lab typically maintains its own data structure, naming convention, data type (e.g., DICOM vs. NIfTI), image orientation (e.g., ASL vs. RPI), etc. Although such harmonization can typically be accomplished through an automated pipeline, we encountered numerous exceptions that often required manual, dataset-specific operations. For instance, image information such as orientation, number of acquired volumes and length of acquisition is stored in the image header, yet some image operations can covertly corrupt the header information. In the FCP, this led to a left-right discrepancy between the anatomical and functional images included in some datasets, as well as incorrect voxel size representations in others. Indeed, as some image idiosyncrasies remained unnoticed during data preparation, we benefited from user feedback to identify possibly erroneous datasets.
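To make the header issue concrete, the following is a minimal sketch of the kind of sanity check a data provider might run on each anatomical/functional pair. It is not the FCP preparation pipeline. It assumes uncompressed NIfTI-1 headers with a valid sform affine, and the function names (parse_header, axis_codes, compare_headers, make_header) are hypothetical. Axis codes here name the direction towards which each voxel axis increases, which is only one of several naming conventions in use. A mismatch merely flags a dataset for manual inspection: a header that was covertly corrupted can still look internally consistent.

```python
import struct

def parse_header(raw: bytes) -> dict:
    """Extract a few fields from the 348-byte NIfTI-1 header."""
    if len(raw) < 348:
        raise ValueError("a NIfTI-1 header is 348 bytes long")
    # sizeof_hdr (offset 0) must equal 348; this also reveals the byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", raw, 0)[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", raw, 40)      # dim[0] = number of dimensions
    pixdim = struct.unpack_from(endian + "8f", raw, 76)   # pixdim[1:4] = voxel size
    sform_code = struct.unpack_from(endian + "h", raw, 254)[0]
    rows = [struct.unpack_from(endian + "4f", raw, off) for off in (280, 296, 312)]
    return {
        "n_volumes": dim[4] if dim[0] >= 4 else 1,
        "voxel_size": tuple(pixdim[1:4]),
        "sform_code": sform_code,
        "affine_rows": rows,
    }

def axis_codes(affine_rows) -> tuple:
    """Name the world direction towards which each voxel axis increases."""
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    codes = []
    for col in range(3):
        column = [affine_rows[row][col] for row in range(3)]
        world = max(range(3), key=lambda r: abs(column[r]))
        codes.append(labels[world][1 if column[world] > 0 else 0])
    return tuple(codes)

def compare_headers(anat: dict, func: dict) -> list:
    """Return human-readable warnings for an anatomical/functional pair."""
    problems = []
    for name, hdr in (("anatomical", anat), ("functional", func)):
        if hdr["sform_code"] == 0:
            problems.append(f"{name}: no sform affine, orientation unknown")
        if any(size <= 0 for size in hdr["voxel_size"]):
            problems.append(f"{name}: non-positive voxel size {hdr['voxel_size']}")
    if anat["sform_code"] and func["sform_code"]:
        a, f = axis_codes(anat["affine_rows"]), axis_codes(func["affine_rows"])
        if a != f:
            problems.append(f"orientation differs: anatomical {a}, functional {f}")
    return problems

def make_header(affine_rows, dim, pixdim) -> bytes:
    """Build a synthetic little-endian header, for demonstration only."""
    raw = bytearray(348)
    struct.pack_into("<i", raw, 0, 348)
    struct.pack_into("<8h", raw, 40, *dim)
    struct.pack_into("<8f", raw, 76, *pixdim)
    struct.pack_into("<h", raw, 254, 1)
    for off, row in zip((280, 296, 312), affine_rows):
        struct.pack_into("<4f", raw, off, *row)
    raw[344:348] = b"n+1\x00"
    return bytes(raw)

# Synthetic example: the functional image is stored with its x axis reversed.
anat = parse_header(make_header(
    ((1, 0, 0, -90), (0, 1, 0, -126), (0, 0, 1, -72)),
    (3, 176, 256, 256, 1, 1, 1, 1), (1, 1, 1, 1, 0, 0, 0, 0)))
func = parse_header(make_header(
    ((-3, 0, 0, 90), (0, 3, 0, -126), (0, 0, 3, -72)),
    (4, 64, 64, 33, 180, 1, 1, 1), (1, 3, 3, 3, 2, 0, 0, 0)))
for warning in compare_headers(anat, func):
    print(warning)
```

In practice the 348 header bytes would be read from the start of an uncompressed .nii file, for example with open(path, "rb").read(348). Differing storage orientations are not in themselves an error, so any warning should lead to visual inspection rather than automatic correction.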

As an unfunded grassroots effort, we did not provide resources such as cloud computing, an integrated database or advanced processing pipelines. Since appropriate tools are abundantly available (most of them open source and free of charge), we prioritized providing researchers with properly organized raw data. This approach has resulted in over 25,000 downloads since December 2009 and has attracted researchers without direct access to imaging data (e.g., statisticians, computer scientists, mathematicians). Still, we continue to seek a more optimized data-sharing experience. To that end, we rely on continued user input to improve our efforts at both the front and back end, reshaping the neuroimaging landscape one shared dataset at a time.

Preferred presentation format: Poster
Topic: Neuroimaging
