Centers of Excellence for Big Data Computing
The Big Data to Knowledge (BD2K) Centers of Excellence have developed new approaches, methods, software tools, and related resources, including publications, data standards, and educational resources, to advance Big Data Science in their respective biomedical areas of focus.
BD2K Centers Resources
- Big Data for Discovery Science (BDDS)
- Researchers at BDDS focus on proteomics, genomics, and images of cells and brains collected from patients and subjects across the globe. They enable detection of patterns, trends, and relationships among these data for the efficient large-scale analysis of biomedical data. BDDS offers a variety of tools for data management and processing, genetic association studies, statistical analysis, and utilities for existing frameworks. See the BDDS Tools page.
- BD2K-Library of Integrated Network-based Cellular Signatures Data Coordination and Integration Center (BD2K LINCS-DCIC)
- The BD2K-LINCS DCIC conducts data science research focused on responses of human cells and tissues to perturbing agents like small molecules. The Center provides access to and analysis of this data by the broader biomedical research community. The Center also develops web-based tools and data standards for integrative data access, visualization, and analysis across the distributed LINCS and BD2K sites and other relevant data sources. See BD2K LINCS-DCIC Resource page.
- Center for Causal Modeling and Discovery of Biomedical Knowledge from Big Data (CCD)
- CCD develops computational methods known as causal discovery algorithms that can be used to discover causal relationships from a combination of observational data, experimental data, and prior knowledge. CCD offers a software suite for causal discovery from large and complex biomedical data sets. See CCD Tools page.
- Center for Expanded Data Annotation and Retrieval (CEDAR)
- CEDAR is building new web-based technology to make it easier for biomedical scientists to author detailed metadata that describe their experiments completely, adhere to appropriate community-based standards, and incorporate controlled terms that facilitate interoperability with other online data sets. CEDAR provides a combination of tools to aid the community in producing optimal metadata. See CEDAR tools page.
- Center for Mobility Data Integration to Insight (Mobilize)
- The Mobilize Center is analyzing movement data from over 6 million individuals using a smartphone app, revealing new insights about physical activity levels around the world and the factors predictive of these activity levels. The Mobilize Center engages the community in mobility big data efforts and promotes the use of big data analytics in biomedical computational research through a number of resources including software, data sources, and publications. See the Mobilize Center resources page.
- Center for Predictive Computational Phenotyping (CPCP)
- CPCP aims to accelerate the impact of predictive modeling on clinical practice by developing computational and statistical methods and software for a range of computational phenotyping tasks, including extracting relevant phenotypes from complex data sources and predicting clinically important phenotypes before they are exhibited. See CPCP resources page.
- Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K)
- Researchers at MD2K develop tools to make it easier to gather, analyze, and interpret data from mobile and wearable sensors. The goal is to use those data to reliably quantify physical, biological, behavioral, social, and environmental factors that contribute to health and disease risk. Visit the MD2K page.
- ENIGMA Center for Worldwide Medicine, Imaging, and Genomics (ENIGMA)
- The ENIGMA Center develops computational methods for integration, clustering, and learning from complex biodata types to help identify factors that resist or promote brain disease or that assist in diagnosis and prognosis, as well as new mechanisms and drug targets for mental health care. Visit the “ENIGMA-Vis” site for tools that generate interactive plots from genome-wide association study (GWAS) data and estimate genetic similarity between user-uploaded datasets and ENIGMA data.
- Heart BD2K, a Community Effort to Translate Protein Data to Knowledge: An Integrated Platform (Heart BD2K)
- The goal of the Heart BD2K Center is to democratize data research to include non-computational scientists and individuals and to apply innovative global community-driven data integration and modeling methods to address challenges involved in the study of protein structure, function, and networks with a focus on cardiovascular research. See Heart BD2K products page.
- KnowEnG, a Scalable Knowledge Engine for Large-Scale Genomic Data (KnowEnG)
- The KnowEnG Center built a computational Knowledge Engine that uses data mining and machine learning techniques to obtain and combine gene function and gene interaction information from disparate genomic data sources. See the KnowEnG platform.
- Patient-Centered Information Commons: Standardized Unification of Research Elements (PIC-SURE)
- Investigators at the PIC-SURE Center develop systems to combine genetic, environmental, imaging, behavioral, and clinical data on individual patients from multiple sources into integrated sets to enable more accurate classification of individual disease or disease risk, and facilitate greater precision in patient disease prevention and treatment strategies. See PIC-SURE products.
- BD2K Centers Coordination Center (BD2K CCC)
- The BD2K CCC helps promote collaboration among the Centers and across the BD2K program, and coordinates BD2K Centers Consortium activities. A list of the BD2K Centers’ resources is available on the BD2K CCC website.