8 results for Open Research Data
at Duke University
Abstract:
BACKGROUND: Sharing of epidemiological and clinical data sets among researchers is poor at best, to the detriment of science and the community at large. The purpose of this paper is therefore to (1) describe a novel Web application designed to share information on study data sets focusing on epidemiological clinical research in a collaborative environment and (2) create a policy model placing this collaborative environment into the current scientific social context. METHODOLOGY: The Database of Databases application was developed based on feedback from epidemiologists and clinical researchers requiring a Web-based platform that would allow for sharing of information about epidemiological and clinical study data sets in a collaborative environment. This platform should ensure that researchers can modify the information. Model-based predictions of the number of publications and funding resulting from combinations of different policy implementation strategies (for metadata and data sharing) were generated using System Dynamics modeling. PRINCIPAL FINDINGS: The application allows researchers to easily upload information about clinical study data sets, which is searchable and modifiable by other users in a wiki environment. All modifications are filtered by the database principal investigator in order to maintain quality control. The application has been extensively tested and currently contains 130 clinical study data sets from the United States, Australia, China and Singapore. Model results indicated that any policy implementation would be better than the current strategy, that metadata sharing is better than data sharing, and that combined policies achieve the best results in terms of publications.
CONCLUSIONS: Based on our empirical observations and resulting model, the social network environment surrounding the application can help epidemiologists and clinical researchers contribute and search for metadata in a collaborative environment, thus potentially facilitating collaboration efforts among research communities distributed around the globe.
Abstract:
How should funding agencies enable researchers to explore high-risk but potentially high-reward science? One model that appears to work is the NSF-funded synthesis center, an incubator for community-led, innovative science.
Abstract:
The main conclusion of this dissertation is that global H2 production within young ocean crust (<10 Mya) is higher than currently recognized, in part because current estimates of H2 production accompanying the serpentinization of peridotite may be too low (Chapter 2) and in part because a number of abiogenic H2-producing processes have heretofore gone unquantified (Chapter 3). The importance of free H2 to a range of geochemical processes makes the quantitative understanding of H2 production advanced in this dissertation pertinent to an array of open research questions across the geosciences (e.g. the origin and evolution of life and the oxidation of the Earth’s atmosphere and oceans).
The first component of this dissertation (Chapter 2) examines H2 produced within young ocean crust [e.g. near the mid-ocean ridge (MOR)] by serpentinization. In the presence of water, olivine-rich rocks (peridotites) undergo serpentinization (hydration) at temperatures of up to ~500°C but only produce H2 at temperatures up to ~350°C. A simple analytical model is presented that mechanistically ties the process to seafloor spreading and explicitly accounts for the importance of temperature in H2 formation. The model suggests that H2 production increases with the rate of seafloor spreading and the net thickness of serpentinized peridotite (S-P) in a column of lithosphere. The model is applied globally to the MOR using conservative estimates for the net thickness of lithospheric S-P, our least certain model input. Despite the large uncertainties surrounding the amount of serpentinized peridotite within oceanic crust, conservative model parameters suggest a magnitude of H2 production (~10^12 moles H2/y) that is larger than the most widely cited previous estimates (~10^11, although previous estimates range from 10^10 to 10^12 moles H2/y). Certain model relationships are also consistent with what has been established through field studies, for example that the highest H2 fluxes (moles H2/km^2 seafloor) are produced near slower-spreading ridges (<20 mm/y). Other modeled relationships are new and represent testable predictions. Principal among these is that about half of the H2 produced globally is produced off-axis beneath faster-spreading seafloor (>20 mm/y), a region where only one measurement of H2 has been made thus far and is ripe for future investigation.
In the second part of this dissertation (Chapter 3), I construct the first budget for free H2 in young ocean crust that quantifies and compares all currently recognized H2 sources and H2 sinks. First global estimates of budget components are proposed in instances where previous estimate(s) could not be located provided that the literature on that specific budget component was not too sparse to do so. Results suggest that the nine known H2 sources, listed in order of quantitative importance, are: crystallization (6×10^12 moles H2/y or 61% of total H2 production), serpentinization (2×10^12 moles H2/y or 21%), magmatic degassing (7×10^11 moles H2/y or 7%), lava-seawater interaction (5×10^11 moles H2/y or 5%), low-temperature alteration of basalt (5×10^11 moles H2/y or 5%), high-temperature alteration of basalt (3×10^10 moles H2/y or <1%), catalysis (3×10^8 moles H2/y or <<1%), radiolysis (2×10^8 moles H2/y or <<1%), and pyrite formation (3×10^6 moles H2/y or <<1%). Next we consider two well-known H2 sinks, H2 lost to the ocean and H2 occluded within rock minerals, and our analysis suggests that both are of similar size (both are 6×10^11 moles H2/y). Budgeting results suggest a large difference between H2 sources (total production = 1×10^13 moles H2/y) and H2 sinks (total losses = 1×10^11 moles H2/y). Assuming this large difference represents H2 consumed by microbes (total consumption = 9×10^11 moles H2/y), we explore rates of primary production by the chemosynthetic, sub-seafloor biosphere. Although the numbers presented require further examination and future modifications, the analysis suggests that the sub-seafloor H2 budget is similar to the sub-seafloor CH4 budget in the sense that globally significant quantities of both of these reduced gases are produced beneath the seafloor but never escape the seafloor due to microbial consumption.
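As a rough check on the source side of this budget, the nine source estimates quoted in the abstract can be summed and turned into shares of total production. The sketch below is only illustrative arithmetic on the rounded figures given above; the names `H2_SOURCES` and `source_shares` are hypothetical and do not come from the dissertation, and because the inputs are rounded to one significant figure the computed shares can differ from the quoted percentages by about a point.

```python
# Global H2 source estimates (moles H2 per year), copied from the abstract.
# The values are rounded to one significant figure in the source text.
H2_SOURCES = {
    "crystallization": 6e12,
    "serpentinization": 2e12,
    "magmatic degassing": 7e11,
    "lava-seawater interaction": 5e11,
    "low-temperature alteration of basalt": 5e11,
    "high-temperature alteration of basalt": 3e10,
    "catalysis": 3e8,
    "radiolysis": 2e8,
    "pyrite formation": 3e6,
}


def source_shares(sources):
    """Return (total production, {source name: fraction of total})."""
    total = sum(sources.values())
    return total, {name: rate / total for name, rate in sources.items()}


total_production, shares = source_shares(H2_SOURCES)

if __name__ == "__main__":
    print(f"total production: {total_production:.1e} moles H2/y")
    # List sources from largest to smallest share.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:40s} {share:6.1%}")
```

The summed total rounds to the 1×10^13 moles H2/y quoted for total production, and the two largest sources together account for roughly four fifths of it.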
The third and final component of this dissertation (Chapter 4) explores the self-organization of barchan sand dune fields. In nature, barchan dunes typically exist as members of larger dune fields that display striking, enigmatic structures that cannot be readily explained by examining the dynamics at the scale of single dunes, or by appealing to patterns in external forcing. To explore the possibility that observed structures emerge spontaneously as a collective result of many dunes interacting with each other, we built a numerical model that treats barchans as discrete entities that interact with one another according to simplified rules derived from theoretical and numerical work, and from field observations: dunes exchange sand through the fluxes that leak from the downwind side of each dune and are captured on their upstream sides; when dunes become sufficiently large, small dunes are born on their downwind sides (“calving”); and when dunes collide directly enough, they merge. Results show that these relatively simple interactions provide potential explanations for a range of field-scale phenomena including isolated patches of dunes and heterogeneous arrangements of similarly sized dunes in denser fields. The results also suggest that (1) dune field characteristics depend on the sand flux fed into the upwind boundary, although (2) moving downwind, the system approaches a common attracting state in which the memory of the upwind conditions vanishes. This work supports the hypothesis that calving exerts a first-order control on field-scale phenomena; it prevents individual dunes from growing without bound, as single-dune analyses suggest they would, and allows the formation of roughly realistic, persistent dune field patterns.
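The calving and merging rules described above can be illustrated with a toy one-dimensional sketch. This is not the dissertation's model: the `Dune` class, the `step` function, every parameter value, and the assumption that migration speed scales inversely with dune volume are hypothetical choices made for illustration, and the sand exchange through leaked fluxes is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Dune:
    x: float       # downwind position (arbitrary units)
    volume: float  # sand volume (arbitrary units)


def step(dunes, calve_above=10.0, calf_fraction=0.2,
         merge_within=1.0, speed_scale=1.0):
    """Advance a toy dune field by one time step.

    All parameters are illustrative assumptions, not values from the source.
    """
    # 1. Migration: assume smaller dunes move downwind faster.
    moved = [Dune(d.x + speed_scale / d.volume, d.volume) for d in dunes]

    # 2. Calving: a dune above the size threshold sheds a small dune,
    #    placed downwind and outside the parent's merge range.
    after_calving = []
    for d in moved:
        if d.volume > calve_above:
            calf = d.volume * calf_fraction
            after_calving.append(Dune(d.x, d.volume - calf))
            after_calving.append(Dune(d.x + 2 * merge_within, calf))
        else:
            after_calving.append(d)

    # 3. Merging: neighbours closer than merge_within combine into one
    #    dune located at their volume-weighted mean position.
    after_calving.sort(key=lambda d: d.x)
    merged = []
    for d in after_calving:
        if merged and d.x - merged[-1].x < merge_within:
            prev = merged.pop()
            total = prev.volume + d.volume
            x_mean = (prev.x * prev.volume + d.x * d.volume) / total
            merged.append(Dune(x_mean, total))
        else:
            merged.append(d)
    return merged


if __name__ == "__main__":
    field = [Dune(0.0, 12.0), Dune(0.5, 1.0)]
    for _ in range(5):
        field = step(field)
    print(field)
```

Because calving and merging only move sand between dunes, the total volume in the field is conserved at every step, which is a convenient sanity check when experimenting with the rules.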
Abstract:
While substance use problems are considered to be common in medical settings, they are not systematically assessed and diagnosed for treatment management. Research data suggest that the majority of individuals with a substance use disorder either do not use treatment or delay treatment-seeking for over a decade. The separation of substance abuse services from mainstream medical care and a lack of preventive services for substance abuse in primary care can contribute to under-detection of substance use problems. When fully enacted in 2014, the Patient Protection and Affordable Care Act 2010 will address these barriers by supporting preventive services for substance abuse (screening, counseling) and integration of substance abuse care with primary care. One key factor that can help to achieve this goal is to incorporate the standardized screeners or common data elements for substance use and related disorders into the electronic health records (EHR) system in the health care setting. Incentives for care providers to adopt an EHR system for meaningful use are part of the Health Information Technology for Economic and Clinical Health Act 2009. This commentary focuses on recent evidence about routine screening and intervention for alcohol/drug use and related disorders in primary care. Federal efforts in developing common data elements for use as screeners for substance use and related disorders are described. A pressing need for empirical data on screening, brief intervention, and referral to treatment (SBIRT) for drug-related disorders to inform SBIRT and related EHR efforts is highlighted.
Abstract:
BACKGROUND: The ability to write clearly and effectively is of central importance to the scientific enterprise. Encouraged by the success of simulation environments in other biomedical sciences, we developed WriteSim TCExam, an open-source, Web-based, textual simulation environment for teaching effective writing techniques to novice researchers. We shortlisted and modified an existing open-source application, TCExam, to serve as a textual simulation environment. After testing usability internally in our team, we conducted formal field usability studies with novice researchers. These were followed by formal surveys with researchers fitting the role of administrators and users (novice researchers). RESULTS: The development process was guided by feedback from usability tests within our research team. Online surveys and formal studies, involving members of the Research on Research group and selected novice researchers, show that the application is user-friendly. Additionally, it has been used to train 25 novice researchers in scientific writing to date and has generated encouraging results. CONCLUSION: WriteSim TCExam is the first Web-based, open-source textual simulation environment designed to complement traditional scientific writing instruction. While initial reviews by students and educators have been positive, a formal study is needed to measure its benefits in comparison to standard instructional methods.
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule.