16 results for Computational integration
at Duke University
Abstract:
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TFs) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often provides only partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference of the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
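As a rough illustration of maximizing a correlation-based objective with a Metropolis-style sampler, the sketch below fits one parameter of a toy occupancy profile to a synthetic coverage track. It is not the dissertation's thermodynamic model: `predicted_profile`, the single Gaussian bump, the temperature, the proposal width and the `margin` that keeps the bump inside the region are all assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def predicted_profile(center, length=200, width=15.0):
    """Hypothetical stand-in for a model prediction: one Gaussian bump."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(length)]

def metropolis_maximize(observed, steps=2000, temperature=0.01, margin=45, seed=0):
    """Metropolis search over the bump center, remembering the best state seen."""
    rng = random.Random(seed)
    n = len(observed)
    lo, hi = margin, n - 1 - margin  # keep the whole bump inside the region
    center = rng.uniform(lo, hi)
    score = pearson(predicted_profile(center, n), observed)
    best_center, best_score = center, score
    for _ in range(steps):
        proposal = center + rng.gauss(0, 10.0)
        if not lo <= proposal <= hi:
            continue  # reject proposals outside the allowed range
        new_score = pearson(predicted_profile(proposal, n), observed)
        # Always accept uphill moves; accept downhill moves with Boltzmann probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            center, score = proposal, new_score
            if score > best_score:
                best_center, best_score = center, score
    return best_center, best_score

observed = predicted_profile(120)  # synthetic "data" with a known center
best_center, best_score = metropolis_maximize(observed)
```

A low temperature makes the chain behave like a stochastic hill climber, which is one common way to use MCMC machinery for maximization rather than for posterior sampling.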
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
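A minimal sketch of a Bayes-factor-style score follows, assuming independent Poisson cut counts at each position and fully specified rates under both hypotheses (in which case the Bayes factor reduces to a likelihood ratio). The template shape in `template_rates`, with a quadratic envelope and a roughly 10 bp oscillation, is a hypothetical stand-in for the digestion pattern described above, not the dissertation's fitted model.

```python
import math

NUCLEOSOME_LENGTH = 147  # base pairs covered by one nucleosome core

def template_rates(scale=10.0, period=10.4):
    """Hypothetical expected cut rates across a nucleosome: low at the dyad,
    rising quadratically toward the edges, modulated by a ~10 bp oscillation."""
    half = (NUCLEOSOME_LENGTH - 1) / 2
    rates = []
    for i in range(NUCLEOSOME_LENGTH):
        quadratic = 0.2 + ((i - half) / half) ** 2
        oscillation = 1.0 + 0.5 * math.cos(2 * math.pi * i / period)
        rates.append(scale * quadratic * oscillation)
    return rates

def log_bayes_factor(counts, nuc_rates, background_rate):
    """log P(counts | nucleosome) - log P(counts | background) for independent
    Poisson counts; the log(k!) terms cancel in the ratio."""
    return sum(
        k * math.log(r / background_rate) - (r - background_rate)
        for k, r in zip(counts, nuc_rates)
    )

rates = template_rates()
background = sum(rates) / len(rates)      # same expected total under both models
patterned = [round(r) for r in rates]     # counts that follow the template
flat = [round(background)] * len(rates)   # counts with no positional pattern
score_patterned = log_bayes_factor(patterned, rates, background)
score_flat = log_bayes_factor(flat, rates, background)
```

Summing over all 147 positions is what lets weak per-base signals add up, which mirrors the idea of jointly modeling data points across the nucleosome body.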
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover the true underlying protein binding configurations. We then apply the SSM to real genomic data (both DNase-seq and MNase-seq data). By incrementally increasing the types of genomic data in the SSM, we show that different data types contribute complementary information for inferring the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
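To make the idea of complementary data types concrete, here is a sketch of the E-step for the simplest discrete state-space model, a two-state hidden Markov chain (0 = unbound, 1 = bound) observed through several tracks. The Gaussian emissions, the conditional independence of tracks given the state, and all parameter values are assumptions for illustration; the dissertation's SSM and its EM updates are not reproduced here.

```python
import math

def gaussian_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def posterior_bound(tracks, means, sds, stay=0.9):
    """Forward-backward posterior P(state = bound | all tracks) at each position.

    tracks: list of equal-length lists, one per data type.
    means[d][s], sds[d][s]: emission parameters for track d in state s.
    """
    n = len(tracks[0])
    states = (0, 1)
    log_trans = [[math.log(stay if r == s else 1 - stay) for s in states] for r in states]
    # Joint emission: per-track log-densities add because tracks are assumed
    # conditionally independent given the hidden state.
    log_emit = [
        [sum(gaussian_logpdf(track[i], means[d][s], sds[d][s])
             for d, track in enumerate(tracks)) for s in states]
        for i in range(n)
    ]
    forward = [[math.log(0.5) + log_emit[0][s] for s in states]]
    for i in range(1, n):
        forward.append([
            log_emit[i][s] + logsumexp([forward[i - 1][r] + log_trans[r][s] for r in states])
            for s in states
        ])
    backward = [[0.0, 0.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        backward[i] = [
            logsumexp([log_trans[s][r] + log_emit[i + 1][r] + backward[i + 1][r] for r in states])
            for s in states
        ]
    log_evidence = logsumexp(forward[-1])
    return [math.exp(forward[i][1] + backward[i][1] - log_evidence) for i in range(n)]

# Synthetic tracks: positions 10-19 are "bound" in both.
track_a = [0.0] * 10 + [2.0] * 10
track_b = [0.0] * 10 + [-2.0] * 10
joint = posterior_bound([track_a, track_b],
                        means=[[0.0, 2.0], [0.0, -2.0]], sds=[[1.0, 1.0], [1.0, 1.0]])
single = posterior_bound([track_a], means=[[0.0, 2.0]], sds=[[1.0, 1.0]])
```

On this toy input the posterior at the boundary position is sharper with both tracks than with one, which is the qualitative behavior the paragraph above attributes to adding data types.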
This dissertation provides a foundation for future research by taking a step toward genome-wide inference of the protein-DNA interaction landscape through data integration.
Abstract:
We construct a theory to compare vertically integrated firms to networks of manufacturers and suppliers. Vertically integrated firms make their own specialized inputs. In networks, manufacturers procure specialized inputs from suppliers that, in turn, sell to several manufacturers. The analysis shows that networks can yield greater social welfare when manufacturers experience large idiosyncratic demand shocks. Individual firms may also have the incentive to form networks, despite the lack of long-term contracts. The analysis is supported by existing evidence and provides predictions as to the shape of different industries.
Abstract:
Axisymmetric radiating and scattering structures whose rotational invariance is broken by non-axisymmetric excitations present an important class of problems in electromagnetics. For such problems, a cylindrical wave decomposition formalism can be used to efficiently obtain numerical solutions to the full-wave frequency-domain problem. Often, the far-field, or Fraunhofer, region is of particular interest in scattering cross-section and radiation pattern calculations; yet it is usually impractical to compute full-wave solutions for this region. Here, we propose a generalization of the Stratton-Chu far-field integral adapted for the 2.5D formalism. The integration over a closed, axially symmetric surface is analytically reduced to a line integral on a meridional plane. We benchmark this computational technique by comparing it with analytical Mie solutions for a plasmonic nanoparticle, and apply it to the design of a three-dimensional polarization-insensitive cloak.
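The reduction from a surface integral to a line integral can be illustrated with a scalar identity. This is only a sketch: the scalar density $f$, the azimuthal order $m$, and the far-field phase factor $e^{-ik\hat{\mathbf r}\cdot\mathbf r'}$ are assumptions for illustration, and the paper's vector Stratton-Chu expression is not reproduced. For an axisymmetric surface $S$ generated by a meridional curve $C$, with $dS' = \rho'\,d\varphi'\,dl'$ and $\hat{\mathbf r}\cdot\mathbf r' = \rho'\sin\theta\cos(\varphi'-\varphi) + z'\cos\theta$, the Jacobi-Anger expansion carries out the azimuthal integral in closed form:

```latex
\int_S f(\rho',z')\, e^{im\varphi'}\, e^{-ik\hat{\mathbf r}\cdot\mathbf r'}\, dS'
  = 2\pi\,(-i)^m\, e^{im\varphi}
    \int_C f(\rho',z')\, J_m\!\left(k\rho'\sin\theta\right)
    e^{-ikz'\cos\theta}\, \rho'\, dl'
```

In a vector treatment, projecting cylindrical field components onto fixed directions typically brings in the neighboring orders $m \pm 1$ as well, so the full formula contains several Bessel terms rather than one.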
Abstract:
We report a comprehensive study of the binary systems of the platinum-group metals with the transition metals, using high-throughput first-principles calculations. These computations predict the stability of new compounds in 28 binary systems where no compounds have been reported experimentally in the literature, and a few dozen as-yet unreported compounds in additional systems. Our calculations also identify stable structures at compound compositions that have been previously reported without detailed structural data and indicate that some experimentally reported compounds may actually be unstable at low temperatures. With these results, we construct enhanced structure maps for the binary alloys of platinum-group metals. These maps are much more complete, systematic, and predictive than those based on empirical results alone.
Abstract:
Tissue-engineered skeletal muscle can serve as a physiological model of natural muscle and a potential therapeutic vehicle for rapid repair of severe muscle loss and injury. Here, we describe a platform for engineering and testing highly functional biomimetic muscle tissues with a resident satellite cell niche and capacity for robust myogenesis and self-regeneration in vitro. Using a mouse dorsal window implantation model and transduction with the fluorescent intracellular calcium indicator GCaMP3, we nondestructively monitored, in real time, vascular integration and the functional state of engineered muscle in vivo. During a 2-wk period, implanted engineered muscle exhibited a steady ingrowth of blood-perfused microvasculature along with an increase in the amplitude of calcium transients and the force of contraction. We also demonstrated superior structural organization, vascularization, and contractile function of fully differentiated vs. undifferentiated engineered muscle implants. The described in vitro and in vivo models of biomimetic engineered muscle represent enabling technology for novel studies of skeletal muscle function and regeneration.
Abstract:
Proteins are essential components of cells and are crucial for catalyzing reactions, signaling, recognition, motility, recycling, and structural stability. This diversity of function suggests that nature is only scratching the surface of protein functional space. Protein function is determined by structure, which in turn is determined predominantly by amino acid sequence. Protein design aims to explore protein sequence and conformational space to design novel proteins with new or improved function. The vast number of possible protein sequences makes exploring the space a challenging problem.
Computational structure-based protein design (CSPD) allows for the rational design of proteins. Because of the large search space, CSPD methods must balance search accuracy and modeling simplifications. We have developed algorithms that allow for the accurate and efficient search of protein conformational space. Specifically, we focus on algorithms that maintain provability, account for protein flexibility, and use ensemble-based rankings. We present several novel algorithms for incorporating improved flexibility into CSPD with continuous rotamers. We applied these algorithms to two biomedically important design problems. We designed peptide inhibitors of the cystic fibrosis agonist CAL that were able to restore function of the vital cystic fibrosis protein CFTR. We also designed improved HIV antibodies and nanobodies to combat HIV infections.
Abstract:
Determining how information flows along anatomical brain pathways is a fundamental requirement for understanding how animals perceive their environments, learn, and behave. Attempts to reveal such neural information flow have been made using linear computational methods, but neural interactions are known to be nonlinear. Here, we demonstrate that a dynamic Bayesian network (DBN) inference algorithm we originally developed to infer nonlinear transcriptional regulatory networks from gene expression data collected with microarrays is also successful at inferring nonlinear neural information flow networks from electrophysiology data collected with microelectrode arrays. The inferred networks we recover from the songbird auditory pathway are correctly restricted to a subset of known anatomical paths, are consistent with timing of the system, and reveal both the importance of reciprocal feedback in auditory processing and greater information flow to higher-order auditory areas when birds hear natural as opposed to synthetic sounds. A linear method applied to the same data incorrectly produces networks with information flow to non-neural tissue and over paths known not to exist. To our knowledge, this study represents the first biologically validated demonstration of an algorithm to successfully infer neural information flow networks.
Abstract:
The Centrality of Event Scale (CES) measures the extent to which a traumatic memory forms a central component of personal identity, a turning point in the life story and a reference point for everyday inferences. In two studies, we show that the CES is positively correlated with severity of PTSD symptoms, even when controlling for measures of anxiety, depression, dissociation and self-consciousness. The findings contradict the widespread view that poor integration of the traumatic memory into one's life story is a main cause of PTSD symptoms. Instead, enhanced integration appears to be a key issue. Copyright © 2006 John Wiley & Sons, Ltd.
Abstract:
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Abstract:
BACKGROUND: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. METHODOLOGY/PRINCIPAL FINDINGS: Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. CONCLUSIONS/SIGNIFICANCE: Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.
Abstract:
BACKGROUND: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. METHODOLOGY/PRINCIPAL FINDINGS: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. 
CONCLUSIONS/SIGNIFICANCE: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant and wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies is selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
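As a sketch of what turning a free-text character into a computable Entity-Quality statement might look like as a data structure, consider the following. The class names, the attachment of a taxon to each statement, the example labels, and the identifiers (written as `...:PLACEHOLDER`) are all hypothetical; they are not taken from the Phenoscape data or software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OntologyTerm:
    term_id: str  # placeholder identifiers below, not real ontology IDs
    label: str

@dataclass(frozen=True)
class EQStatement:
    taxon: OntologyTerm    # term from a taxonomy ontology
    entity: OntologyTerm   # term from an anatomy ontology
    quality: OntologyTerm  # term from a quality ontology

    def render(self) -> str:
        return f"{self.taxon.label}: E={self.entity.label}, Q={self.quality.label}"

# Free-text character state as it might appear in a publication (illustrative).
free_text = "dorsal fin absent"

statement = EQStatement(
    taxon=OntologyTerm("TTO:PLACEHOLDER", "example taxon"),
    entity=OntologyTerm("TAO:PLACEHOLDER", "dorsal fin"),
    quality=OntologyTerm("PATO:PLACEHOLDER", "absent"),
)
rendered = statement.render()
```

Because each slot holds an ontology term rather than a string fragment, statements from different studies can be compared or grouped by identifier instead of by wording.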
Abstract:
In chimpanzees, most females disperse from the community in which they were born to reproduce in a new community, thereby eliminating the risk of inbreeding with close kin. However, across sites, some females breed in their natal community, raising questions about the flexibility of dispersal, the costs and benefits of different strategies and the mitigation of costs associated with dispersal and integration. In this dissertation I address these questions by combining long-term behavioral data and recent field observations on maturing and young adult females in Gombe National Park with an experimental manipulation of relationship formation in captive apes in the Congo.
To assess the risk of inbreeding for females who do and do not disperse, 129 chimpanzees were genotyped and relatedness between each dyad was calculated. Natal females were more closely related to adult community males than were immigrant females. By examining the parentage of 58 surviving offspring, I found that natal females were not more related to the sires of their offspring than were immigrant females, despite three instances of close inbreeding. The sires of all offspring were less related to the mothers than non-sires regardless of the mother’s residence status. These results suggest that chimpanzees are capable of detecting relatedness and that, even when remaining natal, females can largely avoid, though not eliminate, inbreeding.
Next, I examined whether dispersal was associated with energetic, social, physiological and/or reproductive costs by comparing immigrant (n=10) and natal (n=9) females of similar age using 2358 hours of observational data. Natal and immigrant females did not differ in any energetic metric. Immigrant females received aggression from resident females more frequently than natal females. Immigrants spent less time in social grooming and more time self-grooming than natal females. Immigrant females primarily associated with resident males, had more social partners and lacked close social allies. There was no difference in levels of fecal glucocorticoid metabolites in immigrant and natal females. Immigrant females gave birth 2.5 years later than natal females, though the survival of their first offspring did not differ. These results indicate that immigrant females in Gombe National Park do not face energetic deficits upon transfer, but they do enter a hostile social environment and have a delayed first birth.
Next, I examined whether chimpanzees use condition- and phenotype-dependent cues in making dispersal decisions. I examined the effect of social and environmental conditions present at the time females of known age matured (n=25) on the females’ dispersal decisions. Females were more likely to disperse if they had more male maternal relatives and thus a high risk of inbreeding. Females with a high-ranking mother and multiple maternal female kin tended to disperse less frequently, suggesting that a strong female kin network provides benefits to the maturing daughter. Females were also somewhat less likely to disperse when fewer unrelated males were present in the group. Habitat quality and intrasexual competition did not affect dispersal decisions. Using a larger sample of 62 females observed as adults in Gombe, I also detected an effect of phenotypic differences in personality on the females’ dispersal decisions; extraverted, agreeable and open females were less likely to disperse.
Natural observations show that apes use grooming and play as social currency, but no experimental manipulations have been carried out to measure the effects of these behaviors on relationship formation, an essential component of integration. Thirty chimpanzees and 25 bonobos were given a choice between an unfamiliar human who had recently groomed or played with them and one who had not. Both species showed a preference for the human who had interacted with them, though the effect was driven by males. These results support the idea that grooming and play act as social currency in great apes that can rapidly shape social relationships between unfamiliar individuals. Further investigation is needed to elucidate the use of social currency in female apes.
I conclude that dispersal in female chimpanzees is flexible and the balance of costs and benefits varies for each individual. Females likely take into account social cues present at maturity and their own phenotype in choosing a settlement path and are especially sensitive to the presence of maternal male kin. The primary cost associated with philopatry is inbreeding risk and the primary cost associated with dispersal is delay in the age at first birth, presumably resulting from intense social competition. Finally, apes may strategically make use of affiliative behavior in pursuing particular relationships, something that should be useful in the integration process.
Abstract:
Our media is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a Query Response Surface (QRS) based framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate and tackle practical fact-checking tasks, such as reverse-engineering vague claims and countering questionable claims, as computational problems. Within the QRS-based framework, we take one step further and propose a problem, along with efficient algorithms, for finding high-quality claims of a given form from data, i.e., raising good questions in the first place. This is achieved by using a limited number of high-valued claims to represent high-valued regions of the QRS. Besides the general-purpose high-quality claim-finding problem, lead-finding can be tailored towards specific claim quality measures, also defined within the QRS framework. An example of uniqueness-based lead-finding is presented for "one-of-the-few" claims, yielding interpretable high-quality claims and an adjustable mechanism for ranking objects, e.g., NBA players, based on what claims can be made for them. Finally, we study the use of visualization as a powerful way of conveying the results of a large number of claims. An efficient two-stage sampling algorithm is proposed for generating the input of a 2D scatter plot with a heatmap, evaluating a limited amount of data while preserving the two essential visual features, namely outliers and clusters. For all the problems, we present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.
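The perturbation idea can be sketched on a toy claim of the form "the value went up over the last `width` periods ending at `end`". The query, the synthetic series and the `support_fraction` summary are assumptions chosen for illustration and are not the paper's definitions or algorithms.

```python
def window_change(series, end, width):
    """Parameterized query: mean of the `width` values ending at index `end`
    (exclusive) minus the mean of the `width` values just before them."""
    recent = series[end - width:end]
    earlier = series[end - 2 * width:end - width]
    return sum(recent) / width - sum(earlier) / width

def response_surface(series, ends, widths):
    """Evaluate the query at every valid (end, width) parameter setting."""
    return {
        (end, width): window_change(series, end, width)
        for end in ends
        for width in widths
        if end - 2 * width >= 0 and end <= len(series)
    }

def support_fraction(surface, threshold=0.0):
    """Fraction of parameter settings whose result still supports 'went up'."""
    results = list(surface.values())
    return sum(r > threshold for r in results) / len(results)

series = [10, 9, 8, 7, 6, 5, 4, 6, 3, 2]  # synthetic, mostly decreasing
claim = window_change(series, end=8, width=1)  # the one setting that shows a rise
surface = response_surface(series, ends=range(4, 11), widths=range(1, 3))
fraction = support_fraction(surface)
```

Here the original claim reports an increase of 2, yet only 1 of the 14 nearby parameter settings agrees with it, which is the kind of fragility that suggests cherry-picking.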
Abstract:
With increasing recognition of the roles RNA molecules and RNA/protein complexes play in an unexpected variety of biological processes, understanding of RNA structure-function relationships is of high current importance. To make clean biological interpretations from three-dimensional structures, it is imperative to have high-quality, accurate RNA crystal structures available, and the community has thoroughly embraced that goal. However, due to the many degrees of freedom inherent in RNA structure (especially for the backbone), it is a significant challenge to succeed in building accurate experimental models for RNA structures. This chapter describes the tools and techniques our research group and our collaborators have developed over the years to help RNA structural biologists both evaluate and achieve better accuracy. Expert analysis of large, high-resolution, quality-conscious RNA datasets provides the fundamental information that enables automated methods for robust and efficient error diagnosis in validating RNA structures at all resolutions. The even more crucial goal of correcting the diagnosed outliers has steadily developed toward highly effective, computationally based techniques. Automation enables solving complex issues in large RNA structures, but cannot circumvent the need for thoughtful examination of local details, and so we also provide some guidance for interpreting and acting on the results of current structure validation for RNA.
Abstract:
The economic and social consequences of international trade agreements have become a major area of inquiry in development studies in recent years. As evidenced by the energetic protests surrounding the Seattle meeting of the World Trade Organization (WTO) in December 1999 and the controversy about China's admission to the WTO, such agreements have also become a focus of political conflict in both developed and developing countries. At issue are questions of job gains and job losses in different regions, prices paid by consumers, acceptable standards for wages and working conditions in transnational manufacturing industries, and the quality of the environment. All these concerns have arisen with regard to the North American Free Trade Agreement (NAFTA) and can be addressed through an examination of changes in the dynamics of the apparel industry in the post-NAFTA period. In this book, we examine the evolution of the apparel industry in North America in order to address some of these questions as they pertain to North America, with an eye toward the broader implications of our findings. We also consider the countries of the Caribbean Basin and Central America, whose textile and apparel goods are now allowed to enter the U.S. market on the same basis as those from Canada and Mexico (Odessey 2000). © 2009 by Temple University Press. All rights reserved.