5 results for Signature Verification, Forgery Detection, Fuzzy Modeling
at Duke University
Abstract:
There is great potential for host-based gene expression analysis to impact the early diagnosis of infectious diseases. In particular, the influenza pandemic of 2009 highlighted the challenges and limitations of traditional pathogen-based testing for suspected upper respiratory viral infection. We inoculated human volunteers with one of two influenza A strains, A/Brisbane/59/2007 (H1N1) or A/Wisconsin/67/2005 (H3N2), and assayed the peripheral blood transcriptome every 8 hours for 7 days. Of 41 inoculated volunteers, 18 (44%) developed symptomatic infection. Using unbiased sparse latent factor regression analysis, we generated a gene signature (or factor) for symptomatic influenza capable of detecting 94% of infected cases. This gene signature is detectable as early as 29 hours post-exposure and achieves maximal accuracy on average 43 hours (p = 0.003, H1N1) and 38 hours (p = 0.005, H3N2) before peak clinical symptoms. To test the relevance of these findings in naturally acquired disease, a composite influenza A signature built from these challenge studies was applied to Emergency Department patients, where it discriminated between individuals infected with swine-origin influenza A/H1N1 (2009) and non-infected individuals with 92% accuracy. The host genomic response to influenza infection is robust and may provide the means for detection before typical clinical symptoms are apparent.
Abstract:
The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
Abstract:
DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single-base-pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have pointed to a substantial cleavage bias of DNase and its negative impact on the predictive performance of footprinting. To assess the potential for using DNase-seq to identify individual binding sites, we performed DNase-seq on deproteinized genomic DNA and determined sequence cleavage bias. This allowed us to build bias-corrected and TF-specific footprint models. The predictive performance of these models demonstrated that predicted footprints corresponded to high-confidence TF-DNA interactions. DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell-type-specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting the view that footprints can identify changes in TF binding that are not detectable using other strategies.
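As a rough illustration of what "determining sequence cleavage bias" from deproteinized DNA can involve, the sketch below estimates a relative cleavage propensity for each k-mer. It is a minimal sketch under stated assumptions, not the authors' model: the function name `kmer_bias`, the choice to attribute each cut to the k-mer starting at the cut position, and the toy data are all hypothetical.

```python
from collections import Counter

def kmer_bias(sequence, cut_counts, k=6):
    """Estimate the relative cleavage propensity of each k-mer.

    sequence   : DNA string from the deproteinized ("naked") sample
    cut_counts : observed cut counts, one per position; each count is
                 attributed to the k-mer starting there (a simplification)
    Returns {kmer: share of all cuts / share of all k-mer occurrences},
    so 1.0 means no bias and values above 1.0 mean preferential cleavage.
    """
    occurrences = Counter()
    cuts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        occurrences[kmer] += 1
        cuts[kmer] += cut_counts[i]
    total_occ = sum(occurrences.values())
    total_cuts = sum(cuts.values())
    return {
        kmer: (cuts[kmer] / total_cuts) / (occurrences[kmer] / total_occ)
        for kmer in occurrences
    }

# Toy example (made-up data): "AC" is cut more often than its frequency implies.
bias = kmer_bias("ACACGT", [4, 1, 4, 1, 0, 0], k=2)
```

Dividing observed cut counts in a chromatin sample by such a propensity is one simple way to discount sequence-driven cleavage before calling footprints; more careful models would fit the bias jointly with the footprint shape.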
Abstract:
Marine mammals exploit the efficiency of sound propagation in the marine environment for essential activities like communication and navigation. For this reason, passive acoustics has particularly high potential for marine mammal studies, especially those aimed at population management and conservation. Despite the rapid realization of this potential through a growing number of studies, much crucial information remains unknown or poorly understood. This research attempts to address two key knowledge gaps, using the well-studied bottlenose dolphin (Tursiops truncatus) as a model species, and underwater acoustic recordings collected on four fixed autonomous sensors deployed at multiple locations in Sarasota Bay, Florida, between September 2012 and August 2013. Underwater noise can hinder dolphin communication. The ability of these animals to overcome this obstacle was examined using recorded noise and dolphin whistles. I found that bottlenose dolphins are able to compensate for increased noise in their environment using a wide range of strategies, employed singly or in various combinations, depending on the frequency content of the noise, noise source, and time of day. These strategies include modifying whistle frequency characteristics, increasing whistle duration, and increasing whistle redundancy. Recordings were also used to evaluate the performance of six recently developed passive acoustic abundance estimation methods, by comparing their results to the true abundance of animals, obtained via a census conducted within the same area and time period. The methods employed were broadly divided into two categories: those involving direct counts of animals, and those involving counts of cues (signature whistles). The animal-based methods were traditional capture-recapture, spatially explicit capture-recapture (SECR), and an approach that blends the “snapshot” method and mark-recapture distance sampling, referred to here as SMRDS.
The cue-based methods were conventional distance sampling (CDS), an acoustic modeling approach involving the use of the passive sonar equation, and SECR. In the latter approach, detection probability was modelled as a function of sound transmission loss, rather than the Euclidean distance typically used. Of these methods, while SMRDS produced the most accurate estimate, SECR demonstrated the greatest potential for broad applicability to other species and locations, with minimal to no auxiliary data, such as distance from sound source to detector(s), which is often difficult to obtain. This was especially true when this method was compared to traditional capture-recapture results, which greatly underestimated abundance, despite attempts to account for major unmodelled heterogeneity. Furthermore, the incorporation of non-Euclidean distance significantly improved model accuracy. The acoustic modelling approach performed similarly to CDS, but both methods also strongly underestimated abundance. In particular, CDS proved to be inefficient. This approach requires at least 3 sensors for localization at a single point. It was also difficult to obtain accurate distances, and the sample size was greatly reduced by the failure to detect some whistles on all three recorders. As a result, this approach is not recommended for marine mammal abundance estimation when few recorders are available, or in high sound attenuation environments with relatively low sample sizes. It is hoped that these results lead to more informed management decisions, and therefore, more effective species conservation.
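For readers unfamiliar with capture-recapture, the simplest two-occasion form of the idea can be sketched as follows. This is an illustrative sketch only: it uses Chapman's bias-corrected version of the Lincoln-Petersen estimator, which is an assumption and not necessarily the estimator used in this study, and the counts in the example are made up.

```python
def chapman_estimate(n_first, n_second, n_both):
    """Two-occasion capture-recapture abundance estimate (Chapman form).

    n_first  : individuals identified on the first occasion
    n_second : individuals identified on the second occasion
    n_both   : individuals identified on both occasions
    """
    if n_both > min(n_first, n_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1

# Hypothetical counts: 50 and 40 individuals identified, 10 seen both times.
estimate = chapman_estimate(50, 40, 10)
```

The estimator assumes every animal is equally detectable. When some individuals are far more detectable than others, recaptures are over-represented and the estimate is biased low, which is consistent with the underestimation reported above for the traditional method.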
Abstract:
While molecular and cellular processes are often modeled as stochastic processes, such as Brownian motion, chemical reaction networks and gene regulatory networks, there have been few attempts to program a molecular-scale process to physically implement stochastic processes. DNA has been used as a substrate for programming molecular interactions, but its applications are restricted to deterministic functions, and unfavorable properties such as slow processing, thermal annealing, aqueous solvents and difficult readout limit them to proof-of-concept purposes. To date, it remains unknown whether there exists a molecular process that can be programmed to implement stochastic processes for practical applications.
In this dissertation, a fully specified Resonance Energy Transfer (RET) network between chromophores is accurately fabricated via DNA self-assembly, and the exciton dynamics in the RET network physically implement a stochastic process, specifically a continuous-time Markov chain (CTMC), which has a direct mapping to the physical geometry of the chromophore network. Excited by a light source, a RET network generates random samples in the temporal domain in the form of fluorescence photons which can be detected by a photon detector. The intrinsic sampling distribution of a RET network is derived as a phase-type distribution configured by its CTMC model. The conclusion is that the exciton dynamics in a RET network implement a general and important class of stochastic processes that can be directly and accurately programmed and used for practical applications of photonics and optoelectronics. Different approaches to using RET networks exist with vast potential applications. As an entropy source that can directly generate samples from virtually arbitrary distributions, RET networks can benefit applications that rely on generating random samples such as 1) fluorescent taggants and 2) stochastic computing.
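The link between a CTMC and a phase-type distribution can be made concrete with a small simulation: the time until the chain reaches an absorbing state (here, photon emission) is a phase-type random variable. The sketch below is a software analogy under stated assumptions, not the dissertation's model; the three-state network, its rates, and the names `sample_absorption_time` and `RATES` are hypothetical.

```python
import random

def sample_absorption_time(rates, start, rng):
    """Simulate a CTMC until absorption and return the elapsed time.

    rates[i] maps each state j reachable from i to the rate of i -> j.
    A state with no outgoing rates is treated as absorbing.
    """
    state, elapsed = start, 0.0
    while rates.get(state):
        out = rates[state]
        total = sum(out.values())
        elapsed += rng.expovariate(total)   # exponential holding time
        # Pick the next state with probability proportional to its rate.
        threshold = rng.random() * total
        cumulative = 0.0
        for next_state, rate in out.items():
            cumulative += rate
            if threshold < cumulative:
                break
        state = next_state
    return elapsed

# Hypothetical network: an exciton on the donor either emits directly or
# hops to the acceptor, which then emits. Rates are in arbitrary units.
RATES = {
    "donor": {"acceptor": 2.0, "photon": 1.0},
    "acceptor": {"photon": 0.5},
    "photon": {},
}

rng = random.Random(0)
samples = [sample_absorption_time(RATES, "donor", rng) for _ in range(20000)]
```

For this toy network the mean absorption time is 1/3 + (2/3)(1/0.5) = 5/3, which the sample mean should approach. Changing the rates or the topology reshapes the sampled distribution, which is the software counterpart of configuring the chromophore geometry.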
By using RET networks between chromophores to implement fluorescent taggants with temporally coded signatures, the taggant design is not constrained by resolvable dyes and has a significantly larger coding capacity than spectrally coded or lifetime-coded fluorescent taggants. The taggant detection process also becomes highly efficient, and the Maximum Likelihood Estimation (MLE) based taggant identification guarantees high accuracy even with only a few hundred detected photons.
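MLE-based identification of this kind can be sketched as follows: each candidate taggant is described by a photon arrival-time density, and the candidate with the highest total log-likelihood over the detected photons is selected. This is a minimal sketch under stated assumptions, not the dissertation's implementation; the two-stage (hypoexponential) densities, the candidate library `CANDIDATES`, and all rates are hypothetical.

```python
import math
import random

def log_density(t, a, b):
    """Log-density of the sum of two exponential stages with rates a < b.

    f(t) = a*b/(b-a) * (exp(-a*t) - exp(-b*t)), written with expm1
    for numerical stability.
    """
    return (math.log(a * b / (b - a)) - a * t
            + math.log(-math.expm1(-(b - a) * t)))

def identify(arrival_times, candidates):
    """Return the candidate name with the highest total log-likelihood."""
    def total_log_likelihood(name):
        a, b = candidates[name]
        return sum(log_density(t, a, b) for t in arrival_times)
    return max(candidates, key=total_log_likelihood)

# Hypothetical taggant library: name -> (rate of stage 1, rate of stage 2).
CANDIDATES = {"A": (1.0, 3.0), "B": (4.0, 6.0), "C": (0.3, 0.5)}

def simulate_photons(name, n, rng):
    """Draw n photon arrival times from the named candidate's distribution."""
    a, b = CANDIDATES[name]
    return [rng.expovariate(a) + rng.expovariate(b) for _ in range(n)]

rng = random.Random(0)
photons = simulate_photons("B", 300, rng)   # a few hundred detected photons
best = identify(photons, CANDIDATES)
```

Because log-likelihoods add across independent photons, evidence for the correct candidate accumulates linearly with the photon count, which is why a few hundred photons can suffice when the candidate distributions are well separated, as they are in this toy library.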
RET-based sampling units (RSUs) can also be constructed to accelerate probabilistic algorithms for wide applications in machine learning and data analytics. Because probabilistic algorithms often rely on iteratively sampling from parameterized distributions, they can be inefficient in practice on the deterministic hardware that traditional computers use, especially for high-dimensional and complex problems. As an efficient universal sampling unit, the proposed RSU can be integrated into a processor or GPU as a specialized functional unit, or organized as a discrete accelerator, to bring substantial speedups and power savings.