4 results for Distribution transformer modeling
at Duke University
Abstract:
During mitotic cell cycles, DNA is exposed to many endogenous and exogenous damaging agents that can cause double-strand breaks (DSBs). In S. cerevisiae, DSBs are primarily repaired by mitotic recombination, which can lead to loss of heterozygosity (LOH). Genetic recombination can happen in both meiosis and mitosis. While the genome-wide distribution of meiotic recombination events has been intensively studied, mitotic recombination events had not been mapped in an unbiased way throughout the genome until recently. Methods for selecting mitotic crossovers and mapping their positions have recently been developed in our lab. Our current approach uses a diploid yeast strain that is heterozygous for about 55,000 SNPs and employs SNP microarrays to map LOH events throughout the genome. These methods allow us to examine selected crossovers and unselected mitotic recombination events (crossover, noncrossover, and BIR) at about 1 kb resolution across the genome. Using this method, we generated maps of spontaneous and UV-induced LOH events. In this study, we explore machine learning and variable selection techniques to build a predictive model for where the LOH events occur in the genome.
We simulated control tracts at random locations in the yeast genome, matching the LOH tracts in tract length and in location with respect to single-nucleotide-polymorphism positions. We then extracted roughly 1,100 features, such as base composition, histone modifications, and presence of tandem repeats, and trained classifiers to distinguish control tracts from LOH tracts. We identified features with good predictive value. We also found that, with the current repertoire of features, prediction is generally better for spontaneous LOH events than for UV-induced LOH events.
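The control-tract simulation could look roughly like the following. This is a minimal sketch and not the authors' procedure: it matches tract lengths only (the abstract also matches locations relative to SNP positions, which is omitted here), and the function name, the input format, and the chromosome sizes are hypothetical.

```python
import random

def sample_control_tracts(loh_tracts, chrom_lengths, seed=0):
    """For each observed (chrom, start, end) tract, draw one random tract
    of the same length somewhere in the genome.

    Assumption: controls are matched on length only. Chromosomes are
    chosen in proportion to the number of valid start positions.
    """
    rng = random.Random(seed)
    controls = []
    for _chrom, start, end in loh_tracts:
        length = end - start
        # Only chromosomes long enough to hold the tract are eligible.
        candidates = [c for c, n in chrom_lengths.items() if n >= length]
        weights = [chrom_lengths[c] - length + 1 for c in candidates]
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        new_start = rng.randint(0, chrom_lengths[chosen] - length)
        controls.append((chosen, new_start, new_start + length))
    return controls

# Hypothetical inputs for illustration only.
observed = [("chrA", 100, 1100), ("chrB", 5000, 5200)]
sizes = {"chrA": 200_000, "chrB": 800_000}
controls = sample_control_tracts(observed, sizes)
```

Each control tract can then be passed through the same feature extraction as the observed tracts, giving the classifier a negative class with the same length distribution.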
Abstract:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
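The abstract does not say how the Gaussian approximation is constructed. One plausible reading, sketched here purely as an assumption, is moment matching: replace a PG(b, c) draw with a normal draw that has the same mean and variance. The closed-form moments below follow from the Pólya-Gamma Laplace transform; the function names are hypothetical.

```python
import math
import random

def pg_mean_var(b, c):
    """Mean and variance of a Polya-Gamma PG(b, c) random variable."""
    c = abs(c)  # the distribution depends on c only through |c|
    if c < 1e-4:
        # Limits as c -> 0.
        return b / 4.0, b / 24.0
    th = math.tanh(c / 2.0)
    mean = b * th / (2.0 * c)
    # Equivalent to b * (sinh(c) - c) / (4 c^3 cosh^2(c/2)), written with
    # tanh so that large c does not overflow.
    var = b * (2.0 * th - c * (1.0 - th * th)) / (4.0 * c ** 3)
    return mean, var

def approx_pg_draw(b, c, rng=random):
    """Moment-matched Gaussian stand-in for a PG(b, c) draw.

    Assumption: intended for large b, where the normal approximation
    is reasonable and negative draws are vanishingly rare.
    """
    mean, var = pg_mean_var(b, c)
    return rng.gauss(mean, math.sqrt(var))
```

Because PG(b, c) with integer b is a sum of b independent PG(1, c) variables, the central limit theorem makes a normal approximation natural when the counts, and hence b, are large.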
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron-level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age-specific latent natural ability class and a performance-enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Abstract:
Antillean manatees (Trichechus manatus manatus) were heavily hunted in the past throughout the Wider Caribbean Region (WCR), and are currently listed as endangered on the IUCN Red List of Threatened Species. In most WCR countries, including Haiti and the Dominican Republic, remaining manatee populations are believed to be small and declining, but current information is needed on their status, distribution, and local threats to the species.
To assess the past and current distribution and conservation status of the Antillean manatee in Hispaniola, I conducted a systematic review of documentary archives dating from the pre-Columbian era to 2013. I then surveyed more than 670 artisanal fishers from Haiti and the Dominican Republic in 2013-2014 using a standardized questionnaire. Finally, to identify important areas for manatees in the Dominican Republic, I developed a country-wide ensemble model of manatee distribution, and compared modeled hotspots with those identified by fishers.
Manatees were historically abundant in Hispaniola, but were hunted for their meat and became relatively rare by the end of the 19th century. The use of manatee body parts diversified with time to include their oil, skin, and bones. Traditional uses for folk medicine and handcrafts persist today in coastal communities in the Dominican Republic. Most threats to Antillean manatees in Hispaniola are anthropogenic in nature, and most mortality is caused by fisheries. I estimated a minimum island-wide annual mortality of approximately 20 animals. To understand the impact of this level of mortality, and to provide a baseline for measuring the success of future conservation actions, the Dominican Republic and Haiti should work together to obtain a reliable estimate of the current population size of manatees in Hispaniola.
In Haiti, the survey of fishers showed a wider distribution range of the species than suggested by the documentary archive review: fishers reported recent manatee sightings in seven of nine coastal departments, and three manatee hotspot areas were identified on the north, central, and south coasts. Thus, the contracted manatee distribution range suggested by the documentary archive review likely reflects a lack of research in Haiti. Both the review and the interviews agreed that manatees no longer occupy freshwater habitats in the country. In general, more dedicated manatee studies are needed in Haiti, employing aerial, land, or boat surveys.
In the Dominican Republic, the documentary archive review and the survey of fishers showed that manatees still occur throughout the country, and occasionally occupy freshwater habitats. Monte Cristi province on the north coast and Barahona province on the south coast were identified as focal areas. Sighting reports of manatees decreased from Monte Cristi eastwards to the adjacent province in the Dominican Republic, and westwards into Haiti. Along the north coast of Haiti, the number of manatee sighting and capture reports decreased with increasing distance from Monte Cristi province. There was good agreement among the modeled manatee hotspots, hotspots identified by fishers, and hotspots identified during previous dedicated manatee studies. The concordance of these results suggests that the distribution and patterns of habitat use of manatees in the Dominican Republic have not changed dramatically in over 30 years, and that the remaining manatees exhibit some degree of site fidelity. The ensemble modeling approach used in the present study produced accurate and detailed maps of manatee distribution with minimum data requirements. This modeling strategy is replicable and readily transferable to other countries in the Caribbean or elsewhere with limited data on a species of interest.
The intrinsic value of manatees was stronger for artisanal fishers in the Dominican Republic than in Haiti, and most Dominican fishers showed a positive attitude towards manatee conservation. The Dominican Republic is an upper middle income country with a high Human Development Index. It possesses a legal framework that specifically protects manatees, and has a greater number of marine protected areas, more dedicated manatee studies, and more manatee education and awareness campaigns than Haiti. The constant presence of manatees in specific coastal segments of the Dominican Republic, the perceived decline in the number of manatee captures, and a more conservation-minded public offer hope for manatee conservation as non-consumptive uses of manatees become more popular. I recommend a series of conservation actions in the Dominican Republic, including: reducing risks to manatees from harmful fishing gear and watercraft at confirmed manatee hotspots; providing economic alternatives for displaced fishers and developing responsible ecotourism ventures for manatee watching; improving law enforcement to reduce fisheries-related manatee deaths, stop the illegal trade in manatee body parts, and better protect manatee habitat; and continuing education and awareness campaigns for coastal communities near manatee hotspots.
In contrast, most fishers in Haiti continue to value manatees as a source of food and income, and showed a generally negative attitude towards manatee conservation. Haiti is a low income country with a low Human Development Index. Only a single dedicated manatee study has been conducted in Haiti, and manatees are not officially protected. Positive initiatives for manatees in Haiti include: protected areas declared in 2013 and 2014 that enclose two of the manatee hotspots identified in the present study; and local organizations that are currently working on coastal and marine environmental issues, including research and education on marine mammals. Future conservation efforts for manatees in Haiti should focus on addressing poverty and providing viable economic alternatives for coastal communities. I recommend a community partnership approach for manatee conservation, paired with education and awareness campaigns to inform coastal communities about the conservation situation of manatees in Haiti, and to help change their perceived value. Haiti should also provide legal protection for manatees and their habitat.
Abstract:
While molecular and cellular processes are often modeled as stochastic processes, such as Brownian motion, chemical reaction networks, and gene regulatory networks, there have been few attempts to program a molecular-scale process to physically implement stochastic processes. DNA has been used as a substrate for programming molecular interactions, but its applications are restricted to deterministic functions, and unfavorable properties such as slow processing, thermal annealing, aqueous solvents, and difficult readout limit them to proof-of-concept purposes. To date, whether there exists a molecular process that can be programmed to implement stochastic processes for practical applications remains unknown.
In this dissertation, a fully specified Resonance Energy Transfer (RET) network between chromophores is accurately fabricated via DNA self-assembly, and the exciton dynamics in the RET network physically implement a stochastic process, specifically a continuous-time Markov chain (CTMC), which has a direct mapping to the physical geometry of the chromophore network. Excited by a light source, a RET network generates random samples in the temporal domain in the form of fluorescence photons which can be detected by a photon detector. The intrinsic sampling distribution of a RET network is derived as a phase-type distribution configured by its CTMC model. The conclusion is that the exciton dynamics in a RET network implement a general and important class of stochastic processes that can be directly and accurately programmed and used for practical applications of photonics and optoelectronics. Different approaches to using RET networks exist with vast potential applications. As an entropy source that can directly generate samples from virtually arbitrary distributions, RET networks can benefit applications that rely on generating random samples such as 1) fluorescent taggants and 2) stochastic computing.
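The link between a CTMC and a phase-type distribution can be illustrated with a small simulation: the time until the chain reaches an absorbing state is, by definition, phase-type distributed. The sketch below is a generic CTMC simulator, not a model of any real chromophore network; the state names and rate values are hypothetical, with "photon" standing in for fluorescence emission.

```python
import random

def sample_absorption_time(rates, start, absorbing, rng):
    """Simulate a CTMC until absorption and return the elapsed time.

    rates[state] maps each transient state to {next_state: rate}.
    """
    state, t = start, 0.0
    while state not in absorbing:
        out = rates[state]
        total = sum(out.values())
        # Holding time in the current state is exponential in the total exit rate.
        t += rng.expovariate(total)
        # The next state is chosen in proportion to its transition rate.
        state = rng.choices(list(out), weights=list(out.values()), k=1)[0]
    return t

# Hypothetical two-chromophore network: the exciton starts on a donor,
# may hop to an acceptor, and is eventually emitted as a photon.
rates = {
    "donor": {"acceptor": 3.0, "photon": 1.0},
    "acceptor": {"photon": 2.0},
}
rng = random.Random(42)
times = [sample_absorption_time(rates, "donor", {"photon"}, rng)
         for _ in range(10_000)]
```

Changing the rates, which in the dissertation's setting are tied to the physical geometry of the network, reshapes the distribution of the sampled times.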
By using RET networks between chromophores to implement fluorescent taggants with temporally coded signatures, the taggant design is not constrained by resolvable dyes and has a significantly larger coding capacity than spectrally or lifetime coded fluorescent taggants. Meanwhile, the taggant detection process becomes highly efficient, and the Maximum Likelihood Estimation (MLE) based taggant identification guarantees high accuracy even with only a few hundred detected photons.
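MLE-based identification can be sketched as follows: compute the log-likelihood of the detected photon arrival times under each candidate taggant's distribution and pick the largest. As a simplifying assumption, each signature here is a two-stage hypoexponential density, one of the simplest phase-type distributions; the taggant names, rates, and helper functions are hypothetical.

```python
import math
import random

def hypoexp_logpdf(t, a, b):
    """Log-density of the sum of two independent exponentials with rates a != b."""
    lo, hi = min(a, b), max(a, b)
    diff = math.exp(-lo * t) - math.exp(-hi * t)
    # Guard against underflow for extremely small or large t.
    return math.log(lo * hi / (hi - lo)) + math.log(max(diff, 1e-300))

def identify_taggant(photon_times, library):
    """Return the library entry with the highest total log-likelihood."""
    def loglik(name):
        a, b = library[name]
        return sum(hypoexp_logpdf(t, a, b) for t in photon_times)
    return max(library, key=loglik)

# Hypothetical taggant library: name -> (rate of stage 1, rate of stage 2).
library = {"T1": (1.0, 10.0), "T2": (4.0, 5.0), "T3": (0.3, 2.0)}

# Simulate 300 photon arrival times from taggant "T2" and identify it.
rng = random.Random(0)
a, b = library["T2"]
photons = [rng.expovariate(a) + rng.expovariate(b) for _ in range(300)]
best = identify_taggant(photons, library)
```

With well-separated signatures, the log-likelihood gap grows roughly linearly in the number of photons, which is consistent with the abstract's point that a few hundred detections can suffice.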
Meanwhile, RET-based sampling units (RSU) can be constructed to accelerate probabilistic algorithms for wide applications in machine learning and data analytics. Because probabilistic algorithms often rely on iteratively sampling from parameterized distributions, they can be inefficient in practice on the deterministic hardware traditional computers use, especially for high-dimensional and complex problems. As an efficient universal sampling unit, the proposed RSU can be integrated into a processor / GPU as specialized functional units or organized as a discrete accelerator to bring substantial speedups and power savings.