8 resultados para Probabilistic Algorithms

em Duke University


Relevância:

60.00% 60.00%

Publicador:

Resumo:

While molecular and cellular processes are often modeled as stochastic processes, such as Brownian motion, chemical reaction networks and gene regulatory networks, there are few attempts to program a molecular-scale process to physically implement stochastic processes. DNA has been used as a substrate for programming molecular interactions, but its applications are restricted to deterministic functions and unfavorable properties such as slow processing, thermal annealing, aqueous solvents and difficult readout limit them to proof-of-concept purposes. To date, whether there exists a molecular process that can be programmed to implement stochastic processes for practical applications remains unknown.

In this dissertation, a fully specified Resonance Energy Transfer (RET) network between chromophores is accurately fabricated via DNA self-assembly, and the exciton dynamics in the RET network physically implement a stochastic process, specifically a continuous-time Markov chain (CTMC), which has a direct mapping to the physical geometry of the chromophore network. Excited by a light source, a RET network generates random samples in the temporal domain in the form of fluorescence photons which can be detected by a photon detector. The intrinsic sampling distribution of a RET network is derived as a phase-type distribution configured by its CTMC model. The conclusion is that the exciton dynamics in a RET network implement a general and important class of stochastic processes that can be directly and accurately programmed and used for practical applications of photonics and optoelectronics. Different approaches to using RET networks exist with vast potential applications. As an entropy source that can directly generate samples from virtually arbitrary distributions, RET networks can benefit applications that rely on generating random samples such as 1) fluorescent taggants and 2) stochastic computing.

By using RET networks between chromophores to implement fluorescent taggants with temporally coded signatures, the taggant design is not constrained by resolvable dyes and has a significantly larger coding capacity than spectrally or lifetime coded fluorescent taggants. Meanwhile, the taggant detection process becomes highly efficient, and the Maximum Likelihood Estimation (MLE) based taggant identification guarantees high accuracy even with only a few hundred detected photons.

Meanwhile, RET-based sampling units (RSU) can be constructed to accelerate probabilistic algorithms for wide applications in machine learning and data analytics. Because probabilistic algorithms often rely on iteratively sampling from parameterized distributions, they can be inefficient in practice on the deterministic hardware traditional computers use, especially for high-dimensional and complex problems. As an efficient universal sampling unit, the proposed RSU can be integrated into a processor / GPU as specialized functional units or organized as a discrete accelerator to bring substantial speedups and power savings.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

When subjects must choose repeatedly between two or more alternatives, each of which dispenses reward on a probabilistic basis (two-armed bandit ), their behavior is guided by the two possible outcomes, reward and nonreward. The simplest stochastic choice rule is that the probability of choosing an alternative increases following a reward and decreases following a nonreward (reward following ). We show experimentally and theoretically that animal subjects behave as if the absolute magnitudes of the changes in choice probability caused by reward and nonreward do not depend on the response which produced the reward or nonreward (source independence ), and that the effects of reward and nonreward are in constant ratio under fixed conditions (effect-ratio invariance )--properties that fit the definition of satisficing . Our experimental results are either not predicted by, or are inconsistent with, other theories of free-operant choice such as Bush-Mosteller, molar maximization, momentary maximizing, and melioration (matching).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Proteins are essential components of cells and are crucial for catalyzing reactions, signaling, recognition, motility, recycling, and structural stability. This diversity of function suggests that nature is only scratching the surface of protein functional space. Protein function is determined by structure, which in turn is determined predominantly by amino acid sequence. Protein design aims to explore protein sequence and conformational space to design novel proteins with new or improved function. The vast number of possible protein sequences makes exploring the space a challenging problem.

Computational structure-based protein design (CSPD) allows for the rational design of proteins. Because of the large search space, CSPD methods must balance search accuracy and modeling simplifications. We have developed algorithms that allow for the accurate and efficient search of protein conformational space. Specifically, we focus on algorithms that maintain provability, account for protein flexibility, and use ensemble-based rankings. We present several novel algorithms for incorporating improved flexibility into CSPD with continuous rotamers. We applied these algorithms to two biomedically important design problems. We designed peptide inhibitors of the cystic fibrosis agonist CAL that were able to restore function of the vital cystic fibrosis protein CFTR. We also designed improved HIV antibodies and nanobodies to combat HIV infections.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS.We use real-life data from Reischling Press, Inc. (RPI), a digit-print-service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.

We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improved prediction accuracy compared to stand-alone machine-learning algorithms, it also performs a probabilistic estimation of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis,

and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

© 2015, Institute of Mathematical Statistics. All rights reserved.In order to use persistence diagrams as a true statistical tool, it would be very useful to have a good notion of mean and variance for a set of diagrams. In [23], Mileyko and his collaborators made the first study of the properties of the Fréchet mean in (Dp, Wp), the space of persistence diagrams equipped with the p-th Wasserstein metric. In particular, they showed that the Fréchet mean of a finite set of diagrams always exists, but is not necessarily unique. The means of a continuously-varying set of diagrams do not themselves (necessarily) vary continuously, which presents obvious problems when trying to extend the Fréchet mean definition to the realm of time-varying persistence diagrams, better known as vineyards. We fix this problem by altering the original definition of Fréchet mean so that it now becomes a probability measure on the set of persistence diagrams; in a nutshell, the mean of a set of diagrams will be a weighted sum of atomic measures, where each atom is itself a persistence diagram determined using a perturbation of the input diagrams. This definition gives for each N a map (Dp)N→ℙ(Dp). We show that this map is Hölder continuous on finite diagrams and thus can be used to build a useful statistic on vineyards.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Scheduling a set of jobs over a collection of machines to optimize a certain quality-of-service measure is one of the most important research topics in both computer science theory and practice. In this thesis, we design algorithms that optimize {\em flow-time} (or delay) of jobs for scheduling problems that arise in a wide range of applications. We consider the classical model of unrelated machine scheduling and resolve several long standing open problems; we introduce new models that capture the novel algorithmic challenges in scheduling jobs in data centers or large clusters; we study the effect of selfish behavior in distributed and decentralized environments; we design algorithms that strive to balance the energy consumption and performance.

The technically interesting aspect of our work is the surprising connections we establish between approximation and online algorithms, economics, game theory, and queuing theory. It is the interplay of ideas from these different areas that lies at the heart of most of the algorithms presented in this thesis.

The main contributions of the thesis can be placed in one of the following categories.

1. Classical Unrelated Machine Scheduling: We give the first polygorithmic approximation algorithms for minimizing the average flow-time and minimizing the maximum flow-time in the offline setting. In the online and non-clairvoyant setting, we design the first non-clairvoyant algorithm for minimizing the weighted flow-time in the resource augmentation model. Our work introduces iterated rounding technique for the offline flow-time optimization, and gives the first framework to analyze non-clairvoyant algorithms for unrelated machines.

2. Polytope Scheduling Problem: To capture the multidimensional nature of the scheduling problems that arise in practice, we introduce Polytope Scheduling Problem (\psp). The \psp problem generalizes almost all classical scheduling models, and also captures hitherto unstudied scheduling problems such as routing multi-commodity flows, routing multicast (video-on-demand) trees, and multi-dimensional resource allocation. We design several competitive algorithms for the \psp problem and its variants for the objectives of minimizing the flow-time and completion time. Our work establishes many interesting connections between scheduling and market equilibrium concepts, fairness and non-clairvoyant scheduling, and queuing theoretic notion of stability and resource augmentation analysis.

3. Energy Efficient Scheduling: We give the first non-clairvoyant algorithm for minimizing the total flow-time + energy in the online and resource augmentation model for the most general setting of unrelated machines.

4. Selfish Scheduling: We study the effect of selfish behavior in scheduling and routing problems. We define a fairness index for scheduling policies called {\em bounded stretch}, and show that for the objective of minimizing the average (weighted) completion time, policies with small stretch lead to equilibrium outcomes with small price of anarchy. Our work gives the first linear/ convex programming duality based framework to bound the price of anarchy for general equilibrium concepts such as coarse correlated equilibrium.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Determination of copy number variants (CNVs) inferred in genome wide single nucleotide polymorphism arrays has shown increasing utility in genetic variant disease associations. Several CNV detection methods are available, but differences in CNV call thresholds and characteristics exist. We evaluated the relative performance of seven methods: circular binary segmentation, CNVFinder, cnvPartition, gain and loss of DNA, Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumHap 550 data from the Singapore cohort study of the risk factors for Myopia (SCORM) and simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before enacting full analysis. We used 10 SCORM samples for optimizing parameter settings for each method and then evaluated method performance at optimal parameters using 100 SCORM samples. The statistical power, false positive rates, and receiver operating characteristic (ROC) curve residuals were evaluated by simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects one of the fewest numbers of CNVs.