69 resultados para Stochastic algorithms

em Université de Lausanne, Switzerland


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Combinatorial optimization involves finding an optimal solution in a finite set of options; many everyday life problems are of this kind. However, the number of options grows exponentially with the size of the problem, such that an exhaustive search for the best solution is practically infeasible beyond a certain problem size. When efficient algorithms are not available, a practical approach to obtain an approximate solution to the problem at hand, is to start with an educated guess and gradually refine it until we have a good-enough solution. Roughly speaking, this is how local search heuristics work. These stochastic algorithms navigate the problem search space by iteratively turning the current solution into new candidate solutions, guiding the search towards better solutions. The search performance, therefore, depends on structural aspects of the search space, which in turn depend on the move operator being used to modify solutions. A common way to characterize the search space of a problem is through the study of its fitness landscape, a mathematical object comprising the space of all possible solutions, their value with respect to the optimization objective, and a relationship of neighborhood defined by the move operator. The landscape metaphor is used to explain the search dynamics as a sort of potential function. The concept is indeed similar to that of potential energy surfaces in physical chemistry. Borrowing ideas from that field, we propose to extend to combinatorial landscapes the notion of the inherent network formed by energy minima in energy landscapes. In our case, energy minima are the local optima of the combinatorial problem, and we explore several definitions for the network edges. At first, we perform an exhaustive sampling of local optima basins of attraction, and define weighted transitions between basins by accounting for all the possible ways of crossing the basins frontier via one random move. Then, we reduce the computational burden by only counting the chances of escaping a given basin via random kick moves that start at the local optimum. Finally, we approximate network edges from the search trajectory of simple search heuristics, mining the frequency and inter-arrival time with which the heuristic visits local optima. Through these methodologies, we build a weighted directed graph that provides a synthetic view of the whole landscape, and that we can characterize using the tools of complex networks science. We argue that the network characterization can advance our understanding of the structural and dynamical properties of hard combinatorial landscapes. We apply our approach to prototypical problems such as the Quadratic Assignment Problem, the NK model of rugged landscapes, and the Permutation Flow-shop Scheduling Problem. We show that some network metrics can differentiate problem classes, correlate with problem non-linearity, and predict problem hardness as measured from the performances of trajectory-based local search heuristics.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Regulatory gene networks contain generic modules, like those involving feedback loops, which are essential for the regulation of many biological functions (Guido et al. in Nature 439:856-860, 2006). We consider a class of self-regulated genes which are the building blocks of many regulatory gene networks, and study the steady-state distribution of the associated Gillespie algorithm by providing efficient numerical algorithms. We also study a regulatory gene network of interest in gene therapy, using mean-field models with time delays. Convergence of the related time-nonhomogeneous Markov chain is established for a class of linear catalytic networks with feedback loops.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The algorithmic approach to data modelling has developed rapidly these last years, in particular methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise in situations at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge on the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper presents some contemporary approaches to spatial environmental data analysis. The main topics are concentrated on the decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial predictions, probabilistic and risk mapping, development and application of conditional stochastic simulation models. The innovative part of the paper presents integrated/hybrid model-machine learning (ML) residuals sequential simulations-MLRSS. The models are based on multilayer perceptron and support vector regression ML algorithms used for modeling long-range spatial trends and sequential simulations of the residuals. NIL algorithms deliver non-linear solution for the spatial non-stationary problems, which are difficult for geostatistical approach. Geostatistical tools (variography) are used to characterize performance of ML algorithms, by analyzing quality and quantity of the spatially structured information extracted from data with ML algorithms. Sequential simulations provide efficient assessment of uncertainty and spatial variability. Case study from the Chernobyl fallouts illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of ML data driven and geostatistical model based approaches, can be efficiently used in decision-making process. (C) 2003 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The neutral rate of allelic substitution is analyzed for a class-structured population subject to a stationary stochastic demographic process. The substitution rate is shown to be generally equal to the effective mutation rate, and under overlapping generations it can be expressed as the effective mutation rate in newborns when measured in units of average generation time. With uniform mutation rate across classes the substitution rate reduces to the mutation rate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing the good architecture are provided for new and/or unexperienced user.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

There are far-reaching conceptual similarities between bi-static surface georadar and post-stack, "zero-offset" seismic reflection data, which is expressed in largely identical processing flows. One important difference is, however, that standard deconvolution algorithms routinely used to enhance the vertical resolution of seismic data are notoriously problematic or even detrimental to the overall signal quality when applied to surface georadar data. We have explored various options for alleviating this problem and have tested them on a geologically well-constrained surface georadar dataset. Standard stochastic and direct deterministic deconvolution approaches proved to be largely unsatisfactory. While least-squares-type deterministic deconvolution showed some promise, the inherent uncertainties involved in estimating the source wavelet introduced some artificial "ringiness". In contrast, we found spectral balancing approaches to be effective, practical and robust means for enhancing the vertical resolution of surface georadar data, particularly, but not exclusively, in the uppermost part of the georadar section, which is notoriously plagued by the interference of the direct air- and groundwaves. For the data considered in this study, it can be argued that band-limited spectral blueing may provide somewhat better results than standard band-limited spectral whitening, particularly in the uppermost part of the section affected by the interference of the air- and groundwaves. Interestingly, this finding is consistent with the fact that the amplitude spectrum resulting from least-squares-type deterministic deconvolution is characterized by a systematic enhancement of higher frequencies at the expense of lower frequencies and hence is blue rather than white. It is also consistent with increasing evidence that spectral "blueness" is a seemingly universal, albeit enigmatic, property of the distribution of reflection coefficients in the Earth. Our results therefore indicate that spectral balancing techniques in general and spectral blueing in particular represent simple, yet effective means of enhancing the vertical resolution of surface georadar data and, in many cases, could turn out to be a preferable alternative to standard deconvolution approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This book combines geostatistics and global mapping systems to present an up-to-the-minute study of environmental data. Featuring numerous case studies, the reference covers model dependent (geostatistics) and data driven (machine learning algorithms) analysis techniques such as risk mapping, conditional stochastic simulations, descriptions of spatial uncertainty and variability, artificial neural networks (ANN) for spatial data, Bayesian maximum entropy (BME), and more.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: In vitro aggregating brain cell cultures containing all types of brain cells have been shown to be useful for neurotoxicological investigations. The cultures are used for the detection of nervous system-specific effects of compounds by measuring multiple endpoints, including changes in enzyme activities. Concentration-dependent neurotoxicity is determined at several time points. METHODS: A Markov model was set up to describe the dynamics of brain cell populations exposed to potentially neurotoxic compounds. Brain cells were assumed to be either in a healthy or stressed state, with only stressed cells being susceptible to cell death. Cells may have switched between these states or died with concentration-dependent transition rates. Since cell numbers were not directly measurable, intracellular lactate dehydrogenase (LDH) activity was used as a surrogate. Assuming that changes in cell numbers are proportional to changes in intracellular LDH activity, stochastic enzyme activity models were derived. Maximum likelihood and least squares regression techniques were applied for estimation of the transition rates. Likelihood ratio tests were performed to test hypotheses about the transition rates. Simulation studies were used to investigate the performance of the transition rate estimators and to analyze the error rates of the likelihood ratio tests. The stochastic time-concentration activity model was applied to intracellular LDH activity measurements after 7 and 14 days of continuous exposure to propofol. The model describes transitions from healthy to stressed cells and from stressed cells to death. RESULTS: The model predicted that propofol would affect stressed cells more than healthy cells. Increasing propofol concentration from 10 to 100 μM reduced the mean waiting time for transition to the stressed state by 50%, from 14 to 7 days, whereas the mean duration to cellular death reduced more dramatically from 2.7 days to 6.5 hours. CONCLUSION: The proposed stochastic modeling approach can be used to discriminate between different biological hypotheses regarding the effect of a compound on the transition rates. The effects of different compounds on the transition rate estimates can be quantitatively compared. Data can be extrapolated at late measurement time points to investigate whether costs and time-consuming long-term experiments could possibly be eliminated.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper presents an approach for mapping of precipitation data. The main goal is to perform spatial predictions and simulations of precipitation fields using geostatistical methods (ordinary kriging, kriging with external drift) as well as machine learning algorithms (neural networks). More practically, the objective is to reproduce simultaneously both the spatial patterns and the extreme values. This objective is best reached by models integrating geostatistics and machine learning algorithms. To demonstrate how such models work, two case studies have been considered: first, a 2-day accumulation of heavy precipitation and second, a 6-day accumulation of extreme orographic precipitation. The first example is used to compare the performance of two optimization algorithms (conjugate gradients and Levenberg-Marquardt) of a neural network for the reproduction of extreme values. Hybrid models, which combine geostatistical and machine learning algorithms, are also treated in this context. The second dataset is used to analyze the contribution of radar Doppler imagery when used as external drift or as input in the models (kriging with external drift and neural networks). Model assessment is carried out by comparing independent validation errors as well as analyzing data patterns.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (like predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. A wealth of useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot