995 results for Rejection-sampling Algorithm


Relevance:

30.00%

Publisher:

Abstract:

One of the fundamental machine learning tasks is that of predictive classification. Given that organisations collect an ever increasing amount of data, predictive classification methods must be able to effectively and efficiently handle large amounts of data. However, it is understood that present requirements push existing algorithms to, and sometimes beyond, their limits, since many classification prediction algorithms were designed when currently common data set sizes were beyond imagination. This has led to a significant amount of research into ways of making classification learning algorithms more effective and efficient. Although substantial progress has been made, a number of key questions have not been answered. This dissertation investigates two of these key questions. The first is whether different types of algorithms to those currently employed are required when using large data sets. This is answered by analysis of the way in which the bias plus variance decomposition of predictive classification error changes as training set size is increased. Experiments find that larger training sets require different types of algorithms to those currently used. Some insight into the characteristics of suitable algorithms is provided, and this may provide some direction for the development of future classification prediction algorithms which are specifically designed for use with large data sets. The second question investigated is that of the role of sampling in machine learning with large data sets. Sampling has long been used as a means of avoiding the need to scale up algorithms to suit the size of the data set by scaling down the size of the data sets to suit the algorithm. However, the costs of performing sampling have not been widely explored. Two popular sampling methods are compared with learning from all available data in terms of predictive accuracy, model complexity, and execution time. The comparison shows that sub-sampling generally produces models with accuracy close to, and sometimes greater than, that obtainable from learning with all available data. This result suggests that it may be possible to develop algorithms that take advantage of the sub-sampling methodology to reduce the time required to infer a model while sacrificing little, if any, accuracy. Methods of improving effective and efficient learning via sampling are also investigated, and new sampling methodologies are proposed. These methodologies include using a varying proportion of instances to determine the next inference step and using a statistical calculation at each inference step to determine sufficient sample size. Experiments show that using a statistical calculation of sample size can not only substantially reduce execution time but can do so with only a small loss, and occasional gain, in accuracy. One of the common uses of sampling is in the construction of learning curves. Learning curves are often used to attempt to determine the optimal training set size which will maximally reduce execution time while not being detrimental to accuracy. An analysis of the performance of methods for detection of convergence of learning curves is performed, with the focus of the analysis on methods that calculate the gradient of the tangent to the curve. Given that such methods can be susceptible to local accuracy plateaus, an investigation into the frequency of local plateaus is also performed. It is shown that local accuracy plateaus are a common occurrence, and that ensuring a small loss of accuracy often results in greater computational cost than learning from all available data. These results cast doubt over the applicability of gradient-of-tangent methods for detecting convergence, and over the viability of learning curves for reducing execution time in general.
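A minimal sketch of the learning-curve convergence test discussed above: the training-set size is grown on a geometric schedule and learning stops once the gradient of a tangent fitted to the last few curve points falls below a threshold. The classifier, doubling schedule, window and threshold are illustrative assumptions rather than the dissertation's procedure, and, as the abstract warns, such a slope test can be fooled by local accuracy plateaus.

```python
# Sketch: progressive sampling with gradient-of-tangent convergence detection.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

def learning_curve_with_convergence(X, y, slope_threshold=1e-4, start=200):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    sizes, accuracies = [], []
    n = start
    while n <= len(X_tr):
        model = GaussianNB().fit(X_tr[:n], y_tr[:n])
        sizes.append(n)
        accuracies.append(model.score(X_te, y_te))
        # Estimate the gradient of the tangent from the last few curve points.
        if len(sizes) >= 3:
            slope = np.polyfit(sizes[-3:], accuracies[-3:], 1)[0]
            if abs(slope) < slope_threshold:   # curve considered converged
                break                          # (may be fooled by a local plateau)
        n *= 2                                 # geometric sampling schedule
    return sizes, accuracies

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
print(learning_curve_with_convergence(X, y))
```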

Relevance:

30.00%

Publisher:

Abstract:

An important problem in designing RFICs in CMOS technology is the parasitic elements of passive and active devices, which complicate design calculations. This article presents three LNA topologies (cascode, folded cascode, and differential cascode) and then introduces image-rejection filters for low-side and high-side injection. A new method for the design and optimization of these circuits, based on a Pareto-based multiobjective genetic algorithm, is then proposed. A set of optimum device values and dimensions that best match the design specifications is obtained. The optimization method is layout aware, parasitic aware, and simulation based. Circuit simulations are carried out in TSMC 0.18 μm CMOS technology using Hspice.
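A minimal sketch of the Pareto-based selection step at the heart of such a multiobjective sizing flow. The two objective functions below are hypothetical analytic stand-ins used only for illustration; in the article the objectives come from layout-aware, parasitic-aware Hspice simulations of the actual LNA.

```python
# Sketch: Pareto-dominance selection in a multi-objective GA for device sizing.
import random

def objectives(w_l):                     # (width_um, length_um) of a device
    w, l = w_l
    noise_figure = 1.0 + 5.0 / (w / l)   # hypothetical: improves with W/L
    power = 0.1 * w * l                  # hypothetical: grows with device area
    return noise_figure, power           # both to be minimized

def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (Pareto dominance)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    scored = [(ind, objectives(ind)) for ind in population]
    return [ind for ind, f in scored
            if not any(dominates(g, f) for _, g in scored if g is not f)]

def mutate(ind, sigma=0.5):
    return tuple(max(0.2, x + random.gauss(0.0, sigma)) for x in ind)

random.seed(0)
pop = [(random.uniform(1, 50), random.uniform(0.18, 2.0)) for _ in range(40)]
for _ in range(30):                      # evolve: keep the front, refill by mutation
    front = pareto_front(pop)
    pop = front + [mutate(random.choice(front)) for _ in range(len(pop) - len(front))]
print(sorted(objectives(ind) for ind in pareto_front(pop))[:5])
```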

Relevance:

30.00%

Publisher:

Abstract:

Background
Medical and biological data commonly have small sample sizes, missing values, and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion. Samples from the majority class are ranked using multiple objectives according to their merit in compensating for the class imbalance, and then combined with the minority class to form a balanced dataset.

Results
One important finding of this study is that different classifiers and metrics often provide different evaluation results. Nevertheless, the proposed hybrid system demonstrates consistent improvements over several alternative methods with three different metrics. The sampling results also demonstrate good generalization across different types of classification algorithms, indicating the advantage of the information fusion applied in the hybrid system.

Conclusion
The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly on different datasets, the proposed hybrid system has a better generalization property, which alleviates the method-data dependency problem. From the biological perspective, the system provides an indication for further investigation of the highly ranked samples, which may result in the discovery of new conditions or disease subtypes.
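A much-simplified sketch of the balancing step described above: majority-class samples are ranked by a merit score and only the top-ranked ones are combined with the minority class to form a balanced training set. The distance-based score used here is a placeholder; in the paper the ranking comes from a PSO search that fuses several classifiers and evaluation metrics.

```python
# Sketch: rank majority-class samples and keep the top-ranked ones.
import numpy as np

def balance_by_ranking(X, y, minority_label=1):
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    # Placeholder merit score: closeness to the minority class (borderline samples first).
    dists = np.min(
        np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2), axis=1)
    keep = np.argsort(dists)[:len(X_min)]          # as many as minority samples
    X_bal = np.vstack([X_min, X_maj[keep]])
    y_bal = np.concatenate([np.ones(len(X_min), dtype=int),
                            np.zeros(len(keep), dtype=int)])
    return X_bal, y_bal

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 5)), rng.normal(2, 1, (40, 5))])
y = np.array([0] * 500 + [1] * 40)
X_bal, y_bal = balance_by_ranking(X, y)
print(X_bal.shape, np.bincount(y_bal))             # balanced 40/40 subset
```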

Relevance:

30.00%

Publisher:

Abstract:

This paper describes a technique for improving the performance of parallel genetic algorithms on multi-modal numerical optimisation problems. It employs a cluster analysis algorithm to identify regions of the search space in which more than one sub-population is sampling. Overlapping clusters are merged into one sub-population, whilst a simple derating function is applied to samples in all other sub-populations to discourage them from further sampling in that region. This approach leads to a better distribution of the search effort across the sub-populations and helps to prevent premature convergence. On the test problems used, significant performance improvements over the traditional island-model implementation are realised.
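A minimal sketch of the derating idea, assuming a crude centroid-distance test in place of a full cluster analysis: when two sub-populations are found to be sampling the same region, one keeps it and the others have their fitness derated near it. The radius, penalty and Rastrigin test function are illustrative assumptions, not the paper's configuration.

```python
# Sketch: overlap detection between islands and fitness derating near a claimed region.
import numpy as np

def rastrigin(x):                        # multi-modal test function (minimize)
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def derated_fitness(x, claimed_centres, radius=1.0, penalty=50.0):
    """Raw fitness plus a penalty that grows as x approaches a region already
    claimed by another sub-population (a simple derating function)."""
    f = rastrigin(x)
    for c in claimed_centres:
        d = np.linalg.norm(x - c)
        if d < radius:
            f += penalty * (1.0 - d / radius)
    return f

def overlapping(islands, radius=1.0):
    """Return pairs of islands whose sample centroids are within `radius`."""
    centres = [pop.mean(axis=0) for pop in islands]
    return [(i, j) for i in range(len(centres)) for j in range(i + 1, len(centres))
            if np.linalg.norm(centres[i] - centres[j]) < radius]

rng = np.random.default_rng(0)
islands = [rng.normal(loc, 0.3, size=(20, 2)) for loc in ([0, 0], [0.2, 0.1], [3, 3])]
print("overlapping sub-populations:", overlapping(islands))
claimed = [islands[0].mean(axis=0)]      # island 0 keeps the shared region
print("derated fitness at origin:", derated_fitness(np.zeros(2), claimed))
```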

Relevance:

30.00%

Publisher:

Abstract:

Finite Element (FE) model updating has been attracting research attention in structural engineering for over 20 years. Its immense importance to the design, construction and maintenance of civil and mechanical structures has been widely recognised. However, many sources of uncertainty may affect the updating results; these uncertainties may be caused by FE modelling errors, measurement noise, signal processing techniques, and so on. Research efforts on model updating have therefore long focused on tackling uncertainties. Recently, a new type of evolutionary algorithm has been developed to address uncertainty problems, known as Estimation of Distribution Algorithms (EDAs). EDAs are evolutionary algorithms based on estimating and sampling from probabilistic models, and they are able to overcome some of the drawbacks exhibited by traditional genetic algorithms (GAs). In this paper, a numerical model of a simple steel beam is constructed in the commercial software ANSYS. Various damage scenarios are simulated, and EDAs are employed to identify the damage via an FE model updating process. The results show that EDAs perform efficiently and reliably for model updating.
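A minimal sketch of a Gaussian Estimation of Distribution Algorithm of the kind referred to above: at each generation a probabilistic model is estimated from the best individuals and the next population is sampled from it. The quadratic mismatch objective is a stand-in for the ANSYS-based FE model updating error used in the paper; the population size and truncation fraction are assumptions.

```python
# Sketch: continuous Gaussian EDA minimising a stand-in model-updating error.
import numpy as np

def updating_error(theta, theta_true):
    """Stand-in objective: discrepancy between simulated and 'measured' response."""
    return np.sum((theta - theta_true) ** 2)

def gaussian_eda(objective, dim, pop_size=60, elite_frac=0.3, generations=50, seed=0):
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)          # initial search distribution
    for _ in range(generations):
        pop = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([objective(ind) for ind in pop])
        elite = pop[np.argsort(scores)[: int(elite_frac * pop_size)]]
        mean = elite.mean(axis=0)                    # estimate the new model ...
        std = elite.std(axis=0) + 1e-6               # ... and sample from it next round
    return mean

theta_true = np.array([0.8, 0.35, 0.95])             # e.g. element stiffness factors
theta_hat = gaussian_eda(lambda t: updating_error(t, theta_true), dim=3)
print(np.round(theta_hat, 3))
```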

Relevance:

30.00%

Publisher:

Abstract:

Recent research has shown that the performance of metaheuristics can be affected by population initialization. Opposition-based Differential Evolution (ODE), Quasi-Oppositional Differential Evolution (QODE), and Uniform-Quasi-Opposition Differential Evolution (UQODE) are three state-of-the-art methods that improve the performance of the Differential Evolution algorithm through population initialization and different search strategies. In a different approach to achieving similar results, this paper presents a technique to discover promising regions in the continuous search space of an optimization problem. Using machine-learning techniques, the algorithm, named Smart Sampling (SS), finds regions with a high possibility of containing a global optimum. A metaheuristic can then be initialized inside each region to find that optimum. SS and DE were combined (originating the SSDE algorithm) to evaluate our approach, and experiments were conducted on the same set of benchmark functions used by the ODE, QODE and UQODE authors. The results show that the total number of function evaluations required by DE to reach the global optimum can be significantly reduced, and that the success rate improves, if SS is employed first. These results agree with the literature on the importance of an adequate starting population. Moreover, SS is more effective at finding high-quality initial populations than the other three algorithms that employ oppositional learning. Finally, and most importantly, the performance of SS in finding promising regions is independent of the metaheuristic with which it is combined, making SS suitable for improving the performance of a large variety of optimization techniques. (C) 2012 Elsevier Inc. All rights reserved.
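A minimal sketch of the Smart Sampling idea: evaluate a coarse uniform sample of the search space, label the best fraction as promising, train a classifier on those labels, and keep only candidates predicted to be promising as the metaheuristic's initial population. The benchmark function, sample sizes and decision-tree classifier are illustrative assumptions, not the SS/SSDE configuration of the paper.

```python
# Sketch: classifier-guided discovery of promising regions for population initialization.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ackley(x):                                      # common multi-modal benchmark
    a, b, c = 20.0, 0.2, 2 * np.pi
    return (-a * np.exp(-b * np.sqrt(np.mean(x**2, axis=-1)))
            - np.exp(np.mean(np.cos(c * x), axis=-1)) + a + np.e)

def smart_sampling_init(objective, bounds, n_probe=400, n_pop=30, top_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    probe = rng.uniform(lo, hi, size=(n_probe, len(lo)))
    f = objective(probe)
    promising = (f <= np.quantile(f, top_frac)).astype(int)   # label the best fraction
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(probe, promising)
    pop = []
    for _ in range(200):                            # bounded resampling loop
        cand = rng.uniform(lo, hi, size=(n_pop, len(lo)))
        pop.extend(cand[clf.predict(cand) == 1])    # keep predicted-promising candidates
        if len(pop) >= n_pop:
            break
    while len(pop) < n_pop:                         # fallback: top up uniformly
        pop.append(rng.uniform(lo, hi))
    return np.array(pop[:n_pop])                    # feed this to DE (or any metaheuristic)

lo, hi = np.full(5, -30.0), np.full(5, 30.0)
init_pop = smart_sampling_init(ackley, (lo, hi), seed=1)
print(init_pop.shape, float(ackley(init_pop).mean()))
```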

Relevance:

30.00%

Publisher:

Abstract:

An extensive sample (2%) of private vehicles in Italy is equipped with a GPS device that periodically measures their position and dynamical state for insurance purposes. Access to this type of data allows the development of theoretical and practical applications of great interest: the real-time reconstruction of the traffic state in a given region, the development of accurate models of vehicle dynamics, and the study of the cognitive dynamics of drivers. For these applications to be possible, we first need the ability to reconstruct the paths taken by vehicles on the road network from the raw GPS data. In fact, these data are affected by positioning errors and the measurements are often widely spaced (~2 km). For these reasons, the task of path identification is not straightforward. This thesis describes the approach we followed to reliably identify vehicle paths from this kind of low-sampling-rate data. The problem of matching data with roads is solved with a Bayesian maximum-likelihood approach, while the identification of the path taken between two consecutive GPS measurements is performed with a purpose-built optimal routing algorithm based on the A* algorithm. The procedure was applied to an off-line urban data sample and proved to be robust and accurate. Future developments will extend the procedure to real-time execution and nation-wide coverage.
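A minimal sketch of the routing step described above: an A* search for the shortest road path between two consecutive (already map-matched) GPS fixes, using a straight-line heuristic. The toy graph and coordinates are illustrative; the thesis applies this to the real Italian road network and couples it with the Bayesian matching step.

```python
# Sketch: A* routing between two consecutive GPS fixes on a small road graph.
import heapq, math

def a_star(graph, coords, start, goal):
    """graph: node -> list of (neighbour, edge_length); coords: node -> (x, y)."""
    def h(n):                                       # admissible straight-line heuristic
        (x1, y1), (x2, y2) = coords[n], coords[goal]
        return math.hypot(x1 - x2, y1 - y2)
    open_set = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path, g
        for nbr, w in graph[node]:
            g_new = g + w
            if g_new < best_g.get(nbr, math.inf):
                best_g[nbr] = g_new
                heapq.heappush(open_set, (g_new + h(nbr), g_new, nbr, path + [nbr]))
    return None, math.inf

coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}
graph = {"A": [("B", 1.0)], "B": [("C", 1.0), ("D", 1.6)],
         "C": [("D", 1.0)], "D": []}
print(a_star(graph, coords, "A", "D"))              # path between two GPS fixes
```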

Relevance:

30.00%

Publisher:

Abstract:

In this study a new, fully non-linear approach to Local Earthquake Tomography is presented. Local Earthquake Tomography (LET) is a non-linear inversion problem that allows the joint determination of earthquake parameters and velocity structure from the arrival times of waves generated by local sources. Since the early developments of seismic tomography, several inversion methods have been developed to solve this problem in a linearized way. In the framework of Monte Carlo sampling, we developed a new code based on the Reversible Jump Markov Chain Monte Carlo sampling method (Rj-McMc). It is a trans-dimensional approach in which the number of unknowns, and thus the model parameterization, is treated as one of the unknowns. I show that our new code overcomes major limitations of linearized tomography, opening a new perspective in seismic imaging. Synthetic tests demonstrate that our algorithm is able to produce a robust and reliable tomography without the need for subjective a-priori assumptions about starting models and parameterization. Moreover, it provides a more accurate estimate of the uncertainties of the model parameters. It is therefore very suitable for investigating the velocity structure in regions that lack accurate a-priori information. Synthetic tests also reveal that the absence of regularization constraints allows more information to be extracted from the observed data, and that the velocity structure can be detected even in regions where the ray density is low and standard linearized codes fail. I also present high-resolution Vp and Vp/Vs models for two widely investigated regions: the Parkfield segment of the San Andreas Fault (California, USA) and the area around the Alto Tiberina fault (Umbria-Marche, Italy). In both cases, the models obtained with our code show a substantial improvement in the data fit compared with the models obtained from the same data set with linearized inversion codes.
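A minimal sketch of the fixed-dimension Metropolis sampler that sits at the core of such Monte Carlo inversion codes; the reversible-jump moves that let the parameterization itself vary are not shown. A single slowness value is inferred from noisy travel times along known ray lengths, a toy stand-in for the full 3-D velocity inversion.

```python
# Sketch: Metropolis sampling of a posterior over one slowness parameter.
import numpy as np

rng = np.random.default_rng(0)
ray_lengths = rng.uniform(5.0, 50.0, size=30)         # km
true_slowness = 1.0 / 5.5                              # s/km (Vp = 5.5 km/s)
sigma = 0.1                                            # s, picking noise
t_obs = ray_lengths * true_slowness + rng.normal(0, sigma, size=30)

def log_likelihood(s):
    resid = t_obs - ray_lengths * s
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(n_iter=20000, step=0.005):
    s = 0.3                                            # arbitrary starting slowness
    samples = []
    for _ in range(n_iter):
        s_prop = s + rng.normal(0, step)
        if s_prop > 0 and np.log(rng.uniform()) < log_likelihood(s_prop) - log_likelihood(s):
            s = s_prop                                 # accept the proposed model
        samples.append(s)
    return np.array(samples[n_iter // 2:])             # discard burn-in

post = metropolis()
print("posterior Vp: %.2f +/- %.2f km/s" % (1 / post.mean(), (1 / post).std()))
```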

Relevance:

30.00%

Publisher:

Abstract:

The aim of my thesis is to parallelize the Weighted Histogram Analysis Method (WHAM), a popular algorithm used to calculate the free energy of a molecular system in Molecular Dynamics simulations. WHAM works in post-processing in cooperation with another technique called Umbrella Sampling. Umbrella Sampling adds a bias to the potential energy of the system in order to force the system to sample a specific region of configurational space. N independent simulations are performed in order to sample the whole region of interest, and the WHAM algorithm is then used to estimate the original system energy starting from the N atomic trajectories. The parallelization of WHAM has been performed with CUDA, a language that allows one to work on the GPUs of NVIDIA graphics cards, which have a parallel architecture. The parallel implementation may considerably speed up WHAM execution compared to previous serial CPU implementations; however, the WHAM CPU code presents some timing criticalities for very high numbers of interactions. The algorithm has been written in C++ and executed on UNIX systems equipped with NVIDIA graphics cards. The results were satisfactory, showing a performance increase when the code was executed on graphics cards with higher compute capability. Nonetheless, the GPU used to test the algorithm is quite old and not designed for scientific computing. It is likely that a further performance increase would be obtained if the algorithm were executed on GPU clusters with high computational efficiency. The thesis is organized as follows: I first describe the mathematical formulation of the Umbrella Sampling and WHAM algorithms, with their applications to the study of ionic channels and to Molecular Docking (Chapter 1); I then present the CUDA architecture used to implement the model (Chapter 2); finally, the results obtained on model systems are presented (Chapter 3).
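A minimal serial sketch of the WHAM self-consistent iteration for a one-dimensional reaction coordinate with harmonic umbrella biases. It is a plain reference implementation of the equations only (the thesis parallelizes this iteration with CUDA); the window centres, force constant and histogram edges are illustrative assumptions.

```python
# Sketch: serial WHAM iteration over umbrella-sampling histograms.
import numpy as np

def wham(histograms, n_samples, bias_energy, beta=1.0, n_iter=2000, tol=1e-8):
    """histograms: (K, M) counts per window/bin; bias_energy: (K, M) U_k(x_m);
    n_samples: (K,) samples per window. Returns unbiased P(x) and free energy."""
    K, M = histograms.shape
    f = np.zeros(K)                                   # window free-energy shifts
    numer = histograms.sum(axis=0)                    # total counts per bin
    for _ in range(n_iter):
        # P(x_m) = sum_k n_k(x_m) / sum_k N_k exp(-beta (U_k(x_m) - f_k))
        denom = (n_samples[:, None] * np.exp(-beta * (bias_energy - f[:, None]))).sum(axis=0)
        p = numer / denom
        p /= p.sum()
        f_new = -np.log((p[None, :] * np.exp(-beta * bias_energy)).sum(axis=1)) / beta
        if np.max(np.abs(f_new - f)) < tol:
            break
        f = f_new
    return p, -np.log(p) / beta                       # probability and free energy

# Tiny synthetic example: 8 umbrella windows along x in [0, 1].
rng = np.random.default_rng(0)
centres = np.linspace(0.1, 0.9, 8)
edges = np.linspace(0.0, 1.0, 51)
mid = 0.5 * (edges[:-1] + edges[1:])
k_spring = 200.0
samples = [np.clip(rng.normal(c, 1 / np.sqrt(k_spring), 2000), 0, 1) for c in centres]
hists = np.array([np.histogram(s, bins=edges)[0] for s in samples])
bias = 0.5 * k_spring * (mid[None, :] - centres[:, None]) ** 2
p, free_energy = wham(hists, np.array([len(s) for s in samples]), bias)
print(np.round(free_energy - free_energy.min(), 2))   # roughly flat for this toy case
```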

Relevance:

30.00%

Publisher:

Abstract:

A new control scheme is presented in this thesis. Based on the NonLinear Geometric Approach (NLGA), the proposed Active Control System represents a new way of looking at reconfigurable controllers for aerospace applications. The presence of a Diagnosis module (providing the estimation of generic signals which, depending on the case, can be faults, disturbances or system parameters), the main feature of the proposed Active Control System, is a characteristic shared by three well-known control schemes: Active Fault Tolerant Control, Indirect Adaptive Control and Active Disturbance Rejection Control. The standard NLGA has been thoroughly investigated and then improved to extend its applicability to more complex models. The standard NLGA procedure has been modified to take into account feasible and estimable sets of unknown signals. Furthermore, the application of the Singular Perturbations approximation has led to the solution of Detection and Isolation problems in scenarios too complex for the standard NLGA. The estimation process has also been improved, where multiple redundant measurements are available, by the introduction of a new algorithm, here called "Least Squares - Sliding Mode". It guarantees optimality, in the least-squares sense, and a finite estimation time, in the sliding-mode sense. The Active Control System concept has been formalized in two controllers: a nonlinear backstepping controller and a nonlinear composite controller. Particularly interesting is the integration, in the controller design, of the estimates coming from the Diagnosis module. Stability proofs are provided for both control schemes. Finally, several aerospace applications are presented to show the applicability and effectiveness of the proposed NLGA-based Active Control System.

Relevance:

30.00%

Publisher:

Abstract:

The Advanced Very High Resolution Radiometer (AVHRR) carried on board the National Oceanic and Atmospheric Administration (NOAA) and the Meteorological Operational Satellite (MetOp) polar orbiting satellites is the only instrument offering more than 25 years of satellite data to analyse aerosols on a daily basis. The present study assessed a modified AVHRR aerosol optical depth τa retrieval over land for Europe. The algorithm could also be applied to other parts of the world with surface characteristics similar to Europe's; only the aerosol properties would have to be adapted to the new region. The initial approach used a relationship between Sun photometer measurements from the Aerosol Robotic Network (AERONET) and the satellite data to post-process the retrieved τa. Herein a quasi-stand-alone procedure, which is more suitable for the pre-AERONET era, is presented. In addition, the estimation of surface reflectance, the aerosol model, and other processing steps have been adapted. The method's cross-platform applicability was tested by validating τa from NOAA-17 and NOAA-18 AVHRR at 15 AERONET sites in Central Europe (40.5° N–50° N, 0° E–17° E) from August 2005 to December 2007. Furthermore, the accuracy of the AVHRR retrieval was related to products from two newer instruments, the Medium Resolution Imaging Spectrometer (MERIS) on board the Environmental Satellite (ENVISAT) and the Moderate Resolution Imaging Spectroradiometer (MODIS) on board Aqua/Terra. Considering the linear correlation coefficient R, the AVHRR results were similar to those of MERIS, with an even lower root mean square error (RMSE). Not surprisingly, MODIS, with its high spectral coverage, gave the highest R and lowest RMSE. Regarding monthly averaged τa, the results were ambiguous. Focusing on small-scale structures, R was reduced for all sensors, whereas the RMSE increased substantially only for MERIS. Regarding larger areas like Central Europe, the error statistics were similar to the individual match-ups. This was mainly explained by sampling issues. With the successful validation of AVHRR we are now able to concentrate on our large data archive dating back to 1985. This is a unique opportunity for both climate and air pollution studies over land surfaces.

Relevance:

30.00%

Publisher:

Abstract:

In this dissertation, the problem of creating effective control algorithms for the large-scale Adaptive Optics (AO) systems of the new generation of giant optical telescopes is addressed. The effectiveness of AO control algorithms is evaluated in several respects, such as computational complexity, compensation error rejection and robustness, i.e. reasonable insensitivity to system imperfections. The results of this research are summarized as follows: 1. Robustness study of the Sparse Minimum Variance Pseudo Open Loop Controller (POLC) for multi-conjugate adaptive optics (MCAO). An AO system model that accounts for various system errors has been developed and applied to check the stability and performance of the POLC algorithm, which is one of the most promising approaches for future AO systems control. It has been shown through numerous simulations that, despite the initial assumption that exact system knowledge is necessary for the POLC algorithm to work, it is highly robust against various system errors. 2. Predictive Kalman Filter (KF) and Minimum Variance (MV) control algorithms for MCAO. The limiting performance of the non-dynamic Minimum Variance and dynamic KF-based phase estimation algorithms for MCAO has been evaluated through Monte Carlo simulations. The validity of a simple near-Markov autoregressive phase dynamics model has been tested, and its ability to predict the turbulence phase has been demonstrated for both single- and multi-conjugate AO. It has also been shown that, in the case of MCAO, there is no performance improvement gained from the use of the more complicated KF approach in comparison to the much simpler MV algorithm. 3. Sparse predictive Minimum Variance control algorithm for MCAO. A temporal prediction stage has been added to the non-dynamic MV control algorithm in such a way that no additional computational burden is introduced. It has been confirmed through simulations that the use of phase prediction makes it possible to significantly reduce the system sampling rate, and thus the overall computational complexity, while both keeping the system stable and effectively compensating for the measurement and control latencies.
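A toy sketch of the two kinds of estimator compared in item 2 above, reduced to a single phase mode: the turbulent phase is modelled as a near-Markov AR(1) process, and a scalar Kalman filter (which exploits the dynamics) is compared with a static minimum-variance estimate from the current measurement alone. The AR coefficient, noise levels and single-mode setting are illustrative assumptions, far from a full MCAO model; in this toy case the dynamic filter wins, whereas the dissertation's finding is that for full MCAO the simpler MV algorithm is already sufficient.

```python
# Sketch: scalar Kalman filter vs static minimum-variance estimate for an AR(1) phase.
import numpy as np

rng = np.random.default_rng(0)
a, q, r, n = 0.99, 0.02, 0.5, 5000        # AR(1) coeff, process/measurement noise variances
phase = np.zeros(n)
for t in range(1, n):                      # near-Markov AR(1) turbulence phase
    phase[t] = a * phase[t - 1] + rng.normal(0, np.sqrt(q))
meas = phase + rng.normal(0, np.sqrt(r), n)

# Static minimum-variance estimate from the current measurement only.
var_phase = q / (1 - a**2)
mv_est = meas * var_phase / (var_phase + r)

# Scalar Kalman filter exploiting the AR(1) dynamics.
kf_est = np.zeros(n)
x, p = 0.0, var_phase
for t in range(n):
    x, p = a * x, a**2 * p + q            # predict
    k = p / (p + r)                        # Kalman gain
    x, p = x + k * (meas[t] - x), (1 - k) * p
    kf_est[t] = x

print("MV rms error: %.3f   KF rms error: %.3f"
      % (np.sqrt(np.mean((mv_est - phase) ** 2)),
         np.sqrt(np.mean((kf_est - phase) ** 2))))
```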

Relevance:

30.00%

Publisher:

Abstract:

The problem of re-sampling spatially distributed data organized into regular or irregular grids to finer or coarser resolution is a common task in data processing. This procedure is known as 'gridding' or 're-binning'. Depending on the quantity the data represents, the gridding-algorithm has to meet different requirements. For example, histogrammed physical quantities such as mass or energy have to be re-binned in order to conserve the overall integral. Moreover, if the quantity is positive definite, negative sampling values should be avoided. The gridding process requires a re-distribution of the original data set to a user-requested grid according to a distribution function. The distribution function can be determined on the basis of the given data by interpolation methods. In general, accurate interpolation with respect to multiple boundary conditions of heavily fluctuating data requires polynomial interpolation functions of second or even higher order. However, this may result in unrealistic deviations (overshoots or undershoots) of the interpolation function from the data. Accordingly, the re-sampled data may overestimate or underestimate the given data by a significant amount. The gridding-algorithm presented in this work was developed in order to overcome these problems. Instead of a straightforward interpolation of the given data using high-order polynomials, a parametrized Hermitian interpolation curve was used to approximate the integrated data set. A single parameter is determined by which the user can control the behavior of the interpolation function, i.e. the amount of overshoot and undershoot. Furthermore, it is shown how the algorithm can be extended to multidimensional grids. The algorithm was compared to commonly used gridding-algorithms using linear and cubic interpolation functions. It is shown that such interpolation functions may overestimate or underestimate the source data by about 10-20%, while the new algorithm can be tuned to significantly reduce these interpolation errors. The accuracy of the new algorithm was tested on a series of x-ray CT-images (head and neck, lung, pelvis). The new algorithm significantly improves the accuracy of the sampled images in terms of the mean square error and a quality index introduced by Wang and Bovik (2002 IEEE Signal Process. Lett. 9 81-4).
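A minimal sketch of the core idea: rather than interpolating the samples directly, the cumulative integral of the data is interpolated with a monotonicity-preserving Hermite curve and then differenced on the new grid, which conserves the overall integral and avoids negative values for positive data. SciPy's PCHIP interpolant is used here as a stand-in for the single-parameter Hermitian curve of the paper.

```python
# Sketch: integral-conserving re-binning via Hermite interpolation of the cumulative data.
import numpy as np
from scipy.interpolate import PchipInterpolator

def conservative_regrid(edges_old, values, edges_new):
    """Re-bin histogrammed data `values` (one value per old bin) onto new bin
    edges, conserving the total integral."""
    widths = np.diff(edges_old)
    cumulative = np.concatenate([[0.0], np.cumsum(values * widths)])  # integral up to each edge
    curve = PchipInterpolator(edges_old, cumulative)                  # monotone Hermite curve
    new_cum = curve(np.clip(edges_new, edges_old[0], edges_old[-1]))
    return np.diff(new_cum) / np.diff(edges_new)                      # new per-bin densities

edges_old = np.linspace(0.0, 10.0, 11)                 # 10 coarse bins
values = np.array([0, 1, 5, 9, 4, 2, 2, 1, 0, 0], float)
edges_new = np.linspace(0.0, 10.0, 41)                 # 40 finer bins
resampled = conservative_regrid(edges_old, values, edges_new)
# Both totals are equal: the integral is conserved and no value is negative.
print(np.sum(values * np.diff(edges_old)), np.sum(resampled * np.diff(edges_new)))
```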

Relevance:

30.00%

Publisher:

Abstract:

Purpose: Development of an interpolation algorithm for re-sampling spatially distributed CT data with the following features: global and local integral conservation, avoidance of negative interpolation values for positive definite datasets, and the ability to control re-sampling artifacts. Method and Materials: The interpolation can be separated into two steps: first, the discrete CT data have to be continuously distributed by an analytic function respecting the boundary conditions. Generally, this function is determined by piecewise interpolation. Instead of using linear or high-order polynomial interpolations, which do not fulfil all of the above-mentioned features, a special form of Hermitian curve interpolation is used to solve the interpolation problem with respect to the required boundary conditions. A single parameter is determined by which the behavior of the interpolation function is controlled. Second, the interpolated data have to be re-distributed with respect to the requested grid. Results: The new algorithm was compared with commonly used interpolation functions based on linear and second-order polynomials. It is demonstrated that these interpolation functions may over- or underestimate the source data by about 10%–20%, while the parameter of the new algorithm can be adjusted to significantly reduce these interpolation errors. Finally, the performance and accuracy of the algorithm were tested by re-gridding a series of X-ray CT images. Conclusion: Inaccurate sampling values may occur due to the lack of integral conservation. Re-sampling algorithms using high-order polynomial interpolation functions may produce significant artifacts in the re-sampled data. Such artifacts can be avoided by using the new algorithm based on Hermitian curve interpolation.

Relevance:

30.00%

Publisher:

Abstract:

In this article we propose an exact and efficient simulation algorithm for the generalized von Mises circular distribution of order two. It is an acceptance-rejection algorithm with a piecewise linear envelope based on the local extrema and the inflexion points of the generalized von Mises density of order two. We show that these points can be obtained from the roots of polynomials of degrees four and eight, which can easily be found by the methods of Ferrari and Weierstrass. A comparative study with the von Neumann acceptance-rejection algorithm, with the ratio-of-uniforms method and with a Markov chain Monte Carlo algorithm shows that this new method is generally the most efficient.
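A minimal sketch of rejection sampling from the generalized von Mises density of order two, f(θ) proportional to exp(κ1 cos(θ - μ1) + κ2 cos 2(θ - μ2)). For simplicity it uses a flat envelope bounded by exp(κ1 + κ2) instead of the piecewise linear envelope built from the extrema and inflexion points in the article, so it is exact but less efficient.

```python
# Sketch: acceptance-rejection sampling from the order-two generalized von Mises density
# with a simple flat envelope (cos(.) <= 1 gives the bound exp(kappa1 + kappa2)).
import numpy as np

def gvm2_rejection(n, mu1, mu2, kappa1, kappa2, seed=0):
    rng = np.random.default_rng(seed)
    log_bound = kappa1 + kappa2                     # log of the envelope constant
    samples = []
    while len(samples) < n:
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n)   # uniform proposals on the circle
        log_f = kappa1 * np.cos(theta - mu1) + kappa2 * np.cos(2.0 * (theta - mu2))
        accept = np.log(rng.uniform(size=n)) < log_f - log_bound
        samples.extend(theta[accept])
    return np.array(samples[:n])

draws = gvm2_rejection(10000, mu1=0.0, mu2=np.pi / 4, kappa1=1.5, kappa2=0.8)
print(draws.min(), draws.max(), len(draws))
```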