869 results for Probabilistic Algorithms


Relevance: 20.00%

Abstract:

We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples' labels but must pay for each attribute value that is revealed. This learning model is appropriate in many areas, including medical applications. We present new algorithms, based on algorithms for the multi-armed bandit problem, for choosing which attributes of which examples to purchase in the budgeted learning model. All of our approaches outperformed the current state of the art. Furthermore, we present a new means of selecting the example whose attribute value is purchased once the attribute has been selected, instead of choosing an example uniformly at random, as is typically done. Our new example selection method improved the performance of all the algorithms we tested, both ours and those in the literature.
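
A minimal sketch of the bandit-style purchase loop the abstract describes, assuming an epsilon-greedy policy and a toy reward signal; the policy, the reward definition, and all names are illustrative, not the authors' algorithms:

```python
import random

def epsilon_greedy_purchases(n_attributes, budget, reward_fn, eps=0.1):
    """Spend an attribute-purchase budget, treating attributes as bandit arms.

    reward_fn(a) is a stand-in for how much purchasing attribute a improved
    the learner (e.g., a validation-accuracy gain); it is an assumption, not
    something specified in the paper.
    """
    counts = [0] * n_attributes
    values = [0.0] * n_attributes          # running mean reward per attribute
    for _ in range(budget):
        if random.random() < eps:
            a = random.randrange(n_attributes)                      # explore
        else:
            a = max(range(n_attributes), key=lambda i: values[i])   # exploit
        r = reward_fn(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]       # incremental mean
    return counts, values

# Toy run: attribute 2 is secretly the most informative arm.
counts, _ = epsilon_greedy_purchases(
    n_attributes=5, budget=200,
    reward_fn=lambda a: random.gauss(1.0 if a == 2 else 0.2, 0.5))
print(counts)
```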

Relevance: 20.00%

Abstract:

Network design arises as a solution to several engineering and science problems. Several network design problems are known to be NP-hard, and population-based metaheuristics such as evolutionary algorithms (EAs) have been widely investigated for them. Such optimization methods simultaneously generate a large number of potential solutions to explore the search space in breadth and, consequently, to avoid local optima. Obtaining a potential solution usually involves the construction and maintenance of several spanning trees or, more generally, spanning forests. To explore the search space efficiently, special data structures have been developed to provide operations that manipulate a set of spanning trees (the population). For a tree with n nodes, the most efficient data structures available in the literature require O(n) time to generate and store a new spanning tree that modifies an existing one. We propose a new data structure, called the node-depth-degree representation (NDDR), and we demonstrate that with this encoding, generating a new spanning forest requires average time O(√n). Experiments with an EA based on the NDDR applied to large-scale instances of the degree-constrained minimum spanning tree problem show that the implementation adds only small constants and lower-order terms to the theoretical bound.
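
The NDDR extends the node-depth encoding, in which a tree is stored as a preorder array of (node, depth) pairs and a mutation transfers a contiguous subtree block to a new parent. A sketch of that underlying operation, with the degree component and the O(√n) machinery of the paper omitted:

```python
def transfer_subtree(nd, i, j):
    """Move the subtree rooted at position i so its new parent is at position j.

    nd is a node-depth array: a preorder list of (node, depth) pairs.  The
    position j must refer to a node outside the moved subtree.  This is a
    sketch of the plain node-depth encoding, not the paper's NDDR itself.
    """
    root_depth = nd[i][1]
    # The subtree is the contiguous block starting at i whose later entries
    # all have depth greater than the subtree root's depth.
    k = i + 1
    while k < len(nd) and nd[k][1] > root_depth:
        k += 1
    subtree = nd[i:k]
    rest = nd[:i] + nd[k:]
    jj = j if j < i else j - (k - i)        # j's index after removal
    delta = rest[jj][1] + 1 - root_depth    # depth shift under the new parent
    moved = [(v, d + delta) for v, d in subtree]
    return rest[:jj + 1] + moved + rest[jj + 1:]

# Toy tree: 0 is the root; 1 and 3 are its children; 2 is a child of 1.
nd = [(0, 0), (1, 1), (2, 2), (3, 1)]
print(transfer_subtree(nd, 1, 3))   # subtree {1, 2} re-attached under node 3
```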

Relevance: 20.00%

Abstract:

Several variants of the widely used Fuzzy C-Means (FCM) algorithm support clustering data distributed across different sites. These methods have been studied under different names, such as collaborative and parallel fuzzy clustering. In this study, we augment two FCM-based algorithms for clustering distributed data by deriving constructive ways of determining their essential parameters (including the number of clusters) and by forming a set of systematically structured guidelines, such as how to select the specific algorithm depending on the nature of the data environment and the assumptions made about the number of clusters. A thorough complexity analysis covering space, time, and communication aspects is reported. A series of detailed numeric experiments illustrates the main ideas discussed in the study.
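
For reference, the single-site FCM core that the collaborative and parallel variants build on, alternating fuzzy-membership and prototype updates; a minimal sketch, not the distributed algorithms studied in the paper:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal Fuzzy C-Means: alternate membership and prototype updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                # memberships: rows sum to 1
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]     # fuzzy-weighted prototypes
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)     # standard FCM update
    return U, V

# Toy usage on two well-separated blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
U, V = fcm(X, c=2)
print(V.round(2))
```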

Relevance: 20.00%

Abstract:

This paper presents a survey of evolutionary algorithms designed for decision-tree induction. Most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present methods that use evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview fully focused on evolutionary algorithms and decision trees that does not concentrate on any single evolutionary approach. Second, it provides a taxonomy that covers both works that evolve decision trees and works that design decision-tree components with evolutionary algorithms. Finally, it provides a number of references describing applications of evolutionary algorithms for decision-tree induction in different domains. At the end of the paper, we address some important issues and open questions that can be the subject of future research.
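
A toy illustration of the evolutionary-induction idea surveyed here: candidate trees are evaluated on training accuracy, the fittest survive, and new random trees refill the population. The approaches surveyed use crossover and subtree mutation; this sketch shows only the evaluate-select-vary cycle, and all names are illustrative:

```python
import random

# A tree is either a leaf (a class label) or a tuple
# (feature, threshold, left_subtree, right_subtree).
def random_tree(n_features, depth):
    if depth == 0 or random.random() < 0.3:
        return random.randint(0, 1)              # leaf predicting class 0 or 1
    f, t = random.randrange(n_features), random.random()
    return (f, t, random_tree(n_features, depth - 1),
                  random_tree(n_features, depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fitness(tree, X, y):
    return sum(predict(tree, xi) == yi for xi, yi in zip(X, y)) / len(y)

def evolve(X, y, n_features, pop=50, gens=30):
    population = [random_tree(n_features, 4) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda t: -fitness(t, X, y))    # select the fittest
        population = (population[:pop // 2] +
                      [random_tree(n_features, 4) for _ in range(pop // 2)])
    return max(population, key=lambda t: fitness(t, X, y))

# Toy problem: class is 1 when the first feature exceeds 0.5.
X = [[random.random(), random.random()] for _ in range(200)]
y = [int(xi[0] > 0.5) for xi in X]
print(fitness(evolve(X, y, n_features=2), X, y))
```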

Relevance: 20.00%

Abstract:

Background: This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA, associated with Mycobacterium tuberculosis. This problem arises in rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to predict the free energy of binding correctly, but also to provide a comprehensible model that can be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design-related applications, especially since decision trees are simple to understand, interpret, and validate. Several general-purpose decision-tree induction algorithms are available, but each has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision-tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide a biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree induction algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate with a flexible receptor.
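
A highly simplified illustration of assembling a tree-induction algorithm from components: the sketch enumerates a tiny design space and keeps the design with the best validation accuracy, with scikit-learn's DecisionTreeClassifier standing in for a configurable inducer. The paper instead evolves designs over a much richer component space; the components below are assumptions:

```python
import itertools
from sklearn.tree import DecisionTreeClassifier

# Illustrative design space: a candidate algorithm = one choice per component.
SPLIT_CRITERIA = ['gini', 'entropy']
MIN_LEAF = [1, 5, 10]
MAX_DEPTH = [3, 5, None]

def evaluate(design, X_tr, y_tr, X_val, y_val):
    """Fitness of a candidate algorithm = validation accuracy of the tree
    it induces on this particular data set."""
    crit, leaf, depth = design
    clf = DecisionTreeClassifier(criterion=crit, min_samples_leaf=leaf,
                                 max_depth=depth).fit(X_tr, y_tr)
    return clf.score(X_val, y_val)

def design_algorithm(X_tr, y_tr, X_val, y_val):
    designs = itertools.product(SPLIT_CRITERIA, MIN_LEAF, MAX_DEPTH)
    return max(designs, key=lambda d: evaluate(d, X_tr, y_tr, X_val, y_val))
```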

Relevance: 20.00%

Abstract:

Diffuse large B-cell lymphoma can be subclassified into at least two molecular subgroups by gene expression profiling: germinal center B-cell-like and activated B-cell-like diffuse large B-cell lymphoma. Several immunohistological algorithms have been proposed as surrogates for gene expression profiling at the level of protein expression, but their reliability has been controversial. Furthermore, in all reported algorithms, the proportion of germinal center B-cell cases misclassified by immunohistochemistry is higher than for germinal center B-cell cases defined by gene expression profiling. We analyzed 424 cases of nodal diffuse large B-cell lymphoma with the panel of markers included in the three previously described algorithms: Hans, Choi, and Tally. To test whether the sensitivity of detecting germinal center B-cell cases could be improved, the germinal center B-cell marker HGAL/GCET2 was added to all three algorithms. Our results show that the inclusion of HGAL/GCET2 significantly increased the detection of germinal center B-cell cases in all three algorithms (P<0.001). The proportions of germinal center B-cell cases in the original algorithms were 27%, 34%, and 19% for Hans, Choi, and Tally, respectively; with the inclusion of HGAL/GCET2, these frequencies increased to 38%, 48%, and 35%, respectively. Therefore, HGAL/GCET2 protein expression may function as a marker for germinal center B-cell type diffuse large B-cell lymphoma, and consideration should be given to including HGAL/GCET2 analysis in algorithms to better predict the cell of origin. These findings warrant further validation against gene expression profiles and clinical/therapeutic data. Modern Pathology (2012) 25, 1439-1445; doi:10.1038/modpathol.2012.119; published online 29 June 2012
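
The immunohistological algorithms in question are short decision rules over marker positivity. As an illustration, here is the published Hans rule with one plausible placement of the added HGAL/GCET2 test; the abstract does not specify where the modified algorithms insert the new marker, so the placement below is an assumption:

```python
def hans_with_hgal(cd10, bcl6, mum1, hgal):
    """Hans decision rule plus an HGAL/GCET2 check.

    Inputs are marker-positivity booleans.  The position of the HGAL step
    is an illustrative assumption, not the paper's published modification.
    """
    if cd10:
        return 'GCB'
    if hgal:                     # assumed: HGAL positivity rescues a GCB call
        return 'GCB'
    if bcl6 and not mum1:
        return 'GCB'
    return 'non-GCB'

print(hans_with_hgal(cd10=False, bcl6=False, mum1=True, hgal=True))  # GCB
```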

Relevance: 20.00%

Abstract:

Patterns of species interactions affect the dynamics of food webs. An important component of species interactions that is rarely considered with respect to food webs is the strengths of interactions, which may affect both structure and dynamics. In natural systems, these strengths are variable, and can be quantified as probability distributions. We examined how variation in strengths of interactions can be described hierarchically, and how this variation impacts the structure of species interactions in predator-prey networks, both of which are important components of ecological food webs. The stable isotope ratios of predator and prey species may be particularly useful for quantifying this variability, and we show how these data can be used to build probabilistic predator-prey networks. Moreover, the distribution of variation in strengths among interactions can be estimated from a limited number of observations. This distribution informs network structure, especially the key role of dietary specialization, which may be useful for predicting structural properties in systems that are difficult to observe. Finally, using three mammalian predator-prey networks (two African and one Canadian) quantified from stable isotope data, we show that exclusion of link-strength variability results in biased estimates of nestedness and modularity within food webs, whereas the inclusion of body size constraints only marginally increases the predictive accuracy of the isotope-based network. We find that modularity is the consequence of strong link-strengths in both African systems, while nestedness is not significantly present in any of the three predator-prey networks.
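
A sketch of the probabilistic-network idea: each predator-prey link strength is a distribution (as might come from an isotope mixing model) rather than a point value, and structural statistics are computed across sampled realizations. All numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-predator x 4-prey network.  Each diet proportion is a
# distribution (here lognormal around an assumed mean), as might be
# estimated from a stable isotope mixing model, not a fixed number.
mu = np.array([[0.6, 0.3, 0.1, 0.0],
               [0.1, 0.1, 0.4, 0.4],
               [0.3, 0.3, 0.2, 0.2]])
sigma = 0.5                          # assumed log-scale link-strength spread

def sample_network():
    w = np.where(mu > 0.0, rng.lognormal(np.log(mu + 1e-12), sigma), 0.0)
    return w / w.sum(axis=1, keepdims=True)  # each predator's diet sums to 1

# Propagate link-strength variability into a structural statistic:
# dietary specialization = the largest diet share of each predator.
spec = np.array([sample_network().max(axis=1) for _ in range(1000)])
print(spec.mean(axis=0).round(2), spec.std(axis=0).round(2))
```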

Relevance: 20.00%

Abstract:

Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables, and a convenient distribution is assigned to them; a very common choice has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed modelling the latent trait distribution with a skew-normal distribution under the centred parameterization (SNCP), as studied in [R.B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382]. This approach allows one to represent any asymmetric behaviour of the latent trait distribution. They also developed a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP and showed that it recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm, based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, unlike the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R.J. Patz and B.W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered by Azevedo et al.) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of these priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, the number of items, and the asymmetry level on parameter recovery. Results of the simulation study indicate that our approach performs as well as that of Azevedo et al. in terms of parameter recovery, particularly when the Jeffreys prior is used. They also indicate that the asymmetry level has the highest impact on parameter recovery, even though that impact is relatively small. A real data analysis is presented, together with the development of model-fit assessment tools. The results are compared with those obtained by Azevedo et al. and indicate that the hierarchical approach makes MCMC algorithms easier to implement, facilitates convergence diagnostics, and can be very useful for fitting more complex skew IRT models.
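
The hierarchical structure the authors exploit is Henze's (1986) stochastic representation of the skew-normal: with δ = λ/√(1+λ²) and independent standard normals Z₀ and Z₁, the variable δ|Z₀| + √(1−δ²)·Z₁ is skew-normal with shape λ. A minimal sketch in the direct parameterization (the paper works with the centred parameterization and embeds this representation inside a Gibbs sampler):

```python
import numpy as np

def rskewnormal(n, lam, seed=0):
    """Draw skew-normal variates via Henze's stochastic representation."""
    rng = np.random.default_rng(seed)
    delta = lam / np.sqrt(1.0 + lam ** 2)
    z0, z1 = rng.standard_normal(n), rng.standard_normal(n)
    return delta * np.abs(z0) + np.sqrt(1.0 - delta ** 2) * z1

x = rskewnormal(100_000, lam=5.0)
print(round(x.mean(), 3))      # theoretical mean: delta*sqrt(2/pi) ≈ 0.782
```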

Relevance: 20.00%

Abstract:

This work applied genetic algorithms (GA) and particle swarm optimization (PSO) to cash balance management using the Miller-Orr model, a stochastic model that does not define a single ideal cash balance but rather an oscillation range between a lower bound, an ideal balance, and an upper bound. This paper proposes the application of GA and PSO to minimize the total cost of cash maintenance by obtaining the lower-bound parameter of the Miller-Orr model, using assumptions presented in the literature. Computational experiments were applied in the development and validation of the models. The results indicated that both GA and PSO are applicable to determining the cash level from the lower limit, with the PSO model, which had not previously been applied to this type of problem, achieving the best results.
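
A minimal sketch of the PSO half of the study under stated assumptions: the Miller-Orr spread and return point follow the textbook formulas, while the simulated total-cost objective (a transfer cost per rebalancing, a daily opportunity cost, and a shortage penalty) and every parameter value are hypothetical stand-ins for the assumptions used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
F, i, sigma = 50.0, 0.0003, 400.0    # assumed transfer cost, daily rate, flow sd
P_SHORT = 200.0                      # assumed penalty when cash dips below zero
flows = rng.normal(0.0, sigma, 365)  # one simulated year of net cash flows

def total_cost(L):
    """Hypothetical annual cost of running Miller-Orr with lower bound L."""
    spread = 3.0 * (3.0 * F * sigma ** 2 / (4.0 * i)) ** (1.0 / 3.0)
    H, Z = L + spread, L + spread / 3.0   # upper bound and return point
    cash, cost = Z, 0.0
    for f in flows:
        cash += f
        if cash < 0.0:
            cost += P_SHORT          # cash shortage before rebalancing
        if cash > H or cash < L:
            cost += F                # rebalance back to the return point Z
            cash = Z
        cost += i * cash             # opportunity cost of the balance held
    return cost

# Minimal particle swarm over the scalar lower bound L.
n, w, c1, c2 = 20, 0.7, 1.5, 1.5
x = rng.uniform(0.0, 5000.0, n)
v = np.zeros(n)
pbest, pval = x.copy(), np.array([total_cost(L) for L in x])
for _ in range(60):
    g = pbest[pval.argmin()]         # global best position
    v = w * v + c1 * rng.random(n) * (pbest - x) + c2 * rng.random(n) * (g - x)
    x = np.clip(x + v, 0.0, None)
    val = np.array([total_cost(L) for L in x])
    better = val < pval
    pbest[better], pval[better] = x[better], val[better]
print(round(pbest[pval.argmin()], 1))
```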

Relevance: 20.00%

Abstract:

Solving structural reliability problems by the first-order method requires optimization algorithms to find the smallest distance between a limit state function and the origin of standard Gaussian space. The Hasofer-Lind-Rackwitz-Fiessler (HLRF) algorithm, developed specifically for this purpose, has been shown to be efficient but not robust, as it fails to converge for a significant number of problems. On the other hand, recent developments in general (augmented Lagrangian) optimization techniques have not been tested in application to structural reliability problems. In the present article, three new optimization algorithms for structural reliability analysis are presented. One algorithm is based on the HLRF but uses a new differentiable merit function with Wolfe conditions to select the step length in the line search. It is shown that, under certain assumptions, the proposed algorithm generates a sequence that converges to the local minimizer of the problem. Two new augmented Lagrangian methods are also presented, which use quadratic penalties to solve nonlinear problems with equality constraints. The performance and robustness of the new algorithms are compared to the classic augmented Lagrangian method, to HLRF, and to the improved HLRF (iHLRF) algorithm, in the solution of 25 benchmark problems from the literature. The new HLRF-based algorithm is shown to be more robust than HLRF or iHLRF, and as efficient as the iHLRF algorithm. The two augmented Lagrangian methods proposed herein are shown to be more robust and more efficient than the classical augmented Lagrangian method.
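
For reference, the basic HLRF recursion that the new algorithm augments with a merit function and a Wolfe line search: given the limit state g and its gradient in standard normal space, u_{k+1} = [(∇g(u_k)·u_k − g(u_k)) / ‖∇g(u_k)‖²] ∇g(u_k). A minimal sketch on a toy limit state, with the paper's merit-function and line-search machinery omitted:

```python
import numpy as np

def hlrf(g, grad, u0, tol=1e-8, itmax=100):
    """Basic Hasofer-Lind-Rackwitz-Fiessler iteration (no line search)."""
    u = np.asarray(u0, float)
    for _ in range(itmax):
        gr = grad(u)
        u_new = (gr @ u - g(u)) / (gr @ gr) * gr
        if np.linalg.norm(u_new - u) < tol:
            return u_new
        u = u_new
    return u

# Toy limit state g(u) = 3 - u1 - u2: the design point is the closest
# point of the line to the origin, so beta = 3/sqrt(2).
g = lambda u: 3.0 - u[0] - u[1]
grad = lambda u: np.array([-1.0, -1.0])
u_star = hlrf(g, grad, [0.0, 0.0])
print(u_star, np.linalg.norm(u_star))   # beta ≈ 2.121
```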

Relevance: 20.00%

Abstract:

Fraud is a global problem that has demanded increasing attention with the rapid expansion of modern technology and communication. When statistical techniques are used to detect fraud, whether a detection model is accurate enough to classify a case correctly as fraudulent or legitimate is a critical factor. In this context, the concept of bootstrap aggregating (bagging) arises. The basic idea is to generate multiple classifiers by fitting models to several replicated datasets, and then to combine their predictions into a single classification in order to improve accuracy. In this paper we present a pioneering study of the performance of discrete and continuous k-dependence probabilistic networks within the context of bagging predictors. Via a large simulation study and various real datasets, we find that probabilistic networks are a strong modeling option, with high predictive capacity and a substantial gain from the bagging procedure when compared to traditional techniques.
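
The bagging procedure itself is compact; a sketch with a decision tree standing in for the k-dependence probabilistic networks studied in the paper (the base learner, the binary labels, and the majority-vote rule are illustrative; NumPy arrays are assumed):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Bootstrap aggregating: fit one classifier per bootstrap replicate
    of the training data and combine predictions by majority vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((len(X_test), n_models), dtype=int)
    for m in range(n_models):
        idx = rng.integers(0, len(X_train), len(X_train))  # sample w/ replacement
        clf = DecisionTreeClassifier(random_state=m).fit(X_train[idx], y_train[idx])
        votes[:, m] = clf.predict(X_test)
    # Majority vote across the ensemble (binary 0/1 labels assumed).
    return (votes.mean(axis=1) >= 0.5).astype(int)
```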

Relevance: 20.00%

Abstract:

Structural durability is an important criterion that must be evaluated for every type of structure. For reinforced concrete members, the chloride diffusion process is widely used to evaluate durability, especially for structures built in aggressive atmospheres. Chloride ingress triggers the corrosion of the reinforcement; by modelling this phenomenon, the corrosion process, and hence structural durability, can be better evaluated. Corrosion begins when a threshold chloride concentration is reached at the steel bars of the reinforcement. Despite the robustness of several models proposed in the literature, deterministic approaches fail to predict the corrosion initiation time accurately because of the inherent randomness of the process; structural durability can be represented more realistically using probabilistic approaches. This paper addresses the probabilistic analysis of the corrosion initiation time in reinforced concrete structures exposed to chloride penetration. Chloride penetration is modelled using Fick's diffusion law, which simulates the diffusion process with time-dependent effects. The probability of failure is calculated using Monte Carlo simulation and the first-order reliability method, with a direct coupling approach. Some examples are considered in order to study these phenomena, and a simplified method is proposed to determine optimal values for the concrete cover.
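
A minimal sketch of the probabilistic corrosion-initiation analysis: the constant-coefficient solution of Fick's second law gives the chloride profile, and Monte Carlo sampling of the random inputs estimates the probability that the threshold is reached at the bar depth. All distributions and parameter values below are illustrative, not the paper's:

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
N = 100_000

# Illustrative random variables: surface chloride Cs (% cement wt.),
# diffusion coefficient D (cm^2/year), cover depth x (cm), threshold Ccr.
Cs  = rng.lognormal(np.log(0.30), 0.2, N)
D   = rng.lognormal(np.log(0.50), 0.3, N)
x   = rng.normal(4.0, 0.4, N)
Ccr = rng.lognormal(np.log(0.05), 0.2, N)

t = 10.0   # years of exposure

# Fick's second law, constant D: C(x, t) = Cs * (1 - erf(x / (2*sqrt(D*t))))
C = Cs * (1.0 - erf(x / (2.0 * np.sqrt(D * t))))

# Failure event: chloride at the reinforcement exceeds the threshold.
print("P(corrosion initiated) ~", np.mean(C >= Ccr))
```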

Relevance: 20.00%

Abstract:

Semi-qualitative probabilistic networks (SQPNs) merge two important graphical model formalisms: Bayesian networks and qualitative probabilistic networks. They provide a very general modeling framework by allowing the combination of numeric and qualitative assessments over a discrete domain, and they can be compactly encoded by exploiting the same factorization of joint probability distributions that underlies Bayesian networks. This paper explores the computational complexity of inference in semi-qualitative probabilistic networks, taking polytree-shaped networks as its main target. We show that the inference problem is coNP-complete for binary polytrees with multiple observed nodes. We also show that inference can be performed in time linear in the number of nodes if there is a single observed node. Because our proof is constructive, we obtain an efficient linear-time algorithm for SQPNs under such assumptions. To the best of our knowledge, this is the first exact polynomial-time algorithm for SQPNs. Together, these results provide a clear picture of the inferential complexity of polytree-shaped SQPNs.

Relevance: 20.00%

Abstract:

The spectral reflectance of the sea surface recorded using ocean colour satellite sensors has been used to estimate chlorophyll-a concentrations for decades. However, in bio-optically complex coastal waters, these estimates are compromised by the presence of several other coloured components besides chlorophyll, especially in regions affected by low-salinity waters. The present work aims to (a) describe the influence of the freshwater plume from the La Plata River on the variability of in situ remote sensing reflectance and (b) evaluate the performance of operational ocean colour chlorophyll algorithms applied to Southwestern Atlantic waters, which receive a remarkable seasonal contribution from La Plata River discharges. Data from three oceanographic cruises are used, in addition to a historical regional bio-optical dataset. Deviations found between measured and estimated concentrations of chlorophyll-a are examined in relation to surface water salinity and turbidity gradients to investigate the source of errors in satellite estimates of pigment concentrations. We observed significant seasonal variability in surface reflectance properties that are strongly driven by La Plata River plume dynamics and arise from the presence of high levels of inorganic suspended solids and coloured dissolved materials. As expected, existing operational algorithms overestimate the concentration of chlorophyll-a, especially in waters of low salinity (S<33.5) and high turbidity (Rrs(670)>0.0012 sr−1). Additionally, an updated version of the regional algorithm is presented, which clearly improves the chlorophyll estimation in those types of coastal environment. In general, the techniques presented here allow us to directly distinguish the bio-optical types of waters to be considered in algorithm studies by the ocean colour community.
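
Operational algorithms of the family being evaluated (e.g., NASA's OCx series) compute chlorophyll as a fourth-order polynomial of the maximum band ratio, and regional tuning of the kind described here amounts to refitting the coefficients. A sketch; the default coefficients are the commonly cited SeaWiFS OC4v4 set, shown for illustration only:

```python
import numpy as np

def oc4_style_chl(rrs443, rrs490, rrs510, rrs555,
                  a=(0.366, -3.067, 1.930, 0.649, -1.532)):
    """Maximum-band-ratio chlorophyll algorithm (OC4-style).

    chl = 10**(a0 + a1*X + a2*X**2 + a3*X**3 + a4*X**4), where
    X = log10(max(Rrs443, Rrs490, Rrs510) / Rrs555).
    A regional algorithm such as the one updated in the paper refits the
    coefficients a to local bio-optical data.
    """
    X = np.log10(np.maximum.reduce([rrs443, rrs490, rrs510]) / rrs555)
    return 10.0 ** sum(ai * X ** i for i, ai in enumerate(a))

print(round(oc4_style_chl(0.004, 0.005, 0.005, 0.003), 3))  # chl in mg m^-3
```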

Relevance: 20.00%

Abstract:

Parallel kinematic structures are considered very suitable architectures for positioning and orienting the tools of robotic mechanisms. However, developing dynamic models for this kind of system is sometimes a difficult task. In fact, directly applying traditional methods of robotics to model and analyse such systems usually does not lead to efficient and systematic algorithms. This work addresses this issue: it presents a modular approach to generating the dynamic model and shows how, through some convenient modifications, these methods can be made more applicable to parallel structures as well. Kane's formulation for obtaining the dynamic equations is shown to be one of the easiest ways to deal with redundant coordinates and kinematic constraints, so that a suitable choice of a set of coordinates allows the remainder of the modelling procedure to be computer-aided. The advantages of this approach are discussed in the modelling of a 3-dof asymmetric parallel mechanism.
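
The point that a suitable choice of coordinates makes the rest of the procedure computer-aided can be seen with sympy.physics.mechanics, which automates Kane's formulation. A minimal single-pendulum sketch, a stand-in for the 3-dof parallel mechanism (whose constraint handling is more involved); API details assume a recent SymPy:

```python
from sympy import symbols, simplify
from sympy.physics.mechanics import (ReferenceFrame, Point, Particle,
                                     KanesMethod, dynamicsymbols)

q, u = dynamicsymbols('q u')              # generalized coordinate and speed
m, l, g = symbols('m l g')

N = ReferenceFrame('N')                   # inertial frame
A = N.orientnew('A', 'Axis', (q, N.z))    # frame rotating with the rod

O = Point('O'); O.set_vel(N, 0)           # fixed pivot
P = O.locatenew('P', l * A.x)             # pendulum bob
P.v2pt_theory(O, N, A)

bob = Particle('bob', P, m)
kane = KanesMethod(N, q_ind=[q], u_ind=[u], kd_eqs=[u - q.diff()])
fr, frstar = kane.kanes_equations([bob], [(P, -m * g * N.y)])
print(simplify(fr + frstar))   # Kane's equations of motion: Fr + Fr* = 0
```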