856 results for inference algorithms


Relevance:

20.00%

Publisher:

Abstract:

Multicommodity flow (MF) problems have a wide variety of applications in areas such as VLSI circuit design and network design, and are therefore very well studied. The fractional MF problems are solvable in polynomial time, while the integer versions are NP-complete. However, exact algorithms for the fractional MF problems have high computational complexity. Approximation algorithms for the fractional MF problems have therefore been explored in the literature to reduce this complexity. By combining these approximation algorithms with the randomized rounding technique, polynomial-time approximation algorithms for the integer versions have also been explored in the literature. In the design of high-speed networks, such as optical wavelength division multiplexing (WDM) networks, providing survivability carries great significance. Survivability is the ability of the network to recover from failures. It further increases the complexity of network design and presents network designers with more formidable challenges. In this work we formulate the survivable versions of the MF problems. We build approximation algorithms for the survivable multicommodity flow (SMF) problems based on the framework of the approximation algorithms for the MF problems presented in [1] and [2]. We discuss applications of the SMF problems to solve survivable routing in capacitated networks.

Relevance:

20.00%

Publisher:

Abstract:

The emergence of Wavelength Division Multiplexing (WDM) technology makes it possible to increase the bandwidth of Synchronous Optical Network (SONET) rings by grooming low-speed traffic streams onto different high-speed wavelength channels. Since the cost of SONET add-drop multiplexers (SADMs) at each node dominates the total cost of these networks, the question of how to assign wavelengths, groom the traffic and bypass traffic through intermediate nodes has recently received a lot of attention from researchers.

Relevance:

20.00%

Publisher:

Abstract:

We consider a fully model-based approach for the analysis of distance sampling data. Distance sampling has been widely used to estimate abundance (or density) of animals or plants in a spatially explicit study area. There is, however, no readily available method of making statistical inference on the relationships between abundance and environmental covariates. Spatial Poisson process likelihoods can be used to simultaneously estimate detection and intensity parameters by modeling distance sampling data as a thinned spatial point process. A model-based spatial approach to distance sampling data has three main benefits: it allows complex and opportunistic transect designs to be employed, it allows estimation of abundance in small subregions, and it provides a framework to assess the effects of habitat or experimental manipulation on density. We demonstrate the model-based methodology with a small simulation study and analysis of the Dubbo weed data set. In addition, a simple ad hoc method for handling overdispersion is also proposed. The simulation study showed that the model-based approach compared favorably to conventional distance sampling methods for abundance estimation. In addition, the overdispersion correction performed adequately when the number of transects was high. Analysis of the Dubbo data set indicated a transect effect on abundance via Akaike’s information criterion model selection. Further goodness-of-fit analysis, however, indicated some potential confounding of intensity with the detection function.

Relevance:

20.00%

Publisher:

Abstract:

We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples’ labels but has to pay for each attribute that is specified. This learning model is appropriate in many areas, including medical applications. We present new algorithms, based on algorithms for the multi-armed bandit problem, for choosing which attributes to purchase for which examples in the budgeted learning model. All of our approaches outperformed the current state of the art. Furthermore, we present a new means of selecting an example to purchase after the attribute is selected, instead of selecting an example uniformly at random, as is typically done. Our new example selection method improved the performance of all the algorithms we tested, both ours and those in the literature.

Relevance:

20.00%

Publisher:

Abstract:

Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships rather than for single genes in isolation. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of this network interact with each other and keep their functions stable. However, there is generally not enough data to accurately recover GNs from expression levels, which leads to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. The use of several kinds of biological information in inference methods has recently increased significantly, in order to better recover the connections between genes and reduce false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and the possibility of discovering new relationships between genes when the expression data are observed. Although several works on data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. Methods: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain from using biological information and expression profiles together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs, considering a eukaryotic organism (P. falciparum). Our results indicate that information based on direct interaction can produce a higher improvement in the gain than data about a less specific relationship such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of the hubs only. The results indicate that the use of biological information can improve the identification of the most connected proteins.

Relevance:

20.00%

Publisher:

Abstract:

The design of a network is a solution to several engineering and science problems. Several network design problems are known to be NP-hard, and population-based metaheuristics like evolutionary algorithms (EAs) have been largely investigated for such problems. Such optimization methods simultaneously generate a large number of potential solutions to investigate the search space in breadth and, consequently, to avoid local optima. Obtaining a potential solution usually involves the construction and maintenance of several spanning trees, or more generally, spanning forests. To efficiently explore the search space, special data structures have been developed to provide operations that manipulate a set of spanning trees (population). For a tree with n nodes, the most efficient data structures available in the literature require time O(n) to generate a new spanning tree that modifies an existing one and to store the new solution. We propose a new data structure, called node-depth-degree representation (NDDR), and we demonstrate that using this encoding, generating a new spanning forest requires average time O(√n). Experiments with an EA based on NDDR applied to large-scale instances of the degree-constrained minimum spanning tree problem have shown that the implementation adds small constants and lower order terms to the theoretical bound.

Relevance:

20.00%

Publisher:

Abstract:

There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines such as a selection of the specific algorithm depending on the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
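For orientation, the standard single-site Fuzzy C-Means iteration that the distributed variants build on can be sketched as follows. This is a minimal one-dimensional illustration under stated assumptions, not the collaborative or parallel algorithms of the study: the function name `fcm_1d`, the fuzzifier default `m=2.0`, the fixed iteration count and the toy data are all hypothetical choices.

```python
def fcm_1d(points, centers, m=2.0, iters=50, eps=1e-12):
    """Plain Fuzzy C-Means on 1-D data, starting from the given centers.

    Returns the final centers and the membership matrix computed in the
    last iteration (one row per point, one column per cluster).
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        memberships = []
        for x in points:
            dists = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
            row = [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
            memberships.append(row)
        # Center update: weighted mean of the points with weights u_ij^m
        centers = [
            sum((row[i] ** m) * x for row, x in zip(memberships, points))
            / sum(row[i] ** m for row in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships


# Toy data with two well-separated groups (illustrative values only)
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
final_centers, final_memberships = fcm_1d(data, centers=[0.0, 1.0])
```

In a distributed setting each site would typically run updates like these locally and exchange only summary information such as prototypes, which is where the communication cost analyzed in the study arises.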

Relevance:

20.00%

Publisher:

Abstract:

This paper presents a survey of evolutionary algorithms that are designed for decision-tree induction. In this context, most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve decision trees and works that design decision-tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.

Relevance:

20.00%

Publisher:

Abstract:

Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.

Relevance:

20.00%

Publisher:

Abstract:

Background: Arboviral diseases are major global public health threats. Yet, our understanding of infection risk factors is, with a few exceptions, considerably limited. A crucial shortcoming is the widespread use of analytical methods generally not suited for observational data - particularly null hypothesis-testing (NHT) and step-wise regression (SWR). Using Mayaro virus (MAYV) as a case study, here we compare information theory-based multimodel inference (MMI) with conventional analyses for arboviral infection risk factor assessment. Methodology/Principal Findings: A cross-sectional survey of anti-MAYV antibodies revealed 44% prevalence (n = 270 subjects) in a central Amazon rural settlement. NHT suggested that residents of village-like household clusters and those using closed toilet/latrines were at higher risk, while living in non-village-like areas, using bednets, and owning fowl, pigs or dogs were protective. The "minimum adequate" SWR model retained only residence area and bednet use. Using MMI, we identified relevant covariates, quantified their relative importance, and estimated effect-sizes (beta +/- SE) on which to base inference. Residence area (beta(Village) = 2.93 +/- 0.41; beta(Upland) = -0.56 +/- 0.33, beta(Riverbanks) = -2.37 +/- 0.55) and bednet use (beta = -0.95 +/- 0.28) were the most important factors, followed by crop-plot ownership (beta = 0.39 +/- 0.22) and regular use of a closed toilet/latrine (beta = 0.19 +/- 0.13); domestic animals had insignificant protective effects and were relatively unimportant. The SWR model ranked fifth among the 128 models in the final MMI set. Conclusions/Significance: Our analyses illustrate how MMI can enhance inference on infection risk factors when compared with NHT or SWR. 
MMI indicates that forest crop-plot workers are likely exposed to typical MAYV cycles maintained by diurnal, forest dwelling vectors; however, MAYV might also be circulating in nocturnal, domestic-peridomestic cycles in village-like areas. This suggests either a vector shift (synanthropic mosquitoes vectoring MAYV) or a habitat/habits shift (classical MAYV vectors adapting to densely populated landscapes and nocturnal biting); any such ecological/adaptive novelty could increase the likelihood of MAYV emergence in Amazonia.
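The ranking of candidate models and the relative importance of covariates in multimodel inference are commonly derived from Akaike weights. The sketch below assumes AIC-based weights, which the abstract does not explicitly confirm; the function names, AIC scores and covariate sets are hypothetical and are not taken from the study.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model: exp(-delta_i / 2), delta_i = AIC_i - AIC_min
    rel = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def variable_importance(weights, models, variable):
    """Sum the weights of every model whose covariate set contains the variable."""
    return sum(w for w, covariates in zip(weights, models) if variable in covariates)


# Hypothetical candidate models and AIC scores, for illustration only
models = [{"area", "bednet"}, {"area"}, {"bednet"}]
aics = [100.0, 102.0, 110.0]

weights = akaike_weights(aics)
importance_area = variable_importance(weights, models, "area")
importance_bednet = variable_importance(weights, models, "bednet")
```

Summing weights over models, rather than keeping a single "best" model, is what lets this approach quantify how much support each covariate has across the whole candidate set.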

Relevance:

20.00%

Publisher:

Abstract:

Diffuse large B-cell lymphoma can be subclassified into at least two molecular subgroups by gene expression profiling: germinal center B-cell like and activated B-cell like diffuse large B-cell lymphoma. Several immunohistological algorithms have been proposed as surrogates to gene expression profiling at the level of protein expression, but their reliability has been an issue of controversy. Furthermore, the proportion of misclassified cases of germinal center B-cell subgroup by immunohistochemistry, in all reported algorithms, is higher compared with germinal center B-cell cases defined by gene expression profiling. We analyzed 424 cases of nodal diffuse large B-cell lymphoma with the panel of markers included in the three previously described algorithms: Hans, Choi, and Tally. To test whether the sensitivity of detecting germinal center B-cell cases could be improved, the germinal center B-cell marker HGAL/GCET2 was also added to all three algorithms. Our results show that the inclusion of HGAL/GCET2 significantly increased the detection of germinal center B-cell cases in all three algorithms (P<0.001). The proportions of germinal center B-cell cases in the original algorithms were 27%, 34%, and 19% for Hans, Choi, and Tally, respectively. In the modified algorithms, with the inclusion of HGAL/GCET2, the frequencies of germinal center B-cell cases were increased to 38%, 48%, and 35%, respectively. Therefore, HGAL/GCET2 protein expression may function as a marker for germinal center B-cell type diffuse large B-cell lymphoma. Consideration should be given to the inclusion of HGAL/GCET2 analysis in algorithms to better predict the cell of origin. These findings bear further validation, from comparison to gene expression profiles and from clinical/therapeutic data. Modern Pathology (2012) 25, 1439-1445; doi: 10.1038/modpathol.2012.119; published online 29 June 2012

Relevance:

20.00%

Publisher:

Abstract:

Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP) as had been studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within the Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in opposition to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. 
Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, the number of items and the asymmetry level on parameter recovery. Results of the simulation study indicated that our approach performed as well as that in [3] in terms of parameter recovery, mainly when using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that the hierarchical approach makes MCMC algorithms easier to implement, facilitates convergence diagnosis, and can be very useful for fitting more complex skew IRT models.
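The stochastic representation attributed to Henze can be illustrated with a short sampler. This is a sketch under assumptions: it draws from the skew-normal in its direct parameterization with shape controlled by a hypothetical `delta` in (-1, 1), and it does not apply the centred parameterization or any part of the MHWGS algorithm described in the abstract.

```python
import math
import random


def sample_skew_normal(delta, rng):
    """Draw one skew-normal variate via Z = delta*|U0| + sqrt(1 - delta^2)*U1.

    U0 and U1 are independent standard normals, so |U0| is half-normal.
    """
    half_normal = abs(rng.gauss(0.0, 1.0))
    normal = rng.gauss(0.0, 1.0)
    return delta * half_normal + math.sqrt(1.0 - delta ** 2) * normal


rng = random.Random(0)  # fixed seed so the run is reproducible
delta = 0.8             # illustrative asymmetry level, not from the paper
draws = [sample_skew_normal(delta, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
# Known mean of this representation: delta * sqrt(2 / pi)
theoretical_mean = delta * math.sqrt(2.0 / math.pi)
```

Conditioning on the half-normal component turns the model into a hierarchy of normal pieces, which is the kind of structure that tends to simplify Gibbs-type sampling steps.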

Relevance:

20.00%

Publisher:

Abstract:

This work aimed to apply genetic algorithms (GA) and particle swarm optimization (PSO) to cash balance management using the Miller-Orr model, a stochastic model that does not define a single ideal point for the cash balance but rather an oscillation range defined by a lower bound, an ideal balance and an upper bound. Thus, this paper proposes the application of GA and PSO to minimize the total cost of cash maintenance by obtaining the lower-bound parameter of the Miller-Orr model, using the assumptions presented in the literature. Computational experiments were used in the development and validation of the models. The results indicated that both GA and PSO are applicable in determining the cash level from the lower limit, with the best results obtained by the PSO model, which had not yet been applied to this type of problem.
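To make the three levels of the Miller-Orr model concrete, the sketch below computes the ideal balance and the upper bound from a given lower bound using the textbook Miller-Orr formulas. The formulas are standard but are not quoted in the abstract, and the function name, parameter names and numeric inputs are hypothetical. The GA and PSO search for the lower bound is not shown.

```python
def miller_orr_bounds(lower, transaction_cost, daily_variance, daily_rate):
    """Return (lower bound, ideal balance, upper bound) for the Miller-Orr model.

    Textbook form: z = (3 * b * sigma^2 / (4 * i))^(1/3),
    ideal balance = lower + z, upper bound = lower + 3 * z.
    """
    z = (3.0 * transaction_cost * daily_variance / (4.0 * daily_rate)) ** (1.0 / 3.0)
    return lower, lower + z, lower + 3.0 * z


# Illustrative inputs only: lower bound 5000, cost 20 per transfer,
# daily cash-flow variance 20000, daily interest rate 0.03%
lower, target, upper = miller_orr_bounds(5000.0, 20.0, 20000.0, 0.0003)
# With these inputs z is about 1000, so target is about 6000 and upper about 8000.
```

Because the other two levels follow from the lower bound once the cost, variance and rate are fixed, an optimizer only needs to search over that single parameter.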

Relevance:

20.00%

Publisher:

Abstract:

Solving structural reliability problems by the First Order method requires optimization algorithms to find the smallest distance between a limit state function and the origin of standard Gaussian space. The Hasofer-Lind-Rackwitz-Fiessler (HLRF) algorithm, developed specifically for this purpose, has been shown to be efficient but not robust, as it fails to converge for a significant number of problems. On the other hand, recent developments in general (augmented Lagrangian) optimization techniques have not been tested in application to structural reliability problems. In the present article, three new optimization algorithms for structural reliability analysis are presented. One algorithm is based on the HLRF, but uses a new differentiable merit function with Wolfe conditions to select the step length in the line search. It is shown in the article that, under certain assumptions, the proposed algorithm generates a sequence that converges to the local minimizer of the problem. Two new augmented Lagrangian methods are also presented, which use quadratic penalties to solve nonlinear problems with equality constraints. The performance and robustness of the new algorithms are compared to those of the classic augmented Lagrangian method, HLRF and the improved HLRF (iHLRF) algorithm in the solution of 25 benchmark problems from the literature. The newly proposed HLRF algorithm is shown to be more robust than HLRF or iHLRF, and as efficient as the iHLRF algorithm. The two augmented Lagrangian methods proposed herein are shown to be more robust and more efficient than the classical augmented Lagrangian method.
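For reference, the classic HLRF iteration that the article takes as its baseline can be sketched as below. This is the plain update with a unit step, without the merit function or Wolfe line search that the article proposes, so it inherits the robustness problem the abstract describes. The function names and the linear limit state used as an example are assumptions made for illustration.

```python
import math


def hlrf(g, grad_g, u0, tol=1e-8, max_iter=100):
    """Classic HLRF iteration in standard Gaussian space.

    Each step projects onto the linearized limit state:
    u_next = ((grad . u - g(u)) / |grad|^2) * grad
    """
    u = list(u0)
    for _ in range(max_iter):
        value = g(u)
        grad = grad_g(u)
        norm_sq = sum(c * c for c in grad)
        scale = (sum(a * b for a, b in zip(grad, u)) - value) / norm_sq
        u_next = [scale * c for c in grad]
        if math.dist(u_next, u) < tol:
            return u_next
        u = u_next
    raise RuntimeError("HLRF did not converge within max_iter iterations")


# Hypothetical linear limit state g(u) = 3 - (u1 + u2) / sqrt(2)
def limit_state(u):
    return 3.0 - (u[0] + u[1]) / math.sqrt(2.0)


def limit_state_grad(u):
    return [-1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]


design_point = hlrf(limit_state, limit_state_grad, [0.0, 0.0])
# Reliability index: distance from the origin to the design point
beta = math.hypot(*design_point)
```

For a linear limit state the projection is exact after one step; on strongly nonlinear limit states the same unit-step update can oscillate, which is the motivation for adding step-length control.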

Relevance:

20.00%

Publisher:

Abstract:

There is no consensus regarding the accuracy of bioimpedance for the determination of body composition in older persons. This study aimed to compare the assessment of lean body mass of healthy older volunteers obtained by the deuterium dilution method (reference) with those obtained by two frequently used bioelectrical impedance formulas and one formula specifically developed for a Latin-American population. A cross-sectional study. Twenty-one volunteers were studied, 12 of them women, with a mean age of 72 +/- 6.7 years. Urban community, Ribeirão Preto, Brazil. Fat-free mass was determined simultaneously by the deuterium dilution method and by bioelectrical impedance, and the results were compared. For bioelectrical impedance, body composition was calculated by the formulas of Deuremberg, Lukaski and Bolonchuk, and Valencia et al. Lean body mass of the studied volunteers, as determined by bioelectrical impedance, was 37.8 +/- 9.2 kg with the Lukaski and Bolonchuk formula, 37.4 +/- 9.3 kg (Deuremberg) and 43.2 +/- 8.9 kg (Valencia et al.). The results were significantly correlated with those obtained by the deuterium dilution method (41.6 +/- 9.3 kg), with r = 0.963, 0.932 and 0.971, respectively. Lean body mass obtained by the Valencia formula was the most accurate. In this study, lean body mass of older persons obtained by the bioelectrical impedance method showed good correlation with the values obtained by the deuterium dilution method. The formula of Valencia et al., developed for a Latin-American population, showed the best accuracy.