880 resultados para Weak Greedy Algorithms


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a two-step pseudo likelihood estimation technique for generalized linear mixed models with the random effects being correlated between groups. The core idea is to deal with the intractable integrals in the likelihood function by multivariate Taylor's approximation. The accuracy of the estimation technique is assessed in a Monte-Carlo study. An application of it with a binary response variable is presented using a real data set on credit defaults from two Swedish banks. Thanks to the use of two-step estimation technique, the proposed algorithm outperforms conventional pseudo likelihood algorithms in terms of computational time.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The emergence of wavelength-division multiplexing (WDM) technology provides the capability for increasing the bandwidth of synchronous optical network (SONET) rings by grooming low-speed traffic streams onto different high-speed wavelength channels. Since the cost of SONET add–drop multiplexers (SADM) at each node dominates the total cost of these networks, how to assign the wavelength, groom the traffic, and bypass the traffic through the intermediate nodes has received a lot of attention from researchers recently. Moreover, the traffic pattern of the optical network changes from time to time. How to develop dynamic reconfiguration algorithms for traffic grooming is an important issue. In this paper, two cases (best fit and full fit) for handling reconfigurable SONET over WDM networks are proposed. For each approach, an integer linear programming model and heuristic algorithms (TS-1 and TS-2, based on the tabu search method) are given. The results demonstrate that the TS-1 algorithm can yield better solutions but has a greater running time than the greedy algorithm for the best fit case. For the full fit case, the tabu search heuristic yields competitive results compared with an earlier simulated annealing based method and it is more stable for the dynamic case.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present two new constraint qualifications (CQs) that are weaker than the recently introduced relaxed constant positive linear dependence (RCPLD) CQ. RCPLD is based on the assumption that many subsets of the gradients of the active constraints preserve positive linear dependence locally. A major open question was to identify the exact set of gradients whose properties had to be preserved locally and that would still work as a CQ. This is done in the first new CQ, which we call the constant rank of the subspace component (CRSC) CQ. This new CQ also preserves many of the good properties of RCPLD, such as local stability and the validity of an error bound. We also introduce an even weaker CQ, called the constant positive generator (CPG), which can replace RCPLD in the analysis of the global convergence of algorithms. We close this work by extending convergence results of algorithms belonging to all the main classes of nonlinear optimization methods: sequential quadratic programming, augmented Lagrangians, interior point algorithms, and inexact restoration.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We study quasi-random properties of k-uniform hypergraphs. Our central notion is uniform edge distribution with respect to large vertex sets. We will find several equivalent characterisations of this property and our work can be viewed as an extension of the well known Chung-Graham-Wilson theorem for quasi-random graphs. Moreover, let K(k) be the complete graph on k vertices and M(k) the line graph of the graph of the k-dimensional hypercube. We will show that the pair of graphs (K(k),M(k)) has the property that if the number of copies of both K(k) and M(k) in another graph G are as expected in the random graph of density d, then G is quasi-random (in the sense of the Chung-Graham-Wilson theorem) with density close to d. (C) 2011 Wiley Periodicals, Inc. Random Struct. Alg., 40, 1-38, 2012

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The variability of results from different automated methods of detection and tracking of extratropical cyclones is assessed in order to identify uncertainties related to the choice of method. Fifteen international teams applied their own algorithms to the same dataset - the period 1989-2009 of interim European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERAInterim) data. This experiment is part of the community project Intercomparison of Mid Latitude Storm Diagnostics (IMILAST; see www.proclim.ch/imilast/index.html). The spread of results for cyclone frequency, intensity, life cycle, and track location is presented to illustrate the impact of using different methods. Globally, methods agree well for geographical distribution in large oceanic regions, interannual variability of cyclone numbers, geographical patterns of strong trends, and distribution shape for many life cycle characteristics. In contrast, the largest disparities exist for the total numbers of cyclones, the detection of weak cyclones, and distribution in some densely populated regions. Consistency between methods is better for strong cyclones than for shallow ones. Two case studies of relatively large, intense cyclones reveal that the identification of the most intense part of the life cycle of these events is robust between methods, but considerable differences exist during the development and the dissolution phases.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La familia de algoritmos de Boosting son un tipo de técnicas de clasificación y regresión que han demostrado ser muy eficaces en problemas de Visión Computacional. Tal es el caso de los problemas de detección, de seguimiento o bien de reconocimiento de caras, personas, objetos deformables y acciones. El primer y más popular algoritmo de Boosting, AdaBoost, fue concebido para problemas binarios. Desde entonces, muchas han sido las propuestas que han aparecido con objeto de trasladarlo a otros dominios más generales: multiclase, multilabel, con costes, etc. Nuestro interés se centra en extender AdaBoost al terreno de la clasificación multiclase, considerándolo como un primer paso para posteriores ampliaciones. En la presente tesis proponemos dos algoritmos de Boosting para problemas multiclase basados en nuevas derivaciones del concepto margen. El primero de ellos, PIBoost, está concebido para abordar el problema descomponiéndolo en subproblemas binarios. Por un lado, usamos una codificación vectorial para representar etiquetas y, por otro, utilizamos la función de pérdida exponencial multiclase para evaluar las respuestas. Esta codificación produce un conjunto de valores margen que conllevan un rango de penalizaciones en caso de fallo y recompensas en caso de acierto. La optimización iterativa del modelo genera un proceso de Boosting asimétrico cuyos costes dependen del número de etiquetas separadas por cada clasificador débil. De este modo nuestro algoritmo de Boosting tiene en cuenta el desbalanceo debido a las clases a la hora de construir el clasificador. El resultado es un método bien fundamentado que extiende de manera canónica al AdaBoost original. El segundo algoritmo propuesto, BAdaCost, está concebido para problemas multiclase dotados de una matriz de costes. Motivados por los escasos trabajos dedicados a generalizar AdaBoost al terreno multiclase con costes, hemos propuesto un nuevo concepto de margen que, a su vez, permite derivar una función de pérdida adecuada para evaluar costes. Consideramos nuestro algoritmo como la extensión más canónica de AdaBoost para este tipo de problemas, ya que generaliza a los algoritmos SAMME, Cost-Sensitive AdaBoost y PIBoost. Por otro lado, sugerimos un simple procedimiento para calcular matrices de coste adecuadas para mejorar el rendimiento de Boosting a la hora de abordar problemas estándar y problemas con datos desbalanceados. Una serie de experimentos nos sirven para demostrar la efectividad de ambos métodos frente a otros conocidos algoritmos de Boosting multiclase en sus respectivas áreas. En dichos experimentos se usan bases de datos de referencia en el área de Machine Learning, en primer lugar para minimizar errores y en segundo lugar para minimizar costes. Además, hemos podido aplicar BAdaCost con éxito a un proceso de segmentación, un caso particular de problema con datos desbalanceados. Concluimos justificando el horizonte de futuro que encierra el marco de trabajo que presentamos, tanto por su aplicabilidad como por su flexibilidad teórica. Abstract The family of Boosting algorithms represents a type of classification and regression approach that has shown to be very effective in Computer Vision problems. Such is the case of detection, tracking and recognition of faces, people, deformable objects and actions. The first and most popular algorithm, AdaBoost, was introduced in the context of binary classification. Since then, many works have been proposed to extend it to the more general multi-class, multi-label, costsensitive, etc... domains. Our interest is centered in extending AdaBoost to two problems in the multi-class field, considering it a first step for upcoming generalizations. In this dissertation we propose two Boosting algorithms for multi-class classification based on new generalizations of the concept of margin. The first of them, PIBoost, is conceived to tackle the multi-class problem by solving many binary sub-problems. We use a vectorial codification to represent class labels and a multi-class exponential loss function to evaluate classifier responses. This representation produces a set of margin values that provide a range of penalties for failures and rewards for successes. The stagewise optimization of this model introduces an asymmetric Boosting procedure whose costs depend on the number of classes separated by each weak-learner. In this way the Boosting procedure takes into account class imbalances when building the ensemble. The resulting algorithm is a well grounded method that canonically extends the original AdaBoost. The second algorithm proposed, BAdaCost, is conceived for multi-class problems endowed with a cost matrix. Motivated by the few cost-sensitive extensions of AdaBoost to the multi-class field, we propose a new margin that, in turn, yields a new loss function appropriate for evaluating costs. Since BAdaCost generalizes SAMME, Cost-Sensitive AdaBoost and PIBoost algorithms, we consider our algorithm as a canonical extension of AdaBoost to this kind of problems. We additionally suggest a simple procedure to compute cost matrices that improve the performance of Boosting in standard and unbalanced problems. A set of experiments is carried out to demonstrate the effectiveness of both methods against other relevant Boosting algorithms in their respective areas. In the experiments we resort to benchmark data sets used in the Machine Learning community, firstly for minimizing classification errors and secondly for minimizing costs. In addition, we successfully applied BAdaCost to a segmentation task, a particular problem in presence of imbalanced data. We conclude the thesis justifying the horizon of future improvements encompassed in our framework, due to its applicability and theoretical flexibility.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A theoretical model is presented which describes selection in a genetic algorithm (GA) under a stochastic fitness measure and correctly accounts for finite population effects. Although this model describes a number of selection schemes, we only consider Boltzmann selection in detail here as results for this form of selection are particularly transparent when fitness is corrupted by additive Gaussian noise. Finite population effects are shown to be of fundamental importance in this case, as the noise has no effect in the infinite population limit. In the limit of weak selection we show how the effects of any Gaussian noise can be removed by increasing the population size appropriately. The theory is tested on two closely related problems: the one-max problem corrupted by Gaussian noise and generalization in a perceptron with binary weights. The averaged dynamics can be accurately modelled for both problems using a formalism which describes the dynamics of the GA using methods from statistical mechanics. The second problem is a simple example of a learning problem and by considering this problem we show how the accurate characterization of noise in the fitness evaluation may be relevant in machine learning. The training error (negative fitness) is the number of misclassified training examples in a batch and can be considered as a noisy version of the generalization error if an independent batch is used for each evaluation. The noise is due to the finite batch size and in the limit of large problem size and weak selection we show how the effect of this noise can be removed by increasing the population size. This allows the optimal batch size to be determined, which minimizes computation time as well as the total number of training examples required.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cognitive Radio has been proposed as a key technology to significantly improve spectrum usage in wireless networks by enabling unlicensed users to access unused resource. We present new algorithms that are needed for the implementation of opportunistic scheduling policies that maximize the throughput utilization of resources by secondary users, under maximum interference constraints imposed by existing primary users. Our approach is based on the Belief Propagation (BP) algorithm, which is advantageous due to its simplicity and potential for distributed implementation. We examine convergence properties and evaluate the performance of the proposed BP algorithms via simulations and demonstrate that the results compare favorably with a benchmark greedy strategy. © 2013 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Today, due to globalization of the world the size of data set is increasing, it is necessary to discover the knowledge. The discovery of knowledge can be typically in the form of association rules, classification rules, clustering, discovery of frequent episodes and deviation detection. Fast and accurate classifiers for large databases are an important task in data mining. There is growing evidence that integrating classification and association rules mining, classification approaches based on heuristic, greedy search like decision tree induction. Emerging associative classification algorithms have shown good promises on producing accurate classifiers. In this paper we focus on performance of associative classification and present a parallel model for classifier building. For classifier building some parallel-distributed algorithms have been proposed for decision tree induction but so far no such work has been reported for associative classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research is motivated by a practical application observed at a printed circuit board (PCB) manufacturing facility. After assembly, the PCBs (or jobs) are tested in environmental stress screening (ESS) chambers (or batch processing machines) to detect early failures. Several PCBs can be simultaneously tested as long as the total size of all the PCBs in the batch does not violate the chamber capacity. PCBs from different production lines arrive dynamically to a queue in front of a set of identical ESS chambers, where they are grouped into batches for testing. Each line delivers PCBs that vary in size and require different testing (or processing) times. Once a batch is formed, its processing time is the longest processing time among the PCBs in the batch, and its ready time is given by the PCB arriving last to the batch. ESS chambers are expensive and a bottleneck. Consequently, its makespan has to be minimized. ^ A mixed-integer formulation is proposed for the problem under study and compared to a formulation recently published. The proposed formulation is better in terms of the number of decision variables, linear constraints and run time. A procedure to compute the lower bound is proposed. For sparse problems (i.e. when job ready times are dispersed widely), the lower bounds are close to optimum. ^ The problem under study is NP-hard. Consequently, five heuristics, two metaheuristics (i.e. simulated annealing (SA) and greedy randomized adaptive search procedure (GRASP)), and a decomposition approach (i.e. column generation) are proposed—especially to solve problem instances which require prohibitively long run times when a commercial solver is used. Extensive experimental study was conducted to evaluate the different solution approaches based on the solution quality and run time. ^ The decomposition approach improved the lower bounds (or linear relaxation solution) of the mixed-integer formulation. At least one of the proposed heuristic outperforms the Modified Delay heuristic from the literature. For sparse problems, almost all the heuristics report a solution close to optimum. GRASP outperforms SA at a higher computational cost. The proposed approaches are viable to implement as the run time is very short. ^

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We report measurements of single- and double-spin asymmetries for W^{±} and Z/γ^{*} boson production in longitudinally polarized p+p collisions at sqrt[s]=510  GeV by the STAR experiment at RHIC. The asymmetries for W^{±} were measured as a function of the decay lepton pseudorapidity, which provides a theoretically clean probe of the proton's polarized quark distributions at the scale of the W mass. The results are compared to theoretical predictions, constrained by polarized deep inelastic scattering measurements, and show a preference for a sizable, positive up antiquark polarization in the range 0.05

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract In this paper, we address the problem of picking a subset of bids in a general combinatorial auction so as to maximize the overall profit using the first-price model. This winner determination problem assumes that a single bidding round is held to determine both the winners and prices to be paid. We introduce six variants of biased random-key genetic algorithms for this problem. Three of them use a novel initialization technique that makes use of solutions of intermediate linear programming relaxations of an exact mixed integer-linear programming model as initial chromosomes of the population. An experimental evaluation compares the effectiveness of the proposed algorithms with the standard mixed linear integer programming formulation, a specialized exact algorithm, and the best-performing heuristics proposed for this problem. The proposed algorithms are competitive and offer strong results, mainly for large-scale auctions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Condensation processes are of key importance in nature and play a fundamental role in chemistry and physics. Owing to size effects at the nanoscale, it is conceptually desired to experimentally probe the dependence of condensate structure on the number of constituents one by one. Here we present an approach to study a condensation process atom-by-atom with the scanning tunnelling microscope, which provides a direct real-space access with atomic precision to the aggregates formed in atomically defined 'quantum boxes'. Our analysis reveals the subtle interplay of competing directional and nondirectional interactions in the emergence of structure and provides unprecedented input for the structural comparison with quantum mechanical models. This approach focuses on-but is not limited to-the model case of xenon condensation and goes significantly beyond the well-established statistical size analysis of clusters in atomic or molecular beams by mass spectrometry.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Genome wide association studies (GWAS) are becoming the approach of choice to identify genetic determinants of complex phenotypes and common diseases. The astonishing amount of generated data and the use of distinct genotyping platforms with variable genomic coverage are still analytical challenges. Imputation algorithms combine directly genotyped markers information with haplotypic structure for the population of interest for the inference of a badly genotyped or missing marker and are considered a near zero cost approach to allow the comparison and combination of data generated in different studies. Several reports stated that imputed markers have an overall acceptable accuracy but no published report has performed a pair wise comparison of imputed and empiric association statistics of a complete set of GWAS markers. Results: In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10(-5) for type 2 Diabetes Mellitus and compared them with results obtained based on empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers showing specific MAF (Minor Allele Frequencies) range, located in weak linkage disequilibrium blocks or strongly deviating from local patterns of association are prone to have inflated false positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We describe and present initial results of a weak lensing survey of nearby (z less than or similar to 0.1) galaxy clusters in the Sloan Digital Sky Survey (SDSS). In this first study, galaxy clusters are selected from the SDSS spectroscopic galaxy cluster catalogs of Miller et al. and Berlind et al. We report a total of seven individual low-redshift cluster weak lensing measurements that include A2048, A1767, A2244, A1066, A2199, and two clusters specifically identified with the C4 algorithm. Our program of weak lensing of nearby galaxy clusters in the SDSS will eventually reach similar to 200 clusters, making it the largest weak lensing survey of individual galaxy clusters to date.