63 resultados para CLUSTERING PROBLEM

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevância:

60.00% 60.00%

Publicador:

Resumo:

A novel global optimization method based on an Augmented Lagrangian framework is introduced for continuous constrained nonlinear optimization problems. At each outer iteration k the method requires the epsilon(k)-global minimization of the Augmented Lagrangian with simple constraints, where epsilon(k) -> epsilon. Global convergence to an epsilon-global minimizer of the original problem is proved. The subproblems are solved using the alpha BB method. Numerical experiments are presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK Multi-Objective Clustering with automatic K-determination and MOCLE-Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms can often contains a high number of partitions, becoming difficult for an expert to manually analyze all of them. In order to deal with this problem, we present two selection strategies, which are based on the corrected Rand, to choose a subset of solutions. To test them, they are applied to the set of solutions produced by MOCK and MOCLE in the context of several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analysis show that both versions of selection strategy proposed are very effective. They can significantly reduce the number of solutions and, at the same time, keep the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A conceptual problem that appears in different contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices depending on the application scenario in hand. A variant of the Product-Moment correlation coefficient and a weighted formulation for the Goodman-Kruskal and Kendall`s indexes are derived that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help suggesting scenarios within the context of clustering analysis that are possibly more appropriate for the application of each index. (C) 2008 Elsevier Inc. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms under interest are provided. Examples with computational experiments and statistical analyses are also presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper addresses the capacitated lot sizing problem (CLSP) with a single stage composed of multiple plants, items and periods with setup carry-over among the periods. The CLSP is well studied and many heuristics have been proposed to solve it. Nevertheless, few researches explored the multi-plant capacitated lot sizing problem (MPCLSP), which means that few solution methods were proposed to solve it. Furthermore, to our knowledge, no study of the MPCLSP with setup carry-over was found in the literature. This paper presents a mathematical model and a GRASP (Greedy Randomized Adaptive Search Procedure) with path relinking to the MPCLSP with setup carry-over. This solution method is an extension and adaptation of a previously adopted methodology without the setup carry-over. Computational tests showed that the improvement of the setup carry-over is significant in terms of the solution value with a low increase in computational time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in other species than bovines. Based on the data of this epidemic, it was carried out an analysis using the K-function and it was observed spatial clustering of outbreaks within a range of 25km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the disease dissemination. The control measures were effective once the disease did not spread to farms more than 47 km apart from the initial outbreaks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Introduction: Work disability is a major consequence of rheumatoid arthritis (RA), associated not only with traditional disease activity variables, but also more significantly with demographic, functional, occupational, and societal variables. Recent reports suggest that the use of biologic agents offers potential for reduced work disability rates, but the conclusions are based on surrogate disease activity measures derived from studies primarily from Western countries. Methods: The Quantitative Standard Monitoring of Patients with RA (QUEST-RA) multinational database of 8,039 patients in 86 sites in 32 countries, 16 with high gross domestic product (GDP) (>24K US dollars (USD) per capita) and 16 low-GDP countries (<11K USD), was analyzed for work and disability status at onset and over the course of RA and clinical status of patients who continued working or had stopped working in high-GDP versus low-GDP countries according to all RA Core Data Set measures. Associations of work disability status with RA Core Data Set variables and indices were analyzed using descriptive statistics and regression analyses. Results: At the time of first symptoms, 86% of men (range 57%-100% among countries) and 64% (19%-87%) of women <65 years were working. More than one third (37%) of these patients reported subsequent work disability because of RA. Among 1,756 patients whose symptoms had begun during the 2000s, the probabilities of continuing to work were 80% (95% confidence interval (CI) 78%-82%) at 2 years and 68% (95% CI 65%-71%) at 5 years, with similar patterns in high-GDP and low-GDP countries. Patients who continued working versus stopped working had significantly better clinical status for all clinical status measures and patient self-report scores, with similar patterns in high-GDP and low-GDP countries. However, patients who had stopped working in high-GDP countries had better clinical status than patients who continued working in low-GDP countries. The most significant identifier of work disability in all subgroups was Health Assessment Questionnaire (HAQ) functional disability score. Conclusions: Work disability rates remain high among people with RA during this millennium. In low-GDP countries, people remain working with high levels of disability and disease activity. Cultural and economic differences between societies affect work disability as an outcome measure for RA.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aims. An analytical solution for the discrepancy between observed core-like profiles and predicted cusp profiles in dark matter halos is studied. Methods. We calculate the distribution function for Navarro-Frenk-White halos and extract energy from the distribution, taking into account the effects of baryonic physics processes. Results. We show with a simple argument that we can reproduce the evolution of a cusp to a flat density profile by a decrease of the initial potential energy.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The energy spectrum of an electron confined in a quantum dot (QD) with a three-dimensional anisotropic parabolic potential in a tilted magnetic field was found analytically. The theory describes exactly the mixing of in-plane and out-of-plane motions of an electron caused by a tilted magnetic field, which could be seen, for example, in the level anticrossing. For charged QDs in a tilted magnetic field we predict three strong resonant lines in the far-infrared-absorption spectra.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The width of a closed convex subset of n-dimensional Euclidean space is the distance between two parallel supporting hyperplanes. The Blaschke-Lebesgue problem consists of minimizing the volume in the class of convex sets of fixed constant width and is still open in dimension n >= 3. In this paper we describe a necessary condition that the minimizer of the Blaschke-Lebesgue must satisfy in dimension n = 3: we prove that the smooth components of the boundary of the minimizer have their smaller principal curvature constant and therefore are either spherical caps or pieces of tubes (canal surfaces).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The first problem of the Seleucid mathematical cuneiform tablet BM 34 568 calculates the diagonal of a rectangle from its sides without resorting to the Pythagorean rule. For this reason, it has been a source of discussion among specialists ever since its first publication. but so far no consensus in relation to its mathematical meaning has been attained. This paper presents two new interpretations of the scribe`s procedure. based on the assumption that he was able to reduce the problem to a standard Mesopotamian question about reciprocal numbers. These new interpretations are then linked to interpretations of the Old Babylonian tablet Plimpton 322 and to the presence of Pythagorean triples in the contexts of Old Babylonian and Hellenistic mathematics. (C) 2007 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We consider a class of two-dimensional problems in classical linear elasticity for which material overlapping occurs in the absence of singularities. Of course, material overlapping is not physically realistic, and one possible way to prevent it uses a constrained minimization theory. In this theory, a minimization problem consists of minimizing the total potential energy of a linear elastic body subject to the constraint that the deformation field must be locally invertible. Here, we use an interior and an exterior penalty formulation of the minimization problem together with both a standard finite element method and classical nonlinear programming techniques to compute the minimizers. We compare both formulations by solving a plane problem numerically in the context of the constrained minimization theory. The problem has a closed-form solution, which is used to validate the numerical results. This solution is regular everywhere, including the boundary. In particular, we show numerical results which indicate that, for a fixed finite element mesh, the sequences of numerical solutions obtained with both the interior and the exterior penalty formulations converge to the same limit function as the penalization is enforced. This limit function yields an approximate deformation field to the plane problem that is locally invertible at all points in the domain. As the mesh is refined, this field converges to the exact solution of the plane problem.