29 results for document clustering
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels have to be built using only the terms in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for the selection of good candidates for labels of hierarchical document clusters. The candidates are processed by a classical method to generate the labels. The idea of the proposed method is to process each parent-child relationship of the nodes as an antecedent-consequent relationship of association rules. The experimental results show that the proposed method can improve the precision and recall of labels obtained by classical methods. © 2010 Springer-Verlag.
Abstract:
One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. These candidates can then be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods by considering only the terms that are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.
Abstract:
The Capacitated Centered Clustering Problem (CCCP) consists of defining a set of p groups with minimum dissimilarity on a network with n points. A demand value is associated with each point, and each group has a demand capacity. The problem is well known to be NP-hard and has many practical applications. In this paper, the hybrid method Clustering Search (CS) is implemented to solve the CCCP. This method identifies promising regions of the search space by generating solutions with a metaheuristic, such as a Genetic Algorithm, and grouping them into clusters that are then explored further with local search heuristics. Computational results on instances available in the literature are presented to demonstrate the efficacy of CS. © 2010 Elsevier Ltd. All rights reserved.
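As an illustration of the problem statement above, the following minimal sketch evaluates one candidate CCCP solution. It assumes that the dissimilarity is the Euclidean distance from each point to the centroid of its group and that all groups share one capacity; the abstract does not fix either choice. The function name `cccp_cost` and the sample data are hypothetical, and the sketch does not implement Clustering Search itself.

```python
from math import dist


def cccp_cost(points, demands, assignment, p, capacity):
    """Return (total_cost, feasible) for an assignment of points to p groups.

    Assumption: cost is the sum of Euclidean distances from each point to
    the centroid of its group; a group is feasible if its total demand
    does not exceed the shared capacity.
    """
    groups = [[] for _ in range(p)]
    for idx, g in enumerate(assignment):
        groups[g].append(idx)

    dim = len(points[0])
    total_cost = 0.0
    feasible = True
    for members in groups:
        if not members:
            continue  # an empty group contributes no cost and no demand
        # Group center: coordinate-wise mean of the member points.
        center = tuple(
            sum(points[i][d] for i in members) / len(members) for d in range(dim)
        )
        total_cost += sum(dist(points[i], center) for i in members)
        if sum(demands[i] for i in members) > capacity:
            feasible = False
    return total_cost, feasible


# Hypothetical instance: four points on a line, two groups.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
demands = [3, 4, 2, 5]
cost, ok = cccp_cost(points, demands, [0, 0, 1, 1], p=2, capacity=7)
tight_cost, tight_ok = cccp_cost(points, demands, [0, 0, 1, 1], p=2, capacity=6)
print(cost, ok, tight_ok)
```

A search method such as the one described in the abstract would call an evaluation of this kind repeatedly, keeping only feasible assignments or penalizing infeasible ones.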
Abstract:
One objective of the feeder reconfiguration problem in distribution systems is to minimize the power losses for a specific load. The mathematical model for this problem is a mixed-integer nonlinear problem that is generally hard to solve. This paper proposes an algorithm based on artificial neural network theory. In this context, clustering techniques to determine the best training set for a single neural network with generalization ability are also presented. The proposed methodology was applied to two electrical systems and yielded good results. Moreover, the methodology can be employed for large-scale systems in a real-time environment.
Abstract:
The development of strategies for structural health monitoring (SHM) has become increasingly important because of the necessity of preventing undesirable damage. This paper describes an approach to this problem using vibration data. It involves a three-stage process: reduction of the time-series data using principal component analysis (PCA), the development of a data-based auto-regressive moving average (ARMA) model from data on an undamaged structure, and the classification of whether or not the structure is damaged using a fuzzy clustering approach. The approach is applied to data from a benchmark structure from Los Alamos National Laboratory, USA. Two fuzzy clustering algorithms are compared: fuzzy c-means (FCM) and Gustafson-Kessel (GK). It is shown that while both fuzzy clustering algorithms are effective, the GK algorithm marginally outperforms the FCM algorithm. © 2008 Elsevier Ltd. All rights reserved.
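The fuzzy c-means step named above can be sketched as follows. This is a minimal, generic FCM in plain Python, not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the fixed iteration count, the caller-supplied initial centers, and the sample data are all assumptions made for illustration. The GK algorithm differs in that it adapts the distance norm to each cluster through a fuzzy covariance matrix; that variant is not shown here.

```python
from math import dist


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of steps.

    Returns (centers, memberships); the memberships are those computed in
    the last iteration, just before the final center update.
    """
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        memberships = []
        for x in points:
            # Clamp distances away from zero to avoid division by zero.
            d = [max(dist(x, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d]
            )
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                tuple(
                    sum(w * x[a] for w, x in zip(weights, points)) / total
                    for a in range(dim)
                )
            )
        centers = new_centers
    return centers, memberships


# Hypothetical 1-D feature values forming two well-separated groups.
data = [(0.0,), (0.2,), (0.1,), (5.0,), (5.1,), (4.9,)]
final_centers, final_u = fuzzy_c_means(data, centers=[(0.0,), (1.0,)])
print(sorted(c[0] for c in final_centers))
```

Each row of `final_u` holds the degrees of membership of one point in every cluster and sums to 1, which is what distinguishes fuzzy clustering from a hard assignment.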
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Chiral symmetry breaking at finite baryon density is usually discussed in the context of quark matter, i.e. a system of deconfined quarks. However, many systems, such as stable nuclei and neutron stars, have quarks confined within nucleons. In this paper we construct a Fermi sea of three-quark nucleon clusters and investigate the change of the quark condensate as a function of baryon density. We study the effect of quark clustering on the in-medium quark condensate and compare the results with the traditional approach of modeling hadronic matter in terms of a Fermi sea of deconfined quarks.
Abstract:
The frequency of adenine mononucleotides (A), dinucleotides (AA) and clusters, and the positions of clusters, were studied in 502 molecules of the 5S rRNA. All frequencies were reduced in the evolutionary lineages of vertebrates, plants and fungi, in parallel with increasing organismic complexity. No change was observed in invertebrates. All frequencies were increased in mitochondria, plastids and mycoplasmas. The presumed relatives of the ancestors of the organelles, Rhodobacteria alpha and Cyanobacteria, showed intermediate values relative to the eubacterial averages. Firmibacteria showed a very high number of cluster sites. Clusters were more frequent in single-stranded regions in all organisms. The lineages of organelles and mycoplasmas accumulated clusters at faster rates in double-stranded regions. Rates of change were higher for AA and clusters than for A in plants, vertebrates and organelles, higher for cluster sites and A in mycoplasmas, and higher for AA and A in fungi. These data indicated that selection pressures acted more strongly on adenine clustering than on adenine frequency. It is proposed that AA and clusters, as sites of lower informational content, have the property of tolerating positional variation in the sites of other molecules (or other regions of the same molecule) that interact with the adenines. This reasoning was consistent with the degrees of genic polymorphism, low in plants and vertebrates and high in invertebrates. In the eubacteria endosymbiotic or parasitic to eukaryotes, the more tolerant RNA would be better adapted to interactions with the homologous nucleus-derived ribosomal proteins; the intermediate values observed in their precursors were interpreted as preadaptive. Among other groups, only the Deinococcus-Thermus eubacteria showed excessive AA and cluster contents, possibly related to their peculiar tolerance to mutagens, and the Ciliates showed excessive AA contents, indicative of retention of primitive characters.
Abstract:
Luminescence spectra of Eu3+-doped sol-gel glasses have been analyzed during the densification process and compared according to the presence or absence of aluminum as a codoping ion. The transition temperature from hydrated to dehydroxylated environments was found to differ between doped and codoped samples. However, luminescence measurements showed only slight modifications beyond this transition. To support the experimental analysis, molecular dynamics simulations have been performed to model the doped and codoped glass structures. Although no evidence was found that aluminum reduces rare earth clustering, the modeled structures showed that the luminescent ions are mainly located in aluminum-rich domains. The synthesis of the experimental and numerical analyses has led us to interpret the aluminum effect as responsible for differences in the structure of the luminescent sites rather than for an effective dispersion of the rare earth ions. © 2004 Elsevier B.V. All rights reserved.
Abstract:
Anelastic spectra (elastic energy absorption as a function of temperature) are reported which provide evidence that excess O in La2CuO4+δ starts forming two different types of defects already at very low concentrations, where no phase separation or changes in the type of O intercalation are believed to occur. The absorption peak with the lowest activation enthalpy, H/k_B = 5600 K, is visible at the lowest values of δ and is attributed to the hopping of single interstitial O2- ions. The second process, with slightly slower dynamics, appears at higher values of δ and soon becomes preponderant over the former process. The latter process is proposed to be due to stable pairs of O atoms and is linked to the formation of partially covalent bonds between interstitial and apical oxygen; such bonds would reduce the doping efficiency of excess O at increasing δ. The geometry of the interstitial O defect is discussed. © 1998 Published by Elsevier B.V. All rights reserved.
Abstract:
It is believed that the simple Su-Schrieffer-Heeger Hamiltonian cannot predict the insulator-to-metal transition of trans-polyacetylene (t-PA). The soliton lattice configuration at a doping level y=6% still has a semiconductor gap. Disordered distributions of solitons close the gap, but the electronic states around the Fermi energy are localized. However, within the same framework, it is possible to show that a cluster of solitons can produce dramatic changes in the electronic structure, allowing an insulator-to-metal transition.
Abstract:
Land use classification has become paramount in recent years, since it allows illegal land use to be identified and deforested areas to be monitored. Although several research works in the literature address this problem, we propose here land use recognition by means of Optimum-Path Forest Clustering (OPF), which has not been applied to this context to date. Experiments comparing Optimum-Path Forest, Mean Shift and K-Means demonstrated the robustness of OPF for automatic land use classification of images obtained by the CBERS-2B and Ikonos-2 satellites. © 2011 IEEE.
Abstract:
The significant volume of work accidents in cities causes a considerable loss to society. The development of Spatial Data Mining technologies presents a new perspective for the extraction of knowledge from the correlation between conventional and spatial attributes. One of the most important techniques of Spatial Data Mining is Spatial Clustering, which groups similar spatial objects to find a distribution of patterns, taking into account the geographical position of the objects. Applying this technique to the health area will provide information that can contribute to the planning of more adequate strategies for the prevention of work accidents. The original contribution of this work is to present an application of tools developed for Spatial Clustering which supply a set of graphic resources that have helped to discover knowledge and to support management in the work accidents area. © 2011 IEEE.
Abstract:
The post-processing of association rules is a difficult task, since a large number of patterns can be obtained. Many approaches have been developed to overcome this problem, such as objective measures and clustering, which are respectively used to: (i) highlight the potentially interesting knowledge in the domain; (ii) structure the domain, organizing the rules in groups that contain, somehow, similar knowledge. However, objective measures neither reduce nor organize the collection of rules, which makes the domain difficult to understand. On the other hand, clustering neither reduces the exploration space nor directs the user to interesting knowledge, which makes the search for relevant knowledge harder. This work proposes the PAR-COM (Post-processing Association Rules with Clustering and Objective Measures) methodology, which combines clustering and objective measures to reduce the association rule exploration space and direct the user to what is potentially interesting. PAR-COM thereby minimizes the user's effort during post-processing.
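To make the notion of an objective measure concrete, the sketch below computes three standard ones (support, confidence and lift) for association rules and then keeps the highest-lift rule in each group of rules. It is only an illustration of combining grouping with a measure: the abstract does not specify which measures or clustering PAR-COM uses, so the choice of lift, the hand-made groups, the function name `rule_measures` and the toy transactions are all assumptions.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # contain the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # contain the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # contain both sides
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical transactions, each one a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Hypothetical groups of rules, standing in for the output of a clustering step.
rule_groups = {
    "group_1": [(("bread",), ("milk",)), (("bread",), ("butter",))],
    "group_2": [(("milk",), ("butter",))],
}

# Keep, for each group, the rule with the highest lift.
best_per_group = {
    name: max(rules, key=lambda r: rule_measures(transactions, r[0], r[1])[2])
    for name, rules in rule_groups.items()
}

support, confidence, lift = rule_measures(transactions, ("bread",), ("milk",))
print(support, confidence, lift)
print(best_per_group)
```

Selecting a few representative rules per group is one way a user's exploration space could shrink; other measures or selection criteria would slot into the same structure.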