20 results for Clustering a large document collection

in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Classification is a fundamental activity for the organization and management of archival document collections, since it underlies the priorities for description and the bases for appraisal procedures. Drawing on the works fundamental to the theoretical and methodological development of Archival Science, this article characterizes the historical and conceptual background of the notion of classification. It aims to answer some questions about the growth of its importance during the development of its theory and about its use today. It also seeks to characterize the history of Archival Science as a discipline, since classification was one of the first activities to be theorized.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels have to be built using only the terms in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for the selection of good candidates for labels of hierarchical document clusters. The candidates are processed by a classical method to generate the labels. The idea of the proposed method is to process each parent-child relationship of the nodes as an antecedent-consequent relationship of association rules. The experimental results show that the proposed method can improve the precision and recall of labels obtained by classical methods. © 2010 Springer-Verlag.
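As a rough illustration of the antecedent-consequent idea, the sketch below computes support and confidence for term-to-term rules over a toy set of documents. It is not the SeCLAR implementation: the documents, the thresholds, and the function name `term_rules` are hypothetical, and only the standard definitions of support and confidence are assumed.

```python
from itertools import permutations


def term_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for term pairs.

    Hypothetical helper: each document is a list of terms, and a rule
    a -> b is kept when it meets both thresholds.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    rules = []
    for a, b in permutations(vocab, 2):
        count_a = sum(1 for d in doc_sets if a in d)
        count_ab = sum(1 for d in doc_sets if a in d and b in d)
        support = count_ab / n           # fraction of documents containing both terms
        confidence = count_ab / count_a  # fraction of a's documents that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy collection (made up for illustration).
docs = [
    ["cluster", "label", "rule"],
    ["cluster", "label"],
    ["cluster", "rule"],
    ["label", "cluster"],
]
rules = term_rules(docs)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support}, confidence={confidence}")
```

In this toy collection, a rule such as `label -> cluster` survives because every document containing `label` also contains `cluster`; terms selected this way could then be passed to a classical labeling method.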

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The post-processing of association rules is a difficult task, since a large number of patterns can be obtained. Many approaches have been developed to overcome this problem, such as objective measures and clustering, which are respectively used to: (i) highlight the potentially interesting knowledge in the domain; (ii) structure the domain, organizing the rules in groups that contain somehow similar knowledge. However, objective measures neither reduce nor organize the collection of rules, which makes the domain difficult to understand. Clustering, on the other hand, neither reduces the exploration space nor directs the user toward interesting knowledge, which makes the search for relevant knowledge harder. This work proposes the PAR-COM (Post-processing Association Rules with Clustering and Objective Measures) methodology, which combines clustering and objective measures to reduce the association rule exploration space and direct the user to what is potentially interesting. PAR-COM thereby minimizes the user's effort during post-processing.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. Thus, these candidates can be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods only considering the terms which are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

One objective of the feeder reconfiguration problem in distribution systems is to minimize the power losses for a specific load. The mathematical model of this problem is a nonlinear mixed-integer problem that is generally hard to solve. This paper proposes an algorithm based on artificial neural network theory. In this context, clustering techniques to determine the best training set for a single neural network with generalization ability are also presented. The proposed methodology was applied to two electrical systems and produced good results. Moreover, the methodology can be employed for large-scale systems in a real-time environment.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembly. This indicated that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes were tagged.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The present study aimed at evaluating the histo-morphological changes resulting from different fasting periods before the collection of tissue samples in different segments of the small intestine (duodenum, jejunum and ileum) of 7-d-old male chicks of a broiler and a layer strain. A completely randomized experimental design was used in a 2x7 factorial arrangement, with two strains with different growth rates (Ross 308 and HyLine® W36) and seven fasting periods (0, 2, 4, 6, 8, 10 and 12 hours), with six replicates, totaling 84 birds. The comparison of the morphometrics of the duodenum, jejunum and ileum of broiler and layer chicks demonstrated faster digestive tract development in broilers relative to layers. The fasting period caused morphological changes in the liver and small and large intestines in both strains. Therefore, in studies involving organ weights and intestinal morphometrics, birds must not be submitted to fasting before tissue collection.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We derive constraints on a simple quintessential inflation model, based on a spontaneously broken Φ⁴ theory, imposed by the Wilkinson Microwave Anisotropy Probe three-year data (WMAP3) and by galaxy clustering results from the Sloan Digital Sky Survey (SDSS). We find that the scale of symmetry breaking must be larger than about 3 Planck masses in order for inflation to generate acceptable values of the scalar spectral index and of the tensor-to-scalar ratio. We also show that the resulting quintessence equation of state can evolve rapidly at recent times and hence can potentially be distinguished from a simple cosmological constant in this parameter regime.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We study a model for dynamical localization of topology using ideas from non-commutative geometry and topology in quantum mechanics. We consider a collection X of N one-dimensional manifolds and the corresponding set of boundary conditions (self-adjoint extensions) of the Dirac operator D. The set of boundary conditions encodes the topology and is parameterized by unitary matrices g. A particular geometry is described by a spectral triple x(g) = (A_X, ℋ_X, D(g)). We define a partition function for the sum over all g. In this model topology fluctuates but the dimension is kept fixed. We use the spectral principle to obtain an action for the set of boundary conditions. Together with invariance principles, the procedure fixes the partition function for fluctuating topologies. The model has one free parameter β and is equivalent to a one-plaquette gauge theory. We argue that topology becomes localized at β = ∞ for any value of N. Moreover, the system undergoes a third-order phase transition at β = 1 for large N. We give a topological interpretation of the phase transition by looking at how it affects the topology. © SISSA/ISAS 2004.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Wireless Sensor Networks (WSN) are a special kind of ad hoc network usually deployed in a monitoring field in order to detect some physical phenomenon. Due to the low dependability of individual nodes, small radio coverage and large areas to be monitored, the nodes are generally organized in small clusters. Moreover, a large number of WSN nodes is usually deployed in the monitoring area to increase WSN dependability. Therefore, good cluster head positioning is a desirable characteristic in a WSN. In this paper, we propose a hybrid clustering algorithm based on community detection in complex networks and the traditional K-means clustering technique: the QK-Means algorithm. Simulation results show that QK-Means detects communities and sub-communities, so that the lost-message rate decreases and WSN coverage increases. © 2012 IEEE.
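For readers unfamiliar with the K-means half of that hybrid, here is a minimal sketch of plain K-means (Lloyd's algorithm) on 2D node coordinates. It is not QK-Means and omits the community detection step entirely; the node positions, the naive first-k initialization, and the reading of centroids as candidate cluster head positions are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means on 2D points; returns (centroids, assignment).

    Assumption: centroids are initialized to the first k points, which
    keeps the sketch deterministic but is not a recommended strategy.
    """
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each node joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Hypothetical sensor node coordinates forming two loose groups.
nodes = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0), (2.0, 0.0), (12.0, 10.0)]
heads, assignment = kmeans(nodes, k=2)
print(heads, assignment)
```

Each resulting centroid marks the mean position of one group of nodes; in a real deployment the node closest to that point would be a natural cluster head candidate.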

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this paper we propose a nature-inspired approach that can boost the Optimum-Path Forest (OPF) clustering algorithm by optimizing its parameters in a discrete lattice. Experiments on two public datasets have shown that the proposed algorithm can find parameter values similar to those obtained by exhaustive search. Moreover, the proposed technique is faster than the traditional one, which makes it attractive for intrusion detection in large-scale traffic networks. © 2012 IEEE.