962 results for Clustering a large document collection


Relevance:

30.00%

Publisher:

Abstract:

In this paper we propose a nature-inspired approach that can boost the Optimum-Path Forest (OPF) clustering algorithm by optimizing its parameters in a discrete lattice. Experiments on two public datasets have shown that the proposed algorithm can find parameter values similar to those obtained by exhaustive search. The proposed technique is, however, faster than the traditional one, which makes it attractive for intrusion detection in large-scale traffic networks. © 2012 IEEE.

Relevance:

30.00%

Publisher:

Abstract:

Includes bibliography

Relevance:

30.00%

Publisher:

Abstract:

Advances in wireless communication and microelectronics have enabled the development of micro-sensor devices capable of monitoring large regions. Made up of thousands of sensor nodes working collaboratively, Wireless Sensor Networks face severe energy constraints because of the limited battery capacity of the nodes that make up the network. Energy consumption can be minimized by allowing only a few special nodes, called Cluster Heads, to be responsible for receiving data from the nodes in their cluster and forwarding these data to a collection point called the Base Station. Choosing the ideal Cluster Head helps extend the network's stability period, maximizing its lifetime. The proposal presented in this dissertation uses Fuzzy Logic and the k-means algorithm, based on information centralized at the Base Station, to elect the ideal Cluster Head in heterogeneous Wireless Sensor Networks. The criteria used to select the Cluster Head are based on node centrality, energy level, and proximity to the Base Station. This dissertation presents the disadvantages of using local information to elect the cluster leader and the importance of treating the energy discrepancies among the network's nodes in a discriminating way. The proposal is compared with the Low Energy Adaptive Clustering Hierarchy (LEACH) and Distributed energy-efficient clustering algorithm for heterogeneous Wireless sensor networks (DEEC) algorithms. The comparison is based on the end of the stability period as well as on the network's lifetime.
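The selection idea in this abstract (group the nodes with k-means at the Base Station, then pick one head per cluster from energy, centrality and proximity to the Base Station) can be sketched as follows. This is only an illustration under stated assumptions: a plain weighted sum stands in for the dissertation's fuzzy inference, and the function names, weights, coordinates and energy values are all hypothetical.

```python
import math

def mean_point(points, members):
    """Centroid of the 2-D points whose indices are listed in members."""
    return (sum(points[i][0] for i in members) / len(members),
            sum(points[i][1] for i in members) / len(members))

def assign(points, centroids):
    """Assign every point index to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for idx, p in enumerate(points):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
        clusters[nearest].append(idx)
    return clusters

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's k-means starting from the given centroids."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(points, m) if m else centroids[c]
                     for c, m in enumerate(clusters)]
    return assign(points, centroids), centroids

def elect_head(members, points, energy, centroid, base_station,
               weights=(0.4, 0.3, 0.3)):
    """Pick the member with the best weighted score.

    ASSUMPTION: a weighted sum replaces fuzzy inference; weights are made up.
    """
    w_energy, w_central, w_base = weights
    max_e = max(energy[i] for i in members) or 1.0
    max_c = max(math.dist(points[i], centroid) for i in members) or 1.0
    max_b = max(math.dist(points[i], base_station) for i in members) or 1.0

    def score(i):
        return (w_energy * energy[i] / max_e
                + w_central * (1 - math.dist(points[i], centroid) / max_c)
                + w_base * (1 - math.dist(points[i], base_station) / max_b))

    return max(members, key=score)

# Hypothetical network: two groups of four nodes, Base Station at (5, 5).
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 10), (11, 10), (10, 11), (11, 11)]
energy = [0.5, 0.5, 0.5, 2.0, 1.0, 1.0, 1.0, 1.0]
base_station = (5, 5)

clusters, centroids = kmeans(points, [(0, 0), (11, 11)])
heads = [elect_head(m, points, energy, c, base_station)
         for m, c in zip(clusters, centroids)]
```

In this toy network the first cluster elects its highest-energy node, while the second, where all energies are equal, elects the node closest to the Base Station.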

Relevance:

30.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

30.00%

Publisher:

Abstract:

Graduate Program in Agronomy (Agricultural Entomology) - FCAV

Relevance:

30.00%

Publisher:

Abstract:

Issues related to association mining have received attention, especially those aimed at discovering interesting patterns and facilitating the search for them. A promising approach in this context is the application of clustering in the pre-processing step. In this paper, eleven metrics are proposed to provide an assessment procedure that supports the evaluation of this kind of approach. To propose the metrics, a subjective evaluation was carried out. The metrics are important since they provide criteria to: (a) analyze the methodologies, (b) identify their positive and negative aspects, (c) carry out comparisons among them and, therefore, (d) help users select the most suitable solution for their problems. Besides, the metrics make users think about aspects related to the problems and provide a flexible way to solve them. Some experiments were carried out to show how the metrics can be used and how useful they are.

Relevance:

30.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

30.00%

Publisher:

Abstract:

The Thomas Jefferson Johnston Papers consist of the Civil War diary of Thomas Jefferson Johnston (1837-1894), covering 1861 to August 7, 1864. Also included are a transcription of the journal and contextual notes on what was occurring in the war at large, written in 1992 by Robert James Johnston (1945-), great-grandson of Thomas Jefferson Johnston.

Relevance:

30.00%

Publisher:

Abstract:

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.

Relevance:

30.00%

Publisher:

Abstract:

Background: The CUPID (Cultural and Psychosocial Influences on Disability) study was established to explore the hypothesis that common musculoskeletal disorders (MSDs) and associated disability are importantly influenced by culturally determined health beliefs and expectations. This paper describes the methods of data collection and various characteristics of the study sample. Methods/Principal Findings: A standardised questionnaire covering musculoskeletal symptoms, disability and potential risk factors was used to collect information from 47 samples of nurses, office workers, and other (mostly manual) workers in 18 countries from six continents. In addition, local investigators provided data on economic aspects of employment for each occupational group. Participation exceeded 80% in 33 of the 47 occupational groups, and after pre-specified exclusions, analysis was based on 12,426 subjects (92 to 1018 per occupational group). As expected, there was high usage of computer keyboards by office workers, while nurses had the highest prevalence of heavy manual lifting in all but one country. There was substantial heterogeneity between occupational groups in economic and psychosocial aspects of work; three- to fivefold variation in awareness of someone outside work with musculoskeletal pain; and more than tenfold variation in the prevalence of adverse health beliefs about back and arm pain, and in awareness of terms such as "repetitive strain injury" (RSI). Conclusions/Significance: The large differences in psychosocial risk factors (including knowledge and beliefs about MSDs) between occupational groups should allow the study hypothesis to be addressed effectively.

Relevance:

30.00%

Publisher:

Abstract:

Aims. Spectroscopic, polarimetric, and high spectral resolution interferometric data covering the period 1995-2011 are analyzed to document the transition into a new phase of circumstellar disk activity in the classical Be-shell star 48 Lib. The objective is to use this broad data set to additionally test disk oscillations as the basic underlying dynamical process. Methods. The long-term disk evolution is described using the V/R ratio of the violet and red emission components of Hα and Brγ, radial velocities and profiles of He I and optical metal shell lines, as well as multi-band BVRI polarimetry. Single-epoch broad-band and high-resolution interferometric visibilities and phases are discussed with respect to a classical disk model and the given baseline orientations. Results. Spectroscopic signatures of disk asymmetries in 48 Lib vanished in the late nineties but recovered some time between 2004 and 2007, as shown by a new large-amplitude and long-duration V/R cycle. Variations in the radial velocity and line profile of conventional shell lines correlate with the V/R behavior. They are shared by narrow absorption cores superimposed on otherwise seemingly photospheric He I lines, which may form in high-density gas at the inner disk close to the photosphere. Large radial velocity variations continued also during the V/R-quiet years, suggesting that V/R may not always be a good indicator of global density waves in the disk. The comparison of the polarization after the recovery of the V/R activity shows a slight increase, while the polarization angle has been constant for more than 20 years, placing tight limits on any 3-D precession or warping of the disk. The broad H-band interferometry gives a disk diameter of (1.72 ± 0.2) mas (equivalent to 15 stellar radii), a disk position angle of (50 ± 9)° and a relatively low disk flattening of 1.66 ± 0.3. Within the errors the same disk position angle is derived from polarimetric observations and from photocenter shifts across Brγ. The high-resolution interferometric visibility and phase profiles show a double or even multiple-component structure. A preliminary estimate based on the size of the Brγ emitting region indicates a large diameter for the disk (tens of stellar radii). Overall, no serious contradiction between the observations and the disk-oscillation model could be construed.

Relevance:

30.00%

Publisher:

Abstract:

This dissertation examines the challenges and limits that graph analysis algorithms encounter on distributed architectures made up of personal computers. In particular, it analyzes the behavior of the PageRank algorithm as implemented in a popular C++ library for distributed graph analysis, the Parallel Boost Graph Library (Parallel BGL). The results presented here show that the Bulk Synchronous Parallel programming model is unsuited to an efficient implementation of PageRank on clusters of personal computers. The implementation analyzed exhibited negative scalability: the execution time of the algorithm increases linearly with the number of processors. These results were obtained by running the Parallel BGL PageRank algorithm on a cluster of 43 dual-core PCs with 2 GB of RAM each, using several graphs chosen to make it easier to identify the variables that influence scalability. Graphs representing different models gave different results, showing that there is a relationship between the clustering coefficient and the slope of the line representing time as a function of the number of processors. For example, Erdős–Rényi graphs, which have a low clustering coefficient, were the worst case in the PageRank tests, while Small-World graphs, which have a high clustering coefficient, were the best case. The size of the graph also showed a particularly interesting influence on execution time: the relationship between the number of nodes and the number of edges was shown to determine the total time.
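For readers unfamiliar with the algorithm under study, the following is a minimal sequential power-iteration sketch of PageRank. It is not the Parallel BGL implementation; the function name, the damping factor of 0.85 and the example graph are assumptions made for illustration. Each pass of the loop corresponds roughly to one superstep of a Bulk Synchronous Parallel version, in which every vertex would send its rank contributions to its successors before a global barrier.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict {node: [successor, ...]}.

    Every successor must also appear as a key of the dict.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        # Each node splits its current rank evenly among its successors.
        for v in nodes:
            successors = graph[v]
            for w in successors:
                new_rank[w] += damping * rank[v] / len(successors)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical four-node graph; node "d" has no incoming edges.
example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example_graph)
```

The inner loop touches every edge once per iteration, which is the traffic a distributed version has to exchange between processors at each superstep.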

Relevance:

30.00%

Publisher:

Abstract:

In Sub-Saharan Africa, non-democratic events such as civil wars and coups d'état destroy economic development. This study investigates both domestic and spatial effects on the likelihood of civil wars and coups d'état. For civil wars, a common research conclusion is that higher income growth helps to stop wars. This study adds a concern for ethnic fractionalization. IV-2SLS is applied to overcome the causality problem. The findings document that income growth significantly reduces both the number of wars and their degree of violence in highly ethnically fractionalized countries; otherwise there is a trade-off: in countries with a few large ethnic groups, income growth reduces the number of wars but increases their level of violence. Policies promoting growth should therefore consider ethnic composition. This study also investigates the clustering and contagion of civil wars using spatial panel data models. The onset, incidence and end of civil conflicts spread across the network of neighboring countries, while peace, the end of conflicts, diffuses only to the nearest neighbor. There is evidence of an indirect link through which income growth in neighboring countries, without too much inequality, reduces the likelihood of civil wars. For coups d'état, this study revisits their diffusion both for all types of coups and for successful ones only. The results show that both domestic and spatial determinants exist, in different periods. Domestic income growth plays a major role in reducing the likelihood of a coup before the end of the Cold War, while spatial effects have a negative effect afterward. Results for the probability of a coup succeeding are similar. Since the end of the Cold War, international organisations have seriously promoted democracy, with pressure against coups d'état, and this seems to be effective. In sum, this study indicates the role of domestic ethnic fractionalization and of the spread of neighboring effects in the likelihood of non-democratic events in a country. Policy implementation should take these factors into account.
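The IV-2SLS step mentioned in this abstract can be illustrated with a minimal two-stage least squares sketch for a single regressor and a single instrument. The data below are synthetic and chosen by hand so the arithmetic is exact; they are not taken from the study, and the variable names are hypothetical.

```python
def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic toy data (an assumption, not from the study):
# z is the instrument, u an unobserved confounder uncorrelated with z.
z = [1.0, 2.0, 3.0, 4.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]            # endogenous regressor
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true effect of x is 2

# Naive OLS is biased because x is correlated with the confounder u.
_, beta_ols = ols(x, y)

# Stage 1: regress x on the instrument and keep the fitted values.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress y on the fitted values from stage 1.
_, beta_2sls = ols(x_hat, y)
```

Here the naive slope overstates the effect, while the two-stage estimate recovers the true coefficient of 2 because the instrument is uncorrelated with the confounder by construction.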