21 results for Labeling hierarchical clustering

in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"


Relevance:

100.00%

Publisher:

Abstract:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. Thus, these candidates can be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods only considering the terms which are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.

Relevance:

90.00%

Publisher:

Abstract:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels have to be built using only the terms in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for the selection of good candidates for labels of hierarchical document clusters. The candidates are processed by a classical method to generate the labels. The idea of the proposed method is to process each parent-child relationship of the nodes as an antecedent-consequent relationship of association rules. The experimental results show that the proposed method can improve the precision and recall of labels obtained by classical methods. © 2010 Springer-Verlag.
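The association-rule idea in this abstract can be illustrated with a minimal sketch. It assumes each document is a set of terms and that a rule "antecedent term -> consequent term" is scored with the usual support and confidence measures. The function names, thresholds and toy documents are hypothetical, and this is not the SeCLAR implementation, which the abstract does not detail.

```python
from itertools import permutations


def rule_stats(docs, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    n_ante = sum(1 for d in docs if antecedent in d)
    n_both = sum(1 for d in docs if antecedent in d and consequent in d)
    support = n_both / len(docs) if docs else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


def candidate_terms(docs, min_support=0.3, min_confidence=0.8):
    """Terms that take part in at least one rule passing both thresholds.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    vocabulary = sorted(set().union(*docs)) if docs else []
    selected = set()
    for a, c in permutations(vocabulary, 2):
        support, confidence = rule_stats(docs, a, c)
        if support >= min_support and confidence >= min_confidence:
            selected.update((a, c))
    return sorted(selected)


# Hypothetical toy collection: each document is a set of terms.
toy_docs = [
    {"cluster", "label", "hierarchy"},
    {"cluster", "label", "rule"},
    {"cluster", "hierarchy"},
    {"soil", "maize"},
]
print(rule_stats(toy_docs, "label", "cluster"))
print(candidate_terms(toy_docs))
```

In this sketch the surviving terms would then be handed to a classical labeling method, which is the division of work the abstract describes. How rules are actually derived from parent-child node pairs is left open here.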

Relevance:

80.00%

Publisher:

Abstract:

Soils under no-tillage production systems are subject to compaction caused by machinery traffic, which makes it necessary to monitor changes in the physical environment; when unfavorable, this environment restricts root growth and may reduce crop yield. The objective of this work was to evaluate the effect of different compaction intensities on the physical quality of a medium-textured Red Latosol, located in Jaboticabal (SP), under maize cultivation, using multivariate statistical methods. The experimental design was completely randomized, with six compaction intensities and four replications. Undisturbed soil samples were collected in the 0.02-0.05, 0.08-0.11 and 0.15-0.18 m layers to determine soil bulk density (Ds), in the 0-0.20 m layer. The crop characteristics evaluated were: root density, root diameter, root dry matter, plant height, height of insertion of the first ear, stalk diameter and plant dry matter. Cluster and principal component analyses made it possible to identify three groups of high, medium and low maize plant productivity, according to soil, root system and shoot variables. The classification of the accessions into groups was done by three methods: hierarchical clustering, non-hierarchical k-means clustering and principal component analysis. The principal components showed that high maize yields are correlated with good shoot growth under conditions of lower soil bulk density, which provide high root dry matter production, although with roots of small diameter. The physical quality of the Red Latosol for maize cultivation was maintained up to a soil bulk density of 1.38 Mg m⁻³.

Relevance:

80.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

80.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

80.00%

Publisher:

Abstract:

Paracoccidioides brasiliensis is a thermally dimorphic fungus, and causes the most prevalent systemic mycosis in Latin America. Infection is initiated by inhalation of conidia or mycelial fragments by the host, followed by further differentiation into the yeast form. Information regarding gene expression by either form has rarely been addressed with respect to multiple time points of growth in culture. Here, we report on the construction of a genomic DNA microarray, covering approximately 25% of the genome of the organism, and its utilization in identifying genes and gene expression patterns during growth in vitro. Cloned, amplified inserts from randomly sheared genomic DNA (gDNA) and known control genes were printed onto glass slides to generate a microarray of over 12 000 elements. To examine gene expression, mRNA was extracted and amplified from mycelial or yeast cultures grown in semi-defined medium for 5, 8 and 14 days. Principal components analysis and hierarchical clustering indicated that yeast gene expression profiles differed greatly from those of mycelia, especially at earlier time points, and that mycelial gene expression changed less than gene expression in yeasts over time. Genes upregulated in yeasts were found to encode proteins shown to be involved in methionine/cysteine metabolism, respiratory and metabolic processes (of sugars, amino acids, proteins and lipids), transporters (small peptides, sugars, ions and toxins), regulatory proteins and transcription factors. Mycelial genes involved in processes such as cell division, protein catabolism, nucleotide biosynthesis and toxin and sugar transport showed differential expression. Sequenced clones were compared with Histoplasma capsulatum and Coccidioides posadasii genome sequences to assess potentially common pathways across species, such as sulfur and lipid metabolism, amino acid transporters, transcription factors and genes possibly related to virulence. 
We also analysed gene expression with time in culture and found that while transposable elements and components of respiratory pathways tended to increase in expression with time, genes encoding ribosomal structural proteins and protein catabolism tended to sharply decrease in expression over time, particularly in yeast. These findings expand our knowledge of the different morphological forms of P. brasiliensis during growth in culture.

Relevance:

80.00%

Publisher:

Abstract:

Sixty-nine Psidium accessions, collected in six Brazilian states, were analyzed by two non-hierarchical clustering methods and by principal components (PC), with the aim of guiding breeding programs. The variables analyzed were ascorbic acid, β-carotene, lycopene, total phenols, total flavonoids, antioxidant activity, titratable acidity, soluble solids, total soluble sugars, moisture content, lateral and transverse fruit diameter, pulp and seed weight per fruit, and number and yield of fruits per plant. Specific groupings were observed for the araçá accessions in the Tocher and k-means methods and in the three-dimensional dispersion of the four PCs. The araçá accessions were separated from the guava accessions. No specific grouping by state of collection was observed, indicating the absence of barriers to the propagation of guava accessions. The analyses suggest prospecting a larger number of germplasm samples in a smaller number of regions, as well as divergent accessions with a high content of nutritional compounds.

Relevance:

80.00%

Publisher:

Abstract:

The objective of this work was to compare different multivariate techniques in the characterization of 35 sesame genotypes using 769 RAPD markers. Genetic distances were obtained as the arithmetic complement of the Jaccard coefficient and grouped by the hierarchical methods of nearest neighbor, farthest neighbor and unweighted arithmetic means (UPGMA), by the Tocher optimization method, and by principal coordinate analysis. The grouping of the genotypes changed depending on the method used. Adopting the same genetic distance (0.36) as the cutoff value, four groups were distinguished by the nearest neighbor method, 13 by the farthest neighbor method, 11 by UPGMA and four by Tocher. Among the hierarchical methods, UPGMA gave the best fit between the original and estimated distances (CCC = 0.89). The principal coordinate analyses confirmed the low diversity among the genotypes. The greatest divergence occurred between the cultivars Seridó 1 and Arawaca 4, and the smallest between the genotypes VCR-101 and GP-3314. The first three principal coordinates accounted for 35.13% of the total variability, and 18 eigenvalues were needed to explain 81% of the genetic variation. The UPGMA and Tocher optimization methods and the principal coordinate analyses are complementary in forming the groups.
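The distance and cutoff steps in this abstract can be sketched in a few lines. The example assumes 0/1 marker profiles (band present or absent), computes the distance as 1 minus the Jaccard coefficient, and forms nearest neighbor (single linkage) groups at a cutoff. The genotype names and profiles are hypothetical toy data, not values from the study; only the cutoff 0.36 is taken from the abstract.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - Jaccard coefficient for two equal-length 0/1 marker profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two profiles with no bands at all count as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def single_linkage_groups(profiles, cutoff):
    """Groups from cutting a nearest neighbor dendrogram at `cutoff`.

    Cutting single linkage at a given height is equivalent to taking the
    connected components of the graph that joins every pair of items whose
    distance is <= cutoff, so a union-find structure is enough here.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for u, v in combinations(names, 2):
        if jaccard_distance(profiles[u], profiles[v]) <= cutoff:
            parent[find(u)] = find(v)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical band presence/absence data, not taken from the study.
toy_profiles = {
    "G1": [1, 1, 1, 0, 0, 1],
    "G2": [1, 1, 1, 0, 1, 1],
    "G3": [0, 0, 1, 1, 1, 0],
    "G4": [0, 0, 1, 1, 1, 1],
}
print(single_linkage_groups(toy_profiles, cutoff=0.36))
```

The connected-components shortcut holds only for nearest neighbor linkage. Farthest neighbor and UPGMA need the full agglomerative procedure, which is one reason the same cutoff yields different numbers of groups across methods.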

Relevance:

80.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

80.00%

Publisher:

Abstract:

Biomass burning is an important primary and secondary source of aerosol particles. The presence of carbonaceous particles in the respirable size range makes the study of this fraction important in view of possible health and climatic effects. The annual burning of sugar cane plantations causes emission of huge amounts of pyrogenic particles. Aerosol samples were collected in Araraquara city, São Paulo state, Brazil, during the harvest season for fine and coarse particles and bulk; they were analysed by electron-probe microanalysis, including facilities for low-Z element determination (low-Z EPMA) and by energy-dispersive X-ray fluorescence (EDXRF), in order to investigate the elemental composition of individual particles and bulk samples, respectively. Numerical analysis of the EPMA results by hierarchical clustering shows high contributions of carbonaceous particles that can be distinguished mainly in two different types: biogenic and carbon-rich. Additionally, two significant contributions of aluminosilicate particles were identified: as rather pure aluminosilicates or mixed with carbonaceous species. The EDXRF results are compatible with those of aerosol particles in Amazon, which is nowadays one of the main sources of biogenic particles in the world.

Relevance:

80.00%

Publisher:

Abstract:

(10) Hygiea is the fourth largest asteroid of the main belt, by volume and mass, and it is the largest member of its family, that is made mostly by low-albedo, C-type asteroids, typical of the outer main belt. Like many other large families, it is associated with a 'halo' of objects, that extends far beyond the boundary of the core family, as detected by traditional hierarchical clustering methods (HCM) in proper element domains. Numerical simulations of the orbital evolution of family members may help in estimating the family and halo family age, and the original ejection velocity field. But, in order to minimize the errors associated with including too many interlopers, it is important to have good estimates of family membership that include available data on local asteroid taxonomy, geometrical albedo and local dynamics. For this purpose, we obtained synthetic proper elements and frequencies of asteroids in the Hygiea orbital region, with their errors. We revised the current knowledge on asteroid taxonomy, including Sloan Digital Sky Survey-Moving Object Catalog 4th release (SDSS-MOC 4) data, and geometric albedo data from Wide-field Infrared Survey Explorer (WISE) and Near-Earth Object WISE (NEOWISE). We identified asteroid family members using HCM in the domain of proper elements (a, e, sin (i)) and in the domains of proper frequencies most appropriate to study diffusion in the local web of secular resonances, and eliminated possible interlopers based on taxonomic and geometrical albedo considerations. To identify the family halo, we devised a new hierarchical clustering method in an extended domain that includes proper elements, principal components PC1, PC2 obtained based on SDSS photometric data and, for the first time, WISE and NEOWISE geometric albedo. Data on asteroid size distribution, light curves and rotations were also revised for the Hygiea family. 
The Hygiea family is the largest group in its region, with two smaller families in proper element domain and 18 families in various frequencies domains identified in this work for the first time. Frequency groups tend to extend vertically in the (a, sin (i)) plane and cross not only the Hygiea family but also the near C-type families of Themis and Veritas, causing a mixture of objects all of relatively low albedo in the Hygiea family area. A few high-albedo asteroids, most likely associated with the Eos family, are also present in the region. Finally, the new multidomains hierarchical clustering method allowed us to obtain a good and robust estimate of the membership of the Hygiea family halo, quite separated from other asteroids families halo in the region, and with a very limited (about 3 per cent) presence of likely interlopers. © 2013 The Author Published by Oxford University Press on behalf of the Royal Astronomical Society.

Relevance:

80.00%

Publisher:

Abstract:

Undifferentiated high-grade pleomorphic sarcomas (UPSs) display aggressive clinical behavior and frequently develop local recurrence and distant metastasis. Because these sarcomas often share similar morphological patterns with other tumors, particularly leiomyosarcomas (LMSs), classification by exclusion is frequently used. In this study, array-based comparative genomic hybridization (array CGH) was used to analyze 20 UPS and 17 LMS samples from untreated patients. The LMS samples presented a lower frequency of genomic alterations compared with the UPS samples. The most frequently altered UPS regions involved gains at 20q13.33 and 7q22.1 and losses at 3p26.3. Gains at 8q24.3 and 19q13.12 and losses at 9p21.3 were frequently detected in the LMS samples. Of these regions, gains at 1q21.3, 11q12.2-q12.3, 16p11.2, and 19q13.12 were significantly associated with reduced overall survival times in LMS patients. A multivariate analysis revealed that gains at 1q21.3 were an independent prognostic marker of shorter survival times in LMS patients (HR = 13.76; P = 0.019). Although the copy number profiles of the UPS and LMS samples could not be distinguished using unsupervised hierarchical clustering analysis, one of the three clusters presented cases associated with poor prognostic outcome (P = 0.022). A relative copy number analysis for the ARNT, SLC27A3, and PBXIP1 genes was performed using quantitative real-time PCR in 11 LMS and 16 UPS samples. Gains at 1q21-q22 were observed in both tumor types, particularly in the UPS samples. These findings provide strong evidence for the existence of a genomic signature to predict poor outcome in a subset of UPS and LMS patients. © 2013 Silveira et al.

Relevance:

80.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

80.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

80.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)