40 results for Hierarchical Clustering Model
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels have to be built using only the terms in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for the selection of good candidates for labels of hierarchical document clusters. The candidates are processed by a classical method to generate the labels. The idea of the proposed method is to process each parent-child relationship of the nodes as an antecedent-consequent relationship of association rules. The experimental results show that the proposed method can improve the precision and recall of labels obtained by classical methods. © 2010 Springer-Verlag.
Abstract:
One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. Thus, these candidates can be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods only considering the terms which are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.
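As a rough illustration of the association-rule idea behind SeCLAR, the sketch below treats each document as a transaction of terms and keeps the term pairs whose rule (antecedent to consequent) meets minimum support and confidence. It is not the authors' implementation: the function name `candidate_rules`, the thresholds, and the toy documents are all assumptions made for this example.

```python
from itertools import permutations


def candidate_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for term pairs.

    Hypothetical helper: each document is one transaction (a set of terms),
    and the thresholds are arbitrary illustrative values.
    """
    transactions = [set(d) for d in docs]
    n = len(transactions)
    terms = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(terms, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n                      # share of docs with both terms
        confidence = count_ab / count_a if count_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy collection (invented data).
docs = [
    ["cluster", "label", "hierarchy"],
    ["cluster", "label"],
    ["cluster", "rule"],
    ["label", "cluster", "rule"],
]
rules = candidate_rules(docs)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy collection only the rules whose consequent is the term present in every document survive, which is the kind of general term one might expect near the root of a hierarchy.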
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Soils under no-tillage production systems are subject to compaction caused by machinery traffic, which makes it necessary to monitor changes in the physical environment; when unfavorable, this environment restricts root growth and may reduce crop yield. The objective of this study was to evaluate the effect of different compaction intensities on the physical quality of a medium-textured Red Latosol (Latossolo Vermelho) located in Jaboticabal (SP), under maize, using multivariate statistical methods. The experimental design was completely randomized, with six compaction intensities and four replicates. Undisturbed soil samples were collected from the 0.02-0.05, 0.08-0.11 and 0.15-0.18 m layers to determine soil bulk density (Ds) in the 0-0.20 m layer. The crop characteristics evaluated were root density, root diameter, root dry matter, plant height, height of insertion of the first ear, stalk diameter and plant dry matter. Cluster and principal component analyses identified three groups of maize plants with high, medium and low yield, according to soil, root system and shoot variables. The accessions were classified into groups by three methods: hierarchical clustering, non-hierarchical k-means clustering and principal component analysis. The principal components showed that high maize yields are correlated with good shoot growth under lower soil bulk density, which leads to high production of root dry matter, although of small diameter. The physical quality of the Red Latosol for maize cultivation was maintained up to a soil bulk density of 1.38 Mg m⁻³.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Paracoccidioides brasiliensis is a thermally dimorphic fungus, and causes the most prevalent systemic mycosis in Latin America. Infection is initiated by inhalation of conidia or mycelial fragments by the host, followed by further differentiation into the yeast form. Information regarding gene expression by either form has rarely been addressed with respect to multiple time points of growth in culture. Here, we report on the construction of a genomic DNA microarray, covering approximately 25% of the genome of the organism, and its utilization in identifying genes and gene expression patterns during growth in vitro. Cloned, amplified inserts from randomly sheared genomic DNA (gDNA) and known control genes were printed onto glass slides to generate a microarray of over 12 000 elements. To examine gene expression, mRNA was extracted and amplified from mycelial or yeast cultures grown in semi-defined medium for 5, 8 and 14 days. Principal components analysis and hierarchical clustering indicated that yeast gene expression profiles differed greatly from those of mycelia, especially at earlier time points, and that mycelial gene expression changed less than gene expression in yeasts over time. Genes upregulated in yeasts were found to encode proteins shown to be involved in methionine/cysteine metabolism, respiratory and metabolic processes (of sugars, amino acids, proteins and lipids), transporters (small peptides, sugars, ions and toxins), regulatory proteins and transcription factors. Mycelial genes involved in processes such as cell division, protein catabolism, nucleotide biosynthesis and toxin and sugar transport showed differential expression. Sequenced clones were compared with Histoplasma capsulatum and Coccidioides posadasii genome sequences to assess potentially common pathways across species, such as sulfur and lipid metabolism, amino acid transporters, transcription factors and genes possibly related to virulence. 
We also analysed gene expression with time in culture and found that while transposable elements and components of respiratory pathways tended to increase in expression with time, genes encoding ribosomal structural proteins and protein catabolism tended to sharply decrease in expression over time, particularly in yeast. These findings expand our knowledge of the different morphological forms of P. brasiliensis during growth in culture.
Abstract:
Sixty-nine Psidium accessions, collected in six Brazilian states, were analyzed by two non-hierarchical clustering methods and by principal components (PC) to guide breeding programs. The variables analyzed were ascorbic acid, β-carotene, lycopene, total phenols, total flavonoids, antioxidant activity, titratable acidity, soluble solids, total soluble sugars, moisture content, lateral and transverse fruit diameter, pulp weight and seed weight per fruit, and number and yield of fruits per plant. Specific clusters were observed for the araçá (araçazeiro) accessions with the Tocher and k-means methods and in the three-dimensional dispersion of the four PCs. The araçá accessions were separated from the guava accessions. No specific clustering by state of collection was observed, indicating the absence of barriers to the propagation of guava accessions. The analyses suggest prospecting a larger number of germplasm samples in a smaller number of regions, as well as divergent accessions with a high content of nutritional compounds.
Abstract:
The objective of this work was to compare different multivariate techniques in the characterization of 35 sesame genotypes using 769 RAPD markers. Genetic distances were obtained as the arithmetic complement of the Jaccard coefficient and grouped by the hierarchical nearest neighbor (single linkage), furthest neighbor (complete linkage) and unweighted pair group method with arithmetic mean (UPGMA) methods, by the Tocher optimization method and by principal coordinate analysis. The grouping of genotypes changed according to the method used. Adopting the same genetic distance (0.36) as the cutoff value, four groups were distinguished with the nearest neighbor method, 13 with the furthest neighbor, 11 with UPGMA and four with Tocher. Among the hierarchical methods, UPGMA showed the best fit between original and estimated distances (cophenetic correlation coefficient, CCC = 0.89). The principal coordinate analyses confirmed the low diversity among the genotypes. The greatest divergence occurred between the cultivars Seridó 1 and Arawaca 4, and the smallest between the genotypes VCR-101 and GP-3314. The first three principal coordinates accounted for 35.13% of the total variability, and 18 eigenvalues were needed to explain 81% of the genetic variation. The UPGMA and Tocher optimization methods and the principal coordinate analyses are complementary in forming the groups.
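To make the distance and cutoff steps concrete, the sketch below computes the complement of the Jaccard coefficient on binary marker profiles (1 = band present) and groups genotypes with nearest neighbor (single) linkage at a fixed cutoff. Cutting a single-linkage tree at a given height is equivalent to taking the connected components of the graph that links every pair at or below that distance, which is what the code does. The marker profiles, genotype names and function names are invented for illustration; only the 0.36 cutoff comes from the abstract.

```python
def jaccard_distance(x, y):
    """Complement of the Jaccard coefficient for two binary marker profiles."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1.0 - both / either if either else 0.0


def single_linkage_groups(profiles, cutoff):
    """Group genotypes connected by a chain of pairwise distances <= cutoff."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard_distance(profiles[a], profiles[b]) <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical presence/absence profiles for four genotypes and six markers.
profiles = {
    "G1": [1, 1, 1, 0, 0, 1],
    "G2": [1, 1, 1, 0, 0, 0],
    "G3": [0, 0, 0, 1, 1, 1],
    "G4": [0, 0, 0, 1, 1, 0],
}
groups = single_linkage_groups(profiles, cutoff=0.36)
print(groups)
```

Furthest neighbor and UPGMA linkage need a full agglomerative procedure and would generally give different group counts at the same cutoff, as the abstract reports.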
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Structural modeling of two families of sol-gel derived Eu3+-doped organic/inorganic hybrids, based on the results of small-angle X-ray scattering experiments, is reported. The materials are composed of poly(oxyethylene) chains grafted at one or both ends to siloxane groups and are called mono- and di-urethanesils, respectively. A theoretical function corresponding to a two-level hierarchical structure model fits the experimental scattering curves well. The first level corresponds to small siloxane clusters embedded in a polymeric matrix. The secondary level is associated with the existence of siloxane cluster-rich domains surrounded by a cluster-depleted polymeric matrix. Results show that increasing europium doping favors the growth of the secondary domains. © 2002 Elsevier B.V. All rights reserved.
Abstract:
Biomass burning is an important primary and secondary source of aerosol particles. The presence of carbonaceous particles in the respirable size range makes the study of this fraction important in view of possible health and climatic effects. The annual burning of sugar cane plantations causes emission of huge amounts of pyrogenic particles. Aerosol samples were collected in Araraquara city, São Paulo state, Brazil, during the harvest season for fine and coarse particles and bulk; they were analysed by electron-probe microanalysis, including facilities for low-Z element determination (low-Z EPMA), and by energy-dispersive X-ray fluorescence (EDXRF), in order to investigate the elemental composition of individual particles and bulk samples, respectively. Numerical analysis of the EPMA results by hierarchical clustering shows high contributions of carbonaceous particles, which can be divided mainly into two types: biogenic and carbon-rich. Additionally, two significant contributions of aluminosilicate particles were identified: rather pure aluminosilicates and aluminosilicates mixed with carbonaceous species. The EDXRF results are compatible with those of aerosol particles in the Amazon, which is nowadays one of the main sources of biogenic particles in the world.
Abstract:
North's clustering method, which is based on a widely used ecological model, the nearest neighbor distance, was applied to the objective reconstruction of the chain of household-to-household transmission of variola minor (the mild form of smallpox). The discrete within-household outbreaks were considered as points which were ordered in a time sequence using a 10-40 day interval between introduction of the disease into a source household and a receptor household. The closer points in the plane were assumed to have a larger probability of being links of a chain of household-to-household spread of the disease. The five defining distances (Manhattan or city-block distance between presumptive source and receptor dwellings) were 100, 200, 300, 400 and 500 m. The subchain sets obtained with the five defining distances were compared with the subchains empirically reconstructed during the field study of the epidemic through direct investigation of personal contacts of the introductory cases with either introductory or subsequent cases from previously affected households. The criteria of fit of theoretical to empirical clusters were: (a) the number of clustered dwellings and subchains, (b) the number of dwellings in a subchain and (c) the position of dwellings in a subchain. The defining distance closest to the empirical findings was 200 m, which fully agrees with the travelling habits of the study population. Less close but acceptable approximations were obtained with 100, 300, 400 and 500 m. The latter two distances gave identical results, as if a clustering ceiling had been reached. It seems that North's clustering model may be used for an objective reconstruction of a chain of contagion whose links are discrete within-household outbreaks. © 1984.
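The linking rule described above can be sketched roughly as follows: each within-household outbreak is a point with plane coordinates in metres and a day of introduction, and a receptor household is linked to the nearest earlier household (Manhattan distance) that lies within the defining distance and whose introduction preceded it by 10 to 40 days. This is a simplified reading of the abstract, not North's published procedure; the function name, the tuple layout and the four households are invented for illustration.

```python
def link_households(outbreaks, defining_distance, min_lag=10, max_lag=40):
    """Map each household to its presumptive source household, or None.

    outbreaks: {name: (x_m, y_m, day_of_introduction)} (hypothetical layout).
    """
    links = {}
    for name, (x, y, day) in outbreaks.items():
        best, best_dist = None, None
        for other, (ox, oy, oday) in outbreaks.items():
            if other == name:
                continue
            lag = day - oday                      # days since the candidate source
            dist = abs(x - ox) + abs(y - oy)      # Manhattan (city-block) distance
            if min_lag <= lag <= max_lag and dist <= defining_distance:
                if best_dist is None or dist < best_dist:
                    best, best_dist = other, dist
        links[name] = best
    return links


# Invented example: coordinates in metres, day of introduction.
outbreaks = {
    "A": (0, 0, 0),
    "B": (100, 50, 15),
    "C": (150, 100, 32),
    "D": (600, 600, 40),
}
links = link_households(outbreaks, defining_distance=200)
print(links)
```

Rerunning the function with each of the five defining distances and comparing the resulting subchains against field-reconstructed ones mirrors the comparison the abstract describes.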
Abstract:
(10) Hygiea is the fourth largest asteroid of the main belt, by volume and mass, and it is the largest member of its family, which is made up mostly of low-albedo, C-type asteroids, typical of the outer main belt. Like many other large families, it is associated with a 'halo' of objects that extends far beyond the boundary of the core family, as detected by traditional hierarchical clustering methods (HCM) in proper element domains. Numerical simulations of the orbital evolution of family members may help in estimating the family and halo family age, and the original ejection velocity field. However, to minimize the errors associated with including too many interlopers, it is important to have good estimates of family membership that include available data on local asteroid taxonomy, geometrical albedo and local dynamics. For this purpose, we obtained synthetic proper elements and frequencies of asteroids in the Hygiea orbital region, with their errors. We revised the current knowledge on asteroid taxonomy, including Sloan Digital Sky Survey-Moving Object Catalog 4th release (SDSS-MOC 4) data, and geometric albedo data from Wide-field Infrared Survey Explorer (WISE) and Near-Earth Object WISE (NEOWISE). We identified asteroid family members using HCM in the domain of proper elements (a, e, sin (i)) and in the domains of proper frequencies most appropriate to study diffusion in the local web of secular resonances, and eliminated possible interlopers based on taxonomic and geometrical albedo considerations. To identify the family halo, we devised a new hierarchical clustering method in an extended domain that includes proper elements, principal components PC1, PC2 obtained based on SDSS photometric data and, for the first time, WISE and NEOWISE geometric albedo. Data on asteroid size distribution, light curves and rotations were also revised for the Hygiea family.
The Hygiea family is the largest group in its region, with two smaller families in the proper element domain and 18 families in various frequency domains identified in this work for the first time. Frequency groups tend to extend vertically in the (a, sin (i)) plane and cross not only the Hygiea family but also the nearby C-type families of Themis and Veritas, causing a mixture of objects all of relatively low albedo in the Hygiea family area. A few high-albedo asteroids, most likely associated with the Eos family, are also present in the region. Finally, the new multidomain hierarchical clustering method allowed us to obtain a good and robust estimate of the membership of the Hygiea family halo, quite separated from the halos of other asteroid families in the region, and with a very limited (about 3 per cent) presence of likely interlopers. © 2013 The Author. Published by Oxford University Press on behalf of the Royal Astronomical Society.