795 results for hierarchical clustering
Abstract:
Besides the spinal deformity, scoliosis notably modifies the general appearance of the trunk, resulting in trunk rotation, imbalance, and asymmetries that constitute patients' major concern. Existing classifications of scoliosis, based on the type of spinal curve as depicted on radiographs, are currently used to guide treatment strategies. Unfortunately, even when a perfect correction of the spinal curve is achieved, some trunk deformities remain, leaving patients dissatisfied with the treatment received. The purpose of this study is to identify possible shape patterns of trunk surface deformity associated with scoliosis. First, the trunk surface is represented by a multivariate functional trunk shape descriptor based on 3-D clinical measurements computed on cross sections of the trunk. Then, the classical formulation of hierarchical clustering is adapted to the case of multivariate functional data and applied to a set of 236 trunk surface 3-D reconstructions. The highest internal validity is obtained when considering 11 clusters that explain up to 65% of the variance in our dataset. Our clustering result shows a concordance with the radiographic classification of spinal curves in 68% of the cases. As opposed to radiographic evaluation, the trunk descriptor is 3-D and its functional nature offers a compact and elegant description of not only the type, but also the severity and extent of the trunk surface deformity along the trunk length. In future work, new management strategies based on the resulting trunk shape patterns could be devised to improve the esthetic outcome after treatment, and thus patient satisfaction.
Abstract:
The diversity of social bees was assessed at 15 sites across five locations of the Nilgiri Biosphere Reserve, Western Ghats, India, from January to December 2007. We also conducted floristic analyses of local vegetation in each site using one-hectare sample plots. All woody species with a dbh (diameter at breast height) : 30 cm were recorded within the plots. A total area of 9.72 ha was assessed for floristic composition. Similarity of floristic composition between sites was determined using Jaccard's distance measure, and a dendrogram was constructed based on the hierarchical clustering of floristic dissimilarities between sites. A Bee Importance Index (BII) was developed to give a measure of the bee diversity at each site. This index was the sum of the species richness of bees at a site and their visitation frequencies to flowers, calculated as mean flower visits per hour within 2 focal patches within one-hectare plots. The visits of bee species to flowers were also recorded. The Jaccard distance measure indicated that the montane sites were quite dissimilar to the low elevation sites in floristic diversity. The BII was 7-9 for the wet forest sites and ranged from 4 to 6 for drier forest sites. Seventy-three plant species were identified as social bee plants; of these, 45% were visited by one species of bee, 37% by two bee species and 18% by more than two bee species, indicating a certain degree of floral specialization among bees.
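The Jaccard-plus-dendrogram step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name a linkage rule, so average linkage is an assumption, and the site names and species sets are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two species sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(sites):
    """Agglomerative clustering; returns the merge history.

    sites: dict mapping site name -> set of species (hypothetical data).
    Each merge is (cluster_a, cluster_b, distance), clusters as tuples of names.
    """
    clusters = [(name,) for name in sorted(sites)]
    merges = []

    def dist(c1, c2):
        # Average pairwise Jaccard distance between members (assumed linkage).
        total = sum(jaccard_distance(sites[i], sites[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters, then repeat.
        a, b = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges

# Hypothetical species lists, for illustration only.
sites = {
    "montane_1": {"sp1", "sp2", "sp3"},
    "montane_2": {"sp1", "sp2", "sp4"},
    "lowland_1": {"sp5", "sp6", "sp7"},
    "lowland_2": {"sp5", "sp6", "sp8"},
}
merges = average_linkage(sites)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

The merge history is the information a dendrogram draws: sites sharing species join at small distances, and groups with no species in common join last at distance 1.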
Abstract:
K-Means is a popular clustering algorithm that adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints reduces the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees, in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) over a wide range of the parameter space. BSP-KM is more scalable than KD-KM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has been shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
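For reference, the `brute force' baseline described in this abstract can be sketched as below. This is a minimal sketch, not the BSP-KM or KD-KM method itself; the function names, the sample points and the initial centroids are illustrative assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_brute_force(points, centroids, max_iter=100):
    """Lloyd-style K-Means; every point is compared with every centroid."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: n * k distance computations (the 'brute force' cost).
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(old)
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Illustrative data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = kmeans_brute_force(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

Tree-based variants such as those compared in the abstract aim to skip part of the assignment step by ruling out centroids for whole groups of points at once; the update step is unchanged.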
Abstract:
The polar winter stratospheric vortex is a coherent structure that undergoes different types of deformation that can be revealed by the geometric invariant moments. Three moments are used—the aspect ratio, the centroid latitude, and the area of the vortex based on stratospheric data from the 40-yr ECMWF Re-Analysis (ERA-40) project—to study sudden stratospheric warmings. Hierarchical clustering combined with data image visualization techniques is used as well. Using the gap statistic, three optimal clusters are obtained based on the three geometric moments considered here. The 850-K potential vorticity field, as well as the vertical profiles of polar temperature and zonal wind, provides evidence that the clusters represent, respectively, the undisturbed (U), displaced (D), and split (S) states of the polar vortex. This systematic method for identifying and characterizing the state of the polar vortex using objective methods is useful as a tool for analyzing observations and as a test for climate models to simulate the observations. The method correctly identifies all previously identified major warmings and also identifies significant minor warmings where the atmosphere is substantially disturbed but does not quite meet the criteria to qualify as a major stratospheric warming.
Abstract:
The presence of 10 virulence genes was examined using polymerase chain reaction (PCR) in 365 European O157 and non-O157 Escherichia coli isolates associated with verotoxin production. Strain-specific PCR data were analysed using hierarchical clustering. The resulting dendrogram clearly separated O157 from non-O157 strains. The former clustered typical high-risk seropathotype (SPT) A strains from all regions, including Sweden and Spain, which were homogenous by Cramer's V statistic, and strains with less typical O157 features mostly from Hungary. The non-O157 strains divided into a high-risk SPTB harbouring O26, O111 and O103 strains, a group pathogenic to pigs, and a group with few virulence genes other than for verotoxin. The data demonstrate SPT designation and selected PCR separated verotoxigenic E. coli of high and low risk to humans; although more virulence genes or pulsed-field gel electrophoresis will need to be included to separate high-risk strains further for epidemiological tracing.
Abstract:
The urban heat island is a well-known phenomenon that impacts a wide variety of city operations. With greater availability of cheap meteorological sensors, it is possible to measure the spatial patterns of urban atmospheric characteristics with greater resolution. To develop robust and resilient networks, recognizing that sensors may malfunction, it is important to know when measurement points are providing additional information and also the minimum number of sensors needed to provide spatial information for particular applications. Here we consider the example of temperature data, and the urban heat island, through analysis of a network of sensors in the Tokyo metropolitan area (Extended METROS). The effect of reducing observation points from an existing meteorological measurement network is considered, using random sampling and sampling with clustering. The results indicated that sampling with hierarchical clustering can yield similar temperature patterns with up to a 30% reduction in measurement sites in Tokyo. The methods presented have broader utility in evaluating the robustness and resilience of existing urban temperature networks and in how networks can be enhanced by new mobile and open data sources.
Abstract:
Mesenchymal stem cells (MSC) are multipotent cells that can be obtained from several adult and fetal tissues, including human umbilical cord units. We have recently shown that umbilical cord tissue (UC) is richer in MSC than umbilical cord blood (UCB), but their origin and characteristics in blood as compared to the cord remain unknown. Here we compared, for the first time, the exonic protein-coding and intronic noncoding RNA (ncRNA) expression profiles of MSC from match-paired UC and UCB samples, harvested from the same donors, processed simultaneously and under the same culture conditions. The patterns of intronic ncRNA expression in MSC from UC and UCB paired units were highly similar, indicative of their common donor origin. The respective exonic protein-coding transcript expression profiles, however, were significantly different. Hierarchical clustering based on protein-coding expression similarities grouped MSC according to their tissue location rather than original donor. Genes related to systems development, osteogenesis and immune system were expressed at higher levels in UCB, whereas genes related to cell adhesion, morphogenesis, secretion, angiogenesis and neurogenesis were more highly expressed in UC cells. These molecular differences observed in tissue-specific MSC gene expression may reflect functional activities influenced by distinct niches and should be considered when developing clinical protocols involving MSC from different sources. In addition, these findings reinforce our previous suggestion on the importance of banking the whole umbilical cord unit for research or future therapeutic use.
Abstract:
Soils under no-tillage production systems are subject to compaction caused by machinery traffic, which makes it necessary to monitor changes in the physical environment; when unfavorable, this environment restricts root growth and may reduce crop yield. The objective of this work was to evaluate the effect of different compaction intensities on the physical quality of a medium-textured Latossolo Vermelho (Red Latosol) located in Jaboticabal (SP), under maize cultivation, using multivariate statistical methods. The experimental design was completely randomized, with six compaction intensities and four replications. Undisturbed soil samples were collected in the 0.02-0.05, 0.08-0.11 and 0.15-0.18 m layers to determine soil bulk density (Ds) in the 0-0.20 m layer. The crop characteristics evaluated were root density, root diameter, root dry matter, plant height, height of first ear insertion, stalk diameter and plant dry matter. Cluster and principal component analyses identified three groups of high, medium and low maize productivity, according to soil, root system and shoot variables. The classification of the accessions into groups was performed by three methods: hierarchical clustering, non-hierarchical k-means clustering and principal component analysis. The principal components showed that high maize yields are correlated with good shoot growth under conditions of lower soil bulk density, which provides high root dry matter production, although with roots of small diameter. The physical quality of the Latossolo Vermelho for maize cultivation was ensured up to a soil bulk density of 1.38 Mg m⁻³.
Abstract:
In this work, calibration models were constructed to determine the total lipid and moisture content of powdered milk samples. For this purpose, near-infrared (NIR) diffuse reflectance spectroscopy was combined with multivariate calibration. Initially, the spectral data were subjected to multiplicative scatter correction (MSC) and Savitzky-Golay smoothing. Then, the samples were divided into subgroups by hierarchical clustering analysis (HCA) with the Ward linkage criterion. This made it possible to build partial least squares (PLS) regression models for the calibration and prediction of total lipid and moisture content, based on the values obtained by the Soxhlet and 105 °C reference methods, respectively. We conclude that NIR performed well for the quantification of powdered milk samples, mainly because it shortens analysis time, does not destroy the samples and generates no waste. The prediction model for total lipids gave a correlation coefficient (R) of 0.9955 and an RMSEP of 0.8952, with an average error between the Soxhlet and NIR values of ± 0.70%, while the prediction model for moisture content gave an R of 0.9184, an RMSEP of 0.3778 and an error of ± 0.76%.
Abstract:
Heavy metals can poison humans through the ingestion of contaminated food and have a negative impact on the environment, particularly on aquatic fauna and flora. Aquatic animals have therefore been used to biomonitor the environment for the presence of these metals. This research was carried out to assess the environmental impact of industrial and domestic sewage discharged into the estuaries of Rio Grande do Norte (the Potiguar estuaries), based on measurements of heavy metals in mullet. The methods used for these determinations are those described in the literature for the analysis of food and water. Twenty samples of mullet were collected from the Potiguar estuaries in several municipalities of the state of Rio Grande do Norte. Moisture, ash and heavy metal contents were analyzed. The data were subjected to two methods of exploratory analysis: principal component analysis (PCA), which provided a multivariate interpretation showing that the samples group according to similarities in their metal levels, and hierarchical cluster analysis (HCA), which produced similar results. These methods proved useful for treating the data, producing information that would be hard to see directly in the data matrix. The results show high levels of metallic species in samples of Mugil brasiliensis collected in the Potengi, Piranhas/Açu, Guaraíra/Papeba/Arês and Curimataú estuaries.
Abstract:
Peng was the first to work with detrended fluctuation analysis (DFA), a tool capable of detecting long-range autocorrelation in non-stationary time series. In this study, DFA is used to obtain the Hurst exponent (H) of the neutron porosity electrical log of 52 oil wells in the Namorado Field, located in the Campos Basin, Brazil. The purpose is to determine whether the Hurst exponent can be used to characterize the spatial distribution of the wells, that is, whether wells with close values of H are also spatially close together. We used a hierarchical clustering method and a non-hierarchical clustering method (k-means), and compared the two to see which provides the better result. From this, a neighborhood index was constructed, which checks whether a set of clusters generated by the k-means method, or at random, in fact forms spatial patterns. High values of the index indicate that the data are aggregated, while low values indicate that the data are scattered (no spatial correlation). Using the Monte Carlo method, we showed that randomly combined data yield a distribution of the index that lies below the empirical value. The empirical H values obtained from the 52 wells are therefore grouped geographically. Cross-checking the standard-curve data against the k-means results confirms that the method is effective for correlating wells by spatial distribution.
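The DFA step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses first-order (linear) detrending on non-overlapping windows, the window sizes are arbitrary, and the input is synthetic white noise rather than well-log data. For stationary noise the DFA scaling exponent is commonly read as an estimate of H.

```python
import math
import random

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def dfa_exponent(series, scales):
    """Scaling exponent of the fluctuation function F(n) ~ n**alpha."""
    # Step 1: integrate the mean-centred series to get the profile.
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:
        total += v - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in scales:
        # Step 2: remove a linear trend from each non-overlapping window of size n.
        sq_sum, count = 0.0, 0
        xs = list(range(n))
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = linear_fit(xs, window)
            sq_sum += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, window))
            count += n
        # Step 3: root-mean-square fluctuation at this scale, on a log scale.
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / count))
    # Step 4: the slope of log F(n) against log n is the exponent.
    return linear_fit(log_n, log_f)[0]

random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic input
h = dfa_exponent(white_noise, scales=[8, 16, 32, 64, 128])
print(round(h, 2))
```

Uncorrelated noise should give an exponent near 0.5; values well above 0.5 would point to persistent long-range correlation of the kind the study looks for in the well logs.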
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Paracoccidioides brasiliensis is a thermally dimorphic fungus, and causes the most prevalent systemic mycosis in Latin America. Infection is initiated by inhalation of conidia or mycelial fragments by the host, followed by further differentiation into the yeast form. Information regarding gene expression by either form has rarely been addressed with respect to multiple time points of growth in culture. Here, we report on the construction of a genomic DNA microarray, covering approximately 25% of the genome of the organism, and its utilization in identifying genes and gene expression patterns during growth in vitro. Cloned, amplified inserts from randomly sheared genomic DNA (gDNA) and known control genes were printed onto glass slides to generate a microarray of over 12 000 elements. To examine gene expression, mRNA was extracted and amplified from mycelial or yeast cultures grown in semi-defined medium for 5, 8 and 14 days. Principal components analysis and hierarchical clustering indicated that yeast gene expression profiles differed greatly from those of mycelia, especially at earlier time points, and that mycelial gene expression changed less than gene expression in yeasts over time. Genes upregulated in yeasts were found to encode proteins shown to be involved in methionine/cysteine metabolism, respiratory and metabolic processes (of sugars, amino acids, proteins and lipids), transporters (small peptides, sugars, ions and toxins), regulatory proteins and transcription factors. Mycelial genes involved in processes such as cell division, protein catabolism, nucleotide biosynthesis and toxin and sugar transport showed differential expression. Sequenced clones were compared with Histoplasma capsulatum and Coccidioides posadasii genome sequences to assess potentially common pathways across species, such as sulfur and lipid metabolism, amino acid transporters, transcription factors and genes possibly related to virulence. 
We also analysed gene expression with time in culture and found that while transposable elements and components of respiratory pathways tended to increase in expression with time, genes encoding ribosomal structural proteins and protein catabolism tended to sharply decrease in expression over time, particularly in yeast. These findings expand our knowledge of the different morphological forms of P. brasiliensis during growth in culture.
Abstract:
Sixty-nine Psidium accessions, collected in six Brazilian states, were analyzed by two non-hierarchical clustering methods and by principal components (PC), in order to guide breeding programs. The variables analyzed were ascorbic acid, β-carotene, lycopene, total phenols, total flavonoids, antioxidant activity, titratable acidity, soluble solids, total soluble sugars, moisture content, lateral and transverse fruit diameter, pulp and seed weight per fruit, and number and yield of fruits per plant. Specific clusters were observed for the araçá (araçazeiro) accessions in the Tocher and k-means methods and in the three-dimensional scatter of the four PCs. The araçá accessions were separated from the guava accessions. No specific clustering by state of collection was observed, indicating the absence of barriers to the propagation of guava accessions. The analyses suggest prospecting a larger number of germplasm samples in a smaller number of regions, as well as divergent accessions with high contents of nutritional compounds.