811 results for hierarchical clustering
Abstract:
In image segmentation, clustering algorithms are very popular because they are intuitive and, in some cases, easy to implement. For instance, k-means is one of the most widely used in the literature, and many authors successfully compare their new proposals with the results achieved by k-means. However, it is well known that clustering-based image segmentation has many problems. For instance, the number of regions in the image has to be known a priori, and different initial seed placements (initial clusters) can produce different segmentation results. Most of these algorithms can be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitatively evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach.
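The idea of adding pixel coordinates as clustering features can be sketched as follows. This is a minimal pure-Python illustration of plain k-means on (intensity, row, column) vectors, not the improved method proposed in the paper; the function names, the `spatial_weight` parameter and the toy image are assumptions made for the example.

```python
import random

def pixel_features(image, spatial_weight=1.0):
    """Flatten a 2D grayscale image (list of rows) into
    (intensity, weighted row, weighted column) feature vectors."""
    return [
        (float(value), spatial_weight * r, spatial_weight * c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
    ]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; the result depends on the random initial seeds."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(point, centers[j]))
            for point in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers

# Hypothetical 3x4 image: a dark region on the left, a bright one on the right.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 208],
    [9, 14, 199, 207],
]
labels, centers = kmeans(pixel_features(image, spatial_weight=1.0), k=2)
```

Note that `k` still has to be chosen in advance and the outcome depends on `seed`, which are the two limitations the abstract points out. Raising `spatial_weight` makes spatial proximity count for more relative to intensity.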
Abstract:
Our purpose is to provide a set-theoretical framework for clustering fuzzy relational data, based essentially on the cardinality of the fuzzy subsets that represent objects and of their complements, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes that includes a set of fuzzy indexes introduced by Tolias et al., and we analyze the conditions under which it defines a fuzzy proximity relation. Following an original idea due to S. Miyamoto, we evaluate the similarity between objects and features by means of the same mathematical procedure. Combining these concepts and methods, we establish an algorithm for clustering fuzzy relational data. Finally, we present an example to clarify the whole process.
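To make the notion of a cardinality-based fuzzy similarity concrete, the sketch below computes a fuzzy Jaccard index from sigma-count cardinalities (sums of membership degrees), using min for intersection and max for union. This is a generic textbook construction and only an assumption about the kind of index involved; it is not necessarily a member of the family defined in the paper or of the indexes of Tolias et al. The helper `similarity_with_complements` is purely hypothetical.

```python
def complement(a):
    """Standard fuzzy complement: 1 - membership degree."""
    return [1.0 - x for x in a]

def fuzzy_jaccard(a, b):
    """|A ∩ B| / |A ∪ B| with sigma-count cardinality, min as
    intersection and max as union. Returns 1.0 for two empty sets."""
    intersection = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 if union == 0 else intersection / union

def similarity_with_complements(a, b):
    """Hypothetical index that also looks at the complements,
    averaging the Jaccard index of the sets and of their complements."""
    return 0.5 * (fuzzy_jaccard(a, b) + fuzzy_jaccard(complement(a), complement(b)))

# Two objects described by membership degrees over two features.
obj_a = [0.5, 1.0]
obj_b = [1.0, 0.5]
print(fuzzy_jaccard(obj_a, obj_b))                # 0.5
print(similarity_with_complements(obj_a, obj_b))  # 0.25
```

No thresholding is applied at any point, so the membership degrees are never converted to crisp values.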
Abstract:
The present study compares the higher-level dimensions and the hierarchical structures of the fifth edition of the 16 PF with those of the NEO PI-R. Both inventories measure personality according to five higher-level dimensions. These inventories were however constructed according to different methods (bottom-up vs. top-down). 386 participants filled out both questionnaires. Correlations, regressions and canonical correlations made it possible to compare the inventories. As expected they roughly measure the same aspects of personality. There is a coherent association among four of the five dimensions measured in the tests. However Agreeableness, the remaining dimension in the NEO PI-R, is not represented in the 16 PF 5. Our analyses confirmed the hierarchical structures of both instruments, but this confirmation was more complete in the case of the NEO PI-R. Indeed, a parallel analysis indicated that a four-factor solution should be considered in the case of the 16 PF 5. On the other hand, the NEO PI-R's five-factor solution was confirmed. The top-down construction of this instrument seems to make for a more legible structure. Of the two five-dimension constructs, the NEO PI-R thus seems the more reliable. This confirms the relevance of the Five Factor Model of personality.
Abstract:
Study, design and implementation of different fibre clustering techniques, in order to integrate into the DTIWeb platform different clustering algorithms and fibre-cluster visualization techniques in a way that makes it easier for specialists to interpret DTI data.
Abstract:
A methodology of exploratory data analysis investigating the phenomenon of orographic precipitation enhancement is proposed. The precipitation observations obtained from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels from radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from subsequent radar images and integrated into a set of topographic features to highlight the slopes exposed to main flows. Following the exploratory data analysis with a recent algorithm of spectral clustering, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. Repeatability of precipitation patterns in particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the top of hills and on the upwind side of the mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society .
Abstract:
This project carries out research both on finding predictors via clustering techniques and on reviewing free data mining software. The research is based on a case study in which, in addition to the free KDD software used by the scientific community, a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain, since the data from which they are to be inferred are student grades from different e-learning environments. Through our case study, not only are clustering algorithms tested, but additional goals are also proposed.
Abstract:
The term landscape and its applications are increasingly used by public administrations and other bodies as a land management tool. Taking advantage of the large amount of data available in GIS-compatible (Geographic Information Systems) databases in Catalonia, a cartographic synthesis has been developed that identifies the Functional Landscapes (Paisatges Funcionals, PF) of Catalonia, a concept that refers to the physical-ecological behaviour of the terrain as derived from suitably transformed and aggregated topographic and climatic variables. A semi-automatic, iterative method of unsupervised classification (clustering) was used, which allows the creation of a hierarchical legend, that is, levels of generalization. The result is the Map of Functional Landscapes of Catalonia (Mapa de Paisatges Funcionals de Catalunya, MPFC), with a legend of 26 landscape categories and 5 levels of generalization at a spatial resolution of 180 m. In parallel, indirect validations of the resulting map were carried out on the basis of naturalist knowledge and existing cartography, together with an uncertainty map (applying fuzzy logic), which provide information on the reliability of the classification. The Functional Landscapes obtained make it possible to relate areas with homogeneous topo-climatic conditions and to divide the territory into zones characterized environmentally rather than politically, with the aim of being useful for improving the management of natural resources and the planning of human activities.
Abstract:
OBJECTIVE: This study assessed clustering of multiple risk behaviors (i.e., low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption) with level of cigarette consumption. METHODS: Data from the 2002 Swiss Health Survey, a population-based cross-sectional telephone survey assessing health and self-reported risk behaviors, were used. A total of 18,005 subjects (8,052 men and 9,953 women) aged 25 years or older participated. RESULTS: Smokers more frequently had low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption than non- and ex-smokers. Frequency of each risk behavior increased steadily with cigarette consumption. Clustering of risk behaviors increased with cigarette consumption in both men and women. For men, the odds ratios of multiple (≥2) risk behaviors other than smoking, adjusted for age, nationality, and educational level, were 1.14 (95% confidence interval: 0.97, 1.33) for ex-smokers, 1.24 (0.93, 1.64) for light smokers (1-9 cigarettes/day), 1.72 (1.36, 2.17) for moderate smokers (10-19 cigarettes/day), and 3.07 (2.59, 3.64) for heavy smokers (≥20 cigarettes/day) versus non-smokers. Similar odds ratios were found for women for corresponding groups, i.e., 1.01 (0.86, 1.19), 1.26 (1.00, 1.58), 1.62 (1.33, 1.98), and 2.75 (2.30, 3.29). CONCLUSIONS: Counseling and intervention with smokers should take into account the strong clustering of risk behaviors with level of cigarette consumption.
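For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. It is only an illustration: the study reports odds ratios adjusted for age, nationality and educational level, which a 2x2 table cannot reproduce, and the counts used here are hypothetical, not survey data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 30 of 100 heavy smokers and 10 of 100 non-smokers
# show two or more additional risk behaviors.
odds_ratio, lower, upper = odds_ratio_ci(30, 70, 10, 90)
print(f"OR = {odds_ratio:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 corresponds to an association that is statistically significant at the 5% level. By that reading, the reported interval for male ex-smokers (0.97, 1.33) is not significant, while the one for male heavy smokers (2.59, 3.64) is.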
Abstract:
Rare species have restricted geographic ranges, habitat specialization, and/or small population sizes. Datasets on rare species distributions usually have few observations, limited spatial accuracy and a lack of valid absences; conversely, they provide comprehensive views of species distributions, making it possible to realistically capture most of their realized environmental niche. Rare species are the most in need of predictive distribution modelling but also the most difficult to model. We refer to this contrast as the "rare species modelling paradox" and propose as a solution developing modelling approaches that deal with a sufficiently large set of predictors while ensuring that statistical models are not overfitted. Our novel approach fulfils this condition by fitting a large number of bivariate models and averaging them with a weighted ensemble approach. We further propose that this ensemble forecasting be conducted within a hierarchic multi-scale framework. We present two ensemble models for a test species, one at regional and one at local scale, each based on the combination of 630 models. In both cases, we obtained excellent spatial projections, which is unusual when modelling rare species. Model results highlight, from a statistically sound approach, the effects of multiple drivers in the same modelling framework and at two distinct scales. From this added information, regional models can support accurate forecasts of range dynamics under climate change scenarios, whereas local models allow the assessment of isolated or synergistic impacts of changes in multiple predictors. This novel framework provides a baseline for adaptive conservation, management and monitoring of rare species at distinct spatial and temporal scales.
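The ensemble-of-bivariate-models idea can be sketched as follows: enumerate every pair of predictors, fit one small model per pair, and combine the per-model predictions with a weighted average. The code shows only the enumeration and the averaging step. The model fitting, the weighting scheme and all names are assumptions, since the abstract does not specify them. As an arithmetic aside, 36 predictors would yield exactly 630 pairs, although the abstract does not state how many predictors were used.

```python
from itertools import combinations

def predictor_pairs(predictor_names):
    """All unordered pairs of predictors, one bivariate model per pair."""
    return list(combinations(predictor_names, 2))

def weighted_ensemble(predictions, weights):
    """Weighted average of per-model predictions.

    predictions: list of equal-length lists, one per model
    weights: one non-negative weight per model (e.g. an evaluation score)
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_sites = len(predictions[0])
    return [
        sum(w * model[i] for model, w in zip(predictions, weights)) / total
        for i in range(n_sites)
    ]

# Hypothetical example: two models predicting suitability at two sites,
# with the second model trusted three times as much as the first.
print(weighted_ensemble([[0.0, 1.0], [1.0, 1.0]], [1.0, 3.0]))  # [0.75, 1.0]
print(len(predictor_pairs(range(36))))                          # 630
```

Restricting each model to two predictors keeps the number of fitted parameters small relative to the few available observations, which is how the approach guards against overfitting.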
Abstract:
Globalization involves several facility location problems that need to be handled at large scale. Location Allocation (LA) is a combinatorial problem in which the distance among points in the data space matters. Specifically, taking advantage of the distance property of the domain, we exploit the capability of clustering techniques to partition the data space in order to convert an initial large LA problem into several simpler LA problems. In particular, our motivating problem involves a huge geographical area that can be partitioned under overall conditions. We present different types of clustering techniques and then perform a cluster analysis over our dataset in order to partition it. After that, we solve the LA problem by applying a simulated annealing algorithm to the clustered and non-clustered data, in order to work out how profitable the clustering is and which of the presented methods is the most suitable.
Abstract:
Microsatellites are used to unravel the fine-scale genetic structure of a hybrid zone between chromosome races Valais and Cordon of the common shrew (Sorex araneus) located in the French Alps. A total of 269 individuals collected between 1992 and 1995 was typed for seven microsatellite loci. A modified version of the classical multiple correspondence analysis is carried out. This analysis clearly shows the dichotomy between the two races. Several approaches are used to study genetic structuring. Gene flow is clearly reduced between these chromosome races and is estimated at one migrant every two generations using R-statistics and one migrant per generation using F-statistics. Hierarchical F- and R-statistics are compared and their efficiency to detect inter- and intraracial patterns of divergence is discussed. Within-race genetic structuring is significant, but remains weak. F-ST displays similar values on both sides of the hybrid zone, although no environmental barriers are found on the Cordon side, whereas the Valais side is divided by several mountain rivers. We introduce the exact G-test to microsatellite data which proved to be a powerful test to detect genetic differentiation within as well as among races. The genetic background of karyotypic hybrids was compared with the genetic background of pure parental forms using a CRT-MCA. Our results indicate that, without knowledge of the karyotypes, we would not have been able to distinguish these hybrids from karyotypically pure samples.
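A migrant-per-generation figure of this kind is commonly derived from a fixation index through Wright's island-model approximation, F_ST ≈ 1 / (4Nm + 1). The sketch below inverts that relation. Whether the authors used exactly this estimator is an assumption, and the input value is illustrative rather than taken from the paper.

```python
def migrants_per_generation(fst):
    """Effective number of migrants per generation (Nm) under Wright's
    island model, from F_ST ≈ 1 / (4 * Nm + 1)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0

# Under this approximation, F_ST = 0.2 corresponds to one migrant per generation.
print(migrants_per_generation(0.2))  # 1.0
```

Under the same relation, one migrant every two generations (Nm = 0.5) corresponds to a differentiation value of 1/3.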
Abstract:
In the context of Systems Biology, computer simulations of gene regulatory networks provide a powerful tool to validate hypotheses and to explore possible system behaviors. Nevertheless, modeling a system poses some challenges of its own: especially the step of model calibration is often difficult due to insufficient data. For example when considering developmental systems, mostly qualitative data describing the developmental trajectory is available while common calibration techniques rely on high-resolution quantitative data. Focusing on the calibration of differential equation models for developmental systems, this study investigates different approaches to utilize the available data to overcome these difficulties. More specifically, the fact that developmental processes are hierarchically organized is exploited to increase convergence rates of the calibration process as well as to save computation time. Using a gene regulatory network model for stem cell homeostasis in Arabidopsis thaliana the performance of the different investigated approaches is evaluated, documenting considerable gains provided by the proposed hierarchical approach.
Abstract:
Background/Aims. Recently, peripheral blood mononuclear cell transcriptome analysis has identified genes that are upregulated in relapsing minimal-change nephrotic syndrome (MCNS). In order to investigate protein expression in peripheral blood mononuclear cells (PBMC) from relapsing MCNS patients, we performed proteomic comparisons of PBMC from patients with MCNS in relapse and controls. METHODS: PBMC from a total of 20 patients were analysed. PBMC were taken from five patients with relapsing MCNS, four in remission, five patients with other glomerular diseases and six controls. Two dimensional electrophoresis was performed and proteome patterns were compared. RESULTS: Automatic heuristic clustering analysis allowed us to pool correctly the gels from the MCNS patients in the relapse and in the control groups. Using hierarchical population matching, nine spots were found to be increased in PBMC from MCNS patients in relapse. Four spots were identified by mass spectrometry. Three of the four proteins identified (L-plastin, alpha-tropomyosin and annexin III) were cytoskeletal-associated proteins. Using western blot and immunochemistry, L-plastin and alpha-tropomyosin 3 concentrations were found to be enhanced in PBMC from MCNS patients in relapse. Conclusions. These data indicate that a specific proteomic profile characterizes PBMC from MCNS patients in relapse. Proteins involved in PBMC cytoskeletal rearrangement are increased in relapsing MCNS. We hypothesize that T-cell cytoskeletal rearrangement may play a role in the pathogenesis of MCNS by altering the expression of cell surface receptors and by modifying the interaction of these cells with glomerular cells.
Abstract:
In the scenario of social bookmarking, a user browsing the Web bookmarks web pages and assigns free-text labels (i.e., tags) to them according to their personal preferences. In this technical report, we address one of the practical aspects of representing users' interests from their tagging activity, namely the categorization of tags into high-level categories of interest. The reason is that representing user profiles on the basis of the myriad of tags available on the Web is unfeasible from various practical perspectives, mainly the unavailability of data to measure interests reliably and accurately across such a fine-grained categorisation and, should the data be available, its overwhelming computational intractability. Motivated by this, our study presents the results of a categorization process whereby a collection of tags posted at Delicious (http://delicious.com) is classified into 200 subcategories of interest.