29 resultados para Hierarchical clustering model
Resumo:
Species distribution models (SDMs) studies suggest that, without control measures, the distribution of many alien invasive plant species (AIS) will increase under climate and land-use changes. Due to limited resources and large areas colonised by invaders, management and monitoring resources must be prioritised. Choices depend on the conservation value of the invaded areas and can be guided by SDM predictions. Here, we use a hierarchical SDM framework, complemented by connectivity analysis of AIS distributions, to evaluate current and future conflicts between AIS and high conservation value areas. We illustrate the framework with three Australian wattle (Acacia) species and patterns of conservation value in Northern Portugal. Results show that protected areas will likely suffer higher pressure from all three Acacia species under future climatic conditions. Due to this higher predicted conflict in protected areas, management might be prioritised for Acacia dealbata and Acacia melanoxylon. Connectivity of AIS suitable areas inside protected areas is currently lower than across the full study area, but this would change under future environmental conditions. Coupled SDM and connectivity analysis can support resource prioritisation for anticipation and monitoring of AIS impacts. However, further tests of this framework over a wide range of regions and organisms are still required before wide application.
Resumo:
The present study compares the higher-level dimensions and the hierarchical structures of the fifth edition of the 16 PF with those of the NEO PI-R. Both inventories measure personality according to five higher-level dimensions. These inventories were however constructed according to different methods (bottom-up vs. top-down). 386 participants filled out both questionnaires. Correlations, regressions and canonical correlations made it possible to compare the inventories. As expected they roughly measure the same aspects of personality. There is a coherent association among four of the five dimensions measured in the tests. However Agreeableness, the remaining dimension in the NEO PI-R, is not represented in the 16 PF 5. Our analyses confirmed the hierarchical structures of both instruments, but this confirmation was more complete in the case of the NEO PI-R. Indeed, a parallel analysis indicated that a four-factor solution should be considered in the case of the 16 PF 5. On the other hand, the NEO PI-R's five-factor solution was confirmed. The top-down construction of this instrument seems to make for a more legible structure. Of the two five-dimension constructs, the NEO PI-R thus seems the more reliable. This confirms the relevance of the Five Factor Model of personality.
Resumo:
A methodology of exploratory data analysis investigating the phenomenon of orographic precipitation enhancement is proposed. The precipitation observations obtained from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels from radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from subsequent radar images and integrated into a set of topographic features to highlight the slopes exposed to main flows. Following the exploratory data analysis with a recent algorithm of spectral clustering, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. Repeatability of precipitation patterns in particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the top of hills and on the upwind side of the mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society .
Resumo:
Rare species have restricted geographic ranges, habitat specialization, and/or small population sizes. Datasets on rare species distribution usually have few observations, limited spatial accuracy and lack of valid absences; conversely they provide comprehensive views of species distributions allowing to realistically capture most of their realized environmental niche. Rare species are the most in need of predictive distribution modelling but also the most difficult to model. We refer to this contrast as the "rare species modelling paradox" and propose as a solution developing modelling approaches that deal with a sufficiently large set of predictors, ensuring that statistical models aren't overfitted. Our novel approach fulfils this condition by fitting a large number of bivariate models and averaging them with a weighted ensemble approach. We further propose that this ensemble forecasting is conducted within a hierarchic multi-scale framework. We present two ensemble models for a test species, one at regional and one at local scale, each based on the combination of 630 models. In both cases, we obtained excellent spatial projections, unusual when modelling rare species. Model results highlight, from a statistically sound approach, the effects of multiple drivers in a same modelling framework and at two distinct scales. From this added information, regional models can support accurate forecasts of range dynamics under climate change scenarios, whereas local models allow the assessment of isolated or synergistic impacts of changes in multiple predictors. This novel framework provides a baseline for adaptive conservation, management and monitoring of rare species at distinct spatial and temporal scales.
Resumo:
BACKGROUND: Little is known about engagement in multiple health behaviours in childhood cancer survivors. METHODS: Using latent class analysis, we identified health behaviour patterns in 835 adult survivors of childhood cancer (age 20-35 years) and 1670 age- and sex-matched controls from the general population. Behaviour groups were determined from replies to questions on smoking, drinking, cannabis use, sporting activities, diet, sun protection and skin examination. RESULTS: The model identified four health behaviour patterns: 'risk-avoidance', with a generally healthy behaviour; 'moderate drinking', with higher levels of sporting activities, but moderate alcohol-consumption; 'risk-taking', engaging in several risk behaviours; and 'smoking', smoking but not drinking. Similar proportions of survivors and controls fell into the 'risk-avoiding' (42% vs 44%) and the 'risk-taking' cluster (14% vs 12%), but more survivors were in the 'moderate drinking' (39% vs 28%) and fewer in the 'smoking' cluster (5% vs 16%). Determinants of health behaviour clusters were gender, migration background, income and therapy. CONCLUSION: A comparable proportion of childhood cancer survivors as in the general population engage in multiple health-compromising behaviours. Because of increased vulnerability of survivors, multiple risk behaviours should be addressed in targeted health interventions.
Resumo:
Analysis of variance is commonly used in morphometry in order to ascertain differences in parameters between several populations. Failure to detect significant differences between populations (type II error) may be due to suboptimal sampling and lead to erroneous conclusions; the concept of statistical power allows one to avoid such failures by means of an adequate sampling. Several examples are given in the morphometry of the nervous system, showing the use of the power of a hierarchical analysis of variance test for the choice of appropriate sample and subsample sizes. In the first case chosen, neuronal densities in the human visual cortex, we find the number of observations to be of little effect. For dendritic spine densities in the visual cortex of mice and humans, the effect is somewhat larger. A substantial effect is shown in our last example, dendritic segmental lengths in monkey lateral geniculate nucleus. It is in the nature of the hierarchical model that sample size is always more important than subsample size. The relative weight to be attributed to subsample size thus depends on the relative magnitude of the between observations variance compared to the between individuals variance.
Resumo:
The coverage and volume of geo-referenced datasets are extensive and incessantly¦growing. The systematic capture of geo-referenced information generates large volumes¦of spatio-temporal data to be analyzed. Clustering and visualization play a key¦role in the exploratory data analysis and the extraction of knowledge embedded in¦these data. However, new challenges in visualization and clustering are posed when¦dealing with the special characteristics of this data. For instance, its complex structures,¦large quantity of samples, variables involved in a temporal context, high dimensionality¦and large variability in cluster shapes.¦The central aim of my thesis is to propose new algorithms and methodologies for¦clustering and visualization, in order to assist the knowledge extraction from spatiotemporal¦geo-referenced data, thus improving making decision processes.¦I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical¦Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis:¦the Tree-structured Self-organizing Maps Component Planes. In addition, I present¦methodologies that combined with FGHSON and the Tree-structured SOM Component¦Planes allow the integration of space and time seamlessly and simultaneously in¦order to extract knowledge embedded in a temporal context.¦The originality of the FGHSON lies in its capability to reflect the underlying structure¦of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of¦clusters is crucial when data include complex structures with large variability of cluster¦shapes, variances, densities and number of clusters. The most important characteristics¦of the FGHSON include: (1) It does not require an a-priori setup of the number¦of clusters. (2) The algorithm executes several self-organizing processes in parallel.¦Hence, when dealing with large datasets the processes can be distributed reducing the¦computational cost. (3) Only three parameters are necessary to set up the algorithm.¦In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm¦lies in its ability to create a structure that allows the visual exploratory data analysis¦of large high-dimensional datasets. This algorithm creates a hierarchical structure¦of Self-Organizing Map Component Planes, arranging similar variables' projections in¦the same branches of the tree. Hence, similarities on variables' behavior can be easily¦detected (e.g. local correlations, maximal and minimal values and outliers).¦Both FGHSON and the Tree-structured SOM Component Planes were applied in¦several agroecological problems proving to be very efficient in the exploratory analysis¦and clustering of spatio-temporal datasets.¦In this thesis I also tested three soft competitive learning algorithms. Two of them¦well-known non supervised soft competitive algorithms, namely the Self-Organizing¦Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); and the¦third was our original contribution, the FGHSON. Although the algorithms presented¦here have been used in several areas, to my knowledge there is not any work applying¦and comparing the performance of those techniques when dealing with spatiotemporal¦geospatial data, as it is presented in this thesis.¦I propose original methodologies to explore spatio-temporal geo-referenced datasets¦through time. Our approach uses time windows to capture temporal similarities and¦variations by using the FGHSON clustering algorithm. The developed methodologies¦are used in two case studies. In the first, the objective was to find similar agroecozones¦through time and in the second one it was to find similar environmental patterns¦shifted in time.¦Several results presented in this thesis have led to new contributions to agroecological¦knowledge, for instance, in sugar cane, and blackberry production.¦Finally, in the framework of this thesis we developed several software tools: (1)¦a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called¦BIS (Bio-inspired Identification of Similar agroecozones) an interactive graphical user¦interface tool which integrates the FGHSON algorithm with Google Earth in order to¦show zones with similar agroecological characteristics.
Resumo:
Abstract : This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of--the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes uses of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems. Résumé : Ce travail de recherche porte sur le développement et l'application de méthodes d'apprentissage dites non supervisées. Les applications visées par ces méthodes sont l'analyse de données forensiques et la classification d'images hyperspectrales en télédétection. Dans un premier temps, une méthodologie de classification non supervisée fondée sur l'optimisation symbolique d'une mesure de distance inter-échantillons est proposée. Cette mesure est obtenue en optimisant une fonction de coût reliée à la préservation de la structure de voisinage d'un point entre l'espace des variables initiales et l'espace des composantes principales. Cette méthode est appliquée à l'analyse de données forensiques et comparée à un éventail de méthodes déjà existantes. En second lieu, une méthode fondée sur une optimisation conjointe des tâches de sélection de variables et de classification est implémentée dans un réseau de neurones et appliquée à diverses bases de données, dont deux images hyperspectrales. Le réseau de neurones est entraîné à l'aide d'un algorithme de gradient stochastique, ce qui rend cette technique applicable à des images de très haute résolution. Les résultats de l'application de cette dernière montrent que l'utilisation d'une telle technique permet de classifier de très grandes bases de données sans difficulté et donne des résultats avantageusement comparables aux méthodes existantes.
Resumo:
To study the stress-induced effects caused by wounding under a new perspective, a metabolomic strategy based on HPLC-MS has been devised for the model plant Arabidopsis thaliana. To detect induced metabolites and precisely localise these compounds among the numerous constitutive metabolites, HPLC-MS analyses were performed in a two-step strategy. In a first step, rapid direct TOF-MS measurements of the crude leaf extract were performed with a ballistic gradient on a short LC-column. The HPLC-MS data were investigated by multivariate analysis as total mass spectra (TMS). Principal components analysis (PCA) and hierarchical cluster analysis (HCA) on principal coordinates were combined for data treatment. PCA and HCA demonstrated a clear clustering of plant specimens selecting the highest discriminating ions given by the complete data analysis, leading to the specific detection of discrete-induced ions (m/z values). Furthermore, pool constitution with plants of homogeneous behaviour was achieved for confirmatory analysis. In this second step, long high-resolution LC profilings on an UPLC-TOF-MS system were used on pooled samples. This allowed to precisely localise the putative biological marker induced by wounding and by specific extraction of accurate m/z values detected in the screening procedure with the TMS spectra.
Resumo:
T cell receptor (TCR-CD3) triggering involves both receptor clustering and conformational changes at the cytoplasmic tails of the CD3 subunits. The mechanism by which TCRalphabeta ligand binding confers conformational changes to CD3 is unknown. By using well-defined ligands, we showed that induction of the conformational change requires both multivalent engagement and the mobility restriction of the TCR-CD3 imposed by the plasma membrane. The conformational change is elicited by cooperative rearrangements of two TCR-CD3 complexes and does not require accompanying changes in the structure of the TCRalphabeta ectodomains. This conformational change at CD3 reverts upon ligand dissociation and is required for T cell activation. Thus, our permissive geometry model provides a molecular mechanism that rationalizes how the information of ligand binding to TCRalphabeta is transmitted to the CD3 subunits and to the intracellular signaling machinery.
Resumo:
OBJECTIVE: Hierarchical modeling has been proposed as a solution to the multiple exposure problem. We estimate associations between metabolic syndrome and different components of antiretroviral therapy using both conventional and hierarchical models. STUDY DESIGN AND SETTING: We use discrete time survival analysis to estimate the association between metabolic syndrome and cumulative exposure to 16 antiretrovirals from four drug classes. We fit a hierarchical model where the drug class provides a prior model of the association between metabolic syndrome and exposure to each antiretroviral. RESULTS: One thousand two hundred and eighteen patients were followed for a median of 27 months, with 242 cases of metabolic syndrome (20%) at a rate of 7.5 cases per 100 patient years. Metabolic syndrome was more likely to develop in patients exposed to stavudine, but was less likely to develop in those exposed to atazanavir. The estimate for exposure to atazanavir increased from hazard ratio of 0.06 per 6 months' use in the conventional model to 0.37 in the hierarchical model (or from 0.57 to 0.81 when using spline-based covariate adjustment). CONCLUSION: These results are consistent with trials that show the disadvantage of stavudine and advantage of atazanavir relative to other drugs in their respective classes. The hierarchical model gave more plausible results than the equivalent conventional model.
Resumo:
Probabilistic inversion methods based on Markov chain Monte Carlo (MCMC) simulation are well suited to quantify parameter and model uncertainty of nonlinear inverse problems. Yet, application of such methods to CPU-intensive forward models can be a daunting task, particularly if the parameter space is high dimensional. Here, we present a 2-D pixel-based MCMC inversion of plane-wave electromagnetic (EM) data. Using synthetic data, we investigate how model parameter uncertainty depends on model structure constraints using different norms of the likelihood function and the model constraints, and study the added benefits of joint inversion of EM and electrical resistivity tomography (ERT) data. Our results demonstrate that model structure constraints are necessary to stabilize the MCMC inversion results of a highly discretized model. These constraints decrease model parameter uncertainty and facilitate model interpretation. A drawback is that these constraints may lead to posterior distributions that do not fully include the true underlying model, because some of its features exhibit a low sensitivity to the EM data, and hence are difficult to resolve. This problem can be partly mitigated if the plane-wave EM data is augmented with ERT observations. The hierarchical Bayesian inverse formulation introduced and used herein is able to successfully recover the probabilistic properties of the measurement data errors and a model regularization weight. Application of the proposed inversion methodology to field data from an aquifer demonstrates that the posterior mean model realization is very similar to that derived from a deterministic inversion with similar model constraints.
Resumo:
PURPOSE: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information.
Resumo:
The analysis of rockfall characteristics and spatial distribution is fundamental to understand and model the main factors that predispose to failure. In our study we analysed LiDAR point clouds aiming to: (1) detect and characterise single rockfalls; (2) investigate their spatial distribution. To this end, different cluster algorithms were applied: 1a) Nearest Neighbour Clutter Removal (NNCR) in combination with the Expectation?Maximization (EM) in order to separate feature points from clutter; 1b) a density based algorithm (DBSCAN) was applied to isolate the single clusters (i.e. the rockfall events); 2) finally we computed the Ripley's K-function to investigate the global spatial pattern of the extracted rockfalls. The method allowed proper identification and characterization of more than 600 rockfalls occurred on a cliff located in Puigcercos (Catalonia, Spain) during a time span of six months. The spatial distribution of these events proved that rockfall were clustered distributed at a welldefined distance-range. Computations were carried out using R free software for statistical computing and graphics. The understanding of the spatial distribution of precursory rockfalls may shed light on the forecasting of future failures.