978 results for NIRS. Plum. Multivariate calibration. Variables selection


Relevance:

30.00%

Publisher:

Abstract:

This study subdivides the Weddell Sea, Antarctica, into seafloor regions using multivariate statistical methods. These regions serve as categories for comparing, contrasting and quantifying biogeochemical processes and biodiversity, both between geographic ocean regions and between regions as they develop under global change. The resulting division is characterized by its dominant components and interpreted in terms of the prevailing environmental conditions. The analysis uses 28 environmental variables for the sea surface, 25 variables for the seabed and 9 variables for the combined surface/bottom analysis. The data were collected during the years 1983-2013; some were interpolated. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) under varying settings were compared to identify the most suitable method. The multivariate procedures used are regionalized classification via k-means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the study area. Several methods for identifying the optimum number of clusters were tested: 8 and 12 clusters proved reasonable for the seabed, 8 and 13 for the sea surface, and 8 and 3 for the surface/bottom analysis. Additionally, results for 20 clusters are presented for all three alternatives, offering the first small-scale environmental regionalization of the Weddell Sea. The 12-cluster result in particular identifies marine-influenced regions that can be clearly separated from those determined by the geological catchment area and from those dominated by river discharge.
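
The regionalization workflow described here (standardize the environmental variables, run k-means for candidate cluster numbers, score each candidate) can be sketched in a few lines. The sketch below uses random placeholder data and the silhouette score as one common selection criterion; the study tests several criteria that the abstract does not name.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical input: rows are grid cells, columns stand in for the
# 25 seabed environmental variables (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 25))

X_std = StandardScaler().fit_transform(X)  # put variables on comparable scales

# Compare candidate cluster numbers, e.g. the 8 and 12 reported for the seabed.
for k in (8, 12):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
    print(k, round(silhouette_score(X_std, labels), 3))
```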

Relevance:

30.00%

Publisher:

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach to down-scaling the problem is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge lies in designing an algorithm with low communication cost, theoretical guarantees and excellent practical performance in general settings. For sample-space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm to address these issues. The algorithm applies feature selection in parallel to each subset using regularized regression or Bayesian variable selection methods, calculates the 'median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments showing excellent performance in feature selection, estimation, prediction, and computation time relative to the usual competitors.
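
Read as pseudocode, the message recipe maps directly onto a few lines of Python. This is a minimal sketch under my own simplifications (lasso as the per-subset selector, ordinary least squares for the refit); it is not the author's implementation.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def message(X, y, m=4):
    """Sketch: split rows into m subsets, select features on each subset,
    keep features whose median inclusion indicator is 1, then refit per
    subset on the kept features and average the coefficient estimates."""
    n, p = X.shape
    idx = np.array_split(np.random.permutation(n), m)
    inclusion = np.zeros((m, p))
    for i, rows in enumerate(idx):
        lasso = LassoCV(cv=5).fit(X[rows], y[rows])   # per-subset selection
        inclusion[i] = lasso.coef_ != 0
    selected = np.where(np.median(inclusion, axis=0) >= 0.5)[0]
    coefs = np.zeros((m, len(selected)))
    for i, rows in enumerate(idx):
        coefs[i] = LinearRegression().fit(X[rows][:, selected], y[rows]).coef_
    return selected, coefs.mean(axis=0)               # averaged estimates
```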

While sample-space partitioning is useful for handling datasets with a large sample size, feature-space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that, by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments illustrate the performance of the new framework.
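
The key ingredient is the decorrelation step. Below is a minimal sketch of that idea, assuming the transformation is a square-root inverse of the sample row-Gram matrix (my reading of the approach, with a ridge term added purely for numerical stability, not a quote of the exact formula).

```python
import numpy as np

def decorrelate(X, y, ridge=1.0):
    """Premultiply X and y by a square-root inverse of XX^T so that the
    columns of the transformed X are roughly orthogonal; features can
    then be partitioned across workers and fitted independently."""
    n, p = X.shape
    G = X @ X.T / p + ridge * np.eye(n)
    w, V = np.linalg.eigh(G)                 # G = V diag(w) V^T
    G_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return G_inv_sqrt @ X, G_inv_sqrt @ y
```

After decorrelation, each worker can run, say, a lasso on its own block of features and the per-block selections can simply be concatenated.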

For datasets with both large sample sizes and high dimensionality, I propose a new divide-and-conquer framework, DEME (DECO-message), that leverages both the DECO and the message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each of a feasible size that can be stored and fitted on a single computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.

Relevance:

30.00%

Publisher:

Abstract:

Key life history traits such as breeding time and clutch size are frequently both heritable and under directional selection, yet many studies fail to document micro-evolutionary responses. One general explanation is that selection estimates are biased by the omission of correlated traits that have causal effects on fitness, but few valid tests of this exist. Here we show, using a quantitative genetic framework and six decades of life-history data on two free-living populations of great tits Parus major, that selection estimates for egg-laying date and clutch size are relatively unbiased. Predicted responses to selection based on the Robertson-Price Identity were similar to those based on the multivariate breeder’s equation, indicating that unmeasured covarying traits were not missing from the analysis. Changing patterns of phenotypic selection on these traits (for laying date, linked to climate change) therefore reflect changing selection on breeding values, and genetic constraints appear not to limit their independent evolution. Quantitative genetic analysis of correlational data from pedigreed populations can be a valuable complement to experimental approaches to help identify whether apparent associations between traits and fitness are biased by missing traits, and to parse the roles of direct versus indirect selection across a range of environments.
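
The two predictions being compared can be stated compactly (standard quantitative-genetics notation, not taken from the paper): the multivariate breeder's equation predicts the response from the genetic and phenotypic covariance matrices and the selection differential, while the Robertson-Price identity predicts it directly as the additive genetic covariance between trait and relative fitness.

```latex
% Multivariate breeder's equation: response from the genetic (G) and
% phenotypic (P) covariance matrices and the selection differential s.
\Delta\bar{\mathbf{z}} = \mathbf{G}\mathbf{P}^{-1}\mathbf{s} = \mathbf{G}\boldsymbol{\beta}

% Robertson--Price identity (secondary theorem of selection): response as
% the additive genetic covariance between the trait z and relative fitness w.
\Delta\bar{\mathbf{z}} = \boldsymbol{\sigma}_{A}(\mathbf{z}, w)
```

If a causal trait correlated with laying date or clutch size were missing from the analysis, the gradient beta would be biased and the two predictions would diverge; their agreement is the paper's evidence that the selection estimates are relatively unbiased.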

Relevance:

30.00%

Publisher:

Abstract:

Topographic variation, the spatial variation in elevation and terrain features, underpins a myriad of patterns and processes in geography and ecology and is key to understanding the variation of life on the planet. The characterization of this variation is scale-dependent, i.e. it varies with the distance over which features are assessed and with the spatial grain (grid-cell resolution) of the analysis. A fully standardized, global, multivariate product of terrain features could support many large-scale basic research and analytical applications; to date, however, no such product has been available. Here we used the global 250 m GMTED and the near-global 90 m SRTM digital elevation model products to derive a suite of topographic variables: elevation, slope, aspect, eastness, northness, roughness, terrain roughness index, topographic position index, vector ruggedness measure, profile and tangential curvature, and 10 geomorphological landform classes. We aggregated each variable to 1, 5, 10, 50 and 100 km spatial grains using several aggregation approaches (median, average, minimum, maximum, standard deviation, percent cover, count, majority, Shannon index, entropy, uniformity). While a global cross-correlation underlines the high similarity of many variables, a more detailed view of four mountain regions reveals local differences, as well as scale variations in the aggregated variables at different spatial grains. All newly developed variables are available for download at http://www.earthenv.org and can serve as a basis for standardized hydrological, environmental and biodiversity modeling at a global extent.
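
To make the derive-then-aggregate pipeline concrete, here is a minimal numpy sketch, assuming one common definition of the terrain roughness index (mean absolute elevation difference to the eight neighbours) and simple block aggregation; the paper's exact formulas may differ.

```python
import numpy as np

def terrain_roughness_index(dem):
    """TRI of each cell as the mean absolute elevation difference to its
    8 neighbours (one common variant). Edges wrap via np.roll here; a
    real implementation would pad the grid instead."""
    tri = np.zeros_like(dem, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == dj == 0:
                continue
            tri += np.abs(dem - np.roll(np.roll(dem, di, 0), dj, 1))
    return tri / 8.0

def aggregate(grid, factor, func=np.median):
    """Aggregate a fine grid to a coarser grain by a block statistic,
    mirroring the median/average/min/max/sd aggregation in the text."""
    h, w = grid.shape[0] // factor, grid.shape[1] // factor
    blocks = grid[:h * factor, :w * factor].reshape(h, factor, w, factor)
    return func(blocks, axis=(1, 3))
```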

Relevance:

30.00%

Publisher:

Abstract:

Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using suitable measures. Ordination and clustering derived from these resemblances are used to visualize relationships among samples (or variables), with hierarchical agglomerative clustering with group-average (UPGMA) linkage often the method of choice. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on the analysis of samples, the methods may equally be applied to species. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine the optimum β in flexible-beta clustering. A plot of cophenetic correlation against the original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic determine the (binary) splits in the data that form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and similarity profile (SIMPROF) analyses to determine the optimum value of k, the number of groups into which the samples should be clustered, and the membership of those groups. Robust outcomes from applying such a range of differing techniques to the same resemblance matrix, as here, give greater confidence in the validity of a clustering approach.
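
The dendrogram-versus-resemblance comparison at the heart of this approach is easy to reproduce with scipy. Below is a minimal sketch with synthetic counts standing in for the zooplankton densities; the Bray-Curtis measure and root transform are my assumptions about typical pre-treatment, not details from the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Placeholder for the sample-by-species density matrix.
rng = np.random.default_rng(1)
counts = rng.poisson(3.0, size=(30, 20))

d = pdist(np.sqrt(counts), metric="braycurtis")   # inter-sample resemblances
tree = linkage(d, method="average")               # group-average (UPGMA) linkage
c, coph_d = cophenet(tree, d)
print("cophenetic correlation:", round(c, 3))     # tree vs original dissimilarities
```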

Relevance:

30.00%

Publisher:

Abstract:

A field experiment was conducted on a real continuous steel Gerber-truss bridge to which artificial damage was applied. This article summarizes the results of the experiment on bridge damage detection using traffic-induced vibrations. It investigates the sensitivity to bridge damage of several quantities, including the identified modal parameters and their statistical patterns, Nair's damage indicator and its statistical pattern, and different sets of measurement points. The modal parameters are identified by autoregressive time-series models. The decision on the bridge's health condition is made, and the sensitivity of the variables evaluated, with the aid of the Mahalanobis–Taguchi system, a multivariate pattern-recognition tool. Several observations are made. For the modal parameters, although bridge damage detection can be achieved by applying the Mahalanobis–Taguchi system to certain modal parameters at certain sets of measurement points, difficulties arise from the subjective selection of meaningful bridge modes and from the low sensitivity of the statistical pattern of the modal parameters to damage. For Nair's damage indicator, bridge damage detection could be achieved by applying the Mahalanobis–Taguchi system to Nair's damage indicators at most sets of measurement points. As a damage indicator, Nair's damage indicator was superior to the modal parameters, with three main advantages: its calculation requires no subjective decisions, so potential human errors are avoided and detection can be automated; its statistical pattern is highly sensitive to damage; and it is flexible regarding the choice of sets of measurement points.
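
A minimal sketch of the feature pipeline follows, assuming the commonly cited form of Nair's damage-sensitive feature built from the first three AR coefficients (an assumption on my part) and a plain Mahalanobis distance standing in for the full Mahalanobis–Taguchi system.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def nair_dsf(signal, lags=3):
    # AR(3) fit; params[0] is the intercept, params[1:] the AR coefficients.
    a = AutoReg(signal, lags=lags).fit().params[1:]
    # Commonly cited damage-sensitive feature: a1 / sqrt(a1^2 + a2^2 + a3^2).
    return a[0] / np.sqrt(np.sum(a ** 2))

def mahalanobis(x, baseline):
    """Distance of a feature vector (e.g. DSFs from several measurement
    points) from the healthy-state reference population."""
    mu = baseline.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))
```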

Relevance:

30.00%

Publisher:

Abstract:

To maintain the pace of development set by Moore's law, production processes in semiconductor manufacturing are becoming more and more complex. The development of efficient and interpretable anomaly detection systems is fundamental to keeping production costs low. Because the dimension of process monitoring data can become extremely high, anomaly detection systems suffer from the curse of dimensionality, and dimensionality reduction therefore plays an important role. Classical dimensionality reduction approaches, such as principal component analysis (PCA), generally involve transformations that seek to maximize the explained variance. In datasets with several clusters of correlated variables, the contributions of isolated variables to the explained variance may be insignificant, so they may not be included in the reduced data representation. It is then impossible to detect an anomaly that is reflected only in such isolated variables. In this paper we present a new dimensionality reduction technique that takes account of such isolated variables and demonstrate how it can be used to build an interpretable and robust anomaly detection system for optical emission spectroscopy data.
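
The failure mode described here, isolated variables vanishing from a variance-maximizing projection, is easy to demonstrate. The sketch below illustrates the problem the paper addresses, not the paper's own (unnamed) technique.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n = 1000
# Ten highly correlated variables driven by one latent factor...
cluster = rng.normal(size=(n, 1)) + 0.05 * rng.normal(size=(n, 10))
# ...plus one isolated, independent variable.
isolated = rng.normal(size=(n, 1))
X = np.hstack([cluster, isolated])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)     # first PC dominated by the cluster
print(np.round(pca.components_[0], 2))   # isolated variable gets ~0 loading on PC1
```

An anomaly expressed only in the isolated variable is therefore invisible in a low-dimensional PCA representation, which is exactly the gap the proposed technique targets.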

Relevance:

30.00%

Publisher:

Abstract:

The current anode quality control strategy is inadequate for detecting faulty anodes before they are set in the electrolysis cells. Previous work modelled the anode manufacturing process in order to predict anode properties directly after baking using multivariate statistical methods. The anode coring strategy used at the partner plant means that this model can only predict the properties of anodes baked at the hottest and coldest positions of the baking furnace. The present work proposes a strategy to account for the thermal history of anodes baked at any position and thus predict their properties. It is shown that by combining binary variables coding the pit and the baking position with the routine measurements collected on the baking furnace, the temperature profiles of anodes baked at different positions can be predicted. These data were also included in the model for predicting anode properties. The predictions were validated by additional coring, and the model's performance is conclusive for apparent and real density, compressive strength, air reactivity and Lc, regardless of baking position.
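
The abstract does not name the multivariate method, but partial least squares is a typical choice in this literature. Here is a hedged sketch of the modelling structure it describes, with entirely made-up column names and data, combining dummy-coded pit/position indicators with routine furnace measurements.

```python
import numpy as np
import pandas as pd
from sklearn.cross_decomposition import PLSRegression

# Hypothetical frame: routine furnace measurements plus pit/position
# identifiers; names and values are illustrative, not from the thesis.
df = pd.DataFrame({
    "pit": ["A", "A", "B", "B"] * 25,
    "position": ["hot", "cold"] * 50,
    "flue_temp": np.random.normal(1100, 30, 100),
    "draft": np.random.normal(50, 5, 100),
    "baked_density": np.random.normal(1.55, 0.02, 100),
})

# Dummy-code the categorical location variables, as the abstract describes,
# and combine them with the routine process measurements.
X = pd.get_dummies(df[["pit", "position", "flue_temp", "draft"]],
                   columns=["pit", "position"]).to_numpy(dtype=float)
y = df["baked_density"].to_numpy()

model = PLSRegression(n_components=3).fit(X, y)
print(model.predict(X[:5]).ravel())   # predicted property for the first anodes
```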

Relevance:

30.00%

Publisher:

Abstract:

Integrated water resource management requires distinguishing the water pathways that are accessible to societies from those that are not. Water follows many pathways, which vary greatly from one place to another. The question can be simplified by focusing instead on the two destinations of water. Blue water forms the stores and fluxes of the hydrosystem: rivers, aquifers and subsurface flow. Green water is the invisible flux of water vapour returning to the atmosphere; it includes the water consumed by plants and the water held in soils. Yet many studies consider only one type of blue water, usually the fate of river discharge or, more rarely, groundwater recharge, so the overall picture is missing. At the same time, climate change is affecting these water pathways by altering the components of the hydrological cycle in distinct ways. The study presented here uses the SWAT modelling tool to track all the components of the hydrological cycle and quantify the impact of climate change on the hydrosystem of the Garonne River basin. The first part of the work refined the model set-up to best address the question at hand. Particular care was taken with the use of gridded meteorological data (SAFRAN) and with the representation of snow in the mountains. Model calibration was tested in a differential split-sample setting, calibrating and then validating on climatically contrasting years so as to assess the robustness of the simulation under climate change. This step substantially improved performance over the calibration period (2000-2010) and demonstrated the stability of the model in the face of climate change. Simulations spanning a century (1960-2050) were then produced and analysed in two phases. (i) The past period (1960-2000), based on observed climate, served as a long-term validation of the simulated discharge, with very good performance. Analysis of the hydrological components reveals a strong impact on green-water fluxes and stocks, with decreasing soil water content and a marked increase in evapotranspiration. The blue-water components are mainly affected through the snowpack and river discharge, both of which decline substantially. (ii) Hydrological projections were produced (2010-2050) using a range of scenarios and climate models from a dynamical downscaling exercise. The analysis largely confirms the conclusions drawn from the past period: a strong impact on green water, again with decreasing soil water content and increasing potential evapotranspiration. The simulations show that summer soil water content becomes low enough to reduce actual evapotranspiration fluxes, pointing to a possible future deficit of green-water stocks. Moreover, while the blue-water components still show a significant decrease in the snowpack, discharge now appears to increase in autumn and winter. These results signal an 'acceleration' of the surface blue-water components, probably related to the increase in extreme precipitation events. This work provides an analysis of the variations of most components of the hydrological cycle at the basin scale, confirming the importance of accounting for all of these components when assessing the impact of climate change, and of environmental change more broadly, on water resources.
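
The differential split-sample idea, calibrating on one climate and validating on a contrasting one, is independent of SWAT itself and can be sketched simply; the precipitation figures below are invented for illustration, and the SWAT runs themselves are external and not shown.

```python
import pandas as pd

def split_sample_periods(annual_precip: pd.Series, k: int = 5):
    """Pick the k driest years for calibration and the k wettest for
    validation (or vice versa), so the model is stress-tested under a
    climate unlike its calibration period."""
    ranked = annual_precip.sort_values()
    return list(ranked.index[:k]), list(ranked.index[-k:])  # (dry, wet)

# Example with made-up annual precipitation totals (mm):
precip = pd.Series({2000: 610, 2001: 842, 2002: 700, 2003: 553,
                    2004: 760, 2005: 580, 2006: 905, 2007: 648,
                    2008: 731, 2009: 811, 2010: 690}, name="P_mm")
dry, wet = split_sample_periods(precip, k=3)
print("calibrate on:", dry, "validate on:", wet)
```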

Relevance:

30.00%

Publisher:

Abstract:

Denitrification is a microbially mediated process that converts nitrate (NO3-) to dinitrogen (N2) gas, with implications for soil fertility, climate change and water quality. Using PCR, qPCR and T-RFLP, I investigated the effects of environmental drivers and land management on the abundance and composition of denitrification functional genes. The environmental variables affecting gene abundance were soil type, soil depth, nitrogen concentrations, soil moisture and pH, although each gene was unique in its spatial distribution and controlling factors. Including microbial variables, specifically genotype and gene abundance, improved denitrification models, highlighting the benefit of incorporating microbial data when modelling denitrification. Along with some evidence of niche selection, I show that nirS is a good predictor of denitrification enzyme activity (DEA) and of the N2O:N2 ratio, especially in alkaline and wetland soils. nirK was correlated with N2O production and became a stronger predictor of DEA in acidic soils, indicating that nirK and nirS are not ecologically redundant.
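
The claim that adding gene abundances improves denitrification models corresponds to a simple model structure. A minimal sketch with placeholder data follows (variable names echo the abstract; the coefficients are invented so the example runs, and the abstract does not specify the model family actually used).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical soil dataset; the point is only the model structure:
# adding gene abundances (nirS, nirK) to environmental predictors of DEA.
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "log_nirS": rng.normal(6, 1, 80),
    "log_nirK": rng.normal(7, 1, 80),
    "pH": rng.normal(6.5, 0.8, 80),
    "moisture": rng.uniform(0.1, 0.6, 80),
})
df["log_DEA"] = 0.8 * df["log_nirS"] + 0.3 * df["pH"] + rng.normal(0, 0.5, 80)

X = sm.add_constant(df[["log_nirS", "log_nirK", "pH", "moisture"]])
fit = sm.OLS(df["log_DEA"], X).fit()
print(fit.summary().tables[1])   # does nirS remain the strongest predictor?
```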

Relevance:

30.00%

Publisher:

Abstract:

Background: Post-discharge mortality is a frequent but poorly recognized contributor to child mortality in resource-limited countries. Identifying children at high risk of post-discharge mortality is a critically important first step in addressing this problem. Objectives: The objective of this project was to determine the variables most likely to be associated with post-discharge mortality, for inclusion in a prediction modelling study. Methods: A two-round modified Delphi process was completed to review a priori selected variables and to select new ones. Variables were evaluated for relevance according to (1) predictive value, (2) availability, (3) cost and (4) time required for measurement. Participants included experts in a variety of relevant fields. Results: During the first round of the modified Delphi process, 23 experts evaluated 17 variables. Forty further variables were suggested and reviewed during the second round by 12 experts, who also evaluated 16 additional variables. Thirty unique variables were compiled for use in the prediction modelling study. Conclusion: A systematic approach was used to generate an optimal list of candidate predictor variables for incorporation into a study on the prediction of paediatric post-discharge mortality in a resource-poor setting.

Relevance:

30.00%

Publisher:

Abstract:

The general purpose of this work is to describe and analyse the crowdfunding phenomenon and to investigate the relations among crowdfunders, project creators and crowdfunding websites. More specifically, it describes the differences in profile between major crowdfunding platforms, such as Kickstarter and Indiegogo. The findings are supported by literature gathered from a range of scientific research papers. In the empirical part, data about Kickstarter and Indiegogo were collected from their websites and complemented with further data from other statistical websites. To obtain specific information, such as the satisfaction of entrepreneurs with both platforms, a satisfaction survey was administered to 200 entrepreneurs from different countries. To identify the profiles of users of the Kickstarter and Indiegogo platforms, a multivariate analysis was performed using a hierarchical cluster analysis for each platform under study. Descriptive analysis was used to explore the popularity of the platforms, the average cost and most popular areas of projects, the profiles of users and the platforms' future opportunities. Inferential analysis was applied to assess differences between groups, associations between variables, and the research hypotheses. The results show that Kickstarter and Indiegogo are among the most popular crowdfunding platforms; both have thousands of generally satisfied users, and each takes an individual approach to crowdfunders. Even so, both could benefit from further improving their services. Furthermore, the results indicate a direct and positive relationship, on each platform, between the money needed for a project and the money collected from investors.

Relevance:

30.00%

Publisher:

Abstract:

© 2014 Cises. This work is distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0).