925 results for probabilistic principal component analysis (probabilistic PCA)
Abstract:
Our consumption of groundwater, in particular as drinking water or for irrigation, has increased considerably over the years. Many problems then arise, ranging from the prospection of new resources to the remediation of polluted aquifers. Regardless of the hydrogeological problem considered, the main challenge remains the characterization of subsurface properties. A stochastic approach is therefore necessary to represent this uncertainty by considering multiple geological scenarios and generating a large number of geostatistical realizations. We then encounter the main limitation of these approaches: the computational cost of simulating complex flow processes for each of these realizations. In the first part of the thesis, this problem is investigated in the context of uncertainty propagation, where an ensemble of realizations is identified as representing the subsurface properties. To propagate this uncertainty to the quantity of interest while limiting the computational cost, current methods rely on approximate flow models. This allows the identification of a subset of realizations representing the variability of the initial ensemble. The complex flow model is then evaluated only for this subset, and inference is made on the basis of these complex responses. Our objective is to improve the performance of this approach by using all the available information. To this end, the subset of approximate and exact responses is used to build an error model, which then serves to correct the remaining approximate responses and predict the response of the complex model. This method maximizes the use of the available information without a perceptible increase in computational time. The uncertainty propagation is thus more accurate and more robust.
The strategy explored in the first chapter consists in learning, from a subset of realizations, the relationship between the approximate and complex flow models. In the second part of the thesis, this methodology is formalized mathematically by introducing a regression model between functional responses. As this problem is ill-posed, its dimensionality must be reduced. In this respect, the innovation of the work presented comes from the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also makes it possible to diagnose the quality of the error model in this functional space. The proposed methodology is applied to a pollution problem involving a non-aqueous phase liquid, and the results show that the error model allows a strong reduction of the computational time while correctly estimating the uncertainty. Moreover, for each approximate response, a prediction of the complex response is provided by the error model. The concept of a functional error model is therefore relevant not only for uncertainty propagation but also for Bayesian inference problems. Markov chain Monte Carlo (MCMC) methods are the most commonly used algorithms for generating geostatistical realizations consistent with the observations. However, these methods suffer from a very low acceptance rate in high-dimensional problems, resulting in a large number of wasted flow simulations. A two-step approach, "two-stage MCMC", was introduced to avoid unnecessary simulations of the complex model through a preliminary evaluation of the realization. In the third part of the thesis, the approximate flow model coupled with an error model serves as the preliminary evaluation for two-stage MCMC.
We demonstrate an increase in the acceptance rate by a factor of 1.5 to 3 compared with a classical MCMC implementation. One question remains unanswered: how to choose the size of the training set and how to identify the realizations that optimize the construction of the error model. This requires an iterative strategy so that, with each new flow simulation, the error model is improved by incorporating the new information. This is developed in the fourth part of the thesis, where the methodology is applied to a problem of saline intrusion in a coastal aquifer. -- Our consumption of groundwater, in particular as drinking water and for irrigation, has considerably increased over the years, and groundwater is becoming an increasingly scarce and endangered resource. Nowadays, we are facing many problems ranging from water prospection to sustainable management and remediation of polluted aquifers. Independently of the hydrogeological problem, the main challenge remains dealing with the incomplete knowledge of the underground properties. Stochastic approaches have been developed to represent this uncertainty by considering multiple geological scenarios and generating a large number of realizations. The main limitation of this approach is the computational cost associated with performing complex flow simulations in each realization. In the first part of the thesis, we explore this issue in the context of uncertainty propagation, where an ensemble of geostatistical realizations is identified as representative of the subsurface uncertainty. To propagate this lack of knowledge to the quantity of interest (e.g., the concentration of pollutant in extracted water), it is necessary to evaluate the flow response of each realization.
Due to computational constraints, state-of-the-art methods use approximate flow simulations to identify a subset of realizations that represents the variability of the ensemble. The complex and computationally heavy flow model is then run for this subset, on the basis of which inference is made. Our objective is to increase the performance of this approach by using all of the available information, not solely the subset of exact responses. Two error models are proposed to correct the approximate responses following a machine learning approach. For the subset identified by a classical approach (here the distance kernel method), both the approximate and the exact responses are known. This information is used to construct an error model and correct the ensemble of approximate responses to predict the "expected" responses of the exact model. The proposed methodology makes use of all the available information without perceptible additional computational cost and leads to an increase in the accuracy and robustness of the uncertainty propagation. The strategy explored in the first chapter consists in learning, from a subset of realizations, the relationship between proxy and exact curves. In the second part of this thesis, the strategy is formalized in a rigorous mathematical framework by defining a regression model between functions. As this problem is ill-posed, it is necessary to reduce its dimensionality. The novelty of the work comes from the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also allows a diagnostic of the quality of the error model in the functional space. The proposed methodology is applied to a problem of pollution by a non-aqueous phase liquid. The error model allows a strong reduction of the computational cost while providing a good estimate of the uncertainty.
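The error-model idea described above can be sketched as follows, under stated assumptions: curves are discretized on a common time grid, ordinary PCA of the discretized curves stands in for FPCA (no basis-function smoothing), and the regression between functions is reduced to a linear least-squares map between principal-component scores. The function names, the number of retained components and the synthetic proxy/exact curves are hypothetical; this is not the thesis's implementation.

```python
import numpy as np

def pca_basis(curves, n_comp):
    """Mean curve and leading principal directions (rows = discretized curves)."""
    mean = curves.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by decreasing variance
    _, _, vt = np.linalg.svd(curves - mean, full_matrices=False)
    return mean, vt[:n_comp]

def fit_error_model(proxy_train, exact_train, n_comp=2):
    """Fit a linear map from proxy scores to exact scores on the training subset."""
    mean_p, basis_p = pca_basis(proxy_train, n_comp)
    mean_e, basis_e = pca_basis(exact_train, n_comp)
    scores_p = (proxy_train - mean_p) @ basis_p.T
    scores_e = (exact_train - mean_e) @ basis_e.T
    # Least-squares regression with an intercept column
    design = np.hstack([np.ones((len(scores_p), 1)), scores_p])
    coef, *_ = np.linalg.lstsq(design, scores_e, rcond=None)
    return mean_p, basis_p, mean_e, basis_e, coef

def predict_exact(model, proxy_curves):
    """Correct proxy curves: proxy scores -> predicted exact scores -> curves."""
    mean_p, basis_p, mean_e, basis_e, coef = model
    scores_p = (proxy_curves - mean_p) @ basis_p.T
    design = np.hstack([np.ones((len(scores_p), 1)), scores_p])
    return mean_e + (design @ coef) @ basis_e

# Synthetic example: 50 realizations, curves sampled at 50 time steps
t = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 50))
proxy = a[:, None] * np.sin(2 * np.pi * t) + b[:, None] * t
exact = 1.2 * proxy + 0.1                          # hypothetical "exact" response
model = fit_error_model(proxy[:20], exact[:20])    # training subset of 20
predicted = predict_exact(model, proxy[20:])       # corrected remaining 30
```

Because the synthetic exact curves are an affine function of the proxy curves, two components suffice and the corrected curves match them to rounding error; with real flow responses, the residuals in score space are what one would inspect to judge the error model.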
The individual correction of each proxy response by the error model leads to an excellent prediction of the exact response, opening the door to many applications. The concept of a functional error model is useful not only in the context of uncertainty propagation but also, and perhaps even more so, for performing Bayesian inference. Markov chain Monte Carlo (MCMC) algorithms are the most common choice to ensure that the generated realizations are sampled in accordance with the observations. However, this approach suffers from a low acceptance rate in high-dimensional problems, resulting in a large number of wasted flow simulations. This led to the introduction of two-stage MCMC, where the computational cost is decreased by avoiding unnecessary simulations of the exact flow model thanks to a preliminary evaluation of the proposal. In the third part of the thesis, a proxy is coupled to an error model to provide an approximate response for the two-stage MCMC set-up. We demonstrate an increase in acceptance rate by a factor of three with respect to one-stage MCMC results. An open question remains: how do we choose the size of the learning set and identify the realizations that optimize the construction of the error model? This requires devising an iterative strategy to construct the error model, such that, as new flow simulations are performed, the error model is iteratively improved by incorporating the new information. This is discussed in the fourth part of the thesis, in which we apply this methodology to a problem of saline intrusion in a coastal aquifer.
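A minimal sketch of two-stage (delayed-acceptance) Metropolis-Hastings, assuming a symmetric random-walk proposal on a scalar parameter. Here `log_post_cheap` plays the role of the proxy coupled with an error model and `log_post` the exact flow model; both toy log-posteriors below are hypothetical. The second-stage ratio divides out the approximation, so the chain still targets the exact posterior, while proposals rejected at stage 1 cost no exact-model run.

```python
import math
import random

def two_stage_mcmc(log_post, log_post_cheap, x0, n_iter, step, seed=0):
    """Two-stage (delayed-acceptance) Metropolis sampler with a symmetric proposal."""
    rng = random.Random(seed)
    x = x0
    lp_x, lpc_x = log_post(x), log_post_cheap(x)
    chain, exact_calls = [x], 0
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)            # symmetric random-walk proposal
        lpc_y = log_post_cheap(y)
        # Stage 1: screen the proposal with the cheap approximation only
        if math.log(1.0 - rng.random()) < lpc_y - lpc_x:
            lp_y = log_post(y)                  # expensive model runs only here
            exact_calls += 1
            # Stage 2: correct for the approximation so the chain targets log_post
            if math.log(1.0 - rng.random()) < (lp_y - lp_x) - (lpc_y - lpc_x):
                x, lp_x, lpc_x = y, lp_y, lpc_y
        chain.append(x)
    return chain, exact_calls

def exact_lp(x):
    """Toy exact log-posterior: N(1, 1) up to a constant."""
    return -0.5 * (x - 1.0) ** 2

def cheap_lp(x):
    """Deliberately biased approximation (shifted mean, inflated variance)."""
    return -0.5 * (x - 1.1) ** 2 / 1.2

chain, n_exact = two_stage_mcmc(exact_lp, cheap_lp, x0=0.0, n_iter=20000, step=1.0)
mean = sum(chain[1000:]) / len(chain[1000:])    # discard burn-in
```

`exact_calls` counts exact-model evaluations after the initial state; the gap between it and `n_iter` is the number of expensive runs saved by the stage-1 screening.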
Abstract:
Aim: To disentangle the effects of environmental and geographical processes driving phylogenetic distances among clades of maritime pine (Pinus pinaster), and to assess the implications for conservation management of combining molecular information with species distribution models (SDMs, which predict species distribution based on known occurrence records and environmental variables).
Location: Western Mediterranean Basin and European Atlantic coast.
Methods: We undertook two cluster analyses for eight genetically defined pine clades, based on climatic niche and genetic similarities. We assessed niche similarity by means of a principal component analysis and Schoener's D metric. To calculate genetic similarity, we used the unweighted pair group method with arithmetic mean (UPGMA) based on Nei's distance, using 266 single nucleotide polymorphisms. We then assessed the contribution of environmental and geographical distances to phylogenetic distance by means of Mantel regression with variance partitioning. Finally, we compared the projections obtained from SDMs fitted at the species level (SDMsp) and composed from the eight clade-level models (SDMcm).
Results: Genetically and environmentally defined clusters were identical. Environmental and geographical distances explained 12.6% of the phylogenetic distance variation and, overall, geographical and environmental overlap among clades was low. Large differences were detected between SDMsp and SDMcm (57.75% disagreement in the areas predicted as suitable).
Main conclusions: The genetic structure within the maritime pine subspecies complex is primarily a consequence of its demographic history, as shown by the high proportion of unexplained variation in phylogenetic distances. Nevertheless, our results highlight the contribution of local environmental adaptation in shaping the lower-order phylogeographical distribution patterns and spatial genetic structure of maritime pine: (1) genetically and environmentally defined clusters are consistent, and (2) environment, rather than geography, explained a higher proportion of the variation in phylogenetic distance. SDMs, key tools in conservation management, better characterize the fundamental niche of the species when they include molecular information.
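Schoener's D compares two niche or distribution surfaces after normalizing each to sum to one: D = 1 − ½ Σᵢ |pᵢ − qᵢ|, ranging from 0 (no overlap) to 1 (identical). A minimal sketch, assuming both inputs are non-negative values over the same cells (for instance, cells of a gridded PCA environmental space); the clade densities below are made up for illustration.

```python
def schoeners_d(p, q):
    """Schoener's D = 1 - 0.5 * sum_i |p_i - q_i| after normalizing p and q to sum to 1."""
    total_p, total_q = sum(p), sum(q)
    return 1.0 - 0.5 * sum(abs(a / total_p - b / total_q) for a, b in zip(p, q))

# Hypothetical occupancy densities of two clades over the same six cells
clade_a = [0.0, 2.0, 5.0, 3.0, 0.0, 0.0]
clade_b = [0.0, 0.0, 1.0, 3.0, 4.0, 2.0]
overlap = schoeners_d(clade_a, clade_b)
```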
Abstract:
Macrofossil analysis of a composite 19-m-long sediment core from Rano Raraku Lake (Easter Island) was related to the litho-sedimentary and geochemical features of the sediment. Indirect gradient analyses of the data reveal strong stratigraphical patterns. The good correspondence between the stratigraphical patterns derived from macrofossil data (correspondence analysis) and from sedimentary and geochemical data (principal component analysis) shows that macrofossil associations provide sound palaeolimnological information in conjunction with sedimentary data. The main taphonomic factors influencing the macrofossil assemblages are run-off from the catchment, the littoral plant belt, and the depositional environment within the basin. Five main stages during the last 34,000 calibrated years BP (cal yr BP) are characterised from the lithological, geochemical, and macrofossil data. From 34 to 14.6 cal kyr BP (last glacial period), the sediments were largely derived from the catchment, indicating a high-energy lake environment with much erosion and run-off bringing abundant plant trichomes, lichens, and mosses into the centre of Raraku Lake.
Abstract:
The modern technological ability to handle large amounts of information confronts the chemist with the need to re-evaluate the statistical tools routinely used. Multivariate statistics furnishes the theoretical basis for analyzing systems involving large numbers of variables. The calculations these systems require are no longer an obstacle, owing to statistical packages that offer multivariate analysis options. Here, the basic concepts of two multivariate statistical techniques that have gained broad acceptance for treating chemical data, principal component analysis and hierarchical cluster analysis, are discussed.
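A compact sketch of the two techniques, assuming autoscaled variables for PCA (a common choice for chemical data with mixed units) and average linkage on Euclidean distances for hierarchical clustering; other scalings and linkages are equally standard, and the six-sample data set is hypothetical. In practice a library implementation (for example scikit-learn's PCA or SciPy's hierarchical clustering) would normally be used; the mechanics are spelled out here only for illustration.

```python
import numpy as np

def pca_scores(X, n_comp=2):
    """PCA of autoscaled data (rows = samples, columns = variables)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # give every variable unit variance
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_comp] * s[:n_comp], Vt[:n_comp]    # scores, loadings

def hca_average(X, n_clusters):
    """Agglomerative clustering, average linkage on Euclidean distances."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = D[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]      # merge the closest pair (j > i)
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical data: six samples, three measured variables
X = np.array([[1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [0.9, 1.9, 0.6],
              [5.0, 0.5, 3.0], [5.2, 0.4, 3.1], [4.9, 0.6, 2.9]])
scores, loadings = pca_scores(X)
groups = hca_average(X, n_clusters=2)
```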
Abstract:
The input of heavy metals, whose concentrations were determined by ICP-AES, into samples from the Cambé river basin was evaluated using principal component analysis. The results clearly distinguish one site that is strongly influenced by almost all of the elements studied. Special attention was given to Pb because of the presence of a battery factory in this area. Some downstream samples shared the characteristics of this site, showing the residual action of contaminants along the basin. Other sites showed the influence of soil elements, plus Cr near a tannery. This study made it possible to distinguish different sites in the upper Cambé basin (Londrina-PR-BR) according to element input.
Abstract:
This work analyzes sunshine duration variability in the western part of Europe (WEU) over the 1938–2004 period. A principal component analysis is applied to cluster the original series from 79 sites into 6 regions, and annual and seasonal mean series are then constructed at the regional scale and for the whole WEU. Over the entire period studied, the linear trend of annual sunshine duration is found to be non-significant. However, annual sunshine duration shows an overall decrease from the 1950s until the early 1980s, followed by a recovery during the last two decades. This behavior is in good agreement with the dimming and brightening phenomena described in previous literature. The most remarkable results of the seasonal analysis are the similarity between the spring and annual series, although the spring series has a negative trend, and the clear, significant increase in the winter series for the whole WEU, which is especially large since the 1970s. The behavior of the major synoptic patterns in two seasons is investigated, giving some indication that the evolution of sunshine duration may be partially explained by changes in the frequency of some of them.
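The abstract does not state which trend test was used. As an illustration only, an ordinary least-squares trend and its t-statistic can be computed as below; for about 67 annual values (65 degrees of freedom), |t| greater than roughly 2.0 indicates a slope significant at the 5% level, under the usual assumption of independent residuals, which autocorrelated climate series may violate. The series are synthetic.

```python
import math

def linear_trend(years, values):
    """OLS slope and its t-statistic (slope / standard error)."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err

years = list(range(1938, 2005))                       # 67 annual values
wiggle = [10.0 if y % 2 else -10.0 for y in years]    # hypothetical year-to-year noise
flat = [1800.0 + w for w in wiggle]                   # no underlying trend
rising = [1800.0 + 2.0 * (y - 1938) + w for y, w in zip(years, wiggle)]
slope_flat, t_flat = linear_trend(years, flat)
slope_rise, t_rise = linear_trend(years, rising)
```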
Abstract:
Quality of nursing care: the children's perspective. The purpose of this three-phase study was to describe children's expectations and evaluations of the quality of paediatric nursing care and to develop an instrument with which hospitalized school-age children can evaluate that quality. The ultimate goal was to improve the quality of paediatric nursing care in hospital. In the first phase, 20 pre-school (4–6 years) and 20 school-age (7–11 years) children described their expectations of the quality of paediatric nursing care. The data were collected through interviews and the children's drawings and analysed by content analysis. The children's expectations concerned the nurse, nursing activities and the environment; the physical environment was emphasized in the drawings. On the basis of the first-phase results, previous literature and Leino-Kilpi's "HYVÄ HOITO" ("Good Care") instrument, the "Lasten Hoidon Laatu Sairaalassa" (LHLS, "Children's Care Quality in Hospital") instrument was developed, and its psychometric properties were tested in the second phase of the study. The instrument was developed and tested in three stages. First, an expert panel (n=7) evaluated its content. Next, the instrument was pretested twice with hospitalized school-age children (n=41 and n=16); at the same stage, nurses from five paediatric wards (n=19) jointly evaluated its content, as did 8 children. Finally, the instrument was tested with hospitalized school-age children (n=388), and nurses (n=198) evaluated its content validity. During the development of the instrument, the Cronbach's alpha coefficients of the main quality categories (nurse characteristics, nursing activities and nursing environment) improved. Principal component analysis supported the theoretical structure of the instrument's subcategories of nursing activities and environment. In the third phase, data were collected with the LHLS instrument (LHLS III, version four) from school-age children aged 7–11 years (n=388) on the paediatric wards of Finnish university hospitals.
At the end of the instrument, the children were also asked to describe their nicest and their worst experience during their hospital stay in a sentence-completion task. The data were analysed statistically and by content analysis. The children rated the physical care environment, the nurses' humanity and trustworthiness, and caring and interaction activities as excellent. They rated the nurses' entertaining activities lowest. The child's age and mode of admission to hospital were associated with the amount of information the children received. The children's nicest experiences related to people and their characteristics, activities, the environment and outcomes. The worst experiences related to being a patient, sensations caused by symptoms of illness, separation, physical nursing procedures and the environment. The results show that children are capable of evaluating their own care, and their perspective should be seen as part of the whole quality development process when improving quality in practice through a truly more child-centred approach. The LHLS instrument is a potential tool for obtaining information on children's evaluations of the quality of paediatric nursing care, but testing of the instrument should continue in the future.
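Cronbach's alpha, reported for the instrument's main quality categories, is α = k/(k − 1) · (1 − Σσᵢ² / σ_total²) for k items, where σᵢ² are the item variances and σ_total² the variance of the summed score. A minimal sketch using sample variances; the answers below are hypothetical and do not come from the study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; responses holds one list of item scores per respondent."""
    k = len(responses[0])                                   # number of items
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    total_var = variance([sum(r) for r in responses])       # variance of the summed scale
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answers of four children to three items on a 1-5 scale
answers = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 4, 5]]
alpha = cronbach_alpha(answers)
```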
Abstract:
Soils play an important role in the biogeochemical cycle of mercury, acting both as a sink for and a source of this metal to the atmospheric and hydrological compartments. In the study reported here, various types of soil were evaluated to ascertain the influence of parameters such as pH, organic matter content, Fe, Al, sand, silt, clay, C/H, C/N and C/O atomic ratios, and cation exchange capacity on the distribution of Hg in Amazonia's mid-Negro River basin. The data obtained were interpreted by multivariate exploratory analyses (hierarchical cluster analysis and principal component analysis), which indicated that organic matter plays an important role in mercury uptake in the various soils studied. The soils in floodable areas were found to contain 1.5- to 2.8-fold higher Hg concentrations than those in non-floodable areas. Since these soils are flooded almost year-round, they are less available to participate in redox processes at the soil/atmosphere interface. Hence, floodable areas, which comprise humic-rich soils, accumulate more mercury than non-floodable soils, thus playing an important role in the biogeochemical cycle of Hg in Amazonia's mid-Negro River basin.
Abstract:
Water quality was monitored in the upper course of the Rio das Velhas, a major tributary of the São Francisco basin in the state of Minas Gerais, over a 108 km stretch from its source to the limits of the Sabara district. Monitoring was carried out at 37 sites over a period of 2 years (2003-2004) for 39 parameters. Multivariate statistical techniques were applied to interpret the large water-quality data set and to establish an optimal long-term monitoring network. Cluster analysis separated the sampling sites into groups of similarity and also identified correlated stations that could be removed from the monitoring network. Principal component analysis identified four components that account for the structure of the data, explaining 80% of the total variance. The main parameters were attributed to mining activities and domestic sewage. Significant data reduction was achieved.
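Retaining the components that together explain a target share of the variance (80% in this study) can be sketched as follows, assuming PCA on the correlation matrix (autoscaled parameters), which is common when parameters have different units. The synthetic data and the function name are hypothetical, not the study's data or code.

```python
import numpy as np

def n_components_for(X, threshold=0.80):
    """Smallest number of PCs of autoscaled X whose cumulative explained variance reaches threshold."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]   # descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, threshold)) + 1, cumulative

# Hypothetical monitoring data: 200 samples, 6 parameters driven by 2 sources
rng = np.random.default_rng(1)
sources = rng.normal(size=(200, 2))
X = np.repeat(sources, 3, axis=1) + 0.05 * rng.normal(size=(200, 6))
k, cumulative = n_components_for(X)
```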
Abstract:
Soil organic matter (SOM) plays an important role in the physical, chemical and biological properties of soil. Therefore, the amount of SOM is important for soil management in sustainable agriculture. The objective of this work was to quantify SOM in oxisols by different methods and to compare these methods, using principal component analysis, with respect to their limitations. The methods used were Walkley-Black, elemental analysis, total organic carbon (TOC) and thermogravimetry. According to our results, TOC and elemental analysis were the most satisfactory methods for carbon quantification, owing to their better accuracy and reproducibility.
Abstract:
Brazilian legislation requires the analysis of certain parameters to classify a wine and allow its commercialization. In this work, physico-chemical and color parameters were determined in samples of different red wines sold in the metropolitan area of Recife. Multivariate analysis comprising principal component analysis and hierarchical cluster analysis was employed to distinguish the analyzed wines. pH, chloride concentration, color parameters and ammonium content were the most important variables for sample classification. It was also possible to classify the wines as soft or dry and, among the soft wines, to distinguish two of the four wine producers.
Abstract:
The spatial and temporal retention of metals was studied in the water and sediments of the Gavião River and the Anagé and Tremedal Reservoirs, located in the semi-arid region of Bahia, Brazil, in order to identify trends in the fluxes of metals from the sediments to the water column. Metals were determined by ICP OES and ET AAS. The application of statistical methods showed that conditions in this aquatic system favour the transfer of Cd2+ and Pb2+ from the water column to the sediment.
Abstract:
This study examined the spatial and temporal variations of 13 physico-chemical parameters in water and sediment samples collected along the rural and urban sections of Verruga Stream. Metal concentrations were determined by FAAS. Conductivity and the concentrations of Na+, Cl- and Ca2+ showed the largest variations in the urban area, indicating that these parameters are appropriate indicators of urban contamination. Cluster and principal component analysis showed that Cd2+ and Mn2+ are associated with the use of fertilizers in the rural area.
Abstract:
This work presents the characterization, by liquid chromatography coupled to high-resolution tandem mass spectrometry with electrospray ionization (LC-IT-TOF-MS), of organic compounds present in raw and treated effluents from a combined sewage treatment system (upflow anaerobic sludge blanket followed by a trickling filter). The sewage samples were prepared by C18 solid-phase extraction, and the spectra obtained from the various extracts were submitted to principal component analysis to evaluate their patterns and identify the major deprotonated species. Some target compounds were submitted to semiquantitative analysis using phenolphthalein as internal standard. The results showed that the anaerobic step had little impact on the removal of anionic surfactants (LAS), fatty acids and some contaminants such as bisphenol A and bezafibrate, whereas the aerobic post-treatment was very efficient in removing these organics.
Abstract:
This paper presents the analytical application of a novel electronic tongue based on an array of voltammetric sensors. The device was used to classify wines aged in barrels of different origins and toasting levels. In addition, the correlation between the response of the electronic tongue and the sensory and chemical characterization of the samples was studied. The results were evaluated by applying both principal component analysis and cluster analysis. The samples were clearly classified, and their distribution corresponded closely to the characteristics of the analyzed wines and was similar to the classification obtained from organoleptic analysis.