912 results for forward selection component analysis
Abstract:
This paper proposes a very simple method for increasing the speed of algorithms for separating sources from PNL mixtures or inverting Wiener systems. The method is based on a pertinent initialization of the inverse system, whose computational cost is very low. The nonlinear part is roughly approximated by pushing the observations to be Gaussian; this provides a surprisingly good approximation even when the basic assumption is not fully satisfied. The linear part is initialized so that the outputs are decorrelated. Experiments show an impressive speed improvement.
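A minimal sketch of the initialization idea described in this abstract (not the authors' code): each observed channel is Gaussianized through its empirical CDF followed by the inverse normal CDF, as a rough inverse of the unknown nonlinearity, and the linear stage is started from a whitening matrix so that its outputs are decorrelated. The mixing matrix, nonlinearity, and source distribution below are illustrative assumptions.

```python
# Sketch of the low-cost initialization described above (illustrative only):
# 1) Gaussianize each observed channel (rough inverse of the unknown nonlinearity).
# 2) Whiten the Gaussianized signals so the linear stage starts from decorrelated outputs.
import numpy as np
from scipy.stats import norm, rankdata

def gaussianize(x):
    """Map a 1-D signal to an approximately Gaussian one via its empirical CDF."""
    u = rankdata(x) / (x.size + 1.0)      # empirical CDF values in (0, 1)
    return norm.ppf(u)                    # inverse normal CDF

def whitening_matrix(z):
    """Return W such that W @ z has (approximately) identity covariance."""
    vals, vecs = np.linalg.eigh(np.cov(z))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

# Hypothetical PNL observations: linear mixing followed by a channel-wise nonlinearity.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))                               # unknown sources
x = np.tanh(0.8 * (np.array([[1.0, 0.6], [0.4, 1.0]]) @ s))   # observed PNL mixture

z = np.vstack([gaussianize(xi) for xi in x])  # approximate inverses of the nonlinearities
W0 = whitening_matrix(z)                      # initial linear unmixing stage
y0 = W0 @ z                                   # decorrelated starting point for separation
print(np.round(np.cov(y0), 3))                # ~ identity covariance
```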
Abstract:
In the present work, we propose a feature-reduction scheme for a facial biometric identification system, using transform domains such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) for parameterization, and Support Vector Machines (SVM) and Neural Networks (NN) as classifiers. Dimensionality reduction is performed with Principal Component Analysis (PCA) and with Independent Component Analysis (ICA). The system achieves similar success rates, about 98%, for both the DWT-SVM and the DWT-PCA-SVM configurations. The computational load in training mode is reduced because of the smaller input size and the lower complexity of the classifier.
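A hedged sketch of the kind of pipeline this abstract describes (transform-domain parameterization, PCA reduction, SVM classification), using scipy and scikit-learn; the image size, number of retained DCT coefficients, PCA dimension, and the random stand-in dataset are all illustrative assumptions, not the paper's settings.

```python
# Illustrative DCT -> PCA -> SVM pipeline (a sketch, not the paper's implementation).
import numpy as np
from scipy.fft import dctn
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def dct_features(img, k=16):
    """Keep the top-left k x k block of 2-D DCT coefficients as a feature vector."""
    coeffs = dctn(img, norm='ortho')
    return coeffs[:k, :k].ravel()

# Hypothetical data: grayscale face images of size 64 x 64 with integer identity labels.
rng = np.random.default_rng(0)
images = rng.random((200, 64, 64))
labels = rng.integers(0, 5, size=200)

X = np.array([dct_features(im) for im in images])
clf = make_pipeline(StandardScaler(), PCA(n_components=40), SVC(kernel='rbf'))
clf.fit(X, labels)
print('training accuracy:', clf.score(X, labels))
```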
Abstract:
Does Independent Component Analysis (ICA) denature EEG signals? We applied ICA to two groups of subjects (mild Alzheimer patients and control subjects). The aim of this study was to examine whether or not the ICA method can reduce both group differences and within-subject variability. We found that ICA diminished the Leave-One-Out root mean square error (RMSE) of validation (from 0.32 to 0.28), indicative of the reduction of group difference. More interestingly, ICA reduced the inter-subject variability within each group (σ = 2.54 in the δ range before ICA, σ = 1.56 after, Bartlett p = 0.046 after Bonferroni correction). Additionally, we present a method to limit the impact of human error (≈ 13.8%, with 75.6% inter-cleaner agreement) during ICA cleaning and to reduce human bias. These findings suggest a novel usefulness of ICA in clinical EEG in Alzheimer's disease for the reduction of subject variability.
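A minimal illustration of ICA-based EEG cleaning of the kind referred to above, using scikit-learn's FastICA on a hypothetical channels-by-samples array; which component indices count as artifacts is assumed here, whereas in the study that decision is made by human cleaners.

```python
# Sketch of ICA-based artifact removal for multichannel EEG (illustrative only).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_channels, n_samples = 19, 5000
eeg = rng.standard_normal((n_channels, n_samples))   # placeholder EEG (channels x samples)

ica = FastICA(n_components=n_channels, random_state=0)
sources = ica.fit_transform(eeg.T)                   # samples x components

artifact_idx = [0, 3]                                # assumed components flagged as artifacts
sources_clean = sources.copy()
sources_clean[:, artifact_idx] = 0.0                 # zero out the flagged components

eeg_clean = ica.inverse_transform(sources_clean).T   # back to channels x samples
rmse = np.sqrt(np.mean((eeg - eeg_clean) ** 2))
print('RMSE between raw and cleaned EEG:', round(float(rmse), 3))
```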
Abstract:
In this paper we propose the use of the independent component analysis (ICA) [1] technique for improving the classification rate of decision trees and multilayer perceptrons [2], [3]. Using ICA in the preprocessing stage makes the structure of both classifiers simpler and therefore improves their generalization properties. The hypothesis behind the proposed preprocessing is that an ICA transformation maps the feature space into a space where the components are independent and aligned with the axes, and therefore better adapted to the way a decision tree is constructed. Inferring the weights of a multilayer perceptron also becomes easier because the gradient search in weight space follows independent trajectories. The result is that the classifiers are less complex and, on some databases, the error rate is lower. This idea is also applicable to regression.
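The idea above, an ICA rotation that aligns components with the axes before training an axis-aligned classifier, can be sketched as follows with scikit-learn; the synthetic dataset, cross-validation setup, and hyperparameters are placeholders rather than the paper's experiments.

```python
# Sketch: compare a decision tree trained on raw features vs. ICA-rotated features.
from sklearn.datasets import make_classification
from sklearn.decomposition import FastICA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=6, random_state=0)

raw_tree = DecisionTreeClassifier(random_state=0)
ica_tree = make_pipeline(FastICA(n_components=10, random_state=0),
                         DecisionTreeClassifier(random_state=0))

print('raw features :', cross_val_score(raw_tree, X, y, cv=5).mean())
print('ICA features :', cross_val_score(ica_tree, X, y, cv=5).mean())
```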
Abstract:
Background: Differences in the distribution of genotypes between individuals of the same ethnicity are an important confounding factor commonly undervalued in typical association studies conducted in radiogenomics. Objective: To evaluate the genotypic distribution of SNPs in a wide set of Spanish prostate cancer patients in order to determine the homogeneity of the population and to disclose potential bias. Design, Setting, and Participants: A total of 601 prostate cancer patients from Andalusia, the Basque Country, the Canary Islands and Catalonia were genotyped for 10 SNPs located in 6 different genes associated with DNA repair: XRCC1 (rs25487, rs25489, rs1799782), ERCC2 (rs13181), ERCC1 (rs11615), LIG4 (rs1805388, rs1805386), ATM (rs17503908, rs1800057) and P53 (rs1042522). The SNP genotyping was performed on a Biotrove OpenArray NT Cycler. Outcome Measurements and Statistical Analysis: Comparisons of genotypic and allelic frequencies among populations, as well as haplotype analyses, were performed using the web-based environment SNPator. Principal component analysis was carried out using the SnpMatrix and XSnpMatrix classes and methods implemented as an R package. Unsupervised hierarchical clustering of SNPs was performed using MultiExperiment Viewer. Results and Limitations: We observed that the genotype distribution of 4 out of 10 SNPs was statistically different among the studied populations, with the greatest differences between Andalusia and Catalonia. These observations were confirmed by cluster analysis, principal component analysis and the differential distribution of haplotypes among the populations. Because tumor characteristics were not taken into account, it is possible that some polymorphisms influence tumor characteristics in the same way that they may pose a risk factor for other disease characteristics. Conclusion: Differences in the distribution of genotypes within different populations of the same ethnicity could be an important confounding factor responsible for the lack of validation of SNPs associated with radiation-induced toxicity, especially when extensive meta-analyses with subjects from different countries are carried out.
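The population-structure checks described above (PCA and unsupervised hierarchical clustering of genotypes) were run with R packages and MultiExperiment Viewer; the sketch below shows an analogous analysis in Python on a hypothetical 0/1/2-coded genotype matrix, purely as an illustration of the workflow, not a reproduction of the study.

```python
# Illustrative PCA + hierarchical clustering of a genotype matrix (patients x SNPs),
# analogous to (but not reproducing) the analysis described in the abstract.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(601, 10)).astype(float)   # 0/1/2 allele counts

# Centre and scale each SNP, then project patients onto the first two components.
G = (genotypes - genotypes.mean(axis=0)) / (genotypes.std(axis=0) + 1e-9)
scores = PCA(n_components=2).fit_transform(G)

# Unsupervised hierarchical clustering of the SNPs (columns) on their correlation pattern.
Z = linkage(G.T, method='average', metric='correlation')
snp_clusters = fcluster(Z, t=4, criterion='maxclust')
print('PCA scores shape:', scores.shape, '| SNP cluster labels:', snp_clusters)
```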
Abstract:
Objective: To compare lower incisor dentoalveolar compensation and mandibular symphysis morphology among Class I and Class III malocclusion patients with different vertical facial skeletal patterns. Materials and Methods: Lower incisor extrusion and inclination, as well as buccal (LA) and lingual (LP) cortex depth and mandibular symphysis height (LH), were measured in 107 lateral cephalometric X-rays of adult patients without prior orthodontic treatment. In addition, malocclusion type (Class I or III) and vertical facial skeletal pattern were considered. Related variables were reduced through a principal component analysis (PCA). Simple regression equations and multivariate analyses of variance were also used. Results: Incisor mandibular plane angle (P < .001) and extrusion (P = .03) values showed significant differences between the sagittal malocclusion groups. Variations in the mandibular plane have a negative correlation with LA (Class I P = .03 and Class III P = .01) and a positive correlation with LH (Class I P = .01 and Class III P = .02) in both groups. Within the Class III group, there was a negative correlation between the mandibular plane and LP (P = .02). PCA showed that the tendency toward a long face causes the symphysis to elongate and narrow. In Class III patients, alveolar narrowing is also found in normal faces. Conclusions: Vertical facial pattern is a significant factor in mandibular symphysis alveolar morphology and lower incisor positioning, both for Class I and Class III patients. Short-faced Class III patients have a widened alveolar bone, whereas for long-faced and normal-faced Class III patients, natural compensation elongates the symphysis and influences lower incisor position.
Abstract:
Our consumption of groundwater, in particular as drinking water or for irrigation, has increased considerably over the years. Numerous problems have consequently appeared, ranging from the prospection of new resources to the remediation of polluted aquifers. Regardless of the hydrogeological problem considered, the main challenge remains the characterization of the subsurface properties. A stochastic approach is therefore necessary to represent this uncertainty, by considering multiple geological scenarios and generating a large number of geostatistical realizations. We then face the main limitation of these approaches, namely the computational cost of simulating complex flow processes for each of these realizations. In the first part of the thesis, this problem is investigated in the context of uncertainty propagation, where an ensemble of realizations is identified as representing the subsurface properties. To propagate this uncertainty to the quantity of interest while limiting the computational cost, current methods rely on approximate flow models. This allows a subset of realizations representing the variability of the initial ensemble to be identified. The complex flow model is then evaluated only for this subset, and inference is made on the basis of these complex responses. Our objective is to improve the performance of this approach by using all the available information. To this end, the subset of approximate and exact responses is used to build an error model, which is then used to correct the remaining approximate responses and to predict the response of the complex model. This method maximizes the use of the available information without a perceptible increase in computation time, making the uncertainty propagation more accurate and more robust. The strategy explored in the first chapter consists in learning, from a subset of realizations, the relationship between the approximate and the complex flow models. In the second part of the thesis, this methodology is formalized mathematically by introducing a regression model between functional responses. As this problem is ill-posed, its dimensionality must be reduced. In this respect, the novelty of the presented work lies in the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also makes it possible to diagnose the quality of the error model in this functional space. The proposed methodology is applied to a pollution problem involving a non-aqueous phase liquid, and the results show that the error model allows a strong reduction of the computation time while correctly estimating the uncertainty. Moreover, for each approximate response, a prediction of the complex response is provided by the error model. The concept of a functional error model is therefore relevant for uncertainty propagation, but also for Bayesian inference problems. Markov chain Monte Carlo (MCMC) methods are the algorithms most commonly used to generate geostatistical realizations consistent with the observations.
However, these methods suffer from a very low acceptance rate for high-dimensional problems, resulting in a large number of wasted flow simulations. A two-step approach, the two-stage MCMC, was introduced to avoid unnecessary simulations of the complex model through a preliminary evaluation of the proposal. In the third part of the thesis, the approximate flow model coupled with an error model serves as the preliminary evaluation for the two-stage MCMC. We demonstrate an increase in the acceptance rate by a factor of 1.5 to 3 compared with a classical MCMC implementation. One question remains open: how to choose the size of the training set and how to identify the realizations that optimize the construction of the error model. This calls for an iterative strategy in which, with each new flow simulation, the error model is improved by incorporating the new information. This is developed in the fourth part of the thesis, where the methodology is applied to a saltwater intrusion problem in a coastal aquifer. -- Our consumption of groundwater, in particular as drinking water and for irrigation, has considerably increased over the years, and groundwater is becoming an increasingly scarce and endangered resource. Nowadays, we are facing many problems ranging from water prospection to sustainable management and remediation of polluted aquifers. Independently of the hydrogeological problem, the main challenge remains dealing with the incomplete knowledge of the underground properties. Stochastic approaches have been developed to represent this uncertainty by considering multiple geological scenarios and generating a large number of realizations. The main limitation of this approach is the computational cost associated with performing complex flow simulations in each realization. In the first part of the thesis, we explore this issue in the context of uncertainty propagation, where an ensemble of geostatistical realizations is identified as representative of the subsurface uncertainty. To propagate this lack of knowledge to the quantity of interest (e.g., the concentration of pollutant in extracted water), it is necessary to evaluate the flow response of each realization. Due to computational constraints, state-of-the-art methods make use of approximate flow simulations to identify a subset of realizations that represents the variability of the ensemble. The complex and computationally heavy flow model is then run for this subset, based on which inference is made. Our objective is to increase the performance of this approach by using all of the available information and not solely the subset of exact responses. Two error models are proposed to correct the approximate responses following a machine learning approach. For the subset identified by a classical approach (here the distance kernel method), both the approximate and the exact responses are known. This information is used to construct an error model and correct the ensemble of approximate responses to predict the "expected" responses of the exact model. The proposed methodology makes use of all the available information without perceptible additional computational costs and leads to an increase in accuracy and robustness of the uncertainty propagation. The strategy explored in the first chapter consists in learning from a subset of realizations the relationship between proxy and exact curves.
In the second part of this thesis, the strategy is formalized in a rigorous mathematical framework by defining a regression model between functions. As this problem is ill-posed, it is necessary to reduce its dimensionality. The novelty of the work comes from the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also allows a diagnostic of the quality of the error model in the functional space. The proposed methodology is applied to a pollution problem by a non-aqueous phase liquid. The error model allows a strong reduction of the computational cost while providing a good estimate of the uncertainty. The individual correction of the proxy response by the error model leads to an excellent prediction of the exact response, opening the door to many applications. The concept of functional error model is useful not only in the context of uncertainty propagation, but also, and maybe even more so, to perform Bayesian inference. Markov chain Monte Carlo (MCMC) algorithms are the most common choice to ensure that the generated realizations are sampled in accordance with the observations. However, this approach suffers from a low acceptance rate in high-dimensional problems, resulting in a large number of wasted flow simulations. This led to the introduction of two-stage MCMC, where the computational cost is decreased by avoiding unnecessary simulations of the exact flow model thanks to a preliminary evaluation of the proposal. In the third part of the thesis, a proxy is coupled to an error model to provide an approximate response for the two-stage MCMC set-up. We demonstrate an increase in acceptance rate by a factor of three with respect to one-stage MCMC results. An open question remains: how do we choose the size of the learning set and identify the realizations that optimize the construction of the error model? This requires devising an iterative strategy to construct the error model, such that, as new flow simulations are performed, the error model is iteratively improved by incorporating the new information. This is discussed in the fourth part of the thesis, in which we apply this methodology to a problem of saline intrusion in a coastal aquifer.
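A hedged sketch of a functional error model of the kind described above: discretized proxy and exact response curves are both reduced with PCA (standing in for FPCA on a common time grid), a linear regression maps proxy scores to exact scores, and corrected curves are reconstructed for every realization. The curves, subset size, and number of components below are synthetic placeholders, not the thesis settings.

```python
# Sketch of a PCA-based functional error model (illustrative, not the thesis code).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_train, n_all, n_t = 30, 200, 100
t = np.linspace(0.0, 1.0, n_t)

# Hypothetical exact and proxy response curves (proxy = biased/noisy exact).
exact = np.array([np.exp(-((t - rng.uniform(0.3, 0.7)) / 0.1) ** 2) for _ in range(n_all)])
proxy = 0.8 * exact + 0.05 * rng.standard_normal((n_all, n_t))

# The exact (expensive) model is "run" only on a small training subset.
train = rng.choice(n_all, size=n_train, replace=False)

pca_p = PCA(n_components=5).fit(proxy)            # components of the proxy curves
pca_e = PCA(n_components=5).fit(exact[train])     # components of the exact training curves

reg = LinearRegression().fit(pca_p.transform(proxy[train]),
                             pca_e.transform(exact[train]))

# Predict "expected" exact curves for every realization from its proxy curve.
exact_pred = pca_e.inverse_transform(reg.predict(pca_p.transform(proxy)))
print('mean abs error of corrected curves:',
      float(np.mean(np.abs(exact_pred - exact))))
```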
Abstract:
PURPOSE: Statistical shape and appearance models play an important role in reducing the segmentation processing time of a vertebra and in improving results for 3D model development. Here, we describe the different steps in generating a statistical shape model (SSM) of the second cervical vertebra (C2) and provide the shape model for general use by the scientific community. The main difficulties in its construction are the morphological complexity of the C2 and its variability in the population. METHODS: The input dataset is composed of manually segmented, anonymized patient computed tomography (CT) scans. The alignment of the different datasets is done with Procrustes alignment on surface models, and the registration is then cast as a model-fitting problem using a Gaussian process. A principal component analysis (PCA)-based model is generated which captures the variability of the C2. RESULTS: The SSM was generated using 92 CT scans. The resulting SSM was evaluated for specificity, compactness and generalization ability. The SSM of the C2 is freely available to the scientific community in Slicer (an open source software for image analysis and scientific visualization), with a module created to visualize the SSM using Statismo, a framework for statistical shape modeling. CONCLUSION: The SSM of the vertebra allows the shape variability of the C2 to be represented. Moreover, the SSM will enable semi-automatic segmentation and 3D model generation of the vertebra, which would greatly benefit surgery planning.
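A toy sketch of the SSM construction steps mentioned above (alignment of corresponding points followed by a PCA of their coordinates). It assumes the meshes are already in point-to-point correspondence, which in the paper is obtained through Gaussian-process registration; the landmark counts and noise level are invented for illustration.

```python
# Toy statistical shape model: align corresponding landmarks, then PCA over coordinates.
import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_shapes, n_points = 92, 300
reference = rng.standard_normal((n_points, 3))
shapes = [reference + 0.05 * rng.standard_normal((n_points, 3)) for _ in range(n_shapes)]

# Align every shape to the first one (a stand-in for generalized Procrustes alignment).
aligned = [procrustes(shapes[0], s)[1] for s in shapes]

# Flatten the (x, y, z) coordinates and build the PCA-based shape model.
X = np.array([a.ravel() for a in aligned])
ssm = PCA(n_components=10).fit(X)
print('variance explained by first 3 modes:',
      np.round(ssm.explained_variance_ratio_[:3], 3))

# A new plausible shape: mean shape plus 2 standard deviations along the first mode.
sample = ssm.mean_ + 2.0 * np.sqrt(ssm.explained_variance_[0]) * ssm.components_[0]
new_shape = sample.reshape(n_points, 3)
```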
Abstract:
Aim To disentangle the effects of environmental and geographical processes driving phylogenetic distances among clades of maritime pine (Pinus pinaster). To assess the implications for conservation management of combining molecular information with species distribution models (SDMs; which predict species distribution based on known occurrence records and on environmental variables). Location Western Mediterranean Basin and European Atlantic coast. Methods We undertook two cluster analyses for eight genetically defined pine clades based on climatic niche and genetic similarities. We assessed niche similarity by means of a principal component analysis and Schoener's D metric. To calculate genetic similarity, we used the unweighted pair group method with arithmetic mean based on Nei's distance using 266 single nucleotide polymorphisms. We then assessed the contribution of environmental and geographical distances to phylogenetic distance by means of Mantel regression with variance partitioning. Finally, we compared the projection obtained from SDMs fitted from the species level (SDMsp) and composed from the eight clade-level models (SDMcm). Results Genetically and environmentally defined clusters were identical. Environmental and geographical distances explained 12.6% of the phylogenetic distance variation and, overall, geographical and environmental overlap among clades was low. Large differences were detected between SDMsp and SDMcm (57.75% of disagreement in the areas predicted as suitable). Main conclusions The genetic structure within the maritime pine subspecies complex is primarily a consequence of its demographic history, as seen by the high proportion of unexplained variation in phylogenetic distances. Nevertheless, our results highlight the contribution of local environmental adaptation in shaping the lower-order, phylogeographical distribution patterns and spatial genetic structure of maritime pine: (1) genetically and environmentally defined clusters are consistent, and (2) environment, rather than geography, explained a higher proportion of variation in phylogenetic distance. SDMs, key tools in conservation management, better characterize the fundamental niche of the species when they include molecular information.
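Two of the ingredients above, UPGMA clustering of a genetic distance matrix and a Mantel correlation between distance matrices, can be sketched as follows with scipy; the random distance matrices stand in for the Nei's genetic distances and climatic-niche distances used in the study, and the permutation-based Mantel test is a generic implementation, not the authors' variance-partitioning regression.

```python
# Sketch: UPGMA clustering of a genetic distance matrix and a simple Mantel test.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_clades = 8

def random_distance_matrix(n):
    """Symmetric, zero-diagonal placeholder distance matrix."""
    return squareform(rng.random(n * (n - 1) // 2))

genetic = random_distance_matrix(n_clades)      # placeholder for Nei's distances
environ = random_distance_matrix(n_clades)      # placeholder for climatic-niche distances

# UPGMA = hierarchical clustering with average linkage on the condensed distances.
upgma_tree = linkage(squareform(genetic), method='average')

def mantel(d1, d2, n_perm=999):
    """Correlation between two distance matrices with a permutation p-value."""
    v1, v2 = squareform(d1), squareform(d2)
    r_obs = pearsonr(v1, v2)[0]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(d1.shape[0])
        if pearsonr(squareform(d1[np.ix_(p, p)]), v2)[0] >= r_obs:
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)

print('Mantel r, p:', mantel(genetic, environ))
```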
Abstract:
Macrofossil analysis of a composite 19 m long sediment core from Rano Raraku Lake (Easter Island) was related to litho-sedimentary and geochemical features of the sediment. Strong stratigraphical patterns are shown by indirect gradient analyses of the data. The good correspondence between the stratigraphical patterns derived from the macrofossil data (Correspondence Analysis) and from the sedimentary and geochemical data (Principal Component Analysis) shows that macrofossil associations provide sound palaeolimnological information in conjunction with sedimentary data. The main taphonomic factors influencing the macrofossil assemblages are run-off from the catchment, the littoral plant belt, and the depositional environment within the basin. Five main stages during the last 34,000 calibrated years BP (cal yr BP) are characterised from the lithological, geochemical, and macrofossil data. From 34 to 14.6 cal kyr BP (last glacial period) the sediments were largely derived from the catchment, indicating a high-energy lake environment with much erosion and run-off bringing abundant plant trichomes, lichens, and mosses into the centre of Raraku Lake.
Abstract:
Horizontal gene transfer is central to microbial evolution, because it enables genetic regions to spread horizontally through diverse communities. However, how gene transfer exerts such a strong effect is not understood. Here we develop an eco-evolutionary model and show how genetic transfer, even when rare, can transform the evolution and ecology of microbes. We recapitulate existing models, which suggest that asexual reproduction will overpower horizontal transfer and greatly limit its effects. We then show that allowing immigration completely changes these predictions. With migration, the rates and impacts of horizontal transfer are greatly increased, and transfer is most frequent for loci under positive natural selection. Our analysis explains how ecologically important loci can sweep through competing strains and species. In this way, microbial genomes can evolve to become ecologically diverse where different genomic regions encode for partially overlapping, but distinct, ecologies. Under these conditions ecological species do not exist, because genes, not species, inhabit niches.
Abstract:
In recent years there has been growing interest in composite indicators as an efficient tool of analysis and a method of prioritizing policies. This paper presents a composite index of intermediary determinants of child health using a multivariate statistical approach. The index shows how specific determinants of child health vary across Colombian departments (administrative subdivisions). We used data collected from the 2010 Colombian Demographic and Health Survey (DHS) for 32 departments and the capital city, Bogotá. Adapting the conceptual framework of the Commission on Social Determinants of Health (CSDH), five dimensions related to child health are represented in the index: material circumstances, behavioural factors, psychosocial factors, biological factors and the health system. In order to generate the weights of the variables, and taking into account the discrete nature of the data, principal component analysis (PCA) using polychoric correlations was employed in constructing the index. From this method five principal components were selected. The index was estimated using a weighted average of the retained components. A hierarchical cluster analysis was also carried out. The results show that the biggest differences in intermediary determinants of child health are associated with health care before and during delivery.
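A sketch of the index-construction logic described above: eigen-decompose a correlation matrix, retain the leading components, and combine the component scores into a weighted-average index. The paper uses polychoric correlations, which require a dedicated estimation routine; here an ordinary correlation matrix of synthetic ordinal data stands in for it, and the number of departments and indicators is assumed.

```python
# Sketch of a PCA-based composite index built from a correlation matrix.
# The paper uses polychoric correlations; an ordinary correlation matrix of
# synthetic ordinal data stands in for it here.
import numpy as np

rng = np.random.default_rng(0)
n_dept, n_vars = 33, 12
data = rng.integers(1, 5, size=(n_dept, n_vars)).astype(float)   # ordinal indicators

Z = (data - data.mean(axis=0)) / data.std(axis=0)
R = np.corrcoef(Z, rowvar=False)                 # placeholder for the polychoric matrix

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 5                                            # retained components, as in the paper
scores = Z @ eigvecs[:, :k]                      # component scores per department
weights = eigvals[:k] / eigvals[:k].sum()        # variance-explained weights
index = scores @ weights                         # weighted-average composite index
print('index for first 5 departments:', np.round(index[:5], 2))
```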
Abstract:
This paper presents a composite index of early childhood health using a multivariate statistical approach. The index shows how child health varies across Colombian departments (administrative subdivisions). In recent years there has been growing interest in composite indicators as an efficient analysis tool and a way of prioritizing policies. These indicators not only enable multi-dimensional phenomena to be simplified but also make it easier to measure, visualize, monitor and compare a country's performance on particular issues. We used data collected from the Colombian Demographic and Health Survey (DHS) for 32 departments and the capital city, Bogotá, in 2005 and 2010. The variables included in the index provide a measure of three dimensions related to child health: health status, health determinants and the health system. In order to generate the weights of the variables and take into account the discrete nature of the data, we employed a principal component analysis (PCA) using polychoric correlations. From this method, five principal components were selected. The index was estimated using a weighted average of the retained components. A hierarchical cluster analysis was also carried out. We observed that the departments ranking in the lowest positions are located on the Colombian periphery; they are departments with low per capita incomes and critical social indicators. The results suggest that regional disparities in child health may be associated with differences in parental characteristics, household conditions and economic development levels, which makes clear the importance of context in the study of child health in Colombia.
Abstract:
The modern technological ability to handle large amounts of information confronts the chemist with the necessity to re-evaluate the statistical tools he routinely uses. Multivariate statistics furnishes theoretical bases for analyzing systems involving large numbers of variables. The mathematical calculations required for these systems are no longer an obstacle thanks to statistical packages that offer multivariate analysis options. Here, the basic concepts of two multivariate statistical techniques that have received broad acceptance for treating chemical data, principal component analysis and hierarchical cluster analysis, are discussed.
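For readers who want to try the two techniques discussed, here is a minimal, generic example in Python (the article itself is conceptual and tool-agnostic): PCA and hierarchical cluster analysis of an autoscaled samples-by-variables chemical data matrix; the random matrix merely stands in for real measurements.

```python
# Minimal PCA and hierarchical cluster analysis of a samples x variables data matrix.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((20, 8))                      # e.g., 20 samples x 8 measured variables

Xs = StandardScaler().fit_transform(X)       # autoscaling (mean 0, unit variance)

pca = PCA(n_components=2).fit(Xs)
scores = pca.transform(Xs)                   # sample scores on the first two components
print('explained variance:', np.round(pca.explained_variance_ratio_, 2))

Z = linkage(Xs, method='ward')               # hierarchical cluster analysis
dendrogram(Z, no_plot=True)                  # set no_plot=False to draw the dendrogram
```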
Abstract:
The input of heavy metal concentrations, determined by ICP-AES in samples from the Cambé river basin, was evaluated using Principal Component Analysis. The results clearly distinguish one site, which is strongly influenced by almost all of the elements studied. Special attention was given to Pb, because of the presence of a battery industry in this area. Some downstream samples were associated with the same characteristics as this site, showing the residual action of contaminants along the basin. Other sites showed the influence of soil elements, plus Cr near a tannery. This study made it possible to distinguish different sites in the upper Cambé basin (Londrina-PR-BR) according to element input.