929 resultados para Principal component analysis discriminant analysis
Resumo:
The supervised pattern recognition methods K-Nearest Neighbors (KNN), stepwise discriminant analysis (SDA), and soft independent modelling of class analogy (SIMCA) were employed in this work with the aim to investigate the relationship between the molecular structure of 27 cannabinoid compounds and their analgesic activity. Previous analyses using two unsupervised pattern recognition methods (PCA-principal component analysis and HCA-hierarchical cluster analysis) were performed and five descriptors were selected as the most relevants for the analgesic activity of the compounds studied: R (3) (charge density on substituent at position C(3)), Q (1) (charge on atom C(1)), A (surface area), log P (logarithm of the partition coefficient) and MR (molecular refractivity). The supervised pattern recognition methods (SDA, KNN, and SIMCA) were employed in order to construct a reliable model that can be able to predict the analgesic activity of new cannabinoid compounds and to validate our previous study. The results obtained using the SDA, KNN, and SIMCA methods agree perfectly with our previous model. Comparing the SDA, KNN, and SIMCA results with the PCA and HCA ones we could notice that all multivariate statistical methods classified the cannabinoid compounds studied in three groups exactly in the same way: active, moderately active, and inactive.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
OBJECTIVE To analyze the association between concentrations of air pollutants and admissions for respiratory causes in children. METHODS Ecological time series study. Daily figures for hospital admissions of children aged < 6, and daily concentrations of air pollutants (PM10, SO2, NO2, O3 and CO) were analyzed in the Região da Grande Vitória, ES, Southeastern Brazil, from January 2005 to December 2010. For statistical analysis, two techniques were combined: Poisson regression with generalized additive models and principal model component analysis. Those analysis techniques complemented each other and provided more significant estimates in the estimation of relative risk. The models were adjusted for temporal trend, seasonality, day of the week, meteorological factors and autocorrelation. In the final adjustment of the model, it was necessary to include models of the Autoregressive Moving Average Models (p, q) type in the residuals in order to eliminate the autocorrelation structures present in the components. RESULTS For every 10:49 μg/m3 increase (interquartile range) in levels of the pollutant PM10 there was a 3.0% increase in the relative risk estimated using the generalized additive model analysis of main components-seasonal autoregressive – while in the usual generalized additive model, the estimate was 2.0%. CONCLUSIONS Compared to the usual generalized additive model, in general, the proposed aspect of generalized additive model − principal component analysis, showed better results in estimating relative risk and quality of fit.
Resumo:
Pine forests constitute some of the most important renewable resources supplying timber, paper and chemical industries, among other functions. Characterization of the volatiles emitted by different Pinus species has proven to be an important tool to decode the process of host tree selection by herbivore insects, some of which cause serious economic damage to pines. Variations in the relative composition of the bouquet of semiochemicals are responsible for the outcome of different biological processes, such as mate finding, egg-laying site recognition and host selection. The volatiles present in phloem samples of four pine species, P. halepensis, P. sylvestris, P. pinaster and P. pinea, were identified and characterized with the aim of finding possible host-plant attractants for native pests, such as the bark beetle Tomicus piniperda. The volatile compounds emitted by phloem samples of pines were extracted by headspace solid-phase micro extraction, using a 2 cm 50/30 mm divinylbenzene/carboxen/polydimethylsiloxane table flex solid-phase microextraction fiber and its contents analyzed by high-resolution gas chromatography, using flame ionization and a non polar and chiral column phases. The components of the volatile fraction emitted by the phloem samples were identified by mass spectrometry using time-of-flight and quadrupole mass analyzers. The estimated relative composition was used to perform a discriminant analysis among pine species, by means of cluster and principal component analysis. It can be concluded that it is possible to discriminate pine species based on the monoterpenes emissions of phloem samples.
Resumo:
Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins added of salivary enzymes, beeswax, and pollen. The biological activities described for propolis were also identified for donor plants resin, but a big challenge for the standardization of the chemical composition and biological effects of propolis remains on a better understanding of the influence of seasonality on the chemical constituents of that raw material. Since propolis quality depends, among other variables, on the local flora which is strongly influenced by (a)biotic factors over the seasons, to unravel the harvest season effect on the propolis chemical profile is an issue of recognized importance. For that, fast, cheap, and robust analytical techniques seem to be the best choice for large scale quality control processes in the most demanding markets, e.g., human health applications. For that, UV-Visible (UV-Vis) scanning spectrophotometry of hydroalcoholic extracts (HE) of seventy-three propolis samples, collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil was adopted. Further machine learning and chemometrics techniques were applied to the UV-Vis dataset aiming to gain insights as to the seasonality effect on the claimed chemical heterogeneity of propolis samples determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e. principal component analysis (PCA) and hierarchical clustering analysis (HCA) supported by scripts written in the R language. The UV-Vis profiles associated with chemometric analysis allowed identifying a typical pattern in propolis samples collected in the summer. Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds ( = 280-400m), suggesting that besides the biological activities of those secondary metabolites, they also play a relevant role for the discrimination and classification of that complex matrix through bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least square-discriminant analysis (PLS-DA), k-Nearest Neighbors (kNN), and Decision Trees showed to be complementary to PCA and HCA, allowing to obtain relevant information as to the sample discrimination.
Resumo:
Objective This study describes the development of two updating measures of working memory (WM): Letter Updating Test (LUT) and Word Updating Test (WUT). Methods In stage 1, items were created and the instruments were assessed by experts and laymen. In stage 2, tests were given to 15 patients with schizophrenia and 15 paired controls. All were able to understand and respond to the instruments. In stage 3, 141 patients with schizophrenia and 119 healthy controls aged 18 to 60 took part; they were assessed on WM, processing speed (PS) and functional outcome. Results The results showed adequate rates of internal consistency for both measures developed, for both the total sample and each group separately, as well as evidence of convergent validity, discriminant validity and sensitivity to differentiate performance among the groups. Principal component analysis yielded two components, one for updating tests and other for PS measures, indicating factorial validity. Positive and significant, yet low, correlations were found with functionality measures. Conclusion These results provide adequate psychometric parameters for the measures developed, applicable to cognitive research settings in schizophrenia.
Resumo:
The skull morphometrics of adult male Antarctic fur seal, Arctocephalus gazella (Peters, 1875) and South American fur seal, A. australis (Zimmermann, 1783) were investigated using a collection of 45 and 38 skulls, respectively. Eighteen measurements were taken for each specimen. Comparative univariate and multivariate statistical analyses included standard statistics, one-way analysis of variance, principal component analysis and discriminant analysis. Individual variation was relatively high for some variables, as expressed by the coefficient of variation. Skulls of A. gazella were larger than those of A. australis for all but two variables: squamosal jugal suture and rostral length. Both species differed significantly as shown by both univariate and multivariate analyses. The discriminant function correctly classified all specimens. The standardized canonical coefficients showed that the variables which most contribute to the differentiation between species were, in decreasing order, the rostral length, palatal length, palatal width at postcanine 5 and braincase width. The present study corroborates that A. gazella and A. australis are phenotipically distinct species.
Resumo:
BACKGROUND AND AIMS: Inflammatory bowel disease (IBD) frequently manifests during childhood and adolescence. For providing and understanding a comprehensive picture of a patients' health status, health-related quality of life (HRQoL) instruments are an essential complement to clinical symptoms and functional limitations. Currently, the IMPACT-III questionnaire is one of the most frequently used disease-specific HRQoL instrument among patients with IBD. However, there is a lack of studies examining the validation and reliability of this instrument. METHODS: 146 paediatric IBD patients from the multicenter Swiss IBD paediatric cohort study database were included in the study. Medical and laboratory data were extracted from the hospital records. HRQoL data were assessed by means of standardized questionnaires filled out by the patients in a face-to-face interview. RESULTS: The original six IMPACT-III domain scales could not be replicated in the current sample. A principal component analysis with the extraction of four factor scores revealed the most robust solution. The four factors indicated good internal reliability (Cronbach's alpha=.64-.86), good concurrent validity measured by correlations with the generic KIDSCREEN-27 scales and excellent discriminant validity for the dimension of physical functioning measured by HRQoL differences for active and inactive severity groups (p<.001, d=1.04). CONCLUSIONS: This study with Swiss children with IBD indicates good validity and reliability for the IMPACT-III questionnaire. However, our findings suggest a slightly different factor structure than originally proposed. The IMPACT-III questionnaire can be recommended for its use in clinical practice. The factor structure should be further examined in other samples.
Resumo:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The combination of both positive and negative acquisition results was performed during data mining to simplify the process and interrogate a larger lipidome into a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
Resumo:
Three multivariate statistical tools (principal component analysis, factor analysis, analysis discriminant) have been tested to characterize and model the sags registered in distribution substations. Those models use several features to represent the magnitude, duration and unbalanced grade of sags. They have been obtained from voltage and current waveforms. The techniques are tested and compared using 69 registers of sags. The advantages and drawbacks of each technique are listed
Resumo:
Functional connectivity (FC) as measured by correlation between fMRI BOLD time courses of distinct brain regions has revealed meaningful organization of spontaneous fluctuations in the resting brain. However, an increasing amount of evidence points to non-stationarity of FC; i.e., FC dynamically changes over time reflecting additional and rich information about brain organization, but representing new challenges for analysis and interpretation. Here, we propose a data-driven approach based on principal component analysis (PCA) to reveal hidden patterns of coherent FC dynamics across multiple subjects. We demonstrate the feasibility and relevance of this new approach by examining the differences in dynamic FC between 13 healthy control subjects and 15 minimally disabled relapse-remitting multiple sclerosis patients. We estimated whole-brain dynamic FC of regionally-averaged BOLD activity using sliding time windows. We then used PCA to identify FC patterns, termed "eigenconnectivities", that reflect meaningful patterns in FC fluctuations. We then assessed the contributions of these patterns to the dynamic FC at any given time point and identified a network of connections centered on the default-mode network with altered contribution in patients. Our results complement traditional stationary analyses, and reveal novel insights into brain connectivity dynamics and their modulation in a neurodegenerative disease.
Resumo:
The information provided by the alignment-independent GRid Independent Descriptors (GRIND) can be condensed by the application of principal component analysis, obtaining a small number of principal properties (GRIND-PP), which is more suitable for describing molecular similarity. The objective of the present study is to optimize diverse parameters involved in the obtention of the GRIND-PP and validate their suitability for applications, requiring a biologically relevant description of the molecular similarity. With this aim, GRIND-PP computed with a collection of diverse settings were used to carry out ligand-based virtual screening (LBVS) on standard conditions. The quality of the results obtained was remarkable and comparable with other LBVS methods, and their detailed statistical analysis allowed to identify the method settings more determinant for the quality of the results and their optimum. Remarkably, some of these optimum settings differ significantly from those used in previously published applications, revealing their unexplored potential. Their applicability in large compound database was also explored by comparing the equivalence of the results obtained using either computed or projected principal properties. In general, the results of the study confirm the suitability of the GRIND-PP for practical applications and provide useful hints about how they should be computed for obtaining optimum results.
Resumo:
Biplots are graphical displays of data matrices based on the decomposition of a matrix as the product of two matrices. Elements of these two matrices are used as coordinates for the rows and columns of the data matrix, with an interpretation of the joint presentation that relies on the properties of the scalar product. Because the decomposition is not unique, there are several alternative ways to scale the row and column points of the biplot, which can cause confusion amongst users, especially when software packages are not united in their approach to this issue. We propose a new scaling of the solution, called the standard biplot, which applies equally well to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. The standard biplot also handles data matrices with widely different levels of inherent variance. Two concepts taken from correspondence analysis are important to this idea: the weighting of row and column points, and the contributions made by the points to the solution. In the standard biplot one set of points, usually the rows of the data matrix, optimally represent the positions of the cases or sample units, which are weighted and usually standardized in some way unless the matrix contains values that are comparable in their raw form. The other set of points, usually the columns, is represented in accordance with their contributions to the low-dimensional solution. As for any biplot, the projections of the row points onto vectors defined by the column points approximate the centred and (optionally) standardized data. The method is illustrated with several examples to demonstrate how the standard biplot copes in different situations to give a joint map which needs only one common scale on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot readable. The proposal also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important.
Resumo:
In order to interpret the biplot it is necessary to know which points usually variables are the ones that are important contributors to the solution, and this information is available separately as part of the biplot s numerical results. We propose a new scaling of the display, called the contribution biplot, which incorporates this diagnostic directly into the graphical display, showing visually the important contributors and thus facilitating the biplot interpretation and often simplifying the graphical representation considerably. The contribution biplot can be applied to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. In the contribution biplot one set of points, usually the rows of the data matrix, optimally represent the spatial positions of the cases or sample units, according to some distance measure that usually incorporates some form of standardization unless all data are comparable in scale. The other set of points, usually the columns, is represented by vectors that are related to their contributions to the low-dimensional solution. A fringe benefit is that usually only one common scale for row and column points is needed on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot legible. Furthermore, this version of the biplot also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important, when they are in fact contributing minimally to the solution.
Resumo:
Counterfeit pharmaceutical products have become a widespread problem in the last decade. Various analytical techniques have been applied to discriminate between genuine and counterfeit products. Among these, Near-infrared (NIR) and Raman spectroscopy provided promising results.The present study offers a methodology allowing to provide more valuable information fororganisations engaged in the fight against counterfeiting of medicines.A database was established by analyzing counterfeits of a particular pharmaceutical product using Near-infrared (NIR) and Raman spectroscopy. Unsupervised chemometric techniques (i.e. principal component analysis - PCA and hierarchical cluster analysis - HCA) were implemented to identify the classes within the datasets. Gas Chromatography coupled to Mass Spectrometry (GC-MS) and Fourier Transform Infrared Spectroscopy (FT-IR) were used to determine the number of different chemical profiles within the counterfeits. A comparison with the classes established by NIR and Raman spectroscopy allowed to evaluate the discriminating power provided by these techniques. Supervised classifiers (i.e. k-Nearest Neighbors, Partial Least Squares Discriminant Analysis, Probabilistic Neural Networks and Counterpropagation Artificial Neural Networks) were applied on the acquired NIR and Raman spectra and the results were compared to the ones provided by the unsupervised classifiers.The retained strategy for routine applications, founded on the classes identified by NIR and Raman spectroscopy, uses a classification algorithm based on distance measures and Receiver Operating Characteristics (ROC) curves. The model is able to compare the spectrum of a new counterfeit with that of previously analyzed products and to determine if a new specimen belongs to one of the existing classes, consequently allowing to establish a link with other counterfeits of the database.