913 resultados para INDEPENDENT COMPONENT ANALYSIS (ICA)
Resumo:
At CoDaWork'03 we presented work on the analysis of archaeological glass composi- tional data. Such data typically consist of geochemical compositions involving 10-12 variables and approximates completely compositional data if the main component, sil- ica, is included. We suggested that what has been termed `crude' principal component analysis (PCA) of standardized data often identi ed interpretable pattern in the data more readily than analyses based on log-ratio transformed data (LRA). The funda- mental problem is that, in LRA, minor oxides with high relative variation, that may not be structure carrying, can dominate an analysis and obscure pattern associated with variables present at higher absolute levels. We investigate this further using sub- compositional data relating to archaeological glasses found on Israeli sites. A simple model for glass-making is that it is based on a `recipe' consisting of two `ingredients', sand and a source of soda. Our analysis focuses on the sub-composition of components associated with the sand source. A `crude' PCA of standardized data shows two clear compositional groups that can be interpreted in terms of di erent recipes being used at di erent periods, re ected in absolute di erences in the composition. LRA analysis can be undertaken either by normalizing the data or de ning a `residual'. In either case, after some `tuning', these groups are recovered. The results from the normalized LRA are di erently interpreted as showing that the source of sand used to make the glass di ered. These results are complementary. One relates to the recipe used. The other relates to the composition (and presumed sources) of one of the ingredients. It seems to be axiomatic in some expositions of LRA that statistical analysis of compositional data should focus on relative variation via the use of ratios. Our analysis suggests that absolute di erences can also be informative
Resumo:
Objective: To develop a method for objective quantification of PD motor symptoms related to Off episodes and peak dose dyskinesias, using spiral data gathered by using a touch screen telemetry device. The aim was to objectively characterize predominant motor phenotypes (bradykinesia and dyskinesia), to help in automating the process of visual interpretation of movement anomalies in spirals as rated by movement disorder specialists. Background: A retrospective analysis was conducted on recordings from 65 patients with advanced idiopathic PD from nine different clinics in Sweden, recruited from January 2006 until August 2010. In addition to the patient group, 10 healthy elderly subjects were recruited. Upper limb movement data were collected using a touch screen telemetry device from home environments of the subjects. Measurements with the device were performed four times per day during week-long test periods. On each test occasion, the subjects were asked to trace pre-drawn Archimedean spirals, using the dominant hand. The pre-drawn spiral was shown on the screen of the device. The spiral test was repeated three times per test occasion and they were instructed to complete it within 10 seconds. The device had a sampling rate of 10Hz and measured both position and time-stamps (in milliseconds) of the pen tip. Methods: Four independent raters (FB, DH, AJ and DN) used a web interface that animated the spiral drawings and allowed them to observe different kinematic features during the drawing process and to rate task performance. Initially, a number of kinematic features were assessed including ‘impairment’, ‘speed’, ‘irregularity’ and ‘hesitation’ followed by marking the predominant motor phenotype on a 3-category scale: tremor, bradykinesia and/or choreatic dyskinesia. There were only 2 test occasions for which all the four raters either classified them as tremor or could not identify the motor phenotype. Therefore, the two main motor phenotype categories were bradykinesia and dyskinesia. ‘Impairment’ was rated on a scale from 0 (no impairment) to 10 (extremely severe) whereas ‘speed’, ‘irregularity’ and ‘hesitation’ were rated on a scale from 0 (normal) to 4 (extremely severe). The proposed data-driven method consisted of the following steps. Initially, 28 spatiotemporal features were extracted from the time series signals before being presented to a Multilayer Perceptron (MLP) classifier. The features were based on different kinematic quantities of spirals including radius, angle, speed and velocity with the aim of measuring the severity of involuntary symptoms and discriminate between PD-specific (bradykinesia) and/or treatment-induced symptoms (dyskinesia). A Principal Component Analysis was applied on the features to reduce their dimensions where 4 relevant principal components (PCs) were retained and used as inputs to the MLP classifier. Finally, the MLP classifier mapped these components to the corresponding visually assessed motor phenotype scores for automating the process of scoring the bradykinesia and dyskinesia in PD patients whilst they draw spirals using the touch screen device. For motor phenotype (bradykinesia vs. dyskinesia) classification, the stratified 10-fold cross validation technique was employed. Results: There were good agreements between the four raters when rating the individual kinematic features with intra-class correlation coefficient (ICC) of 0.88 for ‘impairment’, 0.74 for ‘speed’, 0.70 for ‘irregularity’, and moderate agreements when rating ‘hesitation’ with an ICC of 0.49. When assessing the two main motor phenotype categories (bradykinesia or dyskinesia) in animated spirals the agreements between the four raters ranged from fair to moderate. There were good correlations between mean ratings of the four raters on individual kinematic features and computed scores. The MLP classifier classified the motor phenotype that is bradykinesia or dyskinesia with an accuracy of 85% in relation to visual classifications of the four movement disorder specialists. The test-retest reliability of the four PCs across the three spiral test trials was good with Cronbach’s Alpha coefficients of 0.80, 0.82, 0.54 and 0.49, respectively. These results indicate that the computed scores are stable and consistent over time. Significant differences were found between the two groups (patients and healthy elderly subjects) in all the PCs, except for the PC3. Conclusions: The proposed method automatically assessed the severity of unwanted symptoms and could reasonably well discriminate between PD-specific and/or treatment-induced motor symptoms, in relation to visual assessments of movement disorder specialists. The objective assessments could provide a time-effect summary score that could be useful for improving decision-making during symptom evaluation of individualized treatment when the goal is to maximize functional On time for patients while minimizing their Off episodes and troublesome dyskinesias.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
We investigated dietary intake patterns (DIP) in adolescents (14-18 year-olds) and the association with demographic and socioeconomic characteristics and lifestyle variables. This school-based survey was carried out among high school students from the city of Maringa in the state of Parana (PR), Brazil (2007). The sample included 991 students (54.5% girls) from high schools. DIPs were investigated by the frequency of weekly consumption of each food group: vegetables, fruit, rice, beans, fried food, sweet food, milk, soda, meat, eggs, alcoholic drinks. Independent variables were: demographic and socioeconomic characteristics and lifestyle variables. DIPS were identified using principal component analysis with orthogonal rotation (varimax). Three components were extracted. Component 1 (fried foods, sweets and soft drinks) was positively associated with not having breakfast for girls and dinner for boys. Moreover, component 2 (consumption of fruit and vegetables) was positively associated with having breakfast at home for boys and number of meals for girls. Component 3 (beans, eggs and meat) was positively associated with having lunch, employment and sedentary behavior level for girls. However, it was negatively associated with having lunch and dinner for boys. Adolescents who have healthier eating patterns also had other healthier behaviors regardless of gender. However, factors associated with dietary patterns differ between boys and girls. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
In this thesis some multivariate spectroscopic methods for the analysis of solutions are proposed. Spectroscopy and multivariate data analysis form a powerful combination for obtaining both quantitative and qualitative information and it is shown how spectroscopic techniques in combination with chemometric data evaluation can be used to obtain rapid, simple and efficient analytical methods. These spectroscopic methods consisting of spectroscopic analysis, a high level of automation and chemometric data evaluation can lead to analytical methods with a high analytical capacity, and for these methods, the term high-capacity analysis (HCA) is suggested. It is further shown how chemometric evaluation of the multivariate data in chromatographic analyses decreases the need for baseline separation. The thesis is based on six papers and the chemometric tools used are experimental design, principal component analysis (PCA), soft independent modelling of class analogy (SIMCA), partial least squares regression (PLS) and parallel factor analysis (PARAFAC). The analytical techniques utilised are scanning ultraviolet-visible (UV-Vis) spectroscopy, diode array detection (DAD) used in non-column chromatographic diode array UV spectroscopy, high-performance liquid chromatography with diode array detection (HPLC-DAD) and fluorescence spectroscopy. The methods proposed are exemplified in the analysis of pharmaceutical solutions and serum proteins. In Paper I a method is proposed for the determination of the content and identity of the active compound in pharmaceutical solutions by means of UV-Vis spectroscopy, orthogonal signal correction and multivariate calibration with PLS and SIMCA classification. Paper II proposes a new method for the rapid determination of pharmaceutical solutions by the use of non-column chromatographic diode array UV spectroscopy, i.e. a conventional HPLC-DAD system without any chromatographic column connected. In Paper III an investigation is made of the ability of a control sample, of known content and identity to diagnose and correct errors in multivariate predictions something that together with use of multivariate residuals can make it possible to use the same calibration model over time. In Paper IV a method is proposed for simultaneous determination of serum proteins with fluorescence spectroscopy and multivariate calibration. Paper V proposes a method for the determination of chromatographic peak purity by means of PCA of HPLC-DAD data. In Paper VI PARAFAC is applied for the decomposition of DAD data of some partially separated peaks into the pure chromatographic, spectral and concentration profiles.
Resumo:
The present PhD thesis was focused on the development and application of chemical methodology (Py-GC-MS) and data-processing method by multivariate data analysis (chemometrics). The chromatographic and mass spectrometric data obtained with this technique are particularly suitable to be interpreted by chemometric methods such as PCA (Principal Component Analysis) as regards data exploration and SIMCA (Soft Independent Models of Class Analogy) for the classification. As a first approach, some issues related to the field of cultural heritage were discussed with a particular attention to the differentiation of binders used in pictorial field. A marker of egg tempera the phosphoric acid esterified, a pyrolysis product of lecithin, was determined using HMDS (hexamethyldisilazane) rather than the TMAH (tetramethylammonium hydroxide) as a derivatizing reagent. The validity of analytical pyrolysis as tool to characterize and classify different types of bacteria was verified. The FAMEs chromatographic profiles represent an important tool for the bacterial identification. Because of the complexity of the chromatograms, it was possible to characterize the bacteria only according to their genus, while the differentiation at the species level has been achieved by means of chemometric analysis. To perform this study, normalized areas peaks relevant to fatty acids were taken into account. Chemometric methods were applied to experimental datasets. The obtained results demonstrate the effectiveness of analytical pyrolysis and chemometric analysis for the rapid characterization of bacterial species. Application to a samples of bacterial (Pseudomonas Mendocina), fungal (Pleorotus ostreatus) and mixed- biofilms was also performed. A comparison with the chromatographic profiles established the possibility to: • Differentiate the bacterial and fungal biofilms according to the (FAMEs) profile. • Characterize the fungal biofilm by means the typical pattern of pyrolytic fragments derived from saccharides present in the cell wall. • Individuate the markers of bacterial and fungal biofilm in the same mixed-biofilm sample.
Resumo:
BACKGROUND It is unknown why patients with extensive ulcerative colitis (UC) have a higher risk of colorectal cancer compared with patients with left-sided UC. This study characterizes the inflammatory processes in left-sided UC, pancolitis, and UC-associated dysplasia at the transcriptional level to identify potential biomarkers and transcripts of importance for the carcinogenic behavior of chronic inflammation. METHODS The Affymetrix GeneChip Human Genome U133 Plus 2.0 was applied on colonic biopsies from UC patients with left-sided UC, pancolitis, dysplasia, and controls. Reverse transcription polymerase chain reaction and immunohistochemistry were performed for validating selected transcripts in the initial cohort and in 2 independent cohorts of patients with UC. Microarray data were analyzed by principal component analysis, and reverse transcription polymerase chain reaction and immunohistochemistry data by the Wilcoxon's rank-sum test. RESULTS The principal component analysis results revealed separate clusters for left-sided UC, pancolitis, dysplasia, and controls. Close clustering of dysplastic and pancolitic samples indicated similarities in gene expression. Indeed, 101 and 656 parallel upregulated and downregulated transcripts, respectively, were identified in specimens from dysplasia and pancolitis. Validation of selected transcripts hereof identified insulin receptor alpha (INSRA) and MAP kinase interacting serine/threonine kinase 2 (MKNK2) with an enhanced expression in dysplasia compared with left-sided UC and controls, whereas laminin γ2 (LAMC2) was found with a lower expression in dysplasia compared with the remaining 3 groups. CONCLUSIONS This study demonstrates pancolitis and left-sided UC as distinct inflammatory processes at the transcriptional level, and identifies INSRA, MKNK2, and LAMC2 as potential critical transcripts in the inflammation-driven preneoplastic process of UC.
Resumo:
Medium density fiberboard (MDF) is an engineered wood product formed by breaking down selected lignin-cellulosic material residuals into fibers, combining it with wax and a resin binder, and then forming panels by applying high temperature and pressure. Because the raw material in the industrial process is ever-changing, the panel industry requires methods for monitoring the composition of their products. The aim of this study was to estimate the ratio of sugarcane (SC) bagasse to Eucalyptus wood in MDF panels using near infrared (NIR) spectroscopy. Principal component analysis (PCA) and partial least square (PLS) regressions were performed. MDF panels having different bagasse contents were easily distinguished from each other by the PCA of their NIR spectra with clearly different patterns of response. The PLS-R models for SC content of these MDF samples presented a strong coefficient of determination (0.96) between the NIR-predicted and Lab-determined values and a low standard error of prediction (similar to 1.5%) in the cross-validations. A key role of resins (adhesives), cellulose, and lignin for such PLS-R calibrations was shown. PLS-DA model correctly classified ninety-four percent of MDF samples by cross-validations and ninety-eight percent of the panels by independent test set. These NIR-based models can be useful to quickly estimate sugarcane bagasse vs. Eucalyptus wood content ratio in unknown MDF samples and to verify the quality of these engineered wood products in an online process.
Resumo:
Natural products have widespread biological activities, including inhibition of mitochondrial enzyme systems. Some of these activities, for example cytotoxicity, may be the result of alteration of cellular bioenergetics. Based on previous computer-aided drug design (CADD) studies and considering reported data on structure-activity relationships (SAR), an assumption regarding the mechanism of action of natural products against parasitic infections involves the NADH-oxidase inhibition. In this study, chemometric tools, such as: Principal Component Analysis (PCA), Consensus PCA (CPCA), and partial least squares regression (PLS), were applied to a set of forty natural compounds, acting as NADH-oxidase inhibitors. The calculations were performed using the VolSurf+ program. The formalisms employed generated good exploratory and predictive results. The independent variables or descriptors having a hydrophobic profile were strongly correlated to the biological data.
Resumo:
The rhizosphere is a niche exploited by a wide variety of bacteria. The expression of heterologous genes by plants might become a factor affecting the structure of bacterial communities in the rhizosphere. In a greenhouse experiment, the bacterial community associated to transgenic eucalyptus, carrying the Lhcb1-2 genes from pea (responsible for a higher photosynthetic capacity), was evaluated. The culturable bacterial community associated to transgenic and wild type plants were not different in density, and the Amplified Ribosomal DNA Restriction Analysis (ARDRA) typing of 124 strains revealed dominant ribotypes representing the bacterial orders Burkholderiales, Rhizobiales, and Actinomycetales, the families Xanthomonadaceae, and Bacillaceae, and the genus Mycobacterium. Principal Component Analysis based on the fingerprints obtained by culture-independent Denaturing Gradient Gel Electrophoresis analysis revealed that Alphaproteobacteria, Betaproteobacteria and Actinobacteria communities responded differently to plant genotypes. Similar effects for the cultivation of transgenic eucalyptus to those observed when two genotype-distinct wild type plants are compared.
Resumo:
Fatty acid synthase (FASN) is the metabolic enzyme responsible for the endogenous synthesis of the saturated long-chain fatty acid palmitate. In contrast to most normal cells, FASN is overexpressed in a variety of human cancers including cutaneous melanoma, in which its levels of expression are associated with a poor prognosis and depth of invasion. Recently, we have demonstrated the mitochondrial involvement in FASN inhibition-induced apoptosis in melanoma cells. Herein we compare, via electrospray ionization mass spectrometry (ESI-MS), free fatty acids (FFA) composition of mitochondria isolated from control (EtOH-treated cells) and Orlistat-treated B16-F10 mouse melanoma cells. Principal component analysis (PCA) was applied to the ESI-MS data and found to separate the two groups of samples. Mitochondria from control cells showed predominance of six ions, that is, those of m/z 157 (Pelargonic, 9:0), 255 (Palmitic, 16:0), 281 (Oleic, 18:1), 311 (Arachidic, 20:0), 327 (Docosahexaenoic, 22:6) and 339 (Behenic, 22:0). In contrast, FASN inhibition with Orlistat changes significantly mitochondrial FFA composition by reducing synthesis of palmitic acid, and its elongation and unsaturation products, such as arachidic and behenic acids, and oleic acid, respectively. ESI-MS of mitochondria isolated from Orlistat-treated cells presented therefore three major ions of m/z 157 (Pelargonic, 9:0), 193 (unknown) and 199 (Lauric, 12:0). These findings demonstrate therefore that FASN inhibition by Orlistat induces significant changes in the FFA composition of mitochondria. Copyright (C) 2011 John Wiley & Sons, Ltd.
Resumo:
The supervised pattern recognition methods K-Nearest Neighbors (KNN), stepwise discriminant analysis (SDA), and soft independent modelling of class analogy (SIMCA) were employed in this work with the aim to investigate the relationship between the molecular structure of 27 cannabinoid compounds and their analgesic activity. Previous analyses using two unsupervised pattern recognition methods (PCA-principal component analysis and HCA-hierarchical cluster analysis) were performed and five descriptors were selected as the most relevants for the analgesic activity of the compounds studied: R (3) (charge density on substituent at position C(3)), Q (1) (charge on atom C(1)), A (surface area), log P (logarithm of the partition coefficient) and MR (molecular refractivity). The supervised pattern recognition methods (SDA, KNN, and SIMCA) were employed in order to construct a reliable model that can be able to predict the analgesic activity of new cannabinoid compounds and to validate our previous study. The results obtained using the SDA, KNN, and SIMCA methods agree perfectly with our previous model. Comparing the SDA, KNN, and SIMCA results with the PCA and HCA ones we could notice that all multivariate statistical methods classified the cannabinoid compounds studied in three groups exactly in the same way: active, moderately active, and inactive.
Resumo:
The identification, modeling, and analysis of interactions between nodes of neural systems in the human brain have become the aim of interest of many studies in neuroscience. The complex neural network structure and its correlations with brain functions have played a role in all areas of neuroscience, including the comprehension of cognitive and emotional processing. Indeed, understanding how information is stored, retrieved, processed, and transmitted is one of the ultimate challenges in brain research. In this context, in functional neuroimaging, connectivity analysis is a major tool for the exploration and characterization of the information flow between specialized brain regions. In most functional magnetic resonance imaging (fMRI) studies, connectivity analysis is carried out by first selecting regions of interest (ROI) and then calculating an average BOLD time series (across the voxels in each cluster). Some studies have shown that the average may not be a good choice and have suggested, as an alternative, the use of principal component analysis (PCA) to extract the principal eigen-time series from the ROI(s). In this paper, we introduce a novel approach called cluster Granger analysis (CGA) to study connectivity between ROIs. The main aim of this method was to employ multiple eigen-time series in each ROI to avoid temporal information loss during identification of Granger causality. Such information loss is inherent in averaging (e.g., to yield a single ""representative"" time series per ROI). This, in turn, may lead to a lack of power in detecting connections. The proposed approach is based on multivariate statistical analysis and integrates PCA and partial canonical correlation in a framework of Granger causality for clusters (sets) of time series. We also describe an algorithm for statistical significance testing based on bootstrapping. By using Monte Carlo simulations, we show that the proposed approach outperforms conventional Granger causality analysis (i.e., using representative time series extracted by signal averaging or first principal components estimation from ROIs). The usefulness of the CGA approach in real fMRI data is illustrated in an experiment using human faces expressing emotions. With this data set, the proposed approach suggested the presence of significantly more connections between the ROIs than were detected using a single representative time series in each ROI. (c) 2010 Elsevier Inc. All rights reserved.
Resumo:
Exploratory factor analysis is a widely used statistical technique in the social sciences. It attempts to identify underlying factors that explain the pattern of correlations within a set of observed variables. A statistical software package is needed to perform the calcula- tions. However, there are some limitations with popular statistical software packages, like SPSS. The R programming language is a free software package for statistical and graphical computing. It o ers many packages written by contributors from all over the world and programming resources that allow it to overcome the dialog limitations of SPSS. This paper o ers an SPSS dialog written in the R programming language with the help of some packages, so that researchers with little or no knowledge in programming, or those who are accustomed to making their calculations based on statistical dialogs, have more options when applying factor analysis to their data and hence can adopt a better approach when dealing with ordinal, Likert-type data.
Resumo:
OBJECTIVE: To identify clusters of the major occurrences of leprosy and their associated socioeconomic and demographic factors. METHODS: Cases of leprosy that occurred between 1998 and 2007 in São José do Rio Preto (southeastern Brazil) were geocodified and the incidence rates were calculated by census tract. A socioeconomic classification score was obtained using principal component analysis of socioeconomic variables. Thematic maps to visualize the spatial distribution of the incidence of leprosy with respect to socioeconomic levels and demographic density were constructed using geostatistics. RESULTS: While the incidence rate for the entire city was 10.4 cases per 100,000 inhabitants annually between 1998 and 2007, the incidence rates of individual census tracts were heterogeneous, with values that ranged from 0 to 26.9 cases per 100,000 inhabitants per year. Areas with a high leprosy incidence were associated with lower socioeconomic levels. There were identified clusters of leprosy cases, however there was no association between disease incidence and demographic density. There was a disparity between the places where the majority of ill people lived and the location of healthcare services. CONCLUSIONS: The spatial analysis techniques utilized identified the poorer neighborhoods of the city as the areas with the highest risk for the disease. These data show that health departments must prioritize politico-administrative policies to minimize the effects of social inequality and improve the standards of living, hygiene, and education of the population in order to reduce the incidence of leprosy.