928 results for improved principal components analysis (IPCA) algorithm
Abstract:
Advances in large-scale genotyping of genetic polymorphisms are driving a revolution in genetic epidemiology and human population genetics. The information provided by these techniques has revealed population structure that can increase error in genome-wide association studies (GWAS). Recent studies have demonstrated such structure at both the interregional and intraregional level in Europe. This project assessed the degree of genetic structure in populations of the Iberian Peninsula and other regions of southwestern Europe (Italy and France) in order to quantify the impact that this potential structure may have on the design of GWAS and to reconstruct the demographic history of Mediterranean populations. To achieve these objectives, DNA samples from 770 individuals from 26 populations of the Iberian Peninsula, France, Italy and other Mediterranean countries were analysed. These samples were genotyped for 240,000 SNPs using the Affymetrix 250K StyI array within this project, or using other Affymetrix arrays in the international HapMap and POPRES projects. Statistical analyses were performed, including principal component analysis, Fst, identity by descent, linkage disequilibrium, genetic barriers, etc. These results made it possible to build a reference framework of variability in this region, to assess its impact on association studies, and to propose measures to avoid any increase in error (type I and II) in national and international studies. They also made it possible to reconstruct the history of the human populations of the Mediterranean and to analyse their demographic relationships.
Given the limited duration of this action (24 months, from October 2010 to September 2012), the results of this project are currently being written up and will lead to several publications in international journals and to communications at conferences.
Abstract:
The information provided by the alignment-independent GRid INdependent Descriptors (GRIND) can be condensed by applying principal component analysis, yielding a small number of principal properties (GRIND-PP) that are more suitable for describing molecular similarity. The objective of the present study is to optimize the parameters involved in obtaining the GRIND-PP and to validate their suitability for applications that require a biologically relevant description of molecular similarity. With this aim, GRIND-PP computed with a collection of diverse settings were used to carry out ligand-based virtual screening (LBVS) under standard conditions. The quality of the results was remarkable and comparable with that of other LBVS methods, and their detailed statistical analysis made it possible to identify the method settings that most strongly determine the quality of the results, together with their optimum values. Remarkably, some of these optimum settings differ significantly from those used in previously published applications, revealing their unexplored potential. The applicability of GRIND-PP to large compound databases was also explored by comparing the results obtained using either computed or projected principal properties. In general, the results of the study confirm the suitability of the GRIND-PP for practical applications and provide useful hints about how they should be computed to obtain optimum results.
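The difference between computed and projected principal properties can be illustrated with a minimal NumPy sketch. It assumes a hypothetical descriptor matrix (rows are compounds, columns are descriptors) filled with random numbers; the function names `fit_pca` and `project` are illustrative and are not taken from the GRIND software. PCA is fitted once on a reference set, and new compounds are then projected with the stored mean and loadings instead of refitting the model.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA by SVD; return column means and loadings (n_components x n_features)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, loadings):
    """Project rows of X onto previously fitted principal axes."""
    return (X - mean) @ loadings.T

rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 10))  # hypothetical reference descriptors
new = rng.normal(size=(5, 10))          # hypothetical unseen compounds

mean, loadings = fit_pca(reference, 3)
computed = project(reference, mean, loadings)  # scores of the set used for fitting
projected = project(new, mean, loadings)       # scores obtained without refitting
```

Projection keeps the coordinate system fixed, which is what makes scores comparable across a large database that is processed in batches.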
Abstract:
Background: Peach fruit undergoes a rapid softening process that involves a number of metabolic changes. Storing fruit at low temperatures has been widely used to extend its postharvest life. However, this leads to undesired changes, such as mealiness and browning, which affect the quality of the fruit. In this study, a 2-D DIGE approach was designed to screen for differentially accumulated proteins in peach fruit during normal softening as well as under conditions that led to fruit chilling injury. Results: The analysis allowed us to identify 43 spots (about 18% of the total number analyzed) that show statistically significant changes. Thirty-nine of the proteins could be identified by mass spectrometry. Some of the proteins that changed during postharvest had previously been related to peach fruit ripening and cold stress. However, we identified other proteins that had not been linked to these processes. A graphical display of the relationship between the differentially accumulated proteins was obtained using pairwise average-linkage cluster analysis and principal component analysis. Proteins such as endopolygalacturonase, catalase, NADP-dependent isocitrate dehydrogenase, pectin methylesterase and dehydrins were found to be very important for distinguishing between healthy and chill-injured fruit. The differentially accumulated proteins were categorized using Gene Ontology annotation. The results showed that the 'response to stress', 'cellular homeostasis', 'metabolism of carbohydrates' and 'amino acid metabolism' biological processes were the most affected during postharvest. Conclusions: Using a comparative proteomic approach with 2-D DIGE allowed us to identify proteins that showed stage-specific changes in their accumulation pattern. Several proteins related to response to stress, cellular homeostasis, cellular component organization and carbohydrate metabolism were detected as being differentially accumulated.
Finally, a significant proportion of the proteins identified had not previously been associated with softening, cold storage or chilling injury-altered fruit; thus, comparative proteomics has proven to be a valuable tool for understanding fruit softening and postharvest.
Abstract:
The fatty acids from cocoa butters of different origins, varieties, and suppliers and a number of cocoa butter equivalents (Illexao 30-61, Illexao 30-71, Illexao 30-96, Choclin, Coberine, Chocosine-Illipe, Chocosine-Shea, Shokao, Akomax, Akonord, and Ertina) were investigated by bulk stable carbon isotope analysis and compound-specific isotope analysis. The interpretation is based on principal component analysis combining the fatty acid concentrations and the bulk and molecular isotopic data. The scatterplot of the first two principal components allowed detection of the addition of vegetable fats to cocoa butters. Enrichment in the heavy carbon isotope (C-13) of the bulk cocoa butter and of the individual fatty acids is related to mixing with other vegetable fats and possibly to thermally or oxidatively induced degradation during processing (e.g., drying and roasting of the cocoa beans or deodorization of the pressed fat) or storage. The feasibility of the analytical approach for authenticity assessment is discussed.
Abstract:
The fatty acids of olive oils of distinct quality grades from the most important European Union (EU) producer countries were chemically and isotopically characterized. The analytical approach combined capillary column gas chromatography-mass spectrometry (GC/MS) and the novel technique of compound-specific isotope analysis (CSIA) through gas chromatography coupled to a stable isotope ratio mass spectrometer (IRMS) via a combustion (C) interface (GC/C/IRMS). This approach provides further insights into the control of the purity and geographical origin of oils sold as cold-pressed extra virgin olive oil with certified origin appellation. The results indicate that substantial enrichment in the heavy carbon isotope (C-13) of the bulk oil and of individual fatty acids is related to (1) thermally induced degradation due to deodorization or steam washing of the olive oils and (2) potential blending with refined olive oil or other vegetable oils. The interpretation of the data is based on principal component analysis of the fatty acid concentrations and isotopic data (δ13C(oil), δ13C(16:0), δ13C(18:1)) and on the δ13C(16:0) vs δ13C(18:1) covariations. The differences in the δ13C values of palmitic and oleic acids are discussed in terms of the biosynthesis of these acids in the plant tissue and the admixture of distinct oils.
Abstract:
Objective: To analyze the reliability and validity of the psychometric properties of the Brazilian version of the symptom assessment instrument MD Anderson Symptom Inventory - core. Method: A cross-sectional study with 268 cancer patients in outpatient treatment in the municipality of Ijuí, state of Rio Grande do Sul, Brazil. Results: The Cronbach's alpha for the MDASI general, symptoms and interferences was 0.857, 0.784 and 0.794, respectively. The factor analysis showed adequacy of the data (0.792). In total, four principal-component factors related to the symptoms were identified. Factor I: sleep problems, distress (upset), difficulties in remembering things and sadness. Factor II: dizziness, nausea, lack of appetite and vomiting. Factor III: drowsiness, dry mouth, numbness and tingling. Factor IV: pain, fatigue and shortness of breath. A single factor was revealed in the component of interferences with life (0.780), with prevalence of activity in general (59.7%), work (54.9%) and walking (49.3%). Conclusion: The Brazilian version of the MD Anderson Symptom Inventory - core showed adequate psychometric properties in the studied population.
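As a reminder of how the reliability figures above are obtained, Cronbach's alpha for k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below uses only the standard library; the function name `cronbach_alpha` and the response data are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses on a 0-10 scale: 5 respondents, 4 items.
responses = [
    [2, 3, 1, 2],
    [5, 6, 4, 5],
    [7, 8, 8, 6],
    [1, 0, 2, 1],
    [9, 7, 8, 9],
]
alpha = cronbach_alpha(responses)
```

Alpha approaches 1 when the items move together across respondents and drops toward 0 when they vary independently.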
Abstract:
Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual's DNA can be used to infer their geographic origin with surprising accuracy, often to within a few hundred kilometres.
Abstract:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
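A minimal NumPy sketch of a weighted log-ratio analysis is given here as an illustration only. It assumes that the row and column weights are the table margins (masses), that the log-transformed table is double-centred using those weights, and that the result is decomposed by SVD; the function name `weighted_lra` and the small table are hypothetical. The last lines split one column into two proportional columns to illustrate distributional equivalence: under these assumptions the total variance should be unchanged.

```python
import numpy as np

def weighted_lra(N):
    """Weighted log-ratio analysis of a strictly positive two-way table N."""
    N = np.asarray(N, dtype=float)
    r = N.sum(axis=1) / N.sum()  # row masses (assumed weights)
    c = N.sum(axis=0) / N.sum()  # column masses (assumed weights)
    L = np.log(N)
    # Double-centre the log table using weighted row and column means.
    row_means = L @ c
    col_means = r @ L
    grand_mean = r @ L @ c
    Y = L - row_means[:, None] - col_means[None, :] + grand_mean
    # Weight the centred matrix and decompose it.
    S = np.sqrt(r)[:, None] * Y * np.sqrt(c)[None, :]
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]  # principal coordinates of rows
    return row_coords, sv

# Hypothetical positive table (e.g. counts of 3 categories in 4 samples).
table = np.array([[10.0, 20.0, 30.0],
                  [15.0, 5.0, 40.0],
                  [25.0, 25.0, 10.0],
                  [8.0, 12.0, 30.0]])
# Split the first column into two proportional parts (30% and 70%).
split = np.column_stack([0.3 * table[:, 0], 0.7 * table[:, 0], table[:, 1:]])

_, sv = weighted_lra(table)
_, sv_split = weighted_lra(split)
total_variance = (sv ** 2).sum()
total_variance_split = (sv_split ** 2).sum()
```

Without the column weights, the split would count the same information twice, which is the failure of distributional equivalence that the abstract attributes to the unweighted method.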
Abstract:
This paper establishes a general framework for metric scaling of any distance measure between individuals based on a rectangular individuals-by-variables data matrix. The method allows visualization of both individuals and variables while preserving all the good properties of principal axis methods such as principal components and correspondence analysis, which are based on the singular-value decomposition, including the decomposition of variance into components along principal axes, which provides the numerical diagnostics known as contributions. The idea is inspired by the chi-square distance in correspondence analysis, which weights each coordinate by an amount calculated from the margins of the data table. In weighted metric multidimensional scaling (WMDS) we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the original distances. Once this extra weight-estimation step is accomplished, the procedure follows the classical path of decomposing a matrix and displaying its rows and columns in biplots.
Abstract:
With extremely diverse morphological and edaphoclimatic characteristics, the island of Santo Antão in Cape Verde has a recognised environmental vulnerability, together with a marked lack of scientific studies that address this reality and support an integrated understanding of the phenomena. Digital cartography and geographic information technologies have brought technological advances in the collection, storage and processing of spatial data. Several tools now available make it possible to model a multiplicity of factors, to locate and quantify phenomena, and to define the contribution of different factors to the final result. This study, carried out as part of the postgraduate and master's programme in Geographic Information Systems at the Universidade de Trás-os-Montes e Alto Douro, aims to help reduce the information deficit on the biophysical characteristics of the island, using geographic information technologies and remote sensing together with multivariate statistical analysis. To this end, thematic maps were produced and analysed, and an integrated data analysis model was developed. The multiplicity of spatial variables produced, among them 29 continuous variables likely to influence the biophysical characteristics of the region, and the possible occurrence of mutually antagonistic or synergistic effects make interpretation from the original data relatively complex. To get around this problem, a systematic sampling grid totalling 921 points (or replicates) was used to extract the values of the 29 variables at the sampling points, followed by multivariate statistical analysis, namely principal component analysis.
Applying these techniques made it possible to simplify and interpret the original variables, normalising them and summarising the information contained in the set of original, mutually correlated variables in a set of orthogonal (uncorrelated) variables of decreasing importance, the principal components. A target was set of 75% of the variance of the original data explained by the first 3 principal components, and an interactive, multi-stage process was developed in which the least representative variables were successively eliminated. In the last stage of the process, the first 3 PCs explained 74.54% of the variance of the original data, but in the subsequent phase they proved insufficient to portray reality. The 4th PC (PC4) was therefore included, with which 84% of the variance was explained, representing eight biophysical variables: altitude, drainage density, geological fracture density, precipitation, vegetation index, temperature, water resources and distance to the drainage network. The subsequent interpolation of the 1st principal component (PC1), with the main variables associated with components PC2, PC3 and PC4 as auxiliary variables, using geostatistical techniques in ArcGIS, yielded a map representing 84% of the variation in biophysical characteristics across the territory. Cluster analysis, validated by Student's t-test, made it possible to reclassify the territory into 6 homogeneous biophysical units.
It is concluded that the geographic information technologies currently available, besides facilitating interactive and flexible analyses (allowing themes and criteria to be varied, new information to be integrated, and models built on the information available in a given context to be improved), make it possible, when combined with multivariate statistical techniques and on the basis of scientific criteria, to carry out an integrated analysis of multiple biophysical variables whose mutual correlation makes an integrated understanding of the phenomena complex.
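The rule of keeping enough principal components to reach a target share of explained variance can be sketched as follows. This is an illustration under stated assumptions, not the study's workflow: the data are synthetic (921 rows and 8 columns only echo the sizes mentioned above), PCA is taken on the correlation matrix so that variables are standardised, and the function names are hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of variance per principal component, from the correlation matrix of X."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
    return eigvals / eigvals.sum()

def components_needed(ratios, target):
    """Smallest number of leading components whose cumulative share reaches target.

    Assumes target is below 1 so that the threshold is actually reached.
    """
    cumulative = np.cumsum(ratios)
    return int(np.argmax(cumulative >= target)) + 1

# Synthetic stand-in data: 921 sampling points, 8 correlated variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(921, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.5 * rng.normal(size=(921, 8))

ratios = explained_variance_ratio(X)
k = components_needed(ratios, 0.75)  # components needed for a 75% target
```

A threshold of this kind is only a starting point; as the abstract notes, a solution that nearly meets the target may still need an extra component to describe the territory adequately.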
Abstract:
The Stages of Change Readiness and Treatment Eagerness Scale (SOCRATES), a 19-item instrument developed to assess readiness to change alcohol use among individuals presenting for specialized alcohol treatment, has been used in various populations and settings. Its factor structure and concurrent validity have been described for specialized alcohol treatment settings and primary care. The purpose of this study was to determine the factor structure and concurrent validity of the SOCRATES among medical inpatients with unhealthy alcohol use not seeking help for specialized alcohol treatment. The subjects were 337 medical inpatients with unhealthy alcohol use, identified during their hospital stay. Most of them had alcohol dependence (76%). We performed an Alpha Factor Analysis (AFA) and a Principal Component Analysis (PCA) of the 19 SOCRATES items, forcing 3 factors and 2 components, in order to replicate findings from Miller and Tonigan (Miller, W. R., & Tonigan, J. S., (1996). Assessing drinkers' motivations for change: The Stages of Change Readiness and Treatment Eagerness Scale (SOCRATES). Psychology of Addictive Behaviors, 10, 81-89.) and Maisto et al. (Maisto, S. A., Conigliaro, J., McNeil, M., Kraemer, K., O'Connor, M., & Kelley, M. E., (1999). Factor structure of the SOCRATES in a sample of primary care patients. Addictive Behaviors, 24(6), 879-892.). Our analysis supported the view that the 2-component solution proposed by Maisto et al. (1999) is more appropriate for our data than the 3-factor solution proposed by Miller and Tonigan (1996).
The first component measured Perception of Problems and was more strongly correlated with severity of alcohol-related consequences, presence of alcohol dependence, and alcohol consumption levels (average number of drinks per day and total number of binge drinking days over the past 30 days) than the second component, which measured Taking Action. Our findings support the view that the SOCRATES comprises two important readiness constructs in general medical patients identified by screening.
Abstract:
Counterfeit pharmaceutical products have become a widespread problem in the last decade. Various analytical techniques have been applied to discriminate between genuine and counterfeit products. Among these, near-infrared (NIR) and Raman spectroscopy have provided promising results. The present study offers a methodology that provides more valuable information for organisations engaged in the fight against counterfeiting of medicines. A database was established by analyzing counterfeits of a particular pharmaceutical product using NIR and Raman spectroscopy. Unsupervised chemometric techniques (i.e. principal component analysis, PCA, and hierarchical cluster analysis, HCA) were implemented to identify the classes within the datasets. Gas Chromatography coupled to Mass Spectrometry (GC-MS) and Fourier Transform Infrared Spectroscopy (FT-IR) were used to determine the number of different chemical profiles within the counterfeits. A comparison with the classes established by NIR and Raman spectroscopy made it possible to evaluate the discriminating power provided by these techniques. Supervised classifiers (i.e. k-Nearest Neighbors, Partial Least Squares Discriminant Analysis, Probabilistic Neural Networks and Counterpropagation Artificial Neural Networks) were applied to the acquired NIR and Raman spectra and the results were compared to the ones provided by the unsupervised classifiers. The retained strategy for routine applications, founded on the classes identified by NIR and Raman spectroscopy, uses a classification algorithm based on distance measures and Receiver Operating Characteristics (ROC) curves. The model is able to compare the spectrum of a new counterfeit with that of previously analyzed products and to determine whether a new specimen belongs to one of the existing classes, thereby establishing a link with other counterfeits in the database.
Abstract:
The Baix Empordà-Selva-Gavarres aquifer system is related to the fault set that created the tectonic basins of the Empordà and Selva areas (NE Spain) during the Neogene. In this work, we describe the hydrogeological, hydrochemical and isotopic (3H, δD, δ18O, and the 87Sr/86Sr ratio) characteristics of groundwater in this system in order to illustrate the relevance of fault zones to groundwater flow paths and recharge. We identify two flow systems with distinct hydrochemistry and isotopic signatures. A local flow system originates at the Gavarres Range and flows towards the Baix Empordà and Selva basins, with an approximate residence time of 20 years. Additionally, a regional flow system has been identified only in the Selva basin. This system is related to the main fault zones, which act as preferential flow paths. Its recharge is located in mountain ranges at higher altitudes, namely the Transversal and Guilleries Ranges, with residence times longer than 50 years. Isotopic data have also shown mixing processes between both flow systems and rainfall recharge, while multivariate statistical analysis of principal components has shown the main processes that control the hydrochemistry of each flow system.
Abstract:
We present a numerical method for spectroscopic ellipsometry of thick transparent films. When an analytical expression for the dispersion of the refractive index (which contains several unknown coefficients) is assumed, the procedure is based on fitting the coefficients at a fixed thickness. The thickness is then varied within a range chosen according to its approximate value. The final result given by our method is as follows: the sample thickness is taken to be the one that gives the best fit, and the refractive index is defined by the coefficients obtained for this thickness.
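The nested search described above (fit the dispersion coefficients at each fixed thickness, then keep the thickness with the smallest misfit) can be sketched with the standard library alone. Everything specific here is an assumption made for illustration and not the authors' model: a single coherent film on a substrate in air, a real substrate index of 3.9, a 70 degree angle of incidence, a two-term Cauchy dispersion n = a + b/λ², and a brute-force grid standing in for the coefficient fit. The data are synthetic and generated from the same model.

```python
import cmath
import math

def rho(wavelength, n_film, thickness, n_sub=3.9, angle_deg=70.0):
    """Ellipsometric ratio r_p / r_s for one film on a substrate in air (assumed model)."""
    n0 = 1.0
    sin0 = math.sin(math.radians(angle_deg))
    cos0 = math.cos(math.radians(angle_deg))
    cos1 = cmath.sqrt(1 - (n0 * sin0 / n_film) ** 2)  # Snell's law in the film
    cos2 = cmath.sqrt(1 - (n0 * sin0 / n_sub) ** 2)   # Snell's law in the substrate
    # Fresnel coefficients at the air/film (01) and film/substrate (12) interfaces.
    r01p = (n_film * cos0 - n0 * cos1) / (n_film * cos0 + n0 * cos1)
    r01s = (n0 * cos0 - n_film * cos1) / (n0 * cos0 + n_film * cos1)
    r12p = (n_sub * cos1 - n_film * cos2) / (n_sub * cos1 + n_film * cos2)
    r12s = (n_film * cos1 - n_sub * cos2) / (n_film * cos1 + n_sub * cos2)
    # Round-trip phase factor of the film.
    phase = cmath.exp(-2j * 2 * math.pi * thickness * n_film * cos1 / wavelength)
    rp = (r01p + r12p * phase) / (1 + r01p * r12p * phase)
    rs = (r01s + r12s * phase) / (1 + r01s * r12s * phase)
    return rp / rs

def cauchy(wavelength, a, b):
    """Assumed two-term Cauchy dispersion (wavelength in nm, b in nm^2)."""
    return a + b / wavelength ** 2

def misfit(measured, wavelengths, a, b, thickness):
    """Sum of squared differences between modelled and measured rho."""
    return sum(abs(rho(w, cauchy(w, a, b), thickness) - m) ** 2
               for w, m in zip(wavelengths, measured))

def scan_thickness(measured, wavelengths, thicknesses, a_grid, b_grid):
    """Outer loop over thickness; inner search for the best coefficients at each one."""
    best = None
    for d in thicknesses:
        err, a, b = min((misfit(measured, wavelengths, a, b, d), a, b)
                        for a in a_grid for b in b_grid)
        if best is None or err < best[0]:
            best = (err, d, a, b)
    return best

# Synthetic example (all values hypothetical; lengths in nm).
wavelengths = [400 + 20 * i for i in range(21)]
thicknesses = [4900 + 10 * i for i in range(21)]
a_grid = [1.40 + 0.02 * i for i in range(11)]
b_grid = [0.0, 2000.0, 4000.0, 6000.0, 8000.0]
true_d, true_a, true_b = thicknesses[10], a_grid[3], b_grid[2]
measured = [rho(w, cauchy(w, true_a, true_b), true_d) for w in wavelengths]

err, d, a, b = scan_thickness(measured, wavelengths, thicknesses, a_grid, b_grid)
```

In practice the inner grid would be replaced by a proper least-squares fit, and the thickness range would be centred on an independent estimate, as the abstract indicates.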
Abstract:
Mismatch negativity (MMN) overlaps with other auditory event-related potential (ERP) components. We examined the ERPs of fifty 9- to 11-year-old children for the vowels /i/ and /y/ and equivalent complex tones. The goal was to separate MMN from obligatory ERP components using principal component analysis and an equal-probability control condition. In addition to the contrast of the deviant minus standard response, we employed the contrast of the deviant minus control response, to see whether obligatory processing contributes to MMN in children. In the speech deviant minus standard contrast, MMN starts around 112 ms. However, when both contrasts are examined, MMN emerges for speech at 160 ms, whereas for nonspeech MMN is observed at 112 ms regardless of contrast. We argue that this discriminative response to speech stimuli at 112 ms is obligatory in nature rather than reflecting change detection processing.