927 resultados para Linear discriminant analysis
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
This paper compares the responses of conventional and transgenic soybean to glyphosate application in terms of the contents of 17 detectable soluble amino acids in leaves, analyzed by HPLC and fluorescence detection. Glutamate, histidine, asparagine, arginine + alanine, glycine + threonine and isoleucine increased in conventional soybean leaves when compared to transgenic soybean leaves, whereas for other amino acids, no significant differences were recorded. Univariate analysis allowed us to make an approximate differentiation between conventional and transgenic lines, observing the changes of some variables by glyphosate application. In addition, by means of the multivariate analysis, using principal components analysis (PCA), cluster analysis (CA) and linear discriminant analysis (LDA) it was possible to identify and discriminate different groups based on the soybean genetic origin. (C) 2011 Elsevier Inc. All rights reserved.
Resumo:
Multivariate analyses of UV-Vis spectral data from cachaca wood extracts provide a simple and robust model to classify aged Brazilian cachacas according to the wood species used in the maturation barrels. The model is based on inspection of 93 extracts of oak and different Brazilian wood species by a non-aged cachaca used as an extraction solvent. Application of PCA (Principal Components Analysis) and HCA (Hierarchical Cluster Analysis) leads to identification of 6 clusters of cachaca wood extracts (amburana, amendoim, balsamo, castanheira, jatoba, and oak). LDA (Linear Discriminant Analysis) affords classification of 10 different wood species used in the cachaca extracts (amburana, amendoim, balsamo, cabreuva-parda, canela-sassafras, castanheira, jatoba, jequitiba-rosa, louro-canela, and oak) with an accuracy ranging from 80% (amendoim and castanheira) to 100% (balsamo and jequitiba-rosa). The methodology provides a low-cost alternative to methods based on liquid chromatography and mass spectrometry to classify cachacas aged in barrels that are composed of different wood species.
Resumo:
Fractal theory presents a large number of applications to image and signal analysis. Although the fractal dimension can be used as an image object descriptor, a multiscale approach, such as multiscale fractal dimension (MFD), increases the amount of information extracted from an object. MFD provides a curve which describes object complexity along the scale. However, this curve presents much redundant information, which could be discarded without loss in performance. Thus, it is necessary the use of a descriptor technique to analyze this curve and also to reduce the dimensionality of these data by selecting its meaningful descriptors. This paper shows a comparative study among different techniques for MFD descriptors generation. It compares the use of well-known and state-of-the-art descriptors, such as Fourier, Wavelet, Polynomial Approximation (PA), Functional Data Analysis (FDA), Principal Component Analysis (PCA), Symbolic Aggregate Approximation (SAX), kernel PCA, Independent Component Analysis (ICA), geometrical and statistical features. The descriptors are evaluated in a classification experiment using Linear Discriminant Analysis over the descriptors computed from MFD curves from two data sets: generic shapes and rotated fish contours. Results indicate that PCA, FDA, PA and Wavelet Approximation provide the best MFD descriptors for recognition and classification tasks. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Die Untersuchungen umfassen die Periode 1981 – 2000 und basieren hauptsächlich auf Daten des Deutschen Wetterdienstes (DWD). Relativwerte der Globalstrahlung beziehen sich auf die Rayleigh-Atmosphäre. Das Regressionsmodell nach Angström ermöglicht die Erweiterung des Meßnetzes. In linearer und nichtlinearer Regression und Korrelation ist die Globalstrahlung entweder abhängige (Sonnenscheindauer, Bewölkung) oder unabhängige Variable (Lufttemperatur, Bodentemperatur). Ihre Intensität in Abhängigkeit von Großwetterlagen, Großwettertypen und Luftmassen wird diskutiert. Diesbezüglich werden mit der Linearen Diskriminanzanalyse ähnliche Großwetterlagen und Stationen in signifikant unterschiedenen Gruppen zusammengefaßt, getrennt nach Sommer- und Winterhalbjahr. Abhängig von der Zeit betrachtet, enthalten Globalstrahlung, direkte und diffuse Sonnenstrahlung, Lufttemperatur, Bewölkung und Niederschlag signifikante zyklische Variationen, die gegebenenfalls klimatologisch relevant sind. Weiteren Aufschluß ergeben deshalb die Zeitreihenanalysen. Autokorrelation-Spektralanalysen (ASA) der genannten Variablen werden in integrierten Spektren dargestellt. Hinweise auf die zeitliche Konstanz signifikanter Varianzmaxima enthalten die Spektren der dynamischen (gleitenden) ASA.
Resumo:
Questa tesi descrive alcuni studi di messa a punto di metodi di analisi fisici accoppiati con tecniche statistiche multivariate per valutare la qualità e l’autenticità di oli vegetali e prodotti caseari. L’applicazione di strumenti fisici permette di abbattere i costi ed i tempi necessari per le analisi classiche ed allo stesso tempo può fornire un insieme diverso di informazioni che possono riguardare tanto la qualità come l’autenticità di prodotti. Per il buon funzionamento di tali metodi è necessaria la costruzione di modelli statistici robusti che utilizzino set di dati correttamente raccolti e rappresentativi del campo di applicazione. In questo lavoro di tesi sono stati analizzati oli vegetali e alcune tipologie di formaggi (in particolare pecorini per due lavori di ricerca e Parmigiano-Reggiano per un altro). Sono stati utilizzati diversi strumenti di analisi (metodi fisici), in particolare la spettroscopia, l’analisi termica differenziale, il naso elettronico, oltre a metodiche separative tradizionali. I dati ottenuti dalle analisi sono stati trattati mediante diverse tecniche statistiche, soprattutto: minimi quadrati parziali; regressione lineare multipla ed analisi discriminante lineare.
Resumo:
This study focuses on the use of metabonomics applications in measuring fish freshness in various biological species and in evaluating how they are stored. This metabonomic approach is innovative and is based upon molecular profiling through nuclear magnetic resonance (NMR). On one hand, the aim is to ascertain if a type of fish has maintained, within certain limits, its sensory and nutritional characteristics after being caught; and on the second, the research observes the alterations in the product’s composition. The spectroscopic data obtained through experimental nuclear magnetic resonance, 1H-NMR, of the molecular profiles of the fish extracts are compared with those obtained on the same samples through analytical and conventional methods now in practice. These second methods are used to obtain chemical indices of freshness through biochemical and microbial degradation of the proteic nitrogen compounds and not (trimethylamine, N-(CH3)3, nucleotides, amino acids, etc.). At a later time, a principal components analysis (PCA) and a linear discriminant analysis (PLS-DA) are performed through a metabonomic approach to condense the temporal evolution of freshness into a single parameter. In particular, the first principal component (PC1) under both storage conditions (4 °C and 0 °C) represents the component together with the molecular composition of the samples (through 1H-NMR spectrum) evolving during storage with a very high variance. The results of this study give scientific evidence supporting the objective elements evaluating the freshness of fish products showing those which can be labeled “fresh fish.”
Resumo:
The early detection of subjects with probable Alzheimer's disease (AD) is crucial for effective appliance of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degree of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs it could be demonstrated that even tough modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forests classification a considerable sensitivity of up to 85% and a specificity of 78%, respectively for the test of even only mild AD patients has been reached, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
Resumo:
In 1998-2001 Finland suffered the most severe insect outbreak ever recorded, over 500,000 hectares. The outbreak was caused by the common pine sawfly (Diprion pini L.). The outbreak has continued in the study area, Palokangas, ever since. To find a good method to monitor this type of outbreaks, the purpose of this study was to examine the efficacy of multi-temporal ERS-2 and ENVISAT SAR imagery for estimating Scots pine (Pinus sylvestris L.) defoliation. Three methods were tested: unsupervised k-means clustering, supervised linear discriminant analysis (LDA) and logistic regression. In addition, I assessed if harvested areas could be differentiated from the defoliated forest using the same methods. Two different speckle filters were used to determine the effect of filtering on the SAR imagery and subsequent results. The logistic regression performed best, producing a classification accuracy of 81.6% (kappa 0.62) with two classes (no defoliation, >20% defoliation). LDA accuracy was with two classes at best 77.7% (kappa 0.54) and k-means 72.8 (0.46). In general, the largest speckle filter, 5 x 5 image window, performed best. When additional classes were added the accuracy was usually degraded on a step-by-step basis. The results were good, but because of the restrictions in the study they should be confirmed with independent data, before full conclusions can be made that results are reliable. The restrictions include the small size field data and, thus, the problems with accuracy assessment (no separate testing data) as well as the lack of meteorological data from the imaging dates.
Resumo:
In this study, we demonstrate the power of applying complementary DNA (cDNA) microarray technology to identifying candidate loci that exhibit subtle differences in expression levels associated with a complex trait in natural populations of a nonmodel organism. Using a highly replicated experimental design involving 180 cDNA microarray experiments, we measured gene-expression levels from 1098 transcript probes in 90 individuals originating from six brown trout (Salmo trutta) and one Atlantic salmon (Salmo salar) population, which follow either a migratory or a sedentary life history. We identified several candidate genes associated with preparatory adaptations to different life histories in salmonids, including genes encoding for transaldolase 1, constitutive heat-shock protein HSC70-1 and endozepine. Some of these genes clustered into functional groups, providing insight into the physiological pathways potentially involved in the expression of life-history related phenotypic differences. Such differences included the down-regulation of genes involved in the respiratory system of future migratory individuals. In addition, we used linear discriminant analysis to identify a set of 12 genes that correctly classified immature individuals as migratory or sedentary with high accuracy. Using the expression levels of these 12 genes, 17 out of 18 individuals used for cross-validation were correctly assigned to their respective life-history phenotype. Finally, we found various candidate genes associated with physiological changes that are likely to be involved in preadaptations to seawater in anadromous populations of the genus Salmo, one of which was identified to encode for nucleophosmin 1. Our findings thus provide new molecular insights into salmonid life-history variation, opening new perspectives in the study of this complex trait.
Resumo:
Vector control is the mainstay of malaria control programmes. Successful vector control profoundly relies on accurate information on the target mosquito populations in order to choose the most appropriate intervention for a given mosquito species and to monitor its impact. An impediment to identify mosquito species is the existence of morphologically identical sibling species that play different roles in the transmission of pathogens and parasites. Currently PCR diagnostics are used to distinguish between sibling species. PCR based methods are, however, expensive, time-consuming and their development requires a priori DNA sequence information. Here, we evaluated an inexpensive molecular proteomics approach for Anopheles species: matrix assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). MALDI-TOF MS is a well developed protein profiling tool for the identification of microorganisms but so far has received little attention as a diagnostic tool in entomology. We measured MS spectra from specimens of 32 laboratory colonies and 2 field populations representing 12 Anopheles species including the A. gambiae species complex. An important step in the study was the advancement and implementation of a bioinformatics approach improving the resolution over previously applied cluster analysis. Borrowing tools for linear discriminant analysis from genomics, MALDI-TOF MS accurately identified taxonomically closely related mosquito species, including the separation between the M and S molecular forms of A. gambiae sensu stricto. The approach also classifies specimens from different laboratory colonies; hence proving also very promising for its use in colony authentication as part of quality assurance in laboratory studies. While being exceptionally accurate and robust, MALDI-TOF MS has several advantages over other typing methods, including simple sample preparation and short processing time. As the method does not require DNA sequence information, data can also be reviewed at any later stage for diagnostic or functional patterns without the need for re-designing and re-processing biological material.
Resumo:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^
Resumo:
Kelp forests represent a major habitat type in coastal waters worldwide and their structure and distribution is predicted to change due to global warming. Despite their ecological and economical importance, there is still a lack of reliable spatial information on their abundance and distribution. In recent years, various hydroacoustic mapping techniques for sublittoral environments evolved. However, in turbid coastal waters, such as off the island of Helgoland (Germany, North Sea), the kelp vegetation is present in shallow water depths normally excluded from hydroacoustic surveys. In this study, single beam survey data consisting of the two seafloor parameters roughness and hardness were obtained with RoxAnn from water depth between 2 and 18 m. Our primary aim was to reliably detect the kelp forest habitat with different densities and distinguish it from other vegetated zones. Five habitat classes were identified using underwater-video and were applied for classification of acoustic signatures. Subsequently, spatial prediction maps were produced via two classification approaches: Linear discriminant analysis (LDA) and manual classification routine (MC). LDA was able to distinguish dense kelp forest from other habitats (i.e. mixed seaweed vegetation, sand, and barren bedrock), but no variances in kelp density. In contrast, MC also provided information on medium dense kelp distribution which is characterized by intermediate roughness and hardness values evoked by reduced kelp abundances. The prediction maps reach accordance levels of 62% (LDA) and 68% (MC). The presence of vegetation (kelp and mixed seaweed vegetation) was determined with higher prediction abilities of 75% (LDA) and 76% (MC). Since the different habitat classes reveal acoustic signatures that strongly overlap, the manual classification method was more appropriate for separating different kelp forest densities and low-lying vegetation. It became evident that the occurrence of kelp in this area is not simply linked to water depth. Moreover, this study shows that the two seafloor parameters collected with RoxAnn are suitable indicators for the discrimination of different densely vegetated seafloor habitats in shallow environments.