939 resultados para multivariate classification


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Predicting and mapping productivity areas allows crop producers to improve their planning of agricultural activities. The primary aims of this work were the identification and mapping of specific management areas allowing coffee bean quality to be predicted from soil attributes and their relationships to relief. The study area was located in the Southeast of the Minas Gerais state, Brazil. A grid containing a total of 145 uniformly spaced nodes 50 m apart was established over an area of 31. 7 ha from which samples were collected at depths of 0. 00-0. 20 m in order to determine physical and chemical attributes of the soil. These data were analysed in conjunction with plant attributes including production, proportion of beans retained by different sieves and drink quality. The results of principal component analysis (PCA) in combination with geostatistical data showed the attributes clay content and available iron to be the best choices for identifying four crop production environments. Environment A, which exhibited high clay and available iron contents, and low pH and base saturation, was that providing the highest yield (30. 4l ha-1) and best coffee beverage quality (61 sacks ha-1). Based on the results, we believe that multivariate analysis, geostatistics and the soil-relief relationships contained in the digital elevation model (DEM) can be effectively used in combination for the hybrid mapping of areas of varying suitability for coffee production. © 2012 Springer Science+Business Media New York.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Concentrations of 39 organic compounds were determined in three fractions (head, heart and tail) obtained from the pot still distillation of fermented sugarcane juice. The results were evaluated using analysis of variance (ANOVA), Tukey's test, principal component analysis (PCA), hierarchical cluster analysis (HCA) and linear discriminant analysis (LDA). According to PCA and HCA, the experimental data lead to the formation of three clusters. The head fractions give rise to a more defined group. The heart and tail fractions showed some overlap consistent with its acid composition. The predictive ability of calibration and validation of the model generated by LDA for the three fractions classification were 90.5 and 100%, respectively. This model recognized as the heart twelve of the thirteen commercial cachacas (92.3%) with good sensory characteristics, thus showing potential for guiding the process of cuts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Quality of fresh-cut carambola (Averrhoa carambola L) is related to many chemical and biochemical variables especially those involved with softening and browning, both influenced by storage temperature. To study these effects, a multivariate analysis was used to evaluate slices packaged in vacuum-sealed polyolefin bags, and stored at 2.5 degrees C, 5 degrees C and 10 degrees C, for up to 16 d. The quality of slices at each temperature was correlated with the duration of storage, O(2) and CO(2) concentration in the package, physical chemical constituents, and activity of enzymes involved in softening (PG) and browning (PPO) metabolism. Three quality groups were identified by hierarchical cluster analysis, and the classification of the components within each of these groups was obtained from a principal component analysis (PCA). The characterization of samples by PCA clearly distinguished acceptable and non-acceptable slices. According to PCA, acceptable slices presented higher ascorbic acid content, greater hue angles ((o)h) and final lightness (L-5) in the first principal component (PC1). On the other hand, non-acceptable slices presented higher total pectin content. PPO activity in the PC1. Non-acceptable slices also presented higher soluble pectin content, increased pectin solubilisation and higher CO(2) concentration in the second principal component (PC2) whereas acceptable slices showed lower total sugar content. The hierarchical cluster and PCA analyses were useful for discriminating the quality of slices stored at different temperatures. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multivariate analyses of UV-Vis spectral data from cachaca wood extracts provide a simple and robust model to classify aged Brazilian cachacas according to the wood species used in the maturation barrels. The model is based on inspection of 93 extracts of oak and different Brazilian wood species by a non-aged cachaca used as an extraction solvent. Application of PCA (Principal Components Analysis) and HCA (Hierarchical Cluster Analysis) leads to identification of 6 clusters of cachaca wood extracts (amburana, amendoim, balsamo, castanheira, jatoba, and oak). LDA (Linear Discriminant Analysis) affords classification of 10 different wood species used in the cachaca extracts (amburana, amendoim, balsamo, cabreuva-parda, canela-sassafras, castanheira, jatoba, jequitiba-rosa, louro-canela, and oak) with an accuracy ranging from 80% (amendoim and castanheira) to 100% (balsamo and jequitiba-rosa). The methodology provides a low-cost alternative to methods based on liquid chromatography and mass spectrometry to classify cachacas aged in barrels that are composed of different wood species.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Current methods for quality control of sugar cane are performed in extracted juice using several methodologies, often requiring appreciable time and chemicals (eventually toxic), making the methods not green and expensive. The present study proposes the use of X-ray spectrometry together with chemometric methods as an innovative and alternative technique for determining sugar cane quality parameters, specifically sucrose concentration, POL, and fiber content. Measurements in stem, leaf, and juice were performed, and those applied directly in stem provided the best results. Prediction models for sugar cane stem determinations with a single 60 s irradiation using portable X-ray fluorescence equipment allows estimating the % sucrose, % fiber, and POL simultaneously. Average relative deviations in the prediction step of around 8% are acceptable if considering that field measurements were done. These results may indicate the best period to cut a particular crop as well as for evaluating the quality of sugar cane for the sugar and alcohol industries.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this thesis some multivariate spectroscopic methods for the analysis of solutions are proposed. Spectroscopy and multivariate data analysis form a powerful combination for obtaining both quantitative and qualitative information and it is shown how spectroscopic techniques in combination with chemometric data evaluation can be used to obtain rapid, simple and efficient analytical methods. These spectroscopic methods consisting of spectroscopic analysis, a high level of automation and chemometric data evaluation can lead to analytical methods with a high analytical capacity, and for these methods, the term high-capacity analysis (HCA) is suggested. It is further shown how chemometric evaluation of the multivariate data in chromatographic analyses decreases the need for baseline separation. The thesis is based on six papers and the chemometric tools used are experimental design, principal component analysis (PCA), soft independent modelling of class analogy (SIMCA), partial least squares regression (PLS) and parallel factor analysis (PARAFAC). The analytical techniques utilised are scanning ultraviolet-visible (UV-Vis) spectroscopy, diode array detection (DAD) used in non-column chromatographic diode array UV spectroscopy, high-performance liquid chromatography with diode array detection (HPLC-DAD) and fluorescence spectroscopy. The methods proposed are exemplified in the analysis of pharmaceutical solutions and serum proteins. In Paper I a method is proposed for the determination of the content and identity of the active compound in pharmaceutical solutions by means of UV-Vis spectroscopy, orthogonal signal correction and multivariate calibration with PLS and SIMCA classification. Paper II proposes a new method for the rapid determination of pharmaceutical solutions by the use of non-column chromatographic diode array UV spectroscopy, i.e. a conventional HPLC-DAD system without any chromatographic column connected. In Paper III an investigation is made of the ability of a control sample, of known content and identity to diagnose and correct errors in multivariate predictions something that together with use of multivariate residuals can make it possible to use the same calibration model over time. In Paper IV a method is proposed for simultaneous determination of serum proteins with fluorescence spectroscopy and multivariate calibration. Paper V proposes a method for the determination of chromatographic peak purity by means of PCA of HPLC-DAD data. In Paper VI PARAFAC is applied for the decomposition of DAD data of some partially separated peaks into the pure chromatographic, spectral and concentration profiles.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The advances in computational biology have made simultaneous monitoring of thousands of features possible. The high throughput technologies not only bring about a much richer information context in which to study various aspects of gene functions but they also present challenge of analyzing data with large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in the context of generalized linear regression based on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND Low-grade gliomas (LGGs) are rare brain neoplasms, with survival spanning up to a few decades. Thus, accurate evaluations on how biomarkers impact survival among patients with LGG require long-term studies on samples prospectively collected over a long period. METHODS The 210 adult LGGs collected in our databank were screened for IDH1 and IDH2 mutations (IDHmut), MGMT gene promoter methylation (MGMTmet), 1p/19q loss of heterozygosity (1p19qloh), and nuclear TP53 immunopositivity (TP53pos). Multivariate survival analyses with multiple imputation of missing data were performed using either histopathology or molecular markers. Both models were compared using Akaike's information criterion (AIC). The molecular model was reduced by stepwise model selection to filter out the most critical predictors. A third model was generated to assess for various marker combinations. RESULTS Molecular parameters were better survival predictors than histology (ΔAIC = 12.5, P< .001). Forty-five percent of studied patients died. MGMTmet was positively associated with IDHmut (P< .001). In the molecular model with marker combinations, IDHmut/MGMTmet combined status had a favorable impact on overall survival, compared with IDHwt (hazard ratio [HR] = 0.33, P< .01), and even more so the triple combination, IDHmut/MGMTmet/1p19qloh (HR = 0.18, P< .001). Furthermore, IDHmut/MGMTmet/TP53pos triple combination was a significant risk factor for malignant transformation (HR = 2.75, P< .05). CONCLUSION By integrating networks of activated molecular glioma pathways, the model based on genotype better predicts prognosis than histology and, therefore, provides a more reliable tool for standardizing future treatment strategies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is well accepted that tumorigenesis is a multi-step procedure involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as (TCGA) and (ICGC) which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays as well as next generation sequencing assays interrogating somatic mutation, insertion, deletion, translocation and structural rearrangements. Given the massive amount of data, a major challenge is to integrate information from multiple sources and formulate testable hypotheses. This thesis focuses on developing methodologies for integrative analyses of genomic assays profiled on the same set of samples. We have developed several novel methods for integrative biomarker identification and cancer classification. We introduce a regression-based approach to identify biomarkers predictive to therapy response or survival by integrating multiple assays including gene expression, methylation and copy number data through penalized regression. To identify key cancer-specific genes accounting for multiple mechanisms of regulation, we have developed the integIRTy software that provides robust and reliable inferences about gene alteration by automatically adjusting for sample heterogeneity as well as technical artifacts using Item Response Theory. To cope with the increasing need for accurate cancer diagnosis and individualized therapy, we have developed a robust and powerful algorithm called SIBER to systematically identify bimodally expressed genes using next generation RNAseq data. We have shown that prediction models built from these bimodal genes have the same accuracy as models built from all genes. Further, prediction models with dichotomized gene expression measurements based on their bimodal shapes still perform well. The effectiveness of outcome prediction using discretized signals paves the road for more accurate and interpretable cancer classification by integrating signals from multiple sources.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

By combining complex network theory and data mining techniques, we provide objective criteria for optimization of the functional network representation of generic multivariate time series. In particular, we propose a method for the principled selection of the threshold value for functional network reconstruction from raw data, and for proper identification of the network's indicators that unveil the most discriminative information on the system for classification purposes. We illustrate our method by analysing networks of functional brain activity of healthy subjects, and patients suffering from Mild Cognitive Impairment, an intermediate stage between the expected cognitive decline of normal aging and the more pronounced decline of dementia. We discuss extensions of the scope of the proposed methodology to network engineering purposes, and to other data mining tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many existing engineering works model the statistical characteristics of the entities under study as normal distributions. These models are eventually used for decision making, requiring in practice the definition of the classification region corresponding to the desired confidence level. Surprisingly enough, however, a great amount of computer vision works using multidimensional normal models leave unspecified or fail to establish correct confidence regions due to misconceptions on the features of Gaussian functions or to wrong analogies with the unidimensional case. The resulting regions incur in deviations that can be unacceptable in high-dimensional models. Here we provide a comprehensive derivation of the optimal confidence regions for multivariate normal distributions of arbitrary dimensionality. To this end, firstly we derive the condition for region optimality of general continuous multidimensional distributions, and then we apply it to the widespread case of the normal probability density function. The obtained results are used to analyze the confidence error incurred by previous works related to vision research, showing that deviations caused by wrong regions may turn into unacceptable as dimensionality increases. To support the theoretical analysis, a quantitative example in the context of moving object detection by means of background modeling is given.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The elemental analysis of Spanish palm dates by inductively coupled plasma atomic emission spectrometry and inductively coupled plasma mass spectrometry is reported for the first time. To complete the information about the mineral composition of the samples, C, H, and N are determined by elemental analysis. Dates from Israel, Tunisia, Saudi Arabia, Algeria and Iran have also been analyzed. The elemental composition have been used in multivariate statistical analysis to discriminate the dates according to its geographical origin. A total of 23 elements (As, Ba, C, Ca, Cd, Co, Cr, Cu, Fe, H, In, K, Li, Mg, Mn, N, Na, Ni, Pb, Se, Sr, V, and Zn) at concentrations from major to ultra-trace levels have been determined in 13 date samples (flesh and seeds). A careful inspection of the results indicate that Spanish samples show higher concentrations of Cd, Co, Cr, and Ni than the remaining ones. Multivariate statistical analysis of the obtained results, both in flesh and seed, indicate that the proposed approach can be successfully applied to discriminate the Spanish date samples from the rest of the samples tested.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The associations between personality disorders and adult attachment dimensions were assessed in a sample of 487 consecutively admitted psychiatric subjects. Canonical correlation analysis showed that two sets of moderately correlated canonical variates explained the correlations between personality disorders and adult attachment patterns. The first and second attachment variates closely resembled the avoidance and anxiety attachment dimensions, respectively. The first personality disorder variate was mainly characterized by avoidant, depressive, paranoid, and schizotypal personality disorders, whereas dependent, histrionic, and borderline personality disorders loaded on the second canonical variate. However, these linear combinations of personality disorders were different from those obtained from principal component analysis. The results extend previous studies linking personality disorders and attachment patterns and suggest the importance of focusing on specific constellations of symptoms associated with dimensions of insecurity.