970 results for Multivariate data
Abstract:
IDENTIFICATION OF ETHANOLIC WOOD EXTRACTS USING ELECTRONIC ABSORPTION SPECTRUM AND MULTIVARIATE ANALYSIS. The application of multivariate analysis to spectrophotometric (UV) data was explored for distinguishing extracts of the woods commonly used to make casks for aging cachaca (oak, cabretiva-parda, jatoba, amendoim and canela-sassafras). Absorbances close to 280 nm were more strongly correlated with oak and jatoba woods, whereas absorbances near 230 nm were more strongly correlated with canela-sassafras and cabretiva-parda. The spectrophotometric model was compared with a model based on chromatographic (HPLC-DAD) data. The spectrophotometric model explained more of the variance in the data (PC1 + PC2 = 91%), showing potential as a routine method for checking aged spirits.
Abstract:
In this work, 50 ceramic fragments from the Lago Grande archaeological site and 30 from the Osvaldo archaeological site were compared to assess elemental similarities. The aim was to perform a preliminary comparison between the sites, which are located in the central Amazon, Brazil. The elemental composition of the ceramics was obtained by instrumental neutron activation analysis (INAA). The resulting data set was explored using the multivariate statistical techniques of cluster, principal component and discriminant analysis. The elements analyzed were Na, Lu, U, Yb, La, Th, Cr, Cs, Sc, Fe, Eu, Ce and Hf. The results showed the existence of at least two compositional groups for both Lago Grande and Osvaldo. Each compositional group from the Osvaldo site matches one group from Lago Grande. Taken together with the archaeological background, the results suggest commercial or cultural exchange in the region, which is indicative of socio-cultural interactions between the two sites.
Abstract:
Although a large amount of data has been published in past years on the taxonomic status of the Anastrepha fraterculus (Wiedemann) species complex, there is still a need to know how many species this complex comprises, the distribution of each one, and their distinguishing features. In this study, we assessed the morphometric variability of 32 populations of the A. fraterculus complex from the major biogeographical areas of the Neotropics. Multivariate analysis techniques were applied to the measurements of 21 variables referring to the mesonotum, aculeus, and wing. For the first time, our results identified seven distinct morphotypes within this species complex. With respect to the biogeographical areas, populations occurring in the Mesoamerican dominion (Mexico, Guatemala, and Panama) clustered within a single natural entity labeled the "Mexican" morphotype, whereas in the northwestern South American dominion, samples fell into three distinct groups: the "Venezuelan" morphotype, with a single population from the Caribbean lowlands of Venezuela; the "Andean" morphotype, from the highlands of Venezuela and Colombia; and the "Peruvian" morphotype, comprising the samples from the Pacific coastal lowlands of Ecuador and Peru. Three additional groups were identified from the Chacoan and Paranaense sub-regions: the "Brazilian-1" morphotype, which included the Argentinean samples together with most of those from Brazil and is widely distributed in these biogeographical areas; the "Brazilian-2" morphotype, which included two samples from the state of Sao Paulo (Ilha-Bela and Sao Sebastiao); and the "Brazilian-3" morphotype, which included a single population from Botucatu (state of Sao Paulo).
Based on data published by previous authors showing genetic and karyotypic differentiation, as well as reproductive isolation, we have concluded that such morphotypes indeed represent natural groups and distinct taxonomic entities.
Abstract:
A portable energy-dispersive X-ray fluorescence system was used to determine the elemental composition of 68 pottery fragments from Sambaqui do Bacanga, an archaeological site in Sao Luis, Maranhao, Brazil. This site was occupied from 6600 BP until 900 BP. Determining the elemental composition of these fragments made it possible to verify the existence of engobe in 43 of them. Using two-dimensional graphs and hierarchical cluster analysis of fragments from the surface and the 113-cm level, and from the 10 to 20, 132 and 144-cm levels, it was possible to sort these fragments into five distinct groups according to their stratigraphic levels. The groupings obtained from the two-dimensional graphs agree with the hierarchical cluster analysis by Ward's method. Copyright (C) 2011 John Wiley & Sons, Ltd.
Abstract:
Background: Prostate cancer is a leading cause of death in the male population; a comprehensive study of the genes and molecular networks involved in the tumoral process in the prostate is therefore necessary. To understand the biological processes behind potential biomarkers, we analyzed a set of 57 cDNA microarrays containing ~25,000 genes. Results: Principal Component Analysis (PCA) combined with Maximum-entropy Linear Discriminant Analysis (MLDA) was applied to identify the genes carrying the most discriminative information between normal and tumoral prostatic tissues. Data analysis was carried out using three different approaches: (i) differences in gene expression levels between normal and tumoral conditions from a univariate point of view; (ii) a multivariate analysis using MLDA; and (iii) a dependence network approach. Our results show that malignant transformation in prostatic tissue is more closely related to changes in functional connectivity within dependence networks than to differential gene expression. The MYLK, KLK2, KLK3, HAN11, LTF, CSRP1 and TGM4 genes showed significant changes in functional connectivity between normal and tumoral conditions and were also ranked by our discriminant analysis as the seven most informative genes for the genesis of prostate cancer. Moreover, the identified genes include classically known biomarkers and genes closely related to prostate tumors, such as KLK3 and KLK2, as well as several other potential ones. Conclusion: We have demonstrated that changes in functional connectivity may underlie the biological process that renders some genes more informative for discriminating between normal and tumoral conditions. Using the proposed method, MLDA, to analyze the multivariate characteristics of genes, it was possible to capture the changes in dependence networks that are related to cell transformation.
Abstract:
A faithful depiction of the tropical atmosphere requires three-dimensional sets of observations. Despite the increasing number of observations presently available, these will hardly ever encompass the entire atmosphere and, in addition, observations have errors. Additional (background) information will always be required to complete the picture. Valuable added information comes from the physical laws governing the flow, usually mediated via a numerical weather prediction (NWP) model. These models, however, will never be error-free, and obtaining a reliable estimate of their errors is a real challenge, since the whole truth will never be within our grasp. The present thesis addresses the question of improving the analysis procedures for NWP in the tropics. Improvements are sought by addressing the following issues: the efficiency of the internal model adjustment; the potential of reliable background-error information, as compared to observations; the impact of new, space-borne line-of-sight wind measurements; and the usefulness of multivariate relationships for data assimilation in the tropics. Most NWP assimilation schemes are effectively univariate near the equator. In this thesis, a multivariate formulation of variational data assimilation in the tropics has been developed. The proposed background-error model supports mass-wind coupling based on convectively coupled equatorial waves. The resulting assimilation model produces balanced analysis increments and thereby increases the efficiency of all types of observations. Idealized adjustment and multivariate analysis experiments highlight the importance of direct wind measurements in the tropics. In particular, the presented results confirm the superiority of wind observations over mass data, in spite of the exact multivariate relationships available from the background information. The internal model adjustment is also more efficient for wind observations than for mass data.
In accordance with these findings, new satellite wind observations are expected to contribute towards the improvement of NWP and climate modeling in the tropics. Although incomplete, the new wind-field information has the potential to reduce uncertainties in the tropical dynamical fields if used together with the existing satellite mass-field measurements. The results obtained by applying the new background-error representation to the tropical short-range forecast errors of a state-of-the-art NWP model suggest that achieving useful tropical multivariate relationships may be feasible within an operational NWP environment.
Abstract:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions, with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in every field where multivariate dependence is of great interest, yet their use in clustering has not yet been investigated. The first part of this work reviews the literature on clustering methods, copula functions and microarray experiments. Attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), hierarchical (Everitt, 1974) and model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques, because their performance is compared in this work. Then the probabilistic interpretation of Sklar’s theorem (Sklar, 1959), estimation methods for copulas such as Inference for Margins (Joe and Xu, 1996), and the Archimedean and Elliptical copula families are presented. Finally, applications of clustering methods and copulas to genetic and microarray experiments are highlighted. The second part contains the original contribution. A simulation study is performed to evaluate the performance of the K–means and hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Simulations are run under varying conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter) and the results are evaluated using different performance measures. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and a description of the R functions written, with their output, are given.
The CoClust algorithm is tested on simulated data (varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of the margins) and is compared with model–based clustering using different performance measures, such as the percentage of correctly identified numbers of clusters and the non-rejection percentage of H0 on . It is shown that the CoClust algorithm overcomes all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data, independently of the degree of overlap of the margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can account for virtually any dependence relationship between observations. Several distinctive characteristics of the CoClust are shown, e.g. its ability to identify the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001), both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
Abstract:
This thesis describes studies on the development of physical analysis methods coupled with multivariate statistical techniques for assessing the quality and authenticity of vegetable oils and dairy products. The use of physical instruments reduces the cost and time required by classical analyses and, at the same time, can provide a different set of information relating to both the quality and the authenticity of products. For such methods to work well, robust statistical models must be built using data sets that are correctly collected and representative of the field of application. In this thesis, vegetable oils and several types of cheese were analyzed (in particular pecorino cheeses in two research studies and Parmigiano-Reggiano in another). Several analytical instruments (physical methods) were used, in particular spectroscopy, differential thermal analysis and the electronic nose, in addition to traditional separation methods. The data obtained from the analyses were processed using various statistical techniques, chiefly partial least squares, multiple linear regression and linear discriminant analysis.
Abstract:
The Large Hadron Collider, located at the CERN laboratories in Geneva, is the largest particle accelerator in the world. One of the main research fields at the LHC is the study of the Higgs boson, the latest particle discovered by the ATLAS and CMS experiments. Because of the small production cross section of the Higgs boson, only a large data sample makes it possible to study this particle's properties. To perform these searches, it is desirable to avoid contamination of the signal signature by the large number and variety of background processes produced in pp collisions at the LHC. Considerable importance therefore attaches to the study of multivariate methods which, compared to the standard cut-based analysis, can enhance the signal selection of a Higgs boson produced in association with a top quark pair through a dileptonic final state (ttH channel). The data collected up to 2012 are not sufficient to yield a significant number of ttH events; however, the methods applied in this thesis will provide a powerful tool for the larger data sets that will be collected during the next LHC data taking.
Abstract:
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with the surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, and several approaches to data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped into two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool for classifying samples according to marker expression patterns, which could lead to the discovery of new cancer subtypes.
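The preprocessing and clustering steps named in this abstract can be illustrated with a minimal sketch. It assumes a log10(1 + x) transform, per-marker z-scores, Euclidean distance and average linkage; the abstract does not say which of the eight algorithms or three metrics were finally selected, and the marker intensities below are hypothetical, not data from the study.

```python
import math

def log_transform(rows):
    """Apply log10(1 + x) to every marker intensity (assumes non-negative values)."""
    return [[math.log10(1.0 + v) for v in row] for row in rows]

def zscore_columns(rows):
    """Scale each marker (column) to zero mean and unit standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(rows, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical intensities of three surface markers in five samples.
samples = [
    [900.0, 20.0, 850.0],
    [1100.0, 25.0, 700.0],
    [950.0, 18.0, 900.0],
    [30.0, 800.0, 40.0],
    [45.0, 950.0, 35.0],
]
prepared = zscore_columns(log_transform(samples))
groups = average_linkage(prepared, n_clusters=2)
print(groups)
```

In this toy data the first three samples share one high/low marker pattern and the last two share the opposite one, so the two recovered clusters follow that split. A systematic evaluation like the one described would repeat this over several linkage rules and distance metrics and keep the combinations that recover known cell types reproducibly.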
Abstract:
The SWISSspine registry is the first mandatory registry of its kind in the history of Swiss orthopaedics and follows the principle of "coverage with evidence development". Its goal is to generate evidence for a decision by the Swiss federal office of health on reimbursement of the technologies and treatments concerned by the basic health insurance of Switzerland. The recently developed and clinically implemented Dynardi total disc arthroplasty (TDA) accounted for 10% of the implanted lumbar TDAs in the registry. We compared the outcomes of patients treated with Dynardi to those of the recipients of the other TDAs in the registry. Between March 2005 and October 2009, 483 patients with single-level TDA were documented in the registry. The 52 patients with a single Dynardi lumbar disc prosthesis, implanted by two surgeons (CE and OS), were compared to the 431 patients who received one of the other prostheses. Data were collected in a prospective, observational, multicenter mode. Surgery, implant, 3-month, 1-year, and 2-year follow-up forms as well as comorbidity, NASS and EQ-5D questionnaires were collected. For statistical analyses, the Wilcoxon signed-rank test and chi-square test were used. Multivariate regression analyses were also performed. A significant and clinically relevant reduction of low back pain and leg pain, as well as an improvement in quality of life, was seen in both groups (P < 0.001 postop vs. preop). There were no inter-group differences in postoperative pain levels, intraoperative and follow-up complications, or revision procedures requiring a new hospitalization. However, significantly more Dynardi patients achieved a minimum clinically relevant low back pain alleviation of 18 VAS points and a quality of life improvement of 0.25 EQ-5D points. The patients with a Dynardi prosthesis showed a similar outcome to patients receiving the other TDAs in terms of postoperative low back and leg pain, complications, and revision procedures.
A higher likelihood of achieving a minimum clinically relevant improvement in low back pain and quality of life was observed in Dynardi patients. This difference might be due to the large number of surgeons using other TDAs compared to only two surgeons using the Dynardi TDA, with corresponding variations in patient selection, patient-physician interaction and other factors that cannot be assessed in a registry study.
Abstract:
The objective of this study was to characterize empirically the association between vaccination coverage and the size and occurrence of measles epidemics in Germany. In order to achieve this we analysed data routinely collected by the Robert Koch Institute, which comprise the weekly number of reported measles cases at all ages as well as estimates of vaccination coverage at the average age of entry into the school system. Coverage levels within each federal state of Germany are incorporated into a multivariate time-series model for infectious disease counts, which captures occasional outbreaks by means of an autoregressive component. The observed incidence pattern of measles for all ages is best described by using the log proportion of unvaccinated school starters in the autoregressive component of the model.
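To make the role of the autoregressive component concrete, here is a minimal sketch of one way such a model's conditional mean could be written. It assumes an endemic term plus an autoregressive term whose weight is log-linear in the log proportion of unvaccinated school starters; the function names and all parameter values (`endemic`, `alpha`, `beta`) are hypothetical illustrations, not the fitted model or estimates from the study.

```python
import math

def autoregressive_weight(coverage, alpha, beta):
    """Assumed form: lambda = exp(alpha + beta * log(1 - coverage)), coverage in [0, 1)."""
    unvaccinated = 1.0 - coverage
    return math.exp(alpha + beta * math.log(unvaccinated))

def expected_cases(previous_cases, coverage, endemic=0.5, alpha=1.0, beta=1.0):
    """Conditional mean of this week's count given last week's count.

    mean = endemic + lambda * previous_cases, with hypothetical parameter values.
    """
    return endemic + autoregressive_weight(coverage, alpha, beta) * previous_cases

# Same number of cases last week, two hypothetical coverage levels.
low_coverage = expected_cases(previous_cases=10, coverage=0.85)
high_coverage = expected_cases(previous_cases=10, coverage=0.95)
print(low_coverage, high_coverage)
```

With a positive `beta`, a larger unvaccinated proportion increases the weight on last week's cases, so an outbreak is expected to carry over more strongly where coverage is lower.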
Abstract:
Currently, a variety of linear and nonlinear measures is in use to investigate spatiotemporal interrelation patterns of multivariate time series. Whereas the former are by definition insensitive to nonlinear effects, the latter detect both nonlinear and linear interrelation. In the present contribution we employ a uniform surrogate-based approach, which is capable of distinguishing interrelations that significantly exceed random effects from interrelations that significantly exceed linear correlation. The bivariate version of the proposed framework is explored using a simple model that allows coupling and nonlinearity of interrelation to be tuned separately. To demonstrate the applicability of the approach to multivariate real-world time series, we investigate resting state functional magnetic resonance imaging (rsfMRI) data of two healthy subjects as well as intracranial electroencephalograms (iEEG) of two epilepsy patients with focal onset seizures. The main findings are that, for our rsfMRI data, interrelations can be described by linear cross-correlation, whereas rejection of the null hypothesis of linear iEEG interrelation occurs predominantly for epileptogenic tissue as well as during epileptic seizures.
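The general idea of a surrogate test can be sketched for the first of the two questions only: whether an interrelation exceeds random effects. The sketch below is a simplification under stated assumptions: it uses plain shuffle surrogates and zero-lag Pearson correlation, which is not the surrogate scheme of the contribution (shuffling also destroys autocorrelation, and testing against linear correlation would need surrogates that preserve the linear properties). All signals are synthetic.

```python
import math
import random

def pearson(x, y):
    """Zero-lag Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def surrogate_p_value(x, y, n_surrogates=199, seed=0):
    """Fraction of shuffle surrogates whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    exceed = 0
    for _ in range(n_surrogates):
        shuffled = list(y)
        rng.shuffle(shuffled)  # destroys any temporal alignment with x
        if abs(pearson(x, shuffled)) >= observed:
            exceed += 1
    # Count the original series as one member of the ensemble.
    return (exceed + 1) / (n_surrogates + 1)

# Synthetic example: one series coupled to the driver, one independent of it.
rng = random.Random(1)
driver = [rng.gauss(0.0, 1.0) for _ in range(200)]
coupled = [0.8 * d + 0.2 * rng.gauss(0.0, 1.0) for d in driver]
independent = [rng.gauss(0.0, 1.0) for _ in range(200)]

p_coupled = surrogate_p_value(driver, coupled)
p_independent = surrogate_p_value(driver, independent)
print(p_coupled, p_independent)
```

A small p-value for the coupled pair indicates an interrelation beyond what random alignment produces. The second stage described in the abstract would repeat the comparison with a nonlinear measure against surrogates that retain linear correlation.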
Abstract:
Triggered event-related functional magnetic resonance imaging requires sparse intervals of temporally resolved functional data acquisitions, whose initiation corresponds to the occurrence of an event, typically an epileptic spike in the electroencephalographic trace. However, conventional fMRI time series are greatly affected by non-steady-state magnetization effects, which obscure initial blood oxygen level-dependent (BOLD) signals. Here, conventional echo-planar imaging and a post-processing solution based on principal component analysis were employed to remove the dominant eigenimages of the time series, to filter out the global signal changes induced by magnetization decay and to recover BOLD signals starting with the first functional volume. This approach was compared with a physical solution using radiofrequency preparation, which nullifies magnetization effects. As an application of the method, the detectability of the initial transient BOLD response in the auditory cortex, which is elicited by the onset of acoustic scanner noise, was used to demonstrate that post-processing-based removal of magnetization effects allows the detection of brain activity patterns identical to those obtained using radiofrequency preparation. Using the auditory responses as an ideal experimental model of triggered brain activity, our results suggest that reducing the initial magnetization effects by removing a few principal components from fMRI data may be useful in the analysis of triggered event-related echo-planar time series. The implications of this study are discussed with particular attention to the remaining technical limitations and the additional neurophysiological issues of triggered acquisition.
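Removing a dominant component from a voxel-by-time data matrix can be sketched as follows. This is an illustrative simplification, assuming an uncentered decomposition, power iteration to find the leading temporal component, and removal of a single component; the study's actual preprocessing and number of removed eigenimages are not specified here, and the signal values are synthetic.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dominant_time_course(data, n_iter=200):
    """Leading right singular vector of the voxel-by-time matrix via power iteration."""
    n_time = len(data[0])
    u = [1.0 / math.sqrt(n_time)] * n_time
    for _ in range(n_iter):
        scores = [dot(row, u) for row in data]          # X u, one score per voxel
        new_u = [sum(s * row[t] for s, row in zip(scores, data))
                 for t in range(n_time)]                # X^T (X u)
        norm = math.sqrt(dot(new_u, new_u))
        u = [c / norm for c in new_u]
    return u

def remove_component(data, u):
    """Subtract each voxel's projection onto the unit-length time course u."""
    return [[v - dot(row, u) * c for v, c in zip(row, u)] for row in data]

# Synthetic series: a strong global decay in every voxel, a weak response in one.
decay = [math.exp(-t / 2.0) for t in range(8)]
response = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
data = [
    [100.0 * d for d in decay],
    [120.0 * d for d in decay],
    [110.0 * d + r for d, r in zip(decay, response)],
]

u = dominant_time_course(data)
cleaned = remove_component(data, u)
print([round(v, 3) for v in cleaned[2]])
```

Because the global decay dominates the variance, the leading component follows the decay curve closely, and subtracting it leaves residuals on the scale of the weak response rather than the decay. Part of the response that overlaps the decay time course is removed as well, which is one of the limitations such an approach has to weigh.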