927 resultados para Multivariate statistical methods


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Concentrations of 39 organic compounds were determined in three fractions (head, heart and tail) obtained from the pot still distillation of fermented sugarcane juice. The results were evaluated using analysis of variance (ANOVA), Tukey's test, principal component analysis (PCA), hierarchical cluster analysis (HCA) and linear discriminant analysis (LDA). According to PCA and HCA, the experimental data lead to the formation of three clusters. The head fractions give rise to a more defined group. The heart and tail fractions showed some overlap consistent with its acid composition. The predictive ability of calibration and validation of the model generated by LDA for the three fractions classification were 90.5 and 100%, respectively. This model recognized as the heart twelve of the thirteen commercial cachacas (92.3%) with good sensory characteristics, thus showing potential for guiding the process of cuts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As concentrações de 39 compostos orgânicos foram determinadas em três frações (cabeça, coração e cauda) obtidas da destilação em alambique do caldo de cana fermentado. Os resultados foram avaliados utilizando-se análise de variância (ANOVA), teste de Tukey, análise de componentes principais (PCA), agrupamento hierárquico (HCA) e análise discriminante linear (LDA). De acordo com PCA e HCA, os dados experimentais conduzem à formação de três agrupamentos. As frações de cabeça deram origem a um grupo mais definido. As frações coração e cauda apresentaram alguma sobreposição coerente com sua composição em ácidos. As habilidades preditivas de calibração e validação dos modelos gerados pela LDA para a classificação das três frações foram de 90,5 e 100%, respectivamente. Este modelo reconheceu como coração doze de treze cachaças comerciais (92,3%) com boas características sensoriais, apresentando potencial para a orientação do processo de cortes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis some multivariate spectroscopic methods for the analysis of solutions are proposed. Spectroscopy and multivariate data analysis form a powerful combination for obtaining both quantitative and qualitative information and it is shown how spectroscopic techniques in combination with chemometric data evaluation can be used to obtain rapid, simple and efficient analytical methods. These spectroscopic methods consisting of spectroscopic analysis, a high level of automation and chemometric data evaluation can lead to analytical methods with a high analytical capacity, and for these methods, the term high-capacity analysis (HCA) is suggested. It is further shown how chemometric evaluation of the multivariate data in chromatographic analyses decreases the need for baseline separation. The thesis is based on six papers and the chemometric tools used are experimental design, principal component analysis (PCA), soft independent modelling of class analogy (SIMCA), partial least squares regression (PLS) and parallel factor analysis (PARAFAC). The analytical techniques utilised are scanning ultraviolet-visible (UV-Vis) spectroscopy, diode array detection (DAD) used in non-column chromatographic diode array UV spectroscopy, high-performance liquid chromatography with diode array detection (HPLC-DAD) and fluorescence spectroscopy. The methods proposed are exemplified in the analysis of pharmaceutical solutions and serum proteins. In Paper I a method is proposed for the determination of the content and identity of the active compound in pharmaceutical solutions by means of UV-Vis spectroscopy, orthogonal signal correction and multivariate calibration with PLS and SIMCA classification. Paper II proposes a new method for the rapid determination of pharmaceutical solutions by the use of non-column chromatographic diode array UV spectroscopy, i.e. a conventional HPLC-DAD system without any chromatographic column connected. In Paper III an investigation is made of the ability of a control sample, of known content and identity to diagnose and correct errors in multivariate predictions something that together with use of multivariate residuals can make it possible to use the same calibration model over time. In Paper IV a method is proposed for simultaneous determination of serum proteins with fluorescence spectroscopy and multivariate calibration. Paper V proposes a method for the determination of chromatographic peak purity by means of PCA of HPLC-DAD data. In Paper VI PARAFAC is applied for the decomposition of DAD data of some partially separated peaks into the pure chromatographic, spectral and concentration profiles.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied for modelling and processing biomedical signals in two different contexts: ultrasound medical imaging systems and primate neural activity analysis and modelling. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from a ultrasonic transducer and automatic image segmentation and classification of prostate ultrasound scans. In the former application a stochastic model of the radio frequency signal measured from a ultrasonic transducer is derived. This model is then employed for developing in a statistical framework a regularized deconvolution procedure, for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then uses to segment the images in region of interests by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different region of interests. In the context of neural activity signals, an example of bio-inspired dynamical network was developed to help in studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in 7a parietal region of primate monkeys, during the execution of learned behavioural tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis two major topics inherent with medical ultrasound images are addressed: deconvolution and segmentation. In the first case a deconvolution algorithm is described allowing statistically consistent maximum a posteriori estimates of the tissue reflectivity to be restored. These estimates are proven to provide a reliable source of information for achieving an accurate characterization of biological tissues through the ultrasound echo. The second topic involves the definition of a semi automatic algorithm for myocardium segmentation in 2D echocardiographic images. The results show that the proposed method can reduce inter- and intra observer variability in myocardial contours delineation and is feasible and accurate even on clinical data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Large Hadron Collider, located at the CERN laboratories in Geneva, is the largest particle accelerator in the world. One of the main research fields at LHC is the study of the Higgs boson, the latest particle discovered at the ATLAS and CMS experiments. Due to the small production cross section for the Higgs boson, only a substantial statistics can offer the chance to study this particle properties. In order to perform these searches it is desirable to avoid the contamination of the signal signature by the number and variety of the background processes produced in pp collisions at LHC. Much account assumes the study of multivariate methods which, compared to the standard cut-based analysis, can enhance the signal selection of a Higgs boson produced in association with a top quark pair through a dileptonic final state (ttH channel). The statistics collected up to 2012 is not sufficient to supply a significant number of ttH events; however, the methods applied in this thesis will provide a powerful tool for the increasing statistics that will be collected during the next LHC data taking.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Questa tesi si inserisce nell'ambito delle analisi statistiche e dei metodi stocastici applicati all'analisi delle sequenze di DNA. Nello specifico il nostro lavoro è incentrato sullo studio del dinucleotide CG (CpG) all'interno del genoma umano, che si trova raggruppato in zone specifiche denominate CpG islands. Queste sono legate alla metilazione del DNA, un processo che riveste un ruolo fondamentale nella regolazione genica. La prima parte dello studio è dedicata a una caratterizzazione globale del contenuto e della distribuzione dei 16 diversi dinucleotidi all'interno del genoma umano: in particolare viene studiata la distribuzione delle distanze tra occorrenze successive dello stesso dinucleotide lungo la sequenza. I risultati vengono confrontati con diversi modelli nulli: sequenze random generate con catene di Markov di ordine zero (basate sulle frequenze relative dei nucleotidi) e uno (basate sulle probabilità di transizione tra diversi nucleotidi) e la distribuzione geometrica per le distanze. Da questa analisi le proprietà caratteristiche del dinucleotide CpG emergono chiaramente, sia dal confronto con gli altri dinucleotidi che con i modelli random. A seguito di questa prima parte abbiamo scelto di concentrare le successive analisi in zone di interesse biologico, studiando l’abbondanza e la distribuzione di CpG al loro interno (CpG islands, promotori e Lamina Associated Domains). Nei primi due casi si osserva un forte arricchimento nel contenuto di CpG, e la distribuzione delle distanze è spostata verso valori inferiori, indicando che questo dinucleotide è clusterizzato. All’interno delle LADs si trovano mediamente meno CpG e questi presentano distanze maggiori. Infine abbiamo adottato una rappresentazione a random walk del DNA, costruita in base al posizionamento dei dinucleotidi: il walk ottenuto presenta caratteristiche drasticamente diverse all’interno e all’esterno di zone annotate come CpG island. Riteniamo pertanto che metodi basati su questo approccio potrebbero essere sfruttati per migliorare l’individuazione di queste aree di interesse nel genoma umano e di altri organismi.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex human diseases are a major challenge for biological research. The goal of my research is to develop effective methods for biostatistics in order to create more opportunities for the prevention and cure of human diseases. This dissertation proposes statistical technologies that have the ability of being adapted to sequencing data in family-based designs, and that account for joint effects as well as gene-gene and gene-environment interactions in the GWA studies. The framework includes statistical methods for rare and common variant association studies. Although next-generation DNA sequencing technologies have made rare variant association studies feasible, the development of powerful statistical methods for rare variant association studies is still underway. Chapter 2 demonstrates two adaptive weighting methods for rare variant association studies based on family data for quantitative traits. The results show that both proposed methods are robust to population stratification, robust to the direction and magnitude of the effects of causal variants, and more powerful than the methods using weights suggested by Madsen and Browning [2009]. In Chapter 3, I extended the previously proposed test for Testing the effect of an Optimally Weighted combination of variants (TOW) [Sha et al., 2012] for unrelated individuals to TOW &ndash F, TOW for Family &ndash based design. Simulation results show that TOW &ndash F can control for population stratification in wide range of population structures including spatially structured populations, is robust to the directions of effect of causal variants, and is relatively robust to percentage of neutral variants. In GWA studies, this dissertation consists of a two &ndash locus joint effect analysis and a two-stage approach accounting for gene &ndash gene and gene &ndash environment interaction. Chapter 4 proposes a novel two &ndash stage approach, which is promising to identify joint effects, especially for monotonic models. The proposed approach outperforms a single &ndash marker method and a regular two &ndash stage analysis based on the two &ndash locus genotypic test. In Chapter 5, I proposed a gene &ndash based two &ndash stage approach to identify gene &ndash gene and gene &ndash environment interactions in GWA studies which can include rare variants. The two &ndash stage approach is applied to the GAW 17 dataset to identify the interaction between KDR gene and smoking status.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As the development of genotyping and next-generation sequencing technologies, multi-marker testing in genome-wide association study and rare variant association study became active research areas in statistical genetics. This dissertation contains three methodologies for association study by exploring different genetic data features and demonstrates how to use those methods to test genetic association hypothesis. The methods can be categorized into in three scenarios: 1) multi-marker testing for strong Linkage Disequilibrium regions, 2) multi-marker testing for family-based association studies, 3) multi-marker testing for rare variant association study. I also discussed the advantage of using these methods and demonstrated its power by simulation studies and applications to real genetic data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Calcium levels in spines play a significant role in determining the sign and magnitude of synaptic plasticity. The magnitude of calcium influx into spines is highly dependent on influx through N-methyl D-aspartate (NMDA) receptors, and therefore depends on the number of postsynaptic NMDA receptors in each spine. We have calculated previously how the number of postsynaptic NMDA receptors determines the mean and variance of calcium transients in the postsynaptic density, and how this alters the shape of plasticity curves. However, the number of postsynaptic NMDA receptors in the postsynaptic density is not well known. Anatomical methods for estimating the number of NMDA receptors produce estimates that are very different than those produced by physiological techniques. The physiological techniques are based on the statistics of synaptic transmission and it is difficult to experimentally estimate their precision. In this paper we use stochastic simulations in order to test the validity of a physiological estimation technique based on failure analysis. We find that the method is likely to underestimate the number of postsynaptic NMDA receptors, explain the source of the error, and re-derive a more precise estimation technique. We also show that the original failure analysis as well as our improved formulas are not robust to small estimation errors in key parameters.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The talk starts out with a short introduction to the philosophy of probability. I highlight the need to interpret probabilities in the sciences and motivate objectivist accounts of probabilities. Very roughly, according to such accounts, ascriptions of probabilities have truth-conditions that are independent of personal interests and needs. But objectivist accounts are pointless if they do not provide an objectivist epistemology, i.e., if they do not determine well-defined methods to support or falsify claims about probabilities. In the rest of the talk I examine recent philosophical proposals for an objectivist methodology. Most of them take up ideas well-known from statistics. I nevertheless find some proposals incompatible with objectivist aspirations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most studies of differential gene-expressions have been conducted between two given conditions. The two-condition experimental (TCE) approach is simple in that all genes detected display a common differential expression pattern responsive to a common two-condition difference. Therefore, the genes that are differentially expressed under the other conditions other than the given two conditions are undetectable with the TCE approach. In order to address the problem, we propose a new approach called multiple-condition experiment (MCE) without replication and develop corresponding statistical methods including inference of pairs of conditions for genes, new t-statistics, and a generalized multiple-testing method for any multiple-testing procedure via a control parameter C. We applied these statistical methods to analyze our real MCE data from breast cancer cell lines and found that 85 percent of gene-expression variations were caused by genotypic effects and genotype-ANAX1 overexpression interactions, which agrees well with our expected results. We also applied our methods to the adenoma dataset of Notterman et al. and identified 93 differentially expressed genes that could not be found in TCE. The MCE approach is a conceptual breakthrough in many aspects: (a) many conditions of interests can be conducted simultaneously; (b) study of association between differential expressions of genes and conditions becomes easy; (c) it can provide more precise information for molecular classification and diagnosis of tumors; (d) it can save lot of experimental resources and time for investigators.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper defines and compares several models for describing excess influenza pneumonia mortality in Houston. First, the methodology used by the Center for Disease Control is examined and several variations of this methodology are studied. All of the models examined emphasize the difficulty of omitting epidemic weeks.^ In an attempt to find a better method of describing expected and epidemic mortality, time series methods are examined. Grouping in four-week periods, truncating the data series to adjust epidemic periods, and seasonally-adjusting the series y(,t), by:^ (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI)^ is the best method examined. This new series w(,t) is stationary and a moving average model MA(1) gives a good fit for forecasting influenza and pneumonia mortality in Houston.^ Influenza morbidity, other causes of death, sex, race, age, climate variables, environmental factors, and school absenteeism are all examined in terms of their relationship to influenza and pneumonia mortality. Both influenza morbidity and ischemic heart disease mortality show a very high relationship that remains when seasonal trends are removed from the data. However, when jointly modeling the three series it is obvious that the simple time series MA(1) model of truncated, seasonally-adjusted four-week data gives a better forecast.^