966 results for Gaussian Mixture Model
Abstract:
Discriminative training of Gaussian Mixture Models (GMMs) for speech or speaker recognition purposes is usually based on the gradient descent method, in which the iteration step size, ε, is typically defined experimentally. In this letter, we derive an equation to determine ε adaptively, by showing that the second-order Newton-Raphson iterative method for finding roots of equations is equivalent to the gradient descent algorithm. © 2010 IEEE.
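A minimal numerical sketch of the equivalence the letter exploits: Newton-Raphson applied to the gradient is gradient descent whose step size is set adaptively from the second-order term, ε_k = 1/f''(x_k). The scalar objective below is a hypothetical example, not the GMM training criterion.

```python
import numpy as np

# Hypothetical scalar objective and its first two derivatives.
f = lambda x: (x - 2.0) ** 2 + np.cos(x)
df = lambda x: 2.0 * (x - 2.0) - np.sin(x)
d2f = lambda x: 2.0 - np.cos(x)

x = 0.0
for k in range(20):
    eps = 1.0 / d2f(x)    # adaptive step size from the second-order term
    x = x - eps * df(x)   # gradient descent step == Newton-Raphson on df
print(x, df(x))           # df(x) ~ 0 at convergence
```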
Abstract:
This work consists of two parts. The first part presents the underlying theory; the second contains two articles. The first article examines two models from the class of generalized linear models for analyzing a mixture experiment that studied the effect of diets composed of fat, carbohydrate, and fiber on tumor expression in the mammary glands of female rats, measured as the proportion of rats exhibiting tumor expression on a given diet. Mixture experiments are characterized by collinearity among components and small sample sizes. In this setting, assuming normality for the response to be maximized or minimized may be inadequate. Given this, the main characteristics of the logistic regression and simplex models are addressed. The models were compared using the model-selection criteria AIC, BIC and ICOMP, simulated envelope plots for the residuals of the fitted models, and odds-ratio plots with their respective confidence intervals for each mixture component. The first article concluded that the simplex regression model showed better goodness of fit and the narrowest confidence intervals for the odds ratios. The second article presents the Boosted Simplex Regression model, the boosting version of the simplex regression model, as an alternative for increasing the precision of the confidence intervals for the odds ratio of each mixture component. For this, the Monte Carlo method was used to construct the confidence intervals. Moreover, the simulated envelope plot for the residuals of a model fitted via a boosting algorithm is presented in an innovative way. It was concluded that the Boosted Simplex Regression model was fitted successfully and that its confidence intervals for the odds ratios were accurate and slightly more precise than those of its maximum likelihood counterpart.
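A minimal sketch of the kind of model comparison described above, assuming hypothetical mixture-design data and a binomial GLM fit with statsmodels; simplex regression itself is not available in statsmodels, so only the AIC/BIC comparison step is illustrated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
# Hypothetical mixture design: proportions of fat, carbohydrate, fiber (rows sum to 1).
X = rng.dirichlet([2, 2, 2], size=60)
n = np.full(60, 30)                                       # rats per diet
p = 1 / (1 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 2])))    # true tumour probability
y = rng.binomial(n, p)

# Binomial (logistic) GLM for the proportion of rats with tumour expression;
# mixture models use the components directly, with no intercept (Scheffé form).
endog = np.column_stack([y, n - y])
fit_full = sm.GLM(endog, X, family=sm.families.Binomial()).fit()
fit_sub = sm.GLM(endog, X[:, :2], family=sm.families.Binomial()).fit()
print(fit_full.aic, fit_sub.aic)   # smaller AIC -> preferred model
print(fit_full.bic, fit_sub.bic)
```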
Abstract:
The tropical forests of the Amazon have historically been the target of unsustainable land-use practices, leaving them scarred by degradation from predatory logging, indiscriminate use of fire, high deforestation rates, and other activities that interfere with efforts to conserve the forest's biodiversity. State action is needed in this scenario, through policies that encourage more sustainable forms of use, such as forest concessions, which seek, through forest management, to contribute to the conservation of natural resources and the maintenance of biodiversity. Products such as the Normalized Difference Vegetation Index (NDVI), the Linear Spectral Mixture Model (LSMM) and the canopy opening fraction were generated in order to create elements for interpreting and analyzing the canopy-opening variable. The study area of this research was Forest Management Unit I in the Mamuru-Arapiuns Gleba Complex, in the western region of the state of Pará, where canopy opening in this forest concession area was quantified and assessed using multispectral images and hemispherical photographs, with a view to analyzing the degradation and the quality of the management carried out in the area. The results showed that it is possible to establish a monitoring process using the sensors and techniques applied, since the LSMM data, especially the soil-fraction image, showed a strong covariance with the field data obtained from hemispherical photographs, making it a good early-warning tool for monitoring the Amazonian forests. In this way, forest management can be made more accessible both to public authorities and to non-governmental or private entities seeking to oversee logging activities, bringing the populations living in these areas both income opportunities and forest conservation.
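For the first of the products mentioned above, a minimal sketch of computing NDVI from red and near-infrared bands; the reflectance arrays are synthetic stand-ins for a real multispectral scene.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical red and near-infrared reflectance bands (stand-ins for a satellite scene).
red = rng.uniform(0.02, 0.30, size=(100, 100))
nir = rng.uniform(0.10, 0.60, size=(100, 100))

# Normalized Difference Vegetation Index: dense canopy gives high NDVI,
# while soil exposed by logging roads and log landings lowers it.
ndvi = (nir - red) / (nir + red)
print(float(ndvi.mean()))
```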
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Water plays an important role in human society, especially in Brazil. Its uses are multiple, including supply, energy production, recreation and others. The National Water Resources Policy (Law No. 9.433/97) states in its articles the importance of using water in accordance with its multiple uses, prioritizing supply for humans and animals. In this approach, it is important to consider the physical and chemical quality of water to meet these demands, within the scope of the legal framework applied to Brazilian water bodies according to their main uses, in order to guarantee water quality compatible with the most demanding uses and to reduce the costs of pollution control through ongoing preventive actions. Among the various parameters used to assess the physical and chemical quality of water, this work seeks to understand the spatial distribution of turbidity at the lake's surface, since variation in the components that alter this parameter can be detected by means of passive remote sensing. The application of the linear spectral mixture model satisfactorily identified spatial patterns in the distribution of turbidity in the lake.
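A minimal sketch of a linear spectral mixture model solved per pixel by non-negative least squares; the endmember spectra and pixel values are hypothetical, not the study's data.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical endmember spectra (columns: clear water, suspended sediment, vegetation)
# over four sensor bands; a real application would use image- or library-derived spectra.
E = np.array([[0.02, 0.10, 0.03],
              [0.03, 0.15, 0.05],
              [0.04, 0.25, 0.04],
              [0.02, 0.30, 0.40]])
pixel = np.array([0.05, 0.08, 0.11, 0.13])   # observed reflectance of one pixel

# Non-negative least squares for the fraction of each endmember in the pixel;
# fractions are then renormalized to sum to one.
f, _ = nnls(E, pixel)
fractions = f / f.sum()
print(fractions)   # the sediment fraction tracks turbidity
```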
Abstract:
Conjugated polymers have attracted tremendous academic and industrial research interest over the past decades due to the appealing advantages that organic/polymeric materials offer for electronic applications and devices such as organic light-emitting diodes (OLEDs), organic field-effect transistors (OFETs), organic solar cells (OSCs), photodiodes and plastic lasers. The optimization of organic materials for applications in optoelectronic devices requires detailed knowledge of their photophysical properties, for instance the energy levels of excited singlet and triplet states, excited-state decay mechanisms and charge carrier mobilities. In the present work a variety of different conjugated (co)polymers, mainly polyspirobifluorene- and polyfluorene-type materials, was investigated using time-resolved photoluminescence spectroscopy in the picosecond to second time domain to study their elementary photophysical properties and to gain deeper insight into structure-property relationships. The experiments cover fluorescence spectroscopy using streak camera techniques as well as time-delayed gated detection techniques for the investigation of delayed fluorescence and phosphorescence. All measurements were performed in the solid state, i.e. on thin polymer films, and in dilute solutions. Starting from the elementary photophysical properties of conjugated polymers, the experiments were extended to studies of singlet and triplet energy transfer processes in polymer blends, polymer-triplet emitter blends and copolymers. The phenomenon of photon-energy upconversion was investigated in blue light-emitting polymer matrices doped with metallated porphyrin derivatives, assuming a bimolecular annihilation upconversion mechanism, which was experimentally verified on a series of copolymers. This mechanism allows for more efficient photon-energy upconversion than previously reported for polyfluorene derivatives. In addition to the spectroscopic experiments described above, amplified spontaneous emission (ASE) in thin-film polymer waveguides was studied employing a fully arylated poly(indenofluorene) as the gain medium. The material was found to exhibit a very low threshold for amplification of blue light combined with excellent oxidative stability, which makes it interesting as an active material for organic solid-state lasers. Apart from the spectroscopic experiments, transient photocurrent measurements on conjugated polymers were also performed to elucidate the charge carrier mobility in the solid state, an important material parameter for device applications. A modified time-of-flight (TOF) technique using a charge carrier generation layer made it possible to study hole transport in a series of spirobifluorene copolymers and to unravel the structure-mobility relationship by comparison with the homopolymer. Not only could the charge carrier mobility be determined for the series of polymers, but field- and temperature-dependent measurements analyzed in the framework of the Gaussian disorder model also showed that the results coincide very well with the predictions of the model. Thus, the validity of the disorder concept for charge carrier transport in amorphous glassy materials could be verified for the investigated series of copolymers.
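For the last result, a sketch of the standard Bässler Gaussian disorder model mobility expression referred to above; the parameter values are illustrative assumptions, not the fitted values from this work.

```python
import numpy as np

kB = 8.617e-5   # Boltzmann constant, eV/K

def gdm_mobility(E, T, mu_inf=1e-2, sigma=0.09, Sigma=2.0, C=2.9e-4):
    """Bässler Gaussian disorder model with illustrative parameters:
    E in V/cm, T in K, sigma (energetic disorder) in eV,
    Sigma the positional disorder parameter, C in (cm/V)^0.5."""
    s_hat = sigma / (kB * T)
    return (mu_inf * np.exp(-(2 * s_hat / 3) ** 2)
            * np.exp(C * (s_hat ** 2 - Sigma ** 2) * np.sqrt(E)))

# Poole-Frenkel-like field dependence: ln(mu) is linear in sqrt(E).
print(gdm_mobility(E=2.5e5, T=295.0))
```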
Abstract:
The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation provides deeper information at lower cost and in less time, and yields data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim is statistical testing, and for modeling these data the Negative Binomial distribution is considered the most adequate, especially because it allows for overdispersion. However, the estimation of the dispersion parameter is a very delicate issue because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to common procedures, since it is the best at reaching the nominal type I error rate while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
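A minimal sketch of the plug-in issue the thesis addresses: a likelihood-ratio test for differential expression under the Negative Binomial that treats an estimated dispersion as known. The counts and the dispersion value are hypothetical.

```python
import numpy as np
from scipy.stats import nbinom, chi2

def nb_loglik(y, mu, phi):
    # Negative Binomial with mean mu and dispersion phi: Var = mu + phi * mu^2.
    r = 1.0 / phi
    p = r / (r + mu)
    return nbinom.logpmf(y, r, p).sum()

# Hypothetical counts for one gene in two conditions (3 replicates each).
y1, y2 = np.array([12, 18, 15]), np.array([30, 25, 41])
phi_hat = 0.1   # plug-in dispersion estimate (the delicate step the thesis targets)

# Likelihood-ratio test of equal means, treating phi_hat as known.
mu0 = np.concatenate([y1, y2]).mean()
lr = 2 * (nb_loglik(y1, y1.mean(), phi_hat) + nb_loglik(y2, y2.mean(), phi_hat)
          - nb_loglik(np.concatenate([y1, y2]), mu0, phi_hat))
print(chi2.sf(lr, df=1))   # p-value; ignoring uncertainty in phi_hat can inflate type I error
```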
Abstract:
Estimation of the number of mixture components (k) is an unsolved problem. Available methods for estimating k include bootstrapping the likelihood ratio test statistic and optimizing a variety of validity functionals such as AIC, BIC/MDL, and ICOMP. We investigate minimization of the distance between the fitted mixture model and the true density as a method for estimating k. The distances considered are Kullback-Leibler (KL) and L₂. We estimate these distances using cross validation. A reliable estimate of k is obtained by voting over B estimates of k corresponding to B cross-validation estimates of distance. This estimation method with the KL distance is very similar to the Monte Carlo cross-validated likelihood methods discussed by Smyth (2000). With a focus on univariate normal mixtures, we present simulation studies that compare the cross-validated distance method with AIC, BIC/MDL, and ICOMP. We also apply the cross-validation distance approach, along with the AIC, BIC/MDL and ICOMP approaches, to data from an osteoporosis drug trial in order to find groups that respond differentially to treatment.
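A minimal sketch of the cross-validated KL approach under stated assumptions: the held-out average negative log-likelihood estimates the KL distance up to an additive constant, and k is chosen by voting over B random splits (using scikit-learn's GaussianMixture as the fitting engine).

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(0)
# Hypothetical univariate two-component normal mixture sample.
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 1, 150)]).reshape(-1, 1)

B, ks, votes = 20, range(1, 6), []
for train, test in ShuffleSplit(n_splits=B, test_size=0.3, random_state=1).split(x):
    # Held-out negative log-likelihood estimates KL(true || fitted) up to a constant.
    scores = [-GaussianMixture(k, random_state=0).fit(x[train]).score(x[test]) for k in ks]
    votes.append(list(ks)[int(np.argmin(scores))])
print(max(set(votes), key=votes.count))   # majority vote over the B estimates of k
```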
Abstract:
This paper addresses the issue of matching statistical and non-rigid shapes, and introduces an Expectation Conditional Maximization-based deformable shape registration (ECM-DSR) algorithm. As in previous works, we cast the statistical and non-rigid shape registration problem into a missing-data framework and handle the unknown correspondences with Gaussian Mixture Models (GMMs). The registration problem is then solved by fitting the GMM centroids to the data. But unlike previous works, where equal isotropic covariances are used, our new algorithm uses heteroscedastic covariances whose values are iteratively estimated from the data. A previously introduced virtual observation concept is adopted here to simplify the estimation of the registration parameters. Based on this concept, we derive closed-form solutions to estimate the parameters for statistical or non-rigid shape registration in each iteration. Our experiments conducted on synthetic and real data demonstrate that the ECM-DSR algorithm has various advantages over existing algorithms.
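A minimal sketch of the generic E-step in GMM-based point-set registration with component-specific (heteroscedastic) variances; this is the textbook computation, not the paper's closed-form ECM-DSR updates, and it simplifies the covariances to per-centroid isotropic values.

```python
import numpy as np

def e_step(model_pts, data_pts, sigma2):
    """Correspondence posteriors for GMM-based point-set registration.
    model_pts: (M, D) GMM centroids; data_pts: (N, D); sigma2: (M,) per-centroid
    isotropic variances (heteroscedastic across components, unlike equal-variance CPD)."""
    d2 = ((data_pts[:, None, :] - model_pts[None, :, :]) ** 2).sum(-1)   # (N, M)
    log_p = -0.5 * d2 / sigma2 - 0.5 * model_pts.shape[1] * np.log(sigma2)
    log_p -= log_p.max(axis=1, keepdims=True)                            # numerical stability
    P = np.exp(log_p)
    return P / P.sum(axis=1, keepdims=True)   # responsibilities, rows sum to 1

# Two toy point sets in 2-D.
Y = np.array([[0.0, 0.0], [1.0, 1.0]])                 # centroids
X = np.array([[0.1, -0.1], [0.9, 1.2], [0.5, 0.4]])    # data
print(e_step(Y, X, sigma2=np.array([0.5, 0.1])))
```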
Abstract:
BACKGROUND AND OBJECTIVES We aimed to study the impact of size, maturation and cytochrome P450 2D6 (CYP2D6) genotype activity score as predictors of intravenous tramadol disposition. METHODS Tramadol and O-desmethyl tramadol (M1) observations in 295 human subjects (postmenstrual age 25 weeks to 84.8 years, weight 0.5-186 kg) were pooled. A population pharmacokinetic analysis was performed using a two-compartment model for tramadol and two additional M1 compartments. Covariate analysis included weight, age, sex, disease characteristics (healthy subject or patient) and CYP2D6 genotype activity. A sigmoid maturation model was used to describe age-related changes in tramadol clearance (CLPO), M1 formation clearance (CLPM) and M1 elimination clearance (CLMO). A phenotype-based mixture model was used to identify CLPM polymorphism. RESULTS Differences in clearances were largely accounted for by maturation and size. The time to reach 50 % of adult clearance (TM50) was used to describe maturation. CLPM (TM50 39.8 weeks) and CLPO (TM50 39.1 weeks) displayed fast maturation, while CLMO matured more slowly, similar to glomerular filtration rate (TM50 47 weeks). The phenotype-based mixture model identified a slow and a faster metabolizer group. Slow metabolizers comprised 9.8 % of subjects, with a CLPM that was 19.4 % of the faster metabolizer value. Low CYP2D6 genotype activity was associated with a CLPM 25 % lower than that of faster metabolizers, but only 32 % of subjects with low genotype activity were in the slow metabolizer group. CONCLUSIONS Maturation and size are key predictors of variability. A two-group polymorphism was identified based on phenotypic M1 formation clearance. Maturation of tramadol elimination occurs early (50 % of the adult value at term gestation).
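A minimal sketch of the sigmoid maturation function named in the methods, using the reported CLPM TM50 of 39.8 weeks; the Hill coefficient is an illustrative assumption, as the abstract does not report it.

```python
def fraction_of_adult_clearance(pma_weeks, tm50=39.8, hill=4.0):
    """Sigmoid (Hill) maturation model: fraction of adult clearance at a given
    postmenstrual age. tm50 = 39.8 weeks is the reported CLPM value; the Hill
    coefficient is an illustrative assumption."""
    return pma_weeks ** hill / (tm50 ** hill + pma_weeks ** hill)

# Roughly 50 % of adult M1 formation clearance at term gestation (~40 weeks PMA).
print(fraction_of_adult_clearance(40.0))
```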
Abstract:
Motivation: Population allele frequencies are correlated when populations share history or exchange genes. Unfortunately, most models for allele frequencies and inference about population structure ignore this correlation. Recent analytical results show that correlations among populations can be very high, which could affect estimates of population genetic structure. In this study, we propose a beta mixture model to characterize the allele frequency distribution among populations. This formulation incorporates the correlation among populations and extends the model to data with different clusters of populations. Results: Using simulated data, we show that, in general, the mixture model provides a good approximation of the among-population allele frequency distribution and a good estimate of the correlation among populations. Results from fitting the mixture model to a dataset of genotypes at 377 autosomal microsatellite loci from human populations indicate high correlation among populations, which may not be appropriate to neglect. Traditional measures of population structure tend to overestimate the amount of genetic differentiation when this correlation is neglected. Inference is performed in a Bayesian framework.
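A minimal sketch of evaluating a two-cluster beta mixture likelihood for allele frequencies, assuming a Balding-Nichols-style parameterization in which θ plays the role of the among-population correlation; the parameterization and all values are illustrative, not the paper's exact model.

```python
import numpy as np
from scipy.stats import beta

def beta_mixture_loglik(p, w, pi, theta):
    """Log-likelihood of allele frequencies p under a two-cluster beta mixture.
    Each cluster uses a Balding-Nichols-style Beta(pi(1-theta)/theta, (1-pi)(1-theta)/theta),
    where theta acts as the correlation among populations within the cluster."""
    dens = 0.0
    for wk, pik, thk in zip(w, pi, theta):
        a = pik * (1 - thk) / thk
        b = (1 - pik) * (1 - thk) / thk
        dens = dens + wk * beta.pdf(p, a, b)
    return np.log(dens).sum()

# Hypothetical frequencies at one locus across 10 populations in two clusters.
p = np.array([0.31, 0.28, 0.35, 0.30, 0.33, 0.62, 0.58, 0.65, 0.60, 0.57])
print(beta_mixture_loglik(p, w=[0.5, 0.5], pi=[0.3, 0.6], theta=[0.05, 0.05]))
```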
Abstract:
Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. Many recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. The current study incorporated gene network information into gene-based analysis of GWAS data for Crohn's disease (CD). The purpose was to develop statistical models to boost the power of identifying disease-associated genes and gene subnetworks by maximizing the use of existing biological knowledge from multiple sources. The results revealed that a Markov random field (MRF)-based mixture model incorporating direct neighborhood information from a single gene network is not efficient in identifying CD-related genes from the GWAS data. The reliance on direct neighborhood information alone may explain the low efficiency of these models. Alternative MRF models that look beyond direct neighborhood information need to be developed in the future for this purpose.
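A minimal sketch of the direct-neighborhood MRF prior discussed above: a gene's prior probability of being disease-associated depends only on the labels of its direct network neighbors. The hyperparameters are illustrative, and this is a generic formulation rather than the study's exact model.

```python
import numpy as np

def mrf_conditional_prob(z_neighbors, gamma=-2.0, beta=1.0):
    """P(z_i = 1 | neighbors) under a Markov random field prior using only
    direct-neighbor information: the logit is gamma + beta * (# associated neighbors).
    gamma and beta are illustrative hyperparameters."""
    eta = gamma + beta * np.sum(z_neighbors)
    return 1.0 / (1.0 + np.exp(-eta))

# A gene with 3 of 5 direct neighbors already flagged as CD-associated.
print(mrf_conditional_prob([1, 1, 1, 0, 0]))
```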
Abstract:
The genomic era brought about by recent advances in next-generation sequencing technology makes genome-wide scans of natural selection a reality. Currently, almost all statistical tests and analytical methods for identifying genes under selection are performed on an individual-gene basis. Although these methods have the power to identify genes subject to strong selection, they have limited power for discovering genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. The recent availability and rapidly growing completeness of gene network and protein-protein interaction databases open avenues for enhancing the power of discovering genes under natural selection. The aim of this thesis is to explore and develop normal mixture model-based methods for leveraging gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover moderately or weakly selected genes that bridge genes under strong selection, helping us understand the biology underlying complex diseases and related natural selection phenotypes.
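A minimal sketch of the naïve Bayes combination described above: the posterior log odds from a two-component normal mixture on a gene's selection statistic are added to an independent Guilt-By-Association log-odds term from the network. All parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def combined_log_odds(stat, gba_log_odds, w=0.2, mu1=2.0):
    """Naive-Bayes-style combination: posterior log odds that a gene is under
    selection from a two-component normal mixture on its selection statistic,
    plus an independent Guilt-By-Association log-odds term from the network.
    w (mixture weight) and mu1 (mean of the selected component) are illustrative."""
    num = w * norm.pdf(stat, loc=mu1, scale=1.0)
    den = (1 - w) * norm.pdf(stat, loc=0.0, scale=1.0)
    return np.log(num / den) + gba_log_odds

# A gene with a moderate statistic but strongly selected network neighbours.
print(combined_log_odds(stat=1.2, gba_log_odds=1.5))
```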
Abstract:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the variation in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving the Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, developed for gene-environment interaction studies, to other related areas such as adaptive borrowing of historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-gene interactions (epistasis) and gene-environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or at least one, respectively, of the main effects of interacting factors must be present for the interactions to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach for identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model, which does not impose this hierarchical constraint, and observe their superior performance in most of the situations considered. The proposed models are applied to real data analyses of gene-environment interactions in lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of allowing useful prior information to be incorporated into the modeling process.
Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of parameter estimation and variable selection in most cases. Our proposed models retain the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions while successfully identifying the reported associations. This is practically appealing for investigating causal factors among a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimated effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed the advantages of using this model for detecting true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates greater power in detecting non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (the Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that can handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they balance statistical efficiency and bias within a unified model. Through extensive simulation studies, we compare the operating characteristics of the proposed models with existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
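A minimal sketch of the hierarchical constraint at the heart of the strong and weak hierarchical models above: an interaction indicator can only be active if the corresponding main-effect indicators are. A variable-selection sampler would consult such a rule before proposing an interaction.

```python
def interaction_allowed(main_i, main_j, rule="strong"):
    """Hierarchical constraint on an i-x-j interaction indicator:
    'strong' heredity requires both main-effect indicators to be active,
    'weak' heredity requires at least one. A sampler would only propose
    including the interaction when this returns True."""
    return (main_i and main_j) if rule == "strong" else (main_i or main_j)

print(interaction_allowed(True, False, "strong"))   # False: interaction excluded
print(interaction_allowed(True, False, "weak"))     # True: interaction may be included
```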
Abstract:
Hydrothermal emission of mantle helium appears to be directly related to magma production rate, but other processes can generate methane and hydrogen on mid-ocean ridges. In an ongoing effort to characterize these processes in the South Atlantic, the flux and distribution of these gases were investigated in the vicinity of a powerful black smoker recently discovered at 8°17.9' S, 13°30.4' W. The vent lies on the shoulder of an oblique offset in the Mid-Atlantic Ridge and discharges high concentrations of methane and hydrogen. Measurements during expeditions in 2004 and 2006 show that the ratio of CH₄ to ³He in the neutrally buoyant plume is quite high, 4 × 10⁸. The CTD stations were accompanied by velocity measurements with lowered acoustic Doppler current profilers (LADCP), and from these data we estimate the methane transport to have been 0.5 mol/sec in a WSW-trending plume that seems to develop during the ebb tidal phase. This transport is an order of magnitude greater than the source of CH₄ calculated from its concentration in the vent fluid and the rise height of the plume. From this range of methane fluxes, the source of ³He is estimated to be between 0.14 and 1.2 nmol/sec. In either case, the ³He source is significantly lower than expected from the spreading rate of the Mid-Atlantic Ridge. From the inventory of methane in the rift valley adjacent to the vent, it appears that the average specific rate of oxidation is 2.6 to 23/yr, corresponding to a turnover time of between 140 and 16 days. Vertical profiles of methane in the surrounding region often exhibited Gaussian-like distributions, and the variances appear to increase with distance from the vent. Using a Gaussian plume model, we obtained a range of vertical eddy diffusivities between 0.009 and 0.08 m²/sec. These high values may be due to tidally driven internal waves across the promontory on which the vent is located.
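A minimal sketch of the Fickian estimate behind the Gaussian plume calculation: if the vertical variance of the methane profile grows linearly in time, two profiles separated by a known travel time give the vertical eddy diffusivity. The numbers below are illustrative, not the cruise data.

```python
def vertical_eddy_diffusivity(sigma1, sigma2, distance, velocity):
    """Fickian estimate from Gaussian plume spreading: the vertical variance of a
    passive tracer grows as d(sigma^2)/dt = 2*kappa, so two profiles a known travel
    time apart give kappa = (sigma2^2 - sigma1^2) / (2 * dt). Inputs in m and m/s."""
    dt = distance / velocity
    return (sigma2 ** 2 - sigma1 ** 2) / (2.0 * dt)

# Illustrative numbers only: Gaussian widths at profiles 5 km apart, 5 cm/s advection.
print(vertical_eddy_diffusivity(sigma1=30.0, sigma2=60.0, distance=5e3, velocity=0.05))
```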