899 resultados para two-Gaussian mixture model


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This Letter addresses image segmentation via a generative model approach. A Bayesian network (BNT) in the space of dyadic wavelet transform coefficients is introduced to model texture images. The model is similar to a Hidden Markov model (HMM), but with non-stationary transitive conditional probability distributions. It is composed of discrete hidden variables and observable Gaussian outputs for wavelet coefficients. In particular, the Gabor wavelet transform is considered. The introduced model is compared with the simplest joint Gaussian probabilistic model for Gabor wavelet coefficients for several textures from the Brodatz album [1]. The comparison is based on cross-validation and includes probabilistic model ensembles instead of single models. In addition, the robustness of the models to cope with additive Gaussian noise is investigated. We further study the feasibility of the introduced generative model for image segmentation in the novelty detection framework [2]. Two examples are considered: (i) sea surface pollution detection from intensity images and (ii) image segmentation of the still images with varying illumination across the scene.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper analyzes difficulties with the introduction of object-oriented concepts in introductory computing education and then proposes a two-language, two-paradigm curriculum model that alleviates such difficulties. Our two-language, two-paradigm curriculum model begins with teaching imperative programming using Python programming language, continues with teaching object-oriented computing using Java, and concludes with teaching object-oriented data structures with Java.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62F15.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Environmental impacts of wind energy facilities increasingly cause concern, a central issue being bats and birds killed by rotor blades. Two approaches have been employed to assess collision rates: carcass searches and surveys of animals prone to collisions. Carcass searches can provide an estimate for the actual number of animals being killed but they offer little information on the relation between collision rates and, for example, weather parameters due to the time of death not being precisely known. In contrast, a density index of animals exposed to collision is sufficient to analyse the parameters influencing the collision rate. However, quantification of the collision rate from animal density indices (e.g. acoustic bat activity or bird migration traffic rates) remains difficult. We combine carcass search data with animal density indices in a mixture model to investigate collision rates. In a simulation study we show that the collision rates estimated by our model were at least as precise as conventional estimates based solely on carcass search data. Furthermore, if certain conditions are met, the model can be used to predict the collision rate from density indices alone, without data from carcass searches. This can reduce the time and effort required to estimate collision rates. We applied the model to bat carcass search data obtained at 30 wind turbines in 15 wind facilities in Germany. We used acoustic bat activity and wind speed as predictors for the collision rate. The model estimates correlated well with conventional estimators. Our model can be used to predict the average collision rate. It enables an analysis of the effect of parameters such as rotor diameter or turbine type on the collision rate. The model can also be used in turbine-specific curtailment algorithms that predict the collision rate and reduce this rate with a minimal loss of energy production.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Li-ion rechargeable battery (LIB) is widely used as an energy storage device, but has significant limitations in battery cycle life and safety. During initial charging, decomposition of the ethylene carbonate (EC)-based electrolytes of the LIB leads to the formation of a passivating layer on the anode known as the solid electrolyte interphase (SEI). The formation of an SEI has great impact on the cycle life and safety of LIB, yet mechanistic aspects of SEI formation are not fully understood. In this dissertation, two surface science model systems have been created under ultra-high vacuum (UHV) to probe the very initial stage of SEI formation at the model carbon anode surfaces of LIB. The first model system, Model System I, is an lithium-carbonate electrolyte/graphite C(0001) system. I have developed a temperature programmed desorption/temperature programmed reaction spectroscopy (TPD/TPRS) instrument as part of my dissertation to study Model System I in quantitative detail. The binding strengths and film growth mechanisms of key electrolyte molecules on model carbon anode surfaces with varying extents of lithiation were measured by TPD. TPRS was further used to track the gases evolved from different reduction products in the early-stage SEI formation. The branching ratio of multiple reaction pathways was quantified for the first time and determined to be 70.% organolithium products vs. 30% inorganic lithium product. The obtained branching ratio provides important information on the distribution of lithium salts that form at the very onset of SEI formation. One of the key reduction products formed from EC in early-stage SEI formation is lithium ethylene dicarbonate (LEDC). Despite intensive studies, the LEDC structure in either the bulk or thin-film (SEI) form is unknown. To enable structural study, pure LEDC was synthesized and subject to synchrotron X-ray diffraction measurements (bulk material) and STM measurements (deposited films). To enable studies of LEDC thin films, Model System II, a lithium ethylene dicarbonate (LEDC)-dimethylformamide (DMF)/Ag(111) system was created by a solution microaerosol deposition technique. Produced films were then imaged by ultra-high vacuum scanning tunneling microscopy (UHV-STM). As a control, the dimethylformamide (DMF)-Ag(111) system was first prepared and its complex 2D phase behavior was mapped out as a function of coverage. The evolution of three distinct monolayer phases of DMF was observed with increasing surface pressure — a 2D gas phase, an ordered DMF phase, and an ordered Ag(DMF)2 complex phase. The addition of LEDC to this mixture, seeded the nucleation of the ordered DMF islands at lower surface pressures (DMF coverages), and was interpreted through nucleation theory. A structural model of the nucleation seed was proposed, and the implication of ionic SEI products, such as LEDC, in early-stage SEI formation was discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Survival models are being widely applied to the engineering field to model time-to-event data once censored data is here a common issue. Using parametric models or not, for the case of heterogeneous data, they may not always represent a good fit. The present study relays on critical pumps survival data where traditional parametric regression might be improved in order to obtain better approaches. Considering censored data and using an empiric method to split the data into two subgroups to give the possibility to fit separated models to our censored data, we’ve mixture two distinct distributions according a mixture-models approach. We have concluded that it is a good method to fit data that does not fit to a usual parametric distribution and achieve reliable parameters. A constant cumulative hazard rate policy was used as well to check optimum inspection times using the obtained model from the mixture-model, which could be a plus when comparing with the actual maintenance policies to check whether changes should be introduced or not.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The effectiveness of higher-order spectral (HOS) phase features in speaker recognition is investigated by comparison with Mel Cepstral features on the same speech data. HOS phase features retain phase information from the Fourier spectrum unlikeMel–frequency Cepstral coefficients (MFCC). Gaussian mixture models are constructed from Mel– Cepstral features and HOS features, respectively, for the same data from various speakers in the Switchboard telephone Speech Corpus. Feature clusters, model parameters and classification performance are analyzed. HOS phase features on their own provide a correct identification rate of about 97% on the chosen subset of the corpus. This is the same level of accuracy as provided by MFCCs. Cluster plots and model parameters are compared to show that HOS phase features can provide complementary information to better discriminate between speakers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first format center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting—together with the acoustic analysis—that it is possible to monitor the level of driver distraction directly from their speech.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are ‘hybrid’ in nature, in that they are a composition of components for which their individual properties may be easily described but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to after a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on the goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of number of components, particularly for skewed and heavytailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches will be considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. Statistical literature shows how statistical efficiency is often the only criteria for an efficient algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual algorithms contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced by the combination process of these components in a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular the importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with the dimension is demonstrated and we illustrates that the optimal covariance matrix for the importance function can be estimated in a special case.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis, the issue of incorporating uncertainty for environmental modelling informed by imagery is explored by considering uncertainty in deterministic modelling, measurement uncertainty and uncertainty in image composition. Incorporating uncertainty in deterministic modelling is extended for use with imagery using the Bayesian melding approach. In the application presented, slope steepness is shown to be the main contributor to total uncertainty in the Revised Universal Soil Loss Equation. A spatial sampling procedure is also proposed to assist in implementing Bayesian melding given the increased data size with models informed by imagery. Measurement error models are another approach to incorporating uncertainty when data is informed by imagery. These models for measurement uncertainty, considered in a Bayesian conditional independence framework, are applied to ecological data generated from imagery. The models are shown to be appropriate and useful in certain situations. Measurement uncertainty is also considered in the context of change detection when two images are not co-registered. An approach for detecting change in two successive images is proposed that is not affected by registration. The procedure uses the Kolmogorov-Smirnov test on homogeneous segments of an image to detect change, with the homogeneous segments determined using a Bayesian mixture model of pixel values. Using the mixture model to segment an image also allows for uncertainty in the composition of an image. This thesis concludes by comparing several different Bayesian image segmentation approaches that allow for uncertainty regarding the allocation of pixels to different ground components. Each segmentation approach is applied to a data set of chlorophyll values and shown to have different benefits and drawbacks depending on the aims of the analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models on prostate cancer progression. The latter application shows that with a combined statistical/bioinformatics analyses, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise by intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data are on symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS) and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real time neural activity in the brain.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, the goal of identifying disease subgroups based on differences in observed symptom profile is considered. Commonly referred to as phenotype identification, solutions to this task often involve the application of unsupervised clustering techniques. In this paper, we investigate the application of a Dirichlet Process mixture (DPM) model for this task. This model is defined by the placement of the Dirichlet Process (DP) on the unknown components of a mixture model, allowing for the expression of uncertainty about the partitioning of observed data into homogeneous subgroups. To exemplify this approach, an application to phenotype identification in Parkinson’s disease (PD) is considered, with symptom profiles collected using the Unified Parkinson’s Disease Rating Scale (UPDRS). Clustering, Dirichlet Process mixture, Parkinson’s disease, UPDRS.