918 resultados para Latent class model


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recurrent wheezing or asthma is a common problem in children that has increased considerably in prevalence in the past few decades. The causes and underlying mechanisms are poorly understood and it is thought that a numb er of distinct diseases causing similar symptoms are involved. Due to the lack of a biologically founded classification system, children are classified according to their observed disease related features (symptoms, signs, measurements) into phenotypes. The objectives of this PhD project were a) to develop tools for analysing phenotypic variation of a disease, and b) to examine phenotypic variability of wheezing among children by applying these tools to existing epidemiological data. A combination of graphical methods (multivariate co rrespondence analysis) and statistical models (latent variables models) was used. In a first phase, a model for discrete variability (latent class model) was applied to data on symptoms and measurements from an epidemiological study to identify distinct phenotypes of wheezing. In a second phase, the modelling framework was expanded to include continuous variability (e.g. along a severity gradient) and combinations of discrete and continuo us variability (factor models and factor mixture models). The third phase focused on validating the methods using simulation studies. The main body of this thesis consists of 5 articles (3 published, 1 submitted and 1 to be submitted) including applications, methodological contributions and a review. The main findings and contributions were: 1) The application of a latent class model to epidemiological data (symptoms and physiological measurements) yielded plausible pheno types of wheezing with distinguishing characteristics that have previously been used as phenotype defining characteristics. 2) A method was proposed for including responses to conditional questions (e.g. questions on severity or triggers of wheezing are asked only to children with wheeze) in multivariate modelling.ii 3) A panel of clinicians was set up to agree on a plausible model for wheezing diseases. The model can be used to generate datasets for testing the modelling approach. 4) A critical review of methods for defining and validating phenotypes of wheeze in children was conducted. 5) The simulation studies showed that a parsimonious parameterisation of the models is required to identify the true underlying structure of the data. The developed approach can deal with some challenges of real-life cohort data such as variables of mixed mode (continuous and categorical), missing data and conditional questions. If carefully applied, the approach can be used to identify whether the underlying phenotypic variation is discrete (classes), continuous (factors) or a combination of these. These methods could help improve precision of research into causes and mechanisms and contribute to the development of a new classification of wheezing disorders in children and other diseases which are difficult to classify.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While many offline retailers have developed informational websites that offer information on products and prices, the key question for such informational websites is whether they can increase revenues via web-to-store shopping. The current paper draws on the information search literature to specify and test hypotheses regarding the offline revenue impact of adding an informational website. Explicitly considering marketing efforts, a latent class model distinguishes consumer segments with different short-term revenue effects, while a Vector Autoregressive model on these segments reveals different long-term marketing response. We find that the offline revenue impact of the informational website critically depends on the product category and customer segment. The lower online search costs are especially beneficial for sensory products and for customers distant from the store. Moreover, offline revenues increase most for customers with high web visit frequency. We find that customers in some segments buy more and more expensive products, suggesting that online search and offline purchases are complements. In contrast, customers in a particular segment reduce their shopping trips, suggesting their online activities partially substitute for experiential shopping in the physical store. Hence, offline retailers should use specific online activities to target specific product categories and customer segments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A cikkben paneladatok segítségével a magyar gabonatermesztő üzemek 2001 és 2009 közötti technikai hatékonyságát vizsgáljuk. A technikai hatékonyság szintjének becslésére egy hagyományos sztochasztikus határok modell (SFA) mellett a látens csoportok modelljét (LCM) használjuk, amely figyelembe veszi a technológiai különbségeket is. Eredményeink arra utalnak, hogy a technológiai heterogenitás fontos lehet egy olyan ágazatban is, mint a szántóföldi növénytermesztés, ahol viszonylag homogén technológiát alkalmaznak. A hagyományos, azonos technológiát feltételező és a látens osztályok modelljeinek összehasonlítása azt mutatja, hogy a gabonatermesztő üzemek technikai hatékonyságát a hagyományos modellek alábecsülhetik. _____ The article sets out to analyse the technical efficiency of Hungarian crop farms between 2001 and 2009, using panel data and employing both standard stochastic frontier analysis and the latent class model (LCM) to estimate technical efficiency. The findings suggest that technological heterogeneity plays an important role in the crop sector, though it is traditionally assumed to employ homogenous technology. A comparison of standard SFA models that assumes the technology is common to all farms and LCM estimates highlights the way the efficiency of crop farms can be underestimated using traditional SFA models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Leishmaniasis, caused by Leishmania infantum, is a vector-borne zoonotic disease that is endemic to the Mediterranean basin. The potential of rabbits and hares to serve as competent reservoirs for the disease has recently been demonstrated, although assessment of the importance of their role on disease dynamics is hampered by the absence of quantitative knowledge on the accuracy of diagnostic techniques in these species. A Bayesian latent-class model was used here to estimate the sensitivity and specificity of the Immuno-fluorescence antibody test (IFAT) in serum and a Leishmania-nested PCR (Ln-PCR) in skin for samples collected from 217 rabbits and 70 hares from two different populations in the region of Madrid, Spain. A two-population model, assuming conditional independence between test results and incorporating prior information on the performance of the tests in other animal species obtained from the literature, was used. Two alternative cut-off values were assumed for the interpretation of the IFAT results: 1/50 for conservative and 1/25 for sensitive interpretation. Results suggest that sensitivity and specificity of the IFAT were around 70–80%, whereas the Ln-PCR was highly specific (96%) but had a limited sensitivity (28.9% applying the conservative interpretation and 21.3% with the sensitive one). Prevalence was higher in the rabbit population (50.5% and 72.6%, for the conservative and sensitive interpretation, respectively) than in hares (6.7% and 13.2%). Our results demonstrate that the IFAT may be a useful screening tool for diagnosis of leishmaniasis in rabbits and hares. These results will help to design and implement surveillance programmes in wild species, with the ultimate objective of early detecting and preventing incursions of the disease into domestic and human populations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Definition of disease phenotype is a necessary preliminary to research into genetic causes of a complex disease. Clinical diagnosis of migraine is currently based on diagnostic criteria developed by the International Headache Society. Previously, we examined the natural clustering of these diagnostic symptoms using latent class analysis (LCA) and found that a four-class model was preferred. However, the classes can be ordered such that all symptoms progressively intensify, suggesting that a single continuous variable representing disease severity may provide a better model. Here, we compare two models: item response theory and LCA, each constructed within a Bayesian context. A deviance information criterion is used to assess model fit. We phenotyped our population sample using these models, estimated heritability and conducted genome-wide linkage analysis using Merlin-qtl. LCA with four classes was again preferred. After transformation, phenotypic trait values derived from both models are highly correlated (correlation = 0.99) and consequently results from subsequent genetic analyses were similar. Heritability was estimated at 0.37, while multipoint linkage analysis produced genome-wide significant linkage to chromosome 7q31-q33 and suggestive linkage to chromosomes 1 and 2. We argue that such continuous measures are a powerful tool for identifying genes contributing to migraine susceptibility.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Working time has been among the first aspect of the employment relation to be the object of intense regulation at the national and supra-national level. This standard regulation of working time comprised a number of elements: full-time hours, rigid working schedules, strong employers’ control and clear boundaries around working time In spite of general claims about the erosion of this model, few studies have investigated this process in a comparative and empirical perspective. The aim of this paper is to investigate the diversity of working time arrangements in European economies by applying latent class analysis to data
from the European Working Conditions Survey (EWCS). This analysis shows the existence of six different types of working time organization highlighting five cross-national patterns: multiple flexibilities, extended flexibility, standard, rigid and fragmented time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Projection of a high-dimensional dataset onto a two-dimensional space is a useful tool to visualise structures and relationships in the dataset. However, a single two-dimensional visualisation may not display all the intrinsic structure. Therefore, hierarchical/multi-level visualisation methods have been used to extract more detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model (MLGPLVM). MLGPLVM works by segmenting data (with e.g. K-means, Gaussian mixture model or interactive clustering) in the visualisation space and then fitting a visualisation model to each subset. To measure the quality of multi-level visualisation (with respect to parent and child models), metrics such as trustworthiness, continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood per point are used. We evaluate the MLGPLVM approach on the ‘Oil Flow’ dataset and a dataset of protein electrostatic potentials for the ‘Major Histocompatibility Complex (MHC) class I’ of humans. In both cases, visual observation and the quantitative quality measures have shown better visualisation at lower levels.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study examines the business model complexity of Irish credit unions using a latent class approach to measure structural performance over the period 2002 to 2013. The latent class approach allows the endogenous identification of a multi-class framework for business models based on credit union specific characteristics. The analysis finds a three class system to be appropriate with the multi-class model dependent on three financial viability characteristics. This finding is consistent with the deliberations of the Irish Commission on Credit Unions (2012) which identified complexity and diversity in the business models of Irish credit unions and recommended that such complexity and diversity could not be accommodated within a one size fits all regulatory framework. The analysis also highlights that two of the classes are subject to diseconomies of scale. This may suggest credit unions would benefit from a reduction in scale or perhaps that there is an imbalance in the present change process. Finally, relative performance differences are identified for each class in terms of technical efficiency. This suggests that there is an opportunity for credit unions to improve their performance by using within-class best practice or alternatively by switching to another class.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Longitudinal data, where data are repeatedly observed or measured on a temporal basis of time or age provides the foundation of the analysis of processes which evolve over time, and these can be referred to as growth or trajectory models. One of the traditional ways of looking at growth models is to employ either linear or polynomial functional forms to model trajectory shape, and account for variation around an overall mean trend with the inclusion of random eects or individual variation on the functional shape parameters. The identification of distinct subgroups or sub-classes (latent classes) within these trajectory models which are not based on some pre-existing individual classification provides an important methodology with substantive implications. The identification of subgroups or classes has a wide application in the medical arena where responder/non-responder identification based on distinctly diering trajectories delivers further information for clinical processes. This thesis develops Bayesian statistical models and techniques for the identification of subgroups in the analysis of longitudinal data where the number of time intervals is limited. These models are then applied to a single case study which investigates the neuropsychological cognition for early stage breast cancer patients undergoing adjuvant chemotherapy treatment from the Cognition in Breast Cancer Study undertaken by the Wesley Research Institute of Brisbane, Queensland. Alternative formulations to the linear or polynomial approach are taken which use piecewise linear models with a single turning point, change-point or knot at a known time point and latent basis models for the non-linear trajectories found for the verbal memory domain of cognitive function before and after chemotherapy treatment. Hierarchical Bayesian random eects models are used as a starting point for the latent class modelling process and are extended with the incorporation of covariates in the trajectory profiles and as predictors of class membership. The Bayesian latent basis models enable the degree of recovery post-chemotherapy to be estimated for short and long-term followup occasions, and the distinct class trajectories assist in the identification of breast cancer patients who maybe at risk of long-term verbal memory impairment.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This article explores the use of probabilistic classification, namely finite mixture modelling, for identification of complex disease phenotypes, given cross-sectional data. In particular, if focuses on posterior probabilities of subgroup membership, a standard output of finite mixture modelling, and how the quantification of uncertainty in these probabilities can lead to more detailed analyses. Using a Bayesian approach, we describe two practical uses of this uncertainty: (i) as a means of describing a person’s membership to a single or multiple latent subgroups and (ii) as a means of describing identified subgroups by patient-centred covariates not included in model estimation. These proposed uses are demonstrated on a case study in Parkinson’s disease (PD), where latent subgroups are identified using multiple symptoms from the Unified Parkinson’s Disease Rating Scale (UPDRS).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, which paves the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we firstly investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus is turned to the methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extended the model for detecting the interaction effects for population based case-control studies. In this part of study, we also develop a machine learning algorithm to cope with the large scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This is a methodological paper describing when and how manifest items dropped from a latent construct measurement model (e.g., factor analysis) can be retained for additional analysis. Presented are protocols for assessment for retention in the measurement model, evaluation of dropped items as potential items separate from the latent construct, and post hoc analyses that can be conducted using all retained (manifest or latent) variables. The protocols are then applied to data relating to the impact of the NAPLAN test. The variables examined are teachers’ achievement goal orientations and teachers’ perceptions of the impact of the test on curriculum and pedagogy. It is suggested that five attributes be considered before retaining dropped manifest items for additional analyses. (1) Items can be retained when employed in service of an established or hypothesized theoretical model. (2) Items should only be retained if sufficient variance is present in the data set. (3) Items can be retained when they provide a rational segregation of the data set into subsamples (e.g., a consensus measure). (4) The value of retaining items can be assessed using latent class analysis or latent mean analysis. (5) Items should be retained only when post hoc analyses with these items produced significant and substantive results. These suggested exploratory strategies are presented so that other researchers using survey instruments might explore their data in similar and more innovative ways. Finally, suggestions for future use are provided.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Crashes at any particular transport network location consist of a chain of events arising from a multitude of potential causes and/or contributing factors whose nature is likely to reflect geometric characteristics of the road, spatial effects of the surrounding environment, and human behavioural factors. It is postulated that these potential contributing factors do not arise from the same underlying risk process, and thus should be explicitly modelled and understood. The state of the practice in road safety network management applies a safety performance function that represents a single risk process to explain crash variability across network sites. This study aims to elucidate the importance of differentiating among various underlying risk processes contributing to the observed crash count at any particular network location. To demonstrate the principle of this theoretical and corresponding methodological approach, the study explores engineering (e.g. segment length, speed limit) and unobserved spatial factors (e.g. climatic factors, presence of schools) as two explicit sources of crash contributing factors. A Bayesian Latent Class (BLC) analysis is used to explore these two sources and to incorporate prior information about their contribution to crash occurrence. The methodology is applied to the state controlled roads in Queensland, Australia and the results are compared with the traditional Negative Binomial (NB) model. A comparison of goodness of fit measures indicates that the model with a double risk process outperforms the single risk process NB model, and thus indicating the need for further research to capture all the three crash generation processes into the SPFs.