903 results for Probabilistic latent semantic analysis (PLSA)


Relevance: 40.00%

Abstract:

Sociolinguists have documented the substrate influence of various languages on the formation of dialects in numerous ethnic-regional settings throughout the United States. This literature shows that while phonological and grammatical influences from other languages may be instantiated as durable dialect features, lexical phenomena often fade over time as ethnolinguistic communities assimilate with contiguous dialect groups. In preliminary investigations of emerging Miami Latino English, we have observed that lexical forms based on Spanish equivalents are ubiquitous in the speech not only of first-generation Cuban Americans but also of the second generation. Examples, observed in field work and casual observation and studied formally in an experimental context, include "get down from the car," which derives from the Spanish equivalent bajar del carro, instead of "get out of the car." The translation task administered to thirty-one participants showed that a variety of lexical phenomena are still maintained at equal or higher frequencies.

Relevance: 40.00%

Abstract:

Until recently, the use of biometrics was restricted to high-security environments and criminal identification applications, for economic and technological reasons. In recent years, however, biometric authentication has become part of people's daily lives. The large-scale use of biometrics has shown that users within a system may have different degrees of accuracy: some people may have trouble authenticating, while others may be particularly vulnerable to imitation. Recent studies have investigated and identified these types of users, giving them the names of animals: Sheep, Goats, Lambs, Wolves, Doves, Chameleons, Worms and Phantoms. The aim of this study is to evaluate the existence of these user types in a fingerprint database and to propose a new way of investigating them, based on the performance of verification between subjects' samples. After introducing some basic concepts in biometrics and fingerprints, we present the biometric menagerie and how to evaluate it.
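
A minimal sketch of the menagerie idea, assuming only per-user genuine and impostor match scores; the synthetic scores and the quartile cut-offs are illustrative and not the procedure proposed in the paper (attacker-side animals such as Wolves would additionally need the scores each user produces against others):

# Label users from the statistics of their genuine and impostor verification
# scores (Doddington-zoo style); all scores and thresholds are made up.
import numpy as np

rng = np.random.default_rng(0)
n_users = 50

# Hypothetical per-user match scores (higher = more similar).
genuine = {u: rng.normal(0.8, 0.1, size=20) for u in range(n_users)}    # user vs. own samples
impostor = {u: rng.normal(0.3, 0.1, size=200) for u in range(n_users)}  # others vs. this user

gen_mean = np.array([genuine[u].mean() for u in range(n_users)])
imp_mean = np.array([impostor[u].mean() for u in range(n_users)])

labels = {}
for u in range(n_users):
    if gen_mean[u] <= np.quantile(gen_mean, 0.25):
        labels[u] = "goat"      # hard to match against their own samples
    elif imp_mean[u] >= np.quantile(imp_mean, 0.75):
        labels[u] = "lamb"      # easy for impostors to imitate
    else:
        labels[u] = "sheep"     # well-behaved majority

print({name: sum(v == name for v in labels.values()) for name in set(labels.values())})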

Relevance: 40.00%

Abstract:

We formally compare fundamental-factor and latent-factor approaches to oil price modelling. Fundamental modelling has a long history of seeking to understand oil price movements, while latent-factor modelling has a more recent and limited history but has gained popularity in other financial markets. The two approaches, though competing, have not previously been compared formally in terms of effectiveness. For a range of short-, medium- and long-dated WTI oil futures we test a recently proposed five-factor fundamental model and a Principal Component Analysis (PCA) latent-factor model. Our findings demonstrate that there is no discernible difference between the two techniques in a dynamic setting. We conclude that this confers some advantage on the latent-factor approach, given the difficulty of determining a well-specified fundamental model.
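
As an illustration of the latent-factor side of the comparison, a sketch of a PCA factor model estimated on a panel of futures returns; the data are synthetic and the choice of three components (often read as level, slope and curvature) is an assumption rather than the paper's specification:

# PCA latent-factor model on a synthetic panel of futures returns.
import numpy as np

rng = np.random.default_rng(1)
T, n_contracts = 500, 12              # daily returns for 12 maturities (synthetic)
returns = rng.normal(0, 0.02, (T, n_contracts))

X = returns - returns.mean(axis=0)    # demean each contract
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 3
loadings = Vt[:k].T                   # contract exposures to each latent factor
factors = U[:, :k] * s[:k]            # latent factor time series
explained = (s[:k] ** 2) / (s ** 2).sum()

print("loadings shape:", loadings.shape, "factor series shape:", factors.shape)
print("variance explained by first 3 PCs:", explained.round(3))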

Relevance: 40.00%

Abstract:

Researchers frequently have to analyze scales in which some participants have failed to respond to some items. In this paper we focus on the exploratory factor analysis of multidimensional scales (i.e., scales that consist of a number of subscales), where each subscale is made up of a number of Likert-type items and the aim of the analysis is to estimate participants' scores on the corresponding latent traits. We propose a new approach to dealing with missing responses in this situation, based on (1) multiple imputation of non-responses and (2) simultaneous rotation of the imputed datasets. We applied the approach to a real dataset in which missing responses were artificially introduced following a real pattern of non-response, and to a simulation study based on artificial datasets. The results show that our approach (specifically, hot-deck multiple imputation followed by Consensus Promin rotation) was able to compute factor score estimates successfully even for participants with missing data.
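
A simplified sketch of the general strategy (multiple imputation followed by factor analysis), not the Consensus Promin procedure itself: hypothetical Likert data are hot-deck imputed several times and factor scores are averaged across imputations. Note that naive averaging ignores the rotational-alignment problem that the paper's simultaneous rotation is designed to solve:

# Hot-deck multiple imputation of Likert items followed by factor analysis;
# data, missingness rate and the two-factor choice are invented.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n, p = 200, 10
data = rng.integers(1, 6, size=(n, p)).astype(float)   # 1-5 Likert responses
data[rng.random((n, p)) < 0.1] = np.nan                 # 10% missing at random

def hot_deck_impute(X, rng):
    """Fill each missing entry with a random observed response to the same item."""
    X = X.copy()
    for j in range(X.shape[1]):
        observed = X[~np.isnan(X[:, j]), j]
        missing = np.isnan(X[:, j])
        X[missing, j] = rng.choice(observed, size=missing.sum())
    return X

m = 5   # number of imputed datasets
# Caveat: averaging raw scores does not align factor order/sign across imputations.
scores = np.mean(
    [FactorAnalysis(n_components=2, random_state=0)
         .fit_transform(hot_deck_impute(data, rng)) for _ in range(m)],
    axis=0,
)
print(scores.shape)   # (200, 2): averaged factor score estimates per participant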

Relevance: 40.00%

Abstract:

We study a climatologically important interaction of two of the main components of the geophysical system by adding an energy balance model for the averaged atmospheric temperature, as a dynamic boundary condition, to a diagnostic ocean model with an additional spatial dimension. In this work we give deeper insight than previous papers in the literature, mainly with respect to the pioneering 1990 model by Watts and Morantine. We take into consideration the latent heat of the two-phase ocean as well as a possible delay term. Non-uniqueness for the initial boundary value problem, uniqueness under a non-degeneracy condition and the existence of multiple stationary solutions are proved here. These multiplicity results suggest that an S-shaped bifurcation diagram should be expected in this class of models, which generalizes previous energy balance models. The numerical method applied to the model is based on a finite volume scheme with nonlinear weighted essentially non-oscillatory (WENO) reconstruction and Runge–Kutta total variation diminishing (TVD) time integration.
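
For orientation, a classical Budyko–Sellers-type energy balance equation for an averaged temperature u(x, t) has the following generic form; this is the standard textbook form, not necessarily the exact dynamic boundary condition analysed in the paper:

\[
  c\,\partial_t u \;-\; \nabla\!\cdot\!\bigl(k\,\nabla u\bigr)
  \;=\; Q\,S(x)\,\beta(u) \;-\; \bigl(A + B\,u\bigr),
\]

where Q S(x) β(u) is the absorbed solar radiation, with a co-albedo β that may be discontinuous at the ice line (one classical source of S-shaped bifurcation diagrams), and A + B u is the Budyko linearisation of the emitted radiation.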

Relevance: 40.00%

Abstract:

Maintenance of transport infrastructure assets is widely advocated as key to minimizing current and future costs of the transportation network. While effective maintenance decisions are often a result of engineering skills and practical knowledge, efficient decisions must also account for the net result over an asset's life cycle. One essential aspect of the long-term perspective on transport infrastructure maintenance is to proactively estimate maintenance needs. In dealing with immediate maintenance actions, support tools that can prioritize potential maintenance candidates are important for obtaining an efficient maintenance strategy. This dissertation consists of five individual research papers presenting a microdata analysis approach to transport infrastructure maintenance. Microdata analysis is a multidisciplinary field in which large quantities of data are collected, analyzed, and interpreted to improve decision-making. Increased access to transport infrastructure data enables a deeper understanding of causal effects and the possibility of making predictions of future outcomes. The microdata analysis approach covers the complete process from data collection to actual decisions and is therefore well suited to the task of improving efficiency in transport infrastructure maintenance. Statistical modeling was the analysis method selected in this dissertation, and it provided solutions to the different problems presented in each of the five papers. In Paper I, a time-to-event model was used to estimate remaining road pavement lifetimes in Sweden. In Paper II, an extension of the model in Paper I assessed the impact of latent variables on road lifetimes, identifying the sections in a road network that are weaker owing to, for example, subsoil conditions or undetected heavy traffic. The study in Paper III incorporated a probabilistic parametric distribution as a representation of road lifetimes into an equation for the marginal cost of road wear. Differentiated road-wear marginal costs for heavy and light vehicles are an important basis for decisions regarding vehicle miles traveled (VMT) taxation policies. In Paper IV, a distribution-based clustering method was used to distinguish between road segments that are deteriorating and road segments whose condition is stationary. Within railway networks, temporary speed restrictions are often imposed because of maintenance and must be addressed in order to maintain punctuality. The study in Paper V evaluated the empirical effect of speed restrictions on running time on a Norwegian railway line using a generalized linear mixed model.
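
As a flavour of the Paper I-style analysis, a minimal sketch of a parametric time-to-event model for pavement lifetimes with right censoring; the Weibull form, the simulated lifetimes and the censoring scheme are assumptions for illustration, not the dissertation's data or model:

# Maximum-likelihood fit of a Weibull lifetime model with right censoring.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
true_shape, true_scale = 2.5, 15.0                     # years
t = true_scale * rng.weibull(true_shape, size=300)     # latent pavement lifetimes
c = rng.uniform(5, 20, size=300)                       # inspection/censoring times
time = np.minimum(t, c)
event = (t <= c).astype(float)                         # 1 = failure observed, 0 = censored

def neg_log_lik(params):
    k, lam = np.exp(params)                            # keep parameters positive
    z = (time / lam) ** k
    log_f = np.log(k / lam) + (k - 1) * np.log(time / lam) - z   # density for observed failures
    log_S = -z                                                   # survival for censored sections
    return -np.sum(event * log_f + (1 - event) * log_S)

res = minimize(neg_log_lik, x0=np.log([1.0, 10.0]))
shape_hat, scale_hat = np.exp(res.x)
print(f"estimated shape {shape_hat:.2f}, scale {scale_hat:.2f} years")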

Relevance: 30.00%

Abstract:

The report presents a methodology for whole-of-life-cycle cost analysis of alternative treatment options for bridge structures that require rehabilitation. The methodology was developed after a review of current methods, which established that a life cycle analysis based on a probabilistic risk approach has many advantages, including the essential ability to consider the variability of input parameters. The input parameters for the analysis are identified as initial cost; maintenance, monitoring and repair cost; user cost; and failure cost. The methodology uses Monte Carlo simulation to combine a number of probability distributions and so establish the distribution of whole-of-life-cycle cost. In performing the simulation, the need for a powerful software package that works with a spreadsheet program was identified. After exploring several products on the market, the @RISK software was selected for the simulation. In conclusion, the report presents a typical decision-making scenario considering two alternative treatment options.
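
The same Monte Carlo idea can be sketched in plain Python in place of the @RISK spreadsheet add-in; all cost distributions, the discount rate and the repair timing below are invented for the example:

# Monte Carlo whole-of-life-cycle cost: sample each cost input from a
# distribution, discount, and inspect the resulting LCC distribution.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
discount_rate = 0.04
horizon = 50                                            # years

initial = rng.triangular(0.8e6, 1.0e6, 1.4e6, n)        # initial treatment cost
annual_maint = rng.lognormal(np.log(20_000), 0.3, n)    # yearly maintenance/monitoring
repair = rng.normal(150_000, 40_000, n)                 # one-off repair at year 25
user_cost = rng.uniform(5_000, 15_000, n)               # yearly user cost

discount = np.array([(1 + discount_rate) ** -y for y in range(1, horizon + 1)])
pv_annuity = discount.sum()                             # present value of 1 per year
lcc = (initial
       + (annual_maint + user_cost) * pv_annuity
       + repair * (1 + discount_rate) ** -25)

print(f"mean LCC: {lcc.mean():,.0f}   90th percentile: {np.percentile(lcc, 90):,.0f}")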

Relevance: 30.00%

Abstract:

To date, automatic recognition of semantic information such as salient objects and mid-level concepts from images remains a challenging task. Since real-world objects tend to exist in a context within their environment, computer vision researchers have increasingly incorporated contextual information to improve object recognition. In this paper, we present a method to build a visual contextual ontology from descriptions of salient objects for image annotation. The ontology includes not only partOf/kindOf relations, but also spatial and co-occurrence relations. A two-step image annotation algorithm is also proposed, based on ontology relations and probabilistic inference. Unlike most existing work, we specifically explore how to combine the ontology representation, contextual knowledge and probabilistic inference. The experiments show that image annotation results are improved on the LabelMe dataset.
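
A toy sketch of the probabilistic-inference step only: independent detector scores for candidate labels are re-weighted by how well each label fits the others under a co-occurrence model. The label set, scores and co-occurrence counts are all invented, and this is not the paper's two-step algorithm:

# Re-score candidate labels using contextual co-occurrence evidence.
import numpy as np

labels = ["car", "road", "tree", "keyboard"]
detector = np.array([0.6, 0.7, 0.4, 0.5])          # independent per-label scores

# Hypothetical co-occurrence counts from an annotated corpus.
cooc = np.array([[50, 45, 20, 1],
                 [45, 60, 25, 1],
                 [20, 25, 40, 2],
                 [ 1,  1,  2, 30]], dtype=float)
p_cooc = cooc / cooc.sum(axis=1, keepdims=True)    # P(label_j | label_i)

# Boost each label by how well it fits the other confidently detected labels.
context = p_cooc.T @ detector
posterior = detector * context
posterior /= posterior.sum()
for name, score in zip(labels, posterior.round(3)):
    print(name, score)                             # "keyboard" drops despite a fair detector score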

Relevance: 30.00%

Abstract:

This article explores two matrix methods to induce the "shades of meaning" (SoM) of a word. A matrix representation of a word is computed from a corpus of traces based on the given word. Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD) each compute a set of vectors corresponding to potential shades of meaning. The two methods were evaluated in terms of loss of conditional entropy with respect to two sets of manually tagged data: one set reflects concepts generally appearing in text, and the second comprises words used in investigations of word sense disambiguation. Results show that NMF consistently outperforms SVD for inducing both the SoM of general concepts and word senses. The problem of inducing the shades of meaning of a word is more subtle than that of word sense induction, and is hence relevant to thematic analysis of opinion, where nuances of opinion can arise.
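
A small sketch of the two factorisations being compared, applied to a toy context-by-term count matrix; the matrix, its size and the choice of two components are assumptions for illustration:

# NMF and truncated SVD on a toy count matrix for a single target word.
import numpy as np
from sklearn.decomposition import NMF, TruncatedSVD

rng = np.random.default_rng(5)
X = rng.poisson(1.0, size=(40, 200)).astype(float)   # contexts x terms, counts

nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)                             # context weights per "shade"
H = nmf.components_                                  # non-negative term loadings

svd = TruncatedSVD(n_components=2, random_state=0)
U = svd.fit_transform(X)                             # signed, orthogonal counterpart
V = svd.components_

# Top terms per component; NMF loadings are non-negative and often more
# directly interpretable as shades of meaning than the signed SVD loadings.
print("NMF top terms:", np.argsort(-H, axis=1)[:, :5])
print("SVD top terms:", np.argsort(-np.abs(V), axis=1)[:, :5])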

Relevance: 30.00%

Abstract:

The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models) for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and from health service planning requirements. Data from a large probabilistically linked database covering 1990 to 2004, consisting of fields from two separate registries, the Birth Defect Registry (BDR) and the Midwives Data Collection (MDC), were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix affects the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I evaluated the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model for modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample-size schemes designed to select individual-level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I combined the earlier improvements to the CAR model with demographic projections to provide forecasts for birth defects at the SLA level; chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices and showed how smoothing the relative risk estimates according to similarity in an important covariate (i.e. maternal age) improved the model's ability to recover the underlying risk, compared with the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared several models, including an extension of the usual Poisson model that accommodates excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared-component model to improve the estimation of sparse counts by borrowing strength, through a shared component (e.g. latent risk factors), from a referent outcome (caesarean section was used in this example). Using the Deviance Information Criterion (DIC), I showed that the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample-size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size, in the context of an ecological regression model that incorporated spatial correlation in the outcomes and accommodated both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed that a simple random sample of 20% of the SLAs, followed by selecting all cases in the chosen SLAs along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed: by formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), by incorporating a ZIP component to model excess zeros in the outcomes, and by borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy for sampling individual-level data, together with sample-size considerations for rare diseases, is also presented. Finally, projections of birth defect categories at the SLA level are made.
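
A toy sketch of the neighbourhood-weight idea from the first objective: a Queen-style binary adjacency matrix versus weights scaled by similarity in a covariate such as maternal age. The four-area map, the age values and the exponential similarity kernel are invented, not the thesis's specification:

# Two ways of building a row-standardised spatial weight matrix for a CAR prior.
import numpy as np

adjacency = np.array([[0, 1, 1, 0],        # which areas share a boundary
                      [1, 0, 1, 1],
                      [1, 1, 0, 1],
                      [0, 1, 1, 0]], dtype=float)
mean_maternal_age = np.array([27.0, 31.5, 30.8, 24.9])

# Covariate-similarity weights: neighbours with similar maternal-age profiles
# borrow more strength from each other when the CAR prior smooths risks.
diff = np.abs(mean_maternal_age[:, None] - mean_maternal_age[None, :])
similarity = adjacency * np.exp(-diff)     # still zero for non-neighbours

# Row-standardise, as is common for spatial weight matrices.
W_queen = adjacency / adjacency.sum(axis=1, keepdims=True)
W_covar = similarity / similarity.sum(axis=1, keepdims=True)
print(np.round(W_queen, 2))
print(np.round(W_covar, 2))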

Relevance: 30.00%

Abstract:

Understanding the complexities involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are a number of other complicating factors: genetic variants associated with age of disease onset may differ from those associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein-coding genetic paradigm. Latent variable models are well suited to the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, together with results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this organ, often regarded as a useless vestige, remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for an association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age at onset and to account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance-components linkage analysis on transformed residuals from a Cox regression. This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation of the Bayesian multiple changepoint model on aligned DNA sequences to more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function, such as GC content and type of mutation, into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
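
A heavily simplified, single-changepoint toy version of the segmentation idea (the thesis develops a Bayesian multiple-changepoint model over alignments of more than two species); here the 0/1 "substitution" indicators per alignment column are simulated and the changepoint posterior is computed exactly under a Beta-Bernoulli model:

# Exact Bayesian single-changepoint inference on simulated substitution indicators.
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(6)
x = np.concatenate([rng.binomial(1, 0.1, 300),   # slow-evolving segment
                    rng.binomial(1, 0.4, 200)])  # fast-evolving segment

def log_marginal(seg, a=1.0, b=1.0):
    """Beta-Bernoulli marginal likelihood of one segment under a Beta(a, b) prior."""
    k, n = seg.sum(), len(seg)
    return betaln(a + k, b + n - k) - betaln(a, b)

# Posterior over the changepoint position (uniform prior over positions).
logp = np.array([log_marginal(x[:t]) + log_marginal(x[t:])
                 for t in range(1, len(x))])
post = np.exp(logp - logp.max())
post /= post.sum()
print("MAP changepoint near position", np.argmax(post) + 1)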

Relevance: 30.00%

Abstract:

System analysis of the traction power system is vital to the design and operation of an electrified railway. Loads in traction power systems are often characterised by their mobility, wide range of power variation, regeneration and service dependence. In addition, the feeding systems may take different forms in AC electrified railways. Comprehensive system studies are usually carried out by computer simulation. A number of traction power simulators are available; they allow calculation of the electrical interaction among trains and provide deterministic solutions for the power network. In this paper, a different approach is presented that enables load-flow analysis of various feeding systems and service demands in AC railways by adopting probabilistic techniques. It is intended to provide a different viewpoint on the load condition. Simulation results are given to verify the probabilistic load-flow models.
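
A toy Monte Carlo probabilistic load flow for a single AC feeder section with one train, as a much simplified stand-in for the models in the paper; the feeder impedance, supply voltage and the load and position distributions are invented, and reactance is ignored:

# Sample train demand and position, solve the trivial single-load feeder
# equation, and summarise the resulting voltage and load distributions.
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
Vs = 25_000.0                                   # 25 kV supply (V)
r_per_km = 0.2                                  # feeder resistance (ohm/km), purely resistive model

pos = rng.uniform(0, 30, n)                     # train position along a 30 km section (km)
P = rng.gamma(shape=4.0, scale=1.5e6, size=n)   # train power demand (W)

R = r_per_km * pos
disc = Vs**2 - 4 * P * R                        # from V^2 - Vs*V + P*R = 0
feasible = disc > 0                             # otherwise the demand cannot be supplied
V_train = (Vs + np.sqrt(disc[feasible])) / 2    # pantograph voltage (high root)

print(f"5th percentile train voltage: {np.percentile(V_train, 5)/1000:.1f} kV")
print(f"train demand exceeds 10 MW in {100 * (P > 10e6).mean():.1f}% of samples")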

Relevance: 30.00%

Abstract:

Power load flow analysis is essential for system planning, operation, development and maintenance, and its application to railway supply systems is no exception. Railway power supply systems are distinctive in terms of load pattern and mobility, as well as feeding-system structure. An attempt has been made to apply probability load flow (PLF) techniques to electrified railways in order to examine the loading on the feeding substations and the voltage profiles of the trains. This study formulates a simple and reliable model to support the calculations necessary for probability load flow analysis in railway systems with an autotransformer (AT) feeding system, and describes the development of a software suite to realise the computation.