32 resultados para Probabilistic metrics
em Aston University Research Archive
Resumo:
Representing knowledge using domain ontologies has shown to be a useful mechanism and format for managing and exchanging information. Due to the difficulty and cost of building ontologies, a number of ontology libraries and search engines are coming to existence to facilitate reusing such knowledge structures. The need for ontology ranking techniques is becoming crucial as the number of ontologies available for reuse is continuing to grow. In this paper we present AKTiveRank, a prototype system for ranking ontologies based on the analysis of their structures. We describe the metrics used in the ranking system and present an experiment on ranking ontologies returned by a popular search engine for an example query.
Resumo:
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
Resumo:
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
Resumo:
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
Resumo:
This Letter addresses image segmentation via a generative model approach. A Bayesian network (BNT) in the space of dyadic wavelet transform coefficients is introduced to model texture images. The model is similar to a Hidden Markov model (HMM), but with non-stationary transitive conditional probability distributions. It is composed of discrete hidden variables and observable Gaussian outputs for wavelet coefficients. In particular, the Gabor wavelet transform is considered. The introduced model is compared with the simplest joint Gaussian probabilistic model for Gabor wavelet coefficients for several textures from the Brodatz album [1]. The comparison is based on cross-validation and includes probabilistic model ensembles instead of single models. In addition, the robustness of the models to cope with additive Gaussian noise is investigated. We further study the feasibility of the introduced generative model for image segmentation in the novelty detection framework [2]. Two examples are considered: (i) sea surface pollution detection from intensity images and (ii) image segmentation of the still images with varying illumination across the scene.
Resumo:
PURPOSE: To provide a consistent standard for the evaluation of different types of presbyopic correction. SETTING: Eye Clinic, School of Life and Health Sciences, Aston University, Birmingham, United Kingdom. METHODS: Presbyopic corrections examined were accommodating intraocular lenses (IOLs), simultaneous multifocal and monovision contact lenses, and varifocal spectacles. Binocular near visual acuity measured with different optotypes (uppercase letters, lowercase letters, and words) and reading metrics assessed with the Minnesota Near Reading chart (reading acuity, critical print size [CPS], CPS reading speed) were intercorrelated (Pearson product moment correlations) and assessed for concordance (intraclass correlation coefficients [ICC]) and agreement (Bland-Altman analysis) for indication of clinical usefulness. RESULTS: Nineteen accommodating IOL cases, 40 simultaneous contact lens cases, and 38 varifocal spectacle cases were evaluated. Other than CPS reading speed, all near visual acuity and reading metrics correlated well with each other (r>0.70, P<.001). Near visual acuity measured with uppercase letters was highly concordant (ICC, 0.78) and in close agreement with lowercase letters (+/- 0.17 logMAR). Near word acuity agreed well with reading acuity (+/- 0.16 logMAR), which in turn agreed well with near visual acuity measured with uppercase letters 0.16 logMAR). Concordance (ICC, 0.18 to 0.46) and agreement (+/- 0.24 to 0.30 logMAR) of CPS with the other near metrics was moderate. CONCLUSION: Measurement of near visual ability in presbyopia should be standardized to include assessment of near visual acuity with logMAR uppercase-letter optotypes, smallest logMAR print size that maintains maximum reading speed (CPS), and reading speed. J Cataract Refract Surg 2009; 35:1401-1409 (C) 2009 ASCRS and ESCRS
Resumo:
This thesis provides an interoperable language for quantifying uncertainty using probability theory. A general introduction to interoperability and uncertainty is given, with particular emphasis on the geospatial domain. Existing interoperable standards used within the geospatial sciences are reviewed, including Geography Markup Language (GML), Observations and Measurements (O&M) and the Web Processing Service (WPS) specifications. The importance of uncertainty in geospatial data is identified and probability theory is examined as a mechanism for quantifying these uncertainties. The Uncertainty Markup Language (UncertML) is presented as a solution to the lack of an interoperable standard for quantifying uncertainty. UncertML is capable of describing uncertainty using statistics, probability distributions or a series of realisations. The capabilities of UncertML are demonstrated through a series of XML examples. This thesis then provides a series of example use cases where UncertML is integrated with existing standards in a variety of applications. The Sensor Observation Service - a service for querying and retrieving sensor-observed data - is extended to provide a standardised method for quantifying the inherent uncertainties in sensor observations. The INTAMAP project demonstrates how UncertML can be used to aid uncertainty propagation using a WPS by allowing UncertML as input and output data. The flexibility of UncertML is demonstrated with an extension to the GML geometry schemas to allow positional uncertainty to be quantified. Further applications and developments of UncertML are discussed.
Resumo:
The generation of very short range forecasts of precipitation in the 0-6 h time window is traditionally referred to as nowcasting. Most existing nowcasting systems essentially extrapolate radar observations in some manner, however, very few systems account for the uncertainties involved. Thus deterministic forecast are produced, which have a limited use when decisions must be made, since they have no measure of confidence or spread of the forecast. This paper develops a Bayesian state space modelling framework for quantitative precipitation nowcasting which is probabilistic from conception. The model treats the observations (radar) as noisy realisations of the underlying true precipitation process, recognising that this process can never be completely known, and thus must be represented probabilistically. In the model presented here the dynamics of the precipitation are dominated by advection, so this is a probabilistic extrapolation forecast. The model is designed in such a way as to minimise the computational burden, while maintaining a full, joint representation of the probability density function of the precipitation process. The update and evolution equations avoid the need to sample, thus only one model needs be run as opposed to the more traditional ensemble route. It is shown that the model works well on both simulated and real data, but that further work is required before the model can be used operationally. © 2004 Elsevier B.V. All rights reserved.
Resumo:
This thesis introduces a flexible visual data exploration framework which combines advanced projection algorithms from the machine learning domain with visual representation techniques developed in the information visualisation domain to help a user to explore and understand effectively large multi-dimensional datasets. The advantage of such a framework to other techniques currently available to the domain experts is that the user is directly involved in the data mining process and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows them to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms to develop a guided mixture of local experts algorithm which provides robust prediction and a model to estimate feature saliency simultaneously with the training of a projection algorithm.Local models are useful since a single global model cannot capture the full variability of a heterogeneous data space such as the chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space by a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach the domain experts are more involved in the model development process which is suitable for an intuition and domain knowledge driven task such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach which estimates feature saliency simultaneously with the training of a visualisation model.
Resumo:
The research described here concerns the development of metrics and models to support the development of hybrid (conventional/knowledge based) integrated systems. The thesis argues from the point that, although it is well known that estimating the cost, duration and quality of information systems is a difficult task, it is far from clear what sorts of tools and techniques would adequately support a project manager in the estimation of these properties. A literature review shows that metrics (measurements) and estimating tools have been developed for conventional systems since the 1960s while there has been very little research on metrics for knowledge based systems (KBSs). Furthermore, although there are a number of theoretical problems with many of the `classic' metrics developed for conventional systems, it also appears that the tools which such metrics can be used to develop are not widely used by project managers. A survey was carried out of large UK companies which confirmed this continuing state of affairs. Before any useful tools could be developed, therefore, it was important to find out why project managers were not using these tools already. By characterising those companies that use software cost estimating (SCE) tools against those which could but do not, it was possible to recognise the involvement of the client/customer in the process of estimation. Pursuing this point, a model of the early estimating and planning stages (the EEPS model) was developed to test exactly where estimating takes place. The EEPS model suggests that estimating could take place either before a fully-developed plan has been produced, or while this plan is being produced. If it were the former, then SCE tools would be particularly useful since there is very little other data available from which to produce an estimate. A second survey, however, indicated that project managers see estimating as being essentially the latter at which point project management tools are available to support the process. It would seem, therefore, that SCE tools are not being used because project management tools are being used instead. The issue here is not with the method of developing an estimating model or tool, but; in the way in which "an estimate" is intimately tied to an understanding of what tasks are being planned. Current SCE tools are perceived by project managers as targetting the wrong point of estimation, A model (called TABATHA) is then presented which describes how an estimating tool based on an analysis of tasks would thus fit into the planning stage. The issue of whether metrics can be usefully developed for hybrid systems (which also contain KBS components) is tested by extending a number of "classic" program size and structure metrics to a KBS language, Prolog. Measurements of lines of code, Halstead's operators/operands, McCabe's cyclomatic complexity, Henry & Kafura's data flow fan-in/out and post-release reported errors were taken for a set of 80 commercially-developed LPA Prolog programs: By re~defining the metric counts for Prolog it was found that estimates of program size and error-proneness comparable to the best conventional studies are possible. This suggests that metrics can be usefully applied to KBS languages, such as Prolog and thus, the development of metncs and models to support the development of hybrid information systems is both feasible and useful.
Resumo:
This paper concerns the problem of agent trust in an electronic market place. We maintain that agent trust involves making decisions under uncertainty and therefore the phenomenon should be modelled probabilistically. We therefore propose a probabilistic framework that models agent interactions as a Hidden Markov Model (HMM). The observations of the HMM are the interaction outcomes and the hidden state is the underlying probability of a good outcome. The task of deciding whether to interact with another agent reduces to probabilistic inference of the current state of that agent given all previous interaction outcomes. The model is extended to include a probabilistic reputation system which involves agents gathering opinions about other agents and fusing them with their own beliefs. Our system is fully probabilistic and hence delivers the following improvements with respect to previous work: (a) the model assumptions are faithfully translated into algorithms; our system is optimal under those assumptions, (b) It can account for agents whose behaviour is not static with time (c) it can estimate the rate with which an agent's behaviour changes. The system is shown to significantly outperform previous state-of-the-art methods in several numerical experiments. Copyright © 2010, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
Resumo:
The energy consumption and the energy efficiency have become very important issue in optimizing the current as well as in designing the future telecommunications networks. The energy and power metrics are being introduced in order to enable assessment and comparison of the energy consumption and power efficiency of the telecommunications networks and other transmission equipment. The standardization of the energy and power metrics is a significant ongoing activity aiming to define the baseline energy and power metrics for the telecommunications systems. This article provides an up-to-date overview of the energy and power metrics being proposed by the various standardization bodies and subsequently adopted worldwide by the equipment manufacturers and the network operators. © Institut Télécom and Springer-Verlag 2012.and Springer-Verlag 2012.
Resumo:
This thesis explores the process of developing a principled approach for translating a model of mental-health risk expertise into a probabilistic graphical structure. Probabilistic graphical structures can be a combination of graph and probability theory that provide numerous advantages when it comes to the representation of domains involving uncertainty, domains such as the mental health domain. In this thesis the advantages that probabilistic graphical structures offer in representing such domains is built on. The Galatean Risk Screening Tool (GRiST) is a psychological model for mental health risk assessment based on fuzzy sets. In this thesis the knowledge encapsulated in the psychological model was used to develop the structure of the probability graph by exploiting the semantics of the clinical expertise. This thesis describes how a chain graph can be developed from the psychological model to provide a probabilistic evaluation of risk that complements the one generated by GRiST’s clinical expertise by the decomposing of the GRiST knowledge structure in component parts, which were in turned mapped into equivalent probabilistic graphical structures such as Bayesian Belief Nets and Markov Random Fields to produce a composite chain graph that provides a probabilistic classification of risk expertise to complement the expert clinical judgements
Resumo:
To what extent does competitive entry create a structural change in key marketing metrics? New players may just be a temporal nuisance to incumbents, but could also fundamentally change the latter's performance evolution, or induce them to permanently alter their spending levels and/or pricing decisions. Similarly, the addition of a new marketing channel could permanently shift shopping preferences, or could just create a short-lived migration from existing channels. The steady-state impact of a given entry or channel addition on various marketing metrics is intrinsically an empirical issue for which we need an appropriate testing procedure. In this study, we introduce a testing sequence that allows for the endogenous determination of potential change (break) locations, thereby accounting for lead and/or lagged effects of the introduction of interest. By not restricting the number of potential breaks to one (as is commonly done in the marketing literature), we quantify the impact of the new entrant(s) while controlling for other events that may have taken place in the market. We illustrate the methodology in the context of the Dutch television advertising market, which was characterized by the entry of several late movers. We find that the steady-state growth of private incumbents' revenues was slowed by the quasi-simultaneous entry of three new players. Contrary to industry observers' expectations, such a slowdown was not experienced in the related markets of print and radio advertising.
Resumo:
Classification is the most basic method for organizing resources in the physical space, cyber space, socio space and mental space. To create a unified model that can effectively manage resources in different spaces is a challenge. The Resource Space Model RSM is to manage versatile resources with a multi-dimensional classification space. It supports generalization and specialization on multi-dimensional classifications. This paper introduces the basic concepts of RSM, and proposes the Probabilistic Resource Space Model, P-RSM, to deal with uncertainty in managing various resources in different spaces of the cyber-physical society. P-RSM’s normal forms, operations and integrity constraints are developed to support effective management of the resource space. Characteristics of the P-RSM are analyzed through experiments. This model also enables various services to be described, discovered and composed from multiple dimensions and abstraction levels with normal form and integrity guarantees. Some extensions and applications of the P-RSM are introduced.