920 resultados para probability distribution
Resumo:
Information Retrieval is an important albeit imperfect component of information technologies. A problem of insufficient diversity of retrieved documents is one of the primary issues studied in this research. This study shows that this problem leads to a decrease of precision and recall, traditional measures of information retrieval effectiveness. This thesis presents an adaptive IR system based on the theory of adaptive dual control. The aim of the approach is the optimization of retrieval precision after all feedback has been issued. This is done by increasing the diversity of retrieved documents. This study shows that the value of recall reflects this diversity. The Probability Ranking Principle is viewed in the literature as the “bedrock” of current probabilistic Information Retrieval theory. Neither the proposed approach nor other methods of diversification of retrieved documents from the literature conform to this principle. This study shows by counterexample that the Probability Ranking Principle does not in general lead to optimal precision in a search session with feedback (for which it may not have been designed but is actively used). Retrieval precision of the search session should be optimized with a multistage stochastic programming model to accomplish the aim. However, such models are computationally intractable. Therefore, approximate linear multistage stochastic programming models are derived in this study, where the multistage improvement of the probability distribution is modelled using the proposed feedback correctness method. The proposed optimization models are based on several assumptions, starting with the assumption that Information Retrieval is conducted in units of topics. The use of clusters is the primary reasons why a new method of probability estimation is proposed. The adaptive dual control of topic-based IR system was evaluated in a series of experiments conducted on the Reuters, Wikipedia and TREC collections of documents. The Wikipedia experiment revealed that the dual control feedback mechanism improves precision and S-recall when all the underlying assumptions are satisfied. In the TREC experiment, this feedback mechanism was compared to a state-of-the-art adaptive IR system based on BM-25 term weighting and the Rocchio relevance feedback algorithm. The baseline system exhibited better effectiveness than the cluster-based optimization model of ADTIR. The main reason for this was insufficient quality of the generated clusters in the TREC collection that violated the underlying assumption.
Resumo:
Automatic recognition of people is an active field of research with important forensic and security applications. In these applications, it is not always possible for the subject to be in close proximity to the system. Voice represents a human behavioural trait which can be used to recognise people in such situations. Automatic Speaker Verification (ASV) is the process of verifying a persons identity through the analysis of their speech and enables recognition of a subject at a distance over a telephone channel { wired or wireless. A significant amount of research has focussed on the application of Gaussian mixture model (GMM) techniques to speaker verification systems providing state-of-the-art performance. GMM's are a type of generative classifier trained to model the probability distribution of the features used to represent a speaker. Recently introduced to the field of ASV research is the support vector machine (SVM). An SVM is a discriminative classifier requiring examples from both positive and negative classes to train a speaker model. The SVM is based on margin maximisation whereby a hyperplane attempts to separate classes in a high dimensional space. SVMs applied to the task of speaker verification have shown high potential, particularly when used to complement current GMM-based techniques in hybrid systems. This work aims to improve the performance of ASV systems using novel and innovative SVM-based techniques. Research was divided into three main themes: session variability compensation for SVMs; unsupervised model adaptation; and impostor dataset selection. The first theme investigated the differences between the GMM and SVM domains for the modelling of session variability | an aspect crucial for robust speaker verification. Techniques developed to improve the robustness of GMMbased classification were shown to bring about similar benefits to discriminative SVM classification through their integration in the hybrid GMM mean supervector SVM classifier. Further, the domains for the modelling of session variation were contrasted to find a number of common factors, however, the SVM-domain consistently provided marginally better session variation compensation. Minimal complementary information was found between the techniques due to the similarities in how they achieved their objectives. The second theme saw the proposal of a novel model for the purpose of session variation compensation in ASV systems. Continuous progressive model adaptation attempts to improve speaker models by retraining them after exploiting all encountered test utterances during normal use of the system. The introduction of the weight-based factor analysis model provided significant performance improvements of over 60% in an unsupervised scenario. SVM-based classification was then integrated into the progressive system providing further benefits in performance over the GMM counterpart. Analysis demonstrated that SVMs also hold several beneficial characteristics to the task of unsupervised model adaptation prompting further research in the area. In pursuing the final theme, an innovative background dataset selection technique was developed. This technique selects the most appropriate subset of examples from a large and diverse set of candidate impostor observations for use as the SVM background by exploiting the SVM training process. This selection was performed on a per-observation basis so as to overcome the shortcoming of the traditional heuristic-based approach to dataset selection. Results demonstrate the approach to provide performance improvements over both the use of the complete candidate dataset and the best heuristically-selected dataset whilst being only a fraction of the size. The refined dataset was also shown to generalise well to unseen corpora and be highly applicable to the selection of impostor cohorts required in alternate techniques for speaker verification.
Resumo:
Reliability analysis has several important engineering applications. Designers and operators of equipment are often interested in the probability of the equipment operating successfully to a given age - this probability is known as the equipment's reliability at that age. Reliability information is also important to those charged with maintaining an item of equipment, as it enables them to model and evaluate alternative maintenance policies for the equipment. In each case, information on failures and survivals of a typical sample of items is used to estimate the required probabilities as a function of the item's age, this process being one of many applications of the statistical techniques known as distribution fitting. In most engineering applications, the estimation procedure must deal with samples containing survivors (suspensions or censorings); this thesis focuses on several graphical estimation methods that are widely used for analysing such samples. Although these methods have been current for many years, they share a common shortcoming: none of them is continuously sensitive to changes in the ages of the suspensions, and we show that the resulting reliability estimates are therefore more pessimistic than necessary. We use a simple example to show that the existing graphical methods take no account of any service recorded by suspensions beyond their respective previous failures, and that this behaviour is inconsistent with one's intuitive expectations. In the course of this thesis, we demonstrate that the existing methods are only justified under restricted conditions. We present several improved methods and demonstrate that each of them overcomes the problem described above, while reducing to one of the existing methods where this is justified. Each of the improved methods thus provides a realistic set of reliability estimates for general (unrestricted) censored samples. Several related variations on these improved methods are also presented and justified. - i
Resumo:
Maintenance activities in a large-scale engineering system are usually scheduled according to the lifetimes of various components in order to ensure the overall reliability of the system. Lifetimes of components can be deduced by the corresponding probability distributions with parameters estimated from past failure data. While failure data of the components is not always readily available, the engineers have to be content with the primitive information from the manufacturers only, such as the mean and standard deviation of lifetime, to plan for the maintenance activities. In this paper, the moment-based piecewise polynomial model (MPPM) are proposed to estimate the parameters of the reliability probability distribution of the products when only the mean and standard deviation of the product lifetime are known. This method employs a group of polynomial functions to estimate the two parameters of the Weibull Distribution according to the mathematical relationship between the shape parameter of two-parameters Weibull Distribution and the ratio of mean and standard deviation. Tests are carried out to evaluate the validity and accuracy of the proposed methods with discussions on its suitability of applications. The proposed method is particularly useful for reliability-critical systems, such as railway and power systems, in which the maintenance activities are scheduled according to the expected lifetimes of the system components.
Resumo:
This article introduces a “pseudo classical” notion of modelling non-separability. This form of non-separability can be viewed as lying between separability and quantum-like non-separability. Non-separability is formalized in terms of the non-factorizabilty of the underlying joint probability distribution. A decision criterium for determining the non-factorizability of the joint distribution is related to determining the rank of a matrix as well as another approach based on the chi-square-goodness-of-fit test. This pseudo-classical notion of non-separability is discussed in terms of quantum games and concept combinations in human cognition.
Resumo:
This paper presents a robust stochastic framework for the incorporation of visual observations into conventional estimation, data fusion, navigation and control algorithms. The representation combines Isomap, a non-linear dimensionality reduction algorithm, with expectation maximization, a statistical learning scheme. The joint probability distribution of this representation is computed offline based on existing training data. The training phase of the algorithm results in a nonlinear and non-Gaussian likelihood model of natural features conditioned on the underlying visual states. This generative model can be used online to instantiate likelihoods corresponding to observed visual features in real-time. The instantiated likelihoods are expressed as a Gaussian mixture model and are conveniently integrated within existing non-linear filtering algorithms. Example applications based on real visual data from heterogenous, unstructured environments demonstrate the versatility of the generative models.
Resumo:
This paper describes a novel probabilistic approach to incorporating odometric information into appearance-based SLAM systems, without performing metric map construction or calculating relative feature geometry. The proposed system, dubbed Continuous Appearance-based Trajectory SLAM (CAT-SLAM), represents location as a probability distribution along a trajectory, and represents appearance continuously over the trajectory rather than at discrete locations. The distribution is evaluated using a Rao-Blackwellised particle filter, which weights particles based on local appearance and odometric similarity and explicitly models both the likelihood of revisiting previous locations and visiting new locations. A modified resampling scheme counters particle deprivation and allows loop closure updates to be performed in constant time regardless of map size. We compare the performance of CAT-SLAM to FAB-MAP (an appearance-only SLAM algorithm) in an outdoor environment, demonstrating a threefold increase in the number of correct loop closures detected by CAT-SLAM.
Resumo:
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A3 √((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
Resumo:
The stochastic simulation algorithm was introduced by Gillespie and in a different form by Kurtz. There have been many attempts at accelerating the algorithm without deviating from the behavior of the simulated system. The crux of the explicit τ-leaping procedure is the use of Poisson random variables to approximate the number of occurrences of each type of reaction event during a carefully selected time period, τ. This method is acceptable providing the leap condition, that no propensity function changes “significantly” during any time-step, is met. Using this method there is a possibility that species numbers can, artificially, become negative. Several recent papers have demonstrated methods that avoid this situation. One such method classifies, as critical, those reactions in danger of sending species populations negative. At most, one of these critical reactions is allowed to occur in the next time-step. We argue that the criticality of a reactant species and its dependent reaction channels should be related to the probability of the species number becoming negative. This way only reactions that, if fired, produce a high probability of driving a reactant population negative are labeled critical. The number of firings of more reaction channels can be approximated using Poisson random variables thus speeding up the simulation while maintaining the accuracy. In implementing this revised method of criticality selection we make use of the probability distribution from which the random variable describing the change in species number is drawn. We give several numerical examples to demonstrate the effectiveness of our new method.
Resumo:
We consider a robust filtering problem for uncertain discrete-time, homogeneous, first-order, finite-state hidden Markov models (HMMs). The class of uncertain HMMs considered is described by a conditional relative entropy constraint on measures perturbed from a nominal regular conditional probability distribution given the previous posterior state distribution and the latest measurement. Under this class of perturbations, a robust infinite horizon filtering problem is first formulated as a constrained optimization problem before being transformed via variational results into an unconstrained optimization problem; the latter can be elegantly solved using a risk-sensitive information-state based filtering.
Resumo:
This work examines the effect of landmark placement on the efficiency and accuracy of risk-bounded searches over probabilistic costmaps for mobile robot path planning. In previous work, risk-bounded searches were shown to offer in excess of 70% efficiency increases over normal heuristic search methods. The technique relies on precomputing distance estimates to landmarks which are then used to produce probability distributions over exact heuristics for use in heuristic searches such as A* and D*. The location and number of these landmarks therefore influence greatly the efficiency of the search and the quality of the risk bounds. Here four new methods of selecting landmarks for risk based search are evaluated. Results are shown which demonstrate that landmark selection needs to take into account the centrality of the landmark, and that diminishing rewards are obtained from using large numbers of landmarks.
Resumo:
Pipelines are important lifeline facilities spread over a large area and they generally encounter a range of seismic hazards and different soil conditions. The seismic response of a buried segmented pipe depends on various parameters such as the type of buried pipe material and joints, end restraint conditions, soil characteristics, burial depths, and earthquake ground motion, etc. This study highlights the effect of the variation of geotechnical properties of the surrounding soil on seismic response of a buried pipeline. The variations of the properties of the surrounding soil along the pipe are described by sampling them from predefined probability distribution. The soil-pipe interaction model is developed in OpenSEES. Nonlinear earthquake time-history analysis is performed to study the effect of soil parameters variability on the response of pipeline. Based on the results, it is found that uncertainty in soil parameters may result in significant response variability of the pipeline.
Resumo:
Consider the concept combination ‘pet human’. In word association experiments, human subjects produce the associate ‘slave’ in relation to this combination. The striking aspect of this associate is that it is not produced as an associate of ‘pet’, or ‘human’ in isolation. In other words, the associate ‘slave’ seems to be emergent. Such emergent associations sometimes have a creative character and cognitive science is largely silent about how we produce them. Departing from a dimensional model of human conceptual space, this article will explore concept combinations, and will argue that emergent associations are a result of abductive reasoning within conceptual space, that is, below the symbolic level of cognition. A tensor-based approach is used to model concept combinations allowing such combinations to be formalized as interacting quantum systems. Free association norm data is used to motivate the underlying basis of the conceptual space. It is shown by analogy how some concept combinations may behave like quantum-entangled (non-separable) particles. Two methods of analysis were presented for empirically validating the presence of non-separable concept combinations in human cognition. One method is based on quantum theory and another based on comparing a joint (true theoretic) probability distribution with another distribution based on a separability assumption using a chi-square goodness-of-fit test. Although these methods were inconclusive in relation to an empirical study of bi-ambiguous concept combinations, avenues for further refinement of these methods are identified.
Resumo:
The question of under what conditions conceptual representation is compositional remains debatable within cognitive science. This paper proposes a well developed mathematical apparatus for a probabilistic representation of concepts, drawing upon methods developed in quantum theory to propose a formal test that can determine whether a specific conceptual combination is compositional, or not. This test examines a joint probability distribution modeling the combination, asking whether or not it is factorizable. Empirical studies indicate that some combinations should be considered non-compositionally.
Resumo:
Quality oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers’ expectations by supplying reliable, good quality products and services is the key factor for an organization and even government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and continuous improvement philosophy and now are being applied widely to improve the quality of products and services in industrial and business sectors. Recently SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the health sector characteristics and needs. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents some opportunities for transferring knowledge and future development in each sectors. Meanwhile considering capabilities of Bayesian approach particularly Bayesian hierarchical models and computational techniques in which all uncertainty are expressed as a structure of probability, facilitates decision making and cost-effectiveness analyses. Therefore, this research investigates the use of quality improvement cycle in a health vii setting using clinical data from a hospital. The need of clinical data for monitoring purposes is investigated in two aspects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in available datasets and data flow; then a data capturing algorithm using Bayesian decision making methods is developed to determine economical sample size for statistical analyses within the quality improvement cycle. Following ensuring clinical data quality, some characteristics of control charts in the health context including the necessity of monitoring attribute data and correlated quality characteristics are considered. To this end, multivariate control charts from an industrial context are adapted to monitor radiation delivered to patients undergoing diagnostic coronary angiogram and various risk-adjusted control charts are constructed and investigated in monitoring binary outcomes of clinical interventions as well as postintervention survival time. Meanwhile, adoption of a Bayesian approach is proposed as a new framework in estimation of change point following control chart’s signal. This estimate aims to facilitate root causes efforts in quality improvement cycle since it cuts the search for the potential causes of detected changes to a tighter time-frame prior to the signal. This approach enables us to obtain highly informative estimates for change point parameters since probability distribution based results are obtained. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios including step change, linear trend and multiple change in a Poisson process are developed and investigated. The benefits of change point investigation is revisited and promoted in monitoring hospital outcomes where the developed Bayesian estimator reports the true time of the shifts, compared to priori known causes, detected by control charts in monitoring rate of excess usage of blood products and major adverse events during and after cardiac surgery in a local hospital. The development of the Bayesian change point estimators are then followed in a healthcare surveillances for processes in which pre-intervention characteristics of patients are viii affecting the outcomes. In this setting, at first, the Bayesian estimator is extended to capture the patient mix, covariates, through risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in odds ratio of intensive care unit outcomes in a local hospital. Secondly, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention which is being monitored by riskadjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix and the survival function is constructed using survival prediction model. The simulation study undertaken in each research component and obtained results highly recommend the developed Bayesian estimators as a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances as well as industrial and business contexts. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The empirical results and simulations indicate that the Bayesian estimators are a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The advantages of the Bayesian approach seen in general context of quality control may also be extended in the industrial and business domains where quality monitoring was initially developed.