930 resultados para Probabilistic latent semantic analysis (PLSA)
Resumo:
Background: The Lescol Intervention Prevention Study (LIPS) was a multinational randomized controlled trial that showed a 47% reduction in the relative risk of cardiac death and a 22% reduction in major adverse cardiac events (MACEs) from the routine use of fluvastatin, compared with controls, in patients undergoing percutaneous coronary intervention (PCI, defined as angioplasty with or without stents). In this study, MACEs included cardiac death, nonfatal myocardial infarction, and subsequent PCI and coronary artery bypass graft. Diabetes was the greatest risk factor for MACEs. Objective: This study estimated the cost-effectiveness of fluvastatin when used for secondary prevention of MACEs after PCI in people with diabetes. Methods: A post hoc subgroup analysis of patients with diabetes from the LIPS was used to estimate the effectiveness of fluvastatin in reducing myocardial infarction, revascularization, and cardiac death. A probabilistic Markov model was developed using United Kingdom resource and cost data to estimate the additional costs and quality-adjusted life-years (QALYs) gained over 10 years from the perspective of the British National Health Service. The model contained 6 health states, and the transition probabilities were derived from the LIPS data. Crossover from fluvastatin to other lipid-lowering drugs, withdrawal from fluvastatin, and the use of lipid-lowering drugs in the control group were included. Results: In the subgroup of 202 patients with diabetes in the LIPS trial, 18 (15.0%) of 120 fluvastatin patients and 21 (25.6%) of 82 control participants were insulin dependent (P = NS). Compared with the control group, patients treated with fluvastatin can expect to gain an additional mean (SD) of 0.196 (0.139) QALY per patient over 10 years (P < 0.001) and will cost the health service an additional mean (SD) of 10 (E448) (P = NS) (mean [SD] US $16 [$689]). The additional cost per QALY gained was;(51 (US $78). The key determinants of cost-effectiveness included the probabilities of repeat interventions, cardiac death, the cost of fluvastatin, and the time horizon used for the evaluation. Conclusion: Fluvastatin was an economically efficient treatment to prevent MACEs in these patients with diabetes undergoing PCI.
Resumo:
This paper presents a new method for producing a functional-structural plant model that simulates response to different growth conditions, yet does not require detailed knowledge of underlying physiology. The example used to present this method is the modelling of the mountain birch tree. This new functional-structural modelling approach is based on linking an L-system representation of the dynamic structure of the plant with a canonical mathematical model of plant function. Growth indicated by the canonical model is allocated to the structural model according to probabilistic growth rules, such as rules for the placement and length of new shoots, which were derived from an analysis of architectural data. The main advantage of the approach is that it is relatively simple compared to the prevalent process-based functional-structural plant models and does not require a detailed understanding of underlying physiological processes, yet it is able to capture important aspects of plant function and adaptability, unlike simple empirical models. This approach, combining canonical modelling, architectural analysis and L-systems, thus fills the important role of providing an intermediate level of abstraction between the two extremes of deeply mechanistic process-based modelling and purely empirical modelling. We also investigated the relative importance of various aspects of this integrated modelling approach by analysing the sensitivity of the standard birch model to a number of variations in its parameters, functions and algorithms. The results show that using light as the sole factor determining the structural location of new growth gives satisfactory results. Including the influence of additional regulating factors made little difference to global characteristics of the emergent architecture. Changing the form of the probability functions and using alternative methods for choosing the sites of new growth also had little effect. (c) 2004 Elsevier B.V. All rights reserved.
Resumo:
The ability of two-dimensional gel electrophoresis (2-DE) to separate glycoproteins was exploited to separate distinct glycoforms of kappa-casein that differed only in the number of O-glycans that were attached. To determine where the glycans were attached, the individual glycoforms were digested in-gel with pepsin and the released glycopeptides were identified from characteristic sugar ions in the tandem mass spectrometry (MS) spectra. The O-glycosylation sites were identified by tandem MS after replacement of the glycans with ammonia/aminoethanethiol. The results showed that glycans were not randomly distributed among the five potential glycosylation sites in kappa-casein. Rather, glycosylation of the monoglycoform could only be detected at a single site, T-152. Similarly the diglycoform appeared to be modified exclusively at T-152 and T-163, while the triglycoform was modified at T-152, T-163 and T-154. While low levels of glycosylation at other sites cannot be excluded the hierarchy of site occupation between glycoforms was clearly evident and argues for an ordered addition of glycans to the protein. Since all five potential O-glycosylation sites can be glycosylated in vivo, it would appear that certain sites remain latent until other sites are occupied. The determination of glycosylation site occupancy in individual glycoforms separated by 2-DE revealed a distinct pattern of in vivo glycosylation that has not been recognized previously.
Resumo:
The Leximancer system is a relatively new method for transforming lexical co-occurrence information from natural language into semantic patterns in an unsupervised manner. It employs two stages of co-occurrence information extraction-semantic and relational-using a different algorithm for each stage. The algorithms used are statistical, but they employ nonlinear dynamics and machine learning. This article is an attempt to validate the output of Leximancer, using a set of evaluation criteria taken from content analysis that are appropriate for knowledge discovery tasks.
Resumo:
For second-hand products sold with warranty, the expected warranty cost for an item to the manufacturer, depends on (i) the age and/or usage as well as the maintenance history for the item and (ii) the terms of the warranty policy. The paper develops probabilistic models to compute the expected warranty cost to the manufacturer when the items are sold with free replacement or pro rata warranties. (C) 2000 Elsevier Science Ltd. All rights reserved.
Resumo:
Grid computing is an emerging technology for providing the high performance computing capability and collaboration mechanism for solving the collaborated and complex problems while using the existing resources. In this paper, a grid computing based framework is proposed for the probabilistic based power system reliability and security analysis. The suggested name of this computing grid is Reliability and Security Grid (RSA-Grid). Then the architecture of this grid is presented. A prototype system has been built for further development of grid-based services for power systems reliability and security assessment based on probabilistic techniques, which require high performance computing and large amount of memory. Preliminary results based on prototype of this grid show that RSA-Grid can provide the comprehensive assessment results for real power systems efficiently and economically.
Resumo:
As an alternative to traditional evolutionary algorithms (EAs), population-based incremental learning (PBIL) maintains a probabilistic model of the best individual(s). Originally, PBIL was applied in binary search spaces. Recently, some work has been done to extend it to continuous spaces. In this paper, we review two such extensions of PBIL. An improved version of the PBIL based on Gaussian model is proposed that combines two main features: a new updating rule that takes into account all the individuals and their fitness values and a self-adaptive learning rate parameter. Furthermore, a new continuous PBIL employing a histogram probabilistic model is proposed. Some experiments results are presented that highlight the features of the new algorithms.
Resumo:
Probabilistic robotics most often applied to the problem of simultaneous localisation and mapping (SLAM), requires measures of uncertainty to accompany observations of the environment. This paper describes how uncertainty can be characterised for a vision system that locates coloured landmarks in a typical laboratory environment. The paper describes a model of the uncertainty in segmentation, the internal cameral model and the mounting of the camera on the robot. It explains the implementation of the system on a laboratory robot, and provides experimental results that show the coherence of the uncertainty model.
Resumo:
Visualization has proven to be a powerful and widely-applicable tool the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.
Resumo:
Magnification factors specify the extent to which the area of a small patch of the latent (or `feature') space of a topographic mapping is magnified on projection to the data space, and are of considerable interest in both neuro-biological and data analysis contexts. Previous attempts to consider magnification factors for the self-organizing map (SOM) algorithm have been hindered because the mapping is only defined at discrete points (given by the reference vectors). In this paper we consider the batch version of SOM, for which a continuous mapping can be defined, as well as the Generative Topographic Mapping (GTM) algorithm of Bishop et al. (1997) which has been introduced as a probabilistic formulation of the SOM. We show how the techniques of differential geometry can be used to determine magnification factors as continuous functions of the latent space coordinates. The results are illustrated here using a problem involving the identification of crab species from morphological data.
Resumo:
This thesis describes the Generative Topographic Mapping (GTM) --- a non-linear latent variable model, intended for modelling continuous, intrinsically low-dimensional probability distributions, embedded in high-dimensional spaces. It can be seen as a non-linear form of principal component analysis or factor analysis. It also provides a principled alternative to the self-organizing map --- a widely established neural network model for unsupervised learning --- resolving many of its associated theoretical problems. An important, potential application of the GTM is visualization of high-dimensional data. Since the GTM is non-linear, the relationship between data and its visual representation may be far from trivial, but a better understanding of this relationship can be gained by computing the so-called magnification factor. In essence, the magnification factor relates the distances between data points, as they appear when visualized, to the actual distances between those data points. There are two principal limitations of the basic GTM model. The computational effort required will grow exponentially with the intrinsic dimensionality of the density model. However, if the intended application is visualization, this will typically not be a problem. The other limitation is the inherent structure of the GTM, which makes it most suitable for modelling moderately curved probability distributions of approximately rectangular shape. When the target distribution is very different to that, theaim of maintaining an `interpretable' structure, suitable for visualizing data, may come in conflict with the aim of providing a good density model. The fact that the GTM is a probabilistic model means that results from probability theory and statistics can be used to address problems such as model complexity. Furthermore, this framework provides solid ground for extending the GTM to wider contexts than that of this thesis.
Resumo:
Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provide any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established. The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
Resumo:
The retrieval of wind vectors from satellite scatterometer observations is a non-linear inverse problem. A common approach to solving inverse problems is to adopt a Bayesian framework and to infer the posterior distribution of the parameters of interest given the observations by using a likelihood model relating the observations to the parameters, and a prior distribution over the parameters. We show how Gaussian process priors can be used efficiently with a variety of likelihood models, using local forward (observation) models and direct inverse models for the scatterometer. We present an enhanced Markov chain Monte Carlo method to sample from the resulting multimodal posterior distribution. We go on to show how the computational complexity of the inference can be controlled by using a sparse, sequential Bayes algorithm for estimation with Gaussian processes. This helps to overcome the most serious barrier to the use of probabilistic, Gaussian process methods in remote sensing inverse problems, which is the prohibitively large size of the data sets. We contrast the sampling results with the approximations that are found by using the sparse, sequential Bayes algorithm.
Resumo:
This work explores the relevance of semantic and linguistic description to translation, theory and practice. It is aimed towards a practical model of approach to texts to translate. As literary texts [poetry mainly] are the focus of attention, so are stylistic matters. Note, however, that 'style', and, to some extent, the conclusions of the work, are not limited to so-called literary texts. The study of semantic description reveals that most translation problems do not stem from the cognitive (langue-related), but rather from the contextual (parole-related) aspects of meaning. Thus, any linguistic model that fails to account for the latter is bound to fall short. T.G.G. does, whereas Systemics, concerned with both the 'Iangue' and 'parole' (stylistic and sociolinguistic mainly) aspects of meaning, provides a useful framework of approach to texts to translate. Two essential semantic principles for translation are: that meaning is the property of a language (Firth); and the 'relativity of meaning assignments' (Tymoczko). Both imply that meaning can only be assessed, correctly, in the relevant socio-cultural background. Translation is seen as a restricted creation, and the translator's encroach as a three-dimensional critical one. To encompass the most technical to the most literary text, and account for variations in emphasis in any text, translation theory must be based on typology of function Halliday's ideational, interpersonal and textual, or, Buhler's symbol, signal, symptom, Functions3. Function Coverall and specific] will dictate aims and method, and also provide the critic with criteria to assess translation Faithfulness. Translation can never be reduced to purely objective methods, however. Intuitive procedures intervene, in textual interpretation and analysis, in the choice of equivalents, and in the reception of a translation. Ultimately, translation, theory and practice, may perhaps constitute the touchstone as regards the validity of linguistic and semantic theories.