31 resultados para Latent semantic indexing
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent ";topics"; using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos
Resumo:
We present a new approach to model and classify breast parenchymal tissue. Given a mammogram, first, we will discover the distribution of the different tissue densities in an unsupervised manner, and second, we will use this tissue distribution to perform the classification. We achieve this using a classifier based on local descriptors and probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature. We studied the influence of different descriptors like texture and SIFT features at the classification stage showing that textons outperform SIFT in all cases. Moreover we demonstrate that pLSA automatically extracts meaningful latent aspects generating a compact tissue representation based on their densities, useful for discriminating on mammogram classification. We show the results of tissue classification over the MIAS and DDSM datasets. We compare our method with approaches that classified these same datasets showing a better performance of our proposal
Resumo:
Given a model that can be simulated, conditional moments at a trial parameter value can be calculated with high accuracy by applying kernel smoothing methods to a long simulation. With such conditional moments in hand, standard method of moments techniques can be used to estimate the parameter. Since conditional moments are calculated using kernel smoothing rather than simple averaging, it is not necessary that the model be simulable subject to the conditioning information that is used to define the moment conditions. For this reason, the proposed estimator is applicable to general dynamic latent variable models. Monte Carlo results show that the estimator performs well in comparison to other estimators that have been proposed for estimation of general DLV models.
Resumo:
We study the lysis timing of a bacteriophage population by means of a continuously infection-age-structured population dynamics model. The features of the model are the infection process of bacteria, the natural death process, and the lysis process which means the replication of bacteriophage viruses inside bacteria and the destruction of them. We consider that the length of the lysis timing (or latent period) is distributed according to a general probability distribution function. We have carried out an optimization procedure and we have found the latent period corresponding to the maximal fitness (i.e. maximal growth rate) of the bacteriophage population.
Resumo:
Abstract. Given a model that can be simulated, conditional moments at a trial parameter value can be calculated with high accuracy by applying kernel smoothing methods to a long simulation. With such conditional moments in hand, standard method of moments techniques can be used to estimate the parameter. Because conditional moments are calculated using kernel smoothing rather than simple averaging, it is not necessary that the model be simulable subject to the conditioning information that is used to define the moment conditions. For this reason, the proposed estimator is applicable to general dynamic latent variable models. It is shown that as the number of simulations diverges, the estimator is consistent and a higher-order expansion reveals the stochastic difference between the infeasible GMM estimator based on the same moment conditions and the simulated version. In particular, we show how to adjust standard errors to account for the simulations. Monte Carlo results show how the estimator may be applied to a range of dynamic latent variable (DLV) models, and that it performs well in comparison to several other estimators that have been proposed for DLV models.
Resumo:
Nowadays, when a user is planning a touristic route is very difficult to find out which are the best places to visit. The user has to choose considering his/her preferences due to the great quantity of information it is possible to find in the web and taking into account it is necessary to do a selection, within small time because there is a limited time to do a trip. In Itiner@ project, we aim to implement Semantic Web technology combined with Geographic Information Systems in order to offer personalized touristic routes around a region based on user preferences and time situation. Using ontologies it is possible to link, structure, share data and obtain the result more suitable for user's preferences and actual situation with less time and more precisely than without ontologies. To achieve these objectives we propose a web page combining a GIS server and a touristic ontology. As a step further, we also study how to extend this technology on mobile devices due to the raising interest and technological progress of these devices and location-based services, which allows the user to have all the route information on the hand when he/she does a touristic trip. We design a little application in order to apply the combination of GIS and Semantic Web in a mobile device.
Resumo:
This document is a journey through Semantic Web principles and Microsoft SharePoint in order to come to understand some advantages and disadvantages of theirs, and how Semantic Web principles can be blended into an enterprise solution like Microsoft SharePoint.
Resumo:
In this paper we present the theoretical and methodologicalfoundations for the development of a multi-agentSelective Dissemination of Information (SDI) servicemodel that applies Semantic Web technologies for specializeddigital libraries. These technologies make possibleachieving more efficient information management,improving agent–user communication processes, andfacilitating accurate access to relevant resources. Othertools used are fuzzy linguistic modelling techniques(which make possible easing the interaction betweenusers and system) and natural language processing(NLP) techniques for semiautomatic thesaurus generation.Also, RSS feeds are used as “current awareness bulletins”to generate personalized bibliographic alerts.
Resumo:
In this paper we present a description of the role of definitional verbal patterns for the extraction of semantic relations. Several studies show that semantic relations can be extracted from analytic definitions contained in machine-readable dictionaries (MRDs). In addition, definitions found in specialised texts are a good starting point to search for different types of definitions where other semantic relations occur. The extraction of definitional knowledge from specialised corpora represents another interesting approach for the extraction of semantic relations. Here, we present a descriptive analysis of definitional verbal patterns in Spanish and the first steps towards the development of a system for the automatic extraction of definitional knowledge.
Resumo:
In this paper a method for extracting semantic informationfrom online music discussion forums is proposed. The semantic relations are inferred from the co-occurrence of musical concepts in forum posts, using network analysis. The method starts by defining a dictionary of common music terms in an art music tradition. Then, it creates a complex network representation of the online forum by matchingsuch dictionary against the forum posts. Once the complex network is built we can study different network measures, including node relevance, node co-occurrence andterm relations via semantically connecting words. Moreover, we can detect communities of concepts inside the forum posts. The rationale is that some music terms are more related to each other than to other terms. All in all, this methodology allows us to obtain meaningful and relevantinformation from forum discussions.
Resumo:
Acquiring lexical information is a complex problem, typically approached by relying on a number of contexts to contribute information for classification. One of the first issues to address in this domain is the determination of such contexts. The work presented here proposes the use of automatically obtained FORMAL role descriptors as features used to draw nouns from the same lexical semantic class together in an unsupervised clustering task. We have dealt with three lexical semantic classes (HUMAN, LOCATION and EVENT) in English. The results obtained show that it is possible to discriminate between elements from different lexical semantic classes using only FORMAL role information, hence validating our initial hypothesis. Also, iterating our method accurately accounts for fine-grained distinctions within lexical classes, namely distinctions involving ambiguous expressions. Moreover, a filtering and bootstrapping strategy employed in extracting FORMAL role descriptors proved to minimize effects of sparse data and noise in our task.
Resumo:
The work we present here addresses cue-based noun classification in English and Spanish. Its main objective is to automatically acquire lexical semantic information by classifying nouns into previously known noun lexical classes. This is achieved by using particular aspects of linguistic contexts as cues that identify a specific lexical class. Here we concentrate on the task of identifying such cues and the theoretical background that allows for an assessment of the complexity of the task. The results show that, despite of the a-priori complexity of the task, cue-based classification is a useful tool in the automatic acquisition of lexical semantic classes.
Resumo:
This work briefly analyses the difficulties to adopt the Semantic Web, and in particular proposes systems to know the present level of migration to the different technologies that make up the Semantic Web. It focuses on the presentation and description of two tools, DigiDocSpider and DigiDocMetaEdit, designed with the aim of verifYing, evaluating, and promoting its implementation.
Resumo:
Using data from the Spanish household budget survey, we investigate life- cycle effects on several product expenditures. A latent-variable model approach is adopted to evaluate the impact of income on expenditures, controlling for the number of members in the family. Two latent factors underlying repeated measures of monetary and non-monetary income are used as explanatory variables in the expenditure regression equations, thus avoiding possible bias associated to the measurement error in income. The proposed methodology also takes care of the case in which product expenditures exhibit a pattern of infrequent purchases. Multiple-group analysis is used to assess the variation of key parameters of the model across various household life-cycle typologies. The analysis discloses significant life-cycle effects on the mean levels of expenditures; it also detects significant life-cycle effects on the way expenditures are affected by income and family size. Asymptotic robust methods are used to account for possible non-normality of the data.
Resumo:
Using data from the Spanish household budget survey, we investigate life-cycle effects on several product expenditures. A latent-variable model approach is adopted to evaluate the impact of income on expenditures, controlling for the number of members in the family. Two latent factors underlying repeated measures of monetary and non-monetary income are used as explanatory variables in the expenditure regression equations, thus avoiding possible bias associated to the measurement error in income. The proposed methodology also takes care of the case in which product expenditures exhibit a pattern of infrequent purchases. Multiple-group analysis is used to assess the variation of key parameters of the model across various household life-cycle typologies. The analysis discloses significant life-cycle effects on the mean levels of expenditures; it also detects significant life-cycle effects on the way expenditures are affected by income and family size. Asymptotic robust methods are used to account for possible non-normality of the data.