992 resultados para Latent Semantic Indexing


Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper demonstrates a potential application for latent semantic analysis and similar techniques in visualising the differences between two levels of knowledge about a risk issue. The HIV/AIDS risk issue will be examined and the semantic clusters of key words in a technical corpora derived from specific literature about HIV/AIDS will be compared with the semantic clusters of those in more general corpora. It is hoped that these comparisons will create a fast and efficient complementary approach to the articulation of mental models of risk issues that could be used to target possible inconsistencies between expert and lay mental models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

La diversification des résultats de recherche (DRR) vise à sélectionner divers documents à partir des résultats de recherche afin de couvrir autant d’intentions que possible. Dans les approches existantes, on suppose que les résultats initiaux sont suffisamment diversifiés et couvrent bien les aspects de la requête. Or, on observe souvent que les résultats initiaux n’arrivent pas à couvrir certains aspects. Dans cette thèse, nous proposons une nouvelle approche de DRR qui consiste à diversifier l’expansion de requête (DER) afin d’avoir une meilleure couverture des aspects. Les termes d’expansion sont sélectionnés à partir d’une ou de plusieurs ressource(s) suivant le principe de pertinence marginale maximale. Dans notre première contribution, nous proposons une méthode pour DER au niveau des termes où la similarité entre les termes est mesurée superficiellement à l’aide des ressources. Quand plusieurs ressources sont utilisées pour DER, elles ont été uniformément combinées dans la littérature, ce qui permet d’ignorer la contribution individuelle de chaque ressource par rapport à la requête. Dans la seconde contribution de cette thèse, nous proposons une nouvelle méthode de pondération de ressources selon la requête. Notre méthode utilise un ensemble de caractéristiques qui sont intégrées à un modèle de régression linéaire, et génère à partir de chaque ressource un nombre de termes d’expansion proportionnellement au poids de cette ressource. Les méthodes proposées pour DER se concentrent sur l’élimination de la redondance entre les termes d’expansion sans se soucier si les termes sélectionnés couvrent effectivement les différents aspects de la requête. Pour pallier à cet inconvénient, nous introduisons dans la troisième contribution de cette thèse une nouvelle méthode pour DER au niveau des aspects. Notre méthode est entraînée de façon supervisée selon le principe que les termes reliés doivent correspondre au même aspect. Cette méthode permet de sélectionner des termes d’expansion à un niveau sémantique latent afin de couvrir autant que possible différents aspects de la requête. De plus, cette méthode autorise l’intégration de plusieurs ressources afin de suggérer des termes d’expansion, et supporte l’intégration de plusieurs contraintes telles que la contrainte de dispersion. Nous évaluons nos méthodes à l’aide des données de ClueWeb09B et de trois collections de requêtes de TRECWeb track et montrons l’utilité de nos approches par rapport aux méthodes existantes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a new approach to model and classify breast parenchymal tissue. Given a mammogram, first, we will discover the distribution of the different tissue densities in an unsupervised manner, and second, we will use this tissue distribution to perform the classification. We achieve this using a classifier based on local descriptors and probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature. We studied the influence of different descriptors like texture and SIFT features at the classification stage showing that textons outperform SIFT in all cases. Moreover we demonstrate that pLSA automatically extracts meaningful latent aspects generating a compact tissue representation based on their densities, useful for discriminating on mammogram classification. We show the results of tissue classification over the MIAS and DDSM datasets. We compare our method with approaches that classified these same datasets showing a better performance of our proposal

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent ";topics"; using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Resumen tomado del autor

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Este trabajo es un análisis de carácter investigativo que busca explorar los principales rasgos que describen y constituyen la identidad de dos instituciones de Educación Superior en Colombia. Se busca identificar las características de convergencia y divergencia entre ambas, así como indagar acerca del impacto que tienen los procesos de cambio en la conformación de una identidad sólida que les permita ser perdurables en el tiempo.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

L'increment de bases de dades que cada vegada contenen imatges més difícils i amb un nombre més elevat de categories, està forçant el desenvolupament de tècniques de representació d'imatges que siguin discriminatives quan es vol treballar amb múltiples classes i d'algorismes que siguin eficients en l'aprenentatge i classificació. Aquesta tesi explora el problema de classificar les imatges segons l'objecte que contenen quan es disposa d'un gran nombre de categories. Primerament s'investiga com un sistema híbrid format per un model generatiu i un model discriminatiu pot beneficiar la tasca de classificació d'imatges on el nivell d'anotació humà sigui mínim. Per aquesta tasca introduïm un nou vocabulari utilitzant una representació densa de descriptors color-SIFT, i desprès s'investiga com els diferents paràmetres afecten la classificació final. Tot seguit es proposa un mètode par tal d'incorporar informació espacial amb el sistema híbrid, mostrant que la informació de context es de gran ajuda per la classificació d'imatges. Desprès introduïm un nou descriptor de forma que representa la imatge segons la seva forma local i la seva forma espacial, tot junt amb un kernel que incorpora aquesta informació espacial en forma piramidal. La forma es representada per un vector compacte obtenint un descriptor molt adequat per ésser utilitzat amb algorismes d'aprenentatge amb kernels. Els experiments realitzats postren que aquesta informació de forma te uns resultats semblants (i a vegades millors) als descriptors basats en aparença. També s'investiga com diferents característiques es poden combinar per ésser utilitzades en la classificació d'imatges i es mostra com el descriptor de forma proposat juntament amb un descriptor d'aparença millora substancialment la classificació. Finalment es descriu un algoritme que detecta les regions d'interès automàticament durant l'entrenament i la classificació. Això proporciona un mètode per inhibir el fons de la imatge i afegeix invariança a la posició dels objectes dins les imatges. S'ensenya que la forma i l'aparença sobre aquesta regió d'interès i utilitzant els classificadors random forests millora la classificació i el temps computacional. Es comparen els postres resultats amb resultats de la literatura utilitzant les mateixes bases de dades que els autors Aixa com els mateixos protocols d'aprenentatge i classificació. Es veu com totes les innovacions introduïdes incrementen la classificació final de les imatges.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

To enable high-level semantic indexing of video, we tackle the problem of automatically structuring motion pictures into meaningful story units, namely scenes. In our recent work, drawing guidance from film grammar, we proposed an algorithmic solution for extracting scenes in motion pictures based on a shot neighborhood color coherence measure. In this paper, we extend our work by presenting various refinement mechanisms, inspired by the knowledge of film devices that are brought to bear while crafting scenes, to further improve the results of the scene detection algorithm. We apply the enhanced algorithm to ten motion pictures and demonstrate the resulting improvements in performance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Questa tesi riguarda lo sviluppo di un'applicazione che sfrutta le tecnologie del Web Semantico e del Text Mining. L'applicazione rappresenta l'estensione di un lavoro relativo ad una tesi precedente, aggiungendo ad esso la funzionalità di ricerca semantica. Tale funzionalità permette il recupero di informazioni che con il metodo di ricerca normale non verrebbero considerate. Per raggiungere questo risultato si utilizza WordNet, un database semantico-lessicale, e una libreria per la Latent Semantic Analysis, una tecnica del Text Mining.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a shallow dialogue analysis model, aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, or decision tree classifiers for discourse markers. A rule-based approach is proposed for solving cross-modal references to meeting documents. The methods are trained and evaluated thanks to a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Esta dissertação visa apresentar o mapeamento do uso das teorias de sistemas de informações, usando técnicas de recuperação de informação e metodologias de mineração de dados e textos. As teorias abordadas foram Economia de Custos de Transações (Transactions Costs Economics TCE), Visão Baseada em Recursos da Firma (Resource-Based View-RBV) e Teoria Institucional (Institutional Theory-IT), sendo escolhidas por serem teorias de grande relevância para estudos de alocação de investimentos e implementação em sistemas de informação, tendo como base de dados o conteúdo textual (em inglês) do resumo e da revisão teórica dos artigos dos periódicos Information System Research (ISR), Management Information Systems Quarterly (MISQ) e Journal of Management Information Systems (JMIS) no período de 2000 a 2008. Os resultados advindos da técnica de mineração textual aliada à mineração de dados foram comparadas com a ferramenta de busca avançada EBSCO e demonstraram uma eficiência maior na identificação de conteúdo. Os artigos fundamentados nas três teorias representaram 10% do total de artigos dos três períodicos e o período mais profícuo de publicação foi o de 2001 e 2007.(AU)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Esta dissertação visa apresentar o mapeamento do uso das teorias de sistemas de informações, usando técnicas de recuperação de informação e metodologias de mineração de dados e textos. As teorias abordadas foram Economia de Custos de Transações (Transactions Costs Economics TCE), Visão Baseada em Recursos da Firma (Resource-Based View-RBV) e Teoria Institucional (Institutional Theory-IT), sendo escolhidas por serem teorias de grande relevância para estudos de alocação de investimentos e implementação em sistemas de informação, tendo como base de dados o conteúdo textual (em inglês) do resumo e da revisão teórica dos artigos dos periódicos Information System Research (ISR), Management Information Systems Quarterly (MISQ) e Journal of Management Information Systems (JMIS) no período de 2000 a 2008. Os resultados advindos da técnica de mineração textual aliada à mineração de dados foram comparadas com a ferramenta de busca avançada EBSCO e demonstraram uma eficiência maior na identificação de conteúdo. Os artigos fundamentados nas três teorias representaram 10% do total de artigos dos três períodicos e o período mais profícuo de publicação foi o de 2001 e 2007.(AU)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This dissertation examines the role of topic knowledge (TK) in comprehension among typical readers and those with Specifically Poor Comprehension (SPC), i.e., those who demonstrate deficits in understanding what they read despite adequate decoding. Previous studies of poor comprehension have focused on weaknesses in specific skills, such as word decoding and inferencing ability, but this dissertation examined a different factor: whether deficits in availability and use of TK underlie poor comprehension. It is well known that TK tends to facilitate comprehension among typical readers, but its interaction with working memory and word decoding is unclear, particularly among participants with deficits in these skills. Across several passages, we found that SPCs do in fact have less TK to assist their interpretation of a text. However, we found no evidence that deficits in working memory or word decoding ability make it difficult for children to benefit from their TK when they have it. Instead, children across the skill spectrum are able to draw upon TK to assist their interpretation of a passage. Because TK is difficult to assess and studies vary in methodology, another goal of this dissertation was to compare two methods for measuring it. Both approaches score responses to a concept question to assess TK, but in the first, a human rater assigns a score whereas in the second, a computer algorithm, Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) assigns a score. We found similar results across both methods of assessing TK, suggesting that a continuous measure is not appreciably more sensitive to variations in knowledge than discrete human ratings. This study contributes to our understanding of how best to measure TK, the factors that moderate its relationship with recall, and its role in poor comprehension. The findings suggest that teaching practices that focus on expanding TK are likely to improve comprehension across readers with a variety of abilities.