11 resultados para XML, Information, Retrieval, Query, Language

em Bulgarian Digital Mathematics Library at IMI-BAS


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Similar to Genetic algorithm, Evolution strategy is a process of continuous reproduction, trial and selection. Each new generation is an improvement on the one that went before. This paper presents two different proposals based on the vector space model (VSM) as a traditional model in information Retrieval (TIR). The first uses evolution strategy (ES). The second uses the document centroid (DC) in query expansion technique. Then the results are compared; it was noticed that ES technique is more efficient than the other methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An ontological representation of buyer interests’ knowledge in process of e-commerce is proposed to use. It makes it more efficient to make a search of the most appropriate sellers via multiagent systems. An algorithm of a comparison of buyer ontology with one of e-shops (the taxonomies) and an e-commerce multiagent system are realised using ontology of information retrieval in distributed environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This is an extended version of an article presented at the Second International Conference on Software, Services and Semantic Technologies, Sofia, Bulgaria, 11–12 September 2010.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an adaptive method using genetic algorithm to modify user’s queries, based on relevance judgments. This algorithm was adapted for the three well-known documents collections (CISI, NLP and CACM). The method is shown to be applicable to large text collections, where more relevant documents are presented to users in the genetic modification. The algorithm shows the effects of applying GA to improve the effectiveness of queries in IR systems. Further studies are planned to adjust the system parameters to improve its effectiveness. The goal is to retrieve most relevant documents with less number of non-relevant documents with respect to user's query in information retrieval system using genetic algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

В статье представлены промежуточные результаты реализации комплексного подхода к разработке подсистемы управления электронными документами в CASE-системе METAS, предназначенной для создания распределенных информационных систем, допускающих динамическую настройку на меняющиеся условия эксплуатации и потребности пользователей. Предлагается значительно увеличить эффективность работы с электронными документами за счет их автоматизированного интеллектуального анализа. В предлагаемом решении для анализа документов используются агентный и онтологический подходы. Онтологии позволяют в явном виде представить семантику и структуру документа. Использование агентов позволяет упростить процесс анализа, сделать его расширяемым и масштабируемым. Результаты интеллектуального поиска и обработки документов, получаемых из гетерогенных источников, могут быть использованы не только для автоматической классификации и каталогизации документов в информационной системе в удобной для пользователя форме, но и для снижения трудоемкости выполнения этапа анализа предметной области информационной системы, ее проектирования, а также для интеллектуализации процессов создания отчетных документов на основе информации, размещенной в базе данных системы.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present an innovative topic segmentation system based on a new informative similarity measure that takes into account word co-occurrence in order to avoid the accessibility to existing linguistic resources such as electronic dictionaries or lexico-semantic databases such as thesauri or ontology. Topic segmentation is the task of breaking documents into topically coherent multi-paragraph subparts. Topic segmentation has extensively been used in information retrieval and text summarization. In particular, our architecture proposes a language-independent topic segmentation system that solves three main problems evidenced by previous research: systems based uniquely on lexical repetition that show reliability problems, systems based on lexical cohesion using existing linguistic resources that are usually available only for dominating languages and as a consequence do not apply to less favored languages and finally systems that need previously existing harvesting training data. For that purpose, we only use statistics on words and sequences of words based on a set of texts. This solution provides a flexible solution that may narrow the gap between dominating languages and less favored languages thus allowing equivalent access to information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we study some of the characteristics of the art painting image color semantics. We analyze the color features of differ- ent artists and art movements. The analysis includes exploration of hue, saturation and luminance. We also use quartile’s analysis to obtain the dis- tribution of the dispersion of defined groups of paintings and measure the degree of purity for these groups. A special software system “Art Paint- ing Image Color Semantics” (APICSS) for image analysis and retrieval was created. The obtained result can be used for automatic classification of art paintings in image retrieval systems, where the indexing is based on color characteristics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present algorithms which work on pairs of 0,1- matrices which multiply again a matrix of zero and one entries. When applied over a pair, the algorithms change the number of non-zero entries present in the matrices, meanwhile their product remains unchanged. We establish the conditions under which the number of 1s decreases. We recursively define as well pairs of matrices which product is a specific matrix and such that by applying on them these algorithms, we minimize the total number of non-zero entries present in both matrices. These matrices may be interpreted as solutions for a well known information retrieval problem, and in this case the number of 1 entries represent the complexity of the retrieve and information update operations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Search engines sometimes apply the search on the full text of documents or web-pages; but sometimes they can apply the search on selected parts of the documents only, e.g. their titles. Full-text search may consume a lot of computing resources and time. It may be possible to save resources by applying the search on the titles of documents only, assuming that a title of a document provides a concise representation of its content. We tested this assumption using Google search engine. We ran search queries that have been defined by users, distinguishing between two types of queries/users: queries of users who are familiar with the area of the search, and queries of users who are not familiar with the area of the search. We found that searches which use titles provide similar and sometimes even (slightly) better results compared to searches which use the full-text. These results hold for both types of queries/users. Moreover, we found an advantage in title-search when searching in unfamiliar areas because the general terms used in queries in unfamiliar areas match better with general terms which tend to be used in document titles.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the context of Software Reuse providing techniques to support source code retrieval has been widely experimented. However, much effort is required in order to find how to match classical Information Retrieval and source code characteristics and implicit information. Introducing linguistic theories in the software development process, in terms of documentation standardization may produce significant benefits when applying Information Retrieval techniques. The goal of our research is to provide a tool to improve source code search and retrieval In order to achieve this goal we apply some linguistic rules to the development process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper discusses the Europeana Creative project which aims to facilitate re-use of cultural heritage metadata and content by the creative industries. The paper focuses on the contribution of Ontotext to the project activities. The Europeana Data Model (EDM) is further discussed as a new proposal for structuring the data that Europeana will ingest, manage and publish. The advantages of using EDM instead of the current ESE metadata set are highlighted. Finally, Ontotext’s EDM Endpoint is presented, based on OWLIM semantic repository and SPARQL query language. A user-friendly RDF view is presented in order to illustrate the possibilities of Forest - an extensible modular user interface framework for creating linked data and semantic web applications.