15 resultados para 080704 Information Retrieval and Web Search
em Bulgarian Digital Mathematics Library at IMI-BAS
Resumo:
Search engines sometimes apply the search on the full text of documents or web-pages; but sometimes they can apply the search on selected parts of the documents only, e.g. their titles. Full-text search may consume a lot of computing resources and time. It may be possible to save resources by applying the search on the titles of documents only, assuming that a title of a document provides a concise representation of its content. We tested this assumption using Google search engine. We ran search queries that have been defined by users, distinguishing between two types of queries/users: queries of users who are familiar with the area of the search, and queries of users who are not familiar with the area of the search. We found that searches which use titles provide similar and sometimes even (slightly) better results compared to searches which use the full-text. These results hold for both types of queries/users. Moreover, we found an advantage in title-search when searching in unfamiliar areas because the general terms used in queries in unfamiliar areas match better with general terms which tend to be used in document titles.
Resumo:
Electronic publishing exploits numerous possibilities to present or exchange information and to communicate via most current media like the Internet. By utilizing modern Web technologies like Web Services, loosely coupled services, and peer-to-peer networks we describe the integration of an intelligent business news presentation and distribution network. Employing semantics technologies enables the coupling of multinational and multilingual business news data on a scalable international level and thus introduce a service quality that is not achieved by alternative technologies in the news distribution area so far. Architecturally, we identified the loose coupling of existing services as the most feasible way to address multinational and multilingual news presentation and distribution networks. Furthermore we semantically enrich multinational news contents by relating them using AI techniques like the Vector Space Model. Summarizing our experiences we describe the technical integration of semantics and communication technologies in order to create a modern international news network.
Resumo:
This paper introduces an encoding of knowledge representation statements as regular languages and proposes a two-phase approach to processing of explicitly declared conceptual information. The idea is presented for the simple conceptual graphs where conceptual pattern search is implemented by the so called projection operation. Projection calculations are organised into off-line preprocessing and run-time computations. This enables fast run-time treatment of NP-complete problems, given that the intermediate results of the off-line phase are kept in suitable data structures. The experiments with randomly-generated, middle-size knowledge bases support the claim that the suggested approach radically improves the run-time conceptual pattern search.
Resumo:
An ontological representation of buyer interests’ knowledge in process of e-commerce is proposed to use. It makes it more efficient to make a search of the most appropriate sellers via multiagent systems. An algorithm of a comparison of buyer ontology with one of e-shops (the taxonomies) and an e-commerce multiagent system are realised using ontology of information retrieval in distributed environment.
Resumo:
This is an extended version of an article presented at the Second International Conference on Software, Services and Semantic Technologies, Sofia, Bulgaria, 11–12 September 2010.
Resumo:
Similar to Genetic algorithm, Evolution strategy is a process of continuous reproduction, trial and selection. Each new generation is an improvement on the one that went before. This paper presents two different proposals based on the vector space model (VSM) as a traditional model in information Retrieval (TIR). The first uses evolution strategy (ES). The second uses the document centroid (DC) in query expansion technique. Then the results are compared; it was noticed that ES technique is more efficient than the other methods.
Resumo:
ACM Computing Classification System (1998): J.2.
Resumo:
The purpose of this work is the development of database of the distributed information measurement and control system that implements methods of optical spectroscopy for plasma physics research and atomic collisions and provides remote access to information and hardware resources within the Intranet/Internet networks. The database is based on database management system Oracle9i. Client software was realized in Java language. The software was developed using Model View Controller architecture, which separates application data from graphical presentation components and input processing logic. The following graphical presentations were implemented: measurement of radiation spectra of beam and plasma objects, excitation function for non-elastic collisions of heavy particles and analysis of data acquired in preceding experiments. The graphical clients have the following functionality of the interaction with the database: browsing information on experiments of a certain type, searching for data with various criteria, and inserting the information about preceding experiments.
Resumo:
The development of the distributed information measurement and control system for optical spectral research of particle beam and plasma objects and the execution of laboratory works on Physics and Engineering Department of Petrozavodsk State University are described. At the hardware level the system is represented by a complex of the automated workplaces joined into computer network. The key element of the system is the communication server, which supports the multi-user mode and distributes resources among clients, monitors the system and provides secure access. Other system components are formed by equipment servers (CАМАC and GPIB servers, a server for the access to microcontrollers MCS-196 and others) and the client programs that carry out data acquisition, accumulation and processing and management of the course of the experiment as well. In this work the designed by the authors network interface is discussed. The interface provides the connection of measuring and executive devices to the distributed information measurement and control system via Ethernet. This interface allows controlling of experimental parameters by use of digital devices, monitoring of experiment parameters by polling of analog and digital sensors. The device firmware is written in assembler language and includes libraries for Ethernet-, IP-, TCP- и UDP-packets forming.
Resumo:
Unlike other works of art (painting, sculpture, etc.) a musical composition should be performed, it should sound to become accessible. Therefore, the role of the musical masterly performance is extremely important. But presently it has increased in importance when music through mass communication media i.e. radio, television, sound recording becomes in the full sense of the word the property of millions. Art in all its genres as a means of information helps to recreate a picture of one or other epoch as a whole. Moreover, art has a profound impact on education: it can be positive or negative, creative or destructive. Let us dwell on such aspect of music as means of information and the value of musical mastery activity for brining information to hearers of the alternating generations. Unlike other works of art (painting, sculpture etc.) a musical composition should be performed, it should sound to become intelligible. Therefore, the role of the musical masterly performance is extremely important. But presently it becomes particularly great in the XXI century when music becomes a true property of the masses due to mass media – radio, television, sound recording.
Resumo:
The problems and methods for adaptive control and multi-agent processing of information in global telecommunication and computer networks (TCN) are discussed. Criteria for controllability and communication ability (routing ability) of dataflows are described. Multi-agent model for exchange of divided information resources in global TCN has been suggested. Peculiarities for adaptive and intelligent control of dataflows in uncertain conditions and network collisions are analyzed.
Resumo:
In the context of Software Reuse providing techniques to support source code retrieval has been widely experimented. However, much effort is required in order to find how to match classical Information Retrieval and source code characteristics and implicit information. Introducing linguistic theories in the software development process, in terms of documentation standardization may produce significant benefits when applying Information Retrieval techniques. The goal of our research is to provide a tool to improve source code search and retrieval In order to achieve this goal we apply some linguistic rules to the development process.
Resumo:
The present paper is devoted to creation of cryptographic data security and realization of the packet mode in the distributed information measurement and control system that implements methods of optical spectroscopy for plasma physics research and atomic collisions. This system gives a remote access to information and instrument resources within the Intranet/Internet networks. The system provides remote access to information and hardware resources for the natural sciences within the Intranet/Internet networks. The access to physical equipment is realized through the standard interface servers (PXI, CАМАC, and GPIB), the server providing access to Ethernet devices, and the communication server, which integrates the equipment servers into a uniform information system. The system is used to make research task in optical spectroscopy, as well as to support the process of education at the Department of Physics and Engineering of Petrozavodsk State University.
Resumo:
The paper presents in brief the “2nd Generation Open Access Infrastructure for Research in Europe” project (http://www.openaire.eu/) and what is done in Bulgaria during the last year in the area of open access to scientific information and data.
Resumo:
In this paper, we present an innovative topic segmentation system based on a new informative similarity measure that takes into account word co-occurrence in order to avoid the accessibility to existing linguistic resources such as electronic dictionaries or lexico-semantic databases such as thesauri or ontology. Topic segmentation is the task of breaking documents into topically coherent multi-paragraph subparts. Topic segmentation has extensively been used in information retrieval and text summarization. In particular, our architecture proposes a language-independent topic segmentation system that solves three main problems evidenced by previous research: systems based uniquely on lexical repetition that show reliability problems, systems based on lexical cohesion using existing linguistic resources that are usually available only for dominating languages and as a consequence do not apply to less favored languages and finally systems that need previously existing harvesting training data. For that purpose, we only use statistics on words and sequences of words based on a set of texts. This solution provides a flexible solution that may narrow the gap between dominating languages and less favored languages thus allowing equivalent access to information.