924 results for NUDIST (Information retrieval system)
Abstract:
This paper presents some developments in query expansion and document representation of our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. We modify the document representation by combining several query expansion techniques, knowledge-based on the one hand and statistics-based on the other. Taken together, these techniques improve Average Precision by over 19% relative to a system similar to the one we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines, with WERs ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.
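As a rough, hypothetical sketch of what combining knowledge-based and statistics-based query expansion can look like (this is not the system evaluated above), the Python fragment below expands query terms with hand-built thesaurus entries and with terms that frequently co-occur with them in the collection; the thesaurus, the toy documents, and all names are invented.

# Toy sketch of combined query expansion: a hand-built thesaurus
# (knowledge-based) plus collection co-occurrence statistics
# (statistics-based). Hypothetical data; not the TREC system itself.
from collections import Counter
from itertools import combinations

# Knowledge-based resource: a tiny hand-built thesaurus (hypothetical).
THESAURUS = {"car": ["automobile"], "ship": ["vessel", "boat"]}

def cooccurrence_counts(docs):
    # Statistics-based resource: term co-occurrence counts over documents.
    counts = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts

def expand_query(query_terms, docs, top_k=2):
    counts = cooccurrence_counts(docs)
    expanded = list(query_terms)
    for term in query_terms:
        # Knowledge-based expansion.
        expanded.extend(THESAURUS.get(term, []))
        # Statistics-based expansion: most frequent co-occurring terms.
        related = Counter()
        for (a, b), c in counts.items():
            if a == term:
                related[b] += c
            elif b == term:
                related[a] += c
        expanded.extend(t for t, _ in related.most_common(top_k))
    return list(dict.fromkeys(expanded))  # dedupe, keep order

docs = ["the car ferry crossed the channel",
        "a cargo ship left the harbour",
        "the ship and the car ferry docked"]
print(expand_query(["ship"], docs))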
Abstract:
Ideally, one would like to perform image search using an intuitive and friendly approach. Many existing image search engines, however, present users with sets of images arranged in some default order on the screen, typically by relevance to a query only. While this certainly has its advantages, a more flexible and intuitive way would arguably be to sort images into arbitrary structures such as grids, hierarchies, or spheres, so that images that are visually or semantically alike are placed together. This paper focuses on designing such a navigation system for image browsers. This is a challenging task because an arbitrary layout structure makes it difficult, if not impossible, to compute cross-similarities between images and structure coordinates, the main ingredient of traditional layout approaches. For this reason, we resort to a recently developed machine learning technique: kernelized sorting. It is a general technique for matching pairs of objects from different domains without requiring cross-domain similarity measures and hence elegantly allows sorting images into arbitrary structures. Moreover, we extend it so that some images can be preselected, for instance to form the tip of the hierarchy, allowing the user to subsequently navigate through the search results in the lower levels in an intuitive way.
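As a minimal, hypothetical sketch of the kernelized sorting technique the abstract builds on, the Python fragment below aligns objects from two domains by repeatedly linearizing the objective tr(K P L Pᵀ) and solving a linear assignment problem; the kernels, the synthetic data, and the simple stopping rule are illustrative, and the published method adds annealing and relaxation details not shown here.

# Minimal sketch of kernelized sorting: align image features to grid
# positions by maximizing tr(K P L P^T) over permutations P, via
# repeated linear assignment. Synthetic data throughout.
import numpy as np
from scipy.optimize import linear_sum_assignment

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernelized_sorting(K, L, iters=50, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)        # current assignment: item i -> slot perm[i]
    for _ in range(iters):
        P = np.eye(n)[perm]          # permutation matrix
        M = K @ P @ L                # linearized score matrix
        _, new_perm = linear_sum_assignment(M, maximize=True)
        if np.array_equal(new_perm, perm):
            break
        perm = new_perm
    return perm

# Synthetic example: 16 "images" (random 5-D features) onto a 4x4 grid.
X = np.random.default_rng(1).normal(size=(16, 5))
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
K, L = center(rbf_kernel(X)), center(rbf_kernel(grid, gamma=0.5))
print(kernelized_sorting(K, L))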
Abstract:
I have invented "Internet Fish," a novel class of resource-discovery tools designed to help users extract useful information from the Internet. Internet Fish (IFish) are semi-autonomous, persistent information brokers; users deploy individual IFish to gather and refine information related to a particular topic. An IFish will initiate research, continue to discover new sources of information, and keep tabs on new developments in that topic. As part of the information-gathering process the user interacts with his IFish to find out what it has learned, answer questions it has posed, and make suggestions for guidance. Internet Fish differ from other Internet resource-discovery systems in that they are persistent, personal, and dynamic. As part of the information-gathering process IFish conduct extended, long-term conversations with users as they explore. They incorporate deep structural knowledge of the organization and services of the net, and are also capable of on-the-fly reconfiguration, modification, and expansion. Human users may dynamically change an IFish in response to changes in the environment, or an IFish may initiate such changes itself. An IFish maintains internal state, including models of its own structure, behavior, information environment, and user; these models permit an IFish to perform meta-level reasoning about its own structure. To facilitate rapid assembly of particular IFish I have created the Internet Fish Construction Kit. This system provides enabling technology for the entire class of Internet Fish tools; it facilitates both the creation of new IFish and the addition of new capabilities to existing ones. The Construction Kit includes a collection of encapsulated heuristic knowledge modules that may be combined in mix-and-match fashion to create a particular IFish; interfaces to new services written with the Construction Kit may be immediately added to "live" IFish. Using the Construction Kit I have created a demonstration IFish specialized for finding World-Wide Web documents related to a given group of documents. This "Finder" IFish includes heuristics that describe how to interact with the Web in general, explain how to take advantage of various public indexes and classification schemes, and provide a method for discovering similarity relationships among documents.
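The mix-and-match construction-kit architecture described above can be suggested with a small plugin-style sketch; the module interface, class names, and behavior below are invented for illustration and are not the IFish Construction Kit API.

# Toy sketch of a mix-and-match construction kit: heuristic modules share
# a small interface and are composed into an agent at assembly time.
# The interface and module names are hypothetical, not the IFish API.
class HeuristicModule:
    def propose(self, topic, findings):
        """Return a list of new leads for the given topic."""
        raise NotImplementedError

class IndexLookup(HeuristicModule):
    def propose(self, topic, findings):
        return [f"search public index for '{topic}'"]

class FollowCitations(HeuristicModule):
    def propose(self, topic, findings):
        return [f"follow links from '{doc}'" for doc in findings]

class Fish:
    def __init__(self, modules):
        self.modules = list(modules)   # mix-and-match composition
        self.findings = []

    def add_module(self, module):      # extend a "live" fish
        self.modules.append(module)

    def step(self, topic):
        leads = []
        for m in self.modules:
            leads.extend(m.propose(topic, self.findings))
        return leads

finder = Fish([IndexLookup()])
finder.findings.append("doc-about-topic.html")
finder.add_module(FollowCitations())   # added while the fish is "live"
print(finder.step("information retrieval"))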
Abstract:
Urquhart, C., Lonsdale, R., Thomas, R., Spink, S., Yeoman, A., Armstrong, C. & Fenton, R. (2003). Uptake and use of electronic information services: trends in UK higher education from the JUSTEIS project. Program, 37(3), 167-180. Sponsorship: JISC
Abstract:
The poster presents the methods of communication with readers used at the Poznań University Library (Biblioteka Uniwersytecka w Poznaniu) in digital media technologies. Digital communication tools have become very helpful, almost indispensable, in attracting new readers and in maintaining and developing collaboration within the Web 2.0 community, both the global one and the local academic one. The website, communicatively static, is supported by discussion forums, chats, videoconferences, and information workshops conducted in real time. The creative power of social relations with the library has been developed by interactive social networking services (Facebook) and instant messengers integrated on the Ask a Librarian platform. The library has become a Library 2.0 oriented towards communication with the reader. Active participation of readers in the creation of scholarly resources has been implemented in the institutional repository project, the Adam Mickiewicz Repository (AMUR). The library is changing for readers and with readers. The platforms and social networking services in use provide unique data on the new information needs and expectations of the target Patron 2.0, which results in improving existing services and creating new ones. The library monitors its services and readers' needs through social research. Digital communication technologies make the library closer and more accessible, so that it ultimately becomes a partner for regular and new readers alike. The Poznań University Library participates in European programmes for cataloguing and digitizing the collections of the WBC digital library, for implementing new technologies and solutions that raise the quality of library services, for cultural activities (Poznańska Dyskusyjna Akademia Komiksu, deBiUty), and for information literacy education. The Poznań University Library is a member of international organizations: LIBER (Association of European Research Libraries), IAML (International Association of Music Libraries, Archives and Documentation Centres), and CERL (Consortium of European Research Libraries).
Abstract:
Mapping novel terrain from sparse, complex data often requires the resolution of conflicting information from sensors working at different times, locations, and scales, and from experts with different goals and situations. Information fusion methods help resolve inconsistencies in order to distinguish correct from incorrect answers, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods developed here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, or man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships. The procedure is illustrated with two image examples.
Abstract:
Classifying novel terrain or objects from sparse, complex data may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from sources with different goals and situations. Information fusion methods can help resolve inconsistencies, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods described here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, and man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships.
Abstract:
Classifying novel terrain or objects from sparse, complex data may require the resolution of conflicting information from sensors working at different times, locations, and scales, and from sources with different goals and situations. Information fusion methods can help resolve inconsistencies, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods described here address a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, and man-made. Underlying relationships among classes are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The fusion system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships. The procedure is illustrated with two image examples, but is not limited to the image domain.
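As a toy illustration only (not the ARTMAP network itself), the sketch below shows how multi-level relationships such as "vehicle subsumes car" can be read off from reliable but inconsistent multi-label evidence, using the rule that one label subsumes another when the second never occurs without the first; the example labels come from the abstract, and everything else is invented.

# Toy illustration (not ARTMAP) of inferring a label hierarchy from
# reliable-but-inconsistent multi-label evidence: label A subsumes
# label B when every example labeled B is also labeled A.
from itertools import permutations

examples = [                       # hypothetical fused labels per object
    {"car", "vehicle", "man-made"},
    {"truck", "vehicle", "man-made"},
    {"airplane", "vehicle", "man-made"},
    {"building", "man-made"},
]

labels = set().union(*examples)

def subsumes(a, b):
    """a subsumes b if b never occurs without a."""
    return all(a in ex for ex in examples if b in ex)

hierarchy = [(a, b) for a, b in permutations(labels, 2)
             if subsumes(a, b) and not subsumes(b, a)]
for parent, child in sorted(hierarchy):
    print(f"{parent} > {child}")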
Abstract:
To help design an environment in which professionals without legal training can make effective use of public sector legal information on planning and the environment (for Add-Wijzer, a European e-government project), we evaluated their perceptions of its usefulness and usability. In concurrent think-aloud usability tests, lawyers and non-lawyers carried out information retrieval tasks on a range of online legal databases. We found that non-lawyers reported twice as many difficulties as those with legal training (p = 0.001), that the number of difficulties and the choice of database affected successful completion, and that the non-lawyers had surprisingly few problems understanding legal terminology. Instead, they had more problems understanding the syntactical structure of legal documents and collections. The results support the constraint attunement hypothesis (CAH) of the effects of expertise on information retrieval, with implications for the design of systems to support the effective understanding and use of information.
Abstract:
Latent semantic indexing (LSI) is a popular technique used in information retrieval (IR) applications. This paper presents a novel evaluation strategy based on the use of image processing tools. The authors evaluate the use of the discrete cosine transform (DCT) and the Cohen-Daubechies-Feauveau 9/7 (CDF 9/7) wavelet transform as a pre-processing step before the singular value decomposition (SVD) stage of the LSI system. In addition, the effect of different threshold types on the search results is examined. The results show that accuracy can be increased by applying either transform as a pre-processing step, with better performance for the hard-threshold function. The choice of the best threshold value is a key factor in the transform process. This paper also describes the most effective structure for the database to facilitate efficient searching in the LSI system.
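A hedged sketch of the kind of pipeline the abstract evaluates: build a term-document matrix, apply a DCT, hard-threshold small coefficients, then run the usual LSI SVD and rank documents by cosine similarity. The threshold value, matrix orientation, rank, and toy corpus below are illustrative guesses, not the paper's settings.

# Sketch of LSI with a DCT pre-processing step and a hard threshold
# before the SVD. Threshold, rank, and data are illustrative only.
import numpy as np
from scipy.fft import dct, idct

docs = ["shipment of gold damaged in a fire",
        "delivery of silver arrived in a silver truck",
        "shipment of gold arrived in a truck"]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

# DCT over the term axis, hard-threshold small coefficients, invert.
C = dct(A, axis=0, norm="ortho")
C[np.abs(C) < 0.1] = 0.0               # hard threshold (illustrative value)
A_f = idct(C, axis=0, norm="ortho")

# Standard LSI: rank-k SVD, then fold a query into the latent space.
U, s, Vt = np.linalg.svd(A_f, full_matrices=False)
k = 2
q = np.array([["gold", "truck"].count(w) for w in vocab], float)
q_k = np.diag(1 / s[:k]) @ U[:, :k].T @ q      # query in LSI space
doc_k = Vt[:k, :].T                            # documents in LSI space
sims = doc_k @ q_k / (np.linalg.norm(doc_k, axis=1) * np.linalg.norm(q_k))
print(np.argsort(-sims))               # documents ranked by similarity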
Abstract:
Face recognition with unknown, partial distortion and occlusion is a practical problem, and has a wide range of applications, including security and multimedia information retrieval. The authors present a new approach to face recognition subject to unknown, partial distortion and occlusion. The new approach is based on a probabilistic decision-based neural network, enhanced by a statistical method called the posterior union model (PUM). PUM is an approach for ignoring severely mismatched local features and focusing the recognition mainly on the reliable local features. It thereby improves the robustness while assuming no prior information about the corruption. We call the new approach the posterior union decision-based neural network (PUDBNN). The new PUDBNN model has been evaluated on three face image databases (XM2VTS, AT&T and AR) using testing images subjected to various types of simulated and realistic partial distortion and occlusion. The new system has been compared to other approaches and has demonstrated improved performance.
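To illustrate the union idea behind PUM in a hedged, toy form (this is not the published PUDBNN model), the fragment below contrasts a product rule over local-feature match probabilities, which a single occluded region can destroy, with a probabilistic union, which remains dominated by the reliable features; all probabilities are invented.

# Toy contrast (not PUDBNN) between a product rule over local-feature
# match probabilities and a probabilistic union that tolerates one
# severely mismatched (e.g. occluded) local feature.
import numpy as np

def product_score(p):
    return float(np.prod(p))                       # all features must match

def union_score(p):
    return float(1 - np.prod(1 - np.asarray(p)))   # any feature may match

clean    = [0.9, 0.8, 0.9, 0.85]     # local match probabilities (invented)
occluded = [0.9, 0.8, 0.9, 0.01]     # one severely mismatched region

for name, p in [("clean", clean), ("occluded", occluded)]:
    print(name, "product:", round(product_score(p), 3),
          "union:", round(union_score(p), 3))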
Abstract:
The electronic storage of medical patient data is becoming a daily experience in most practices and hospitals worldwide. However, much of the available data is in free-form text, a convenient way of expressing concepts and events, but especially challenging if one wants to perform automatic searches, summarization, or statistical analysis. Information Extraction can relieve some of these problems by offering a semantically informed interpretation and abstraction of the texts. MedInX, the Medical Information eXtraction system presented in this document, is the first information extraction system developed to process textual clinical discharge records written in Portuguese. The main goal of the system is to improve access to the information locked up in unstructured text and, consequently, the efficiency of the health care process, by allowing faster and more reliable access to quality health information for both patients and health professionals. MedInX components are based on Natural Language Processing principles and provide several mechanisms to read, process, and utilize external resources, such as terminologies and ontologies, in the process of automatically mapping free-text reports onto a structured representation. The flexible and scalable architecture of the system also allowed its application to the task of Named Entity Recognition in a shared evaluation contest focused on Portuguese general-domain free-form texts. The evaluation of the system on a set of authentic hospital discharge letters indicates that it achieves 95% F-measure on the task of entity recognition and 95% precision on the task of relation extraction. Example applications, demonstrating the use of MedInX capabilities in real applications in the hospital setting, are also presented in this document. These applications were designed to answer common clinical problems related to the automatic coding of diagnoses and other health-related conditions described in the documents, according to the international classification systems ICD-9-CM and ICF. Another application, the MedInX Clinical Audit system, automatically reviews the content and completeness of the documents.
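As a deliberately simple, hypothetical sketch of terminology-driven extraction of the kind described above (not the MedInX pipeline itself), the fragment below maps free-text mentions onto codes from a tiny invented terminology; a real system adds NLP layers, negation handling, and full ICD-9-CM/ICF resources.

# Minimal sketch of terminology-driven extraction from free-text notes.
# The terminology, codes, and sample sentence are invented; this is not
# the MedInX system.
import re

TERMINOLOGY = {                      # hypothetical term -> code mapping
    "diabetes mellitus": ("ICD-9-CM", "250"),
    "hypertension": ("ICD-9-CM", "401"),
}

def extract_entities(text):
    found = []
    for term, (scheme, code) in TERMINOLOGY.items():
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            found.append({"term": term, "span": m.span(),
                          "scheme": scheme, "code": code})
    return found

note = "Patient with hypertension and known diabetes mellitus."
for entity in extract_entities(note):
    print(entity)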
Abstract:
We compare the effect of different text segmentation strategies on speech-based passage retrieval of video. Passage retrieval has mainly been studied to improve document retrieval and to enable question answering. In these domains the best results were obtained using passages defined by the paragraph structure of the source documents or by using arbitrary overlapping passages. For the retrieval of relevant passages in a video, using speech transcripts, no author-defined segmentation is available. We compare retrieval results from four different types of segments based on the speech channel of the video: fixed-length segments, a sliding window, semantically coherent segments, and prosodic segments. We evaluated the methods on the corpus of the MediaEval 2011 Rich Speech Retrieval task. Our main conclusion is that retrieval results depend strongly on the right choice of segment length. However, results using the segmentation into semantically coherent parts depend much less on the segment length. In particular, the quality of fixed-length and sliding-window segmentation drops fast as the segment length increases, while the quality of the semantically coherent segments is much more stable. Thus, if coherent segments are defined, longer segments can be used and consequently fewer segments have to be considered at retrieval time.
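A short sketch of two of the four segmentation strategies compared above, fixed-length segments and a sliding window over a word-level transcript; the semantically coherent and prosodic segmentations require additional models and are omitted, and the transcript here is invented.

# Sketch of fixed-length and sliding-window segmentation of a
# word-level speech transcript. Toy data; segment lengths in words.
def fixed_length_segments(words, length):
    return [words[i:i + length] for i in range(0, len(words), length)]

def sliding_window_segments(words, length, step):
    return [words[i:i + length]
            for i in range(0, max(len(words) - length, 0) + 1, step)]

transcript = ("welcome back today we discuss passage retrieval from "
              "speech transcripts and how segment length matters").split()
print(fixed_length_segments(transcript, 6))
print(sliding_window_segments(transcript, 6, 3))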