866 results for twitter, conversation retrieval


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We first discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation; and an expression language for digital cultural objects. In addition, we discuss our experience in testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in various languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates the techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document which contains the query term as a sequence of Chinese characters may not actually be relevant to the query: the query term (as a sequence of Chinese characters) may not be a valid Chinese word in that document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with these problems. In the first approach, we propose a hybrid Chinese information retrieval model by incorporating word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents based on the relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if it incorporates only Chinese segmentation with the traditional character-based approach. 
In the second approach, we propose a novel query expansion method which applies text mining techniques in order to find the most relevant words to extend the query. Unlike most existing query expansion methods, which generally select the highly frequent indexing terms from the retrieved documents to expand the query, our approach utilizes text mining techniques to find patterns from the retrieved documents that highly correlate with the query term and then uses the relevant words in the patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage is to investigate whether high-accuracy segmentation can improve Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented and a further experiment is conducted to compare the performance of the standard Rocchio approach with that of the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experimental results show that by incorporating the text mining based query expansion with the hybrid model, significant improvement has been achieved in both precision and recall assessments.
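
The abstract above uses the standard Rocchio approach as its baseline for query expansion. As a rough illustration of that baseline only (not the thesis's text mining method, whose details the abstract does not give), a minimal sketch follows. The function name `rocchio_expand`, the parameter defaults, and the length-normalised term frequency are assumptions made for this example, and the negative (non-relevant) component of Rocchio is omitted for brevity.

```python
from collections import Counter


def rocchio_expand(query_terms, relevant_docs, alpha=1.0, beta=0.75, top_k=3):
    """Return the query terms plus the top_k highest-weighted new terms.

    relevant_docs is a list of token lists (already segmented).
    Simplified positive-feedback Rocchio: weight = alpha * query + beta * centroid.
    """
    # Start with the original query vector.
    weights = Counter({term: alpha for term in query_terms})
    # Add the centroid of the (pseudo-)relevant documents.
    for doc in relevant_docs:
        for term, count in Counter(doc).items():
            # Length-normalised term frequency, averaged over the feedback docs.
            weights[term] += beta * count / (len(doc) * len(relevant_docs))
    # Rank candidate expansion terms by weight, breaking ties alphabetically.
    candidates = sorted(
        ((term, w) for term, w in weights.items() if term not in query_terms),
        key=lambda tw: (-tw[1], tw[0]),
    )
    return list(query_terms) + [term for term, _ in candidates[:top_k]]


feedback = [
    ["retrieval", "chinese", "index"],
    ["chinese", "retrieval", "model"],
]
expanded = rocchio_expand(["retrieval"], feedback, top_k=1)
```

Here `expanded` keeps the original term and appends the term that is most prominent across the feedback documents. A frequency-driven choice like this is what the abstract contrasts with pattern-based selection.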

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Information Retrieval is an important albeit imperfect component of information technologies. A problem of insufficient diversity of retrieved documents is one of the primary issues studied in this research. This study shows that this problem leads to a decrease of precision and recall, traditional measures of information retrieval effectiveness. This thesis presents an adaptive IR system based on the theory of adaptive dual control. The aim of the approach is the optimization of retrieval precision after all feedback has been issued. This is done by increasing the diversity of retrieved documents. This study shows that the value of recall reflects this diversity. The Probability Ranking Principle is viewed in the literature as the “bedrock” of current probabilistic Information Retrieval theory. Neither the proposed approach nor other methods of diversification of retrieved documents from the literature conform to this principle. This study shows by counterexample that the Probability Ranking Principle does not in general lead to optimal precision in a search session with feedback (for which it may not have been designed but is actively used). Retrieval precision of the search session should be optimized with a multistage stochastic programming model to accomplish the aim. However, such models are computationally intractable. Therefore, approximate linear multistage stochastic programming models are derived in this study, where the multistage improvement of the probability distribution is modelled using the proposed feedback correctness method. The proposed optimization models are based on several assumptions, starting with the assumption that Information Retrieval is conducted in units of topics. The use of clusters is the primary reason why a new method of probability estimation is proposed. The adaptive dual control of topic-based IR system was evaluated in a series of experiments conducted on the Reuters, Wikipedia and TREC collections of documents. 
The Wikipedia experiment revealed that the dual control feedback mechanism improves precision and S-recall when all the underlying assumptions are satisfied. In the TREC experiment, this feedback mechanism was compared to a state-of-the-art adaptive IR system based on BM-25 term weighting and the Rocchio relevance feedback algorithm. The baseline system exhibited better effectiveness than the cluster-based optimization model of ADTIR. The main reason for this was insufficient quality of the generated clusters in the TREC collection that violated the underlying assumption.
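
The abstract above reports S-recall alongside precision. Assuming S-recall here means subtopic recall as commonly used in diversity evaluation (the fraction of a topic's subtopics covered by the top-k results), it can be computed as in the sketch below. The function name and the data layout are hypothetical and not taken from the thesis.

```python
def s_recall_at_k(ranking, doc_subtopics, all_subtopics, k):
    """Fraction of the topic's subtopics covered by the top-k ranked documents.

    ranking: list of document ids, best first.
    doc_subtopics: dict mapping a document id to the set of subtopics it covers.
    all_subtopics: collection of every subtopic defined for the topic.
    """
    subtopics = set(all_subtopics)
    if not subtopics:
        raise ValueError("the topic must define at least one subtopic")
    covered = set()
    for doc_id in ranking[:k]:
        # Documents without judgements contribute no subtopics.
        covered |= doc_subtopics.get(doc_id, set())
    return len(covered & subtopics) / len(subtopics)


judgements = {"d1": {"a"}, "d2": {"a"}, "d3": {"b", "c"}}
redundant_top2 = s_recall_at_k(["d1", "d2", "d3"], judgements, {"a", "b", "c"}, k=2)
diverse_top2 = s_recall_at_k(["d1", "d3", "d2"], judgements, {"a", "b", "c"}, k=2)
```

In this toy example both rankings return two relevant documents in the top two, but only the second covers all three subtopics. That difference is the kind of diversity effect the abstract refers to.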

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recent years have seen an increased uptake of business process management technology in industries. This has resulted in organizations trying to manage large collections of business process models. One of the challenges facing these organizations concerns the retrieval of models from large business process model repositories. For example, in some cases new process models may be derived from existing models, thus finding these models and adapting them may be more effective than developing them from scratch. As process model repositories may be large, query evaluation may be time consuming. Hence, we investigate the use of indexes to speed up this evaluation process. Experiments are conducted to demonstrate that our proposal achieves a significant reduction in query evaluation time.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper we present a novel platform for underwater sensor networks to be used for long-term monitoring of coral reefs and fisheries. The sensor network consists of static and mobile underwater sensor nodes. The nodes communicate point-to-point using a novel high-speed optical communication system integrated into the TinyOS stack, and they broadcast using an acoustic protocol integrated in the TinyOS stack. The nodes have a variety of sensing capabilities, including cameras, water temperature, and pressure. The mobile nodes can locate and hover above the static nodes for data muling, and they can perform network maintenance functions such as deployment, relocation, and recovery. In this paper we describe the hardware and software architecture of this underwater sensor network. We then describe the optical and acoustic networking protocols and present experimental networking and data collected in a pool, in rivers, and in the ocean. Finally, we describe our experiments with mobility for data muling in this network.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Most information retrieval (IR) models treat the presence of a term within a document as an indication that the document is somehow "about" that term; they do not take into account when a term might be explicitly negated. Medical data, by its nature, contains a high frequency of negated terms, e.g. "review of systems showed no chest pain or shortness of breath". This paper presents a study of the effects of negation on information retrieval. We present a number of experiments to determine whether negation has a significant negative effect on IR performance and whether language models that take negation into account might improve performance. We use a collection of real medical records as our test corpus. Our findings are that negation has some effect on system performance, but this will likely be confined to domains such as medical data where negation is prevalent.
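
To make the negation problem in the abstract above concrete, the sketch below separates affirmed from negated terms with a very small trigger-word heuristic. This is an illustrative assumption, not the method used in the paper: the trigger list, the scope rule (negation lasts until the end of the clause or a scope-breaking word), and the function name `split_negated` are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
NEGATION_TRIGGERS = {"no", "not", "without", "denies"}
SCOPE_BREAKERS = {"but", "however"}


def split_negated(text):
    """Return (affirmed_terms, negated_terms) for a piece of clinical text."""
    affirmed, negated = [], []
    # Negation scope never crosses sentence-level punctuation.
    for clause in re.split(r"[.;:!?]", text.lower()):
        in_scope = False
        for token in re.findall(r"[a-z]+", clause):
            if token in NEGATION_TRIGGERS:
                in_scope = True       # everything after the trigger is negated
            elif token in SCOPE_BREAKERS:
                in_scope = False      # e.g. "no fever but cough"
            elif in_scope:
                negated.append(token)
            else:
                affirmed.append(token)
    return affirmed, negated


affirmed, negated = split_negated(
    "Review of systems showed no chest pain or shortness of breath."
)
```

An indexer could then index `negated` terms in a separate field or drop them, so that a query for "chest pain" does not match this record on term presence alone.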

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This is an edited version of an interview recorded for Canadian Theatre Review in 1992. By that time Nowra had established a reputation as one of Australia's foremost playwrights. Part of the generation which succeeded the New Wave of the late 1960s and early 1970s, Nowra became known for a stylistic inventiveness which placed him outside the tradition of realist playwriting in Australia. The international outlook in his early plays, and the fact that he was not exclusively preoccupied with Australian settings and subject matter, was often a focal point in critical accounts of his work. In this interview Nowra discusses his 'internationalism', and a range of topics including the playwriting process; the presence of landscape in his plays; and the autobiographical elements in his work.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document which contains the query term as a sequence of Chinese characters may not actually be relevant to the query: the query term (as a sequence of Chinese characters) may not be a valid Chinese word in that document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose a hybrid Chinese information retrieval model by incorporating word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents based on the relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if it incorporates only Chinese segmentation with the traditional character-based approach.
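
The abstract above does not say how the character-based and word-based rankings are combined. One plausible reading, shown here purely as an assumption, is a linear interpolation of two scores. The scoring functions, the weight `lam`, and the pre-segmented word lists are hypothetical; a real system would use a proper segmenter and a ranking model such as BM25.

```python
def char_bigrams(text):
    """Set of overlapping character bigrams (single characters for short text)."""
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def char_score(query, doc_text):
    """Fraction of the query's character bigrams that occur in the document."""
    grams = char_bigrams(query)
    return len(grams & char_bigrams(doc_text)) / len(grams) if grams else 0.0


def word_score(query_words, doc_words):
    """Fraction of the query's words found among the document's segmented words."""
    words = set(query_words)
    return len(words & set(doc_words)) / len(words) if words else 0.0


def hybrid_score(query, query_words, doc_text, doc_words, lam=0.5):
    """Linear interpolation of character-based and word-based evidence."""
    return lam * char_score(query, doc_text) + (1 - lam) * word_score(query_words, doc_words)


# The query word 和服 (kimono) appears in the first document only as a character
# sequence spanning two other words (和 + 服务), so the word-based score is zero.
false_match = hybrid_score("和服", ["和服"], "产品和服务", ["产品", "和", "服务"])
true_match = hybrid_score("和服", ["和服"], "她穿和服", ["她", "穿", "和服"])
```

Both documents match at the character level, but only the second contains the query as a segmented word, so the hybrid score ranks it higher. This is the false-match problem the abstract describes.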

Relevance:

20.00% 20.00%

Publisher:

Abstract:

How does the image of the future operate upon history, and upon national and individual identities? To what extent are possible futures colonized by the image? What are the un-said futurecratic discourses that underlie the image of the future? Such questions inspired the examination of Japan’s futures images in this thesis. The theoretical point of departure for this examination is Polak’s (1973) seminal research into the theory of the ‘image of the future’ and seven contemporary Japanese texts which offer various alternative images for Japan’s futures, selected as representative of a ‘national conversation’ about the futures of that nation. These seven images of the future are: 1. Report of the Prime Minister’s Commission on Japan’s Goals in the 21st Century—The Frontier Within: Individual Empowerment and Better Governance in the New Millennium, compiled by a committee headed by Japan’s preeminent Jungian psychologist Kawai Hayao (1928-2007); 2. Slow Is Beautiful—a publication by Tsuji Shinichi, in which he re-images Japan as a culture represented by the metaphor of the sloth, concerned with slow and quality-oriented livingry as a preferred image of the future to Japan’s current post-bubble cult of speed and economic efficiency; 3. MuRatopia is an image of the future in the form of a microcosmic prototype community and on-going project based on the historically significant island of Awaji, and established by Japanese economist and futures thinker Yamaguchi Kaoru; 4. F.U.C.K, I Love Japan, by author Tanja Yujiro provides this seven text image of the future line-up with a youth oriented sub-culture perspective on that nation’s futures; 5. IMAGINATION / CREATION—a compilation of round table discussions about Japan’s futures seen from the point of view of Japan’s creative vanguard; 6. Visionary People in a Visionless Country: 21 Earth Connecting Human Stories is a collection of twenty one essays compiled by Denmark born Tokyo resident Peter David Pedersen; and, 7. 
EXODUS to the Land of Hope, authored by Murakami Ryu, one of Japan’s most prolific and influential writers, this novel suggests a future scenario portraying a massive exodus of Japan’s youth, who, literate with state-of-the-art information and communication technologies (ICTs) move en masse to Japan’s northern island of Hokkaido to launch a cyber-revolution from the peripheries. The thesis employs a Futures Triangle Analysis (FTA) as the macro organizing framework and as such examines both pushes of the present and weights from the past before moving to focus on the pulls to the future represented by the seven texts mentioned above. Inayatullah’s (1999) Causal Layered Analysis (CLA) is the analytical framework used in examining the texts. Poststructuralist concepts derived primarily from the work of Michel Foucault are a particular (but not exclusive) reference point for the analytical approach it encompasses. The research questions which reflect the triangulated analytic matrix are: 1. What are the pushes—in terms of current trends—that are affecting Japan’s futures? 2. What are the historical and cultural weights that influence Japan’s futures? 3. What are the emerging transformative Japanese images of the future discourses, as embodied in actual texts, and what potential do they offer for transformative change in Japan? Research questions one and two are discussed in Chapter five and research question three is discussed in Chapter six. The first two research questions should be considered preliminary. The weights outlined in Chapter five indicate that the forces working against change in Japan are formidable, structurally deep-rooted, wide-spread, and under-recognized as change-averse. Findings and analyses of the push dimension reveal strong forces towards a potentially very different type of Japan. 
However it is the seven contemporary Japanese images of the future, from which there is hope for transformative potential, which form the analytical heart of the thesis. In analyzing these texts the thesis establishes the richness of Japan’s images of the future and, as such, demonstrates the robustness of Japan’s stance vis-à-vis the problem of a perceived map-less and model-less future for Japan. Frontier is a useful image of the future, whose hybrid textuality, consisting of government, business, academia, and creative minority perspectives, demonstrates the earnestness of Japan’s leaders in favour of the creation of innovative futures for that nation. Slow is powerful in its aim to reconceptualize Japan’s philosophies of temporality, and build a new kind of nation founded on the principles of a human-oriented and expanded vision of economy based around the core metaphor of slowness culture. However its viability in Japan, with its post-Meiji historical pushes to an increasingly speed-obsessed social construction of reality, could render it impotent. MuRatopia is compelling in its creative hybridity indicative of an advanced IT society, set in a modern day utopian space based upon principles of a high communicative social paradigm, and sustainability. IMAGINATION / CREATION is less the plan than the platform for a new discussion on Japan’s transformation from an econo-centric social framework to a new Creative Age. It accords with emerging discourses from the Creative Industries, which would re-conceive of Japan as a leading maker of meaning, rather than as the so-called guzu, a term referred to in the book meaning ‘laggard’. In total, Love Japan is still the most idiosyncratic of all the images of the future discussed. Its communication style, which appeals to Japan’s youth cohort, establishes it as a potentially formidable change agent in a competitive market of futures images. 
Visionary People is a compelling image for its revolutionary and subversive stance against Japan’s vision-less political leadership, showing that it is the people, not the futures-making elite or aristocracy who must take the lead and create a new vanguard for the nation. Finally, Murakami’s Exodus cannot be ruled out as a compelling image of the future. Sharing the appeal of Tanja’s Love Japan to an increasingly disenfranchised youth, Exodus portrays a near-term future that is achievable in the here and now, by Japan’s teenagers, using information and communications technologies (ICTs) to subvert leadership, and create utopianist communities based on alternative social principles. The principal contribution from this investigation in terms of theory belongs to that of developing the Japanese image of the future. In this respect, the literature reviews represent a significant compilation, specifically about Japanese futures thinking, the Japanese image of the future, and the Japanese utopia. Though not exhaustive, this compilation will hopefully serve as a useful starting point for future research, not only for the Japanese image of the future, but also for all image of the future research. Many of the sources are in Japanese and their English summations are an added reason to respect this achievement. Secondly, the seven images of the future analysed in Chapter six represent the first time that Japanese image of the future texts have been systematically organized and analysed. Their translation from Japanese to English can be claimed as a significant secondary contribution. What is more, they have been analysed according to current futures methodologies that reveal a layeredness, depth, and overall richness existing in Japanese futures images. 
Revealing this image-richness has been one of the most significant findings of this investigation, suggesting that there is fertile research to be found from this still under-explored field, whose implications go beyond domestic Japanese concerns, and may offer fertile material for futures thinkers and researchers, Japanologists, social planners, and policy makers.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a framework for evaluating information retrieval of medical records. We use the BLULab corpus, a large collection of real-world de-identified medical records. The collection has been hand coded by clinical terminologists using the ICD-9 medical classification system. The ICD codes are used to devise queries and relevance judgements for this collection. Results of initial test runs using a baseline IR system are provided. Queries and relevance judgements are online to aid further research in medical IR. Please visit: http://koopman.id.au/med_eval.
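
The abstract above says ICD codes are used to devise queries and relevance judgements but does not describe the procedure. One simple interpretation, given here only as an assumption, treats a record as relevant to a query whenever the record was coded with the ICD-9 code the query was derived from. The function names, record ids, and code assignments below are made up for illustration.

```python
def build_qrels(doc_codes, query_codes):
    """Map each query id to the set of record ids coded with the query's ICD-9 code.

    doc_codes: dict mapping a record id to the set of ICD-9 codes assigned to it.
    query_codes: dict mapping a query id to the single code it was derived from.
    """
    return {
        qid: {doc_id for doc_id, codes in doc_codes.items() if code in codes}
        for qid, code in query_codes.items()
    }


def precision_at_k(ranking, relevant, k):
    """Share of the top-k retrieved records that are judged relevant."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


# Illustrative code assignments, not taken from the BLULab corpus.
records = {"r1": {"786.50", "401.9"}, "r2": {"401.9"}, "r3": {"250.00"}}
qrels = build_qrels(records, {"q1": "401.9"})
p_at_2 = precision_at_k(["r2", "r3", "r1"], qrels["q1"], k=2)
```

Judgements derived this way inherit any gaps in the manual coding, so they are best read as an approximation of relevance.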