800 results for information bottleneck method


Relevance:

30.00%

Publisher:

Abstract:

The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document that contains the query term as a sequence of Chinese characters may not actually be relevant to the query: the query term (as a sequence of Chinese characters) may not be a valid Chinese word in that document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with these problems. In the first approach, we propose a hybrid Chinese information retrieval model that combines word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents by their relevance to the query, calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if Chinese segmentation alone is combined with the traditional character-based approach.
In the second approach, we propose a novel query expansion method that applies text mining techniques to find the most relevant words with which to extend the query. Unlike most existing query expansion methods, which generally select highly frequent indexing terms from the retrieved documents to expand the query, our approach uses text mining techniques to find patterns in the retrieved documents that correlate highly with the query term and then uses the relevant words in those patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage investigates whether high-accuracy segmentation can improve Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented, and a further experiment compares the performance of the standard Rocchio approach with that of the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experimental results show that incorporating the text mining based query expansion into the hybrid model achieves significant improvement in both precision and recall.
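
For orientation, the standard Rocchio baseline named in the abstract can be sketched as follows. This is only a minimal illustration of the textbook formula, not the thesis's implementation: the term-weight dictionaries, the function name `rocchio`, and the weights `alpha`, `beta` and `gamma` (common textbook defaults) are all assumptions.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an expanded query vector (term -> weight).

    query, and each document in relevant / non_relevant, is a dict
    mapping a term to its weight. The weights alpha, beta and gamma
    are assumed textbook defaults, not values from the abstract.
    """
    expanded = defaultdict(float)
    for term, weight in query.items():
        expanded[term] += alpha * weight
    # Move the query towards the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            expanded[term] += beta * weight / len(relevant)
    # Move the query away from the centroid of the non-relevant documents.
    for doc in non_relevant:
        for term, weight in doc.items():
            expanded[term] -= gamma * weight / len(non_relevant)
    # Terms whose weight is not positive are dropped.
    return {term: w for term, w in expanded.items() if w > 0}


# Hypothetical toy vectors, for illustration only.
expanded = rocchio(
    query={"a": 1.0},
    relevant=[{"a": 1.0, "b": 2.0}],
    non_relevant=[{"c": 1.0}],
)
```

In this toy case the expanded query keeps term "a", gains term "b" from the relevant document, and drops term "c" because its weight becomes negative.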

Relevance:

30.00%

Publisher:

Abstract:

Information Retrieval is an important albeit imperfect component of information technologies. The problem of insufficient diversity of retrieved documents is one of the primary issues studied in this research. This study shows that this problem leads to a decrease in precision and recall, the traditional measures of information retrieval effectiveness. This thesis presents an adaptive IR system based on the theory of adaptive dual control. The aim of the approach is to optimize retrieval precision after all feedback has been issued. This is done by increasing the diversity of retrieved documents. This study shows that the value of recall reflects this diversity. The Probability Ranking Principle is viewed in the literature as the “bedrock” of current probabilistic Information Retrieval theory. Neither the proposed approach nor other methods of diversification of retrieved documents from the literature conform to this principle. This study shows by counterexample that the Probability Ranking Principle does not in general lead to optimal precision in a search session with feedback (for which it may not have been designed but is actively used). To accomplish the aim, the retrieval precision of the search session should be optimized with a multistage stochastic programming model. However, such models are computationally intractable. Therefore, approximate linear multistage stochastic programming models are derived in this study, where the multistage improvement of the probability distribution is modelled using the proposed feedback correctness method. The proposed optimization models are based on several assumptions, starting with the assumption that Information Retrieval is conducted in units of topics. The use of clusters is the primary reason why a new method of probability estimation is proposed. The adaptive dual control of the topic-based IR system was evaluated in a series of experiments conducted on the Reuters, Wikipedia and TREC collections of documents.
The Wikipedia experiment revealed that the dual control feedback mechanism improves precision and S-recall when all the underlying assumptions are satisfied. In the TREC experiment, this feedback mechanism was compared to a state-of-the-art adaptive IR system based on BM-25 term weighting and the Rocchio relevance feedback algorithm. The baseline system exhibited better effectiveness than the cluster-based optimization model of ADTIR. The main reason for this was insufficient quality of the generated clusters in the TREC collection that violated the underlying assumption.

Relevance:

30.00%

Publisher:

Abstract:

This paper presents the results from a study of information behaviors in the context of people's everyday lives undertaken in order to develop an integrated model of information behavior (IB). 34 participants from across 6 countries maintained a daily information journal or diary – mainly through a secure web log – for two weeks, to an aggregate of 468 participant days over five months. The text-rich diary data was analyzed using a multi-method qualitative-quantitative analysis in the following order: Grounded Theory analysis with manual coding, automated concept analysis using thesaurus-based visualization, and finally a statistical analysis of the coding data. The findings indicate that people engage in several information behaviors simultaneously throughout their everyday lives (including home and work life) and that sense-making is entangled in all aspects of them. Participants engaged in many of the information behaviors in a parallel, distributed, and concurrent fashion: many information behaviors for one information problem, one information behavior across many information problems, and many information behaviors concurrently across many information problems. Findings indicate also that information avoidance – both active and passive avoidance – is a common phenomenon and that information organizing behaviors or the lack thereof caused the most problems for participants. An integrated model of information behaviors is presented based on the findings.

Relevance:

30.00%

Publisher:

Abstract:

A statistical modeling method to accurately determine combustion chamber resonance is proposed and demonstrated. This method utilises Markov-chain Monte Carlo (MCMC) through the use of the Metropolis-Hastings (MH) algorithm to yield a probability density function for the combustion chamber frequency and find the best estimate of the resonant frequency, along with uncertainty. The accurate determination of combustion chamber resonance is then used to investigate various engine phenomena, with appropriate uncertainty, for a range of engine cycles. It is shown that, when operating on various ethanol/diesel fuel combinations, a 20% substitution yields the least amount of inter-cycle variability, in relation to combustion chamber resonance.
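
The sampling step described above can be illustrated with a minimal random-walk Metropolis-Hastings sketch. It is not the authors' model: the log-posterior below is a hypothetical stand-in (a Gaussian centred at 6.0 kHz with a 0.2 kHz spread), and the function names, step size, chain length and burn-in are assumptions chosen only to show how a best estimate and its uncertainty fall out of the samples.

```python
import math
import random


def metropolis_hastings(log_density, start, step, n_samples, seed=0):
    """Random-walk Metropolis-Hastings sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_lp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal around the current state.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def log_posterior(freq_khz):
    """Hypothetical unnormalised log-posterior of the resonant frequency."""
    return -0.5 * ((freq_khz - 6.0) / 0.2) ** 2


samples = metropolis_hastings(log_posterior, start=5.0, step=0.5, n_samples=20000)
kept = samples[2000:]  # discard burn-in (assumed length)
estimate = sum(kept) / len(kept)
spread = math.sqrt(sum((s - estimate) ** 2 for s in kept) / len(kept))
```

Here `estimate` plays the role of the best estimate of the resonant frequency and `spread` the role of its uncertainty. A real application would replace `log_posterior` with a likelihood built from measured in-cylinder pressure data.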

Relevance:

30.00%

Publisher:

Abstract:

There is a need for educational frameworks for computer ethics education. This discussion paper presents an approach to developing students’ moral sensitivity, an awareness of morally relevant issues, in project-based learning (PjBL). The proposed approach is based on a study of IT professionals’ levels of awareness of ethics. These levels are labelled My world, The corporate world, A shared world, The client’s world and The wider world. We give recommendations for how instructors may stimulate students’ thinking with the levels and how the levels may be taken into account in managing a project course and in an IS department. Limitations of the recommendations are assessed and issues for discussion are raised.

Relevance:

30.00%

Publisher:

Abstract:

Information behaviour (IB) is an area within Library and Information Science that studies the totality of human behaviour in relation to information, both active and passive, along with the explicit and the tacit mental states related to information. This study reports on recently completed dissertation research that integrates the different models of information behaviour using a diary study in which 34 participants maintained a daily journal for two weeks through a web log or paper diary. This resulted in thick descriptions of IB, which were manually analysed using the Grounded Theory method of inquiry, and then cross-referenced through both text-analysis and statistical analysis programs. Among the many key findings of this study, one is the focus of this paper: how participants express their feelings about the information seeking process and their mental and affective states related specifically to the sense-making component, which co-occurs with almost every other aspect of information behaviour. The paper title – Down the Rabbit Hole and Through the Looking Glass – refers to an observation that some of the participants made in their journals when they searched for, or avoided, information: they wrote that they felt like they had fallen into a rabbit hole where nothing made sense, and reported both positive feelings of surprise and amazement, and negative feelings of confusion, puzzlement, apprehensiveness, frustration, stress, ambiguity, and fatigue. The study situates these sense-making aspects of IB within an overarching model of information behaviour that includes IB concepts like monitoring information, encountering information, information seeking and searching, flow, multitasking, information grounds, information horizons, and more, and proposes an integrated model of information behaviour illuminating how these different concepts are interleaved and inter-connected with each other, along with its implications for information services.

Relevance:

30.00%

Publisher:

Abstract:

Recommender systems are one of the recent inventions to deal with ever-growing information overload. Collaborative filtering seems to be the most popular technique in recommender systems. With sufficient background information on item ratings, its performance is promising. But research shows that it performs very poorly in a cold-start situation where previous rating data is sparse. As an alternative, trust can be used for neighbor formation to generate automated recommendations. User-assigned explicit trust ratings, such as how much users trust each other, are used for this purpose. However, reliable explicit trust data is not always available. In this paper we propose a new method of developing trust networks based on users’ interest similarity in the absence of explicit trust data. To identify interest similarity, we use users’ personalized tagging information. This trust network can be used to find neighbors to make automated recommendations. Our experimental results show that the proposed trust-based method outperforms the traditional collaborative filtering approach, which uses users’ rating data. Its performance improves even further when we utilize trust propagation techniques to broaden the range of the neighborhood.
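
The idea of deriving trust links from tagging behaviour can be sketched as below. The abstract does not say which similarity measure the authors use, so cosine similarity over tag counts, the `threshold` value, and the names `cosine` and `build_trust_network` are assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two tag-count profiles."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_trust_network(user_tags, threshold=0.5):
    """Link users whose tag profiles are similar enough (assumed rule).

    user_tags maps a user to the list of tags that user has applied.
    The result maps each user to {neighbour: similarity}.
    """
    profiles = {user: Counter(tags) for user, tags in user_tags.items()}
    users = sorted(profiles)
    network = {user: {} for user in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            similarity = cosine(profiles[u], profiles[v])
            if similarity >= threshold:
                network[u][v] = similarity
                network[v][u] = similarity
    return network


# Hypothetical users and tags, for illustration only.
network = build_trust_network({
    "ann": ["python", "ml", "ml"],
    "bob": ["ml", "python"],
    "cat": ["cooking"],
})
```

In this toy data "ann" and "bob" become neighbours because their tag profiles overlap, while "cat" has no neighbours. Trust propagation, mentioned in the abstract, would extend such a network beyond direct neighbours and is not shown here.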

Relevance:

30.00%

Publisher:

Abstract:

Trust can be used for neighbor formation to generate automated recommendations. User-assigned explicit rating data can be used for this purpose. However, explicit rating data is not always available. In this paper we present a new method of generating a trust network based on users’ interest similarity. To identify interest similarity, we use users’ personalized tag information. This trust network can be used to find neighbors to make automated recommendations. Our experimental results show that the proposed method achieves higher precision than the traditional collaborative filtering approach.

Relevance:

30.00%

Publisher:

Abstract:

In this paper, I show clear links between the theoretical underpinnings of SFL and those of specific sociological, anthropological, and communication research traditions. My purpose in doing so is to argue that SFL is an excellent interdisciplinary research method for the social sciences, especially considering the emergent form of political economy being touted by new media enthusiasts: the so-called knowledge (or information) economy. To demonstrate the flexibility and salience of SFL in diverse traditions of social research, and as evidence of its ability to be deployed as a flexible research method across formerly impermeable disciplinary and social boundaries, I use analyses from my doctoral research, relating these - theoretically speaking - to specific research traditions in sociology, communication, and anthropology.

Relevance:

30.00%

Publisher:

Abstract:

Aims. This article is a report of a study done to identify how renal nurses experience information about renal care and the information practices that they used to support everyday practice. Background. What counts as nursing knowledge remains a contested area in the discipline yet little research has been undertaken. Information practice encompasses a range of activities such as seeking, evaluation and sharing of information. The ability to make informed judgement is dependent on nurses being able to identify relevant sources of information that inform their practice and those sources of information may enable the identification of what knowledge is important to nursing practice. Method. The study was philosophically framed from a practice perspective and informed by Habermas and Schatzki; it employed qualitative research techniques. Using purposive sampling six registered nurses working in two regional renal units were interviewed during 2009 and data was thematically analysed. Findings. The information practices of renal nurses involved mapping an information landscape in which they drew on information obtained from epistemic, social and corporeal sources. They also used coupling, a process of drawing together information from a range of sources, to enable them to practice. Conclusion. Exploring how nurses engage with information, and the role the information plays in situating and enacting epistemic, social and corporeal knowledge into everyday nursing practice is instructive because it indicates that nurses must engage with all three modalities in order to perform effectively, efficiently and holistically in the context of patient care. © 2011 The Authors. Journal of Advanced Nursing © 2011 Blackwell Publishing Ltd.