908 results for Information search
Abstract:
Experimental / pilot online journalistic publication. EUAustralia Online (www.euaustralia.com) is a pilot niche publication identifying and demonstrating dynamics of online journalism. The editor, an experienced senior journalist and academic specialising in European studies, commenced publication on 28.8.06 during one year's "industry immersion", with media accreditation to the European Commission, Brussels. Reporting is now from Australia, and from Europe on field-trip exercises. Student editors participate, making it partly a training operation. EUAustralia demonstrates adaptation of conventional, universal, "Western" liberal journalistic practices. Its first premise is to fill a knowledge gap in Australia about the European Union -- its institutions, functions and directions. The second premise is to test the communications capacity of the online format, where the publication sets a strong standard of journalistic credibility, hence its transparency with sourcing and its signposting of "commentary" or "opinion". EUAustralia uses modified, enhanced weblog software allowing for future allocation of closed pages to subscribers. An early and well-regarded exemplar of its kind, with a modest upload rate (2010-13 average, 16 postings monthly), it commands over 180,000 site visits p.a. (half as unique visitors; AWB Statistics) and is strongly rated by search engines; see page-one Google placements for "EU Australia". Comment by the ISP (SeventhVision, Broadbeach, Queensland): "The site has good search engine recognition because it is seen as credible; it can be used to generate revenue". This journalistic exercise has been analysed in theoretical context twice, in published refereed conference proceedings (Communication and Media Policy Forum, Sydney; 2007, 2009).
Abstract:
Information Retrieval (IR) is an important albeit imperfect component of information technologies. A problem of insufficient diversity of retrieved documents is one of the primary issues studied in this research. This study shows that this problem leads to a decrease in precision and recall, the traditional measures of information retrieval effectiveness. This thesis presents an adaptive IR system based on the theory of adaptive dual control. The aim of the approach is the optimization of retrieval precision after all feedback has been issued. This is done by increasing the diversity of retrieved documents. This study shows that the value of recall reflects this diversity. The Probability Ranking Principle is viewed in the literature as the "bedrock" of current probabilistic Information Retrieval theory. Neither the proposed approach nor the other document-diversification methods from the literature conform to this principle. This study shows by counterexample that the Probability Ranking Principle does not in general lead to optimal precision in a search session with feedback (a setting for which it may not have been designed but in which it is actively used). To accomplish the aim, retrieval precision of the search session should be optimized with a multistage stochastic programming model; however, such models are computationally intractable. Therefore, approximate linear multistage stochastic programming models are derived in this study, where the multistage improvement of the probability distribution is modelled using the proposed feedback correctness method. The proposed optimization models are based on several assumptions, starting with the assumption that Information Retrieval is conducted in units of topics. The use of clusters is the primary reason why a new method of probability estimation is proposed. The adaptive dual-control topic-based IR (ADTIR) system was evaluated in a series of experiments conducted on the Reuters, Wikipedia and TREC document collections. The Wikipedia experiment revealed that the dual control feedback mechanism improves precision and S-recall when all the underlying assumptions are satisfied. In the TREC experiment, this feedback mechanism was compared to a state-of-the-art adaptive IR system based on BM-25 term weighting and the Rocchio relevance feedback algorithm. The baseline system exhibited better effectiveness than the cluster-based optimization model of ADTIR; the main reason was the insufficient quality of the clusters generated from the TREC collection, which violated the underlying assumption.
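To make the baseline concrete, below is a minimal sketch of the Rocchio relevance-feedback update named in this abstract. The term-weight vectors and the alpha/beta/gamma settings are illustrative assumptions, not values taken from the thesis.

```python
# Sketch of the standard Rocchio relevance-feedback update:
# move the query vector toward relevant documents and away from
# non-relevant ones. Weights here are common textbook defaults.
import numpy as np

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    q = alpha * query_vec
    if len(relevant_docs) > 0:
        q += beta * np.mean(relevant_docs, axis=0)
    if len(nonrelevant_docs) > 0:
        q -= gamma * np.mean(nonrelevant_docs, axis=0)
    return q

# Toy term-weight vectors over a 4-term vocabulary.
query = np.array([1.0, 0.0, 0.5, 0.0])
rel = np.array([[0.9, 0.1, 0.7, 0.0],
                [0.8, 0.0, 0.6, 0.1]])
nonrel = np.array([[0.0, 0.9, 0.0, 0.8]])
print(rocchio(query, rel, nonrel))
```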
Abstract:
On the back of the growing capacity of networked digital information technologies to process and visualise large amounts of information in a timely, efficient and user-driven manner, we have seen an increasing demand for better access to and re-use of public sector information (PSI). The story is not a new one: share knowledge and together we can do great things; limit access and we reduce the potential for opportunity. The two volumes of this book seek to explain and analyse this global shift in the way we manage public sector information. In doing so they collect and present papers, reports and submissions on the topic by leading authors and institutions from across the world, providing people tasked with mapping out and implementing information policy with reference material and practical guidance. Volume 1 draws together papers on the topic by policymakers, academics and practitioners, while Volume 2 presents a selection of the key reports and submissions published over the last few years.
Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers speaking different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a faster and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features, a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by NIST.
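As an illustration of the GMM-based acoustic modelling described here, the sketch below trains one Gaussian mixture per language and assigns a test utterance to the language whose model gives the highest average log-likelihood. The random "MFCC" frames, mixture size and language names are illustrative assumptions, not details of the evaluated system.

```python
# Sketch: one GMM per language, classification by maximum
# average per-frame log-likelihood. Random features stand in
# for real MFCC frames.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy "MFCC" frames for two training languages (frames x features).
train = {
    "english": rng.normal(0.0, 1.0, size=(500, 13)),
    "mandarin": rng.normal(1.5, 1.0, size=(500, 13)),
}

models = {lang: GaussianMixture(n_components=4, covariance_type="diag",
                                random_state=0).fit(frames)
          for lang, frames in train.items()}

test_utterance = rng.normal(1.5, 1.0, size=(200, 13))
# score() returns the mean per-frame log-likelihood under each model.
scores = {lang: gmm.score(test_utterance) for lang, gmm in models.items()}
print(max(scores, key=scores.get))  # expected: "mandarin"
```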
Abstract:
Consider a person searching electronic health records: a search for the term ‘cracked skull’ should return documents that contain the term ‘cranium fracture’. An information retrieval system is required that matches concepts, not just keywords. Furthermore, determining the relevance of a query to a document requires inference; it is not simply a matter of matching concepts. For example, a document containing ‘dialysis machine’ should align with a query for ‘kidney disease’. Collectively we describe this problem as the ‘semantic gap’: the difference between the raw medical data and the way a human interprets it. This paper presents an approach to semantic search of health records that combines two previous approaches: an ontological approach using the SNOMED CT medical ontology, and a distributional approach using semantic space vector models. Our approach will be applied to a specific problem in health informatics: the matching of electronic patient records to clinical trials.
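The sketch below illustrates, under toy assumptions, how the two components could be combined: an ontology-style concept table (standing in for SNOMED CT) handles synonymy, while a distributional vector similarity handles inferential associations the ontology does not encode directly. The concept identifiers, vectors and threshold are invented for illustration and are not real SNOMED CT content or the paper's model.

```python
# Sketch: combine exact concept matching (ontological) with
# cosine similarity in a toy semantic space (distributional).
import numpy as np

# Ontological component: surface terms -> shared concept identifiers.
concept_of = {
    "cracked skull": "C:fracture_of_skull",
    "cranium fracture": "C:fracture_of_skull",
    "kidney disease": "C:renal_disorder",
}

# Distributional component: toy semantic-space vectors.
vec = {
    "kidney disease": np.array([0.9, 0.1, 0.4]),
    "dialysis machine": np.array([0.8, 0.2, 0.5]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def related(query, term, threshold=0.9):
    cq, ct = concept_of.get(query), concept_of.get(term)
    if cq is not None and cq == ct:     # synonymy via the ontology
        return True
    if query in vec and term in vec:    # association via the space
        return cosine(vec[query], vec[term]) >= threshold
    return False

print(related("cracked skull", "cranium fracture"))   # True (same concept)
print(related("kidney disease", "dialysis machine"))  # True (distributional)
```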
Abstract:
Most information retrieval (IR) models treat the presence of a term within a document as an indication that the document is somehow "about" that term; they do not take into account cases where a term is explicitly negated. Medical data, by its nature, contains a high frequency of negated terms, e.g. "review of systems showed no chest pain or shortness of breath". This paper presents a study of the effects of negation on information retrieval. We present a number of experiments to determine whether negation has a significant negative effect on IR performance and whether language models that take negation into account might improve performance. We use a collection of real medical records as our test corpus. Our findings are that negation has some effect on system performance, but that this is likely confined to domains, such as medical data, where negation is prevalent.
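For concreteness, the following is a minimal, NegEx-style sketch of the kind of negation handling the paper investigates: terms falling inside a negation trigger's scope are indexed as negated variants rather than plain occurrences. The trigger list and scope window are illustrative assumptions, not the paper's method.

```python
# Sketch: mark tokens within a fixed window after a negation
# trigger so an index can distinguish "chest pain" from its
# negated mention.
import re

NEGATION_TRIGGERS = {"no", "not", "without", "denies", "denied"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def mark_negation(text, window=5):
    """Return tokens, prefixing those in a negation scope with 'NOT_'."""
    marked, scope = [], 0
    for tok in tokenize(text):
        if tok in NEGATION_TRIGGERS:
            scope = window
            marked.append(tok)
        elif scope > 0:
            marked.append("NOT_" + tok)
            scope -= 1
        else:
            marked.append(tok)
    return marked

print(mark_negation("review of systems showed no chest pain or shortness of breath"))
```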
Abstract:
User-Web interaction has emerged as an important research area in the field of information science. In this study, we examine extensively the Web searching performed by general users. Our goal is to investigate the effects of users' cognitive styles on their Web search behavior in relation to two broad components: information searching and information processing approaches. We use questionnaires, a measure of cognitive style, Web session logs and think-aloud protocols as the data collection instruments. Our findings show that wholistic Web users tend to adopt a top-down approach to Web searching, searching for a generic topic first and then reformulating their queries to find specific information; they tend to prefer reading to process information. Analytic users tend to prefer a bottom-up approach to information searching, and they process information by scanning search result pages.
Abstract:
Aim. This paper is a report of a review conducted to identify (a) best practice in information transfer from the emergency department for multi-trauma patients; (b) conduits and barriers to information transfer in trauma care and related settings; and (c) interventions that have an impact on information communication at handover and beyond. Background. Information transfer is integral to effective trauma care, and communication breakdown presents important challenges to it. However, evidence of the adequacy of structures and processes to ensure transfer of patient information through the acute phase of trauma care is limited. Data sources. Papers were sourced from a search of 12 online databases, and from scanning the references of relevant papers, for 1990–2009. Review methods. The review was conducted according to the University of York's Centre for Reviews and Dissemination guidelines. Studies were included if they concerned issues that influence information transfer for patients in healthcare settings. Results. Forty-five research papers, four literature reviews and one policy statement were found to be relevant to parts of the topic, but not all of it. The main issues emerging concerned the impact of communication breakdown in some form, and included communication issues within trauma team processes, and a lack of structure and clarity during handovers, including missing, irrelevant and inaccurate information, distractions and poorly documented care. Conclusion. Many factors influence information transfer but are poorly identified in relation to trauma care. The measurement of information transfer, which is integral to patient handover, has not been the focus of research to date. Nonetheless, documented patient information is considered evidence of care and a resource that affects continuing care.
Abstract:
Many researchers have investigated and modelled aspects of Web searching, and a number of studies have explored the relationships between individual differences and Web searching. However, few studies have explored the role of users' cognitive styles in determining Web searching behaviour, and current models of Web searching give limited consideration to cognitive style; its impact on Web searching, and the relationship between the two, are little understood or represented. Individuals differ in their information processing approaches and in the way they represent information, and this affects their performance. To create better models of Web searching we need to understand more about users' cognitive styles, their Web search behaviour, and the relationship between them. More rigorous research is also needed that uses more complex and meaningful measures of relevance, across a range of different types of search tasks and different populations of Internet users. This project further explores the relationships between users' cognitive styles and their Web searching, and will develop a model depicting those relationships. The related literature, aims and objectives, and research design are discussed.
Abstract:
The traditional searching method for model-order selection in linear regression is a nested full-parameter-set search over the desired orders, which we call full-model order selection. On the other hand, a method for model selection searches for the best sub-model within each order. In this paper, we propose using the model-selection search method for model-order selection, which we call partial-model order selection. We show by simulations that the proposed search method gives better accuracy than the traditional one, especially at low signal-to-noise ratios, over a wide range of model-order selection criteria (both information-theoretic and bootstrap-based). We also show that for some models the performance of the bootstrap-based criterion improves significantly when the proposed partial-model selection search is used.

Index Terms— Model order estimation, model selection, information theoretic criteria, bootstrap

1. INTRODUCTION

Several model-order selection criteria can be applied to find the optimal order. Some of the more commonly used information-theoretic procedures include Akaike's information criterion (AIC) [1], corrected Akaike (AICc) [2], minimum description length (MDL) [3], normalized maximum likelihood (NML) [4], the Hannan-Quinn criterion (HQC) [5], conditional model-order estimation (CME) [6], and the efficient detection criterion (EDC) [7]. From a practical point of view, it is difficult to decide which model-order selection criterion to use. Many of them perform reasonably well when the signal-to-noise ratio (SNR) is high; the discrepancies in their performance, however, become more evident when the SNR is low. In those situations, the performance of a given technique is determined not only by the model structure (say, a polynomial trend versus a Fourier series) but, more importantly, by the relative values of the parameters within the model. This makes comparison between model-order selection algorithms difficult, as within the same model with a given order one can find an example for which one of the methods performs favourably or fails [6, 8]. Our aim is to improve the performance of model-order selection criteria in cases where the SNR is low by considering a model-selection search procedure that takes into account not only the full-model order search but also a partial-model search within each given model order. Understandably, the improvement in the performance of the model-order estimation comes at the expense of additional computational complexity.
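The contrast between the two searches can be made concrete with a small simulation. The sketch below is a hypothetical illustration using AIC on a polynomial regression, not the paper's experimental setup: the full-model search fits all parameters up to each candidate order, while the partial-model search also evaluates every sub-model within each order and keeps the best score.

```python
# Sketch: full-model vs partial-model order selection with AIC.
from itertools import combinations
import numpy as np

def aic(y, y_hat, k):
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * k

def fit(X, y, cols):
    """Least-squares fit of y on the selected columns of X."""
    Xs = X[:, cols]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return Xs @ beta

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
X = np.vander(t, 6, increasing=True)             # columns 1, t, ..., t^5
y = 2.0 + 3.0 * t**3 + rng.normal(0, 0.5, 100)   # sparse true model

best_full, best_partial = None, None
for order in range(1, 6):
    # Full-model search: all parameters up to this order.
    cols_full = list(range(order + 1))
    score = aic(y, fit(X, y, cols_full), len(cols_full))
    if best_full is None or score < best_full[0]:
        best_full = (score, cols_full)
    # Partial-model search: every sub-model that keeps the
    # highest-order term, so the order stays well defined.
    for r in range(0, order + 1):
        for subset in combinations(range(order), r):
            cols = list(subset) + [order]
            score = aic(y, fit(X, y, cols), len(cols))
            if best_partial is None or score < best_partial[0]:
                best_partial = (score, cols)

print("full-model search   :", best_full)
print("partial-model search:", best_partial)   # should favour columns [0, 3]
```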
Abstract:
Intelligent agents are an advanced technology utilized in Web Intelligence. When searching for information in a distributed Web environment, information is retrieved by multiple agents on the client site and fused on the broker site. Current information fusion techniques rely on the cooperation of agents to provide statistics; such techniques are computationally expensive and unrealistic in the real world. In this paper, we introduce a model that uses a world ontology constructed from the Dewey Decimal Classification to acquire user profiles. By searching with specific and exhaustive user profiles, information fusion techniques no longer need to rely on the statistics provided by agents. The model has been successfully evaluated using the large INEX data set simulating the distributed Web environment.
Abstract:
As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized knowledge from only a global knowledge base or only a user's local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering, and the results show that it is successful.
Abstract:
Most web service discovery systems use keyword-based search algorithms and, although partially successful, sometimes fail to satisfy some users' information needs. This has given rise to several semantics-based approaches that look to go beyond simple attribute matching and try to capture the semantics of services. However, the results reported in the literature vary, and in many cases are worse than the results obtained by keyword-based systems. We believe the accuracy of the mechanisms used to extract tokens from the non-natural-language sections of WSDL files directly affects the performance of these techniques, because some of them can be more sensitive to noise. In this paper, three existing tokenization algorithms are evaluated and a new algorithm that outperforms all the algorithms found in the literature is introduced.
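To illustrate the step under discussion, here is a minimal baseline tokenizer for WSDL-style identifiers. The splitting rules shown (underscores, camelCase boundaries, letter-digit boundaries) are a common baseline, not the new algorithm the paper introduces.

```python
# Sketch: split non-natural-language identifiers from WSDL files
# (operation and parameter names) into lowercase word tokens.
import re

def tokenize_identifier(identifier):
    """Split a WSDL-style identifier into lowercase word tokens."""
    # Break on separators, then on camelCase and letter-digit
    # boundaries, keeping acronyms such as 'XML' together.
    parts = re.split(r"[_\-\s]+", identifier)
    tokens = []
    for part in parts:
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

print(tokenize_identifier("GetWeatherByZipCode"))
# ['get', 'weather', 'by', 'zip', 'code']
print(tokenize_identifier("parseXMLResponse_v2"))
# ['parse', 'xml', 'response', 'v', '2']
```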