973 resultados para Compressed text search
Resumo:
Search engines have forever changed the way people access and discover knowledge, allowing information about almost any subject to be quickly and easily retrieved within seconds. As increasingly more material becomes available electronically the influence of search engines on our lives will continue to grow. This presents the problem of how to find what information is contained in each search engine, what bias a search engine may have, and how to select the best search engine for a particular information need. This research introduces a new method, search engine content analysis, in order to solve the above problem. Search engine content analysis is a new development of traditional information retrieval field called collection selection, which deals with general information repositories. Current research in collection selection relies on full access to the collection or estimations of the size of the collections. Also collection descriptions are often represented as term occurrence statistics. An automatic ontology learning method is developed for the search engine content analysis, which trains an ontology with world knowledge of hundreds of different subjects in a multilevel taxonomy. This ontology is then mined to find important classification rules, and these rules are used to perform an extensive analysis of the content of the largest general purpose Internet search engines in use today. Instead of representing collections as a set of terms, which commonly occurs in collection selection, they are represented as a set of subjects, leading to a more robust representation of information and a decrease of synonymy. The ontology based method was compared with ReDDE (Relevant Document Distribution Estimation method for resource selection) using the standard R-value metric, with encouraging results. ReDDE is the current state of the art collection selection method which relies on collection size estimation. The method was also used to analyse the content of the most popular search engines in use today, including Google and Yahoo. In addition several specialist search engines such as Pubmed and the U.S. Department of Agriculture were analysed. In conclusion, this research shows that the ontology based method mitigates the need for collection size estimation.
Resumo:
The field of research training (for students and supervisors) is becoming more heavily regulated by the Federal Government. At the same time, quality improvement imperatives are requiring staff across the University to have better access to information and knowledge about a wider range of activities each year. Within the Creative Industries Faculty at the Queensland University of Technology (QUT), the training provided to academic and research staff is organised differently and individually. This session will involve discussion of the dichotomies found in this differentiated approach to staff training, and begin a search for best practice through interaction and input from the audience.
Resumo:
This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.
Resumo:
In this paper, we use time series analysis to evaluate predictive scenarios using search engine transactional logs. Our goal is to develop models for the analysis of searchers’ behaviors over time and investigate if time series analysis is a valid method for predicting relationships between searcher actions. Time series analysis is a method often used to understand the underlying characteristics of temporal data in order to make forecasts. In this study, we used a Web search engine transactional log and time series analysis to investigate users’ actions. We conducted our analysis in two phases. In the initial phase, we employed a basic analysis and found that 10% of searchers clicked on sponsored links. However, from 22:00 to 24:00, searchers almost exclusively clicked on the organic links, with almost no clicks on sponsored links. In the second and more extensive phase, we used a one-step prediction time series analysis method along with a transfer function method. The period rarely affects navigational and transactional queries, while rates for transactional queries vary during different periods. Our results show that the average length of a searcher session is approximately 2.9 interactions and that this average is consistent across time periods. Most importantly, our findings shows that searchers who submit the shortest queries (i.e., in number of terms) click on highest ranked results. We discuss implications, including predictive value, and future research.
Resumo:
The recent focus on literacy in Social Studies has been on linguistic design, particularly that related to the grammar of written and spoken text. When students are expected to produce complex hybridized genres such as timelines, a focus on the teaching and learning of linguistic design is necessary but not sufficient to complete the task. Theorizations of new literacies identify five interrelated meaning making designs for text deconstruction and reproduction: linguistic, spatial, visual, gestural, and audio design. Honing in on the complexity of timelines, this paper casts a lens on the linguistic, visual, spatial, and gestural designs of three pairs of primary school aged Social Studies learners. Drawing on a functional metalanguage, we analyze the linguistic, visual, spatial, and gestural designs of their work. We also offer suggestions of their effect, and from there consider the importance of explicit instruction in text design choices for this Social Studies task. We conclude the analysis by suggesting the foci of explicit instruction for future lessons.
Resumo:
Here we search for evidence of the existence of a sub-chondritic 142Nd/144Nd reservoir that balances the Nd isotope chemistry of the Earth relative to chondrites. If present, it may reside in the source region of deeply sourced mantle plume material. We suggest that lavas from Hawai’i with coupled elevations in 186Os/188Os and 187Os/188Os, from Iceland that represent mixing of upper mantle and lower mantle components, and from Gough with sub-chondritic 143Nd/144Nd and high 207Pb/206Pb, are favorable samples that could reflect mantle sources that have interacted with an Early-Enriched Reservoir (EER) with sub-chondritic 142Nd/144Nd. High-precision Nd isotope analyses of basalts from Hawai’i, Iceland and Gough demonstrate no discernable 142Nd/144Nd deviation from terrestrial standards. These data are consistent with previous high-precision Nd isotope analysis of recent mantle-derived samples and demonstrate that no mantle-derived material to date provides evidence for the existence of an EER in the mantle. We then evaluate mass balance in the Earth with respect to both 142Nd/144Nd and 143Nd/144Nd. The Nd isotope systematics of EERs are modeled for different sizes and timing of formation relative to ε143Nd estimates of the reservoirs in the μ142Nd = 0 Earth, where μ142Nd is ((measured 142Nd/144Nd/terrestrial standard 142Nd/144Nd)−1 * 10−6) and the μ142Nd = 0 Earth is the proportion of the silicate Earth with 142Nd/144Nd indistinguishable from the terrestrial standard. The models indicate that it is not possible to balance the Earth with respect to both 142Nd/144Nd and 143Nd/144Nd unless the μ142Nd = 0 Earth has a ε143Nd within error of the present-day Depleted Mid-ocean ridge basalt Mantle source (DMM). The 4567 Myr age 142Nd–143Nd isochron for the Earth intersects μ142Nd = 0 at ε143Nd of +8 ± 2 providing a minimum ε143Nd for the μ142Nd = 0 Earth. The high ε143Nd of the μ142Nd = 0 Earth is confirmed by the Nd isotope systematics of Archean mantle-derived rocks that consistently have positive ε143Nd. If the EER formed early after solar system formation (0–70 Ma) continental crust and DMM can be complementary reservoirs with respect to Nd isotopes, with no requirement for significant additional reservoirs. If the EER formed after 70 Ma then the μ142Nd = 0 Earth must have a bulk ε143Nd more radiogenic than DMM and additional high ε143Nd material is required to balance the Nd isotope systematics of the Earth.
Resumo:
This paper reports preliminary results from a study modeling the interplay between multitasking, cognitive coordination, and cognitive shifts during Web search. Study participants conducted three Web searches on personal information problems. Data collection techniques included pre- and post-search questionnaires; think-aloud protocols, Web search logs, observation, and post-search interviews. Key findings include: (1) users Web searches included multitasking, cognitive shifting and cognitive coordination processes, (2) cognitive coordination is the hinge linking multitasking and cognitive shifting that enables Web search construction, (3) cognitive shift levels determine the process of cognitive coordination, and (4) cognitive coordination is interplay of task, mechanism and strategy levels that underpin multitasking and task switching. An initial model depicts the interplay between multitasking, cognitive coordination, and cognitive shifts during Web search. Implications of the findings and further research are also discussed.
Resumo:
It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.
Resumo:
Over the years, people have often held the hypothesis that negative feedback should be very useful for largely improving the performance of information filtering systems; however, we have not obtained very effective models to support this hypothesis. This paper, proposes an effective model that use negative relevance feedback based on a pattern mining approach to improve extracted features. This study focuses on two main issues of using negative relevance feedback: the selection of constructive negative examples to reduce the space of negative examples; and the revision of existing features based on the selected negative examples. The former selects some offender documents, where offender documents are negative documents that are most likely to be classified in the positive group. The later groups the extracted features into three groups: the positive specific category, general category and negative specific category to easily update the weight. An iterative algorithm is also proposed to implement this approach on RCV1 data collections, and substantial experiments show that the proposed approach achieves encouraging performance.
Resumo:
The issue of what an effective high quality / high equity education system might look like remains contested. Indeed there is more educational commentary on those systems that do not achieve this goal (see for example Luke & Woods, 2009 for a detailed review of the No Child Left Behind policy initiatives put forward in the United States under the Bush Administration) than there is detailed consideration of what such a system might enact and represent. A long held critique of socio cultural and critical perspectives in education has been their focus on deconstruction to the supposed detriment of reconstructive work. This critique is less warranted in recent times based on work in the field, especially the plethora of qualitative research focusing on case studies of ‘best practice’. However it certainly remains the case that there is more work to be done in investigating the characteristics of a socially just system. This issue of Point and Counterpoint aims to progress such a discussion. Several of the authors call for a reconfiguration of the use of large scale comparative assessment measures and all suggest new ways of thinking about quality and equity for school systems. Each of the papers tackles different aspects of the problematic of how to achieve high equity without compromising quality within a large education system. They each take a reconstructive focus, highlighting ways forward for education systems in Australia and beyond. While each paper investigates different aspects of the issue, the clearly stated objective of seeking to delineate and articulate characteristics of socially just education is consistent throughout the issue.
Resumo:
We argue that web service discovery technology should help the user navigate a complex problem space by providing suggestions for services which they may not be able to formulate themselves as (s)he lacks the epistemic resources to do so. Free text documents in service environments provide an untapped source of information for augmenting the epistemic state of the user and hence their ability to search effectively for services. A quantitative approach to semantic knowledge representation is adopted in the form of semantic space models computed from these free text documents. Knowledge of the user’s agenda is promoted by associational inferences computed from the semantic space. The inferences are suggestive and aim to promote human abductive reasoning to guide the user from fuzzy search goals into a better understanding of the problem space surrounding the given agenda. Experimental results are discussed based on a complex and realistic planning activity.
Resumo:
Objective: To systematically review the published evidence of the impact of health information technology (HIT) on the quality of medical and health care specifically clinicians’ adherence to evidence-based guidelines and the corresponding impact this had on patient clinical outcomes. In order to be as inclusive as possible the research examined literature discussing the use of health information technologies and systems in both medical care such as clinical and surgical, and other health care such as allied health and preventive services.----- Design: Systematic review----- Data Sources: Relevant literature was systematically searched on English language studies indexed in MEDLINE and CINAHL(1998 to 2008), Cochrane Library, PubMed, Database of Abstracts of Review of Effectiveness (DARE), Google scholar and other relevant electronic databases. A search for eligible studies (matching the inclusion criteria) was also performed by searching relevant conference proceedings available through internet and electronic databases, as well as using reference lists identified from cited papers.----- Selection criteria: Studies were included in the review if they examined the impact of Electronic Health Record (EHR), Computerised Provider Order-Entry (CPOE), or Decision Support System (DS); and if the primary outcomes of the studies were focused on the level of compliance with evidence-based guidelines among clinicians. Measures could be either changes in clinical processes resulting from a change of the providers’ behaviour or specific patient outcomes that demonstrated the effectiveness of a particular treatment given by providers. ----- Methods: Studies were reviewed and summarised in tabular and text form. Due to heterogeneity between studies, meta-analysis was not performed.----- Results: Out of 17 studies that assessed the impact of health information technology on health care practitioners’ performance, 14 studies revealed a positive improvement in relation to their compliance with evidence-based guidelines. The primary domain of improvement was evident from preventive care and drug ordering studies. Results from the studies that included an assessment for patient outcomes however, were insufficient to detect either clinically or statistically important improvements as only a small proportion of these studies found benefits. For instance, only 3 studies had shown positive improvement, while 5 studies revealed either no change or adverse outcomes.----- Conclusion: Although the number of included studies was relatively small for reaching a conclusive statement about the effectiveness of health information technologies and systems on clinical care, the results demonstrated consistency with other systematic reviews previously undertaken. Widescale use of HIT has been shown to increase clinician’s adherence to guidelines in this review. Therefore, it presents ongoing opportunities to maximise the uptake of research evidence into practice for health care organisations, policy makers and stakeholders.
Resumo:
Traffic safety is a major concern world-wide. It is in both the sociological and economic interests of society that attempts should be made to identify the major and multiple contributory factors to those road crashes. This paper presents a text mining based method to better understand the contextual relationships inherent in road crashes. By examining and analyzing the crash report data in Queensland from year 2004 and year 2005, this paper identifies and reports the major and multiple contributory factors to those crashes. The outcome of this study will support road asset management in reducing road crashes.
Resumo:
The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in various languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates the techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult since a retrieved document which contains the query term as a sequence of Chinese characters may not be really relevant to the query since the query term (as a sequence Chinese characters) may not be a valid Chinese word in that documents. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with the problems. In the first approach, we propose a hybrid Chinese information retrieval model by incorporating word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents based on the relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if it incorporates only Chinese segmentation with the traditional character-based approach. In the second approach, we propose a novel query expansion method which applies text mining techniques in order to find the most relevant words to extend the query. Unlike most existing query expansion methods, which generally select the highly frequent indexing terms from the retrieved documents to expand the query. In our approach, we utilize text mining techniques to find patterns from the retrieved documents that highly correlate with the query term and then use the relevant words in the patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage is to investigate if high accuracy segmentation can make an improvement to Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented and a further experiment has been done to compare its performance with the standard Rocchio approach with the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experiment results show that by incorporating the text mining based query expansion with the hybrid model, significant improvement has been achieved in both precision and recall assessments.