896 resultados para Query
Resumo:
Peer to peer systems have been widely used in the internet. However, most of the peer to peer information systems are still missing some of the important features, for example cross-language IR (Information Retrieval) and collection selection / fusion features. Cross-language IR is the state-of-art research area in IR research community. It has not been used in any real world IR systems yet. Cross-language IR has the ability to issue a query in one language and receive documents in other languages. In typical peer to peer environment, users are from multiple countries. Their collections are definitely in multiple languages. Cross-language IR can help users to find documents more easily. E.g. many Chinese researchers will search research papers in both Chinese and English. With Cross-language IR, they can do one query in Chinese and get documents in two languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in crosslanguage information retrieval. In recent years, web mining was shown to be one of the effective approaches to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from the web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in web mining based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score based merging strategies are well-known approaches to solve such problems. However, such approaches only consider the content of the remote database but do not consider the retrieval performance of the remote search engine. This thesis presents research on building a peer to peer IR system with crosslanguage IR and advance collection profiling technique for fusion features. Particularly, this thesis first presents a new Chinese term measurement and new Chinese MLU extraction process that works well on small corpora. An approach to selection of MLUs in a more accurate manner is also presented. After that, this thesis proposes a collection profiling strategy which can discover not only collection content but also retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented in this thesis. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer to peer environments. Here, an uncooperative environment is defined as each peer in the system is autonomous. Peer like to share documents but they do not share collection statistics. This environment is a typical peer to peer IR environment. Finally, all those approaches are grouped together to build up a secure peer to peer multilingual IR system that cooperates through X.509 and email system.
Resumo:
This project is an extension of a previous CRC project (220-059-B) which developed a program for life prediction of gutters in Queensland schools. A number of sources of information on service life of metallic building components were formed into databases linked to a Case-Based Reasoning Engine which extracted relevant cases from each source. In the initial software, no attempt was made to choose between the results offered or construct a case for retention in the casebase. In this phase of the project, alternative data mining techniques will be explored and evaluated. A process for selecting a unique service life prediction for each query will also be investigated. This report summarises the initial evaluation of several data mining techniques.
Resumo:
The project has further developed two programs for the industry partners related to service life prediction and salt deposition. The program for Queensland Department of Main Roads which predicts salt deposition on different bridge structures at any point in Queensland has been further refined by looking at more variables. It was found that the height of the bridge significantly affects the salt deposition levels only when very close to the coast. However the effect of natural cleaning of salt by rainfall was incorporated into the program. The user interface allows selection of a location in Queensland, followed by a bridge component. The program then predicts the annual salt deposition rate and rates the likely severity of the environment. The service life prediction program for the Queensland Department of Public Works has been expanded to include 10 common building components, in a variety of environments. Data mining procedures have been used to develop the program and increase the usefulness of the application. A Query Based Learning System (QBLS) has been developed which is based on a data-centric model with extensions to provide support for user interaction. The program is based on number of sources of information about the service life of building components. These include the Delphi survey, the CSIRO Holistic model and a school survey. During the project, the Holistic model was modified for each building component and databases generated for the locations of all Queensland schools. Experiments were carried out to verify and provide parameters for the modelling. These included instrumentation of a downpipe, measurements on pH and chloride levels in leaf litter, EIS measurements and chromate leaching from Colorbond materials and dose tests to measure corrosion rates of new materials. A further database was also generated for inclusion in the program through a large school survey. Over 30 schools in a range of environments from tropical coastal to temperate inland were visited and the condition of the building components rated on a scale of 0-5. The data was analysed and used to calculate an average service life for each component/material combination in the environments, where sufficient examples were available.
Resumo:
This paper deals with the problem of using the data mining models in a real-world situation where the user can not provide all the inputs with which the predictive model is built. A learning system framework, Query Based Learning System (QBLS), is developed for improving the performance of the predictive models in practice where not all inputs are available for querying to the system. The automatic feature selection algorithm called Query Based Feature Selection (QBFS) is developed for selecting features to obtain a balance between the relative minimum subset of features and the relative maximum classification accuracy. Performance of the QBLS system and the QBFS algorithm is successfully demonstrated with a real-world application
Resumo:
Intuitively, any `bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distri- butions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation then just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
Resumo:
This talk proceeds from the premise that IR should engage in a more substantial dialogue with cognitive science. After all, how users decide relevance, or how they chose terms to modify a query are processes rooted in human cognition. Recently, there has been a growing literature applying quantum theory (QT) to model cognitive phenomena. This talk will survey recent research, in particular, modelling interference effects in human decision making. One aspect of QT will be illustrated - how quantum entanglement can be used to model word associations in human memory. The implications of this will be briefly discussed in terms of a new approach for modelling concept combinations. Tentative links to human adductive reasoning will also be drawn. The basic theme behind this talk is QT can potentially provide a new genre of information processing models (including search) more aligned with human cognition.
Resumo:
In this research, we aim to identify factors that significantly affect the clickthrough of Web searchers. Our underlying goal is determine more efficient methods to optimize the clickthrough rate. We devise a clickthrough metric for measuring customer satisfaction of search engine results using the number of links visited, number of queries a user submits, and rank of clicked links. We use a neural network to detect the significant influence of searching characteristics on future user clickthrough. Our results show that high occurrences of query reformulation, lengthy searching duration, longer query length, and the higher ranking of prior clicked links correlate positively with future clickthrough. We provide recommendations for leveraging these findings for improving the performance of search engine retrieval and result ranking, along with implications for search engine marketing
Resumo:
The Sydney Harbour Bridge provides an imaginative space that is revisited by Australian writers in particular ways. In this space novelists, poets, and cultural historians negotiate questions of emotional and psychological transformation as well as reflect on social and environmental change in the city of Sydney. The writerly tensions that mark these accounts often alter, or query, representations of the Bridge as a symbol of material progress and demonstrate a complex creative engagement with the Bridge. This discussion of ‘the Bridge’ focuses on the work of four authors, Eleanor Dark, P.R. Stephensen, Peter Carey and Vicki Hastrich and includes a range of other fictional and non-fictional accounts of ‘Bridge-writing.’ The ideas proffered are framed by a theorising of space, especially referencing the work of Michel de Certeau, whose writing on the spatial ambiguity of a bridge is important to the examination of the diverse ways in which Australian writers have engaged with the imaginative potential and almost mythic resonance of the Sydney Harbour Bridge.
Resumo:
Purpose – This paper aims to report findings from an exploratory study investigating the web interactions and technoliteracy of children in the early childhood years. Previous research has studied aspects of older children’s technoliteracy and web searching; however, few studies have analyzed web search data from children younger than six years of age. Design/methodology/approach – The study explored the Google web searching and technoliteracy of young children who are enrolled in a “preparatory classroom” or kindergarten (the year before young children begin compulsory schooling in Queensland, Australia). Young children were video- and audio-taped while conducting Google web searches in the classroom. The data were qualitatively analysed to understand the young children’s web search behaviour. Findings – The findings show that young children engage in complex web searches, including keyword searching and browsing, query formulation and reformulation, relevance judgments, successive searches, information multitasking and collaborative behaviours. The study results provide significant initial insights into young children’s web searching and technoliteracy. Practical implications – The use of web search engines by young children is an important research area with implications for educators and web technologies developers. Originality/value – This is the first study of young children’s interaction with a web search engine.
Resumo:
Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first format center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting—together with the acoustic analysis—that it is possible to monitor the level of driver distraction directly from their speech.
Resumo:
Recent years have seen an increased uptake of business process management technology in industries. This has resulted in organizations trying to manage large collections of business process models. One of the challenges facing these organizations concerns the retrieval of models from large business process model repositories. For example, in some cases new process models may be derived from existing models, thus finding these models and adapting them may be more effective than developing them from scratch. As process model repositories may be large, query evaluation may be time consuming. Hence, we investigate the use of indexes to speed up this evaluation process. Experiments are conducted to demonstrate that our proposal achieves a significant reduction in query evaluation time.
Resumo:
In the terminology of Logic programming, current search engines answer Sigma1 queries (formulas of the form where is a boolean combination of attributes). Such a query is determined by a particular sequence of keywords input by a user. In order to give more control to users, search engines will have to tackle more expressive queries, namely, Sigma2 queries (formulas of the form ). The purpose of the talk is to examine which directions could be explored in order to move towards more expressive languages, more powerful search engines, and the benefits that users should expect.
Resumo:
Nature Refuges encompass the second largest extent of protected area estate in Queensland. Major problems exist in the data capture, map presentation, data quality and integrity of these boundaries. The spatial accuracies/inaccuracies of the Nature Refuge administrative boundaries directly influence the ability to preserve valuable ecosystems by challenging negative environmental impacts on these properties. This research work is about supporting the Nature Refuge Programs efforts to secure Queensland’s natural and cultural values on private land by utilising GIS and its advanced functionalities. The research design organizes and enters Queensland’s Nature Refuge boundaries into a spatial environment. Survey quality data collection techniques such as the Global Positioning Systems (GPS) are investigated to capture Nature Refuge boundary information. Using the concepts of map communication GIS Cartography is utilised for the protected area plan design. New spatial datasets are generated facilitating the effectiveness of investigative data analysis. The geodatabase model developed by this study adds rich GIS behaviour providing the capability to store, query, and manipulate geographic information. It provides the ability to leverage data relationships and enforces topological integrity creating savings in customization and productivity. The final phase of the research design incorporates the advanced functions of ArcGIS. These functions facilitate building spatial system models. The geodatabase and process models developed by this research can be easily modified and the data relating to mining can be replaced by other negative environmental impacts affecting the Nature Refuges. Results of the research are presented as graphs and maps providing visual evidence supporting the usefulness of GIS as means for capturing, visualising and enhancing spatial quality and integrity of Nature Refuge boundaries.