283 results for Compressed text search
Abstract:
WHAT if you lost someone you loved? What if you had to let go for the sake of your own sanity? Lachlan Philpott's Colder and Dennis Kelly's Orphans, playing as part of La Boite's and Queensland Theatre Company's independents programs, are emotionally and textually dense theatrical works...
Abstract:
Electronic services are a leitmotif in ‘hot’ topics like Software as a Service, Service Oriented Architecture (SOA), Service oriented Computing, Cloud Computing, application markets and smart devices. We propose to consider these in what has been termed the Service Ecosystem (SES). The SES encompasses all levels of electronic services and their interaction, with human consumption and initiation on its periphery, in much the same way the ‘Web’ describes a plethora of technologies that eventuate to connect information and expose it to humans. Presently, the SES is heterogeneous, fragmented and confined to semi-closed systems. A key issue hampering the emergence of an integrated SES is Service Discovery (SD). A SES will be dynamic, with areas of structured and unstructured information within which service providers and ‘lay’ human consumers interact; until now the two are disjointed, e.g., SOA-enabled organisations, industries and domains are choreographed by domain experts or ‘hard-wired’ to smart device application markets and web applications. In a SES, services are accessible, comparable and exchangeable to human consumers, closing the gap to the providers. This requires a new SD with which humans can discover services transparently and effectively without special knowledge or training. We propose two modes of discovery: directed search, which follows an agenda, and explorative search, which speculatively expands knowledge of an area of interest by means of categories. Inspired by conceptual space theory from cognitive science, we propose to implement the modes of discovery using concepts to map a lay consumer’s service need to terminologically sophisticated descriptions of services. To this end, we reframe SD as an information retrieval task on the information attached to services, such as descriptions, reviews, documentation and web sites: the Service Information Shadow.
The Semantic Space model transforms the shadow's unstructured semantic information into a geometric, concept-like representation. We introduce an improved and extended Semantic Space that includes categorization, which we call the Semantic Service Discovery model. We evaluate our model with a highly relevant, service-related corpus simulating a Service Information Shadow, including manually constructed complex service agendas as well as manual groupings of services. We compare our model against state-of-the-art information retrieval systems and clustering algorithms. By means of an extensive series of empirical evaluations, we establish optimal parameter settings for the semantic space model. The evaluations demonstrate the model’s effectiveness for SD in terms of retrieval precision over state-of-the-art information retrieval models (directed search) and the meaningful, automatic categorization of service-related information, which shows potential to form the basis of a useful, cognitively motivated map of the SES for exploratory search.
Abstract:
The development of text classification techniques has been largely promoted in the past decade due to the increasing availability and widespread use of digital documents. Usually, the performance of text classification relies on the quality of categories and the accuracy of classifiers learned from samples. When training samples are unavailable or categories are unqualified, text classification performance is degraded. In this paper, we propose an unsupervised multi-label text classification method to classify documents using a large set of categories stored in a world ontology. The approach shows promising results when compared with typical text classification methods, using a real-world document collection and ground truth encoded by human experts.
Abstract:
Background: This paper presents a novel approach to searching electronic medical records that is based on concept matching rather than keyword matching. Aim: The concept-based approach is intended to overcome specific challenges we identified in searching medical records. Method: Queries and documents were transformed from their term-based originals into medical concepts as defined by the SNOMED-CT ontology. Results: Evaluation on a real-world collection of medical records showed our concept-based approach outperformed a keyword baseline by 25% in Mean Average Precision. Conclusion: The concept-based approach provides a framework for further development of inference-based search systems for dealing with medical data.
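For readers unfamiliar with the metric reported in the Results, the following is a minimal sketch of how Mean Average Precision is commonly computed. The function names and the idea of passing rankings as plain lists of document identifiers are illustrative assumptions, not details taken from the paper.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant ids) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)
```

Each query contributes one average-precision value, so a system that places relevant records near the top of every ranking scores close to 1.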
Abstract:
It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues in using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.
Abstract:
Purpose – This paper seeks to look at youth justice (YJ) personnel training and education and the recommendations about it made in Time for a Fresh Start. Design/methodology/approach – The pedagogic tensions that currently shape YJ training are described – particularly those around the question of instructionalism vs education and what “specialist” means in the context of YJ. Findings – The paper suggests that the authors of Time for a Fresh Start missed the opportunity to better serve the public and young people's interests by neither acknowledging the pedagogic tensions nor articulating what a “specialist” “YJ” professional training can mean in twenty-first century England and Wales. Originality/value – The paper highlights an urgent need for an open debate between academics, practitioners and policy makers about YJ pedagogy.
Abstract:
Queensland University of Technology (QUT) was one of the first universities in Australia to establish an institutional repository. Launched in November 2003, the repository (QUT ePrints) uses the EPrints open source repository software (from Southampton) and has enjoyed the benefit of an institutional deposit mandate since January 2004. Currently (April 2012), the repository holds over 36,000 records, including 17,909 open access publications, with another 2,434 publications embargoed but with mediated access enabled via the ‘Request a copy’ button, which is a feature of the EPrints software. At QUT, the repository (QUT ePrints, http://eprints.qut.edu.au) is managed by the library. The repository is embedded into a number of other systems at QUT, including the staff profile system and the University’s research information system. It has also been integrated into a number of critical processes related to Government reporting and research assessment. Internally, senior research administrators often look to the repository for information to assist with decision-making and planning. While some statistics could be drawn from the advanced search feature and the existing download statistics feature, they were rarely at the level of granularity or aggregation required. Getting the information from the ‘back end’ of the repository was very time-consuming for the Library staff. In 2011, the Library funded a project to enhance the range of statistics which would be available from the public interface of QUT ePrints. The repository team conducted a series of focus groups and individual interviews to identify and prioritise functionality requirements for a new statistics ‘dashboard’. The participants included a mix of research administrators, early career researchers and senior researchers. The repository team identified a number of business criteria (e.g. extensible, support available, skills required) and then gave each a weighting.
After considering all the known options available, five software packages (IRStats, ePrintsStats, AWStats, BIRT and Google Urchin/Analytics) were thoroughly evaluated against a list of 69 criteria to determine which would be most suitable. The evaluation revealed that IRStats was the best fit for our requirements. It was deemed capable of meeting 21 out of the 31 high priority criteria. Consequently, IRStats was implemented as the basis for QUT ePrints’ new statistics dashboards, which were launched in Open Access Week, October 2011. Statistics dashboards are now available at four levels: whole-of-repository level, organisational unit level, individual author level and individual item level. The data available includes cumulative total deposits, time series deposits, deposits by item type, % fulltexts, % open access, cumulative downloads, time series downloads, downloads by item type, author ranking, paper ranking (by downloads), downloader geographic location, domains, internal v external downloads, citation data (from Scopus and Web of Science), most popular search terms and non-search referring websites. The data is displayed in chart, map and table formats. The new statistics dashboards are a great success. Feedback received from staff and students has been very positive. Individual researchers have said that they have found the information to be very useful when compiling a track record. It is now very easy for senior administrators (including the Deputy Vice Chancellor-Research) to compare the full-text deposit rates (i.e. mandate compliance rates) across organisational units. This has led to increased ‘encouragement’ from Heads of School and Deans in relation to the provision of full-text versions.
Abstract:
From a law enforcement standpoint, the ability to search for a person matching a semantic description (e.g. 1.8m tall, red shirt, jeans) is highly desirable. While a significant research effort has focused on person re-detection (the task of identifying a previously observed individual in surveillance video), these techniques require descriptors to be built from existing image or video observations. As such, person re-detection techniques are not suited to situations where footage of the person of interest is not readily available, such as a witness reporting a recent crime. In this paper, we present a novel framework that is able to search for a person based on a semantic description. The proposed approach uses size and colour cues, and does not require a person detection routine to locate people in the scene, improving utility in crowded conditions. The proposed approach is demonstrated with a new database that will be made available to the research community, and we show that the proposed technique is able to correctly localise a person in a video based on a simple semantic description.
Abstract:
In the context of ambiguity resolution (AR) of Global Navigation Satellite Systems (GNSS), decorrelation among entries of an ambiguity vector, integer ambiguity search and ambiguity validation are three standard procedures for solving integer least-squares problems. This paper contributes to AR issues in three respects. Firstly, the orthogonality defect is introduced as a new measure of the performance of ambiguity decorrelation methods, and compared with the decorrelation number and with the condition number, which are currently used as the criteria for measuring the correlation of the ambiguity variance-covariance matrix. Numerically, the orthogonality defect demonstrates slightly better performance as a measure of the correlation between decorrelation impact and computational efficiency than the condition number measure. Secondly, the paper examines the relationship of the decorrelation number, the condition number, the orthogonality defect and the size of the ambiguity search space with the ambiguity search candidates and search nodes. The size of the ambiguity search space can be properly estimated if the ambiguity matrix is decorrelated well, which is shown to be a significant parameter in the ambiguity search progress. Thirdly, a new ambiguity resolution scheme is proposed to improve ambiguity search efficiency through the control of the size of the ambiguity search space. The new AR scheme combines the LAMBDA search and validation procedures, which results in a much smaller search space and higher computational efficiency while retaining the same AR validation outcomes. In fact, the new scheme can deal with the case where there is only one candidate, while the existing search methods require at least two candidates. If there is more than one candidate, the new scheme turns to the usual ratio-test procedure.
Experimental results indicate that this combined method can indeed improve ambiguity search efficiency for both single-constellation and dual-constellation cases, showing the potential for processing high-dimension integer parameters in a multi-GNSS environment.
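To make the orthogonality defect concrete, the sketch below computes it for a symmetric positive-definite variance-covariance matrix Q using the usual lattice-theory definition, sqrt(prod(Q[i][i]) / det(Q)). This definition and the function names are assumptions for illustration; the abstract does not state the paper's exact formulation.

```python
import math


def cholesky_diagonal(q):
    """Diagonal of the lower-triangular Cholesky factor of a symmetric
    positive-definite matrix given as a list of rows."""
    n = len(q)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = q[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (q[i][j] - partial) / lower[j][j]
    return [lower[i][i] for i in range(n)]


def orthogonality_defect(q):
    """sqrt(product of diagonal entries / determinant) of q (assumed definition)."""
    sqrt_det = math.prod(cholesky_diagonal(q))  # equals sqrt(det(q))
    norm_product = math.prod(math.sqrt(q[i][i]) for i in range(len(q)))
    return norm_product / sqrt_det
```

By Hadamard's inequality the value is at least 1, and it equals 1 exactly when Q is diagonal, so under this definition a well-decorrelated ambiguity matrix gives a value close to 1.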
Abstract:
Success of query reformulation and relevant information retrieval depends on many factors, such as users’ prior knowledge, age, gender, and cognitive styles. One of the important factors that affect a user’s query reformulation behaviour is the nature of the search tasks. Limited studies have examined the impact of search task types on query reformulation behaviour during Web searches. This paper examines how the nature of the search tasks affects users’ query reformulation behaviour during information searching. The paper reports empirical results from a user study in which 50 participants performed a set of three Web search tasks – exploratory, factual and abstract. Users’ interactions with search engines were logged by using a monitoring program. 872 unique search queries were classified into five query types – New, Add, Remove, Replace and Repeat. Users submitted fewer queries for the factual task, which accounted for 26% of the total queries. They submitted a higher number of queries (40% of the total queries) while carrying out the exploratory task. A one-way MANOVA test indicated a significant effect of search task types on users’ query reformulation behaviour. In particular, the search task types influenced the manner in which users reformulated the New and Repeat queries.
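The five query types named above can be illustrated with a small sketch that labels each query by comparing its terms with the preceding query. The abstract does not give the study's coding rules, so the set-based rules below (and the function names) are assumptions chosen only to show the idea.

```python
def classify_reformulation(previous, current):
    """Label `current` relative to `previous` using assumed term-set rules."""
    prev_terms = set(previous.lower().split()) if previous else set()
    curr_terms = set(current.lower().split())
    if not prev_terms or not (prev_terms & curr_terms):
        return "New"       # first query, or no terms shared with the last one
    if curr_terms == prev_terms:
        return "Repeat"    # same terms as the last query
    if prev_terms < curr_terms:
        return "Add"       # all old terms kept, at least one new term
    if curr_terms < prev_terms:
        return "Remove"    # at least one old term dropped, nothing new
    return "Replace"       # some terms kept, some swapped


def label_session(queries):
    """Classify every query in a session against its predecessor."""
    labels = []
    previous = None
    for query in queries:
        labels.append(classify_reformulation(previous, query))
        previous = query
    return labels
```

For example, `label_session(["hotel paris", "cheap hotel paris", "hotel paris"])` labels the three queries New, Add and Remove under these rules.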
Abstract:
Search technologies are critical to enable clinical staff to rapidly and effectively access patient information contained in free-text medical records. Medical search is challenging as terms in the query are often general but those in relevant documents are very specific, leading to granularity mismatch. In this paper we propose to tackle granularity mismatch by exploiting subsumption relationships defined in formal medical domain knowledge resources. In symbolic reasoning, a subsumption (or 'is-a') relationship is a parent-child relationship where one concept is a subset of another concept. Subsumed concepts are included in the retrieval function. In addition, we investigate a number of initial methods for combining weights of query concepts and those of subsumed concepts. Subsumption relationships were found to provide strong indication of relevant information; their inclusion in retrieval functions yields performance improvements. This result motivates the development of formal models of relationships between medical concepts for retrieval purposes.
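A minimal sketch of including subsumed concepts in a retrieval function follows. The toy 'is-a' hierarchy, the frequency-based score and the down-weighting factor `alpha` are all hypothetical choices made for illustration; the abstract does not specify which weighting schemes the authors tested.

```python
from collections import Counter

# Hypothetical toy 'is-a' hierarchy: child concept -> parent concept.
IS_A = {
    "viral pneumonia": "pneumonia",
    "bacterial pneumonia": "pneumonia",
    "pneumonia": "lung disease",
}


def subsumed_concepts(concept, is_a=IS_A):
    """Return every descendant of `concept` in the child -> parent map."""
    children = {}
    for child, parent in is_a.items():
        children.setdefault(parent, []).append(child)
    found, stack = set(), [concept]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def score(query_concepts, doc_concepts, alpha=0.5, is_a=IS_A):
    """Count query-concept matches, plus subsumed-concept matches scaled by alpha."""
    counts = Counter(doc_concepts)
    total = 0.0
    for concept in query_concepts:
        total += counts[concept]  # exact match on the query concept
        total += alpha * sum(counts[c] for c in subsumed_concepts(concept, is_a))
    return total
```

With this toy data, a query for "pneumonia" gives a non-zero score to a record that mentions only "viral pneumonia", which is the granularity mismatch the abstract describes; setting `alpha` to 0 recovers plain exact-concept matching.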