879 results for Ontologies (Information Retrieval)
Abstract:
As more and more information becomes available on the Web, finding quality, reliable information is becoming harder. To help solve this problem, Web search models need to incorporate users’ cognitive styles. This paper reports the preliminary results from a user study exploring the relationships between Web users’ searching behavior and their cognitive style. The data were collected using a questionnaire, Web search logs and a think-aloud strategy. The preliminary findings reveal a number of cognitive factors, such as information searching processes, results evaluations and cognitive style, that influence users’ Web searching behavior. Among these factors, the cognitive style of the user was observed to have a greater impact than the others. Based on the key findings, a conceptual model of Web searching and cognitive styles is presented.
Abstract:
User-Web interactions have emerged as an important research area in the field of information science. In this study, we examine extensively the Web searching performed by general users. Our goal is to investigate the effects of users’ cognitive styles on their Web search behavior in relation to two broad components: Information Searching and Information Processing Approaches. We use questionnaires, a measure of cognitive style, Web session logs and think-aloud as the data collection instruments. Our findings show that wholistic Web users tend to adopt a top-down approach to Web searching, in which they search for a generic topic and then reformulate their queries to search for specific information. They tend to process information by reading. Analytic users tend to prefer a bottom-up approach to information searching, and they process information by scanning search result pages.
Abstract:
In computational linguistics, information retrieval and applied cognition, words and concepts are often represented as vectors in high dimensional spaces computed from a corpus of text. These high dimensional spaces are often referred to as Semantic Spaces. We describe a novel and efficient approach to computing these semantic spaces via the use of complex valued vector representations. We report on the practical implementation of the proposed method and some associated experiments. We also briefly discuss how the proposed system relates to previous theoretical work in Information Retrieval and Quantum Mechanics and how the notions of probability, logic and geometry are integrated within a single Hilbert space representation. In this sense the proposed system has more general application and gives rise to a variety of opportunities for future research.
Abstract:
The Australian National Data Service (ANDS) was established in 2008 and aims to: influence national policy in the area of data management in the Australian research community; inform best practice for the curation of data; and transform the disparate collections of research data around Australia into a cohesive collection of research resources. One high-profile ANDS activity is to establish the population of Research Data Australia, a set of web pages describing data collections produced by or relevant to Australian researchers. It is designed to promote visibility of research data collections in search engines, in order to encourage their re-use. As part of activities associated with the Australian National Data Service, an increasing number of Australian universities are choosing to implement VIVO, not as a platform to profile information about researchers, but as a 'metadata store' platform to profile information about institutional research data sets, both locally and as part of a national data commons. To date, the University of Melbourne, Griffith University, the Queensland University of Technology, and the University of Western Australia have all chosen to implement VIVO, with interest from other universities growing.
Abstract:
INTRODUCTION: Since the introduction of its QUT ePrints institutional repository of published research outputs, together with the world’s first mandate for author contributions to an institutional repository, Queensland University of Technology (QUT) has been a leader in support of green road open access. With QUT ePrints providing the mechanism for supporting the green road to open access, QUT has since continued to expand its secondary open access strategy supporting gold road open access, which is also designed to assist QUT researchers to maximise the accessibility, and so the impact, of their research.
METHODS: QUT Library has adopted the position of selectively supporting true gold road open access publishing by using the Library Resource Allocation budget to pay the author publication fees for QUT authors wishing to publish in the open access journals of a range of publishers including BioMed Central, Public Library of Science and Hindawi. QUT Library has been careful to support only true open access publishers and not those open access publishers with hybrid models which “double dip” by charging authors publication fees and libraries subscription fees for the same journal content. QUT Library has maintained a watch on the growing number of open access journals available from gold road open access publishers and their increased rate of success as measured by publication impact.
RESULTS: This paper reports on the successes and challenges of QUT’s efforts to support true gold road open access publishers and promote these publishing strategy options to researchers at QUT. The number and spread of QUT papers submitted and published in the journals of each publisher are provided. Citation counts for papers and authors are also presented and analysed, with the intention of identifying the benefits to accessibility and research impact for early career and established researchers.
CONCLUSIONS: QUT Library is eager to continue and further develop support for this publishing strategy, and makes a number of recommendations to other research institutions on how they can best achieve success with this strategy.
Abstract:
For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes, that is, accommodating phone recognition errors and, secondly, modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out and a number of novel enhancements are developed that provide for faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for drastically increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS, by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing. 
Analysis shows that this use of language modelling can, however, be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework is proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing for a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state-of-the-art in phonetic STD, by improving the utility of such systems in a wide range of applications.
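The idea of approximate phone sequence matching with a phone error cost model can be illustrated with a weighted edit distance. The sketch below is not the DMLS system described in the abstract; it is a minimal dynamic-programming example, and the function name `phone_edit_cost`, the phone labels and all cost values are assumptions made purely for illustration.

```python
def phone_edit_cost(target, observed, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum cost of aligning an observed phone sequence to a target term.

    `sub_cost` maps (target_phone, observed_phone) pairs to a substitution
    cost; pairs that are missing from it fall back to a cost of 1.0.
    All costs here are illustrative assumptions, not values from the source.
    """
    rows, cols = len(target) + 1, len(observed) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # First column: every target phone is missing from the observation.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: every observed phone is a spurious insertion.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            t, o = target[i - 1], observed[j - 1]
            substitution = 0.0 if t == o else sub_cost.get((t, o), 1.0)
            dist[i][j] = min(
                dist[i - 1][j - 1] + substitution,  # match or substitution
                dist[i - 1][j] + del_cost,          # target phone missing
                dist[i][j - 1] + ins_cost,          # extra observed phone
            )
    return dist[-1][-1]


# Hypothetical cost model: "ae" is cheaply confused with "eh".
costs = {("ae", "eh"): 0.3}
exact = phone_edit_cost(["k", "ae", "t"], ["k", "ae", "t"], costs)
confused = phone_edit_cost(["k", "ae", "t"], ["k", "eh", "t"], costs)
dropped = phone_edit_cost(["k", "ae", "t"], ["k", "t"], costs)
```

A term would be reported as detected when the cost falls below a chosen threshold, so that likely recognition errors (cheap substitutions) are tolerated while unlikely ones are not.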
Abstract:
Many researchers have investigated and modelled aspects of Web searching. A number of studies have explored the relationships between individual differences and Web searching. However, few studies have explored the role of users’ cognitive styles in determining Web searching behaviour. Current models of Web searching give limited consideration to users’ cognitive styles. The impact of users’ cognitive styles on Web searching, and the relationships between them, are little understood or represented. Individuals differ in their information processing approaches and the way they represent information, and these differences affect their performance. To create better models of Web searching we need to understand more about users’ cognitive styles and their Web search behaviour, and the relationship between them. More rigorous research is needed using more complex and meaningful measures of relevance, across a range of different types of search tasks and different populations of Internet users. The project further explores the relationships between users’ cognitive styles and their Web searching. The project will develop a model depicting the relationships between a user’s cognitive style and their Web searching. The related literature, aims and objectives and research design are discussed.
Abstract:
Special collections, because of the issues associated with conservation and use, a feature they share with archives, tend to be the most digitized areas in libraries. The Nineteenth Century Schoolbooks collection comprises 9000 rarely held nineteenth-century schoolbooks that were painstakingly collected over a lifetime of work by Prof. John A. Nietz and donated to the Hillman Library at the University of Pittsburgh in 1958; the collection has since grown to 15,000. About 140 of these texts are completely digitized and showcased on a publicly accessible website through the University of Pittsburgh’s Library, along with a searchable bibliography of the entire collection, which has expanded awareness of this collection and its user base beyond the academic community. The URL for the website is http://digital.library.pitt.edu/nietz/. The collection is a rich resource for researchers studying the intellectual, educational, and textbook publishing history of the United States. In this study, we examined several existing records collected by the Digital Research Library at the University of Pittsburgh in order to determine the identity and searching behaviors of the users of this collection. The records examined include: 1) the results of a 3-month long user survey, 2) user access statistics including search queries for a period of one year, a year after the digitized collection became publicly available in 2001, and 3) e-mail input received by the website over 4 years from 2000-2004. The results of the study demonstrate the differences in online retrieval strategies used by academic researchers and historians, archivists, avocationists, and the general public, and the importance of facilitating the discovery of digitized special collections through the use of electronic finding aids and an interactive interface with detailed metadata.
Abstract:
The INEX 2010 Focused Relevance Feedback track offered a refined approach to the evaluation of Focused Relevance Feedback algorithms through simulated exhaustive user feedback. As in traditional approaches, we simulated a user-in-the-loop by re-using the assessments of ad-hoc retrieval obtained from real users who assess focused ad-hoc retrieval submissions. The evaluation was extended in several ways: the use of exhaustive relevance feedback over entire runs; the evaluation of focused retrieval where both the retrieval results and the feedback are focused; the evaluation was performed over a closed set of documents and complete focused assessments; the evaluation was performed over executable implementations of relevance feedback algorithms; and finally, the entire evaluation platform is reusable. We present the evaluation methodology, its implementation, and experimental results obtained for nine submissions from three participating organisations.
Abstract:
As organizations reach higher levels of business process management maturity, they often find themselves maintaining repositories of hundreds or even thousands of process models, representing valuable knowledge about their operations. Over time, process model repositories tend to accumulate duplicate fragments (also called clones) as new process models are created or extended by copying and merging fragments from other models. This calls for methods to detect clones in process models, so that these clones can be refactored as separate subprocesses in order to improve maintainability. This paper presents an indexing structure to support the fast detection of clones in large process model repositories. The proposed index is based on a novel combination of a method for process model decomposition (specifically the Refined Process Structure Tree), with established graph canonization and string matching techniques. Experiments show that the algorithm scales to repositories with hundreds of models. The experimental results also show that a significant number of non-trivial clones can be found in process model repositories taken from industrial practice.
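The general idea of indexing fragments by a canonical code so that clones collide in the index can be sketched as follows. This is not the paper's index: it does not compute a Refined Process Structure Tree, and it assumes (hypothetically) that fragments are already extracted as lists of directed edges between uniquely labelled nodes, in which case sorting the edges is enough to obtain a canonical string. General graph canonization is considerably harder. All names and the sample data are illustrative.

```python
import hashlib
from collections import defaultdict


def canonical_code(edges):
    """Order-independent string for a fragment given as (source, target) edges.

    Assumes node labels are unique within a fragment, so sorting the edge
    list yields the same string for structurally identical fragments.
    """
    return "|".join(f"{source}->{target}" for source, target in sorted(edges))


def build_clone_index(fragments):
    """Map the hash of each fragment's canonical code to the fragment ids."""
    index = defaultdict(list)
    for fragment_id, edges in fragments.items():
        code = canonical_code(edges).encode("utf-8")
        index[hashlib.sha1(code).hexdigest()].append(fragment_id)
    return index


def find_clones(fragments):
    """Return groups of fragment ids that share the same canonical code."""
    index = build_clone_index(fragments)
    return [sorted(ids) for ids in index.values() if len(ids) > 1]


# Hypothetical fragments taken from three process models.
fragments = {
    "model1/frag1": [("Check order", "Ship goods"), ("Ship goods", "Send invoice")],
    "model2/frag7": [("Ship goods", "Send invoice"), ("Check order", "Ship goods")],
    "model3/frag2": [("Check order", "Send invoice")],
}
clones = find_clones(fragments)
```

Because lookups are by hash, a newly added fragment can be checked against the whole repository without comparing it to every stored fragment individually.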
Abstract:
This work proposes to improve spoken term detection (STD) accuracy by optimising the Figure of Merit (FOM). In this article, the index takes the form of a phonetic posterior-feature matrix. Accuracy is improved by formulating STD as a discriminative training problem and directly optimising the FOM, through its use as an objective function to train a transformation of the index. The outcome of indexing is then a matrix of enhanced posterior-features that are directly tailored for the STD task. The technique is shown to improve the FOM by up to 13% on held-out data. Additional analysis explores the effect of the technique on phone recognition accuracy, examines the actual values of the learned transform, and demonstrates that using an extended training data set results in further improvement in the FOM.
Abstract:
The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.
Abstract:
QUT Library’s model of learning support brings together academic literacy (study skills) and information literacy (research skills). The blended portfolio enables holistic planning and development, seamless services, connected learning resources and more authentic curriculum-embedded education. The model reinforces the Library’s strategic focus on learning service innovation and active engagement in teaching and learning.
The online learning strategy is a critical component of the broader literacies framework. This strategy unifies new and existing online resources (e.g., Pilot, QUT cite|write and IFN001|AIRS Online) to augment learner capability. Across the suite, prudent application of emerging technologies with visual communications and learning design delivers a wide range of adaptive study tools. Separately and together, these resources meet the learning needs and styles of a diverse cohort, providing positive and individual learning opportunities. Deliberate articulation with strategic directions regarding First Year Experience, assessment, retention and curriculum alignment assures that the Library’s initiatives move in step with institutional objectives relating to enhancing the student experience and flexible blended learning.
The release of Studywell in 2010 emphasises the continuing commitment to blended literacy education. Targeting undergraduate learners (particularly 1st year/transition), this online environment provides 24/7 access to practical study and research tools. Studywell’s design and application of technology creates a “discovery infrastructure” [1] which facilitates greater self-directed learning and interaction with content.
This paper presents QUT Library’s online learning strategy within the context of the parent “integrated literacies” framework. Highlighting the key online learning resources, the paper describes the inter-relationships between those resources to develop complementary literacies. The paper details broad aspects of the overarching learning and study support framework as well as the online strategy, including strategic positioning, quality and evaluation processes, maintenance, development, implementation, and client engagement and satisfaction with the learning resources.
Abstract:
In vector space based approaches to natural language processing, similarity is commonly measured by taking the angle between two vectors representing words or documents in a semantic space. This is natural from a mathematical point of view, as the angle between unit vectors is, up to constant scaling, the only unitarily invariant metric on the unit sphere. However, similarity judgement tasks reveal that human subjects fail to produce data which satisfy the symmetry and triangle inequality requirements for a metric space. A possible conclusion, reached in particular by Tversky et al., is that some of the most basic assumptions of geometric models are unwarranted in the case of psychological similarity, a result which would impose strong limits on the validity and applicability of vector space based (and hence also quantum inspired) approaches to the modelling of cognitive processes. This paper proposes a resolution to this fundamental criticism of the applicability of vector space models of cognition. We argue that pairs of words imply a context which in turn induces a point of view, allowing a subject to estimate semantic similarity. Context is here introduced as a point of view vector (POVV) and the expected similarity is derived as a measure over the POVVs. Different pairs of words will invoke different contexts and different POVVs. Hence the triangle inequality ceases to be a valid constraint on the angles. We test the proposal on a few triples of words and outline further research.
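The baseline that the abstract starts from, the angle between two vectors as a distance, can be made concrete with a short sketch. It does not implement the point of view vector (POVV) proposal; it only shows that angles between fixed vectors are symmetric and obey the triangle inequality, which are the constraints the paper argues human judgements violate. The three-dimensional word vectors are made-up values for illustration.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero real vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def satisfies_triangle(x, y, z, tol=1e-9):
    """Check angle(x, z) <= angle(x, y) + angle(y, z) up to a tolerance."""
    return angle(x, z) <= angle(x, y) + angle(y, z) + tol


# Hypothetical word vectors in a tiny semantic space.
word_a = [1.0, 0.2, 0.0]
word_b = [0.6, 0.8, 0.1]
word_c = [0.0, 0.3, 1.0]

symmetric = math.isclose(angle(word_a, word_b), angle(word_b, word_a))
triangle_ok = satisfies_triangle(word_a, word_b, word_c)
```

With a single fixed vector per word, both checks necessarily hold; the abstract's argument is that letting each word pair induce its own context removes this constraint.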