201 resultados para Open source information retrieval
Resumo:
QUT Software Finder is a searchable repository of metadata describing software and source code, which has been created as a result of QUT research activities. It was launched in December 2013. https://researchdatafinder.qut.edu.au/scf The registry was designed to aid the discovery and visibility of QUT research outputs and encourage sharing and re-use of code and software throughout the research community, both nationally and internationally. The repository platform used is VIVO (an open source product initially developed at Cornell University). QUT Software Finder records that describe software or code are connected to information about researchers involved, the research groups, related publications and related projects. Links to where the software or code can be accessed from are also provided alongside licencing and re-use information.
Resumo:
In this paper, we consider the problem of document ranking in a non-traditional retrieval task, called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, providing thus diversity within the ranking. In the past years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending upon how the ranks of documents are revised for promoting diversity. In the first approach subtopic diversification is achieved implicitly, by choosing documents that are different from each other, while in the second approach this is done explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate possible strategies for integrating the two paradigms with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast the state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.
Resumo:
This article presents a study of how humans perceive and judge the relevance of documents. Humans are adept at making reasonably robust and quick decisions about what information is relevant to them, despite the ever increasing complexity and volume of their surrounding information environment. The literature on document relevance has identified various dimensions of relevance (e.g., topicality, novelty, etc.), however little is understood about how these dimensions may interact. We performed a crowdsourced study of how human subjects judge two relevance dimensions in relation to document snippets retrieved from an internet search engine. The order of the judgment was controlled. For those judgments exhibiting an order effect, a q–test was performed to determine whether the order effects can be explained by a quantum decision model based on incompatible decision perspectives. Some evidence of incompatibility was found which suggests incompatible decision perspectives is appropriate for explaining interacting dimensions of relevance in such instances.
Resumo:
This paper evaluates the performance of different text recognition techniques for a mobile robot in an indoor (university campus) environment. We compared four different methods: our own approach using existing text detection methods (Minimally Stable Extremal Regions detector and Stroke Width Transform) combined with a convolutional neural network, two modes of the open source program Tesseract, and the experimental mobile app Google Goggles. The results show that a convolutional neural network combined with the Stroke Width Transform gives the best performance in correctly matched text on images with single characters whereas Google Goggles gives the best performance on images with multiple words. The dataset used for this work is released as well.
Resumo:
We used our TopSig open-source indexing and retrieval tool to produce runs for the ShARe/CLEF eHealth 2013 track. TopSig was used to produce runs using the query fields and provided discharge summaries, where appropriate. Although the improvement was not great TopSig was able to gain some benefit from utilising the discharge summaries, although the software needed to be modified to support this. This was part of a larger experiment involving determining the applicability and limits to signature-based approaches.
Resumo:
Today’s information systems log vast amounts of data. These collections of data (implicitly) describe events (e.g. placing an order or taking a blood test) and, hence, provide information on the actual execution of business processes. The analysis of such data provides an excellent starting point for business process improvement. This is the realm of process mining, an area which has provided a repertoire of many analysis techniques. Despite the impressive capabilities of existing process mining algorithms, dealing with the abundance of data recorded by contemporary systems and devices remains a challenge. Of particular importance is the capability to guide the meaningful interpretation of “oceans of data” by process analysts. To this end, insights from the field of visual analytics can be leveraged. This article proposes an approach where process states are reconstructed from event logs and visualised in succession, leading to an animated history of a process. This approach is customisable in how a process state, partially defined through a collection of activity instances, is visualised: one can select a map and specify a projection of events on this map based on the properties of the events. This paper describes a comprehensive implementation of the proposal. It was realised using the open-source process mining framework ProM. Moreover, this paper also reports on an evaluation of the approach conducted with Suncorp, one of Australia’s largest insurance companies.
Resumo:
This paper details the initial design and planning of a Field Programmable Gate Array (FPGA) implemented control system that will enable a path planner to interact with a MAVLink based flight computer. The design is aimed at small Unmanned Aircraft Vehicles (UAV) under autonomous operation which are typically subject to constraints arising from limited on-board processing capabilities, power and size. An FPGA implementation for the de- sign is chosen for its potential to address such limitations through low power and high speed in-hardware computation. The MAVLink protocol offers a low bandwidth interface for the FPGA implemented path planner to communicate with an on-board flight computer. A control system plan is presented that is capable of accepting a string of GPS waypoints generated on-board from a previously developed in- hardware Genetic Algorithm (GA) path planner and feeding them to the open source PX4 autopilot, while simultaneously respond- ing with flight status information.
Resumo:
For people with cognitive disabilities, technology is more often thought of as a support mechanism, rather than a source of division that may require intervention to equalize access across the cognitive spectrum. This paper presents a first attempt at formalizing the digital gap created by the generalization of search engines. This was achieved through the development of a mapping of cognitive abilities required by users to execute low- level tasks during a standard Web search task. The mapping demonstrates how critical these abilities are to successfully use search engines with an adequate level of independence. It will lead to a set of design guidelines for search engine interfaces that will allow for the engagement of users of all abilities, and also, more importantly, in search algorithms such as query suggestion and measure of relevance (i.e. ranking).
Resumo:
This reports a study that seeks to explore the experience of students majoring in technology and design in an undergraduate education degree. It examines their experiences in finding and using information for a practical assignment. In mapping the variation of the students' experience, the study uses a qualitative, interpretive approach to analyse the data, which was collected via one-to-one interviews. The analysis yielded five themes through which technology education students find and use information: interaction with others; experience (past and new); formal educational learning; the real world; and incidental occurrences. The intentions and strategies that form the students' approaches to finding and using information are discussed. So too are the implications for teaching practice.
Resumo:
An increasing amount of people seek health advice on the web using search engines; this poses challenging problems for current search technologies. In this paper we report an initial study of the effectiveness of current search engines in retrieving relevant information for diagnostic medical circumlocutory queries, i.e., queries that are issued by people seeking information about their health condition using a description of the symptoms they observes (e.g. hives all over body) rather than the medical term (e.g. urticaria). This type of queries frequently happens when people are unfamiliar with a domain or language and they are common among health information seekers attempting to self-diagnose or self-treat themselves. Our analysis reveals that current search engines are not equipped to effectively satisfy such information needs; this can have potential harmful outcomes on people’s health. Our results advocate for more research in developing information retrieval methods to support such complex information needs.
Resumo:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.
Resumo:
Preface The 9th Australasian Conference on Information Security and Privacy (ACISP 2004) was held in Sydney, 13–15 July, 2004. The conference was sponsored by the Centre for Advanced Computing – Algorithms and Cryptography (ACAC), Information and Networked Security Systems Research (INSS), Macquarie University and the Australian Computer Society. The aims of the conference are to bring together researchers and practitioners working in areas of information security and privacy from universities, industry and government sectors. The conference program covered a range of aspects including cryptography, cryptanalysis, systems and network security. The program committee accepted 41 papers from 195 submissions. The reviewing process took six weeks and each paper was carefully evaluated by at least three members of the program committee. We appreciate the hard work of the members of the program committee and external referees who gave many hours of their valuable time. Of the accepted papers, there were nine from Korea, six from Australia, five each from Japan and the USA, three each from China and Singapore, two each from Canada and Switzerland, and one each from Belgium, France, Germany, Taiwan, The Netherlands and the UK. All the authors, whether or not their papers were accepted, made valued contributions to the conference. In addition to the contributed papers, Dr Arjen Lenstra gave an invited talk, entitled Likely and Unlikely Progress in Factoring. This year the program committee introduced the Best Student Paper Award. The winner of the prize for the Best Student Paper was Yan-Cheng Chang from Harvard University for his paper Single Database Private Information Retrieval with Logarithmic Communication. We would like to thank all the people involved in organizing this conference. In particular we would like to thank members of the organizing committee for their time and efforts, Andrina Brennan, Vijayakrishnan Pasupathinathan, Hartono Kurnio, Cecily Lenton, and members from ACAC and INSS.
Resumo:
Substation Automation Systems have undergone many transformational changes triggered by improvements in technologies. Prior to the digital era, it made sense to confirm that the physical wiring matched the schematic design by meticulous and laborious point to point testing. In this way, human errors in either the design or the construction could be identified and fixed prior to entry into service. However, even though modern secondary systems today are largely computerised, we are still undertaking commissioning testing using the same philosophy as if each signal were hard wired. This is slow and tedious and doesn’t do justice to modern computer systems and software automation. One of the major architectural advantages of the IEC 61850 standard is that it “abstracts” the definition of data and services independently of any protocol allowing the mapping of them to any protocol that can meet the modelling and performance requirements. On this basis, any substation element can be defined using these common building blocks and are made available at the design, configuration and operational stages of the system. The primary advantage of accessing data using this methodology rather than the traditional position method (such as DNP 3.0) is that generic tools can be created to manipulate data. Self-describing data contains the information that these tools need to manipulate different data types correctly. More importantly, self-describing data makes the interface between programs robust and flexible. This paper proposes that the improved data definitions and methods for dealing with this data within a tightly bound and compliant IEC 61850 Substation Automation System could completely revolutionise the need to test systems when compared to traditional point to point methods. Using the outcomes of an undergraduate thesis project, we can demonstrate with some certainty that it is possible to automatically test the configuration of a protection relay by comparing the IEC 61850 configuration extracted from the relay against its SCL file for multiple relay vendors. The software tool provides a quick and automatic check that the data sets on a particular relay are correct according to its CID file, thus ensuring that no unexpected modifications are made at any stage of the commissioning process. This tool has been implemented in a Java programming environment using an open source IEC 61850 library to facilitate the server-client association with the relay.
Resumo:
This article considers the challenges posed to intellectual property law by the emerging field of bioinformatics. It examines the intellectual property strategies of established biotechnology companies, such as Celera Genomics, and information technology firms entering into the marketplace, such as IBM. First this paper argues that copyright law is not irrelevant to biotechnology, as some commentators would suggest. It claims that the use of copyright law and contract law is fundamental to the protection of biomedical and genomic databases. Second this article questions whether biotechnology companies are exclusively interested in patenting genes and genetics sequences. Recent evidence suggests that biotechnology companies and IT firms are patenting bioinformatics software and Internet business methods, as well as underlying instrumentation such as microarrays and genechips. Finally, this paper evaluates what impact the privatisation of bioinformatics will have on public research and scientific communication. It raises important questions about integration, interoperability, and the risks of monopoly. It finally considers whether open source software such as the Ensembl Project and peer to peer technology like DSAS will be able to counter this trend of privatisation.