82 results for Locality-Sensitive Hashing
in Queensland University of Technology - ePrints Archive
Abstract:
Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic that has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies – so-called Next Generation Sequencing (NGS) approaches – have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST.
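As a rough illustration of what a binary signature of a sequence might look like, the sketch below builds a simhash-style signature from the overlapping k-mers of a protein string. The abstract does not state how its signatures are constructed, so the k-mer length, the 64-bit width, the use of SHA-256 and the function names are all assumptions made for this example, not the authors' method.

```python
import hashlib

def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence (k is an assumed value)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def signature(seq, bits=64, k=3):
    """Simhash-style binary signature; illustrative only, bits must be <= 256."""
    counts = [0] * bits
    for km in kmers(seq, k):
        # Hash each k-mer to a pseudo-random bit pattern of the requested width.
        digest = hashlib.sha256(km.encode("ascii")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each k-mer votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set when the positive votes outnumber the negative ones.
    sig = 0
    for b in range(bits):
        if counts[b] > 0:
            sig |= 1 << b
    return sig

def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")
```

Sequences sharing many k-mers tend to receive signatures with a small Hamming distance, which is the property a signature-based similarity measure relies on.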
Abstract:
This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. This research finds that document signatures can be effectively and efficiently used to both search and understand relationships between documents in large collections, scalable enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
Abstract:
This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit the applicability of signatures. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
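The exhaustive search that this abstract describes as the baseline can be sketched as follows. This is only the brute-force comparison against every signature, not the indexing method the paper proposes; signatures are assumed to be stored as Python integers, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two integer signatures differ."""
    return bin(a ^ b).count("1")

def nearest(query, collection, top=3):
    """Brute-force search: one Hamming distance per stored signature.

    Returns up to `top` (distance, index) pairs, closest first.
    """
    scored = [(hamming(query, sig), i) for i, sig in enumerate(collection)]
    scored.sort()
    return scored[:top]
```

The cost grows linearly with the size of the collection, which is the scalability limit the abstract refers to.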
Abstract:
In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.
Abstract:
Live-collected samples of four common reef-building coral genera (Acropora, Pocillopora, Goniastrea, Porites) from subtidal and intertidal settings of Heron Reef, Great Barrier Reef, show extensive early marine diagenesis where parts of the coralla less than 3 years old contain abundant macro- and microborings and aragonite, high-Mg calcite, low-Mg calcite, and brucite cements. Many types of cement are associated directly with microendoliths and endobionts that inhabit parts of the corallum recently abandoned by coral polyps. The occurrence of cements that generally do not precipitate in normal shallow seawater (e.g., brucite, low-Mg calcite) highlights the importance of microenvironments in coral diagenesis. Cements precipitated in microenvironments may not reflect ambient seawater chemistry. Hence, geochemical sampling of these cements will contaminate trace-element and stable-isotope inventories used for palaeoclimate and dating analysis. Thus, great care must be taken in vetting samples for both bulk and microanalysis of geochemistry. Visual inspection using scanning electron microscopy may be required for vetting in many cases.
Abstract:
Objective: To investigate family members’ experiences of involvement in a previous study (conducted August 1995 to June 1997) following their child’s diagnosis with Ewing’s sarcoma. Design: Retrospective survey, conducted between 1 November and 30 November 1997, using a postal questionnaire. Participants: Eighty-one of 97 families who had previously completed an in-depth interview as part of a national case–control study of Ewing’s sarcoma. Main outcome measures: Participants’ views on how participation in the previous study had affected them and what motivated them to participate. Results: Most study participants indicated that taking part in the previous study had been a positive experience. Most (n = 79 [97.5%]) believed their involvement would benefit others and were glad to have participated, despite expecting and finding some parts of the interview to be painful. Parents whose child was still alive at the time of the interview recalled participation as more painful than those whose child had died before the interview. Parents who had completed the interview less than a year before our study recalled it as being more painful than those who had completed it more than a year before. Conclusions: That people suffering bereavement are generally eager to participate in research and may indeed find it a positive experience is useful information for members of ethics review boards and other “gatekeepers”, who frequently need to determine whether studies into sensitive areas should be approved. Such information may also help members of the community to make an informed decision regarding participation in such research.
Abstract:
Water quality issues are heavily dependent on land development and management decisions within river and lake catchments or watersheds. Economic benefits of urbanisation may be short‐lived without cleaner environmental outcomes. However, whole‐of‐catchment thinking is not, as yet, as frequent a consideration in urban planning and development in China as it is in many other countries. Water is predominantly seen as a resource to be ‘owned’ by different jurisdictions and allocated to numerous users, both within a catchment and between catchments. An alternative to this approach is to think of water in the same way as other commodities that must be kept moving through a complex transport system. Water must ultimately arrive at particular destinations in the biosphere, although it travels across a broad landscape and may be held up temporarily at certain places along the way. While water extraction can be heavily controlled, water pollution is far more difficult to regulate. Both have significant impacts on water availability and flows both now and in the future. As Chinese cities strive to improve economic conditions for their citizens, new centres are being rebuilt and environmental valued