29 resultados para Imbalanced datasets

em University of Queensland eSpace - Australia


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Summarizing topological relations is fundamental to many spatial applications including spatial query optimization. In this article, we present several novel techniques to effectively construct cell density based spatial histograms for range (window) summarizations restricted to the four most important level-two topological relations: contains, contained, overlap, and disjoint. We first present a novel framework to construct a multiscale Euler histogram in 2D space with the guarantee of the exact summarization results for aligned windows in constant time. To minimize the storage space in such a multiscale Euler histogram, an approximate algorithm with the approximate ratio 19/12 is presented, while the problem is shown NP-hard generally. To conform to a limited storage space where a multiscale histogram may be allowed to have only k Euler histograms, an effective algorithm is presented to construct multiscale histograms to achieve high accuracy in approximately summarizing aligned windows. Then, we present a new approximate algorithm to query an Euler histogram that cannot guarantee the exact answers; it runs in constant time. We also investigate the problem of nonaligned windows and the problem of effectively partitioning the data space to support nonaligned window queries. Finally, we extend our techniques to 3D space. Our extensive experiments against both synthetic and real world datasets demonstrate that the approximate multiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining the cost efficiency, and the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for many popular real datasets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose phase diagrams for an imbalanced (unequal number of atoms or Fermi surface in two pairing hyperfine states) gas of atomic fermions near a broad Feshbach resonance using mean-field theory. Particularly, in the plane of interaction and polarization we determine the region for a mixed phase composed of normal and superfluid components. We compare our prediction of phase boundaries with the recent measurement and find a good qualitative agreement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Summarizing topological relations is fundamental to many spatial applications including spatial query optimization. In this paper, we present several novel techniques to eectively construct cell density based spatial histograms for range (window) summarizations restricted to the four most important topological relations: contains, contained, overlap, and disjoint. We rst present a novel framework to construct a multiscale histogram composed of multiple Euler histograms with the guarantee of the exact summarization results for aligned windows in constant time. Then we present an approximate algorithm, with the approximate ratio 19/12, to minimize the storage spaces of such multiscale Euler histograms, although the problem is generally NP-hard. To conform to a limited storage space where only k Euler histograms are allowed, an effective algorithm is presented to construct multiscale histograms to achieve high accuracy. Finally, we present a new approximate algorithm to query an Euler histogram that cannot guarantee the exact answers; it runs in constant time. Our extensive experiments against both synthetic and real world datasets demonstrated that the approximate mul- tiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining the cost effciency, and the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for the real datasets.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We recently evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach in delineating breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This document records the process of migrating eprints.org data to a Fez repository. Fez is a Web-based digital repository and workflow management system based on Fedora (http://www.fedora.info/). At the time of migration, the University of Queensland Library was using EPrints 2.2.1 [pepper] for its ePrintsUQ repository. Once we began to develop Fez, we did not upgrade to later versions of eprints.org software since we knew we would be migrating data from ePrintsUQ to the Fez-based UQ eSpace. Since this document records our experiences of migration from an earlier version of eprints.org, anyone seeking to migrate eprints.org data into a Fez repository might encounter some small differences. Moving UQ publication data from an eprints.org repository into a Fez repository (hereafter called UQ eSpace (http://espace.uq.edu.au/) was part of a plan to integrate metadata (and, in some cases, full texts) about all UQ research outputs, including theses, images, multimedia and datasets, in a single repository. This tied in with the plan to identify and capture the research output of a single institution, the main task of the eScholarshipUQ testbed for the Australian Partnership for Sustainable Repositories project (http://www.apsr.edu.au/). The migration could not occur at UQ until the functionality in Fez was at least equal to that of the existing ePrintsUQ repository. Accordingly, as Fez development occurred throughout 2006, a list of eprints.org functionality not currently supported in Fez was created so that programming of such development could be planned for and implemented.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We previously evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach to delineate breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

One of the challenges in scientific visualization is to generate software libraries suitable for the large-scale data emerging from tera-scale simulations and instruments. We describe the efforts currently under way at SDSC and NPACI to address these challenges. The scope of the SDSC project spans data handling, graphics, visualization, and scientific application domains. Components of the research focus on the following areas: intelligent data storage, layout and handling, using an associated “Floor-Plan” (meta data); performance optimization on parallel architectures; extension of SDSC’s scalable, parallel, direct volume renderer to allow perspective viewing; and interactive rendering of fractional images (“imagelets”), which facilitates the examination of large datasets. These concepts are coordinated within a data-visualization pipeline, which operates on component data blocks sized to fit within the available computing resources. A key feature of the scheme is that the meta data, which tag the data blocks, can be propagated and applied consistently. This is possible at the disk level, in distributing the computations across parallel processors; in “imagelet” composition; and in feature tagging. The work reflects the emerging challenges and opportunities presented by the ongoing progress in high-performance computing (HPC) and the deployment of the data, computational, and visualization Grids.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A Geographic Information System (GIS) was used to model datasets of Leyte Island, the Philippines, to identify land which was suitable for a forest extension program on the island. The datasets were modelled to provide maps of the distance of land from cities and towns, land which was a suitable elevation and slope for smallholder forestry and land of various soil types. An expert group was used to assign numeric site suitabilities to the soil types and maps of site suitability were used to assist the selection of municipalities for the provision of extension assistance to smallholders. Modelling of the datasets was facilitated by recent developments of the ArcGIS® suite of computer programs and derivation of elevation and slope was assisted by the availability of digital elevation models (DEM) produced by the Shuttle Radar Topography (SRTM) mission. The usefulness of GIS software as a decision support tool for small-scale forestry extension programs is discussed.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The area of private land suitable and available for growing hoop pine (Araucaria cunninghamii) on the Atherton Tablelands in North Queensland was modelled using a geographic information system (GIS). In Atherton, Eacham and Herberton shires, approximately 64,700 ha of privately owned land were identified as having a mean annual rainfall and soil type similar to Forestry Plantations Queensland (FPQ) hoop pine growth plots with an approximate growth rate of 20 m3 per annum. Land with slope of over 25° and land covered with native vegetation were excluded in the estimation. If land which is currently used for high-value agriculture is also excluded, the net area of land potentially suitable and available for expansion of hoop pine plantations is approximately 22,900 ha. Expert silvicultural advice emphasized the role of site preparation and weed control in affecting the long-term growth rate of hoop pine. Hence, sites with less than optimal fertility and rainfall may be considered as being potentially suitable for growing hoop pine at a lower growth rate. The datasets had been prepared at various scales and differing precision for their description of land attributes. Therefore, the results of this investigation have limited applicability for planning at the individual farm level but are useful at the regional level to target areas for plantation expansion.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In order to examine whether different populations show the same pattern of onset in the Southern Hemisphere, we examined the age-at-first-admission distribution for schizophrenia based on mental health registers from Australia and Brazil. Data on age-at-first-admission for individuals with schizophrenia were extracted from two names-linked registers, (1) the Queensland Mental Health Statistics System, Australia (N=7651, F= 3293, M=4358), and (2) a psychiatric hospital register in Pelotas, Brazil (N=4428, F=2220, M=2208). Age distributions were derived for males and females for both datasets. The general population structure tbr both countries was also obtained. There were significantly more males in the Queensland dataset (gz = 56.9, df3, p < 0.0001 ). Both dataset distributions were skewed to the right. Onset rose steeply after puberty to reach a modal age group of 20-29 for men and women, with a more gradual tail toward the older age groups. In Queensland 68% of women with schizophrenia had their first admissions after age 30, while the proportion from Brazil was 58%. Compared to the Australian dataset, the Brazilian dataset had a slightly greater proportion of first admissions under the age 30 and a slightly smaller proportion over the age of 60 years. This reflects the underlying age distributions of the two populations. This study confirms the wide age range and gender differences in age-at-first-admission distributions for schizophrenia and identified a significant difference in the gender ratio between the two datasets. Given widely differing health services, cultural practices, ethic variability, and the different underlying population distributions, the age-at-first-admission in Queensland and Brazil showed more similarities than differences. Acknowledgments: The Stanley Foundation supported this project.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We re-mapped the soils of the Murray-Darling Basin (MDB) in 1995-1998 with a minimum of new fieldwork, making the most out of existing data. We collated existing digital soil maps and used inductive spatial modelling to predict soil types from those maps combined with environmental predictor variables. Lithology, Landsat Multi Spectral Scanner (Landsat MSS), the 9-s digital elevation model (DEM) of Australia and derived terrain attributes, all gridded to 250-m pixels, were the predictor variables. Because the basin-wide datasets were very large data mining software was used for modelling. Rule induction by data mining was also used to define the spatial domain of extrapolation for the extension of soil-landscape models from existing soil maps. Procedures to estimate the uncertainty associated with the predictions and quality of information for the new soil-landforms map of the MDB are described. (C) 2002 Elsevier Science B.V. All rights reserved.