31 results for Large data repositories
in University of Queensland eSpace - Australia
Abstract:
Over recent years databases have become an extremely important resource for biomedical research. Immunology research is increasingly dependent on access to extensive biological databases to extract existing information, plan experiments, and analyse experimental results. This review describes 15 immunological databases that have appeared over the last 30 years. In addition, important issues regarding database design and the potential for misuse of information contained within these databases are discussed. Access pointers are provided for the major immunological databases and also for a number of other immunological resources accessible over the World Wide Web (WWW). (C) 2000 Elsevier Science B.V. All rights reserved.
Abstract:
Retrieving large amounts of information over wide area networks, including the Internet, is problematic due to issues arising from latency of response, lack of direct memory access to data serving resources, and fault tolerance. This paper describes a design pattern for solving the issues of handling results from queries that return large amounts of data. Typically these queries would be made by a client process across a wide area network (or Internet), with one or more middle-tiers, to a relational database residing on a remote server. The solution involves implementing a combination of data retrieval strategies, including the use of iterators for traversing data sets and providing an appropriate level of abstraction to the client, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.
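The combination of strategies named in this abstract (an iterator facing the client, query slicing, double-buffering and a retrieval thread) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the design implemented at Oracle: `fetch_slice`, `sliced_results` and `fake_fetch` are hypothetical names, and the remote database is simulated by an in-memory list.

```python
import threading
from queue import Queue
from typing import Callable, Iterator, Sequence

# Hypothetical stand-in for a remote query: returns rows [offset, offset + limit).
FetchSlice = Callable[[int, int], Sequence[int]]

_DONE = object()  # sentinel marking the end of the result set


def sliced_results(fetch_slice: FetchSlice, slice_size: int = 100) -> Iterator[int]:
    """Iterate over a large result set one row at a time.

    The query is cut into slices of `slice_size` rows. A worker thread
    fetches the next slice while the caller consumes the current one.
    The queue holds at most one finished slice, so together with the
    slice being consumed there are two buffers in play.
    """
    buffers: Queue = Queue(maxsize=1)

    def worker() -> None:
        offset = 0
        try:
            while True:
                rows = list(fetch_slice(offset, slice_size))
                if not rows:
                    buffers.put(_DONE)
                    return
                buffers.put(rows)  # blocks while the spare buffer is full
                offset += len(rows)
        except Exception as exc:  # hand failures to the consumer
            buffers.put(exc)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        rows = buffers.get()
        if rows is _DONE:
            return
        if isinstance(rows, Exception):
            raise rows
        yield from rows


# Simulated remote table, an assumption for illustration only.
TABLE = list(range(1050))


def fake_fetch(offset: int, limit: int) -> Sequence[int]:
    return TABLE[offset:offset + limit]


rows = list(sliced_results(fake_fetch, slice_size=100))
```

The client sees only an ordinary iterator; slice boundaries, buffering and threading stay hidden behind it. A production version would also need a way to stop the worker when the client abandons the iteration early.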
Abstract:
We re-mapped the soils of the Murray-Darling Basin (MDB) in 1995-1998 with a minimum of new fieldwork, making the most out of existing data. We collated existing digital soil maps and used inductive spatial modelling to predict soil types from those maps combined with environmental predictor variables. Lithology, Landsat Multi Spectral Scanner (Landsat MSS), the 9-s digital elevation model (DEM) of Australia and derived terrain attributes, all gridded to 250-m pixels, were the predictor variables. Because the basin-wide datasets were very large, data mining software was used for modelling. Rule induction by data mining was also used to define the spatial domain of extrapolation for the extension of soil-landscape models from existing soil maps. Procedures to estimate the uncertainty associated with the predictions and quality of information for the new soil-landforms map of the MDB are described. (C) 2002 Elsevier Science B.V. All rights reserved.
Abstract:
This special issue is a collection of selected papers published in the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA), held in Wuhan, China, in 2005. The articles focus on innovative applications of data mining approaches to problems that involve large data sets or incomplete and noisy data, or that demand optimal solutions.
Abstract:
Even when data repositories exhibit near perfect data quality, users may formulate queries that do not correspond to the information requested. Users’ poor information retrieval performance may arise from either problems in understanding the data models that represent the real-world systems or their query skills. This research focuses on users’ understanding of the data structures, i.e., their ability to map between the information request and the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users’ performance when undertaking component, record, and aggregate level tasks. The results indicate, for the hypotheses associated with different representations but equivalent semantics, that parsimonious data model participants performed better for component level tasks but that ontologically clearer data model participants performed better for record and aggregate level tasks.
Abstract:
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.
Abstract:
Formal Concept Analysis is an unsupervised machine learning technique that has successfully been applied to document organisation by considering documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in formal concept analysis. Our experiments are motivated by the application of the Formal Concept Analysis idea of a virtual filesystem [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree Generalized Index Search Tree based index structure to better support the application of Formal Concept Analysis to large data sources.
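To make the documents-as-objects, keywords-as-attributes idea concrete, the sketch below derives every formal concept (a set of documents paired with exactly the keywords they all share) from a tiny context. It is a deliberately naive enumeration for illustration, not the RD-Tree based index the paper describes; the file names and keywords are made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, Set[str]]  # document name -> its keywords
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def extent(context: Context, keywords: FrozenSet[str]) -> FrozenSet[str]:
    """Documents that carry every keyword in `keywords`."""
    return frozenset(d for d, kws in context.items() if keywords <= kws)


def intent(context: Context, documents: FrozenSet[str]) -> FrozenSet[str]:
    """Keywords shared by every document in `documents`."""
    shared = set().union(*context.values())
    for d in documents:
        shared &= context[d]
    return frozenset(shared)


def formal_concepts(context: Context) -> Set[Concept]:
    """Enumerate all concepts by closing every keyword subset.

    This is exponential in the number of keywords, which is the
    scalability problem that motivates index structures.
    """
    all_keywords = sorted(set().union(*context.values()))
    concepts: Set[Concept] = set()
    for size in range(len(all_keywords) + 1):
        for subset in combinations(all_keywords, size):
            docs = extent(context, frozenset(subset))
            concepts.add((docs, intent(context, docs)))
    return concepts


# Made-up example context.
docs = {
    "a.txt": {"python", "index"},
    "b.txt": {"python"},
    "c.txt": {"index", "tree"},
}
concepts = formal_concepts(docs)
```

Each concept corresponds to one "keyword view": the concept with intent `{"index"}`, for instance, groups `a.txt` and `c.txt`.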
Abstract:
This paper provides information on the experimental set-up, data collection methods and results to date for the project Large scale modelling of coarse grained beaches, undertaken at the Large Wave Channel (GWK) of FZK in Hannover by an international group of researchers in Spring 2002. The main objective of the experiments was to provide full scale measurements of cross-shore processes on gravel and mixed beaches for the verification and further development of cross-shore numerical models of gravel and mixed sediment beaches. Identical random and regular wave tests were undertaken for a gravel beach and a mixed sand/gravel beach set up in the flume. Measurements included profile development, water surface elevation along the flume, internal pressures in the swash zone, piezometric head levels within the beach, run-up, flow velocities in the surf-zone and sediment size distributions. The purpose of the paper is to present to the scientific community the experimental procedure, a summary of the data collected, some initial results, as well as a brief outline of the on-going research being carried out with the data by different research groups. The experimental data are available to the whole scientific community following submission of a statement of objectives, specification of data requirements and an agreement to abide by the GWK and EU protocols. (C) 2005 Elsevier B.V. All rights reserved.
Abstract:
This document records the process of migrating eprints.org data to a Fez repository. Fez is a Web-based digital repository and workflow management system based on Fedora (http://www.fedora.info/). At the time of migration, the University of Queensland Library was using EPrints 2.2.1 [pepper] for its ePrintsUQ repository. Once we began to develop Fez, we did not upgrade to later versions of eprints.org software since we knew we would be migrating data from ePrintsUQ to the Fez-based UQ eSpace. Since this document records our experiences of migration from an earlier version of eprints.org, anyone seeking to migrate eprints.org data into a Fez repository might encounter some small differences. Moving UQ publication data from an eprints.org repository into a Fez repository (hereafter called UQ eSpace, http://espace.uq.edu.au/) was part of a plan to integrate metadata (and, in some cases, full texts) about all UQ research outputs, including theses, images, multimedia and datasets, in a single repository. This tied in with the plan to identify and capture the research output of a single institution, the main task of the eScholarshipUQ testbed for the Australian Partnership for Sustainable Repositories project (http://www.apsr.edu.au/). The migration could not occur at UQ until the functionality in Fez was at least equal to that of the existing ePrintsUQ repository. Accordingly, as Fez development occurred throughout 2006, a list of eprints.org functionality not currently supported in Fez was created so that programming of such development could be planned for and implemented.
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been put into practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
Abstract:
The present study was designed to test the utility of a stress-coping model of employee adjustment to organisational change. Specifically, it was proposed that employee adjustment to this type of work stress would be influenced by the characteristics of the change situation, employees' appraisals of the situation, their coping strategies, and the extent of their personal resources. Data were collected from 140 middle managers and supervisors involved in a large-scale public sector integration. The results of the research provided some support for the proposed model: high levels of psychological distress were related to a reliance on informal sources of information, high appraised stress, low appraised certainty, and the use of avoidant rather than problem-focused strategies, whereas poor social functioning was associated with low self-esteem, high levels of disruption across the period of change, a reliance on informal sources of information, and the use of avoidant coping strategies. There was no evidence that coping strategies mediated the effects of the event characteristics, situational appraisals, and personal resources on adjustment; however, there was some evidence linking these variables to coping strategies, in particular, problem-focused coping. There was also some evidence to indicate that the experience of organisational change was different for managers and supervisors: levels of threat were higher for the managers than the supervisors, but there was no difference between the groups of employees in terms of adjustment.
Abstract:
The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
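A rough sketch of the ingredients this abstract mentions: grid partitioning with multi-assignment, followed by a filter step on bounding rectangles and a refine step on exact geometry. It is an assumption-laden illustration, not the paper's algorithm: objects are simplified to circles, the cells are processed in a sequential loop where a parallel system would hand each cell to a worker, and all names are hypothetical.

```python
from collections import defaultdict
from math import floor, hypot
from typing import Dict, List, Sequence, Set, Tuple

Circle = Tuple[float, float, float]        # centre x, centre y, radius
Box = Tuple[float, float, float, float]    # min x, min y, max x, max y
Cell = Tuple[int, int]


def mbr(c: Circle) -> Box:
    """Minimum bounding rectangle of a circle."""
    x, y, r = c
    return (x - r, y - r, x + r, y + r)


def boxes_overlap(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition(objects: Sequence[Circle], cell_size: float) -> Dict[Cell, List[int]]:
    """Multi-assignment: an object goes to every grid cell its MBR touches."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, obj in enumerate(objects):
        x0, y0, x1, y1 = mbr(obj)
        for cx in range(floor(x0 / cell_size), floor(x1 / cell_size) + 1):
            for cy in range(floor(y0 / cell_size), floor(y1 / cell_size) + 1):
                cells[(cx, cy)].append(idx)
    return cells


def spatial_join(r_objs: Sequence[Circle], s_objs: Sequence[Circle],
                 cell_size: float) -> Set[Tuple[int, int]]:
    """Return index pairs (i, j) whose circles intersect."""
    r_cells = partition(r_objs, cell_size)
    s_cells = partition(s_objs, cell_size)
    result: Set[Tuple[int, int]] = set()  # the set absorbs duplicate pairs
    for cell, r_ids in r_cells.items():   # each cell is an independent task
        for i in r_ids:
            for j in s_cells.get(cell, []):
                a, b = r_objs[i], s_objs[j]
                if not boxes_overlap(mbr(a), mbr(b)):   # filter: cheap test
                    continue
                if hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]:  # refine
                    result.add((i, j))
    return result


# Made-up data for illustration.
R = [(1.0, 1.0, 1.0), (9.0, 9.0, 0.5)]
S = [(2.0, 1.0, 0.5), (5.0, 5.0, 0.5), (9.5, 9.0, 0.2)]
pairs = spatial_join(R, S, cell_size=2.0)
```

Here `R[0]` and `S[0]` share two grid cells, so the same pair is tested twice. That repeated work is the duplication cost the abstract attributes to multi-assignment.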
Abstract:
Physiological and kinematic data were collected from elite under-19 rugby union players to provide a greater understanding of the physical demands of rugby union. Heart rate, blood lactate and time-motion analysis data were collected from 24 players (mean ± s_x̄: body mass 88.7 ± 9.9 kg, height 185 ± 7 cm, age 18.4 ± 0.5 years) during six competitive premiership fixtures. Six players were chosen at random from each of four groups: props and locks, back row forwards, inside backs, outside backs. Heart rate records were classified based on percent time spent in four zones (>95%, 85-95%, 75-84%, <75% HRmax). Blood lactate concentration was measured periodically throughout each match, with movements being classified as standing, walking, jogging, cruising, sprinting, utility, rucking/mauling and scrummaging. The heart rate data indicated that props and locks (58.4%) and back row forwards (56.2%) spent significantly more time in high exertion (85-95% HRmax) than inside backs (40.5%) and outside backs (33.9%) (P < 0.001). Inside backs (36.5%) and outside backs (38.5%) spent significantly more time in moderate exertion (75-84% HRmax) than props and locks (22.6%) and back row forwards (19.8%) (P < 0.05). Outside backs (20.1%) spent significantly more time in low exertion (<75% HRmax) than props and locks (5.8%) and back row forwards (5.6%) (P < 0.05). Mean blood lactate concentration did not differ significantly between groups (range: 4.67 mmol·l⁻¹ for outside backs to 7.22 mmol·l⁻¹ for back row forwards; P > 0.05). The motion analysis data indicated that outside backs (5750 m) covered a significantly greater total distance than either props and locks or back row forwards (4400 and 4080 m, respectively; P < 0.05). Inside backs and outside backs covered significantly greater distances walking (1740 and 1780 m, respectively; P < 0.001), in utility movements (417 and 475 m, respectively; P < 0.001) and sprinting (208 and 340 m, respectively; P < 0.001) than either props and locks or back row forwards (walking: 1000 and 991 m; utility movements: 106 and 154 m; sprinting: 72 and 94 m, respectively). Outside backs covered a significantly greater distance sprinting than inside backs (340 and 208 m, respectively; P < 0.001). Forwards maintained a higher level of exertion than backs, due to more constant motion and a large involvement in static high-intensity activities. A mean blood lactate concentration of 4.8-7.2 mmol·l⁻¹ indicated a need for 'lactate tolerance' training to improve hydrogen ion buffering and facilitate removal following high-intensity efforts. Furthermore, the large distances (4.2-5.6 km) covered during, and intermittent nature of, match-play indicated a need for sound aerobic conditioning in all groups (particularly backs) to minimize fatigue and facilitate recovery between high-intensity efforts.
Abstract:
With the advent of functional neuroimaging techniques, in particular functional magnetic resonance imaging (fMRI), we have gained greater insight into the neural correlates of visuospatial function. However, it may not always be easy to identify the cerebral regions most specifically associated with performance on a given task. One approach is to examine the quantitative relationships between regional activation and behavioral performance measures. In the present study, we investigated the functional neuroanatomy of two different visuospatial processing tasks, judgement of line orientation and mental rotation. Twenty-four normal participants were scanned with fMRI using blocked periodic designs for experimental task presentation. Accuracy and reaction time (RT) to each trial of both activation and baseline conditions in each experiment were recorded. Both experiments activated dorsal and ventral visual cortical areas as well as dorsolateral prefrontal cortex. More regionally specific associations with task performance were identified by estimating the association between (sinusoidal) power of functional response and mean RT to the activation condition; a permutation test based on spatial statistics was used for inference. There was significant behavioral-physiological association in right ventral extrastriate cortex for the line orientation task and in bilateral (predominantly right) superior parietal lobule for the mental rotation task. Comparable associations were not found between power of response and RT to the baseline conditions of the tasks. These data suggest that one region in a neurocognitive network may be most strongly associated with behavioral performance and this may be regarded as the computationally least efficient or rate-limiting node of the network.
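The general logic of a permutation test for a power-versus-RT association can be sketched as below. This is a much simplified, single-region version assumed for illustration: it tests a Pearson correlation across participants and omits the spatial statistics the study relied on. The function names and the toy numbers are hypothetical, not data from the paper.

```python
import random
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(power: Sequence[float], rt: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value for the power/RT correlation.

    Shuffling RT across participants breaks any real association, so the
    shuffled correlations form the null distribution for the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson(power, rt))
    shuffled = list(rt)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(power, shuffled)) >= observed:
            hits += 1
    # Count the observed labelling itself so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Toy data for 24 hypothetical participants: power rises with mean RT.
toy_rt = [float(i) for i in range(24)]
toy_power = [2.0 * v + 1.0 for v in toy_rt]
p_value = permutation_p_value(toy_power, toy_rt)
```

With perfectly related toy data almost no shuffle matches the observed correlation, so the p-value comes out very small; with unrelated data it would be spread between 0 and 1.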