17 results for Distributed data access
at University of Queensland eSpace - Australia
Abstract:
In recent years, many real-time applications have needed to handle data streams. We consider distributed environments in which remote data sources continuously collect data from the real world or from other data sources and push the data to a central stream processor. In these kinds of environments, transmitting rapid, high-volume and time-varying data streams induces significant communication overhead, and considerable computing overhead is incurred at the central processor. In this paper, we develop a novel filter approach, called the DTFilter approach, for evaluating windowed distinct queries in such a distributed system. The DTFilter approach is based on a search algorithm over a data structure of two height-balanced trees; it avoids transmitting duplicate items in data streams, thereby saving substantial network resources. In addition, a theoretical analysis of the time spent performing the search and of the amount of memory needed is provided. Extensive experiments also show that the DTFilter approach achieves high performance.
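To make the filtering idea concrete, here is a minimal sketch of a source-side filter that forwards an item only if it has not already been seen in the current window. The class name, the hash-set-plus-deque representation, and the count-based window are illustrative assumptions; the paper's actual DTFilter uses two height-balanced trees, which this sketch does not reproduce.

```python
from collections import deque

class WindowedDistinctFilter:
    """Forward an item only on its first occurrence within a sliding
    window of the last `window_size` items (an illustrative stand-in
    for the paper's two-tree DTFilter structure)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()   # items in arrival order
        self.counts = {}        # item -> occurrences inside the window

    def push(self, item):
        """Return True if `item` should be transmitted upstream,
        False if it duplicates an item already in the window."""
        # Evict the oldest item once the window is full.
        if len(self.window) == self.window_size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        is_new = item not in self.counts
        self.window.append(item)
        self.counts[item] = self.counts.get(item, 0) + 1
        return is_new

# Usage: the remote source pushes every reading through the filter and
# transmits only those for which push() returns True.
f = WindowedDistinctFilter(window_size=3)
print([f.push(x) for x in [1, 2, 1, 3, 1]])  # [True, True, False, True, False]
```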
Abstract:
The continuous plankton recorder (CPR) survey is the largest multi-decadal plankton monitoring programme in the world. It was initiated in 1931 and by the end of 2004 had counted 207,619 samples and identified 437 phyto- and zooplankton taxa throughout the North Atlantic. CPR data are used extensively by the research community and in recent years have increasingly been used to underpin marine management. Here, we take a critical look at how best to use CPR data. We first describe the CPR itself, CPR sampling, and plankton counting procedures. We discuss the spatial and temporal biases in the Survey, summarise environmental data that have not previously been available, and describe the new data access policy. We supply information essential to using CPR data, including descriptions of each CPR taxonomic entity, the idiosyncrasies associated with counting many of the taxa, the logic behind taxonomic changes in the Survey, the semi-quantitative nature of CPR sampling, and recommendations on choosing the spatial and temporal scale of study. This forms the basis for a broader discussion on how to use CPR data to derive ecologically meaningful indices based on size, functional groups and biomass that can be used to support research and management. This contribution should be useful for plankton ecologists, modellers and policy makers who actively use CPR data. (c) 2005 Elsevier Ltd. All rights reserved.
Abstract:
The Continuous Plankton Recorder (CPR) survey, operated by the Sir Alister Hardy Foundation for Ocean Science (SAHFOS), is the largest plankton monitoring programme in the world and has spanned > 70 yr. The dataset contains information from ~200 000 samples, with over 2.3 million records of individual taxa. Here we outline the evolution of the CPR database through changes in technology, and how this has increased data access. Recent high-impact publications and the expanded role of CPR data in marine management demonstrate the usefulness of the dataset. We argue that solely supplying data to the research community is not sufficient in the current research climate; to promote wider use, additional tools need to be developed to provide visual representation and summary statistics. We outline 2 software visualisation tools, SAHFOS WinCPR and the digital CPR Atlas, which provide access to CPR data for both researchers and non-plankton specialists. We also describe future directions of the database, data policy and the development of visualisation tools. We believe that the approach at SAHFOS to increase data accessibility and provide new visualisation tools has enhanced awareness of the data and led to the financial security of the organisation; it also provides a good model of how long-term monitoring programmes can evolve to help secure their future.
Abstract:
Systems biology is based on computational modelling and simulation of large networks of interacting components. Models may be intended to capture processes, mechanisms, components and interactions at different levels of fidelity. Input data are often large and geographically dispersed, and may require the computation to be moved to the data, not vice versa. In addition, complex system-level problems require collaboration across institutions and disciplines. Grid computing can offer robust, scalable solutions for distributed data, compute and expertise. We illustrate some of the range of computational and data requirements in systems biology with three case studies: one requiring large computation but small data (orthologue mapping in comparative genomics), a second involving complex terabyte data (the Visible Cell project) and a third that is both computationally and data-intensive (simulations at multiple temporal and spatial scales). Authentication, authorisation and audit systems currently do not scale well and may present bottlenecks for distributed collaboration, particularly where outcomes may be commercialised. Challenges remain in providing lightweight standards to facilitate the penetration of robust, scalable grid-type computing into diverse user communities to meet the evolving demands of systems biology.
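The "move the computation to the data" point can be made concrete with a toy transfer-cost comparison. The function below and its parameters are illustrative assumptions, not part of the paper; real grid schedulers weigh many more factors (bandwidth, queue times, security policy).

```python
def cheaper_to_ship_code(data_bytes, code_bytes):
    """Decide whether shipping the computation to the data site is
    cheaper than pulling the data to the compute site, comparing only
    bytes moved over the network (a deliberately crude model)."""
    return code_bytes < data_bytes

# A terabyte dataset versus a few megabytes of analysis code:
print(cheaper_to_ship_code(data_bytes=10**12, code_bytes=5 * 10**6))  # True
```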
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured, and can take text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are high-volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been put into practice. According to Sebastiani (2002), the main classification algorithms can be categorised as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probabilistic methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbours (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include associative classification (Liu et al., 1998) and ensemble classification (Tumer, 1996).
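As a concrete instance of the example-based methods cited above (Duda & Hart, 1973), here is a minimal k-nearest-neighbour classifier sketch. The function name, toy data and Euclidean metric are illustrative assumptions, not taken from the chapter; production implementations would use indexing structures rather than a full sort.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    examples under Euclidean distance.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy usage: two clusters in the plane.
data = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"),
        ((1.0, 1.1), "b"), ((0.9, 1.0), "b")]
print(knn_classify(data, (0.1, 0.2)))  # -> "a"
```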
Abstract:
Two major factors are likely to impact the utilisation of remotely sensed data in the near future: (1) an increase in the number and availability of commercial and non-commercial image data sets with a range of spatial, spectral and temporal dimensions, and (2) increased access to image display and analysis software through GIS. A framework was developed to provide an objective approach to selecting remotely sensed data sets for specific environmental monitoring problems. Preliminary applications of the framework have provided successful approaches for monitoring disturbed and restored wetlands in southern California.
Abstract:
There is no convincing morphological synapomorphy uniting the disparate digeneans of the Fellodistomidae Nicoll, 1909, even though all known life-cycles of the group include bivalves as first intermediate hosts. Sequences from the V4 region of small subunit (18S) rRNA genes were used to infer phylogenetic relationships among 13 species of Fellodistomidae from four subfamilies and eight species from seven other digenean families: Bivesiculidae; Brachylaimidae; Bucephalidae; Gorgoderidae; Gymnophallidae; Opecoelidae; and Zoogonidae. Outgroup comparison was made initially with an aspidogastrean; various species from the other digenean families were used as outgroups in subsequent analyses. Three methods of analysis indicated polyphyly of the Fellodistomidae and at least two independent radiations of the subfamilies, such that they were more closely associated with other digeneans than with each other. The Tandanicolinae was monophyletic (100% bootstrap support) and was weakly associated with the Gymnophallidae (< 50-55% bootstrap support). Monophyly of the Baccigerinae was supported with 78-87% bootstrap support, and monophyly of the Zoogonidae + Baccigerinae received 77-86% support. The remaining fellodistomid species, Fellodistomum fellis, F. agnotum and Coomera brayi (Fellodistominae) plus Proctoeces maculatus and Complexobursa sp. (Proctoecinae), formed a separate clade with 74-92% bootstrap support. On the basis of molecular, morphological and life-cycle evidence, the subfamilies Baccigerinae and Tandanicolinae are removed from the Fellodistomidae and promoted to familial status. The Baccigerinae is promoted under the senior synonym Faustulidae Poche, 1926, and the Echinobrevicecinae Dronen, Blend & McEachran, 1994 is synonymised with the Faustulidae. Consequently, species that were formerly in the Fellodistomidae are now distributed among three families: Fellodistomidae; Faustulidae (syn. Baccigerinae Yamaguti, 1954); and Tandanicolidae Johnston, 1927. We infer that the use of bivalves as intermediate hosts by this broad range of families indicates multiple host-switching events within the radiation of the Digenea.
Abstract:
Over recent years databases have become an extremely important resource for biomedical research. Immunology research is increasingly dependent on access to extensive biological databases to extract existing information, plan experiments, and analyse experimental results. This review describes 15 immunological databases that have appeared over the last 30 years. In addition, important issues regarding database design and the potential for misuse of information contained within these databases are discussed. Access pointers are provided for the major immunological databases and also for a number of other immunological resources accessible over the World Wide Web (WWW). (C) 2000 Elsevier Science B.V. All rights reserved.
Abstract:
Recent research has begun to provide support for the assumptions that memories are stored as a composite and are accessed in parallel (Tehan & Humphreys, 1998). New predictions derived from these assumptions and from the Chappell and Humphreys (1994) implementation of these assumptions were tested. In three experiments, subjects studied relatively short lists of words. Some of the lists contained two similar targets (thief and theft) or two dissimilar targets (thief and steal) associated with the same cue (ROBBERY). As predicted, target similarity affected performance in cued recall but not free association. Contrary to predictions, two spaced presentations of a target did not improve performance in free association. Two additional experiments confirmed and extended this finding. Several alternative explanations for the target similarity effect, which incorporate assumptions about separate representations and sequential search, are rejected. The importance of the finding that, in at least one implicit memory paradigm, repetition does not improve performance is also discussed.
Abstract:
Objective: To compare rates of self-reported use of health services between rural, remote and urban South Australians. Methods: Secondary data analysis from a population-based survey to assess health and well-being, conducted in South Australia in 2000. In all, 2,454 adults were randomly selected and interviewed using the computer-assisted telephone interview (CATI) system. We analysed health service use by Accessibility/Remoteness Index of Australia (ARIA) category. Results: There was no statistically significant difference in the median number of uses of the four types of health services studied across ARIA categories. Significantly fewer residents of highly accessible areas reported never using primary care services (14.4% vs. 22.2% in very remote areas), and significantly more reported high use (≥ 6 visits, 29.3% vs. 21.5%). Fewer residents of remote areas reported never attending hospital (65.6% vs. 73.8% in highly accessible areas). Frequency of use of mental health services was not statistically significantly different across ARIA categories. Very remote residents were more likely to spend at least one night in a public hospital (15.8%) than were residents of other areas (e.g. 5.9% for highly accessible areas). Conclusion: The self-reported frequency of use of a range of health services in South Australia was broadly similar across ARIA categories. However, use of primary care services was higher among residents of highly accessible areas and public hospital use increased with increasing remoteness. There is no evidence for systematic rural disadvantage in terms of self-reported health service utilisation in this State.
Abstract:
This study evaluated whether projects conducted through the Access to Allied Health Services component of the Australian Better Outcomes in Mental Health Care initiative are improving access to evidence-based, non-pharmacological therapies for people with depression and anxiety. Synthesising data from the first 29 projects funded through the initiative, the study found that the models utilised in the projects have evolved over time. The projects have achieved a high level of uptake; at a conservative estimate, 710 GPs and 160 allied health professionals (AHPs) have provided care to 3,476 consumers. The majority of these consumers have depression (77%) and/or anxiety disorders (55%); many are low income earners (57%); and a number have not previously accessed mental health care (40%). The projects have delivered 8,678 sessions of high quality care to these consumers, most commonly providing CBT-based cognitive and behavioural interventions (55% and 41%, respectively). In general, GPs, AHPs and consumers are sanguine about the projects, and have reported positive consumer outcomes. However, as with any new initiative, there are some practical and professional issues that need to be addressed. The projects are improving access to evidence-based, non-pharmacological therapies; the continuation and expansion of the initiative should be a priority.
Abstract:
A data warehouse is a data repository that collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. Often the data is stored in the form of materialized views in order to provide fast access to the integrated data. One of the most important decisions in designing a data warehouse is the selection of views for materialization. The objective is to select an appropriate set of views that minimizes the total query response time, subject to the constraint that the total maintenance time for these materialized views is within a given bound. This view selection problem is fundamentally different from the view selection problem under a disk-space constraint. In this paper, the view selection problem under the maintenance-time constraint is investigated. Two efficient heuristic algorithms for the problem are proposed. The key to devising the proposed algorithms is to define good heuristic functions and to reduce the problem to well-solved optimization problems, so that an approximate solution of the known optimization problem yields a feasible solution of the original problem. (C) 2001 Elsevier Science B.V. All rights reserved.
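To illustrate the flavour of constrained view selection, here is a minimal greedy sketch that ranks candidate views by query-time benefit per unit of maintenance time. The data layout and the ratio-based ranking are illustrative assumptions; they do not reproduce the paper's two algorithms, which the abstract does not specify.

```python
def select_views(candidates, maintenance_budget):
    """Pick materialized views greedily by query-time benefit per unit
    of maintenance time, until the maintenance-time budget is spent.

    `candidates`: list of (view_name, query_time_saved, maintenance_time).
    Illustrative only; not the paper's algorithms.
    """
    chosen, used = [], 0.0
    for name, benefit, cost in sorted(
            candidates, key=lambda v: v[1] / v[2], reverse=True):
        if used + cost <= maintenance_budget:
            chosen.append(name)
            used += cost
    return chosen

# Toy usage: three candidate views, budget of 10 maintenance-time units.
views = [("v_sales_by_day", 40.0, 8.0),
         ("v_sales_by_region", 25.0, 4.0),
         ("v_inventory", 10.0, 5.0)]
print(select_views(views, 10.0))  # ['v_sales_by_region', 'v_inventory']
```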