55 results for databases and data mining
Abstract:
Details about the parameters of kinetic systems are crucial for progress in both medical and industrial research, including drug development, clinical diagnosis and biotechnology applications. Such details must be collected through a series of kinetic experiments and investigations. Correct experimental design is essential for collecting data suitable for analysis and modelling, and for deriving the correct information. We have developed a systematic, iterative Bayesian method and a set of rules for the design of enzyme kinetic experiments. Our method selects the optimum design to collect data suitable for accurate modelling and analysis, and minimises the error in the estimated parameters. The rules select features of the design such as the substrate range and the number of measurements. We show here that this method can be applied directly to the study of other important kinetic systems, including drug transport, receptor binding, microbial culture and cell transport kinetics. It is possible to reduce the errors in the estimated parameters and, most importantly, to increase efficiency and cost-effectiveness by reducing the number of experiments and data points that must be measured. (C) 2003 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
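The abstract does not state the design criterion it uses. As a hedged illustration of how candidate designs can be ranked under parameter uncertainty, the sketch below scores substrate-concentration designs for the Michaelis–Menten rate law v = Vmax·S/(Km + S). It uses a Monte Carlo estimate of the expected log-determinant of the Fisher information over prior draws, a standard pseudo-Bayesian D-optimality criterion. This is not necessarily the authors' method. The lognormal priors, the constant-variance Gaussian noise model and both candidate designs are hypothetical assumptions.

```python
import math
import random

def mm_gradient(s, vmax, km):
    """Sensitivities of v = vmax*s/(km+s) with respect to (vmax, km)."""
    return s / (km + s), -vmax * s / (km + s) ** 2

def log_det_fisher(design, vmax, km, sigma=1.0):
    """log det of the 2x2 Fisher information, ASSUMING independent
    Gaussian errors with constant standard deviation sigma."""
    a = b = c = 0.0  # information matrix [[a, b], [b, c]]
    for s in design:
        g_vmax, g_km = mm_gradient(s, vmax, km)
        a += g_vmax * g_vmax
        b += g_vmax * g_km
        c += g_km * g_km
    det = (a * c - b * b) / sigma ** 4
    return math.log(det) if det > 0 else float("-inf")

def expected_utility(design, prior_draws):
    """Monte Carlo estimate of E_prior[log det F]; larger is better."""
    return sum(log_det_fisher(design, v, k) for v, k in prior_draws) / len(prior_draws)

# Hypothetical priors: Vmax around 10, Km around 2 (arbitrary units).
rng = random.Random(0)
prior_draws = [(rng.lognormvariate(math.log(10), 0.2),
                rng.lognormvariate(math.log(2), 0.5)) for _ in range(500)]

narrow = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # all well below the expected Km
wide = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]    # spans the expected Km

for name, design in (("narrow", narrow), ("wide", wide)):
    print(name, round(expected_utility(design, prior_draws), 2))
```

The narrow design scores far lower. When S ≪ Km, v ≈ (Vmax/Km)·S, so Vmax and Km are nearly confounded. This matches the abstract's point that the substrate range is one of the key design features to select.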
Abstract:
This paper describes a prototype grid infrastructure for molecular simulation scientists, called the eMinerals minigrid, which is based on the integration of shared compute and data resources. We describe the key components, namely: Condor pools; Linux/Unix clusters with PBS and IBM's LoadLeveler job-handling tools; Globus for security handling; Condor-G tools for wrapping Globus job-submission commands; Condor's DAGMan tool for handling workflow; the Storage Resource Broker for handling data; and the CCLRC dataportal and associated tools for both archiving data with metadata and making data available to other workers.
Abstract:
The rapid increase in the size and number of databases demands data mining approaches that scale to large amounts of data. This has led to the exploration of parallel computing technologies that perform data mining tasks concurrently on several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these technologies is the classification of newly recorded data. This paper surveys advances in the parallelization of classification rule induction.
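Much of the cost of classification rule induction lies in counting how often each attribute-value pair co-occurs with each class. The sketch below shows one common data-parallel pattern: horizontal partitioning, local counting, then a merge. It is hedged and minimal, and it is not taken from any particular surveyed system. Threads are used only to keep the example self-contained. Under CPython's global interpreter lock, a real speed-up would need processes or separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_counts(partition, target):
    """Count (attribute, value, class) triples in one data partition."""
    counts = Counter()
    for row in partition:
        for attr, val in row.items():
            if attr != target:
                counts[(attr, val, row[target])] += 1
    return counts

def parallel_counts(rows, target, n_workers=4):
    """Split rows horizontally, count each part concurrently, then merge."""
    parts = [rows[i::n_workers] for i in range(n_workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(local_counts, parts, [target] * n_workers)
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing
    return total

# Hypothetical toy data set with a categorical class attribute "play".
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
print(parallel_counts(rows, "play", n_workers=2)[("windy", "yes", "no")])  # → 2
```

The merge is a plain sum, so the result is identical to a serial count however the rows are partitioned. This property is what makes the counting step straightforward to distribute.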
Abstract:
This article examines Corporate Social Responsibility (CSR) and mining community development, sustainability and viability, focusing on current and former company-owned mining towns in Namibia. Historically, company towns have been a feature of mining activity in Namibia. However, the fate of such towns upon mine closure has been, and remains, controversial. Declining former mining communities, and even mining ghost towns, can be found across the country. Drawing on research undertaken in Namibia, the article considers these issues with reference to three case-study communities. It examines the complexities surrounding decision-making about these communities and the challenges faced in efforts to encourage their sustainability after mining. Mining companies' engagement, through CSR, with the development, sustainability and viability of such communities is also critically discussed. The role, responsibilities and actions of the state in relation to these communities are further reflected upon. Finally, ways forward for these communities are considered.
Abstract:
Version 1 of the Global Charcoal Database is now available for regional fire history reconstructions, data exploration, hypothesis testing, and evaluation of coupled climate–vegetation–fire model simulations. The charcoal database contains over 400 radiocarbon-dated records that document changes in charcoal abundance during the Late Quaternary. The aim of this public database is to stimulate cross-disciplinary research in fire sciences targeted at an increased understanding of the controls and impacts of natural and anthropogenic fire regimes on centennial-to-orbital timescales. We describe here the data standardization techniques for comparing multiple types of sedimentary charcoal records. Version 1 of the Global Charcoal Database has been used to characterize global and regional patterns in fire activity since the last glacial maximum. Recent studies using the charcoal database have explored the relation between climate and fire during periods of rapid climate change, including evidence of fire activity during the Younger Dryas Chronozone, and during the past two millennia.
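The abstract mentions data standardization techniques but gives no details. The sketch below is a hedged illustration of why a transform chain of this kind makes records reported in different units comparable. It min-max rescales a record, applies a Box-Cox power transform with a fixed, hypothetical λ, and converts the result to z-scores relative to a base period. The function names, λ, the offset and the example values are illustrative assumptions. The database's published procedure should be consulted for the actual steps and settings.

```python
import math
from statistics import mean, stdev

def box_cox(x, lam):
    """Box-Cox power transform; requires x > 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def standardise(values, ages, base=None, lam=0.5, eps=1e-3):
    """Min-max rescale, Box-Cox transform, then z-score against a base period.

    values: charcoal measurements in any unit; ages: matching sample ages.
    base: optional (min_age, max_age) window; None uses the whole record.
    lam and eps are HYPOTHETICAL choices for illustration only.
    """
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) + eps for v in values]  # eps keeps x > 0
    transformed = [box_cox(x, lam) for x in scaled]
    if base is None:
        ref = transformed
    else:
        ref = [t for t, a in zip(transformed, ages) if base[0] <= a <= base[1]]
    mu, sd = mean(ref), stdev(ref)
    return [(t - mu) / sd for t in transformed]

ages = [500, 1500, 2500, 3500, 4500]
record_a = [0.2, 1.5, 0.7, 3.1, 0.9]     # e.g. particles per cm^3 (made-up values)
record_b = [v * 40 for v in record_a]    # the same record in a different unit
za, zb = standardise(record_a, ages), standardise(record_b, ages)
print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(za, zb)))  # → True
```

The min-max step removes unit and scale differences, and the z-score step puts every record on a common anomaly scale. Together these allow records from different sites and measurement methods to be composited.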
Abstract:
Refractivity changes (ΔN) derived from radar ground clutter returns serve as a proxy for near-surface humidity changes (1 N unit ≡ 1% relative humidity at 20 °C). Previous studies have indicated that better humidity observations should improve forecasts of convection initiation. A preliminary assessment of the potential of refractivity retrievals from an operational magnetron-based C-band radar is presented. The increased phase noise at shorter wavelengths, exacerbated by the unknown position of the target within the 300 m gate, makes it difficult to obtain absolute refractivity values, so we consider the information in 1 h changes. These have been derived to a range of 30 km with a spatial resolution of ∼4 km; the consistency of the individual estimates (within each 4 km × 4 km area) indicates that ΔN errors are about 1 N unit, in agreement with in situ observations. Measurements from an instrumented tower on summer days show that the 1 h refractivity changes up to a height of 100 m remain well correlated with near-surface values. The analysis of refractivity as represented in the operational Met Office Unified Model at 1.5, 4 and 12 km grid lengths demonstrates that, as model resolution increases, the spatial scales of the refractivity structures improve. It is shown that the magnitude of refractivity changes is progressively underestimated at larger grid lengths during summer. However, the daily time series of 1 h refractivity changes reveal that, whereas the radar-derived values are very well correlated with the in situ observations, the high-resolution model runs have little skill in getting the right values of ΔN in the right place at the right time. This suggests that the assimilation of these radar refractivity observations could benefit forecasts of the initiation of convection.
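The stated equivalence between 1 N unit and 1% relative humidity near 20 °C can be checked numerically. The sketch below assumes two formulas, neither of which appears in the abstract. The first is the widely used two-term Smith–Weintraub expression N = 77.6 P/T + 3.73×10⁵ e/T², with P and e in hPa and T in K. The second is Bolton's approximation for saturation vapour pressure.

```python
import math

K1 = 77.6     # K/hPa, dry term coefficient (assumed Smith-Weintraub value)
K2 = 3.73e5   # K^2/hPa, water-vapour term coefficient (assumed)

def refractivity(p_hpa, t_kelvin, e_hpa):
    """Radio refractivity N from pressure, temperature and vapour pressure."""
    return K1 * p_hpa / t_kelvin + K2 * e_hpa / t_kelvin ** 2

def saturation_vapour_pressure(t_celsius):
    """Bolton (1980) approximation over water, in hPa (assumed formula)."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))

def delta_n_per_percent_rh(t_celsius, p_hpa=1000.0, rh=50.0):
    """Change in N for a 1% rise in relative humidity at fixed P and T."""
    t_k = t_celsius + 273.15
    es = saturation_vapour_pressure(t_celsius)
    n_before = refractivity(p_hpa, t_k, es * rh / 100)
    n_after = refractivity(p_hpa, t_k, es * (rh + 1) / 100)
    return n_after - n_before

for t in (10.0, 20.0, 30.0):
    print(t, round(delta_n_per_percent_rh(t), 2))
```

At 20 °C the result is close to 1 N unit, consistent with the abstract. The sensitivity grows with temperature because saturation vapour pressure rises steeply, so the one-to-one equivalence holds only near 20 °C.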
Abstract:
Systems Engineering often involves computer modelling of the behaviour of proposed systems and their components. Where a component is human, its fallibility must be modelled by a stochastic agent. The identification of a model of decision-making over quantifiable options is investigated using the game domain of chess. Bayesian methods are used to infer the distribution of players' skill levels from the moves they play rather than from their competitive results. The approach is applied to large sets of games by players across a broad FIDE Elo range, and is in principle applicable to any scenario in which high-value decisions are made under pressure.
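The sketch below is a hedged illustration of inferring skill from moves rather than from results. It computes a grid posterior over a single hypothetical skill parameter β. It assumes a simple softmax choice model, P(move i | β) ∝ exp(−β·loss_i), where loss_i is the engine-evaluated loss (in pawns) of move i relative to the best move. The model, the β grid and the example positions are illustrative assumptions, not the paper's model. Mapping β onto the FIDE Elo scale would require calibration against rated games.

```python
import math

def move_probability(losses, chosen, beta):
    """P(chosen move | beta) under a softmax over negative move losses."""
    weights = [math.exp(-beta * loss) for loss in losses]
    return weights[chosen] / sum(weights)

def skill_posterior(observations, betas, prior=None):
    """Grid posterior over beta given (losses, chosen_index) observations."""
    if prior is None:
        prior = [1.0 / len(betas)] * len(betas)  # uniform prior on the grid
    log_post = [math.log(p) for p in prior]
    for losses, chosen in observations:
        for k, beta in enumerate(betas):
            log_post[k] += math.log(move_probability(losses, chosen, beta))
    peak = max(log_post)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def posterior_mean(betas, posterior):
    return sum(b * p for b, p in zip(betas, posterior))

betas = [0.5, 1.0, 2.0, 4.0, 8.0]
position = [0.0, 0.5, 1.0]          # hypothetical losses of three candidate moves
accurate = [(position, 0)] * 10     # always plays the best move
erratic = [(position, 2)] * 10      # always plays the worst move
for name, obs in (("accurate", accurate), ("erratic", erratic)):
    print(name, round(posterior_mean(betas, skill_posterior(obs, betas)), 2))
```

Working in log space keeps the product of many per-move likelihoods numerically stable. That matters when, as in the abstract, large sets of games are processed.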
Abstract:
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology for predicting the classification of newly recorded data. However, alternative technologies have been developed that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm, which performs particularly well on noisy data but likewise does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
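Prism belongs to the separate-and-conquer family of rule learners. Rather than growing a tree, it builds one rule at a time. It adds the attribute-value term with the highest precision for the target class until the rule covers only that class, then removes the covered instances and repeats. The sketch below follows Cendrowska's basic Prism on categorical data. It is a hedged, serial illustration under that assumption, not the authors' PrismTCS implementation or their parallel framework. Its tie-breaking (higher coverage first, then sorted value order) is an illustrative choice.

```python
def prism(rows, target):
    """Induce (conditions, class) rules from categorical rows (list of dicts)."""
    rules = []
    for cls in sorted({r[target] for r in rows}):
        remaining = list(rows)  # each class starts from the full training set
        while any(r[target] == cls for r in remaining):
            covered, terms = remaining, {}
            # Specialise until the rule covers only cls (or attributes run out).
            while any(r[target] != cls for r in covered):
                best, best_key = None, None
                for attr in covered[0]:
                    if attr == target or attr in terms:
                        continue
                    for val in sorted({r[attr] for r in covered}):
                        subset = [r for r in covered if r[attr] == val]
                        pos = sum(r[target] == cls for r in subset)
                        key = (pos / len(subset), pos)  # precision, then coverage
                        if best_key is None or key > best_key:
                            best, best_key = (attr, val), key
                if best is None:
                    break  # no unused attributes left: keep the impure rule
                terms[best[0]] = best[1]
                covered = [r for r in covered if r[best[0]] == best[1]]
            rules.append((terms, cls))
            # "Separate": drop the instances this rule covers, then repeat.
            remaining = [r for r in remaining
                         if not all(r[a] == v for a, v in terms.items())]
    return rules

# Hypothetical toy data set with a categorical class attribute "play".
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
for conditions, cls in prism(rows, "play"):
    print(conditions, "->", cls)
```

Each candidate term requires a pass over the currently covered instances, which is why Prism-style learners scale poorly on large datasets. The same repeated counting is what a parallel framework would distribute.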
Abstract:
Background: Since their inception, Twitter and related microblogging systems have provided a rich source of information for researchers and have attracted interest in their affordances and use. Since 2009, PubMed has included 123 journal articles on medicine and Twitter, but no overview exists of how the field uses Twitter in research. // Objective: This paper aims to identify published work relating to Twitter indexed by PubMed, and then to classify it. This classification will provide a framework within which future researchers can position their work, and will provide an understanding of the current reach of research using Twitter in medical disciplines. Limiting the study to papers indexed by PubMed ensures that the work provides a reproducible benchmark. // Methods: Papers indexed by PubMed on Twitter and related topics were identified and reviewed. The papers were then qualitatively classified on the basis of each paper's title and abstract to determine its focus. The Twitter-focused work was studied in detail to determine what data, if any, it was based on, and from this a categorization of the data set sizes used in the studies was developed. Using open-coded content analysis, additional important categories were also identified, relating to the primary methodology, domain and aspect. // Results: As of 2012, PubMed comprises more than 21 million citations from the biomedical literature, and from these a corpus of 134 potentially Twitter-related papers was identified, eleven of which were subsequently found not to be relevant. There were no papers prior to 2009 relating to microblogging, a term first used in 2006. Of the remaining 123 papers that mentioned Twitter, thirty were focused on Twitter (the others referring to it tangentially). The early Twitter-focused papers introduced the topic and highlighted its potential, without carrying out any form of data analysis.
The majority of published papers used analytic techniques to sort through thousands, if not millions, of individual tweets, often depending on automated tools to do so. Our analysis demonstrates that researchers are starting to use knowledge discovery methods and data mining techniques to understand vast quantities of tweets: the study of Twitter is becoming quantitative research. // Conclusions: To the best of our knowledge, this work is the first overview study of medically related research based on Twitter and related microblogging. We have used five dimensions to categorise published medically related research on Twitter. This classification provides a framework within which researchers studying the development and use of Twitter in medically related research, and those undertaking comparative studies of research relating to Twitter in medicine and beyond, can position and ground their work.