17 results for Scaling-up

in CentAUR: Central Archive University of Reading - UK


Relevance:

100.00%

Publisher:

Abstract:

Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed, such as the Prism algorithm. Prism constructs modular rules that are qualitatively better than those induced by TDIDT. However, with the increasing size of databases, many existing rule learning algorithms have proved computationally expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. Because TDIDT remains the most popular classifier, even though strongly competitive alternative algorithms exist, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.
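To illustrate the separate-and-conquer strategy that Prism follows, the sketch below shows a simplified, sequential Prism-style rule inducer that greedily adds attribute-value terms maximising the probability of the target class, then removes the covered instances and repeats. The list-of-dicts data representation is an assumption made for illustration; this is not the parallel, distributed classifier described in the paper.

# Simplified, sequential sketch of Prism-style separate-and-conquer rule
# induction (illustrative only). Instances are assumed to be dicts with a
# "class" key plus categorical attribute values.

def induce_rules_for_class(instances, attributes, target_class):
    """Induce modular rules that together cover all target_class instances."""
    rules = []
    remaining = list(instances)
    while any(x["class"] == target_class for x in remaining):
        covered = list(remaining)
        rule = []                      # conjunction of (attribute, value) terms
        free_attrs = set(attributes)
        # Conquer: grow the rule until it covers only the target class
        while free_attrs and any(x["class"] != target_class for x in covered):
            best = None
            for attr in free_attrs:
                for value in {x[attr] for x in covered}:
                    subset = [x for x in covered if x[attr] == value]
                    prob = sum(x["class"] == target_class for x in subset) / len(subset)
                    if best is None or prob > best[0]:
                        best = (prob, attr, value, subset)
            _, attr, value, covered = best
            rule.append((attr, value))
            free_attrs.discard(attr)
        rules.append((rule, target_class))
        # Separate: remove the instances covered by the new rule
        remaining = [x for x in remaining
                     if not all(x[a] == v for a, v in rule)]
    return rules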

Relevance:

100.00%

Publisher:

Abstract:

The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.

Relevance:

100.00%

Publisher:

Abstract:

Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data in order to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; and finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.

Relevance:

100.00%

Publisher:

Abstract:

Advances in hardware technologies allow data to be captured and processed in real time, and the resulting high-throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high-velocity data streams in which data instances arrive at a very fast rate. This is problematic for applications that require little or no delay between changes in the patterns of the stream and the absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non-streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
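As a minimal illustration of the kind of instance-based stream classifier this builds on, the sketch below shows a sequential sliding-window KNN with Euclidean distance. The class name, parameters and window policy are assumptions for illustration only; the parallel, real-time adaptive methodology investigated in the paper is not reproduced here.

# Minimal sketch of a sliding-window KNN classifier for data streams: the
# window of recent labelled instances acts as the training set, so the model
# adapts as old instances are discarded (illustrative only).
from collections import deque
import math

class SlidingWindowKNN:
    def __init__(self, k=5, window_size=1000):
        self.k = k
        self.window = deque(maxlen=window_size)   # (features, label) pairs

    def learn(self, features, label):
        """Absorb a newly labelled instance; the oldest instance falls out."""
        self.window.append((features, label))

    def predict(self, features):
        """Majority vote among the k nearest instances in the current window."""
        neighbours = sorted(
            self.window,
            key=lambda item: math.dist(features, item[0])  # Euclidean distance
        )[:self.k]
        votes = {}
        for _, label in neighbours:
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else None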

Relevance:

60.00%

Publisher:

Abstract:

Remote sensing can potentially provide information useful in improving pollution transport modelling in agricultural catchments. Realisation of this potential will depend on the availability of the raw data, the development of information extraction techniques, and the impact of assimilating the derived information into models. High spatial resolution hyperspectral imagery of a farm near Hereford, UK, is analysed. A technique is described to automatically identify the soil and vegetation endmembers within a field, enabling vegetation fractional cover estimation. Aerially acquired laser altimetry is used to produce digital elevation models of the site. At the subfield scale, the hypothesis that higher resolution topography will make a substantial difference to contaminant transport is tested using the AGricultural Non-Point Source (AGNPS) model. Slope aspect and direction information are extracted from the topography at different resolutions to study the effects on soil erosion, deposition, runoff and nutrient losses. Field-scale models are often used to model drainage water, nitrate and runoff/sediment loss, but their demanding input data requirements make scaling up to catchment level difficult. By determining the input range of spatial variables gathered from EO data, and comparing the response of models to the range of variation measured, the critical model inputs can be identified. Response surfaces to variation in these inputs are presented and constrain the uncertainty in model predictions. Although optical earth observation analysis can provide fractional vegetation cover, cloud cover and semi-random weather patterns can hinder data acquisition in Northern Europe. A Spring and Autumn cloud cover analysis is carried out over seven UK sites close to agricultural districts, using historic satellite image metadata, climate modelling and historic ground weather observations. Results are assessed in terms of acquisition probability and the implications for future earth observation missions.
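As an illustration of the kind of endmember-based estimate involved, a generic two-endmember linear mixing model expresses a pixel's observed signal rho (for example a vegetation index) as a combination of pure vegetation and pure soil endmember signals, so that the vegetation fractional cover follows as below. The notation is illustrative and is not necessarily the exact technique used in the paper.

f_v = \frac{\rho - \rho_{\mathrm{soil}}}{\rho_{\mathrm{veg}} - \rho_{\mathrm{soil}}}, \qquad 0 \le f_v \le 1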

Relevance:

60.00%

Publisher:

Abstract:

Measuring pollinator performance has become increasingly important with emerging needs for risk assessment in conservation and sustainable agriculture that require multi-year and multi-site comparisons across studies. However, comparing pollinator performance across studies is difficult because of the diversity of concepts and disparate methods in use. Our review of the literature shows many unresolved ambiguities. Two different assessment concepts predominate: the first estimates stigmatic pollen deposition and the underlying pollinator behaviour parameters, while the second estimates the pollinator’s contribution to plant reproductive success, for example in terms of seed set. Both concepts include a number of parameters combined in diverse ways and named under a diversity of synonyms and homonyms. The concepts nevertheless overlap, because pollen deposition success is the most frequently used proxy for assessing the pollinator’s contribution to plant reproductive success. We analyse the diverse concepts and methods in the context of a newly proposed conceptual framework with a modular approach based on pollen deposition, visit frequency, and contribution to seed set relative to the plant’s maximum female reproductive potential. A system of equations is proposed to optimize the balance between idealised theoretical concepts and practical operational methods. Our framework permits comparisons over a range of floral phenotypes, and spatial and temporal scales, because scaling up is based on the same fundamental unit of analysis, the single visit.
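As an illustration of how such modular components can be combined (the symbols below are illustrative placeholders, not the notation of the paper's own system of equations), single-visit pollen deposition P_{sv} can be scaled up by visit frequency V, and the single-visit contribution to seed set S_{sv} can be expressed relative to the plant's maximum female reproductive potential S_{max}:

D = P_{sv} \times V, \qquad E = \frac{S_{sv}}{S_{max}}

where D is the pollen deposited per flower per unit time and E is the relative contribution of a single visit to seed set.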

Relevance:

60.00%

Publisher:

Abstract:

A manageable, relatively inexpensive model was constructed to predict the loss of nitrogen and phosphorus from a complex catchment to its drainage system. The model used an export coefficient approach, calculating the total nitrogen (N) and total phosphorus (P) load delivered annually to a water body as the sum of the individual loads exported from each nutrient source in its catchment. The export coefficient modelling approach permits scaling up from plot-scale experiments to the catchment scale, allowing findings from field experimental studies to be applied at a scale suitable for catchment management. The catchment of the River Windrush, a tributary of the River Thames, UK, was selected as the initial study site. The Windrush model predicted nitrogen and phosphorus loading within 2% of the observed total nitrogen load and 0.5% of the observed total phosphorus load in 1989. The export coefficient modelling approach was then validated by application in a second research basin, the catchment of Slapton Ley, south Devon, which has markedly different catchment hydrology and land use. The Slapton model was calibrated within 2% of the observed total nitrogen load and 2.5% of the observed total phosphorus load in 1986. Both models proved sensitive to the impact of temporal changes in land use and management on water quality in both catchments, and were therefore used to evaluate the potential impact of proposed pollution control strategies on the nutrient loading delivered to the River Windrush and Slapton Ley.
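In its commonly cited general form (the symbols here are the conventional ones and may differ from those used in the paper), the export coefficient model estimates the annual load as

L = \sum_{i=1}^{n} E_i \, A_i(I_i) + p

where L is the total load of N or P delivered to the water body, E_i is the export coefficient for source i, A_i is the area of the catchment occupied by land use i (or the number of livestock or people for non-areal sources), I_i is the nutrient input to source i, and p is the nutrient input from precipitation onto the water surface.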

Relevance:

60.00%

Publisher:

Abstract:

In a world where data is captured on a large scale, the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules: one is the divide and conquer approach, also known as the top down induction of decision trees; the other is the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harness additional computing resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
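To give a flavour of the data-parallel pattern involved, the sketch below distributes the training data across worker processes, each of which counts class frequencies for every (attribute, value) term on its own partition; the counts are then merged to pick the best term globally. This is a minimal sketch under the assumption that the data has already been partitioned; it does not reproduce the framework described here or its pre-pruning facility.

# Illustrative data-parallel evaluation of candidate rule terms for a
# separate-and-conquer learner (sketch only).
from multiprocessing import Pool
from collections import Counter

def local_term_counts(partition):
    """Count (attribute, value, class) occurrences on one data partition."""
    counts = Counter()
    for instance in partition:
        cls = instance["class"]
        for attr, value in instance.items():
            if attr != "class":
                counts[(attr, value, cls)] += 1
    return counts

def best_term(partitions, target_class, workers=4):
    """Merge per-partition counts and return the (attribute, value) term
    with the highest probability of the target class."""
    with Pool(workers) as pool:
        merged = Counter()
        for counts in pool.map(local_term_counts, partitions):
            merged.update(counts)
    totals, hits = Counter(), Counter()
    for (attr, value, cls), n in merged.items():
        totals[(attr, value)] += n
        if cls == target_class:
            hits[(attr, value)] += n
    return max(totals, key=lambda term: hits[term] / totals[term])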

Relevance:

60.00%

Publisher:

Abstract:

Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist for scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms, most of the work has concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard-based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.

Relevance:

60.00%

Publisher:

Abstract:

Approximately 1–2% of net primary production by land plants is re-emitted to the atmosphere as isoprene and monoterpenes. These emissions play major roles in atmospheric chemistry and air pollution–climate interactions. Phenomenological models have been developed to predict their emission rates, but limited understanding of the function and regulation of these emissions has led to large uncertainties in model projections of air quality and greenhouse gas concentrations. We synthesize recent advances in diverse fields, from cell physiology to atmospheric remote sensing, and use this information to propose a simple conceptual model of volatile isoprenoid emission based on regulation of metabolism in the chloroplast. This may provide a robust foundation for scaling up emissions from the cellular to the global scale.
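The phenomenological models referred to typically express the emission rate as a standard-condition rate modulated by empirical light and temperature response functions, broadly of the form below (a generic Guenther-type formulation shown only to illustrate the class of model that the proposed conceptual model aims to put on a more mechanistic footing):

E = E_s \, \gamma_L(L) \, \gamma_T(T)

where E_s is the emission rate at standard light and temperature and \gamma_L, \gamma_T are dimensionless light and temperature activity factors.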

Relevance:

60.00%

Publisher:

Abstract:

The development of versatile bioactive surfaces able to emulate in vivo conditions is of enormous importance to the future of cell and tissue therapy. Tuning cell behaviour on two-dimensional surfaces so that the cells perform as if they were in a natural three-dimensional tissue represents a significant challenge, but one that must be met if the early promise of cell and tissue therapy is to be fully realised. Due to the inherent complexities involved in the manufacture of biomimetic three-dimensional substrates, the scaling up of engineered tissue-based therapies may be simpler if based upon proven two-dimensional culture systems. In this work, we developed new coating materials composed of the self-assembling peptide amphiphiles (PAs) C16G3RGD (RGD) and C16G3RGDS (RGDS) shown to control cell adhesion and tissue architecture while avoiding the use of serum. When mixed with the C16ETTES diluent PA at a 13:87 (mol mol⁻¹) ratio at 1.25 × 10⁻³ M, the bioactive PAs were shown to support optimal adhesion, maximal proliferation, and prolonged viability of human corneal stromal fibroblasts (hCSFs), while improving the cell phenotype. These PAs also provided stable adhesive coatings on highly hydrophobic surfaces composed of striated polytetrafluoroethylene (PTFE), significantly enhancing proliferation of aligned cells and increasing the complexity of the produced tissue. The thickness and structure of this highly organised tissue were similar to those observed in vivo, comprising aligned newly deposited extracellular matrix. As such, the developed coatings can constitute a versatile biomaterial for applications in cell biology, tissue engineering, and regenerative medicine requiring serum-free conditions.

Relevance:

60.00%

Publisher:

Abstract:

The UK new-build housing sector is facing dual pressures to expand supply whilst delivering against tougher planning and Building Regulation requirements, predominantly in the areas of sustainability. The sector is currently responding by significantly scaling up production and incorporating new technical solutions into new homes. This trajectory of up-scaling and technical innovation has been of research interest, but the research has primarily focused on the ‘upstream’ implications for house builders’ business models and standardised design templates. There has been little attention, though, to the potential ‘downstream’ implications of the ramping up of supply and the introduction of new technologies for build quality and defects. This paper contributes to our understanding of the ‘downstream’ implications through a synthesis of the current UK defect literature with respect to new-build housing. It is found that the prevailing emphasis in the literature is limited to the responsibility, pathology and statistical analysis of defects (and failures). The literature does not extend to how house builders, individually and collectively, collect and learn from defects information in practice. The paper concludes by describing an ongoing collaborative research programme with the National House Building Council (NHBC) to: (a) understand house builders’ localised defects analysis procedures and their current knowledge feedback loops to inform risk management strategies; and (b) building on this understanding, design and test action research interventions to develop new data capture, learning processes and systems to reduce targeted defects.

Relevance:

60.00%

Publisher:

Abstract:

The aim of this study was first to evaluate the benefits of including Jersey milk in Holstein-Friesian milk for the Cheddar cheese making process and, secondly, using the data gathered, to identify the effects and relative importance of a wide range of milk components on milk coagulation properties and the cheese making process. Blending Jersey and Holstein-Friesian milk led to quadratic trends in casein micelle and fat globule size and in coagulation properties; however, this was not found to affect the cheese making process. Including Jersey milk was found, on a pilot scale, to increase cheese yield (up to +35%) without affecting cheese quality, which was defined as compliance with the legal requirements of cheese composition, cheese texture, colour and grading scores. Profitability increased linearly with the inclusion of Jersey milk (up to 11.18 pence per litre of milk). The commercial trials supported the pilot plant findings, demonstrating that including Jersey milk increased cheese yield without having a negative impact on cheese quality, despite the inherent challenges of scaling up such a process commercially. The successful use of a large array of milk components to model the cheese making process challenged the commonly accepted view that fat, protein and casein content and the protein-to-fat ratio are the main contributors to the cheese making process: other components, such as casein micelle and fat globule size, were found to also play a key role, with small casein micelles and large fat globules reducing coagulation time, improving curd firmness and fat recovery, and influencing cheese moisture and fat content. The findings of this thesis indicated that the suitability of milk for Cheddar making could be improved by the inclusion of Jersey milk and that more compositional factors need to be taken into account when judging milk suitability.