945 results for scaling-up
Abstract:
The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Abstract:
Advances in hardware technologies make it possible to capture and process data in real time, and the resulting high-throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) develops data mining algorithms that allow us to analyse these continuous streams of data in real time. The creation and real-time adaptation of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even though these algorithms are fast, they are challenged by high-velocity data streams, in which data instances arrive at a fast rate. This is problematic if an application requires little or no delay between changes in the patterns of the stream and the absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non-streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
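As a rough illustration of how a KNN classifier can adapt to a stream, the following minimal sketch keeps only the most recent labelled instances in a fixed-size window and classifies by majority vote among the nearest ones. It is not the methodology of the paper summarised above: the class name `SlidingWindowKNN`, the parameters `k` and `window_size`, and the choice of Euclidean distance are all assumptions made for this example, and the sketch is sequential rather than parallel.

```python
from collections import Counter, deque
import math


class SlidingWindowKNN:
    """Hypothetical KNN classifier over the most recent labelled instances."""

    def __init__(self, k=3, window_size=100):
        self.k = k
        # A deque with maxlen silently discards the oldest instance when full,
        # which is what lets the model forget outdated patterns.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        """Add one labelled instance (feature sequence x, label y) to the window."""
        self.window.append((tuple(x), y))

    def predict_one(self, x):
        """Return the majority label among the k nearest stored instances."""
        if not self.window:
            return None  # nothing learned yet
        nearest = sorted(
            self.window, key=lambda item: math.dist(item[0], x)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

Because old instances fall out of the window, a change in the stream is absorbed once enough new instances have arrived; a smaller `window_size` reacts faster at the cost of a noisier model. Each prediction scans the whole window, which is the step a parallel variant would need to distribute.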
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Semiconductor nanowires (NWs) are one- or quasi-one-dimensional systems whose physical properties are unique compared to bulk materials because of their nanoscale sizes. They bring together the quantum world and semiconductor devices. NW-based technologies may achieve an impact comparable to that of current microelectronic devices if new challenges are met. This thesis primarily focuses on two different, cutting-edge aspects of research on semiconductor NW arrays as pivotal components of NW-based devices. The first part deals with the characterization of electrically active defects in NWs. A general procedure has been set up that enables Deep Level Transient Spectroscopy (DLTS) to be used to probe defects in NW arrays. This procedure has been applied to characterize a specific system, namely Schottky barrier diodes based on Reactive Ion Etched (RIE) silicon NW arrays. This study sheds light on whether and how growth conditions introduce defects in RIE-processed silicon NWs. The second part of this thesis concerns the bowing induced by an electron beam and the subsequent clustering of gallium arsenide NWs. After a justified rejection of the mechanisms previously reported in the literature, an original interpretation of the electron-beam-induced bending is presented. Moreover, this thesis interprets the formation of NW clusters in the framework of the lateral collapse of fibrillar structures. The latter are both idealized models and actual artificial structures used to study and to mimic the adhesion properties of natural surfaces in lizards and insects (Gecko effect). Our conclusion is that the mechanical and surface properties of the NWs, together with the geometry of the NW arrays, play a key role in their post-growth alignment. The same parameters therefore open up the possibility of locally engineering NW arrays in micro- and macro-templates.
Abstract:
Global investment in Sustainable Land Management (SLM) has been substantial, but knowledge gaps remain. Overviews of where land degradation (LD) is taking place and how land users are addressing the problem using SLM are still lacking for most individual countries and regions. Relevant maps focus more on LD than SLM, and they have been compiled using different methods. This makes it impossible to compare the benefits of SLM interventions and prevents informed decision-making on how best to invest in land. To fill this knowledge gap, a standardised mapping method has been collaboratively developed by the World Overview of Conservation Approaches and Technologies (WOCAT), FAO’s Land Degradation Assessment in Drylands (LADA) project, and the EU’s Mitigating Desertification and Remediating Degraded Land (DESIRE) project. The method generates information on the distribution and characteristics of LD and SLM activities and can be applied at the village, national, or regional level. It is based on participatory expert assessment, documents, and surveys. These data sources are spatially displayed across a land-use systems base map. By enabling mapping of the DPSIR framework (Driving Forces-Pressures-State-Impacts-Responses) for degradation and conservation, the method provides key information for decision-making. It may also be used to monitor LD and conservation following project implementation. This contribution explains the mapping method, highlighting findings made at different levels (national and local) in South Africa and the Mediterranean region. Keywords: Mapping, Decision Support, Land Degradation, Sustainable Land Management, Ecosystem Services, Participatory Expert Assessment
Abstract:
Although there are numerous examples of large-scale commercial microbial synthesis routes for organic bioproducts, few studies have addressed the obvious potential for microbial systems to produce inorganic functional biomaterials at scale. Here we address this by focusing on the production of nano-scale biomagnetite particles by the Fe(III)-reducing bacterium Geobacter sulfurreducens, which was scaled up successfully from lab-scale to pilot plant-scale production whilst maintaining the surface reactivity and magnetic properties that make this material well suited to commercial exploitation. At the largest scale tested, the bacterium was grown in a 50 L bioreactor, harvested and then inoculated into a buffer solution containing Fe(III)-oxyhydroxide and an electron donor and mediator, which promoted the formation of magnetite in under 24 hours. This procedure was capable of producing up to 120 g of biomagnetite. The particle size distribution was maintained between 10 and 15 nm during scale-up of this second step from 10 mL to 10 L, with conserved magnetic properties and surface reactivity, the latter demonstrated by the reduction of Cr(VI). The process presented provides an environmentally benign route to magnetite production and serves as an alternative to harsher synthetic techniques, with the clear potential to be used to produce kg to tonne quantities.
Abstract:
Information and Communication Technologies (ICT) can help social enterprises and other organizations working on global sustainability issues and in the human development sector in general scale their social impact. The flexibility, dynamism, and ubiquity of ICTs make them powerful tools for improving relationships among organizations and their beneficiaries, multiplying the effects of action against many, if not all, aspects of global unsustainability, including poverty and exclusion. The scaling of social impact occurs in two different dimensions. On one hand, ICTs can increase the value proposition of a program or action (depth scaling) in different ways: providing accurate and fast needs recognition, adapting products and services, creating opportunities, building fairer markets, mobilizing actions on environmental and social issues, and creating social capital. On the other hand, ICTs can also increase the number of people reached by the organization (breadth scaling) by accessing new resources, creating synergies and networks, improving organizational efficiency, increasing its visibility, and designing new access channels to beneficiaries. This article analyzes the role of ICT in the depth and breadth scaling of social impact.