877 results for data-mining application
Abstract:
Paper presented at the XVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2011), A Coruña, Spain, 5-7 September 2011.
Abstract:
A hydrological–economic model is introduced to describe the dynamics of groundwater-dependent economies (agriculture and tourism) for sustainable use in sparse-data drylands. The Amtoudi Oasis, a remote area of southern Morocco in the northern Sahara, attractive for tourism and showing evidence of groundwater degradation, was chosen to demonstrate the model's operation. Governing system variables were identified and linked through System Dynamics (SD) causal diagrams, and basic formulations were programmed into a model having two modules coupled by the nexus 'pumping': (1) the hydrological module represents the net groundwater balance (G) dynamics; and (2) the economic module reproduces the variation in the consumers of water, both the population and tourists. The model was operated under a similar influx of tourists and different scenarios of water availability, such as the wet 2009–2010 and the average 2010–2011 hydrological years. The rise in international tourism is identified as the main driving force reducing emigration and introducing new social habits in the population, in particular concerning water consumption. The urban water allotment (PU) doubled for a net increase of fewer than 100 inhabitants in recent decades. The water allocation for agriculture (PI), the largest consumer of water, had remained constant for decades. Although the 2-year monitoring period is not long enough to draw long-term conclusions, groundwater imbalance was reflected by net aquifer recharge (R) less than PI + PU (G < 0) in the average year 2010–2011, with net lateral inflow from adjacent Cambrian formations being the largest recharge component. R is expected to be much less than PI + PU during recurrent dry spells. Some low-technology actions are tentatively proposed to mitigate groundwater degradation, such as: wastewater capture, treatment, and reuse for irrigation; storm-water harvesting for irrigation; and active maintenance of the irrigation system to improve its efficiency.
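To make the coupled two-module structure concrete, here is a minimal sketch in Python. All parameter values are hypothetical placeholders; only the structure (G = R - PI - PU, with PU driven by residents and tourists) follows the abstract.

```python
# Hedged sketch of the coupled hydrological-economic balance described
# above. All parameter values are assumed placeholders; the actual model
# calibration for the Amtoudi Oasis is not reproduced here.

def simulate(years: int = 20,
             R: float = 1.2e6,         # net aquifer recharge, m3/yr (assumed)
             PI: float = 1.0e6,        # irrigation allotment, m3/yr (constant, per the abstract)
             pop: float = 900.0,       # resident population (assumed)
             tourists: float = 2000.0, # annual tourist visits (assumed)
             pop_growth: float = 0.002,
             tourist_growth: float = 0.05,
             lpcd_res: float = 120.0,  # litres per capita per day, residents (assumed)
             lpcd_tour: float = 250.0):# litres per tourist-day (assumed)
    """Yearly Euler steps of the two coupled modules."""
    storage = 0.0  # cumulative groundwater balance, m3
    for t in range(years):
        # Economic module: consumers of water.
        pop *= 1.0 + pop_growth
        tourists *= 1.0 + tourist_growth
        # Urban allotment PU from residents (365 days) and tourist stays (assume 3 nights).
        PU = (pop * lpcd_res * 365 + tourists * lpcd_tour * 3) / 1000.0  # m3/yr
        # Hydrological module: net groundwater balance for the year.
        G = R - PI - PU
        storage += G
        print(f"year {t:2d}  PU={PU:10.0f} m3  G={G:10.0f} m3  cum={storage:11.0f} m3")

simulate()
```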
Abstract:
In times of financial crisis, open-source data mining tools represent a new trend in research, education, and industrial applications, especially for small and medium-sized enterprises. With open-source software, these companies can easily start a data mining project using the latest technologies without worrying about acquisition costs, and can instead invest in training their staff. Open-source systems provide access to the source code, making it easier for staff to understand the systems and algorithms and allowing them to adapt the software to the needs of their projects. However, there are some issues inherent to the use of this type of tool. One of the most important is diversity, and discovering too late that the chosen tool is inappropriate for our business goals can be a serious problem. As the number of data mining tools keeps growing, choosing the one that truly best fits our business becomes increasingly difficult. The present study reviews a set of data mining tools according to their characteristics and functionality. The tools reviewed come from the KDnuggets listing of Data Mining Software Suites. The tools offering the best working conditions, which are also the most popular in their communities, are then identified and subjected to a practical test with real datasets. The tests aim to identify how the tools react to different scenarios, such as performance when processing large volumes of data, precision of results, and so on. Nowadays, open-source data mining tools represent an opportunity for their users, especially small and medium-sized enterprises; the results of this study are therefore intended to support the decision-making process around them.
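As an illustration of the kind of practical test described, the sketch below times one open-source tool (scikit-learn, used here only as a stand-in for the suites on the KDnuggets list) on a bundled dataset and reports its accuracy; the study's actual tools and datasets are not reproduced.

```python
# Hedged sketch of a tool benchmark: training time and accuracy of one
# open-source library on a bundled dataset. Tool and dataset choices are
# illustrative assumptions, not the study's own selection.

import time
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

start = time.perf_counter()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
elapsed = time.perf_counter() - start

print(f"training time: {elapsed:.2f}s")
print(f"accuracy: {accuracy_score(y_te, model.predict(X_te)):.3f}")
```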
Abstract:
Electricity market price forecasting is a challenging yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry spikes, which may be tens or even hundreds of times higher than the normal price. Such electricity price spikes are very difficult to predict. So far, most research on electricity price forecasting has been based on normal-range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as price spikes. The normal price can be predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecast with a data mining approach. This paper focuses on spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices are able to reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour, electricity demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database and find the internal relationships between electricity price spikes and the proposed indices. The mining results are used to form the price spike forecast model. The proposed model is able to generate the forecasted price spike, the level of the spike and an associated forecast confidence level. The model is tested with Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.
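The abstract does not give the SDI and RDI formulas, so the sketch below assumes illustrative definitions (SDI as the supply-plus-reserve margin over demand, RDI as demand relative to its running maximum) and uses a Gaussian naive Bayes classifier as a stand-in for the Bayesian classification step.

```python
# Hedged sketch of the spike-classification step on synthetic data.
# SDI and RDI definitions below are assumptions, not the paper's formulas.

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n = 1000
demand = rng.uniform(5000, 9000, n)           # MW (synthetic)
supply = demand * rng.uniform(1.0, 1.3, n)    # MW (synthetic)
reserve = rng.uniform(200, 800, n)            # MW (synthetic)

sdi = (supply + reserve - demand) / demand    # assumed composite balance index
rdi = demand / np.maximum.accumulate(demand)  # assumed relative demand index

# Synthetic labels: spikes more likely when the supply margin is thin.
spike = (sdi + rng.normal(0, 0.05, n) < 0.1).astype(int)

X = np.column_stack([sdi, rdi])
clf = GaussianNB().fit(X[:800], spike[:800])
print("held-out spike classification accuracy:", clf.score(X[800:], spike[800:]))
```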
Abstract:
Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.
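A minimal sketch of such a Monte Carlo experiment, assuming placeholder noise levels and a bundled dataset: noise is injected into the continuous inputs, the inputs are binned into categories, and decision-tree accuracy on the categorical data is compared with trees built on the crisp continuous data.

```python
# Hedged sketch of a Monte Carlo comparison of decision trees on crisp
# continuous versus categorised inputs. Noise levels and dataset are
# placeholders, not the article's own experimental setup.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

for noise in [0.0, 0.1, 0.3]:  # experimental uncertainty levels (assumed)
    Xn = X + rng.normal(0.0, noise * X.std(axis=0), X.shape)
    # Categorical version: quartile-bin each feature, mimicking categorical data.
    Xc = np.column_stack([np.digitize(c, np.quantile(c, [0.25, 0.5, 0.75]))
                          for c in Xn.T])
    crisp = cross_val_score(DecisionTreeClassifier(random_state=0), Xn, y, cv=5).mean()
    categ = cross_val_score(DecisionTreeClassifier(random_state=0), Xc, y, cv=5).mean()
    print(f"noise={noise:.1f}  crisp acc={crisp:.3f}  categorical acc={categ:.3f}")
```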
Abstract:
This special issue is a collection of selected papers published in the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA), held in Wuhan, China, in 2005. The articles focus on innovative applications of data mining approaches to problems that involve large data sets, incomplete and noisy data, or demand optimal solutions.
Abstract:
Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed with the SPSS Clementine data mining system. Decision tree learners (C5 and CART) and a method for mining association rules (the GRI algorithm) were used. Fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are the input risk factors (independent variables), while diabetes onset (2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques were tested by cross-validation (89.8%). Results: The rules produced for diabetes diagnosis are: A- the GRI algorithm: (1) FPG >= 108.9 mg/dl; (2) FPG >= 107.1 mg/dl and age > 39.5 years. B- CART decision trees: FPG >= 110.7 mg/dl. C- the C5 decision tree learner: (1) FPG >= 95.5 mg/dl and age > 54 years; (2) FPG >= 106 mg/dl and BMI >= 25.2 kg/m2; (3) FPG >= 106 mg/dl and <= 133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and that individual risk factors such as age and BMI should be considered in defining the new cut-off value.
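A sketch of the CART step on synthetic, placeholder data (the survey data are not reproduced here): a depth-limited tree is fitted on FPG, age and BMI, and a learned FPG cut-off can be read from the printed rules.

```python
# Hedged sketch of fitting a CART-style tree to recover an FPG cut-off.
# All data below are synthetic placeholders shaped like the abstract's
# cohort (278 diabetic, 286 controls); results will not match the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 564  # 278 diabetic + 286 controls, as in the abstract
fpg = np.concatenate([rng.normal(150, 30, 278), rng.normal(95, 12, 286)])  # mg/dl
age = rng.normal(45, 12, n)   # years
bmi = rng.normal(27, 4, n)    # kg/m2
y = np.array([1] * 278 + [0] * 286)  # 1 = 2h post-load >= 200 mg/dl

X = np.column_stack([fpg, age, bmi])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["FPG", "age", "BMI"]))
```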
Abstract:
This paper presents load profiles of electricity customers, using the knowledge discovery in databases (KDD) procedure, a data mining technique, to determine the load profiles for different types of customers. Current load profiling methods are compared by analysing and evaluating the data mining classification techniques they rely on. The objective of this study is to determine the best load profiling methods and data mining techniques to classify, detect and predict non-technical losses in the distribution sector, due to faulty metering and billing errors, as well as to gather knowledge on customer behaviour and preferences so as to gain a competitive advantage in the deregulated market. This paper focuses mainly on the comparative analysis of the selected classification techniques; a forthcoming paper will focus on the detection and prediction methods.
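For illustration, the sketch below derives representative load profiles by clustering synthetic daily load curves with k-means; the paper's own comparison covers several classification techniques not reproduced here.

```python
# Hedged sketch of one common load-profiling step: clustering daily load
# curves into representative customer profiles. k-means and the synthetic
# curves are illustrative assumptions, not the paper's method or data.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
hours = np.arange(24)
# Synthetic daily curves for two behaviour types: residential evening peak
# and commercial daytime plateau (placeholders for real metering data).
residential = 1.0 + 0.8 * np.exp(-((hours - 19) ** 2) / 8.0)
commercial = 1.0 + 0.6 * ((hours >= 8) & (hours <= 18))
curves = np.vstack([
    residential + rng.normal(0, 0.1, (60, 24)),
    commercial + rng.normal(0, 0.1, (40, 24)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(curves)
for k, centre in enumerate(km.cluster_centers_):
    print(f"profile {k}: peak hour {centre.argmax():2d}, peak load {centre.max():.2f}")
```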
Abstract:
The best way of finding "natural groups" in management research remains subject to debate, and within the literature there is no accepted consensus. The principal motivation behind this study is to explore the effect of method choices upon strategic group research, an area that has suffered enduring criticism, as we believe these method choices are still not fully exploited. Our study is novel in its use of a variety of more robust clustering and validation techniques, rarely used in management research, some borrowed from the natural sciences, which may provide a useful and more robust base for this type of research. Our results confirm that methods do exist to address the concerns over strategic group research, and adoption of our chosen methods will improve the quality of management research.
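As an illustration of this kind of validation exercise, the sketch below compares candidate numbers of groups on synthetic data using silhouette width, one common validation statistic; the study's own firm-level data and its full battery of techniques are not reproduced.

```python
# Hedged sketch of cluster validation: scanning candidate group counts and
# scoring each partition with the silhouette width. Data are synthetic
# placeholders for firm-level strategy variables.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=200, centers=3, cluster_std=1.2, random_state=3)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```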
Abstract:
In this paper we develop an index and an indicator of productivity change that can be used with negative data. For that purpose the range directional model (RDM), a particular case of the directional distance function, is used for computing efficiency in the presence of negative data. We use RDM efficiency measures to arrive at a Malmquist-type index, which can reflect productivity change, and we use RDM inefficiency measures to arrive at a Luenberger productivity indicator, and relate the two. The productivity index and indicator are developed relative to a fixed meta-technology and so they are referred to as a meta-Malmquist index and meta-Luenberger indicator. We also address the fact that VRS technologies are used for computing the productivity index and indicator (a requirement under negative data), which raises issues relating to the interpretability of the index. We illustrate how the meta-Malmquist index can be used, not only for comparing the performance of a unit in two time periods, but also for comparing the performance of two different units at the same or different time periods. The proposed approach is then applied to a sample of bank branches where negative data were involved. The paper shows how the approach yields information from a variety of perspectives on performance which management can use.
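For reference, a hedged LaTeX sketch of the general form of the range directional model and the derived index and indicator; the paper's exact notation and normalisation may differ.

```latex
% Hedged sketch of the quantities involved, following the general form of
% the range directional model (RDM); notation is assumed, not the paper's.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

The RDM measures the inefficiency $\beta_o$ of unit $o$ in the direction of
its ranges $R_{io} = x_{io} - \min_j x_{ij}$ (inputs) and
$R_{ro} = \max_j y_{rj} - y_{ro}$ (outputs), which keeps the model
translation invariant and hence usable with negative data:
\begin{align*}
\max_{\beta_o,\,\lambda} \;\; & \beta_o \\
\text{s.t.}\;\; & \sum_j \lambda_j x_{ij} \le x_{io} - \beta_o R_{io},
  \quad i = 1,\dots,m,\\
& \sum_j \lambda_j y_{rj} \ge y_{ro} + \beta_o R_{ro},
  \quad r = 1,\dots,s,\\
& \sum_j \lambda_j = 1, \qquad \lambda_j \ge 0 .
\end{align*}
With efficiencies $\theta = 1 - \beta$ measured against one fixed
meta-technology, a meta-Malmquist index and a meta-Luenberger indicator for
periods $t_1$ and $t_2$ take ratio and difference forms respectively:
\[
  MI_o = \frac{\theta_o^{t_2}}{\theta_o^{t_1}}, \qquad
  LI_o = \beta_o^{t_1} - \beta_o^{t_2}.
\]

\end{document}
```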