972 resultados para Mining applications
Resumo:
Presently power system operation produces huge volumes of data that is still treated in a very limited way. Knowledge discovery and machine learning can make use of these data resulting in relevant knowledge with very positive impact. In the context of competitive electricity markets these data is of even higher value making clear the trend to make data mining techniques application in power systems more relevant. This paper presents two cases based on real data, showing the importance of the use of data mining for supporting demand response and for supporting player strategic behavior.
Resumo:
This special issue is a collection of the selected papers published on the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA) held in Wuhan, China in 2005. The articles focus on the innovative applications of data mining approaches to the problems that involve large data sets, incomplete and noise data, or demand optimal solutions.
Resumo:
Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).
Resumo:
* The work is partially supported by Grant no. NIP917 of the Ministry of Science and Education – Republic of Bulgaria.
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
In ubiquitous data stream mining applications, different devices often aim to learn concepts that are similar to some extent. In these applications, such as spam filtering or news recommendation, the data stream underlying concept (e.g., interesting mail/news) is likely to change over time. Therefore, the resultant model must be continuously adapted to such changes. This paper presents a novel Collaborative Data Stream Mining (Coll-Stream) approach that explores the similarities in the knowledge available from other devices to improve local classification accuracy. Coll-Stream integrates the community knowledge using an ensemble method where the classifiers are selected and weighted based on their local accuracy for different partitions of the feature space. We evaluate Coll-Stream classification accuracy in situations with concept drift, noise, partition granularity and concept similarity in relation to the local underlying concept. The experimental results show that Coll-Stream resultant model achieves stability and accuracy in a variety of situations using both synthetic and real world datasets.
Resumo:
In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done that seek the establishment of standards in the area. Included on these efforts there can be enumerated SEMMA and CRISP-DM. Both grow as industrial standards and define a set of sequential steps that pretends to guide the implementation of data mining applications. The question of the existence of substantial differences between them and the traditional KDD process arose. In this paper, is pretended to establish a parallel between these and the KDD process as well as an understanding of the similarities between them.
Resumo:
In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done that seek the establishment of standards in the area. Included on these efforts there can be enumerated SEMMA and CRISP-DM. Both grow as industrial standards and define a set of sequential steps that pretends to guide the implementation of data mining applications. The question of the existence of substantial differences between them and the traditional KDD process arose. In this paper, is pretended to establish a parallel between these and the KDD process as well as an understanding of the similarities between them.
Resumo:
La optimización de sistemas y modelos se ha convertido en uno de los factores más importantes a la hora de buscar la mayor eficiencia de un proceso. Este concepto no es ajeno al transporte escolar, ambiente que cambia constantemente al ritmo de las necesidades de sus clientes, y que responde ante una fuerte responsabilidad frente a sus usuarios, los niños que hacen uso del servicio, en cuanto al cumplimiento de tiempos y seguridad, mientras busca constantemente la reducción de costos. Este proyecto expone las problemáticas presentadas en The English School en esta área y propone un modelo de optimización simple que permitirá notables mejoras en términos de tiempos y costos, de tal forma que genere beneficios para la institución en términos financieros y de satisfacción al cliente. Por medio de la implementación de este modelo será posible identificar errores comunes del proceso, se identificarán soluciones prácticas de fácil aplicación en el manejo del transporte y se presentarán los resultados obtenidos en la muestra utilizada para desarrollar el proyecto.
Resumo:
In any data mining applications, automated text and text and image retrieval of information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on the latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use Monte Carlo method to sample and to build much smaller size term-by-document matrix (e.g. we build k x k matrix) from where we then find the first "k" triplets using standard deterministic methods. Second, we investigate how we can reduce the problem to finding the "k"-largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms are running on a cluster of workstations under MPI and results of the experiments arising in textual retrieval of Web documents as well as comparison of the stochastic methods proposed are presented. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
Resumo:
The fatigue-crack propagation and threshold behaviour of a C-Mn steel containing boron has been investigated at a range of strength levels suitable for mining chain applications. The heat-treatment variables examined include two austenitizing temperatures (900 degree C and 1250 degree C) and a range of tempering treatments from the as-quenched condition to tempering at 400 degree C. In mining applications the haulage chains undergo a 'calibration' process which has the effect of imposing a tensile prestrain on the chain links before they go into service. Prestrain is shown to reduce threshold values in these steels and this behaviour is related to its effects on the residual stress distribution in the test specimens.
Fictitious capital and the elusive quest in understanding its implications : illusions and paradoxes
Resumo:
This paper deals with the interaction between fictitious capital and the neoliberal model of growth and distribution, inspired by the classical economic tradition. Our renewed interest in this literature has a close connection with the recent international crisis in the capitalist economy. However, this discussion takes as its point of departure the fact that standard economic theory teaches that financial capital, in this world of increasing globalization, leads to new investment opportunities which improve levels of growth, employment, income distribution, and equilibrium. Accordingly, it is said that such financial resources expand the welfare of people and countries worldwide. Here we examine some illusions and paradoxes of such a paradigm. We show some theoretical and empirical consequences of this vision, which are quite different and have harmful constraints.
Resumo:
In pursuit of aligning with the European Union's ambitious target of achieving a carbon-neutral economy by 2050, researchers, vehicle manufacturers, and original equipment manufacturers have been at the forefront of exploring cutting-edge technologies for internal combustion engines. The introduction of these technologies has significantly increased the effort required to calibrate the models implemented in the engine control units. Consequently the development of tools that reduce costs and the time required during the experimental phases, has become imperative. Additionally, to comply with ever-stricter limits on 〖"CO" 〗_"2" emissions, it is crucial to develop advanced control systems that enhance traditional engine management systems in order to reduce fuel consumption. Furthermore, the introduction of new homologation cycles, such as the real driving emissions cycle, compels manufacturers to bridge the gap between engine operation in laboratory tests and real-world conditions. Within this context, this thesis showcases the performance and cost benefits achievable through the implementation of an auto-adaptive closed-loop control system, leveraging in-cylinder pressure sensors in a heavy-duty diesel engine designed for mining applications. Additionally, the thesis explores the promising prospect of real-time self-adaptive machine learning models, particularly neural networks, to develop an automatic system, using in-cylinder pressure sensors for the precise calibration of the target combustion phase and optimal spark advance in a spark-ignition engines. To facilitate the application of these combustion process feedback-based algorithms in production applications, the thesis discusses the results obtained from the development of a cost-effective sensor for indirect cylinder pressure measurement. Finally, to ensure the quality control of the proposed affordable sensor, the thesis provides a comprehensive account of the design and validation process for a piezoelectric washer test system.