979 results for Text Mining
Abstract:
In many countries the use of renewable energy is increasing due to the introduction of new energy and environmental policies. Thus, the efficient integration of renewable energy into electric power systems is becoming extremely important. Several European countries have already achieved high penetration of wind-based electricity generation and are gradually evolving towards intensive use of this generation technology. The introduction of wind-based generation in power systems poses new challenges for power system operators, mainly due to the variability and uncertainty of weather conditions and, consequently, of wind-based generation. To deal with this uncertainty and to improve power system efficiency, adequate wind forecasting tools must be used. This paper proposes a data-mining-based methodology for very short-term wind forecasting that is suitable for large real databases. The paper includes a case study based on a real database covering the last three years of wind speed data, with results for wind speed forecasting at 5-minute intervals.
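The abstract does not say which data-mining technique produces the 5-minute forecasts. As one illustration of the idea, here is a minimal sketch of nearest-neighbour "analogue" forecasting, assuming the wind record is a plain list of speeds sampled every 5 minutes. The function name `knn_forecast`, its `lags` and `k` parameters, and the sample speeds are all hypothetical, not the paper's method.

```python
from math import sqrt


def knn_forecast(series, lags=3, k=2):
    """Forecast the next value of `series` by nearest-neighbour analogues.

    Every historical window of `lags` consecutive values is paired with the
    value that followed it. The `k` windows closest (Euclidean distance) to the
    most recent window are selected, and their successors are averaged.
    """
    if len(series) <= lags:
        raise ValueError("series must be longer than the number of lags")
    query = series[-lags:]
    candidates = []
    # Training pairs: window series[i:i+lags] -> next value series[i+lags].
    for i in range(len(series) - lags):
        window = series[i:i + lags]
        distance = sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        candidates.append((distance, series[i + lags]))
    # Keep the k most similar past situations and average what happened next.
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical 5-minute wind speeds (m/s); real use would draw on years of records.
speeds = [6.1, 6.4, 6.8, 6.2, 6.5, 6.9, 6.3, 6.6]
print(knn_forecast(speeds, lags=2, k=2))
```

With a large historical database, the search over windows dominates the cost, so in practice one would index the windows (or restrict the search to a training period) and tune `lags` and `k` on held-out data.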
Abstract:
In recent decades, competition in the electric power sector has deeply changed, all over the world, the way this sector's agents play their roles. In most countries, deregulation of the electricity sector was conducted in stages, beginning with the clients at higher voltage levels and with larger electricity consumption, and later extended to all electricity consumers. Sector liberalization and the operation of competitive electricity markets were expected to lower prices and improve quality of service, leading to greater consumer satisfaction. Transmission and distribution remain noncompetitive business areas because of the large infrastructure investments they require. However, the industry has yet to clearly establish the best business model for transmission in a competitive environment. After generation, electricity must be delivered to the system nodes where demand requires it, taking into consideration transmission constraints and electrical losses. If the power flowing through a certain line is close to or surpasses its safety limits, cheap but distant generation might have to be replaced by more expensive closer generation to relieve the excess flows. In a congested area, the optimal price of electricity rises to the marginal cost of local generation or to the level needed to ration demand to the amount of available electricity. Even without congestion, some power is lost in the transmission system through heat dissipation, so prices reflect that it is more expensive to supply electricity at the far end of a heavily loaded line than close to a generating plant. Locational marginal pricing (LMP), resulting from bidding competition, represents electrical and economic values at nodes or in areas and can provide economic signals to market agents. This article proposes a data-mining-based methodology that helps characterize zonal prices in real power transmission networks.
To test our methodology, we used a 2009 LMP database from the California Independent System Operator (CAISO) to identify economic zones. (CAISO is a nonprofit public benefit corporation charged with operating the majority of California's high-voltage wholesale power grid.) To group the buses into typical classes, each containing buses with approximately the same LMP values, we used the two-step and k-means clustering algorithms. By analyzing the various LMP components, our goal was to extract knowledge to support the ISO in investment and network-expansion planning.
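The abstract names k-means as one of the two clustering algorithms used to group buses. Below is a minimal sketch of k-means (Lloyd's algorithm) in plain Python (3.8+ for `math.dist`), assuming each bus is described by a vector of hourly LMPs. The bus names, prices, and the deterministic maximin initialization are illustrative assumptions, not the authors' setup, and the two-step algorithm is not reproduced here.

```python
from math import dist


def init_centroids(points, k):
    """Deterministic maximin start: the first point, then repeatedly the point
    farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Returns (labels, centroids)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each bus joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no bus changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean LMP vector of its buses.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical LMPs ($/MWh) for five buses over three hours.
buses = {
    "B1": [30.1, 32.0, 35.5],
    "B2": [30.4, 31.8, 35.9],
    "B3": [48.0, 52.5, 60.2],
    "B4": [47.5, 53.0, 59.8],
    "B5": [30.0, 32.3, 35.1],
}
labels, centroids = kmeans(list(buses.values()), k=2)
for name, label in zip(buses, labels):
    print(name, "-> zone", label)
```

Here the three low-priced buses form one zone and the two high-priced buses another; with real data, the number of zones `k` has to be chosen, for example by comparing partition-quality indices.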
Abstract:
This paper presents a methodology based on the knowledge discovery in databases (KDD) process to determine the failure probability of electrical equipment belonging to a real high-voltage electrical network. Data mining (DM) techniques are used to discover a set of failure probabilities and, therefore, to extract knowledge about the unavailability of electrical equipment such as power transformers and high-voltage power lines. The framework includes several steps: analysis of the real database, data pre-processing, application of DM algorithms, and, finally, interpretation of the discovered knowledge. To validate the proposed methodology, a case study with real databases is used. These data carry heavy uncertainty due to climate conditions; for this reason, fuzzy logic was used to determine the set of failure probabilities of the electrical components in order to restore service. The results show the interesting potential of this approach and encourage further research on the topic.
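The abstract uses fuzzy logic to turn uncertain, weather-dependent data into failure probabilities but does not give the rule base. Below is a minimal sketch of how such a mapping can work: a zero-order Sugeno inference with triangular membership functions. The input "weather severity" on a 0 to 10 scale, the three rules, and their output probabilities are entirely hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership degree of x: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical rule base: (fuzzy set for weather severity on a 0-10 scale,
# failure probability assigned when that set fully applies).
RULES = [
    ((-5.0, 0.0, 5.0), 0.01),   # IF severity is low    THEN probability is 0.01
    ((0.0, 5.0, 10.0), 0.05),   # IF severity is medium THEN probability is 0.05
    ((5.0, 10.0, 15.0), 0.20),  # IF severity is high   THEN probability is 0.20
]


def failure_probability(severity):
    """Zero-order Sugeno inference: firing-strength-weighted average of rule outputs."""
    weights = [triangular(severity, *fuzzy_set) for fuzzy_set, _ in RULES]
    total = sum(weights)
    if total == 0:
        raise ValueError("severity lies outside the modelled range")
    return sum(w * p for w, (_, p) in zip(weights, RULES)) / total


for s in (0.0, 5.0, 7.5, 10.0):
    print(s, round(failure_probability(s), 3))
```

Intermediate inputs blend neighbouring rules: a severity of 7.5 half-fires "medium" and "high", giving a probability midway between 0.05 and 0.20.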
Abstract:
Power system operation currently produces huge volumes of data that are still treated in a very limited way. Knowledge discovery and machine learning can exploit these data to obtain relevant knowledge with a very positive impact. In the context of competitive electricity markets, these data are of even higher value, which makes the application of data mining techniques in power systems increasingly relevant. This paper presents two cases based on real data, showing the importance of data mining for supporting demand response and player strategic behavior.
Abstract:
This paper proposes a methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks. The methodology uses clustering algorithms to group the buses into typical classes, each including a set of buses with similar locational marginal price (LMP) values. Two different clustering algorithms were used to determine the LMP clusters: the two-step and K-means algorithms. Adequacy measurement indices are used to evaluate the quality of the partition and to identify the best-performing algorithm. The paper includes a case study using an LMP database from the California ISO (CAISO) to identify zonal prices.
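The abstract does not name the adequacy indices used to compare partitions. As one widely used example (an assumption, not necessarily the paper's choice), here is a minimal sketch of the Davies-Bouldin index: for each cluster it finds the most "confusable" other cluster, and averages those worst-case ratios, so lower is better. The bus LMP vectors are hypothetical.

```python
from math import dist


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition; lower values indicate compact,
    well-separated clusters. `clusters` is a list of lists of points."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centres = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance from the members of cluster i to its centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centres)]
    total = 0.0
    for i in range(len(clusters)):
        # R_ij = (S_i + S_j) / d(c_i, c_j); keep the worst (largest) j for each i.
        # Assumes distinct centroids, otherwise the division fails.
        total += max(
            (scatter[i] + scatter[j]) / dist(centres[i], centres[j])
            for j in range(len(clusters))
            if j != i
        )
    return total / len(clusters)


# Two hypothetical partitions of the same four buses (2-hour LMP vectors).
compact = [[[30.0, 32.0], [30.0, 34.0]], [[40.0, 32.0], [40.0, 34.0]]]
mixed = [[[30.0, 32.0], [40.0, 32.0]], [[30.0, 34.0], [40.0, 34.0]]]
print(davies_bouldin(compact), davies_bouldin(mixed))
```

Computing the same index for the partitions produced by each algorithm (and for several numbers of clusters) gives one objective basis for preferring one result over another.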
Abstract:
In the present study we focus on the interaction between the acquisition of new words and text organisation. In the acquisition of new words we emphasise the acquisition of paradigmatic relations such as hyponymy, meronymy and semantic sets. We worked with a group of girls attending a private school for adolescents in serious difficulties. The subjects come from disadvantaged families, and their writing skills were very poor. When asked to describe a garden, they wrote a short, single-paragraph text in which the lexical items were generic, there were no adjectives, and mainly existential verbs were used. The intervention plan assumed that the subjects must be exposed to new words and work out their meanings. In the presence of referents, the subjects were taught new words, with the intended relation of each new term to an already known term made explicit. In the classroom, the subjects were asked to write all the words they knew and to draw the relationships among them. They talked about the words, specifying the relations and making explicit pragmatic directions such as "is a kind of", "is a part of" or "are all x". After that, the subjects were given the task of choosing a perspective. The work presented in this paper reports significant differences between the subjects' texts before and after the intervention. While working on new words, the subjects were organising their lexicon and learning to present a whole entity in perspective.
Abstract:
PURPOSE: Fatty liver disease (FLD) is an increasingly prevalent disease that can be reversed if detected early. Ultrasound is the safest and most widely available method for identifying FLD. Because accurate interpretation of liver ultrasound images requires expert sonographers, a lack of such expertise results in interobserver variability. Computer-aided diagnostic (CAD) techniques may be exploited for more objective interpretation, high accuracy, and quick second opinions. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features, based on the texture, wavelet transform, and higher-order spectra of liver ultrasound images, in various supervised learning-based classifiers in order to determine the parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors achieved a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy, together with the completely automated classification procedure, makes the proposed technique highly suitable for clinical deployment and usage.
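The abstract mentions texture features but does not list which ones Symtosis computes. One common family of texture descriptors comes from the grey-level co-occurrence matrix (GLCM); the sketch below computes GLCM contrast and homogeneity for a single horizontal pixel offset, assuming an image patch already quantized to a few grey levels. The patches are hypothetical toy data, not ultrasound images, and this is not presented as the paper's feature set.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, levels=4):
    """Normalized grey-level co-occurrence matrix for one pixel offset (dx, dy).

    `image` is a list of rows of integer grey levels in range(levels).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def contrast(p):
    """Large when neighbouring pixels differ strongly in grey level."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Close to 1 when neighbouring pixels have similar grey levels."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Hypothetical 4x4 patches quantized to 4 grey levels.
smooth = [[1, 1, 1, 1]] * 4
speckled = [[0, 3, 0, 3], [3, 0, 3, 0]] * 2
for name, patch in (("smooth", smooth), ("speckled", speckled)):
    p = glcm(patch)
    print(name, contrast(p), homogeneity(p))
```

Features like these, computed for several offsets and combined with others, would form the input vector of a classifier such as the decision tree mentioned in the abstract.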
Abstract:
Master's in Electrical Engineering – Electrical Power Systems
Abstract:
For some years now, translation theorist and educator Anthony Pym has been trying to establish a dialogue between the academic tradition he comes from and the world of the language industries into which he is meant to introduce his students: in other words, between the discipline of Translation Studies and the localisation sector. This rapprochement is also the stated aim of his new book The Moving Text (p. 159). Rather than simply collecting and synthesising material previously dispersed over several articles, Pym has rewritten it completely, both literally and conceptually, in the light of the more than three decades of research he has conducted in the field of cross-cultural communication. The theoretical arguments are ably supported by a few short but telling and well-exploited examples.
Abstract:
Dissertation submitted for the degree of Master in Computer Science and Engineering
Abstract:
Project work submitted for the degree of Master in Information Systems and Computer Engineering
Abstract:
Knowledge discovery in data is nowadays a major strength for companies. CardMobili currently has no data mining system, and having one would add value to its daily marketing operations, namely when issuing coupons to a restricted group of customers with a high probability of using them. To this end, the application's database was analyzed to extract as much data as possible, and the transformations required for later processing by the data mining algorithms were applied. During the data mining stage, association and classification techniques were applied, with the best results obtained by the association techniques. The results obtained are thus intended to support the decision-maker in making decisions.
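The abstract reports that association techniques gave the best results but does not name the algorithm. Below is a minimal sketch of the core idea: mining single-item rules "A -> B" scored by support (how often A and B occur together) and confidence (how often B occurs given A). This is only the pairwise step of an Apriori-style miner, not a full implementation. The transactions, coupon categories, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine rules lhs -> rhs between single items from a list of item sets.

    support    = fraction of transactions containing both items
    confidence = support(lhs and rhs) / support(lhs)
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to target
        # A frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical coupon categories redeemed by five customers.
transactions = [
    {"coffee", "bakery"},
    {"coffee", "bakery", "cinema"},
    {"coffee", "fuel"},
    {"bakery", "cinema"},
    {"coffee", "bakery"},
]
for lhs, rhs, support, confidence in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "cinema -> bakery" with high confidence suggests targeting bakery coupons at customers who have redeemed cinema coupons, which is the kind of restricted, high-probability group the abstract describes.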
Abstract:
The main topic of this work is the application of data mining techniques, in particular machine learning, to knowledge discovery in a protein database. The first chapter presents the general background. Section 1.1 gives an overview of the methodology of a data mining project and its main algorithms. Section 1.2 outlines an introduction to proteins and their supporting file formats. The chapter concludes with Section 1.3, which defines the main problem this work intends to address: determining whether an amino acid is exposed or buried in a protein, in a discrete (i.e., non-continuous) way, for five exposure levels: 2%, 10%, 20%, 25% and 30%. The second chapter, closely following the CRISP-DM methodology, presents the whole process of building the database that supported this work. It describes the process of loading data from the Protein Data Bank, DSSP and SCOP. An initial data exploration is then performed, and a simple prediction model (baseline) of the relative solvent accessibility of an amino acid is introduced. The chapter also introduces the Data Mining Table Creator, a program developed to produce the data mining tables required for this problem. In the third chapter, the results obtained are analyzed with statistical significance tests. First, the classifiers used (Neural Networks, C5.0, CART and CHAID) are compared, and it is concluded that C5.0 is the most suitable for the problem at stake. The influence of parameters such as the amino acid information level, the amino acid window size and the SCOP class type on the accuracy of the predictive models is also compared. The fourth chapter starts with a brief review of the literature on amino acid relative solvent accessibility. Then, we overview the main results achieved and finally discuss possible future work. The fifth and last chapter consists of appendices. Appendix A contains the schema of the database that supported this thesis.
Appendix B contains a set of tables with additional information. Appendix C describes the software provided on the DVD accompanying this thesis, which allows the present work to be reconstructed.
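The problem statement combines two ideas: discretizing relative solvent accessibility (RSA, a residue's accessible surface area, as reported for example by DSSP, divided by a maximum value for that amino acid type) into exposed/buried classes at a threshold, and describing each residue by a window of neighbouring residues. The sketch below builds such rows, loosely in the spirit of the Data Mining Table Creator. The function names, the "strictly above the threshold means exposed" convention, the `-` padding, and the toy sequence are all assumptions rather than the thesis's actual design.

```python
THRESHOLDS = (0.02, 0.10, 0.20, 0.25, 0.30)  # the five exposure levels


def exposure_labels(rsa, threshold):
    """Label each residue 'e' (exposed) or 'b' (buried) at one RSA threshold.

    Assumption: a residue counts as exposed when its RSA is strictly above the threshold.
    """
    return ["e" if value > threshold else "b" for value in rsa]


def window_rows(sequence, rsa, threshold, window=3):
    """Build one data-mining row per residue: the residues in a centred window
    (padded with '-' at the chain ends) plus the residue's exposure class."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    labels = exposure_labels(rsa, threshold)
    # padded[i:i+window] is centred on padded[i+half], i.e. sequence[i].
    return [(padded[i:i + window], labels[i]) for i in range(len(sequence))]


# Hypothetical four-residue chain with made-up RSA values.
seq, rsa = "ACDG", [0.05, 0.40, 0.15, 0.01]
for t in THRESHOLDS:
    print(t, window_rows(seq, rsa, t, window=3))
```

Each threshold yields a separate classification table, which matches the idea of training and comparing models for each exposure level and each window size.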
Abstract:
Doctoral Thesis in Information Systems and Technologies, Area of Engineering and Management Information Systems
Abstract:
This study aimed to describe digital disease detection and participatory surveillance in different countries. The systems or platforms consolidated in the scientific field were analyzed by describing their strategy, type of data source, main objectives, and manner of interaction with users. Eleven systems or platforms, developed from 1996 to 2016, were analyzed. Data mining on the web and active crowdsourcing were found more frequently, and there was a trend toward the use of mobile applications. It is important to provoke debate in academia and health services about the evolution of methods and insights into participatory surveillance in the digital age.