28 resultados para Data pre-processing
em Instituto Politécnico do Porto, Portugal
Resumo:
This paper presents an electricity medium voltage (MV) customer characterization framework supportedby knowledge discovery in database (KDD). The main idea is to identify typical load profiles (TLP) of MVconsumers and to develop a rule set for the automatic classification of new consumers. To achieve ourgoal a methodology is proposed consisting of several steps: data pre-processing; application of severalclustering algorithms to segment the daily load profiles; selection of the best partition, corresponding tothe best consumers’ segmentation, based on the assessments of several clustering validity indices; andfinally, a classification model is built based on the resulting clusters. To validate the proposed framework,a case study which includes a real database of MV consumers is performed.
Resumo:
This paper presents a methodology supported on the data base knowledge discovery process (KDD), in order to find out the failure probability of electrical equipments’, which belong to a real electrical high voltage network. Data Mining (DM) techniques are used to discover a set of outcome failure probability and, therefore, to extract knowledge concerning to the unavailability of the electrical equipments such us power transformers and high-voltages power lines. The framework includes several steps, following the analysis of the real data base, the pre-processing data, the application of DM algorithms, and finally, the interpretation of the discovered knowledge. To validate the proposed methodology, a case study which includes real databases is used. This data have a heavy uncertainty due to climate conditions for this reason it was used fuzzy logic to determine the set of the electrical components failure probabilities in order to reestablish the service. The results reflect an interesting potential of this approach and encourage further research on the topic.
Resumo:
This paper consists in the characterization of medium voltage (MV) electric power consumers based on a data clustering approach. It is intended to identify typical load profiles by selecting the best partition of a power consumption database among a pool of data partitions produced by several clustering algorithms. The best partition is selected using several cluster validity indices. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behavior. The data-mining-based methodology presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partitions. To validate our approach, a case study with a real database of 1.022 MV consumers was used.
Resumo:
The present research paper presents five different clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customer’s behaviour. The obtained knowledge can be used to support a decision tool, not only for utilities but also for consumers. Load profiles can be used by the utilities to identify the aspects that cause system load peaks and enable the development of specific contracts with their customers. The framework presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
Resumo:
XML Schema is one of the most used specifications for defining types of XML documents. It provides an extensive set of primitive data types, ways to extend and reuse definitions and an XML syntax that simplifies automatic manipulation. However, many features that make XML Schema Definitions (XSD) so interesting also make them rather cumbersome to read. Several tools to visualize and browse schema definitions have been proposed to cope with this issue. The novel approach proposed in this paper is to base XSD visualization and navigation on the XML document itself, using solely the web browser, without requiring a pre-processing step or an intermediate representation. We present the design and implementation of a web-based XML Schema browser called schem@Doc that operates over the XSD file itself. With this approach, XSD visualization is synchronized with the source file and always reflects its current state. This tool fits well in the schema development process and is easy to integrate in web repositories containing large numbers of XSD files.
Resumo:
A presente tese tem como principal objetivo a comparação entre dois software de CFD (Computer Fluid Dynamics) na simulação de escoamentos atmosféricos com vista à sua aplicação ao estudo e caracterização de parques eólicos. O software em causa são o OpenFOAM (Open Field Operation and Manipulation) - freeware open source genérico - e o Windie, ferramenta especializada no estudo de parques eólicos. Para este estudo foi usada a topografia circundante a um parque eólico situado na Grécia, do qual dispúnhamos de resultados de uma campanha de medições efetuada previamente. Para este _m foram usados procedimentos e ferramentas complementares ao Open-FOAM, desenvolvidas por da Silva Azevedo (2013) adequados para a realização do pré-processamento, extração de dados e pós-processamento, aplicados na simulação do caso pratico. As condições de cálculo usadas neste trabalho limitaram-se às usadas na simulação de escoamentos previamente simulados pelo software Windie: condições de escoamento turbulento, estacionário, incompressível e em regime não estratificado, com o recurso ao modelo de turbulência RaNS (Reynolds-averaged Navier-Stokes ) k - E atmosférico. Os resultados de ambas as simulações - OpenFOAM e Windie - foram comparados com resultados de uma campanha de medições, através dos valores de speed-up e intensidade turbulenta nas posições dos anemómetros.
Resumo:
Mestrado em Engenharia Informática
Resumo:
O estudo das curvas características de um transístor permite conhecer um conjunto de parâmetros essenciais à sua utilização tanto no domínio da amplificação de sinais como em circuitos de comutação. Deste estudo é possível obter dados em condições que muitas vezes não constam na documentação fornecida pelos fabricantes. O trabalho que aqui se apresenta consiste no desenvolvimento de um sistema que permite de forma simples, eficiente e económica obter as curvas características de um transístor (bipolar de junção, efeito de campo de junção e efeito de campo de metal-óxido semicondutor), podendo ainda ser utilizado como instrumento pedagógico na introdução ao estudo dos dispositivos semicondutores ou no projecto de amplificadores transistorizados. O sistema é constituído por uma unidade de condicionamento de sinal, uma unidade de processamento de dados (hardware) e por um programa informático que permite o processamento gráfico dos dados obtidos, isto é, traçar as curvas características do transístor. O seu princípio de funcionamento consiste na utilização de um conversor Digital-Analógico (DAC) como fonte de tensão variável, alimentando a base (TBJ) ou a porta (JFET e MOSFET) do dispositivo a testar. Um segundo conversor fornece a variação da tensão VCE ou VDS necessária à obtenção de cada uma das curvas. O controlo do processo é garantido por uma unidade de processamento local, baseada num microcontrolador da família 8051, responsável pela leitura dos valores em corrente e em tensão recorrendo a conversores Analógico-Digital (ADC). Depois de processados, os dados são transmitidos através de uma ligação USB para um computador no qual um programa procede à representação gráfica, das curvas características de saída e à determinação de outros parâmetros característicos do dispositivo semicondutor em teste. A utilização de componentes convencionais e a simplicidade construtiva do projecto tornam este sistema económico, de fácil utilização e flexível, pois permite com pequenas alterações
Resumo:
In this work an adaptive modeling and spectral estimation scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric based speech recognition systems that rely on spectral time dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases the use of this pre-processing module has led to improved results.
Resumo:
Trabalho de Projeto apresentado ao Instituto de Contabilidade e Administração do Porto para a obtenção do grau de Mestre em Tradução e Interpretação Especializadas, sob orientação da Doutora Sara Cerqueira Pascoal
Resumo:
Network control systems (NCSs) are spatially distributed systems in which the communication between sensors, actuators and controllers occurs through a shared band-limited digital communication network. However, the use of a shared communication network, in contrast to using several dedicated independent connections, introduces new challenges which are even more acute in large scale and dense networked control systems. In this paper we investigate a recently introduced technique of gathering information from a dense sensor network to be used in networked control applications. Obtaining efficiently an approximate interpolation of the sensed data is exploited as offering a good tradeoff between accuracy in the measurement of the input signals and the delay to the actuation. These are important aspects to take into account for the quality of control. We introduce a variation to the state-of-the-art algorithms which we prove to perform relatively better because it takes into account the changes over time of the input signal within the process of obtaining an approximate interpolation.
Resumo:
Cooperating objects (COs) is a recently coined term used to signify the convergence of classical embedded computer systems, wireless sensor networks and robotics and control. We present essential elements of a reference architecture for scalable data processing for the CO paradigm.
Resumo:
Beyond the classical statistical approaches (determination of basic statistics, regression analysis, ANOVA, etc.) a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data concerning the characteristics of forest soils. This is possible to be seen in some of the recent publications in the context of Multivariate Statistics. These new methods require additional care that is not always included or refered in some approaches. In the particular case of geostatistical data applications it is necessary, besides to geo-reference all the data acquisition, to collect the samples in regular grids and in sufficient quantity so that the variograms can reflect the spatial distribution of soil properties in a representative manner. In the case of the great majority of Multivariate Statistics techniques (Principal Component Analysis, Correspondence Analysis, Cluster Analysis, etc.) despite the fact they do not require in most cases the assumption of normal distribution, they however need a proper and rigorous strategy for its utilization. In this work, some reflections about these methodologies and, in particular, about the main constraints that often occur during the information collecting process and about the various linking possibilities of these different techniques will be presented. At the end, illustrations of some particular cases of the applications of these statistical methods will also be presented.
Resumo:
Over time, XML markup language has acquired a considerable importance in applications development, standards definition and in the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing of large documents in appropriated manners. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files
Resumo:
Over time, XML markup language has acquired a considerable importance in applications development, standards definition and in the representation of large volumes of data, such as databases. Today, processing XML documents in a short period of time is a critical activity in a large range of applications, which imposes choosing the most appropriate mechanism to parse XML documents quickly and efficiently. When using a programming language for XML processing, such as Java, it becomes necessary to use effective mechanisms, e.g. APIs, which allow reading and processing of large documents in appropriated manners. This paper presents a performance study of the main existing Java APIs that deal with XML documents, in order to identify the most suitable one for processing large XML files.