947 results for Data pre-processing
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
This dissertation presents the design and partial implementation of an integrated monitoring system for the harpy eagle (Harpia harpyja), a species found in the Amazon, the Cerrado and the Atlantic Forest. The monitoring system is structured in three stages: data collection, storage and transmission. The first stage collects data from sensors that can detect the presence of birds in the nest, and the system is also assisted by a camera that captures video and audio. The second stage is dedicated to pre-processing and storing all the collected information. The third stage transmits the data via satellite, using the Brazilian Environmental Data Collection System (Sistema Brasileiro de Coleta de Dados Ambientais, SBCDA). In addition, a prototype used for the monitoring was developed. Embedded-systems techniques are presented to the reader, and the process of detecting this species is evaluated.
Abstract:
Graduate Program in Electrical Engineering - FEIS
Abstract:
Nuclear Magnetic Resonance (NMR) is a branch of spectroscopy based on the fact that many atomic nuclei can be oriented by a strong magnetic field and absorb radiofrequency radiation at characteristic frequencies. The parameters that can be measured on the resulting spectral lines (line positions, intensities, line widths, multiplicities and transients in time-dependent experiments) can be interpreted in terms of molecular structure, conformation, molecular motion and other rate processes. In this way, high-resolution (HR) NMR allows qualitative and quantitative analysis of samples in solution, not only to determine the structure of molecules in solution but for other purposes as well. In the past, high-field NMR spectroscopy was mainly concerned with the elucidation of chemical structure in solution, but today it is emerging as a powerful exploratory tool for probing biochemical and physical processes. It is also a versatile tool for the analysis of foods. Many NMR studies in the literature have examined different types of food, such as wine, olive oil, coffee, fruit juices, milk, meat, egg, starch granules and flour, using different NMR techniques. Traditionally, univariate analytical methods have been used to explore spectroscopic data. These methods measure or select a single descriptive variable from the whole spectrum, and in the end only this variable is analyzed. Applied to HR-NMR data, this univariate approach leads to several problems, mainly because of the complexity of an NMR spectrum. Such a spectrum is composed of different signals belonging to different molecules, and the same molecule can also be represented by several signals, which are generally strongly correlated. Univariate methods take into account only one or a few variables, causing a loss of information.
Thus, when dealing with complex samples such as foodstuffs, univariate analysis of spectral data is not powerful enough. Spectra need to be considered as a whole, and their analysis must take the whole data matrix into account: chemometric methods are designed to treat such multivariate data. Multivariate data analysis is used for a number of distinct purposes, which can be divided into three main groups:
• data description (explorative modelling of the structure of any generic n-dimensional data matrix, e.g. PCA);
• regression and prediction (PLS);
• classification and prediction of class membership for new samples (LDA, PLS-DA and ECVA).
The aim of this PhD thesis was to verify whether plants or foodstuffs can be identified and classified into different classes based on the concerted variation in metabolite levels detected by NMR spectra, using multivariate data analysis as a tool to interpret the NMR information. It is important to underline that the results obtained are useful to point out the metabolic consequences of a specific modification on foodstuffs, avoiding the need for a targeted analysis of the different metabolites. The data analysis is performed by applying chemometric multivariate techniques to the dataset of acquired NMR spectra. The research work presented in this thesis is the result of a three-year PhD study. This thesis reports the main results obtained from two main activities:
A1) evaluation of a data pre-processing system to minimize unwanted sources of variation due to different instrumental set-ups, manual spectra processing and sample-preparation artefacts;
A2) application of multivariate chemometric models in data analysis.
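The abstract does not say which pre-processing steps were evaluated in A1. The following is a minimal sketch that assumes common choices in NMR metabolomics: total-area normalization (to remove dilution differences), equal-width binning (to absorb small peak shifts) and Pareto scaling, followed by PCA computed via SVD. The two "classes" are synthetic spectra that differ only in the ratio of two peaks, and every name and number here is hypothetical.

```python
import numpy as np

def total_area_normalize(spectra: np.ndarray) -> np.ndarray:
    """Divide each spectrum (row) by its summed intensity to remove dilution effects."""
    return spectra / spectra.sum(axis=1, keepdims=True)

def bin_spectra(spectra: np.ndarray, width: int) -> np.ndarray:
    """Sum adjacent points into equal-width buckets to absorb small peak shifts."""
    n_samples, n_points = spectra.shape
    n_bins = n_points // width
    trimmed = spectra[:, : n_bins * width]
    return trimmed.reshape(n_samples, n_bins, width).sum(axis=2)

def pareto_scale(X: np.ndarray) -> np.ndarray:
    """Mean-center each variable and divide by the square root of its standard deviation."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant variables unscaled
    return (X - X.mean(axis=0)) / np.sqrt(std)

def pca(X: np.ndarray, n_components: int):
    """PCA of a centered matrix via SVD: scores, loadings, explained-variance ratios."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return U[:, :n_components] * s[:n_components], Vt[:n_components].T, explained[:n_components]

# Synthetic data: two hypothetical classes differing in the ratio of two peaks,
# each sample recorded at a random overall dilution plus a little noise.
rng = np.random.default_rng(0)
points = np.arange(200)  # spectral point index, standing in for the ppm axis

def peak(center: float) -> np.ndarray:
    return np.exp(-0.5 * ((points - center) / 3.0) ** 2)

class_a = np.array([peak(50) + 0.5 * peak(120)] * 5)
class_b = np.array([peak(50) + 1.5 * peak(120)] * 5)
dilution = rng.uniform(0.5, 2.0, size=(10, 1))
raw = dilution * np.vstack([class_a, class_b]) + rng.normal(0.0, 1e-3, size=(10, 200))

X = pareto_scale(bin_spectra(total_area_normalize(raw), width=4))
scores, loadings, explained = pca(X, n_components=2)
print(np.round(scores[:, 0], 2), np.round(explained, 3))
```

Without the normalization step, the random dilution factors would dominate the first component. With it, PC1 separates the two classes by their peak ratio.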
Abstract:
Within the study area of Schleswig-Holstein, 39,712 topographic depressions were detected using ESRI ArcMap 9.3 and 10.0. Data preparation was followed by further calculations in MATLAB R2010b. Each object was spatially intersected with its individual properties, including area, perimeter, coordinates (centroids), depth and maximum depth of the depression, and shape factors such as roundness, convexity and elongation. The methods presented were designed to answer three questions: Are negative landforms suitable for distinguishing and identifying landscape units and ice advances? Is there a link between depressions in the present-day topography and deep geological structures? Can depressions of different origin be subdivided on the basis of their shape characteristics? The classification of the major landscape units is based on the assumption that young moraine areas, their forelands and old moraine areas can each be delimited by characteristic closed depressions without outflow, such as kettle holes, lakes, etc. Such depressions are normally rather rare in nature, but are considered typical of former glacial landscapes. The aim was to differentiate the main geological units, ice advances and moraine areas of the last glaciations. The analysis used a detection grid based on square cells. The results show that using depressions alone to classify landscape units can achieve overall accuracies of up to 71.4%, meaning that roughly three out of four detection cells can be assigned correctly. Young moraines, old moraines, periglacial forelands and Holocene areas can be distinguished from one another and correctly assigned with high confidence on the basis of the depressions. This shows that certain depression forms are indeed typical of the respective units.
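The abstract states only that the depressions were detected with ArcMap, not which tool or algorithm was used. One standard way to find closed depressions in a DEM is priority-flood filling: flood the grid inward from its edges, then treat every cell raised by the fill as part of a depression. The sketch below implements that technique on a tiny hypothetical DEM and reports cell count and maximum depth per 8-connected depression (area follows from cell count times cell size).

```python
import heapq

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def fill_depressions(dem):
    """Priority-flood: raise every cell of a closed depression to its spill elevation."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    seen = [[False] * cols for _ in range(rows)]
    queue = []
    for r in range(rows):  # edge cells can always drain off the grid
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(queue, (filled[r][c], r, c))
                seen[r][c] = True
    while queue:
        z, r, c = heapq.heappop(queue)
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                seen[nr][nc] = True
                filled[nr][nc] = max(filled[nr][nc], z)  # a lower neighbour is flooded to z
                heapq.heappush(queue, (filled[nr][nc], nr, nc))
    return filled

def detect_depressions(dem, min_depth=0.0):
    """One record per 8-connected group of cells whose fill depth exceeds min_depth."""
    filled = fill_depressions(dem)
    rows, cols = len(dem), len(dem[0])
    depth = [[filled[r][c] - dem[r][c] for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] <= min_depth or seen[r][c]:
                continue
            stack, cells = [(r, c)], []
            seen[r][c] = True
            while stack:  # flood-fill one connected depression
                cr, cc = stack.pop()
                cells.append((cr, cc))
                for dr, dc in NEIGHBOURS:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]
                            and depth[nr][nc] > min_depth):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            objects.append({"cells": len(cells),
                            "max_depth": max(depth[a][b] for a, b in cells)})
    return objects

# Hypothetical 5 x 7 DEM (elevations in metres) with two closed hollows.
dem = [
    [5, 5, 5, 5, 5, 5, 5],
    [5, 3, 3, 5, 5, 5, 5],
    [5, 3, 2, 5, 4, 4, 5],
    [5, 5, 5, 5, 4, 1, 5],
    [5, 5, 5, 5, 5, 5, 5],
]
print(detect_depressions(dem))
```

In a real workflow the `min_depth` parameter would suppress very shallow artefacts caused by DEM noise.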
The depressions detected in the first step were spatially intersected with more extensive geological information to investigate to what extent natural depressions are of purely glacial origin, or whether their expression is also related to deep geological structures. 25,349 (63.88%) of all depressions are smaller than 10,000 m² and lie in young moraine areas; they can probably be attributed to glacial and periglacial influences. 2,424 depressions lie within areas of subglacial channels. 1,529 detected depressions lie within subsidence areas, 1,033 of which are located in the marshlands in the west. 919 large structures over 1 km in size along the North Sea correlate particularly well with, among other things, compaction zones of Elsterian channels. 344 of these depressions are also associated with tunnel valleys in the subsurface. This parallelism between depressions and the tunnel valleys, some of which are more than 100 m deep, can be attributed to sediment compaction. A connection with the decomposition of postglacial organic material is also conceivable. In addition, negative landforms showing connections to near-surface fault structures were detected within 10 km of the flanks of the Glückstadt Graben, which were active in the Miocene. This indicates graben activity during and toward the end of the glaciation and during the Holocene. Many of these fault-related depressions are also associated with tunnel valleys. Accordingly, three interacting processes are identified that can be linked to the formation of the depressions. One possible interpretation is that the eastern flank of the Glückstadt Graben responded to the load of the Elsterian ice sheet, while subglacial drainage channels simultaneously formed along the zones of weakness. During the interglacials these channels were largely filled with peat and unconsolidated sediments.
The glacier advances of the late Weichselian reactivated the flanks, and the unconsolidated material was additionally eroded by the ice (exaration), forming large lakes such as the Großer Plöner See. In total, 29 large depressions of 5 km or more were identified in Schleswig-Holstein that are at least partly linked to basin subsidence and graben-flank activity, or even originate from them. The final sub-study dealt with differentiating depressions by their potential genesis and distinguishing natural from artificial depressions. For this purpose, a DEM covering a total area of 252 km² in northern Lower Saxony was used. The results show that glacially formed depressions have good roundness values, and that elongation and eccentricity also indicate rather compact shapes. Linear negative structures are often rivers or oxbows and can be identified as Holocene structures. In contrast to the potentially natural depression forms, artificially created depressions tend to be angular or irregular and mostly do not tend toward compact shapes. Three main classes of topographic depressions could be identified and distinguished from one another: potentially glacial depressions (dead-ice forms); rivers, side channels and oxbows; and artificial depressions. Classifying depressions by shape parameters is a useful tool for distinguishing different types and for excluding artificial depressions before processing in geological studies. However, the results were found to depend essentially on the resolution of the elevation model.
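The study classifies depressions by roundness, convexity and elongation but does not give the formulas. The sketch below assumes common definitions, which are not taken from the source:
- roundness as the isoperimetric quotient 4πA/P² (1.0 for a circle);
- convexity as convex-hull perimeter divided by polygon perimeter;
- elongation as the long-to-short side ratio of the axis-aligned bounding box (a simplification: a minimum-area bounding rectangle would be more robust for rotated shapes).

The polygons are hypothetical.

```python
import math

def area_perimeter(poly):
    """Shoelace area and perimeter of a simple polygon [(x, y), ...]."""
    area2, perim = 0.0, 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        area2 += x1 * y2 - x2 * y1
        perim += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perim

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def shape_factors(poly):
    area, perim = area_perimeter(poly)
    _, hull_perim = area_perimeter(convex_hull(poly))
    xs, ys = [x for x, _ in poly], [y for _, y in poly]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return {
        "roundness": 4 * math.pi * area / perim ** 2,  # compact kettle holes score high
        "convexity": hull_perim / perim,               # < 1 for indented outlines
        "elongation": max(w, h) / min(w, h),           # ditches and channels score high
    }

square_pond = [(0, 0), (10, 0), (10, 10), (0, 10)]
ditch = [(0, 0), (100, 0), (100, 5), (0, 5)]
l_shaped_pit = [(0, 0), (10, 0), (10, 2), (2, 2), (2, 10), (0, 10)]
for name, poly in [("pond", square_pond), ("ditch", ditch), ("pit", l_shaped_pit)]:
    print(name, {k: round(v, 3) for k, v in shape_factors(poly).items()})
```

The example reflects the abstract's qualitative finding that linear and angular forms separate from compact ones. The ditch scores low roundness and high elongation, and the L-shaped pit scores lower convexity.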
Abstract:
It has been demonstrated that rating the trust and reputation of individual nodes is an effective approach in distributed environments to improve security, support decision-making and promote node collaboration. Nevertheless, these systems are vulnerable to deliberately false or unfair testimonies. In one scenario, attackers collude to give negative feedback on a victim in order to lower or destroy its reputation; this is known as a bad-mouthing attack. In another scenario, a number of entities agree to give positive feedback on an entity (often one with adversarial intentions); this is known as ballot stuffing. Both attack types can significantly degrade network performance. Existing solutions for coping with these attacks concentrate mainly on prevention techniques. In this work, we propose a solution that detects and isolates the above-mentioned attackers, thereby preventing them from spreading their malicious activity further. The approach is based on detecting outliers using clustering, in this case self-organizing maps. An important advantage of this approach is that it places no restrictions on the training data, so no data pre-processing is needed. Testing results demonstrate the capability of the approach to detect both bad-mouthing and ballot-stuffing attacks in various scenarios.
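The abstract gives no map size, feature encoding or decision rule. The following is a minimal sketch under the following assumptions:
- each row holds the ratings one node gave to a few other nodes;
- the map is a deterministic 1-D batch SOM trained on all rows, with no pre-processing;
- a node is flagged when its best-matching unit lies far from the unit that attracts the most samples.

This "majority unit" rule and the `threshold` value are hypothetical illustrations, not the authors' method.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _bmu(x, weights):
    """Index of the best-matching unit (closest neuron) for sample x."""
    return min(range(len(weights)), key=lambda k: _dist(x, weights[k]))

def train_batch_som(data, n_neurons=4, epochs=20, sigma_start=1.0, sigma_end=0.1):
    """Deterministic 1-D batch SOM: linear initialisation, shrinking Gaussian neighbourhood."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    weights = [[lo[d] + (hi[d] - lo[d]) * k / (n_neurons - 1) for d in range(dim)]
               for k in range(n_neurons)]
    for epoch in range(epochs):
        sigma = sigma_start * (sigma_end / sigma_start) ** (epoch / max(epochs - 1, 1))
        bmus = [_bmu(x, weights) for x in data]
        new_weights = []
        for k in range(n_neurons):
            # Each neuron becomes a neighbourhood-weighted mean of all samples.
            h = [math.exp(-((k - b) ** 2) / (2 * sigma ** 2)) for b in bmus]
            total = sum(h)
            new_weights.append([sum(w * x[d] for w, x in zip(h, data)) / total
                                for d in range(dim)])
        weights = new_weights
    return weights

def flag_outliers(data, weights, threshold=0.3):
    """Flag samples whose BMU lies far from the BMU that attracts the most samples."""
    bmus = [_bmu(x, weights) for x in data]
    hits = [bmus.count(k) for k in range(len(weights))]
    majority = hits.index(max(hits))
    return [i for i, b in enumerate(bmus) if _dist(weights[b], weights[majority]) > threshold]

# Hypothetical testimonies: ratings each node gave to nodes 0, 1 and 2.
honest = [[0.8 + (i % 5 - 2) * 0.01, 0.8 - (i % 5 - 2) * 0.01, 0.8] for i in range(16)]
bad_mouthers = [[0.1 + 0.01 * j, 0.8, 0.8] for j in range(4)]  # collude against node 0
testimonies = honest + bad_mouthers
som = train_batch_som(testimonies)
print(flag_outliers(testimonies, som))
```

Colluding raters form their own small cluster. Because the map is fitted to all data, they end up on a unit far from the honest majority and are isolated without any labelled training set.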
Abstract:
Electrical networks must have high reliability indices in order to maintain the agility and maintenance needed for optimal operation. However, unexpected load growth, equipment failures and inadequate parameterization of protection functions make the analysis of protection events more complex and time-consuming. In addition, the amount of information that can be obtained from modern digital relays has been growing steadily. To enable fast decision-making and maintenance, this research project aimed to implement a complete diagnostic system that is triggered automatically when a protection event occurs. The information to be analysed is obtained from a database and from protection relays, via the IEC 61850 communication protocol and oscillography files. The work covers the complete Smart Grid system, including: data acquisition from the relays, detailing the communication system developed as one program with an IEC 61850 client and an OPC server, plus another program with an OPC client that is activated by events configured to trigger it (for example, operation of the protection); the data pre-treatment system, in which data coming from relays and protection equipment are filtered, pre-processed and formatted; and the diagnostic system. A central database keeps the data from all these stages up to date. The diagnostic system uses conventional algorithms and artificial intelligence techniques, in particular an expert system. The expert system was designed to handle different sets of input data and possibly missing data, always ensuring that a diagnosis is delivered. Tests and simulations were carried out for short circuits (three-phase, phase-to-phase, phase-to-phase-to-ground and phase-to-ground) on the feeders, transformers and busbars of a substation.
These tests included different states of the protection system (correct and incorrect operation). The system proved fully effective with both complete and partial availability of information, always providing a diagnosis of the short circuit and analysing the operation of the substation's protection functions. This enables much more efficient maintenance by power utilities, particularly with regard to preventing equipment defects, responding quickly to problems and identifying the need to re-parameterize protection functions. The system was successfully installed in a distribution substation of Companhia Paulista de Força e Luz.
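The expert system's actual rule base is not described. The sketch below shows only the general idea of rules that always return a diagnosis, degrading to a less specific answer when inputs are missing. It classifies the four short-circuit types listed above from hypothetical per-phase pickup flags (`None` meaning the value is unavailable) and a ground/neutral pickup flag. Input names and output wording are illustrative assumptions, not IEC 61850 data objects.

```python
from typing import Dict, Optional

def classify_fault(phase_pickup: Dict[str, Optional[bool]],
                   ground_pickup: Optional[bool]) -> str:
    """Rule-based short-circuit type; None entries mean the value is unavailable."""
    known = {p: v for p, v in phase_pickup.items() if v is not None}
    faulted = sorted(p for p, v in known.items() if v)
    incomplete = len(known) < 3
    tag = "-".join(faulted)

    if len(faulted) == 3:
        return "three-phase"
    if len(faulted) == 2:
        if incomplete:  # the unknown third phase may also be involved
            return f"phase-to-phase or three-phase ({tag}, phase data incomplete)"
        if ground_pickup is None:
            return f"phase-to-phase, ground involvement unknown ({tag})"
        return f"phase-to-phase-to-ground ({tag})" if ground_pickup else f"phase-to-phase ({tag})"
    if len(faulted) == 1:
        if incomplete:
            return f"at least phase {tag} faulted (phase data incomplete)"
        return f"phase-to-ground ({tag})"
    return "undetermined (insufficient data)" if incomplete else "no fault detected"

print(classify_fault({"A": True, "B": True, "C": False}, True))
print(classify_fault({"A": True, "B": True, "C": None}, None))
```

Returning a hedged diagnosis instead of raising an error mirrors the stated requirement that a diagnosis is always delivered, even with partial information.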
Abstract:
Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric formats. Retrieving information constrained on both spatial and aspatial data from geodatabases gives GIS users the ability to perform more interesting spatial analyses, and enables applications to support composite location-aware searches; for example, in a real estate database: “Find the nearest homes for sale to my current location that have a backyard and whose prices are between $50,000 and $80,000”. Efficient processing of such queries requires combined indexing strategies for multiple types of data. Existing spatial query engines commonly apply a two-filter approach (a spatial filter followed by a nonspatial filter, or vice versa), which can incur large performance overheads. On the other hand, the amount of geolocation data in databases has recently grown rapidly, due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data with objects or events. This growth poses data-ingestion challenges for the large data volumes of practical GIS databases. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled in MapReduce, a widely adopted parallel programming model for data-intensive problems. The evaluation of our algorithms in a Hadoop cluster showed close to linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. To that end, novel techniques for simultaneously indexing spatial data together with textual and numeric data are developed.
Experimental evaluations with large real-world spatial datasets measured query response times in the sub-second range for most cases, and up to a few seconds for a small number of cases, which is reasonable for interactive applications. Overall, these results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and that an adequate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.
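The abstract does not say how the R-tree construction is split across map and reduce tasks. The sketch below assumes one plausible scheme and simulates it in-process: it does not use the Hadoop API.
- Map: points are keyed by a Z-order (Morton) code and assigned to contiguous Z-ranges.
- Shuffle: points are grouped per range.
- Reduce: each partition is sorted along the curve and bulk-packed into a local R-tree.
- Final step: a single root is placed over the partition roots.

Partition boundaries here are exact quantiles, whereas a real job would estimate them from a sample.

```python
import random
from collections import defaultdict

def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two grid coordinates (Z-order / Morton code)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def mbr(boxes):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of several boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def pack(entries, capacity):
    """Bottom-up packing of (box, payload) entries; returns the root (box, children)."""
    level = entries
    while True:
        nodes = [(mbr([box for box, _ in level[i:i + capacity]]), level[i:i + capacity])
                 for i in range(0, len(level), capacity)]
        if len(nodes) == 1:
            return nodes[0]
        level = nodes

def map_phase(points, splits, bits):
    """Emit (partition id, (z-key, point)); partitions are contiguous Z-order ranges."""
    scale = (1 << bits) - 1
    for x, y in points:
        key = morton_key(int(x * scale), int(y * scale), bits)
        yield sum(key >= s for s in splits), (key, (x, y))

def reduce_phase(records, capacity):
    """Sort one partition along the Z-curve and pack it into a local R-tree."""
    records.sort()
    return pack([((x, y, x, y), (x, y)) for _, (x, y) in records], capacity)

def build_rtree(points, n_partitions=4, capacity=8, bits=16):
    scale = (1 << bits) - 1
    keys = sorted(morton_key(int(x * scale), int(y * scale), bits) for x, y in points)
    splits = [keys[len(keys) * i // n_partitions] for i in range(1, n_partitions)]
    shuffled = defaultdict(list)  # stands in for the MapReduce shuffle
    for part, record in map_phase(points, splits, bits):
        shuffled[part].append(record)
    local_roots = [reduce_phase(recs, capacity) for _, recs in sorted(shuffled.items())]
    return pack(local_roots, capacity)  # one global root over the partition roots

def range_query(node, q):
    """All points inside the query box q = (xmin, ymin, xmax, ymax)."""
    box, payload = node
    if box[2] < q[0] or box[0] > q[2] or box[3] < q[1] or box[1] > q[3]:
        return []
    if not isinstance(payload, list):  # leaf entry: the payload is the point itself
        return [payload]
    return [p for child in payload for p in range_query(child, q)]

rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(200)]
root = build_rtree(points)
hits = range_query(root, (0.2, 0.2, 0.5, 0.6))
print(len(hits))
```

Because each reducer works on an independent Z-range, the per-partition packing parallelizes naturally. That property is the kind that makes near-linear scaling plausible.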
Abstract:
As pointed out by Bentley et al. (2005), artificial immune systems (AIS) lack tissue, which is present in one form or another in all living multicellular organisms. Some have argued that, in the context of AIS, this concept brings little novelty to the already saturated field of immune-inspired computational research. This article aims to show that such a component of an AIS has the potential to benefit a data processing algorithm in terms of data pre-processing, clustering and extraction of the features desired by the immune-inspired system. The proposed tissue algorithm is based on self-organizing networks, such as the self-organizing maps (SOM) developed by Kohonen (1996), and on an analogy with the so-called Toll-Like Receptors (TLR), which affect the activation function of the clusters developed by the SOM.
Abstract:
Wind energy is one of the most promising and fastest-growing sectors of energy production. Wind is an ecologically friendly and relatively cheap energy resource that can be developed in practically all corners of the world (wherever the wind blows). Today, wind power is widely developed in the Scandinavian countries. Three important challenges concerning sustainable development, namely energy security, climate change and energy access, make a compelling case for large-scale utilization of wind energy. In Finland, according to the climate and energy strategy adopted in 2008, electricity generated by wind farms should reach 6-7% of the country's total electricity consumption by 2020 [1]. The main challenges associated with wind energy production are the harsh operating conditions that often accompany turbine operation in northern climates and poor accessibility for maintenance and service. One of the major problems requiring a solution is the icing of turbine structures. Icing reduces the performance of wind turbines, which, during a long cold period, can significantly affect the reliability of the power supply. To predict and control power performance, the process of ice accretion has to be carefully tracked. There are two ways to detect icing: directly or indirectly. The direct way uses dedicated ice-detection instruments; the indirect way uses indirect characteristics of turbine performance. One such indirect method for ice detection and power-loss estimation is proposed and used in this paper. The results were compared with those obtained directly from the ice sensors. The data used were measured at the Muukko wind farm in southeast Finland during the project 'Wind power in cold climate and complex terrain'. The project was carried out from 9/2013 to 8/2015 with the partners Lappeenranta University of Technology, Alstom renovables España S.L., TuuliMuukko and TuuliSaimaa.
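The paper's specific indirect method is not described in the abstract. The following is a minimal sketch of one common indirect approach, power-curve comparison:
1. Build a reference curve from ice-free data by binning wind speed, keeping the 10th percentile and the median of power per bin.
2. In cold conditions, flag a sample as suspected icing when its power falls below that bin's 10th percentile.
3. Accumulate the shortfall against the median as estimated energy loss.

The bin width, temperature limit, averaging period and all data values are hypothetical.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.5     # m/s, hypothetical wind-speed bin width
TEMP_LIMIT = 1.0    # degC, hypothetical: icing only considered at or below this
DT_HOURS = 10 / 60  # hypothetical 10-minute averaged data

def wind_bin(ws):
    return int(ws // BIN_WIDTH)

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100]."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(q / 100 * len(ordered))) - 1]

def reference_curve(ice_free):
    """Per wind-speed bin: (10th percentile, median) power from ice-free (ws, kW) data."""
    bins = defaultdict(list)
    for ws, power in ice_free:
        bins[wind_bin(ws)].append(power)
    return {b: (percentile(p, 10), percentile(p, 50)) for b, p in bins.items()}

def detect_icing(samples, curve):
    """samples: (ws, kW, degC). Returns flagged indices and estimated energy loss in kWh."""
    flagged, loss_kwh = [], 0.0
    for i, (ws, power, temp) in enumerate(samples):
        ref = curve.get(wind_bin(ws))
        if ref is None or temp > TEMP_LIMIT:
            continue  # no reference for this bin, or too warm for ice
        low, median = ref
        if power < low:
            flagged.append(i)
            loss_kwh += (median - power) * DT_HOURS
    return flagged, loss_kwh

ice_free = [(5.1, 400 + 5 * k) for k in range(10)] + [(8.2, 1200 + 10 * k) for k in range(10)]
curve = reference_curve(ice_free)
samples = [(5.2, 300, -2.0), (5.3, 410, -2.0), (8.1, 1100, 3.0), (8.3, 900, -5.0), (12.0, 100, -5.0)]
flagged, loss = detect_icing(samples, curve)
print(flagged, round(loss, 1))
```

The temperature gate keeps low output in warm weather (for example, curtailment) from being misread as ice. Flagged periods could then be compared against the direct ice-sensor record, as the paper does with its own method.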
Abstract:
Riding the wave of recent groundbreaking achievements, artificial intelligence (AI) is currently the buzzword on everybody's lips, and Machine Learning (ML), which allows algorithms to learn from historical data, has emerged as its pinnacle. The multitude of algorithms, each with unique strengths and weaknesses, highlights the absence of a universal solution and poses a challenging optimization problem. In response, automated machine learning (AutoML) navigates vast search spaces under tight time constraints. By lowering entry barriers, AutoML promises the democratization of AI, yet it still faces some challenges. In data-centric AI, the discipline of systematically engineering the data used to build an AI system, the challenge of configuring data pipelines is rather simple. We devise a methodology for building effective data pre-processing pipelines in supervised learning, as well as a data-centric AutoML solution for unsupervised learning. In human-centric AI, many current AutoML tools were built not around the user but around algorithmic ideas, raising ethical and social-bias concerns. We contribute by deploying AutoML tools that aim to complement, rather than replace, human intelligence. In particular, we provide solutions for single-objective and multi-objective optimization and showcase the challenges and potential of novel interfaces featuring large language models. Finally, application areas that rely on numerical simulators, often related to Earth observation, tend to be particularly high-impact and address important challenges such as climate change and crop life cycles. We commit to coupling these physical simulators with (Auto)ML solutions towards a physics-aware AI. Specifically, in precision farming, we design a smart irrigation platform that allows real-time monitoring of soil moisture, predicts future moisture values, and estimates water demand to schedule irrigation.
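The thesis's pipeline-building methodology is not described in the abstract. The toy sketch below only illustrates the underlying problem: it enumerates every combination of two hypothetical pre-processing choices (mean or median imputation; no, min-max or standard scaling), fits each step on the training split only, and scores it with a simple nearest-centroid classifier on a hold-out split. Real AutoML tools typically use more sample-efficient search than this exhaustive grid. The data are made up so that an unscaled large-range feature misleads the classifier.

```python
import statistics
from itertools import product

def impute(train, test, strategy):
    """Fill missing values (None) column-wise with the training mean or median."""
    fills = []
    for j in range(len(train[0])):
        observed = [row[j] for row in train if row[j] is not None]
        fills.append(statistics.mean(observed) if strategy == "mean" else statistics.median(observed))
    def apply(rows):
        return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    return apply(train), apply(test)

def scale(train, test, strategy):
    """Fit a scaler on the training rows only, then apply it to both splits."""
    if strategy == "none":
        return train, test
    cols = list(zip(*train))
    if strategy == "minmax":
        shift = [min(c) for c in cols]
        spread = [(max(c) - min(c)) or 1.0 for c in cols]
    else:  # "standard"
        shift = [statistics.mean(c) for c in cols]
        spread = [statistics.pstdev(c) or 1.0 for c in cols]
    def apply(rows):
        return [[(v - shift[j]) / spread[j] for j, v in enumerate(r)] for r in rows]
    return apply(train), apply(test)

def nearest_centroid_accuracy(train, y_train, test, y_test):
    centroids = {}
    for label in set(y_train):
        rows = [r for r, y in zip(train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def predict(r):
        return min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(r, centroids[k])))
    return sum(predict(r) == y for r, y in zip(test, y_test)) / len(test)

def search_pipelines(train, y_train, test, y_test):
    """Exhaustively score every (imputation, scaling) pipeline on a hold-out split."""
    results = {}
    for imp, sc in product(["mean", "median"], ["none", "minmax", "standard"]):
        tr, te = impute(train, test, imp)
        tr, te = scale(tr, te, sc)
        results[(imp, sc)] = nearest_centroid_accuracy(tr, y_train, te, y_test)
    return max(results, key=results.get), results

# Feature 0 is informative; feature 1 has a large range and carries no class signal.
X_train = [[0.0, 100], [0.2, 900], [1.0, 400], [0.8, 400], [0.9, None]]
y_train = [0, 0, 1, 1, 1]
X_test = [[0.1, 420], [0.9, 480]]
y_test = [0, 1]
best, results = search_pipelines(X_train, y_train, X_test, y_test)
print(best, results)
```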
Abstract:
PAMELA (Phased Array Monitoring for Enhanced Life Assessment) SHM™ System is an integrated embedded system based on ultrasonic guided waves, consisting of several electronic devices and one system manager controller. The data collected by all PAMELA devices in the system must be transmitted to the controller, which is responsible for carrying out the advanced signal processing to obtain SHM maps. PAMELA devices consist of hardware based on a Virtex 5 FPGA with a PowerPC 440 running an embedded Linux distribution. Therefore, in addition to performing tests and transmitting the collected data to the controller, PAMELA devices can perform local data processing or pre-processing (reduction, normalization, pattern recognition, feature extraction, etc.). Local data processing decreases data traffic over the network and reduces the CPU load of the external computer. PAMELA devices can even run autonomously, performing scheduled tests and communicating with the controller only when structural damage is detected or when programmed to do so. Each PAMELA device integrates a software management application (SMA) that allows developers to download their own algorithm code and add new data processing algorithms to the device. The SMA is developed in a virtual machine with an Ubuntu Linux distribution that includes all the software tools needed for the entire development cycle. Eclipse IDE (Integrated Development Environment) is used to develop the SMA project and to write the code of each data processing algorithm. This paper presents the developed software architecture and describes the steps needed to add new data processing algorithms to the SMA in order to increase the processing capabilities of PAMELA devices. An example of basic damage index estimation using the delay-and-sum algorithm is provided.
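The paper's delay-and-sum implementation is not reproduced in the abstract. The sketch below shows the standard idea under stated assumptions:
- four transducers at the corners of a 0.5 m square plate, used in all pitch-catch pairs;
- a constant, hypothetical group velocity and sampling rate;
- each "residual" (current minus baseline signal) is synthesized as a single Gaussian echo from a known damage point;
- the absolute residual is used instead of a Hilbert envelope, for brevity.

Each image pixel sums every pair's residual sampled at that pixel's actuator-pixel-sensor delay. The damage index is taken as the peak image value divided by the number of pairs.

```python
import math
from itertools import combinations

VELOCITY = 5000.0  # m/s, hypothetical group velocity
FS = 1e6           # Hz, hypothetical sampling rate
N_SAMPLES = 400

def time_of_flight(tx, rx, p):
    """Actuator -> point -> sensor travel time for a scatterer at p."""
    return (math.dist(tx, p) + math.dist(p, rx)) / VELOCITY

def synthetic_residual(tx, rx, damage, width=20e-6):
    """Stand-in for (current - baseline): one Gaussian echo from the damage location."""
    t0 = time_of_flight(tx, rx, damage)
    return [math.exp(-0.5 * ((k / FS - t0) / width) ** 2) for k in range(N_SAMPLES)]

def delay_and_sum(transducers, residuals, grid):
    """Image value at p = sum over pairs of |residual| sampled at that pair's delay for p."""
    image = {}
    for p in grid:
        total = 0.0
        for (i, j), signal in residuals.items():
            k = round(time_of_flight(transducers[i], transducers[j], p) * FS)
            if 0 <= k < len(signal):
                total += abs(signal[k])
        image[p] = total
    return image

transducers = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]  # plate corners, metres
damage = (0.3, 0.15)
residuals = {(i, j): synthetic_residual(transducers[i], transducers[j], damage)
             for i, j in combinations(range(len(transducers)), 2)}
grid = [(ix * 0.05, iy * 0.05) for ix in range(11) for iy in range(11)]
image = delay_and_sum(transducers, residuals, grid)
best = max(image, key=image.get)
damage_index = image[best] / len(residuals)  # 1.0 would mean every pair agrees perfectly
print(best, round(damage_index, 3))
```

Only the damage location lies on all six constant-delay ellipses at once, so the contributions add coherently there and the image peaks at that pixel. A loop of this kind is a plausible candidate for the on-device processing the SMA is meant to host.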
Abstract:
This work proposes a method based on both pre-processing and data mining with the objective of identifying harmonic current sources in residential consumers. This methodology can also be applied to identify linear and nonlinear loads. It should be emphasized that the entire database was obtained through laboratory tests, i.e., real data were acquired from residential loads. The residential system created in the laboratory was fed by a configurable power source, with the loads and the power quality analyzers connected at its output (all measurements were stored on a microcomputer). The data were then pre-processed using attribute selection techniques in order to reduce the complexity of identifying the loads. A new database was generated that kept only the selected attributes, and Artificial Neural Networks were trained on it to identify the loads. To validate the proposed methodology, the loads were supplied both under ideal conditions (without harmonics) and with harmonic voltages within pre-established limits. These limits are in accordance with IEEE Std. 519-1992 and PRODIST (the distribution procedures followed by Brazilian utilities). The results obtained are intended to validate the proposed methodology and to provide a method that can serve as an alternative to conventional methods.
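The abstract says attribute selection reduced the data before ANN training but does not name the technique. The following is a minimal sketch using a Fisher-score filter, a simple and common choice assumed here. Each attribute is ranked by between-class spread over within-class spread, and the top k columns are kept in their original order. Attribute names and measurements are hypothetical.

```python
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Per-attribute Fisher score: between-class spread divided by within-class spread."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        overall = mean(row[j] for row in X)
        between = within = 0.0
        for c in labels:
            vals = [row[j] for row, lab in zip(X, y) if lab == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        scores.append(between / within if within > 0 else float("inf"))
    return scores

def select_top_k(X, y, names, k):
    """Keep the k highest-scoring attributes, preserving the original column order."""
    scores = fisher_scores(X, y)
    keep = sorted(sorted(range(len(names)), key=lambda j: scores[j], reverse=True)[:k])
    return [names[j] for j in keep], [[row[j] for j in keep] for row in X]

# Hypothetical measurements: fundamental current (A), 3rd and 5th harmonic (% of
# fundamental) and supply voltage (V) for two linear and two nonlinear loads.
names = ["I1_rms", "I3_pct", "I5_pct", "V_rms"]
X = [
    [5.0, 2.0, 1.0, 127.0],
    [6.0, 3.0, 1.5, 126.0],
    [5.5, 80.0, 60.0, 127.5],
    [4.5, 85.0, 55.0, 126.5],
]
y = ["linear", "linear", "nonlinear", "nonlinear"]
selected, X_reduced = select_top_k(X, y, names, k=2)
print(selected, X_reduced)
```

The reduced matrix `X_reduced` is what would then be passed to the neural network, which is the "newer database" step the abstract describes.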
Abstract:
This paper presents a methodology based on the knowledge discovery in databases (KDD) process to determine the failure probability of electrical equipment belonging to a real high-voltage electrical network. Data Mining (DM) techniques are used to discover a set of failure probabilities and, therefore, to extract knowledge concerning the unavailability of electrical equipment such as power transformers and high-voltage power lines. The framework includes several steps: analysis of the real database, data pre-processing, application of DM algorithms and, finally, interpretation of the discovered knowledge. To validate the proposed methodology, a case study using real databases is presented. These data carry considerable uncertainty due to climatic conditions; for this reason, fuzzy logic was used to determine the set of failure probabilities of the electrical components in order to restore service. The results reflect the interesting potential of this approach and encourage further research on the topic.
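The abstract states that fuzzy logic was used to handle climate-related uncertainty but gives no variables or rules. The sketch below illustrates the general technique only. It uses one hypothetical input, "weather severity" on a 0-10 scale, three rules mapping low/medium/high severity to low/medium/high failure probability, and min-max (Mamdani) inference with centroid defuzzification over a discretized output range. All membership functions and rules are assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

SEVERITY_SETS = {"low": (0, 0, 5), "medium": (0, 5, 10), "high": (5, 10, 10)}
PROBABILITY_SETS = {"low": (0, 0, 0.5), "medium": (0, 0.5, 1), "high": (0.5, 1, 1)}
RULES = [("low", "low"), ("medium", "medium"), ("high", "high")]  # IF severity is X THEN prob is Y

def failure_probability(severity, steps=1001):
    """Mamdani inference: clip each output set by its rule strength, aggregate with max, take the centroid."""
    strengths = {out: tri(severity, *SEVERITY_SETS[inp]) for inp, out in RULES}
    num = den = 0.0
    for k in range(steps):
        z = k / (steps - 1)
        mu = max(min(w, tri(z, *PROBABILITY_SETS[o])) for o, w in strengths.items())
        num += z * mu
        den += mu
    return num / den if den > 0 else 0.0

for s in (0, 2, 5, 8, 10):
    print(s, round(failure_probability(s), 3))
```

In practice the rule base and membership functions would come from the mined data and expert knowledge. The fuzzy layer lets uncertain weather inputs map smoothly to a probability instead of to a hard threshold.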