947 results for Data pre-processing


Relevance:

90.00%

Publisher:

Abstract:

This paper presents five clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers' behaviour. The obtained knowledge can be used to support a decision tool, not only for utilities but also for consumers. Load profiles can be used by utilities to identify the aspects that cause system load peaks and to enable the development of specific contracts with their customers. The framework presented throughout the paper consists of several steps, namely the data pre-processing phase, the application of clustering algorithms, and the evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
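A minimal sketch of the kind of pipeline described above, using k-means (one of many candidate clustering methods) and the silhouette index as the cluster validity criterion; the input file, curve length and peak-normalisation step are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical input: one daily load curve per MV consumer
# (rows = consumers, columns = e.g. 96 quarter-hour readings).
load_curves = np.load("mv_load_curves.npy")  # assumed file, shape (n_consumers, 96)

# Pre-processing: scale each curve by its own peak so the clustering
# compares the *shape* of consumption rather than its magnitude.
profiles = load_curves / load_curves.max(axis=1, keepdims=True)

# Try several numbers of clusters and keep the partition with the
# best silhouette index (one possible cluster validity criterion).
best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)
    score = silhouette_score(profiles, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(f"best partition: k={best_k}, silhouette={best_score:.3f}")
```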

Relevance:

90.00%

Publisher:

Abstract:

Microarrays allow thousands of genes to be monitored simultaneously, quantifying the abundance of transcripts under the same experimental condition at the same time. Among the various available array technologies, two-channel cDNA microarray experiments have arisen in numerous technical protocols associated with genomic studies, and they are the focus of this work. Microarray experiments involve many steps, and each one can affect the quality of the raw data. Background correction and normalization are pre-processing techniques used to clean and correct the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no pre-processing strategy outperforms the others in all circumstances, so it seems difficult to provide general recommendations. In this work, we propose using exploratory techniques to visualize the effects of pre-processing methods on the statistical analysis of two-channel cancer microarray data sets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the graph of profiles resulting from correspondence analysis was used to visualize the results. Six background-correction methods and six normalization methods were combined, yielding 36 pre-processing strategies, which were analyzed on a published cDNA microarray database (Liver), available at http://genome-www5.stanford.edu/, whose microarrays had already been classified by cancer type. All statistical analyses were performed using the R statistical software.
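The analyses above were done in R; as a language-neutral illustration of what one background-correction/normalization pair does to two-channel data, the sketch below subtracts a local background estimate and median-centres the log-ratios (M values). File names and the particular method choices are assumptions.

```python
import numpy as np

# Hypothetical two-channel (red/green) spot intensities and local
# background estimates for one array, one value per spot.
R, G = np.loadtxt("red.txt"), np.loadtxt("green.txt")            # assumed files
Rb, Gb = np.loadtxt("red_bg.txt"), np.loadtxt("green_bg.txt")

# Background correction: plain subtraction, clipped to stay positive
# (one of many possible background methods).
Rc = np.clip(R - Rb, 1.0, None)
Gc = np.clip(G - Gb, 1.0, None)

# M/A representation of a two-channel spot:
# M = log-ratio between channels, A = mean log-intensity.
M = np.log2(Rc) - np.log2(Gc)
A = 0.5 * (np.log2(Rc) + np.log2(Gc))

# Global median normalization: remove a constant dye bias so that a
# typical (non-differentially expressed) gene has M = 0.
M_norm = M - np.median(M)
```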

Relevance:

90.00%

Publisher:

Abstract:

Report of the Final Master's Project submitted to obtain the degree of Master in Electronics and Telecommunications Engineering

Relevance:

90.00%

Publisher:

Abstract:

In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming at obtaining better performance than with the individual modalities. However, when combining these modalities, the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, in order to extract useful information from this data, one has to resort to feature selection (FS) techniques to lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques on silent speech data, in a dataset with 4 non-invasive and promising modalities, namely: video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised (mutual information and Fisher's ratio) and two unsupervised (mean-median and arithmetic mean-geometric mean) FS filters. The evaluation was made by assessing the classification accuracy (word recognition error) of three well-known classifiers (k-nearest neighbors, support vector machines, and dynamic time warping). The key results of this study show that both unsupervised and supervised FS techniques improve the classification accuracy on both individual and combined modalities. For instance, on the video component, we attain relative performance gains of 36.2% in error rates. FS is also useful as pre-processing for feature fusion. Copyright © 2014 ISCA.
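A minimal sketch of two of the four filters named above: Fisher's ratio (supervised) and the arithmetic mean-geometric mean (AM-GM) gap (unsupervised). Each scores features individually so that the top-ranked subset can be kept; two-class labels and non-negative features are assumed for the respective criteria.

```python
import numpy as np

def fisher_ratio(X, y):
    """Supervised filter: per-feature Fisher's ratio for a two-class problem."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return num / den

def am_gm_score(X):
    """Unsupervised filter: gap between arithmetic and geometric means.
    Assumes non-negative features; a larger AM-GM gap indicates a more
    dispersed, potentially more informative feature."""
    Xp = np.abs(X) + 1e-12
    am = Xp.mean(axis=0)
    gm = np.exp(np.log(Xp).mean(axis=0))
    return am - gm

def select_top(scores, m):
    """Keep the indices of the m top-ranked features under either criterion."""
    return np.argsort(scores)[::-1][:m]
```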

Relevance:

90.00%

Publisher:

Abstract:

Discrete data representations are necessary, or at least convenient, in many machine learning problems. While feature selection (FS) techniques aim at finding relevant subsets of features, the goal of feature discretization (FD) is to find concise (quantized) data representations, adequate for the learning task at hand. In this paper, we propose two incremental methods for FD. The first method belongs to the filter family, in which the quality of the discretization is assessed by a (supervised or unsupervised) relevance criterion. The second method is a wrapper, where discretized features are assessed using a classifier. Both methods can be coupled with any static (unsupervised or supervised) discretization procedure and can be used to perform FS as pre-processing or post-processing stages. The proposed methods attain efficient representations suitable for binary and multi-class problems with different types of data, being competitive with existing methods. Moreover, using well-known FS methods with the features discretized by our techniques leads to better accuracy than with the features discretized by other methods or with the original features. (C) 2013 Elsevier B.V. All rights reserved.
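A minimal sketch of the filter-style incremental idea, under stated assumptions rather than the authors' exact algorithm: each feature is re-discretized with one more uniform-quantization bit at a time until the relevance criterion (here mutual information with the class labels, a supervised choice) stops improving.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def uniform_quantize(x, bits):
    """Static unsupervised discretizer: uniform bins over the feature's range."""
    edges = np.linspace(x.min(), x.max(), 2 ** bits + 1)
    return np.clip(np.digitize(x, edges[1:-1]), 0, 2 ** bits - 1)

def incremental_fd(x, y, max_bits=8, tol=1e-3):
    """Add one bit at a time; stop when the relevance gain falls below tol."""
    best_bits = 1
    best_rel = mutual_info_score(y, uniform_quantize(x, 1))
    for bits in range(2, max_bits + 1):
        rel = mutual_info_score(y, uniform_quantize(x, bits))
        if rel - best_rel < tol:
            break
        best_bits, best_rel = bits, rel
    return best_bits, uniform_quantize(x, best_bits)
```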

Relevance:

90.00%

Publisher:

Abstract:

Behavioral biometrics is one of the areas with growing interest within the biosignal research community. A recent trend in the field is ECG-based biometrics, where electrocardiographic (ECG) signals are used as input to the biometric system. Previous work has shown this to be a promising trait, with the potential to serve as a good complement to other existing, already more established modalities, due to its intrinsic characteristics. In this paper, we propose a system for ECG biometrics centered on signals acquired at the subject's hand. Our work builds on a previously developed custom, non-intrusive sensing apparatus for data acquisition at the hands, and involved the pre-processing of the ECG signals and the evaluation of two classification approaches targeted at real-time or near real-time applications. Preliminary results show that this system leads to competitive results both for authentication and identification, and further validate the potential of ECG signals as a complementary modality in the toolbox of the biometric system designer.
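The abstract does not detail the pre-processing itself; the sketch below shows a generic ECG pre-processing chain (band-pass filtering followed by R-peak detection and heartbeat segmentation) of the kind typically used before ECG biometric classification. The sampling rate, filter cut-offs and peak-detection thresholds are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fs = 1000.0  # assumed sampling rate in Hz

def preprocess_ecg(ecg):
    """Band-pass the raw hand-acquired ECG to suppress baseline
    wander and high-frequency noise (cut-offs are assumptions)."""
    b, a = butter(4, [0.5 / (fs / 2), 40.0 / (fs / 2)], btype="band")
    return filtfilt(b, a, ecg)

def segment_heartbeats(ecg, window=0.6):
    """Detect R peaks and cut one fixed-length segment around each,
    producing per-heartbeat templates to feed the classifier."""
    clean = preprocess_ecg(ecg)
    peaks, _ = find_peaks(clean, distance=int(0.4 * fs),
                          height=np.percentile(clean, 95))
    half = int(window * fs / 2)
    return np.array([clean[p - half:p + half]
                     for p in peaks if half <= p < len(clean) - half])
```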

Relevance:

90.00%

Publisher:

Abstract:

XML Schema is one of the most used specifications for defining types of XML documents. It provides an extensive set of primitive data types, ways to extend and reuse definitions and an XML syntax that simplifies automatic manipulation. However, many features that make XML Schema Definitions (XSD) so interesting also make them rather cumbersome to read. Several tools to visualize and browse schema definitions have been proposed to cope with this issue. The novel approach proposed in this paper is to base XSD visualization and navigation on the XML document itself, using solely the web browser, without requiring a pre-processing step or an intermediate representation. We present the design and implementation of a web-based XML Schema browser called schem@Doc that operates over the XSD file itself. With this approach, XSD visualization is synchronized with the source file and always reflects its current state. This tool fits well in the schema development process and is easy to integrate in web repositories containing large numbers of XSD files.
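schem@Doc runs entirely in the web browser over the XSD file; as a language-neutral illustration of the underlying observation, that an XSD is ordinary XML which can be navigated directly without an intermediate representation, this sketch lists the top-level element and type declarations of a schema (the file name is an assumption).

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# An XSD is itself an XML document, so a plain XML parser can walk it.
root = ET.parse("schema.xsd").getroot()  # assumed file name

for child in root:
    if child.tag == XSD_NS + "element":
        print("element:", child.get("name"), "type:", child.get("type"))
    elif child.tag in (XSD_NS + "complexType", XSD_NS + "simpleType"):
        print("type:", child.get("name"))
```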

Relevance:

90.00%

Publisher:

Abstract:

Dissertation presented to obtain the degree of Doctor of Philosophy in Electrical Engineering, speciality in Perceptual Systems, by the Universidade Nova de Lisboa, Faculty of Sciences and Technology

Relevance:

90.00%

Publisher:

Abstract:

Dissertation submitted to obtain the degree of Master in Informatics Engineering

Relevance:

90.00%

Publisher:

Abstract:

The main objective of this thesis is the comparison of two CFD (Computational Fluid Dynamics) software packages for the simulation of atmospheric flows, with a view to their application to the study and characterization of wind farms. The packages in question are OpenFOAM (Open Field Operation and Manipulation), a generic open-source freeware, and Windie, a tool specialized in the study of wind farms. For this study, the topography surrounding a wind farm located in Greece was used, for which results from a previously conducted measurement campaign were available. To this end, procedures and tools complementary to OpenFOAM, developed by da Silva Azevedo (2013) and suited to pre-processing, data extraction and post-processing, were used in the simulation of the practical case. The computational conditions used in this work were limited to those of the flow simulations previously performed with Windie: turbulent, steady, incompressible and non-stratified flow, using the atmospheric RANS (Reynolds-averaged Navier-Stokes) k-ε turbulence model. The results of both simulations, OpenFOAM and Windie, were compared against the results of the measurement campaign, through the speed-up and turbulence intensity values at the anemometer positions.
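A minimal sketch of the two comparison metrics mentioned at the end of the abstract, as commonly defined in wind-resource CFD: speed-up relative to a reference anemometer, and turbulence intensity derived from the RANS k-ε turbulent kinetic energy under an isotropy assumption. The probe values are placeholders.

```python
import numpy as np

# Hypothetical values sampled from the CFD solution at the anemometer
# positions: wind speed U [m/s] and turbulent kinetic energy k [m^2/s^2],
# plus the speed measured at a reference mast.
U = np.array([7.8, 8.4, 6.9])   # assumed probe values
k = np.array([1.1, 0.9, 1.4])
U_ref = 7.5                      # assumed reference anemometer speed

# Speed-up: velocity at each position relative to the reference mast.
speed_up = U / U_ref

# Turbulence intensity from RANS k (isotropy assumption): TI = sqrt(2k/3) / U.
TI = np.sqrt(2.0 * k / 3.0) / U

for i, (s, ti) in enumerate(zip(speed_up, TI)):
    print(f"probe {i}: speed-up {s:.2f}, TI {100 * ti:.1f}%")
```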

Relevance:

90.00%

Publisher:

Abstract:

Dissertation submitted to obtain the degree of Master in Informatics Engineering

Relevance:

90.00%

Publisher:

Abstract:

In the last few years, we have observed an exponential increase in information systems, and parking information is one more example of this. Reliable and up-to-date information on parking slot availability is very important for the goal of traffic reduction, and parking slot prediction is a new topic that has already started to be applied: San Francisco in the United States and Santander in Spain are examples of projects carried out to obtain this kind of information. The aim of this thesis is the study and evaluation of methodologies for parking slot prediction and their integration in a web application, where all kinds of users can see the current parking status as well as future status according to the model predictions. The source of the data is ancillary in this work, but it still needs to be understood in order to understand the parking behaviour. Many modelling techniques are used for this purpose, such as time series analysis, decision trees, neural networks and clustering. In this work, the author describes the techniques that performed best, analyzes the results, and points out the advantages and disadvantages of each one. The model learns the periodic and seasonal patterns of the parking status behaviour, and with this knowledge it can predict future status values given a date. The data comes from the Smart Park Ontinyent; it consists of parking occupancy status together with timestamps and is stored in a database. After data acquisition, data analysis and pre-processing were needed for the model implementations. The first test used a boosting ensemble classifier, employed over a set of decision trees created with the C5.0 algorithm from a set of training samples, to assign a prediction value to each object. In addition to the predictions, this work provides error measurements that indicate how reliable the outcome predictions are. The second test used the function-fitting seasonal exponential smoothing TBATS model. Finally, as the last test, a model combining the previous two was tried, to see the result of this combination. The results were quite good for all of them, with average errors of 6.2, 6.6 and 5.4 in vacancy predictions for the three models, respectively. For a car park of 47 places, this means roughly a 10% average error in parking slot predictions. This result could be even better with longer data series available. In order to make this kind of information visible and reachable by everyone with an internet-connected device, a web application was built. Besides displaying the data, this application also offers different functions to improve the task of searching for parking. The new functions, apart from parking prediction, were:

- Park distances from the user's location: provides the distances from the user's current location to the different car parks in the city.
- Geocoding: the service for matching a literal description or an address to a concrete location.
- Geolocation: the service for positioning the user.
- Parking list panel: neither a service nor a function, just a better visualization and handling of the information.
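A rough sketch of the three-model setup described above. C5.0 and TBATS are R implementations; here scikit-learn's AdaBoost over decision trees stands in for the boosted C5.0 ensemble and a seasonal mean stands in for TBATS, with the third model being their average. The file name and column names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical occupancy log: timestamp plus number of free slots.
df = pd.read_csv("smart_park_ontinyent.csv", parse_dates=["timestamp"])  # assumed file

# Feature engineering: expose the periodic/seasonal patterns the
# abstract mentions (hour of day, day of week) to the trees.
X = np.column_stack([df["timestamp"].dt.hour,
                     df["timestamp"].dt.dayofweek])
y = df["free_slots"].to_numpy()

# Model 1: boosting over decision trees (stand-in for the C5.0 ensemble).
boosted = AdaBoostRegressor(DecisionTreeRegressor(max_depth=5),
                            n_estimators=100, random_state=0).fit(X, y)

# Model 2 would be TBATS seasonal exponential smoothing; a per
# (weekday, hour) seasonal mean stands in for it here.
seasonal_mean = df.groupby([df["timestamp"].dt.dayofweek,
                            df["timestamp"].dt.hour])["free_slots"].mean()

def predict_free_slots(ts):
    """Model 3: the combination -- a plain average of the two predictors."""
    p1 = boosted.predict(np.array([[ts.hour, ts.dayofweek]]))[0]
    p2 = seasonal_mean.loc[(ts.dayofweek, ts.hour)]
    return 0.5 * (p1 + p2)
```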

Relevance:

90.00%

Publisher:

Abstract:

Doctoral Thesis in Information Systems and Technologies, area of Information Systems and Technology

Relevance:

90.00%

Publisher:

Abstract:

The differentiation between benign and malignant focal liver lesions plays an important role in the diagnosis of liver disease and in the therapeutic planning of local or general disease. This differentiation, based on characterization, relies on the observation of the dynamic vascular patterns (DVP) of lesions with respect to the adjacent parenchyma, and may be assessed during contrast-enhanced ultrasound imaging after a bolus injection. For instance, hemangiomas (i.e., benign lesions) exhibit hyper-enhanced signatures over time, whereas metastases (i.e., malignant lesions) frequently present hyper-enhanced foci during the arterial phase and always become hypo-enhanced afterwards. The objective of this work was to develop a new parametric imaging technique aimed at mapping the DVP signatures into a single image, called a DVP parametric image, conceived as a diagnostic aid tool for characterizing lesion types. The methodology consisted of processing a time sequence of images (DICOM video data) in four consecutive steps: (1) pre-processing, combining image motion correction and linearization to derive, in each pixel, an echo-power signal proportional to the local contrast agent concentration over time; (2) signal modeling, by means of a curve-fitting optimization, to compute a difference signal in each pixel as the subtraction of the adjacent-parenchyma kinetics from the echo-power signal; (3) classification of the difference signals; and (4) parametric image rendering to represent the classified pixels as a support for diagnosis. DVP parametric imaging was the object of a clinical assessment on a total of 146 lesions, imaged using different medical ultrasound systems. The resulting sensitivity and specificity were 97% and 91%, respectively, which compare favorably with the scores of 81 to 95% for sensitivity and 80 to 95% for specificity reported in the medical literature.
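A minimal sketch of steps (2) and (3) for a single pixel, assuming a log-normal bolus-kinetics model (a common choice for contrast-agent time curves, not necessarily the one used in the paper): fit the pixel's echo-power signal, subtract the adjacent-parenchyma kinetic, and label the pixel from the sign of the mean difference.

```python
import numpy as np
from scipy.optimize import curve_fit

def lognormal_bolus(t, A, mu, sigma):
    """Assumed bolus-kinetics model for contrast echo-power over time."""
    t = np.clip(t, 1e-6, None)
    return A / (t * sigma * np.sqrt(2 * np.pi)) * np.exp(
        -(np.log(t) - mu) ** 2 / (2 * sigma ** 2))

def classify_pixel(t, pixel_signal, parenchyma_signal):
    """Fit the pixel kinetic, subtract the reference parenchyma kinetic,
    and label the pixel hyper-/hypo-enhanced from the mean difference."""
    p, _ = curve_fit(lognormal_bolus, t, pixel_signal,
                     p0=[pixel_signal.max(), np.log(t.mean()), 0.5],
                     maxfev=5000)
    diff = lognormal_bolus(t, *p) - parenchyma_signal
    return "hyper" if diff.mean() > 0 else "hypo"
```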

Relevance:

90.00%

Publisher:

Abstract:

Background: Conventional magnetic resonance imaging (MRI) techniques are highly sensitive in detecting multiple sclerosis (MS) plaques, enabling a quantitative assessment of inflammatory activity and lesion load. In quantitative analyses of focal lesions, manual or semi-automated segmentations have been widely used to compute the total number of lesions and the total lesion volume. These techniques, however, are both challenging and time-consuming, and are also prone to intra-observer and inter-observer variability.

Aim: To develop an automated approach to segment brain tissues and MS lesions from brain MRI images. The goal is to reduce user interaction and to provide an objective tool that eliminates inter- and intra-observer variability.

Methods: Based on the recent methods developed by Souplet et al. and de Boer et al., we propose a novel pipeline which includes the following steps: bias correction, skull stripping, atlas registration, tissue classification, and lesion segmentation. After the initial pre-processing steps, an MRI scan is automatically segmented into 4 classes: white matter (WM), grey matter (GM), cerebrospinal fluid (CSF) and partial volume. An expectation-maximisation method which fits a multivariate Gaussian mixture model to the T1-w, T2-w and PD-w images is used for this purpose. Based on the obtained tissue masks, and using the estimated GM mean and variance, we apply an intensity threshold to the FLAIR image, which provides the lesion segmentation. With the aim of improving this initial result, spatial information coming from the neighbouring tissue labels is used to refine the final lesion segmentation.

Results: The experimental evaluation was performed using real 1.5 T data sets and the corresponding ground-truth annotations provided by expert radiologists. The following values were obtained: a 64% true positive (TP) fraction, an 80% false positive (FP) fraction, and an average surface distance of 7.89 mm. The results of our approach were quantitatively compared with our implementations of the works of Souplet et al. and de Boer et al., obtaining higher TP and lower FP values.

Conclusion: Promising MS lesion segmentation results have been obtained in terms of TP. However, the high number of FP, which is still a well-known problem of all automated MS lesion segmentation approaches, has to be reduced before such methods can be used in standard clinical practice. Our future work will focus on tackling this issue.
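A minimal sketch of the tissue-classification and FLAIR-thresholding steps described in Methods, with assumed inputs: a 4-component multivariate Gaussian mixture is fitted to the co-registered T1-w/T2-w/PD-w intensities, the GM component's FLAIR statistics are estimated, and candidate lesions are obtained by thresholding FLAIR above the GM mean. The GM-identification heuristic and the threshold multiplier are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_lesions(t1, t2, pd, flair, brain_mask, kappa=3.0):
    """kappa (the threshold multiplier) is an assumption, not the paper's value."""
    # Stack the co-registered modalities into one feature vector per voxel.
    feats = np.column_stack([m[brain_mask] for m in (t1, t2, pd)])

    # Fit a multivariate Gaussian mixture: WM, GM, CSF, partial volume.
    gmm = GaussianMixture(n_components=4, covariance_type="full",
                          random_state=0).fit(feats)
    labels = gmm.predict(feats)

    # Identify the GM component (here: the brightest mean on PD-w, a
    # heuristic assumption; atlas priors would normally decide this).
    gm = int(np.argmax(gmm.means_[:, 2]))

    # Threshold FLAIR at GM mean + kappa * GM std to seed lesion voxels.
    flair_gm = flair[brain_mask][labels == gm]
    thr = flair_gm.mean() + kappa * flair_gm.std()
    lesions = np.zeros_like(flair, dtype=bool)
    lesions[brain_mask] = flair[brain_mask] > thr
    return lesions
```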