971 resultados para Data filtering
Resumo:
Traditional Text-To-Speech (TTS) systems have been developed using especially-designed non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.
Resumo:
Esta Tesis presenta un nuevo método para filtrar errores en bases de datos multidimensionales. Este método no precisa ninguna información a priori sobre la naturaleza de los errores. En concreto, los errrores no deben ser necesariamente pequeños, ni de distribución aleatoria ni tener media cero. El único requerimiento es que no estén correlados con la información limpia propia de la base de datos. Este nuevo método se basa en una extensión mejorada del método básico de reconstrucción de huecos (capaz de reconstruir la información que falta de una base de datos multidimensional en posiciones conocidas) inventado por Everson y Sirovich (1995). El método de reconstrucción de huecos mejorado ha evolucionado como un método de filtrado de errores de dos pasos: en primer lugar, (a) identifica las posiciones en la base de datos afectadas por los errores y después, (b) reconstruye la información en dichas posiciones tratando la información de éstas como información desconocida. El método resultante filtra errores O(1) de forma eficiente, tanto si son errores aleatorios como sistemáticos e incluso si su distribución en la base de datos está concentrada o esparcida por ella. Primero, se ilustra el funcionamiento delmétodo con una base de datosmodelo bidimensional, que resulta de la dicretización de una función transcendental. Posteriormente, se presentan algunos casos prácticos de aplicación del método a dos bases de datos tridimensionales aerodinámicas que contienen la distribución de presiones sobre un ala a varios ángulos de ataque. Estas bases de datos resultan de modelos numéricos calculados en CFD. ABSTRACT A method is presented to filter errors out in multidimensional databases. The method does not require any a priori information about the nature the errors. In particular, the errors need not to be small, neither random, nor exhibit zero mean. Instead, they are only required to be relatively uncorrelated to the clean information contained in the database. The method is based on an improved extension of a seminal iterative gappy reconstruction method (able to reconstruct lost information at known positions in the database) due to Everson and Sirovich (1995). The improved gappy reconstruction method is evolved as an error filtering method in two steps, since it is adapted to first (a) identify the error locations in the database and then (b) reconstruct the information in these locations by treating the associated data as gappy data. The resultingmethod filters out O(1) errors in an efficient fashion, both when these are random and when they are systematic, and also both when they concentrated and when they are spread along the database. The performance of the method is first illustrated using a two-dimensional toymodel database resulting fromdiscretizing a transcendental function and then tested on two CFD-calculated, three-dimensional aerodynamic databases containing the pressure coefficient on the surface of a wing for varying values of the angle of attack. A more general performance analysis of the method is presented with the intention of quantifying the randomness factor the method admits maintaining a correct performance and secondly, quantifying the size of error the method can detect. Lastly, some improvements of the method are proposed with their respective verification.
Resumo:
We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized “eigengenes” × “eigenarrays” space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
Resumo:
3D sensors provides valuable information for mobile robotic tasks like scene classification or object recognition, but these sensors often produce noisy data that makes impossible applying classical keypoint detection and feature extraction techniques. Therefore, noise removal and downsampling have become essential steps in 3D data processing. In this work, we propose the use of a 3D filtering and down-sampling technique based on a Growing Neural Gas (GNG) network. GNG method is able to deal with outliers presents in the input data. These features allows to represent 3D spaces, obtaining an induced Delaunay Triangulation of the input space. Experiments show how the state-of-the-art keypoint detectors improve their performance using GNG output representation as input data. Descriptors extracted on improved keypoints perform better matching in robotics applications as 3D scene registration.
Resumo:
Power line interference is one of the main problems in surface electromyogram signals (EMG) analysis. In this work, a new method based on the stationary wavelet packet transform is proposed to estimate and remove this kind of noise from EMG data records. The performance has been quantitatively evaluated with synthetic noisy signals, obtaining good results independently from the signal to noise ratio (SNR). For the analyzed cases, the obtained results show that the correlation coefficient is around 0.99, the energy respecting to the pure EMG signal is 98–104%, the SNR is between 16.64 and 20.40 dB and the mean absolute error (MAE) is in the range of −69.02 and −65.31 dB. It has been also applied on 18 real EMG signals, evaluating the percentage of energy respecting to the noisy signals. The proposed method adjusts the reduction level to the amplitude of each harmonic present in the analyzed noisy signals (synthetic and real), reducing the harmonics with no alteration of the desired signal.
Resumo:
A MATLAB-based computer code has been developed for the simultaneous wavelet analysis and filtering of several environmental time series, particularly focused on the analyses of cave monitoring data. The continuous wavelet transform, the discrete wavelet transform and the discrete wavelet packet transform have been implemented to provide a fast and precise time–period examination of the time series at different period bands. Moreover, statistic methods to examine the relation between two signals have been included. Finally, the entropy of curves and splines based methods have also been developed for segmenting and modeling the analyzed time series. All these methods together provide a user-friendly and fast program for the environmental signal analysis, with useful, practical and understandable results.
Resumo:
Information of crop phenology is essential for evaluating crop productivity. In a previous work, we determined phenological stages with remote sensing data using a dynamic system framework and an extended Kalman filter (EKF) approach. In this paper, we demonstrate that the particle filter is a more reliable method to infer any phenological stage compared to the EKF. The improvements achieved with this approach are discussed. In addition, this methodology enables the estimation of key cultivation dates, thus providing a practical product for many applications. The dates of some important stages, as the sowing date and the day when the crop reaches the panicle initiation stage, have been chosen to show the potential of this technique.
Resumo:
The continuous plankton recorder (CPR) survey is an upper layer plankton monitoring program that has regularly collected samples, at monthly intervals, in the North Atlantic and adjacent seas since 1946. Water from approximately 6 m depth enters the CPR through a small aperture at the front of the sampler and travels down a tunnel where it passes through a silk filtering mesh of 270 µm before exiting at the back of the CPR. The plankton filtered on the silk is analyzed in sections corresponding to 10 nautical miles (approx. 3 m**3 of seawater filtered) and the plankton microscopically identified (Richardson et al., 2006 and reference therein). In the present study we used the CPR data to investigate the current basin scale distribution of C. finmarchicus (C5-C6), C. helgolandicus (C5-C6), C. hyperboreus (C5-C6), Pseudocalanus spp. (C6), Oithona spp. (C1-C6), total Euphausiida, total Thecosomata and the presence/absence of Cnidaria and the Phytoplankton Colour Index (PCI). The PCI, which is a visual assessment of the greenness of the silk, is used as an indicator of the distribution of total phytoplankton biomass across the Atlantic basin (Batten et al., 2003). Monthly data collected between 2000 and 2009 were gridded using the inverse-distance interpolation method, in which the interpolated values were the nodes of a 2 degree by 2 degree grid. The resulting twelve monthly matrices were then averaged within the year and in the case of the zooplankton the data were log-transformed (i.e. log10 (x+1).
Resumo:
Collaborative filtering is regarded as one of the most promising recommendation algorithms. The item-based approaches for collaborative filtering identify the similarity between two items by comparing users' ratings on them. In these approaches, ratings produced at different times are weighted equally. That is to say, changes in user purchase interest are not taken into consideration. For example, an item that was rated recently by a user should have a bigger impact on the prediction of future user behaviour than an item that was rated a long time ago. In this paper, we present a novel algorithm to compute the time weights for different items in a manner that will assign a decreasing weight to old data. More specifically, the users' purchase habits vary. Even the same user has quite different attitudes towards different items. Our proposed algorithm uses clustering to discriminate between different kinds of items. To each item cluster, we trace each user's purchase interest change and introduce a personalized decay factor according to the user own purchase behaviour. Empirical studies have shown that our new algorithm substantially improves the precision of item-based collaborative filtering without introducing higher order computational complexity.
Resumo:
The paper describes two new transport layer (TCP) options and an expanded transport layer queuing strategy that facilitate three functions that are fundamental to the dispatching-based clustered service. A transport layer option has been developed to facilitate. the use of client wait time data within the service request processing of the cluster. A second transport layer option has been developed to facilitate the redirection of service requests by the cluster dispatcher to the cluster processing member. An expanded transport layer service request queuing strategy facilitates the trust based filtering of incoming service requests so that a graceful degradation of service delivery may be achieved during periods of overload - most dramatically evidenced by distributed denial of service attacks against the clustered service. We describe how these new options and queues have been implemented and successfully tested within the transport layer of the Linux kernel.
Resumo:
O trabalho desenvolvido analisa a Comunicação Social no contexto da internet e delineia novas metodologias de estudo para a área na filtragem de significados no âmbito científico dos fluxos de informação das redes sociais, mídias de notícias ou qualquer outro dispositivo que permita armazenamento e acesso a informação estruturada e não estruturada. No intento de uma reflexão sobre os caminhos, que estes fluxos de informação se desenvolvem e principalmente no volume produzido, o projeto dimensiona os campos de significados que tal relação se configura nas teorias e práticas de pesquisa. O objetivo geral deste trabalho é contextualizar a área da Comunicação Social dentro de uma realidade mutável e dinâmica que é o ambiente da internet e fazer paralelos perante as aplicações já sucedidas por outras áreas. Com o método de estudo de caso foram analisados três casos sob duas chaves conceituais a Web Sphere Analysis e a Web Science refletindo os sistemas de informação contrapostos no quesito discursivo e estrutural. Assim se busca observar qual ganho a Comunicação Social tem no modo de visualizar seus objetos de estudo no ambiente das internet por essas perspectivas. O resultado da pesquisa mostra que é um desafio para o pesquisador da Comunicação Social buscar novas aprendizagens, mas a retroalimentação de informação no ambiente colaborativo que a internet apresenta é um caminho fértil para pesquisa, pois a modelagem de dados ganha corpus analítico quando o conjunto de ferramentas promovido e impulsionado pela tecnologia permite isolar conteúdos e possibilita aprofundamento dos significados e suas relações.
Resumo:
Marr's work offered guidelines on how to investigate vision (the theory - algorithm - implementation distinction), as well as specific proposals on how vision is done. Many of the latter have inevitably been superseded, but the approach was inspirational and remains so. Marr saw the computational study of vision as tightly linked to psychophysics and neurophysiology, but the last twenty years have seen some weakening of that integration. Because feature detection is a key stage in early human vision, we have returned to basic questions about representation of edges at coarse and fine scales. We describe an explicit model in the spirit of the primal sketch, but tightly constrained by psychophysical data. Results from two tasks (location-marking and blur-matching) point strongly to the central role played by second-derivative operators, as proposed by Marr and Hildreth. Edge location and blur are evaluated by finding the location and scale of the Gaussian-derivative `template' that best matches the second-derivative profile (`signature') of the edge. The system is scale-invariant, and accurately predicts blur-matching data for a wide variety of 1-D and 2-D images. By finding the best-fitting scale, it implements a form of local scale selection and circumvents the knotty problem of integrating filter outputs across scales. [Supported by BBSRC and the Wellcome Trust]
Resumo:
Perception of Mach bands may be explained by spatial filtering ('lateral inhibition') that can be approximated by 2nd derivative computation, and several alternative models have been proposed. To distinguish between them, we used a novel set of ‘generalised Gaussian’ images, in which the sharp ramp-plateau junction of the Mach ramp was replaced by smoother transitions. The images ranged from a slightly blurred Mach ramp to a Gaussian edge and beyond, and also included a sine-wave edge. The probability of seeing Mach Bands increased with the (relative) sharpness of the junction, but was largely independent of absolute spatial scale. These data did not fit the predictions of MIRAGE, nor 2nd derivative computation at a single fine scale. In experiment 2, observers used a cursor to mark features on the same set of images. Data on perceived position of Mach bands did not support the local energy model. Perceived width of Mach bands was poorly explained by a single-scale edge detection model, despite its previous success with Mach edges (Wallis & Georgeson, 2009, Vision Research, 49, 1886-1893). A more successful model used separate (odd and even) scale-space filtering for edges and bars, local peak detection to find candidate features, and the MAX operator to compare odd- and even-filter response maps (Georgeson, VSS 2006, Journal of Vision 6(6), 191a). Mach bands are seen when there is a local peak in the even-filter (bar) response map, AND that peak value exceeds corresponding responses in the odd-filter (edge) maps.
Resumo:
This thesis first considers the calibration and signal processing requirements of a neuromagnetometer for the measurement of human visual function. Gradiometer calibration using straight wire grids is examined and optimal grid configurations determined, given realistic constructional tolerances. Simulations show that for gradiometer balance of 1:104 and wire spacing error of 0.25mm the achievable calibration accuracy of gain is 0.3%, of position is 0.3mm and of orientation is 0.6°. Practical results with a 19-channel 2nd-order gradiometer based system exceed this performance. The real-time application of adaptive reference noise cancellation filtering to running-average evoked response data is examined. In the steady state, the filter can be assumed to be driven by a non-stationary step input arising at epoch boundaries. Based on empirical measures of this driving step an optimal progression for the filter time constant is proposed which improves upon fixed time constant filter performance. The incorporation of the time-derivatives of the reference channels was found to improve the performance of the adaptive filtering algorithm by 15-20% for unaveraged data, falling to 5% with averaging. The thesis concludes with a neuromagnetic investigation of evoked cortical responses to chromatic and luminance grating stimuli. The global magnetic field power of evoked responses to the onset of sinusoidal gratings was shown to have distinct chromatic and luminance sensitive components. Analysis of the results, using a single equivalent current dipole model, shows that these components arise from activity within two distinct cortical locations. Co-registration of the resulting current source localisations with MRI shows a chromatically responsive area lying along the midline within the calcarine fissure, possibly extending onto the lingual and cuneal gyri. It is postulated that this area is the human homologue of the primate cortical area V4.