971 resultados para Data filtering
Resumo:
In this chapter we present the relevant mathematical background to address two well defined signal and image processing problems. Namely, the problem of structured noise filtering and the problem of interpolation of missing data. The former is addressed by recourse to oblique projection based techniques whilst the latter, which can be considered equivalent to impulsive noise filtering, is tackled by appropriate interpolation methods.
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as “histogram binning” inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. ^ Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. ^ The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. ^ These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. ^ In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation. ^
Resumo:
We develop a new autoregressive conditional process to capture both the changes and the persistency of the intraday seasonal (U-shape) pattern of volatility in essay 1. Unlike other procedures, this approach allows for the intraday volatility pattern to change over time without the filtering process injecting a spurious pattern of noise into the filtered series. We show that prior deterministic filtering procedures are special cases of the autoregressive conditional filtering process presented here. Lagrange multiplier tests prove that the stochastic seasonal variance component is statistically significant. Specification tests using the correlogram and cross-spectral analyses prove the reliability of the autoregressive conditional filtering process. In essay 2 we develop a new methodology to decompose return variance in order to examine the informativeness embedded in the return series. The variance is decomposed into the information arrival component and the noise factor component. This decomposition methodology differs from previous studies in that both the informational variance and the noise variance are time-varying. Furthermore, the covariance of the informational component and the noisy component is no longer restricted to be zero. The resultant measure of price informativeness is defined as the informational variance divided by the total variance of the returns. The noisy rational expectations model predicts that uninformed traders react to price changes more than informed traders, since uninformed traders cannot distinguish between price changes caused by information arrivals and price changes caused by noise. This hypothesis is tested in essay 3 using intraday data with the intraday seasonal volatility component removed, as based on the procedure in the first essay. The resultant seasonally adjusted variance series is decomposed into components caused by unexpected information arrivals and by noise in order to examine informativeness.
Resumo:
Airborne Light Detection and Ranging (LIDAR) technology has become the primary method to derive high-resolution Digital Terrain Models (DTMs), which are essential for studying Earth's surface processes, such as flooding and landslides. The critical step in generating a DTM is to separate ground and non-ground measurements in a voluminous point LIDAR dataset, using a filter, because the DTM is created by interpolating ground points. As one of widely used filtering methods, the progressive morphological (PM) filter has the advantages of classifying the LIDAR data at the point level, a linear computational complexity, and preserving the geometric shapes of terrain features. The filter works well in an urban setting with a gentle slope and a mixture of vegetation and buildings. However, the PM filter often removes ground measurements incorrectly at the topographic high area, along with large sizes of non-ground objects, because it uses a constant threshold slope, resulting in "cut-off" errors. A novel cluster analysis method was developed in this study and incorporated into the PM filter to prevent the removal of the ground measurements at topographic highs. Furthermore, to obtain the optimal filtering results for an area with undulating terrain, a trend analysis method was developed to adaptively estimate the slope-related thresholds of the PM filter based on changes of topographic slopes and the characteristics of non-terrain objects. The comparison of the PM and generalized adaptive PM (GAPM) filters for selected study areas indicates that the GAPM filter preserves the most "cut-off" points removed incorrectly by the PM filter. The application of the GAPM filter to seven ISPRS benchmark datasets shows that the GAPM filter reduces the filtering error by 20% on average, compared with the method used by the popular commercial software TerraScan. The combination of the cluster method, adaptive trend analysis, and the PM filter allows users without much experience in processing LIDAR data to effectively and efficiently identify ground measurements for the complex terrains in a large LIDAR data set. The GAPM filter is highly automatic and requires little human input. Therefore, it can significantly reduce the effort of manually processing voluminous LIDAR measurements.
Resumo:
This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as "histogram binning" inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation.
Resumo:
We develop a new autoregressive conditional process to capture both the changes and the persistency of the intraday seasonal (U-shape) pattern of volatility in essay 1. Unlike other procedures, this approach allows for the intraday volatility pattern to change over time without the filtering process injecting a spurious pattern of noise into the filtered series. We show that prior deterministic filtering procedures are special cases of the autoregressive conditional filtering process presented here. Lagrange multiplier tests prove that the stochastic seasonal variance component is statistically significant. Specification tests using the correlogram and cross-spectral analyses prove the reliability of the autoregressive conditional filtering process. In essay 2 we develop a new methodology to decompose return variance in order to examine the informativeness embedded in the return series. The variance is decomposed into the information arrival component and the noise factor component. This decomposition methodology differs from previous studies in that both the informational variance and the noise variance are time-varying. Furthermore, the covariance of the informational component and the noisy component is no longer restricted to be zero. The resultant measure of price informativeness is defined as the informational variance divided by the total variance of the returns. The noisy rational expectations model predicts that uninformed traders react to price changes more than informed traders, since uninformed traders cannot distinguish between price changes caused by information arrivals and price changes caused by noise. This hypothesis is tested in essay 3 using intraday data with the intraday seasonal volatility component removed, as based on the procedure in the first essay. The resultant seasonally adjusted variance series is decomposed into components caused by unexpected information arrivals and by noise in order to examine informativeness.
Resumo:
The amount and quality of available biomass is a key factor for the sustainable livestock industry and agricultural management related decision making. Globally 31.5% of land cover is grassland while 80% of Ireland’s agricultural land is grassland. In Ireland, grasslands are intensively managed and provide the cheapest feed source for animals. This dissertation presents a detailed state of the art review of satellite remote sensing of grasslands, and the potential application of optical (Moderate–resolution Imaging Spectroradiometer (MODIS)) and radar (TerraSAR-X) time series imagery to estimate the grassland biomass at two study sites (Moorepark and Grange) in the Republic of Ireland using both statistical and state of the art machine learning algorithms. High quality weather data available from the on-site weather station was also used to calculate the Growing Degree Days (GDD) for Grange to determine the impact of ancillary data on biomass estimation. In situ and satellite data covering 12 years for the Moorepark and 6 years for the Grange study sites were used to predict grassland biomass using multiple linear regression, Neuro Fuzzy Inference Systems (ANFIS) models. The results demonstrate that a dense (8-day composite) MODIS image time series, along with high quality in situ data, can be used to retrieve grassland biomass with high performance (R2 = 0:86; p < 0:05, RMSE = 11.07 for Moorepark). The model for Grange was modified to evaluate the synergistic use of vegetation indices derived from remote sensing time series and accumulated GDD information. As GDD is strongly linked to the plant development, or phonological stage, an improvement in biomass estimation would be expected. It was observed that using the ANFIS model the biomass estimation accuracy increased from R2 = 0:76 (p < 0:05) to R2 = 0:81 (p < 0:05) and the root mean square error was reduced by 2.72%. The work on the application of optical remote sensing was further developed using a TerraSAR-X Staring Spotlight mode time series over the Moorepark study site to explore the extent to which very high resolution Synthetic Aperture Radar (SAR) data of interferometrically coherent paddocks can be exploited to retrieve grassland biophysical parameters. After filtering out the non-coherent plots it is demonstrated that interferometric coherence can be used to retrieve grassland biophysical parameters (i. e., height, biomass), and that it is possible to detect changes due to the grass growth, and grazing and mowing events, when the temporal baseline is short (11 days). However, it not possible to automatically uniquely identify the cause of these changes based only on the SAR backscatter and coherence, due to the ambiguity caused by tall grass laid down due to the wind. Overall, the work presented in this dissertation has demonstrated the potential of dense remote sensing and weather data time series to predict grassland biomass using machine-learning algorithms, where high quality ground data were used for training. At present a major limitation for national scale biomass retrieval is the lack of spatial and temporal ground samples, which can be partially resolved by minor modifications in the existing PastureBaseIreland database by adding the location and extent ofeach grassland paddock in the database. As far as remote sensing data requirements are concerned, MODIS is useful for large scale evaluation but due to its coarse resolution it is not possible to detect the variations within the fields and between the fields at the farm scale. However, this issue will be resolved in terms of spatial resolution by the Sentinel-2 mission, and when both satellites (Sentinel-2A and Sentinel-2B) are operational the revisit time will reduce to 5 days, which together with Landsat-8, should enable sufficient cloud-free data for operational biomass estimation at a national scale. The Synthetic Aperture Radar Interferometry (InSAR) approach is feasible if there are enough coherent interferometric pairs available, however this is difficult to achieve due to the temporal decorrelation of the signal. For repeat-pass InSAR over a vegetated area even an 11 days temporal baseline is too large. In order to achieve better coherence a very high resolution is required at the cost of spatial coverage, which limits its scope for use in an operational context at a national scale. Future InSAR missions with pair acquisition in Tandem mode will minimize the temporal decorrelation over vegetation areas for more focused studies. The proposed approach complements the current paradigm of Big Data in Earth Observation, and illustrates the feasibility of integrating data from multiple sources. In future, this framework can be used to build an operational decision support system for retrieval of grassland biophysical parameters based on data from long term planned optical missions (e. g., Landsat, Sentinel) that will ensure the continuity of data acquisition. Similarly, Spanish X-band PAZ and TerraSAR-X2 missions will ensure the continuity of TerraSAR-X and COSMO-SkyMed.
Resumo:
Site 1103 was one of a transect of three sites drilled across the Antarctic Peninsula continental shelf during Leg 178. The aim of drilling on the shelf was to determine the age of the sedimentary sequences and to ground truth previous interpretations of the depositional environment (i.e., topsets and foresets) of progradational seismostratigraphic sequences S1, S2, S3, and S4. The ultimate objective was to obtain a better understanding of the history of glacial advances and retreats in this west Antarctic margin. Drilling the topsets of the progradational wedge (0-247 m below seafloor [mbsf]), which consist of unsorted and unconsolidated materials of seismic Unit S1, was very unfavorable, resulting in very low (2.3%) core recovery. Recovery improved (34%) below 247 mbsf, corresponding to sediments of seismic Unit S3, which have a consolidated matrix. Logs were only obtained from the interval between 75 and 244 mbsf, and inconsistencies on the automatic analog picking of the signals received from the sonic log at the array and at the two other receivers prevented accurate shipboard time-depth conversions. This, in turn, limited the capacity for making seismic stratigraphic interpretations at this site and regionally. This study is an attempt to compile all available data sources, perform quality checks, and introduce nonstandard processing techniques for the logging data obtained to arrive at a reliable and continuous depth vs. velocity profile. We defined 13 data categories using differential traveltime information. Polynomial exclusion techniques with various orders and low-pass filtering reduced the noise of the initial data pool and produced a definite velocity depth profile that is synchronous with the resistivity logging data. A comparison of the velocity profile produced with various other logs of Site 1103 further validates the presented data. All major logging units are expressed within the new velocity data. A depth-migrated section with the new velocity data is presented together with the original time section and initial depth estimates published within the Leg 178 Initial Reports volume. The presented data confirms the location of the shelf unconformity at 222 ms two-way traveltime (TWT), or 243 mbsf, and allows its seismic identification as a strong negative and subsequent positive reflection.
Resumo:
Power system engineers face a double challenge: to operate electric power systems within narrow stability and security margins, and to maintain high reliability. There is an acute need to better understand the dynamic nature of power systems in order to be prepared for critical situations as they arise. Innovative measurement tools, such as phasor measurement units, can capture not only the slow variation of the voltages and currents but also the underlying oscillations in a power system. Such dynamic data accessibility provides us a strong motivation and a useful tool to explore dynamic-data driven applications in power systems. To fulfill this goal, this dissertation focuses on the following three areas: Developing accurate dynamic load models and updating variable parameters based on the measurement data, applying advanced nonlinear filtering concepts and technologies to real-time identification of power system models, and addressing computational issues by implementing the balanced truncation method. By obtaining more realistic system models, together with timely updated parameters and stochastic influence consideration, we can have an accurate portrait of the ongoing phenomena in an electrical power system. Hence we can further improve state estimation, stability analysis and real-time operation.
Resumo:
Nearest neighbour collaborative filtering (NNCF) algorithms are commonly used in multimedia recommender systems to suggest media items based on the ratings of users with similar preferences. However, the prediction accuracy of NNCF algorithms is affected by the reduced number of items – the subset of items co-rated by both users – typically used to determine the similarity between pairs of users. In this paper, we propose a different approach, which substantially enhances the accuracy of the neighbour selection process – a user-based CF (UbCF) with semantic neighbour discovery (SND). Our neighbour discovery methodology, which assesses pairs of users by taking into account all the items rated at least by one of the users instead of just the set of co-rated items, semantically enriches this enlarged set of items using linked data and, finally, applies the Collinearity and Proximity Similarity metric (CPS), which combines the cosine similarity with Chebyschev distance dissimilarity metric. We tested the proposed SND against the Pearson Correlation neighbour discovery algorithm off-line, using the HetRec data set, and the results show a clear improvement in terms of accuracy and execution time for the predicted recommendations.
Resumo:
The conjugate gradient is the most popular optimization method for solving large systems of linear equations. In a system identification problem, for example, where very large impulse response is involved, it is necessary to apply a particular strategy which diminishes the delay, while improving the convergence time. In this paper we propose a new scheme which combines frequency-domain adaptive filtering with a conjugate gradient technique in order to solve a high order multichannel adaptive filter, while being delayless and guaranteeing a very short convergence time.
Resumo:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Resumo:
The article seeks to investigate patterns of performance and relationships between grip strength, gait speed and self-rated health, and investigate the relationships between them, considering the variables of gender, age and family income. This was conducted in a probabilistic sample of community-dwelling elderly aged 65 and over, members of a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-related health was assessed using a 5-point scale. The males and the younger elderly individuals scored significantly higher on grip strength and gait speed than the female and oldest did; the richest scored higher than the poorest on grip strength and gait speed; females and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income arose as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.
Resumo:
Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric variables can be a valuable method for evaluating patients with this syndrome. To correlate cephalometric data with the apnea-hypopnea sleep index. We performed a retrospective and cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men, and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for the understanding of obstructive sleep apnea syndrome. The distance from the hyoid bone to the mandibular plane showed a statistically significant correlation with the apnea-hypopnea index.