980 results for Data errors
Abstract:
Multibeam data were measured during R/V Polarstern cruise ANT-XIX/1 on track lines of about 5,200 NM total length in the Atlantic Ocean during the transit from Bremerhaven to Cape Town. The multibeam sonar system Hydrosweep DS-2 was operated using 59 beams and 90° aperture angle. The refraction correction was achieved utilizing the system's own cross fan calibration. The quality of data might be reduced during bad weather periods. The dataset contains raw data that are not processed and thus may contain errors and blunders in depth and position.
Abstract:
The decomposition technique introduced by Blinder (1973) and Oaxaca (1973) is widely used to study outcome differences between groups. For example, the technique is commonly applied to the analysis of the gender wage gap. However, despite the procedure's frequent use, very little attention has been paid to the issue of estimating the sampling variances of the decomposition components. We therefore suggest an approach that introduces consistent variance estimators for several variants of the decomposition. The accuracy of the new estimators under ideal conditions is illustrated with the results of a Monte Carlo simulation. As a second check, the estimators are compared to bootstrap results obtained using real data. In contrast to previously proposed statistics, the new method takes into account the extra variation imposed by stochastic regressors.
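As a rough illustration of the quantity whose sampling variance is at issue, the following minimal sketch computes a twofold Blinder-Oaxaca decomposition for a single regressor. It assumes one common variant that uses group B coefficients as the reference. The function names and the toy numbers are hypothetical, and the variance estimators proposed in the abstract are not implemented here.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def twofold_decomposition(x_a, y_a, x_b, y_b):
    """Split mean(y_a) - mean(y_b) into explained and unexplained parts.

    Assumption: group B coefficients serve as the reference structure.
    """
    a_a, b_a = ols_simple(x_a, y_a)
    a_b, b_b = ols_simple(x_b, y_b)
    # Part of the gap due to different average characteristics.
    explained = (mean(x_a) - mean(x_b)) * b_b
    # Part of the gap due to different coefficients (including intercepts).
    unexplained = (a_a - a_b) + mean(x_a) * (b_a - b_b)
    return explained, unexplained


# Toy data (made up for illustration): y = 1 + 2x in group A, y = 1 + x in group B.
x_a, y_a = [1, 2, 3, 4], [3, 5, 7, 9]
x_b, y_b = [1, 2, 3], [2, 3, 4]
explained, unexplained = twofold_decomposition(x_a, y_a, x_b, y_b)
gap = mean(y_a) - mean(y_b)
```

By construction the two components sum to the raw mean gap, which is a convenient sanity check before attempting any variance estimation.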
Abstract:
Researchers in ecology commonly use multivariate analyses (e.g. redundancy analysis, canonical correspondence analysis, Mantel correlation, multivariate analysis of variance) to interpret patterns in biological data and relate these patterns to environmental predictors. There has been, however, little recognition of the errors associated with biological data and the influence that these may have on predictions derived from ecological hypotheses. We present a permutational method that assesses the effects of taxonomic uncertainty on the multivariate analyses typically used in the analysis of ecological data. The procedure is based on iterative randomizations that randomly re-assign unidentified species in each site to any of the other species found in the remaining sites. After each re-assignment of species identities, the multivariate method in question is run and a parameter of interest is calculated. Consequently, one can estimate a range of plausible values for the parameter of interest under different scenarios of re-assigned species identities. We demonstrate the use of our approach in the calculation of two parameters with an example involving tropical tree species from western Amazonia: 1) the Mantel correlation between compositional similarity and environmental distances between pairs of sites, and 2) the variance explained by environmental predictors in redundancy analysis (RDA). We also investigated the effects of increasing taxonomic uncertainty (i.e. number of unidentified species), and the taxonomic resolution at which morphospecies are determined (genus-resolution, family-resolution, or fully undetermined species) on the uncertainty range of these parameters. To achieve this, we performed simulations on a tree dataset from southern Mexico by randomly selecting a portion of the species contained in the dataset and classifying them as unidentified at each level of decreasing taxonomic resolution.
An analysis of covariance showed that both taxonomic uncertainty and resolution significantly influence the uncertainty range of the resulting parameters. Increasing taxonomic uncertainty widens the uncertainty range of the parameters estimated in both the Mantel test and RDA. The effects of increasing taxonomic resolution, however, are not as evident. The method presented in this study improves on traditional approaches to studying compositional change in ecological communities by accounting for some of the uncertainty inherent to biological data. We hope that this approach can be routinely used to estimate any parameter of interest obtained from compositional data tables when faced with taxonomic uncertainty.
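The randomization loop described in this abstract might be sketched as follows. This is a simplified illustration under stated assumptions: the statistic is a mean pairwise Jaccard similarity standing in for the Mantel correlation or RDA variance used by the authors, and the site data, the `unidentified` labels and all function names are hypothetical.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two species lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def mean_similarity(sites):
    """Stand-in parameter of interest: mean pairwise Jaccard similarity."""
    pairs = list(combinations(sites.values(), 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def reassign(sites, unidentified, rng):
    """Replace each unidentified species with a random identified species
    drawn from the other sites."""
    new_sites = {}
    for name, species in sites.items():
        pool = sorted({s for other, sp in sites.items() if other != name
                       for s in sp if s not in unidentified})
        new_sites[name] = [rng.choice(pool) if s in unidentified and pool else s
                           for s in species]
    return new_sites


def uncertainty_range(sites, unidentified, n_iter=200, seed=0):
    """Range of the parameter over many random re-assignments."""
    rng = random.Random(seed)
    values = [mean_similarity(reassign(sites, unidentified, rng))
              for _ in range(n_iter)]
    return min(values), max(values)


# Hypothetical example: "m1" and "m2" are unidentified morphospecies.
sites = {"s1": ["A", "B", "m1"], "s2": ["A", "C"], "s3": ["B", "C", "m2"]}
low, high = uncertainty_range(sites, unidentified={"m1", "m2"})
```

The width of the interval `(low, high)` plays the role of the uncertainty range discussed in the abstract. Any other statistic can be substituted for `mean_similarity`.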
Abstract:
A new and effective method for reduction of truncation errors in partial spherical near-field (SNF) measurements is proposed. The method is useful when measuring electrically large antennas, where the measurement time with the classical SNF technique is prohibitively long and an acquisition over the whole spherical surface is not practical. Therefore, to reduce the data acquisition time, a partial sphere measurement is usually made, taking samples over a portion of the spherical surface in the direction of the main beam. In this case, however, the radiation pattern is not known outside the measured angular sector, and a truncation error is present in the calculated far-field pattern within this sector. The method is based on the Gerchberg-Papoulis algorithm used to extrapolate functions and it is able to extend the valid region of the calculated far-field pattern up to the whole forward hemisphere. To verify the effectiveness of the method, several examples are presented using both simulated and measured truncated near-field data.
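To give a feel for the Gerchberg-Papoulis iteration itself, here is a one-dimensional toy sketch that extrapolates a band-limited signal from a truncated set of samples. It is not the spherical near-field formulation of the paper. The signal, the bandwidth, the observation window and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def band_limit(x, bandwidth):
    """Zero every DFT bin whose frequency index exceeds `bandwidth`."""
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[f] if min(f, n - f) <= bandwidth else 0 for f in range(n)]
    return [v.real for v in idft(kept)]


def gerchberg_papoulis(samples, known, bandwidth, n_iter=50):
    """Alternate between the band-limit constraint and the measured samples."""
    estimate = [s if k else 0.0 for s, k in zip(samples, known)]
    for _ in range(n_iter):
        estimate = band_limit(estimate, bandwidth)
        # Re-impose the measured values inside the observed window.
        estimate = [s if k else e for s, k, e in zip(samples, known, estimate)]
    return estimate


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy band-limited signal, observed only on the central half of the samples.
n = 32
true_signal = [math.cos(2 * math.pi * k / n) + 0.5 * math.sin(4 * math.pi * k / n)
               for k in range(n)]
known = [8 <= k < 24 for k in range(n)]
samples = [v if k else 0.0 for v, k in zip(true_signal, known)]

initial_error = rms_error(samples, true_signal)
final_error = rms_error(gerchberg_papoulis(samples, known, bandwidth=2), true_signal)
```

Each pass projects the estimate onto the set of band-limited signals and then onto the set of signals that agree with the measurements, so the error with respect to the true signal cannot grow from one iteration to the next.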
Abstract:
This paper describes two methods to cancel the effect of two kinds of leakage signals which may be present when an antenna is measured in a planar near-field range. The first method reduces leakage bias errors from the receiver's quadrature detector and is based on estimating the bias constant added to every near-field data sample. That constant is then subtracted from the data, removing its undesired effect on the far-field pattern. The estimation is performed by back-propagating the field from the scan plane to the antenna under test (AUT) plane and averaging all the data located outside the AUT aperture. The second method is able to cancel the effect of the leakage from faulty transmission lines, connectors or rotary joints. The basis of this method is also a reconstruction process to determine the field distribution on the AUT plane. Once this distribution is known, a spatial filtering is applied to cancel the contribution due to those faulty elements. After that, a near-field-to-far-field transformation is applied, obtaining a new radiation pattern where the leakage effects have disappeared. To verify the effectiveness of both methods, several examples are presented.
Abstract:
We present a methodology for reducing a straight line fitting regression problem to a Least Squares minimization one. This is accomplished through the definition of a measure on the data space that takes into account directional dependences of errors, and the use of polar descriptors for straight lines. This strategy improves the robustness by avoiding singularities and non-describable lines. The methodology is powerful enough to deal with non-normal bivariate heteroscedastic data error models, but can also supersede classical regression methods by making some particular assumptions. An implementation of the methodology for the normal bivariate case is developed and evaluated.
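The benefit of polar descriptors can be seen in a small sketch. It fits a line written as x·cos(θ) + y·sin(θ) = ρ by minimizing squared perpendicular distances, which corresponds only to the simplest special case (equal, isotropic errors in both coordinates) and not to the general heteroscedastic error model of the paper. The function name and sample points are hypothetical.

```python
import math


def fit_line_polar(points):
    """Fit x*cos(theta) + y*sin(theta) = rho by orthogonal least squares.

    Assumes equal, isotropic errors in x and y (a simplifying assumption).
    Returns (rho, theta) with rho >= 0.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Direction of the line normal that minimizes the perpendicular scatter.
    theta = 0.5 * math.atan2(-2.0 * sxy, syy - sxx)
    # The best-fit line passes through the centroid.
    rho = mx * math.cos(theta) + my * math.sin(theta)
    if rho < 0:  # normalize so that rho is non-negative
        rho, theta = -rho, theta + math.pi
    return rho, theta


# A vertical line x = 3 has no finite slope, yet the polar form handles it.
rho_v, theta_v = fit_line_polar([(3.0, 0.0), (3.0, 1.0), (3.0, 2.0)])
# The line y = x + 1.
rho_d, theta_d = fit_line_polar([(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)])
```

A slope-intercept parametrization cannot describe the vertical line at all, whereas the polar form returns ρ = 3 and θ = 0 with no special handling.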
Abstract:
The creation of language resources is a time-consuming process requiring the efforts of many people. The use of resources collaboratively created by non-linguists can potentially ameliorate this situation. However, such resources often contain more errors than resources created by experts. For the particular case of lexica, we analyse the case of Wiktionary, a resource created along wiki principles, and argue that through the use of a principled lexicon model, namely lemon, the resulting data could be made more understandable to machines. We then present a platform called lemon source that supports the creation of linked lexical data along the lemon model. This tool builds on the concept of a semantic wiki to enable collaborative editing of the resources by many users concurrently. In this paper, we describe the model and the tool, and present an evaluation of its usability based on a small group of users.
Abstract:
Most forestry applications of airborne laser scanning (ALS) require the integration and simultaneous use of various data sources, pursuing a variety of objectives. Projects based on remotely sensed data generally consist of successive data fusion stages that progressively upscale the study: from the most detailed information obtained for a limited area (the field plot) to a more uncertain forest response sensed remotely over a much larger extent (the airborne or satellite swath). All data sources ultimately rely on global navigation satellite systems (GNSS), which are especially error-prone when operating under forest canopies. Additional processing stages, such as orthorectification, may also be affected by vegetation, deteriorating the accuracy of the reference coordinates of optical imagery. These errors introduce noise into the models, as predictors are displaced from the true position of their corresponding response. The degree to which forest estimations are affected depends on the spatial dispersion of the variables involved and on the scale used. This thesis reviews the sources of positioning error that may affect the different inputs involved in an ALS-assisted forest inventory project, and how the properties of the forest canopy itself affect their magnitude, advising on methods for reducing them. It also discusses the most appropriate ways to measure accuracy and precision in each case, and how positioning errors actually affect the quality of forest estimations, with a view to cost-efficient planning of data acquisition. The final optimization of the GNSS positioning and of the optical sensor radiometry made it possible to detect the importance of the latter in predicting the relative density of a monospecific Pinus sylvestris L. forest.
Abstract:
Traffic flow time series data are usually high dimensional and very complex. They are also sometimes imprecise and distorted owing to malfunctioning data collection sensors. Additionally, events like congestion caused by traffic accidents add more uncertainty to real-time traffic conditions, making traffic flow forecasting a complicated task. This article presents a new data preprocessing method targeting multidimensional time series with a very high number of dimensions and shows its application to real traffic flow time series from the California Department of Transportation (PEMS web site). The proposed method consists of three main steps. First, based on a language for defining events in multidimensional time series, mTESL, we identify a number of types of events in time series that correspond to either incorrect data or data with interference. Second, each event type is restored utilizing an original method that combines real observations, local forecasted values and historical data. Third, an exponential smoothing procedure is applied globally to eliminate noise interference and other random errors so as to provide good quality source data for future work.
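The third step, exponential smoothing, can be illustrated with a minimal sketch. It assumes simple (single) exponential smoothing with a fixed factor `alpha`. The abstract does not state which variant or parameter the authors use, and the function name and sample values are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for value in series[1:]:
        # Blend the new observation with the previous smoothed value.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up flow counts for illustration.
result = exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5)
```

Smaller values of `alpha` suppress more noise but respond more slowly to genuine changes such as the onset of congestion.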
Abstract:
Acknowledgements The iHARP database was funded by unrestricted grants from Mundipharma International Ltd and Research in Real-Life Ltd; these analyses were funded by an unrestricted grant from Teva Pharmaceuticals. Mundipharma and Teva played no role in study conduct or analysis and did not modify or approve the manuscript. The authors wish to direct a special appreciation to all the participants of the iHARP group who contributed data to this study and to Mundipharma, sponsors of the iHARP group. In addition, we thank Julie von Ziegenweidt for assistance with data extraction and Anna Gilchrist and Valerie L. Ashton, PhD, for editorial assistance. Elizabeth V. Hillyer, DVM, provided editorial and writing support, funded by Research in Real-Life, Ltd.
Abstract:
Nuclear-localized mtDNA pseudogenes might explain a recent report describing a heteroplasmic mtDNA molecule containing five linked missense mutations dispersed over the contiguous mtDNA CO1 and CO2 genes in Alzheimer’s disease (AD) patients. To test this hypothesis, we have used the PCR primers utilized in the original report to amplify CO1 and CO2 sequences from two independent ρ° (mtDNA-less) cell lines. CO1 and CO2 sequences amplified from both of the ρ° cells, demonstrating that these sequences are also present in the human nuclear DNA. The nuclear pseudogene CO1 and CO2 sequences were then tested for each of the five “AD” missense mutations by restriction endonuclease site variant assays. All five mutations were found in the nuclear CO1 and CO2 PCR products from ρ° cells, but none were found in the PCR products obtained from cells with normal mtDNA. Moreover, when the overlapping nuclear CO1 and CO2 PCR products were cloned and sequenced, all five missense mutations were found, as well as a linked synonymous mutation. Unlike the findings in the original report, an additional 32 base substitutions were found, including two in adjacent tRNAs and a two base pair deletion in the CO2 gene. Phylogenetic analysis of the nuclear CO1 and CO2 sequences revealed that they diverged from modern human mtDNAs early in hominid evolution about 770,000 years before present. These data would be consistent with the interpretation that the missense mutations proposed to cause AD may be the product of ancient mtDNA variants preserved as nuclear pseudogenes.
Abstract:
Previously conducted sequence analysis of Arabidopsis thaliana (ecotype Columbia-0) reported an insertion of 270-kb mtDNA into the pericentric region on the short arm of chromosome 2. DNA fiber-based fluorescence in situ hybridization analyses reveal that the mtDNA insert is 618 ± 42 kb, ≈2.3 times greater than that determined by contig assembly and sequencing analysis. Portions of the mitochondrial genome previously believed to be absent were identified within the insert. Sections of the mtDNA are repeated throughout the insert. The cytological data illustrate that DNA contig assembly by using bacterial artificial chromosomes tends to produce a minimal clone path by skipping over duplicated regions, thereby resulting in sequencing errors. We demonstrate that fiber-fluorescence in situ hybridization is a powerful technique to analyze large repetitive regions in the higher eukaryotic genomes and is a valuable complement to ongoing large genome sequencing projects.
Abstract:
Moderate resolution remote sensing data, as provided by MODIS, can be used to detect and map active or past wildfires from daily records of suitable combinations of reflectance bands. The objective of the present work was to develop and test simple algorithms and variations for automatic or semiautomatic detection of burnt areas from time series data of MODIS biweekly vegetation indices for a Mediterranean region. MODIS-derived NDVI 250m time series data for the Valencia region, East Spain, were subjected to a two-step process for the detection of candidate burnt areas, and the results compared with available fire event records from the Valencia Regional Government. For each pixel and date in the data series, a model was fitted to both the preceding and subsequent time series data. Combining drops between two consecutive points and 1-year average drops, we used discrepancies or jumps between the pre- and post-date models to identify seed pixels, and then delineated fire scars for each potential wildfire using an extension algorithm from the seed pixels. The resulting maps of the detected burnt areas showed a very good agreement with the perimeters registered in the database of fire records used as reference. Overall accuracies and indices of agreement were very high, and omission and commission errors were similar to or lower than those in previous studies that used automatic or semiautomatic fire scar detection based on remote sensing. This supports the effectiveness of the method for detecting and mapping burnt areas in the Mediterranean region.
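The two-step idea of seed pixels followed by an extension from the seeds could be sketched as a simple region-growing procedure. This is only an illustration under stated assumptions: the input is a precomputed grid of NDVI drops for one date, both thresholds are hypothetical, and 4-neighbour growth stands in for whatever extension rule the authors actually used.

```python
from collections import deque


def detect_scar(drop, seed_threshold, grow_threshold):
    """Return the set of (row, col) pixels flagged as burnt.

    Pixels with a drop >= seed_threshold are seeds. The scar is then extended
    to 4-connected neighbours whose drop is >= grow_threshold.
    """
    rows, cols = len(drop), len(drop[0])
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if drop[r][c] >= seed_threshold)
    scar = set(queue)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in scar
                    and drop[nr][nc] >= grow_threshold):
                scar.add((nr, nc))
                queue.append((nr, nc))
    return scar


# Made-up NDVI drops: one strong seed with two moderate neighbours,
# plus an isolated moderate drop that no seed can reach.
ndvi_drop = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.5, 0.3, 0.0],
    [0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.3],
]
scar = detect_scar(ndvi_drop, seed_threshold=0.4, grow_threshold=0.25)
```

Using a strict threshold for seeds and a looser one for growth keeps isolated moderate drops, such as the pixel in the bottom-right corner, out of the result, which limits commission errors.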
Abstract:
LIDAR (LIght Detection And Ranging) first return elevation data of the Boston, Massachusetts region from MassGIS at 1-meter resolution. This LIDAR data was captured in Spring 2002. LIDAR first return data (which shows the highest ground features, e.g. tree canopy, buildings etc.) can be used to produce a digital terrain model of the Earth's surface. This dataset consists of 74 First Return DEM tiles. The tiles are 4km by 4km areas corresponding with the MassGIS orthoimage index. This data set was collected using 3Di's Digital Airborne Topographic Imaging System II (DATIS II). The area of coverage corresponds to the following MassGIS orthophoto quads covering the Boston region (MassGIS orthophoto quad ID: 229890, 229894, 229898, 229902, 233886, 233890, 233894, 233898, 233902, 233906, 233910, 237890, 237894, 237898, 237902, 237906, 237910, 241890, 241894, 241898, 241902, 245898, 245902). The geographic extent of this dataset is the same as that of the MassGIS dataset: Boston, Massachusetts Region 1:5,000 Color Ortho Imagery (1/2-meter Resolution), 2001 and was used to produce the MassGIS dataset: Boston, Massachusetts, 2-Dimensional Building Footprints with Roof Height Data (from LIDAR data), 2002 [see cross references].
Abstract:
This dataset consists of 2D footprints of the buildings in the metropolitan Boston area, based on tiles in the orthoimage index (orthophoto quad ID: 229890, 229894, 229898, 229902, 233886, 233890, 233894, 233898, 233902, 237890, 237894, 237898, 237902, 241890, 241894, 241898, 241902, 245898, 245902). This data set was collected using 3Di's Digital Airborne Topographic Imaging System II (DATIS II). Roof height and footprint elevation attributes (derived from 1-meter resolution LIDAR (LIght Detection And Ranging) data) are included as part of each building feature. This data can be combined with other datasets to create 3D representations of buildings and the surrounding environment.