973 resultados para MISSING VALUE ESTIMATION
Resumo:
Providing transportation system operators and travelers with accurate travel time information allows them to make more informed decisions, yielding benefits for individual travelers and for the entire transportation system. Most existing advanced traveler information systems (ATIS) and advanced traffic management systems (ATMS) use instantaneous travel time values estimated based on the current measurements, assuming that traffic conditions remain constant in the near future. For more effective applications, it has been proposed that ATIS and ATMS should use travel times predicted for short-term future conditions rather than instantaneous travel times measured or estimated for current conditions. ^ This dissertation research investigates short-term freeway travel time prediction using Dynamic Neural Networks (DNN) based on traffic detector data collected by radar traffic detectors installed along a freeway corridor. DNN comprises a class of neural networks that are particularly suitable for predicting variables like travel time, but has not been adequately investigated for this purpose. Before this investigation, it was necessary to identifying methods for data imputation to account for missing data usually encountered when collecting data using traffic detectors. It was also necessary to identify a method to estimate the travel time on the freeway corridor based on data collected using point traffic detectors. A new travel time estimation method referred to as the Piecewise Constant Acceleration Based (PCAB) method was developed and compared with other methods reported in the literatures. The results show that one of the simple travel time estimation methods (the average speed method) can work as well as the PCAB method, and both of them out-perform other methods. This study also compared the travel time prediction performance of three different DNN topologies with different memory setups. The results show that one DNN topology (the time-delay neural networks) out-performs the other two DNN topologies for the investigated prediction problem. This topology also performs slightly better than the simple multilayer perceptron (MLP) neural network topology that has been used in a number of previous studies for travel time prediction.^
Resumo:
Providing transportation system operators and travelers with accurate travel time information allows them to make more informed decisions, yielding benefits for individual travelers and for the entire transportation system. Most existing advanced traveler information systems (ATIS) and advanced traffic management systems (ATMS) use instantaneous travel time values estimated based on the current measurements, assuming that traffic conditions remain constant in the near future. For more effective applications, it has been proposed that ATIS and ATMS should use travel times predicted for short-term future conditions rather than instantaneous travel times measured or estimated for current conditions. This dissertation research investigates short-term freeway travel time prediction using Dynamic Neural Networks (DNN) based on traffic detector data collected by radar traffic detectors installed along a freeway corridor. DNN comprises a class of neural networks that are particularly suitable for predicting variables like travel time, but has not been adequately investigated for this purpose. Before this investigation, it was necessary to identifying methods for data imputation to account for missing data usually encountered when collecting data using traffic detectors. It was also necessary to identify a method to estimate the travel time on the freeway corridor based on data collected using point traffic detectors. A new travel time estimation method referred to as the Piecewise Constant Acceleration Based (PCAB) method was developed and compared with other methods reported in the literatures. The results show that one of the simple travel time estimation methods (the average speed method) can work as well as the PCAB method, and both of them out-perform other methods. This study also compared the travel time prediction performance of three different DNN topologies with different memory setups. The results show that one DNN topology (the time-delay neural networks) out-performs the other two DNN topologies for the investigated prediction problem. This topology also performs slightly better than the simple multilayer perceptron (MLP) neural network topology that has been used in a number of previous studies for travel time prediction.
Resumo:
Fixed-step-size (FSS) and Bayesian staircases are widely used methods to estimate sensory thresholds in 2AFC tasks, although a direct comparison of both types of procedure under identical conditions has not previously been reported. A simulation study and an empirical test were conducted to compare the performance of optimized Bayesian staircases with that of four optimized variants of FSS staircase differing as to up-down rule. The ultimate goal was to determine whether FSS or Bayesian staircases are the best choice in experimental psychophysics. The comparison considered the properties of the estimates (i.e. bias and standard errors) in relation to their cost (i.e. the number of trials to completion). The simulation study showed that mean estimates of Bayesian and FSS staircases are dependable when sufficient trials are given and that, in both cases, the standard deviation (SD) of the estimates decreases with number of trials, although the SD of Bayesian estimates is always lower than that of FSS estimates (and thus, Bayesian staircases are more efficient). The empirical test did not support these conclusions, as (1) neither procedure rendered estimates converging on some value, (2) standard deviations did not follow the expected pattern of decrease with number of trials, and (3) both procedures appeared to be equally efficient. Potential factors explaining the discrepancies between simulation and empirical results are commented upon and, all things considered, a sensible recommendation is for psychophysicists to run no fewer than 18 and no more than 30 reversals of an FSS staircase implementing the 1-up/3-down rule.
Resumo:
Threshold estimation with sequential procedures is justifiable on the surmise that the index used in the so-called dynamic stopping rule has diagnostic value for identifying when an accurate estimate has been obtained. The performance of five types of Bayesian sequential procedure was compared here to that of an analogous fixed-length procedure. Indices for use in sequential procedures were: (1) the width of the Bayesian probability interval, (2) the posterior standard deviation, (3) the absolute change, (4) the average change, and (5) the number of sign fluctuations. A simulation study was carried out to evaluate which index renders estimates with less bias and smaller standard error at lower cost (i.e. lower average number of trials to completion), in both yes–no and two-alternative forced-choice (2AFC) tasks. We also considered the effect of the form and parameters of the psychometric function and its similarity with themodel function assumed in the procedure. Our results show that sequential procedures do not outperform fixed-length procedures in yes–no tasks. However, in 2AFC tasks, sequential procedures not based on sign fluctuations all yield minimally better estimates than fixed-length procedures, although most of the improvement occurs with short runs that render undependable estimates and the differences vanish when the procedures run for a number of trials (around 70) that ensures dependability. Thus, none of the indices considered here (some of which are widespread) has the diagnostic value that would justify its use. In addition, difficulties of implementation make sequential procedures unfit as alternatives to fixed-length procedures.
Resumo:
Purpose
The objective of our study was to test a new approach to approximating organ dose by using the effective energy of the combined 80kV/140kV beam used in fast kV switch dual-energy (DE) computed tomography (CT). The two primary focuses of the study were to first validate experimentally the dose equivalency between MOSFET and ion chamber (as a gold standard) in a fast kV switch DE environment, and secondly to estimate effective dose (ED) of DECT scans using MOSFET detectors and an anthropomorphic phantom.
Materials and Methods
A GE Discovery 750 CT scanner was employed using a fast-kV switch abdomen/pelvis protocol alternating between 80 kV and 140 kV. The specific aims of our study were to (1) Characterize the effective energy of the dual energy environment; (2) Estimate the f-factor for soft tissue; (3) Calibrate the MOSFET detectors using a beam with effective energy equal to the combined DE environment; (4) Validate our calibration by using MOSFET detectors and ion chamber to measure dose at the center of a CTDI body phantom; (5) Measure ED for an abdomen/pelvis scan using an anthropomorphic phantom and applying ICRP 103 tissue weighting factors; and (6) Estimate ED using AAPM Dose Length Product (DLP) method. The effective energy of the combined beam was calculated by measuring dose with an ion chamber under varying thicknesses of aluminum to determine half-value layer (HVL).
Results
The effective energy of the combined dual-energy beams was found to be 42.8 kV. After calibration, tissue dose in the center of the CTDI body phantom was measured at 1.71 ± 0.01 cGy using an ion chamber, and 1.73±0.04 and 1.69±0.09 using two separate MOSFET detectors. This result showed a -0.93% and 1.40 % difference, respectively, between ion chamber and MOSFET. ED from the dual-energy scan was calculated as 16.49 ± 0.04 mSv by the MOSFET method and 14.62 mSv by the DLP method.
Resumo:
Abstract
Continuous variable is one of the major data types collected by the survey organizations. It can be incomplete such that the data collectors need to fill in the missingness. Or, it can contain sensitive information which needs protection from re-identification. One of the approaches to protect continuous microdata is to sum them up according to different cells of features. In this thesis, I represents novel methods of multiple imputation (MI) that can be applied to impute missing values and synthesize confidential values for continuous and magnitude data.
The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
The second method is for releasing synthetic continuous micro data by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on the protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.
The third method is for imputing missing values in continuous and categorical variables. It is extended from a hierarchically coupled mixture model with local dependence. However, the new method separates the variables into non-focused (e.g., almost-fully-observed) and focused (e.g., missing-a-lot) ones. The sub-model structure of focused variables is more complex than that of non-focused ones. At the same time, their cluster indicators are linked together by tensor factorization and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving the strongly associated non-focused variables to the side of focused ones can help to improve estimation accuracy, which is examined by several simulation studies. And this method is applied to data from the American Community Survey.
Resumo:
Estimates of HIV prevalence are important for policy in order to establish the health status of a country's population and to evaluate the effectiveness of population-based interventions and campaigns. However, participation rates in testing for surveillance conducted as part of household surveys, on which many of these estimates are based, can be low. HIV positive individuals may be less likely to participate because they fear disclosure, in which case estimates obtained using conventional approaches to deal with missing data, such as imputation-based methods, will be biased. We develop a Heckman-type simultaneous equation approach which accounts for non-ignorable selection, but unlike previous implementations, allows for spatial dependence and does not impose a homogeneous selection process on all respondents. In addition, our framework addresses the issue of separation, where for instance some factors are severely unbalanced and highly predictive of the response, which would ordinarily prevent model convergence. Estimation is carried out within a penalized likelihood framework where smoothing is achieved using a parametrization of the smoothing criterion which makes estimation more stable and efficient. We provide the software for straightforward implementation of the proposed approach, and apply our methodology to estimating national and sub-national HIV prevalence in Swaziland, Zimbabwe and Zambia.
Resumo:
In Germany the upscaling algorithm is currently the standard approach for evaluating the PV power produced in a region. This method involves spatially interpolating the normalized power of a set of reference PV plants to estimate the power production by another set of unknown plants. As little information on the performances of this method could be found in the literature, the first goal of this thesis is to conduct an analysis of the uncertainty associated to this method. It was found that this method can lead to large errors when the set of reference plants has different characteristics or weather conditions than the set of unknown plants and when the set of reference plants is small. Based on these preliminary findings, an alternative method is proposed for calculating the aggregate power production of a set of PV plants. A probabilistic approach has been chosen by which a power production is calculated at each PV plant from corresponding weather data. The probabilistic approach consists of evaluating the power for each frequently occurring value of the parameters and estimating the most probable value by averaging these power values weighted by their frequency of occurrence. Most frequent parameter sets (e.g. module azimuth and tilt angle) and their frequency of occurrence have been assessed on the basis of a statistical analysis of parameters of approx. 35 000 PV plants. It has been found that the plant parameters are statistically dependent on the size and location of the PV plants. Accordingly, separate statistical values have been assessed for 14 classes of nominal capacity and 95 regions in Germany (two-digit zip-code areas). The performances of the upscaling and probabilistic approaches have been compared on the basis of 15 min power measurements from 715 PV plants provided by the German distribution system operator LEW Verteilnetz. It was found that the error of the probabilistic method is smaller than that of the upscaling method when the number of reference plants is sufficiently large (>100 reference plants in the case study considered in this chapter). When the number of reference plants is limited (<50 reference plants for the considered case study), it was found that the proposed approach provides a noticeable gain in accuracy with respect to the upscaling method.
Resumo:
Les préhenseurs robotiques sont largement utilisés en industrie et leur déploiement pourrait être encore plus important si ces derniers étaient plus intelligents. En leur conférant des capacités tactiles et une intelligence leur permettant d’estimer la pose d’un objet saisi, une plus vaste gamme de tâches pourraient être accomplies par les robots. Ce mémoire présente le développement d’algorithmes d’estimation de la pose d’objets saisis par un préhenseur robotique. Des algorithmes ont été développés pour trois systèmes robotisés différents, mais pour les mêmes considérations. Effectivement, pour les trois systèmes la pose est estimée uniquement à partir d’une saisie d’objet, de données tactiles et de la configuration du préhenseur. Pour chaque système, la performance atteignable pour le système minimaliste étudié est évaluée. Dans ce mémoire, les concepts généraux sur l’estimation de la pose sont d’abord exposés. Ensuite, un préhenseur plan à deux doigts comprenant deux phalanges chacun est modélisé dans un environnement de simulation et un algorithme permettant d’estimer la pose d’un objet saisi par le préhenseur est décrit. Cet algorithme est basé sur les arbres d’interprétation et l’algorithme de RANSAC. Par la suite, un système expérimental plan comprenant une phalange supplémentaire par doigt est modélisé et étudié pour le développement d’un algorithme approprié d’estimation de la pose. Les principes de ce dernier sont similaires au premier algorithme, mais les capteurs compris dans le système sont moins précis et des adaptations et améliorations ont dû être appliquées. Entre autres, les mesures des capteurs ont été mieux exploitées. Finalement, un système expérimental spatial composé de trois doigts comprenant trois phalanges chacun est étudié. Suite à la modélisation, l’algorithme développé pour ce système complexe est présenté. Des hypothèses partiellement aléatoires sont générées, complétées, puis évaluées. L’étape d’évaluation fait notamment appel à l’algorithme de Levenberg-Marquardt.
Resumo:
The need for high temporal and spatial resolution precipitation data for hydrological analyses has been discussed in several studies. Although rain gauges provide valuable information, a very dense rain gauge network is costly. As a result, several new ideas have been emerged to help estimating areal rainfall with higher temporal and spatial resolution. Rabiei et al. (2013) observed that moving cars, called RainCars (RCs), can potentially be a new source of data for measuring rainfall amounts. The optical sensors used in that study are designed for operating the windscreen wipers and showed promising results for rainfall measurement purposes. Their measurement accuracy has been quantified in laboratory experiments. Considering explicitly those errors, the main objective of this study is to investigate the benefit of using RCs for estimating areal rainfall. For that, computer experiments are carried out, where radar rainfall is considered as the reference and the other sources of data, i.e. RCs and rain gauges, are extracted from radar data. Comparing the quality of areal rainfall estimation by RCs with rain gauges and reference data helps to investigate the benefit of the RCs. The value of this additional source of data is not only assessed for areal rainfall estimation performance, but also for use in hydrological modeling. The results show that the RCs considering measurement errors derived from laboratory experiments provide useful additional information for areal rainfall estimation as well as for hydrological modeling. Even assuming higher uncertainties for RCs as obtained from the laboratory up to a certain level is observed practical.
Resumo:
This work evaluates the mercury (Hg) contamination status (sediments and biota) of the Bijagós archipelago, off the coast of Guinea-Bissau. Sediments exhibited very low concentrations (<1-12ngg(-1)), pointing to negligible sources of anthropogenic Hg in the region. Nevertheless, Hg is well correlated to the fine fraction, aluminium, and loss on ignition, indicating the effect of grain size and organic matter content on the presence of Hg in sediments. Mercury in the bivalves Tagelus adansoni and Senilia senilis did not vary considerably among sites, ranging within narrow intervals (0.09-0.12 and 0.12-0.14μgg(-1) (dry weight), respectively). Divergent substrate preferences/feeding tactics may justify slight differences between species. The value 11ngg(-1) is proposed as the sediment background concentration for this West-African coastal region, and concentrations within the interval 8-10ngg(-1) (wet weight) may be considered as reference range for S. senilis and T. adansoni in future monitoring studies.
Resumo:
In quantitative risk analysis, the problem of estimating small threshold exceedance probabilities and extreme quantiles arise ubiquitously in bio-surveillance, economics, natural disaster insurance actuary, quality control schemes, etc. A useful way to make an assessment of extreme events is to estimate the probabilities of exceeding large threshold values and extreme quantiles judged by interested authorities. Such information regarding extremes serves as essential guidance to interested authorities in decision making processes. However, in such a context, data are usually skewed in nature, and the rarity of exceedance of large threshold implies large fluctuations in the distribution's upper tail, precisely where the accuracy is desired mostly. Extreme Value Theory (EVT) is a branch of statistics that characterizes the behavior of upper or lower tails of probability distributions. However, existing methods in EVT for the estimation of small threshold exceedance probabilities and extreme quantiles often lead to poor predictive performance in cases where the underlying sample is not large enough or does not contain values in the distribution's tail. In this dissertation, we shall be concerned with an out of sample semiparametric (SP) method for the estimation of small threshold probabilities and extreme quantiles. The proposed SP method for interval estimation calls for the fusion or integration of a given data sample with external computer generated independent samples. Since more data are used, real as well as artificial, under certain conditions the method produces relatively short yet reliable confidence intervals for small exceedance probabilities and extreme quantiles.
Resumo:
We develop the energy norm a-posteriori error estimation for hp-version discontinuous Galerkin (DG) discretizations of elliptic boundary-value problems on 1-irregularly, isotropically refined affine hexahedral meshes in three dimensions. We derive a reliable and efficient indicator for the errors measured in terms of the natural energy norm. The ratio of the efficiency and reliability constants is independent of the local mesh sizes and weakly depending on the polynomial degrees. In our analysis we make use of an hp-version averaging operator in three dimensions, which we explicitly construct and analyze. We use our error indicator in an hp-adaptive refinement algorithm and illustrate its practical performance in a series of numerical examples. Our numerical results indicate that exponential rates of convergence are achieved for problems with smooth solutions, as well as for problems with isotropic corner singularities.
Resumo:
Statistical approaches to study extreme events require, by definition, long time series of data. In many scientific disciplines, these series are often subject to variations at different temporal scales that affect the frequency and intensity of their extremes. Therefore, the assumption of stationarity is violated and alternative methods to conventional stationary extreme value analysis (EVA) must be adopted. Using the example of environmental variables subject to climate change, in this study we introduce the transformed-stationary (TS) methodology for non-stationary EVA. This approach consists of (i) transforming a non-stationary time series into a stationary one, to which the stationary EVA theory can be applied, and (ii) reverse transforming the result into a non-stationary extreme value distribution. As a transformation, we propose and discuss a simple time-varying normalization of the signal and show that it enables a comprehensive formulation of non-stationary generalized extreme value (GEV) and generalized Pareto distribution (GPD) models with a constant shape parameter. A validation of the methodology is carried out on time series of significant wave height, residual water level, and river discharge, which show varying degrees of long-term and seasonal variability. The results from the proposed approach are comparable with the results from (a) a stationary EVA on quasi-stationary slices of non-stationary series and (b) the established method for non-stationary EVA. However, the proposed technique comes with advantages in both cases. For example, in contrast to (a), the proposed technique uses the whole time horizon of the series for the estimation of the extremes, allowing for a more accurate estimation of large return levels. Furthermore, with respect to (b), it decouples the detection of non-stationary patterns from the fitting of the extreme value distribution. As a result, the steps of the analysis are simplified and intermediate diagnostics are possible. In particular, the transformation can be carried out by means of simple statistical techniques such as low-pass filters based on the running mean and the standard deviation, and the fitting procedure is a stationary one with a few degrees of freedom and is easy to implement and control. An open-source MAT-LAB toolbox has been developed to cover this methodology, which is available at https://github.com/menta78/tsEva/(Mentaschi et al., 2016).
Resumo:
One of the objectives of this study is to perform classification of socio-demographic components for the level of city section in City of Lisbon. In order to accomplish suitable platform for the restaurant potentiality map, the socio-demographic components were selected to produce a map of spatial clusters in accordance to restaurant suitability. Consequently, the second objective is to obtain potentiality map in terms of underestimation and overestimation in number of restaurants. To the best of our knowledge there has not been found identical methodology for the estimation of restaurant potentiality. The results were achieved with combination of SOM (Self-Organized Map) which provides a segmentation map and GAM (Generalized Additive Model) with spatial component for restaurant potentiality. Final results indicate that the highest influence in restaurant potentiality is given to tourist sites, spatial autocorrelation in terms of neighboring restaurants (spatial component), and tax value, where lower importance is given to household with 1 or 2 members and employed population, respectively. In addition, an important conclusion is that the most attractive market sites have shown no change or moderate underestimation in terms of restaurants potentiality.