884 resultados para Mean squared error
Resumo:
An iterative travel time forecasting scheme, named the Advanced Multilane Prediction based Real-time Fastest Path (AMPRFP) algorithm, is presented in this dissertation. This scheme is derived from the conventional kernel estimator based prediction model by the association of real-time nonlinear impacts that caused by neighboring arcs’ traffic patterns with the historical traffic behaviors. The AMPRFP algorithm is evaluated by prediction of the travel time of congested arcs in the urban area of Jacksonville City. Experiment results illustrate that the proposed scheme is able to significantly reduce both the relative mean error (RME) and the root-mean-squared error (RMSE) of the predicted travel time. To obtain high quality real-time traffic information, which is essential to the performance of the AMPRFP algorithm, a data clean scheme enhanced empirical learning (DCSEEL) algorithm is also introduced. This novel method investigates the correlation between distance and direction in the geometrical map, which is not considered in existing fingerprint localization methods. Specifically, empirical learning methods are applied to minimize the error that exists in the estimated distance. A direction filter is developed to clean joints that have negative influence to the localization accuracy. Synthetic experiments in urban, suburban and rural environments are designed to evaluate the performance of DCSEEL algorithm in determining the cellular probe’s position. The results show that the cellular probe’s localization accuracy can be notably improved by the DCSEEL algorithm. Additionally, a new fast correlation technique for overcoming the time efficiency problem of the existing correlation algorithm based floating car data (FCD) technique is developed. The matching process is transformed into a 1-dimensional (1-D) curve matching problem and the Fast Normalized Cross-Correlation (FNCC) algorithm is introduced to supersede the Pearson product Moment Correlation Co-efficient (PMCC) algorithm in order to achieve the real-time requirement of the FCD method. The fast correlation technique shows a significant improvement in reducing the computational cost without affecting the accuracy of the matching process.
Resumo:
Multiple linear regression model plays a key role in statistical inference and it has extensive applications in business, environmental, physical and social sciences. Multicollinearity has been a considerable problem in multiple regression analysis. When the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. There are some statistical methods that can be used, which are discussed in this thesis are ridge regression, Liu, two parameter biased and LASSO estimators. Firstly, an analytical comparison on the basis of risk was made among ridge, Liu and LASSO estimators under orthonormal regression model. I found that LASSO dominates least squares, ridge and Liu estimators over a significant portion of the parameter space for large dimension. Secondly, a simulation study was conducted to compare performance of ridge, Liu and two parameter biased estimator by their mean squared error criterion. I found that two parameter biased estimator performs better than its corresponding ridge regression estimator. Overall, Liu estimator performs better than both ridge and two parameter biased estimator.
Resumo:
The detection and diagnosis of faults, ie., find out how , where and why failures occur is an important area of study since man came to be replaced by machines. However, no technique studied to date can solve definitively the problem. Differences in dynamic systems, whether linear, nonlinear, variant or invariant in time, with physical or analytical redundancy, hamper research in order to obtain a unique solution . In this paper, a technique for fault detection and diagnosis (FDD) will be presented in dynamic systems using state observers in conjunction with other tools in order to create a hybrid FDD. A modified state observer is used to create a residue that allows also the detection and diagnosis of faults. A bank of faults signatures will be created using statistical tools and finally an approach using mean squared error ( MSE ) will assist in the study of the behavior of fault diagnosis even in the presence of noise . This methodology is then applied to an educational plant with coupled tanks and other with industrial instrumentation to validate the system.
Resumo:
The great interest in nonlinear system identification is mainly due to the fact that a large amount of real systems are complex and need to have their nonlinearities considered so that their models can be successfully used in applications of control, prediction, inference, among others. This work evaluates the application of Fuzzy Wavelet Neural Networks (FWNN) to identify nonlinear dynamical systems subjected to noise and outliers. Generally, these elements cause negative effects on the identification procedure, resulting in erroneous interpretations regarding the dynamical behavior of the system. The FWNN combines in a single structure the ability to deal with uncertainties of fuzzy logic, the multiresolution characteristics of wavelet theory and learning and generalization abilities of the artificial neural networks. Usually, the learning procedure of these neural networks is realized by a gradient based method, which uses the mean squared error as its cost function. This work proposes the replacement of this traditional function by an Information Theoretic Learning similarity measure, called correntropy. With the use of this similarity measure, higher order statistics can be considered during the FWNN training process. For this reason, this measure is more suitable for non-Gaussian error distributions and makes the training less sensitive to the presence of outliers. In order to evaluate this replacement, FWNN models are obtained in two identification case studies: a real nonlinear system, consisting of a multisection tank, and a simulated system based on a model of the human knee joint. The results demonstrate that the application of correntropy as the error backpropagation algorithm cost function makes the identification procedure using FWNN models more robust to outliers. However, this is only achieved if the gaussian kernel width of correntropy is properly adjusted.
Resumo:
In the last decades the study of integer-valued time series has gained notoriety due to its broad applicability (modeling the number of car accidents in a given highway, or the number of people infected by a virus are two examples). One of the main interests of this area of study is to make forecasts, and for this reason it is very important to propose methods to make such forecasts, which consist of nonnegative integer values, due to the discrete nature of the data. In this work, we focus on the study and proposal of forecasts one, two and h steps ahead for integer-valued second-order autoregressive conditional heteroskedasticity processes [INARCH (2)], and in determining some theoretical properties of this model, such as the ordinary moments of its marginal distribution and the asymptotic distribution of its conditional least squares estimators. In addition, we study, via Monte Carlo simulation, the behavior of the estimators for the parameters of INARCH(2) processes obtained using three di erent methods (Yule- Walker, conditional least squares, and conditional maximum likelihood), in terms of mean squared error, mean absolute error and bias. We present some forecast proposals for INARCH(2) processes, which are compared again via Monte Carlo simulation. As an application of this proposed theory, we model a dataset related to the number of live male births of mothers living at Riachuelo city, in the state of Rio Grande do Norte, Brazil.
Resumo:
In the last decades the study of integer-valued time series has gained notoriety due to its broad applicability (modeling the number of car accidents in a given highway, or the number of people infected by a virus are two examples). One of the main interests of this area of study is to make forecasts, and for this reason it is very important to propose methods to make such forecasts, which consist of nonnegative integer values, due to the discrete nature of the data. In this work, we focus on the study and proposal of forecasts one, two and h steps ahead for integer-valued second-order autoregressive conditional heteroskedasticity processes [INARCH (2)], and in determining some theoretical properties of this model, such as the ordinary moments of its marginal distribution and the asymptotic distribution of its conditional least squares estimators. In addition, we study, via Monte Carlo simulation, the behavior of the estimators for the parameters of INARCH(2) processes obtained using three di erent methods (Yule- Walker, conditional least squares, and conditional maximum likelihood), in terms of mean squared error, mean absolute error and bias. We present some forecast proposals for INARCH(2) processes, which are compared again via Monte Carlo simulation. As an application of this proposed theory, we model a dataset related to the number of live male births of mothers living at Riachuelo city, in the state of Rio Grande do Norte, Brazil.
Resumo:
My dissertation has three chapters which develop and apply microeconometric tech- niques to empirically relevant problems. All the chapters examines the robustness issues (e.g., measurement error and model misspecification) in the econometric anal- ysis. The first chapter studies the identifying power of an instrumental variable in the nonparametric heterogeneous treatment effect framework when a binary treat- ment variable is mismeasured and endogenous. I characterize the sharp identified set for the local average treatment effect under the following two assumptions: (1) the exclusion restriction of an instrument and (2) deterministic monotonicity of the true treatment variable in the instrument. The identification strategy allows for general measurement error. Notably, (i) the measurement error is nonclassical, (ii) it can be endogenous, and (iii) no assumptions are imposed on the marginal distribution of the measurement error, so that I do not need to assume the accuracy of the measure- ment. Based on the partial identification result, I provide a consistent confidence interval for the local average treatment effect with uniformly valid size control. I also show that the identification strategy can incorporate repeated measurements to narrow the identified set, even if the repeated measurements themselves are endoge- nous. Using the the National Longitudinal Study of the High School Class of 1972, I demonstrate that my new methodology can produce nontrivial bounds for the return to college attendance when attendance is mismeasured and endogenous.
The second chapter, which is a part of a coauthored project with Federico Bugni, considers the problem of inference in dynamic discrete choice problems when the structural model is locally misspecified. We consider two popular classes of estimators for dynamic discrete choice models: K-step maximum likelihood estimators (K-ML) and K-step minimum distance estimators (K-MD), where K denotes the number of policy iterations employed in the estimation problem. These estimator classes include popular estimators such as Rust (1987)’s nested fixed point estimator, Hotz and Miller (1993)’s conditional choice probability estimator, Aguirregabiria and Mira (2002)’s nested algorithm estimator, and Pesendorfer and Schmidt-Dengler (2008)’s least squares estimator. We derive and compare the asymptotic distributions of K- ML and K-MD estimators when the model is arbitrarily locally misspecified and we obtain three main results. In the absence of misspecification, Aguirregabiria and Mira (2002) show that all K-ML estimators are asymptotically equivalent regardless of the choice of K. Our first result shows that this finding extends to a locally misspecified model, regardless of the degree of local misspecification. As a second result, we show that an analogous result holds for all K-MD estimators, i.e., all K- MD estimator are asymptotically equivalent regardless of the choice of K. Our third and final result is to compare K-MD and K-ML estimators in terms of asymptotic mean squared error. Under local misspecification, the optimally weighted K-MD estimator depends on the unknown asymptotic bias and is no longer feasible. In turn, feasible K-MD estimators could have an asymptotic mean squared error that is higher or lower than that of the K-ML estimators. To demonstrate the relevance of our asymptotic analysis, we illustrate our findings using in a simulation exercise based on a misspecified version of Rust (1987) bus engine problem.
The last chapter investigates the causal effect of the Omnibus Budget Reconcil- iation Act of 1993, which caused the biggest change to the EITC in its history, on unemployment and labor force participation among single mothers. Unemployment and labor force participation are difficult to define for a few reasons, for example, be- cause of marginally attached workers. Instead of searching for the unique definition for each of these two concepts, this chapter bounds unemployment and labor force participation by observable variables and, as a result, considers various competing definitions of these two concepts simultaneously. This bounding strategy leads to partial identification of the treatment effect. The inference results depend on the construction of the bounds, but they imply positive effect on labor force participa- tion and negligible effect on unemployment. The results imply that the difference- in-difference result based on the BLS definition of unemployment can be misleading
due to misclassification of unemployment.
Resumo:
Measurement of joint kinematics can provide knowledge to help improve joint prosthesis design, as well as identify joint motion patterns that may lead to joint degeneration or injury. More investigation into how the hip translates in live human subjects during high amplitude motions is needed. This work presents a design of a non-invasive method using the registration between images from conventional Magnetic Resonance Imaging (MRI) and open MRI to calculate three dimensional hip joint kinematics. The method was tested on a single healthy subject in three different poses. MRI protocols for the conventional gantry, high-resolution MRI and the open gantry, lowresolution MRI were developed. The scan time for the low-resolution protocol was just under 6 minutes. High-resolution meshes and low resolution contours were derived from segmentation of the high-resolution and low-resolution images, respectively. Low-resolution contours described the poses as scanned, whereas the meshes described the bones’ geometries. The meshes and contours were registered to each other, and joint kinematics were calculated. The segmentation and registration were performed for both cortical and sub-cortical bone surfaces. A repeatability study was performed by comparing the kinematic results derived from three users’ segmentations of the sub-cortical bone surfaces from a low-resolution scan. The root mean squared error of all registrations was below 1.92mm. The maximum range between segmenters in translation magnitude was 0.95mm, and the maximum deviation from the average of all orientations was 1.27◦. This work demonstrated that this method for non-invasive measurement of hip kinematics is promising for measuring high-range-of-motion hip motions in vivo.
Resumo:
Dissertação (mestrado)—Universidade de Brasília, Departamento de Administração, Programa de Pós-graduação em Administração, 2016.
Resumo:
Little information is available on the degree of within-field variability of potential production of Tall wheatgrass (Thinopyrum ponticum) forage under unirrigated conditions. The aim of this study was to characterize the spatial variability of the accumulated biomass (AB) without nutritional limitations through vegetation indexes, and then use this information to determine potential management zones. A 27-×-27-m grid cell size was chosen and 84 biomass sampling areas (BSA), each 2 m(2) in size, were georeferenced. Nitrogen and phosphorus fertilizers were applied after an initial cut at 3 cm height. At 500 °C day, the AB from each sampling area, was collected and evaluated. The spatial variability of AB was estimated more accurately using the Normalized Difference Vegetation Index (NDVI), calculated from LANDSAT 8 images obtained on 24 November 2014 (NDVInov) and 10 December 2014 (NDVIdec) because the potential AB was highly associated with NDVInov and NDVIdec (r (2) = 0.85 and 0.83, respectively). These models between the potential AB data and NDVI were evaluated by root mean squared error (RMSE) and relative root mean squared error (RRMSE). This last coefficient was 12 and 15 % for NDVInov and NDVIdec, respectively. Potential AB and NDVI spatial correlation were quantified with semivariograms. The spatial dependence of AB was low. Six classes of NDVI were analyzed for comparison, and two management zones (MZ) were established with them. In order to evaluate if the NDVI method allows us to delimit MZ with different attainable yields, the AB estimated for these MZ were compared through an ANOVA test. The potential AB had significant differences among MZ. Based on these findings, it can be concluded that NDVI obtained from LANDSAT 8 images can be reliably used for creating MZ in soils under permanent pastures dominated by Tall wheatgrass.
Resumo:
Models of the dynamics of nitrogen in soil (soil-N) can be used to aid the fertilizer management of a crop. The predictions of soil-N models can be validated by comparison with observed data. Validation generally involves calculating non-spatial statistics of the observations and predictions, such as their means, their mean squared-difference, and their correlation. However, when the model predictions are spatially distributed across a landscape the model requires validation with spatial statistics. There are three reasons for this: (i) the model may be more or less successful at reproducing the variance of the observations at different spatial scales; (ii) the correlation of the predictions with the observations may be different at different spatial scales; (iii) the spatial pattern of model error may be informative. In this study we used a model, parameterized with spatially variable input information about the soil, to predict the mineral-N content of soil in an arable field, and compared the results with observed data. We validated the performance of the N model spatially with a linear mixed model of the observations and model predictions, estimated by residual maximum likelihood. This novel approach allowed us to describe the joint variation of the observations and predictions as: (i) independent random variation that occurred at a fine spatial scale; (ii) correlated random variation that occurred at a coarse spatial scale; (iii) systematic variation associated with a spatial trend. The linear mixed model revealed that, in general, the performance of the N model changed depending on the spatial scale of interest. At the scales associated with random variation, the N model underestimated the variance of the observations, and the predictions were correlated poorly with the observations. At the scale of the trend, the predictions and observations shared a common surface. The spatial pattern of the error of the N model suggested that the observations were affected by the local soil condition, but this was not accounted for by the N model. In summary, the N model would be well-suited to field-scale management of soil nitrogen, but suited poorly to management at finer spatial scales. This information was not apparent with a non-spatial validation. (c),2007 Elsevier B.V. All rights reserved.