986 resultados para Probability estimation
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
In biostatistical applications interest often focuses on the estimation of the distribution of a time-until-event variable T. If one observes whether or not T exceeds an observed monitoring time at a random number of monitoring times, then the data structure is called interval censored data. We extend this data structure by allowing the presence of a possibly time-dependent covariate process that is observed until end of follow up. If one only assumes that the censoring mechanism satisfies coarsening at random, then, by the curve of dimensionality, typically no regular estimators will exist. To fight the curse of dimensionality we follow the approach of Robins and Rotnitzky (1992) by modeling parameters of the censoring mechanism. We model the right-censoring mechanism by modeling the hazard of the follow up time, conditional on T and the covariate process. For the monitoring mechanism we avoid modeling the joint distribution of the monitoring times by only modeling a univariate hazard of the pooled monitoring times, conditional on the follow up time, T, and the covariates process, which can be estimated by treating the pooled sample of monitoring times as i.i.d. In particular, it is assumed that the monitoring times and the right-censoring times only depend on T through the observed covariate process. We introduce inverse probability of censoring weighted (IPCW) estimator of the distribution of T and of smooth functionals thereof which are guaranteed to be consistent and asymptotically normal if we have available correctly specified semiparametric models for the two hazards of the censoring process. Furthermore, given such correctly specified models for these hazards of the censoring process, we propose a one-step estimator which will improve on the IPCW estimator if we correctly specify a lower-dimensional working model for the conditional distribution of T, given the covariate process, that remains consistent and asymptotically normal if this latter working model is misspecified. It is shown that the one-step estimator is efficient if each subject is at most monitored once and the working model contains the truth. In general, it is shown that the one-step estimator optimally uses the surrogate information if the working model contains the truth. It is not optimal in using the interval information provided by the current status indicators at the monitoring times, but simulations in Peterson, van der Laan (1997) show that the efficiency loss is small.
Resumo:
In many applications the observed data can be viewed as a censored high dimensional full data random variable X. By the curve of dimensionality it is typically not possible to construct estimators that are asymptotically efficient at every probability distribution in a semiparametric censored data model of such a high dimensional censored data structure. We provide a general method for construction of one-step estimators that are efficient at a chosen submodel of the full-data model, are still well behaved off this submodel and can be chosen to always improve on a given initial estimator. These one-step estimators rely on good estimators of the censoring mechanism and thus will require a parametric or semiparametric model for the censoring mechanism. We present a general theorem that provides a template for proving the desired asymptotic results. We illustrate the general one-step estimation methods by constructing locally efficient one-step estimators of marginal distributions and regression parameters with right-censored data, current status data and bivariate right-censored data, in all models allowing the presence of time-dependent covariates. The conditions of the asymptotics theorem are rigorously verified in one of the examples and the key condition of the general theorem is verified for all examples.
Resumo:
Marshall's (1970) lemma is an analytical result which implies root-n-consistency of the distribution function corresponding to the Grenander (1956) estimator of a non-decreasing probability density. The present paper derives analogous results for the setting of convex densities on [0,\infty).
Resumo:
This paper addresses the problem of fully-automatic localization and segmentation of 3D intervertebral discs (IVDs) from MR images. Our method contains two steps, where we first localize the center of each IVD, and then segment IVDs by classifying image pixels around each disc center as foreground (disc) or background. The disc localization is done by estimating the image displacements from a set of randomly sampled 3D image patches to the disc center. The image displacements are estimated by jointly optimizing the training and test displacement values in a data-driven way, where we take into consideration both the training data and the geometric constraint on the test image. After the disc centers are localized, we segment the discs by classifying image pixels around disc centers as background or foreground. The classification is done in a similar data-driven approach as we used for localization, but in this segmentation case we are aiming to estimate the foreground/background probability of each pixel instead of the image displacements. In addition, an extra neighborhood smooth constraint is introduced to enforce the local smoothness of the label field. Our method is validated on 3D T2-weighted turbo spin echo MR images of 35 patients from two different studies. Experiments show that compared to state of the art, our method achieves better or comparable results. Specifically, we achieve for localization a mean error of 1.6-2.0 mm, and for segmentation a mean Dice metric of 85%-88% and a mean surface distance of 1.3-1.4 mm.
Resumo:
We propose a nonparametric variance estimator when ranked set sampling (RSS) and judgment post stratification (JPS) are applied by measuring a concomitant variable. Our proposed estimator is obtained by conditioning on observed concomitant values and using nonparametric kernel regression.
Resumo:
To deliver sample estimates provided with the necessary probability foundation to permit generalization from the sample data subset to the whole target population being sampled, probability sampling strategies are required to satisfy three necessary not sufficient conditions: (i) All inclusion probabilities be greater than zero in the target population to be sampled. If some sampling units have an inclusion probability of zero, then a map accuracy assessment does not represent the entire target region depicted in the map to be assessed. (ii) The inclusion probabilities must be: (a) knowable for nonsampled units and (b) known for those units selected in the sample: since the inclusion probability determines the weight attached to each sampling unit in the accuracy estimation formulas, if the inclusion probabilities are unknown, so are the estimation weights. This original work presents a novel (to the best of these authors' knowledge, the first) probability sampling protocol for quality assessment and comparison of thematic maps generated from spaceborne/airborne Very High Resolution (VHR) images, where: (I) an original Categorical Variable Pair Similarity Index (CVPSI, proposed in two different formulations) is estimated as a fuzzy degree of match between a reference and a test semantic vocabulary, which may not coincide, and (II) both symbolic pixel-based thematic quality indicators (TQIs) and sub-symbolic object-based spatial quality indicators (SQIs) are estimated with a degree of uncertainty in measurement in compliance with the well-known Quality Assurance Framework for Earth Observation (QA4EO) guidelines. Like a decision-tree, any protocol (guidelines for best practice) comprises a set of rules, equivalent to structural knowledge, and an order of presentation of the rule set, known as procedural knowledge. The combination of these two levels of knowledge makes an original protocol worth more than the sum of its parts. The several degrees of novelty of the proposed probability sampling protocol are highlighted in this paper, at the levels of understanding of both structural and procedural knowledge, in comparison with related multi-disciplinary works selected from the existing literature. In the experimental session the proposed protocol is tested for accuracy validation of preliminary classification maps automatically generated by the Satellite Image Automatic MapperT (SIAMT) software product from two WorldView-2 images and one QuickBird-2 image provided by DigitalGlobe for testing purposes. In these experiments, collected TQIs and SQIs are statistically valid, statistically significant, consistent across maps and in agreement with theoretical expectations, visual (qualitative) evidence and quantitative quality indexes of operativeness (OQIs) claimed for SIAMT by related papers. As a subsidiary conclusion, the statistically consistent and statistically significant accuracy validation of the SIAMT pre-classification maps proposed in this contribution, together with OQIs claimed for SIAMT by related works, make the operational (automatic, accurate, near real-time, robust, scalable) SIAMT software product eligible for opening up new inter-disciplinary research and market opportunities in accordance with the visionary goal of the Global Earth Observation System of Systems (GEOSS) initiative and the QA4EO international guidelines.
Resumo:
The purpose of this paper is to present a program written in Matlab-Octave for the simulation of the time evolution of student curricula, i.e, how students pass their subjects along time until graduation. The program computes, from the simulations, the academic performance rates for the subjects of the study plan for each semester as well as the overall rates, which are a) the efficiency rate defined as the ratio of the number of students passing the exam to the number of students who registered for it and b) the success rate, defined as the ratio of the number of students passing the exam to the number of students who not only registered for it but also actually took it. Additionally, we compute the rates for the bachelor academic degree which are established for Spain by the National Quality Evaluation and Accreditation Agency (ANECA) and which are the graduation rate (measured as the percentage of students who finish as scheduled in the plan or taking an extra year) and the efficiency rate (measured as the percentage of credits which a student who graduated has really taken). The simulation is done in terms of the probabilities of passing all the subjects in their study plan. The application of the simulator to Polytech students in Madrid, where requirements for passing are specially stiff in first and second year subjects, is particularly relevant to analyze student cohorts and the probabilities of students finishing in the minimum of four years, or taking and extra year or two extra years, and so forth. It is a very useful tool when designing new study plans. The calculation of the probability distribution of the random variable "number of semesters a student has taken to complete the curricula and graduate" is difficult or even unfeasible to obtain analytically, and this is even truer when we incorporate uncertainty in parameter estimation. This is why we apply Monte Carlo simulation which not only provides illustration of the stochastic process but also a method for computation. The stochastic simulator is proving to be a useful tool for identification of the subjects most critical in the distribution of the number of semesters for curriculum vitae (CV) completion and subsequently for a decision making process in terms of CV planning and passing standards in the University. Simulations are performed through a graphical interface where also the results are presented in appropriate figures. The Project has been funded by the Call for Innovation in Education Projects of Universidad Politécnica de Madrid (UPM) through a Project of its school Escuela Técnica Superior de Ingenieros Industriales ETSII during the period September 2010-September 2011.
Resumo:
A multivariate analysis on flood variables is needed to design some hydraulic structures like dams, as the complexity of the routing process in a reservoir requires a representation of the full hydrograph. In this work, a bivariate copula model was used to obtain the bivariate joint distribution of flood peak and volume, in order to know the probability of occurrence of a given inflow hydrograph. However, the risk of dam overtopping is given by the maximum water elevation reached during the routing process, which depends on the hydrograph variables, the reservoir volume and the spillway crest length. Consequently, an additional bivariate return period, the so-called routed return period, was defined in terms of risk of dam overtopping based on this maximum water elevation obtained after routing the inflow hydrographs. The theoretical return periods, which give the probability of occurrence of a hydrograph prior to accounting for the reservoir routing, were compared with the routed return period, as in both cases hydrographs with the same probability will draw a curve in the peak-volume space. The procedure was applied to the case study of the Santillana reservoir in Spain. Different reservoir volumes and spillway lengths were considered to investigate the influence of the dam and reservoir characteristics on the results. The methodology improves the estimation of the Design Flood Hydrograph and can be applied to assess the risk of dam overtopping
Resumo:
Prediction at ungauged sites is essential for water resources planning and management. Ungauged sites have no observations about the magnitude of floods, but some site and basin characteristics are known. Regression models relate physiographic and climatic basin characteristics to flood quantiles, which can be estimated from observed data at gauged sites. However, these models assume linear relationships between variables Prediction intervals are estimated by the variance of the residuals in the estimated model. Furthermore, the effect of the uncertainties in the explanatory variables on the dependent variable cannot be assessed. This paper presents a methodology to propagate the uncertainties that arise in the process of predicting flood quantiles at ungauged basins by a regression model. In addition, Bayesian networks were explored as a feasible tool for predicting flood quantiles at ungauged sites. Bayesian networks benefit from taking into account uncertainties thanks to their probabilistic nature. They are able to capture non-linear relationships between variables and they give a probability distribution of discharges as result. The methodology was applied to a case study in the Tagus basin in Spain.
Resumo:
Bibliography: p. 98.