939 results for "robust estimation statistics"
Abstract:
Let a class $\F$ of densities be given. We draw an i.i.d.\ sample from a density $f$ which may or may not be in $\F$. After every $n$, one must make a guess whether $f \in \F$ or not. A class is almost surely testable if there exists a testing sequence such that, for any $f$, we make finitely many errors almost surely. In this paper, several results are given that allow one to decide whether a class is almost surely testable. For example, continuity and square integrability are not testable, but unimodality, log-concavity, and boundedness by a given constant are.
Abstract:
The classical binary classification problem is investigated when it is known in advance that the posterior probability function (or regression function) belongs to some class of functions. We introduce and analyze a method which effectively exploits this knowledge. The method is based on minimizing the empirical risk over a carefully selected ``skeleton'' of the class of regression functions. The skeleton is a covering of the class based on a data--dependent metric, especially fitted for classification. A new scale--sensitive dimension is introduced which is more useful for the studied classification problem than other, previously defined, dimension measures. This fact is demonstrated by performance bounds for the skeleton estimate in terms of the new dimension.
Abstract:
We propose a new family of density functions that possess both flexibility and closed form expressions for moments and anti-derivatives, making them particularly appealing for applications. We illustrate its usefulness by applying our new family to obtain density forecasts of U.S. inflation. Our methods generate forecasts that improve on standard methods based on AR-ARCH models relying on normal or Student's t-distributional assumptions.
Abstract:
Many dynamic revenue management models divide the sale period into a finite number of periods T and assume, invoking a fine-enough grid of time, that each period sees at most one booking request. These Poisson-type assumptions restrict the variability of the demand in the model, but researchers and practitioners have been willing to overlook this for the benefit of tractability of the models. In this paper, we criticize this model from another angle. Estimating the discrete finite-period model poses problems of indeterminacy and non-robustness: arbitrarily fixing T leads to arbitrary control values, while estimating T from data adds an additional layer of indeterminacy. To counter this, we first propose an alternate finite-population model that avoids this problem of fixing T and allows a wider range of demand distributions, while retaining the useful marginal-value properties of the finite-period model. The finite-population model still requires jointly estimating market size and the parameters of the customer purchase model without observing no-purchases. Estimation of market size when no-purchases are unobservable has rarely been attempted in the marketing or revenue management literature. Indeed, we point out that it is akin to the classical statistical problem of estimating the parameters of a binomial distribution with unknown population size and success probability, and hence likely to be challenging. However, when the purchase probabilities are given by a functional form such as a multinomial-logit model, we propose an estimation heuristic that exploits the specification of the functional form, the variety of the offer sets in a typical RM setting, and qualitative knowledge of arrival rates. Finally we perform simulations to show that the estimator is very promising in obtaining unbiased estimates of population size and the model parameters.
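The classical problem the abstract invokes — a binomial sample with both the population size and the success probability unknown — can be sketched with a profile-likelihood search over the integer size parameter. This is a minimal illustration of why the problem is hard (the likelihood is very flat in the size parameter), not the authors' RM heuristic; the example counts below are made up.

```python
import math

def binom_loglik(counts, n, p):
    """Log-likelihood of i.i.d. Binomial(n, p) observations."""
    return sum(math.log(math.comb(n, k)) + k * math.log(p)
               + (n - k) * math.log(1.0 - p) for k in counts)

def fit_binomial_unknown_n(counts, n_max):
    """Profile likelihood over the integer size n: for each candidate n,
    the MLE of p is mean(counts) / n; keep the (n, p) pair with the
    highest log-likelihood.  The surface is notoriously flat in n,
    which is exactly the instability the abstract points to."""
    kbar = sum(counts) / len(counts)
    best = None
    for n in range(max(counts), n_max + 1):
        p = min(kbar / n, 1.0 - 1e-12)   # clamp away from p = 1
        ll = binom_loglik(counts, n, p)
        if best is None or ll > best[2]:
            best = (n, p, ll)
    return best  # (n_hat, p_hat, loglik)

counts = [40, 52, 47, 55, 45, 50]        # hypothetical per-period demand counts
n_hat, p_hat, _ = fit_binomial_unknown_n(counts, n_max=400)
```

By construction `n_hat * p_hat` always equals the sample mean, so many (n, p) pairs fit almost equally well; that degeneracy is what the functional-form and offer-set-variety assumptions in the paper are meant to break.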
Abstract:
Structural equation models (SEM) are commonly used to analyze the relationship between variables some of which may be latent, such as individual ``attitude'' to and ``behavior'' concerning specific issues. A number of difficulties arise when we want to compare a large number of groups, each with large sample size, and the manifest variables are distinctly non-normally distributed. Using a specific data set, we evaluate the appropriateness of the following alternative SEM approaches: multiple group versus MIMIC models, continuous versus ordinal variables estimation methods, and normal theory versus non-normal estimation methods. The approaches are applied to the ISSP-1993 Environmental data set, with the purpose of exploring variation in the mean level of variables of ``attitude'' to and ``behavior'' concerning environmental issues and their mutual relationship across countries. Issues of both theoretical and practical relevance arise in the course of this application.
Abstract:
This paper considers a job search model where the environment is not stationary along the unemployment spell and where jobs do not last forever. Under this circumstance, reservation wages can be lower than without separations, as in a stationary environment, but they can also be initially higher because of the non-stationarity of the model. Moreover, the time-dependence of reservation wages is stronger than with no separations. The model is estimated structurally using Spanish data for the period 1985-1996. The main finding is that, although the decrease in reservation wages is the main determinant of the change in the exit rate from unemployment for the first four months, later on the only effect comes from the job offer arrival rate, given that acceptance probabilities are roughly equal to one.
Abstract:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provide explicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate, and study in particular the connection between the richness of the class of density estimates and the performance bound. For example, our method allows one to pick the bandwidth and kernel order in the kernel estimate simultaneously and still assure that for {\it all densities}, the $L_1$ error of the corresponding kernel estimate is not larger than about three times the error of the estimate with the optimal smoothing factor and kernel plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variable kernel estimates.
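The quantity being controlled above — the $L_1$ error of a kernel estimate as a function of the smoothing factor — can be made concrete with a toy computation. This sketch evaluates the $L_1$ distance of a Gaussian kernel estimate to a known N(0,1) truth on a grid for several bandwidths; it is purely illustrative, since the point of the paper's method is to select the bandwidth *without* knowing the true density.

```python
import numpy as np

def gaussian_kde(data, grid, h):
    """Gaussian kernel density estimate with bandwidth h, on a grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def l1_error(data, h, grid):
    """Numerical L1 distance between the KDE and the true N(0,1) density."""
    true = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
    est = gaussian_kde(data, grid, h)
    return np.abs(est - true).sum() * (grid[1] - grid[0])

rng = np.random.default_rng(0)
data = rng.standard_normal(500)
grid = np.linspace(-5.0, 5.0, 2001)
errors = {h: l1_error(data, h, grid) for h in (0.05, 0.1, 0.2, 0.4, 0.8)}
h_best = min(errors, key=errors.get)   # interior bandwidths beat the extremes
```

Undersmoothing (h = 0.05) and oversmoothing (h = 0.8) both inflate the $L_1$ error relative to an intermediate bandwidth, which is the trade-off the data-driven selection method is designed to navigate.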
Abstract:
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
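The weight-estimation step described above can be sketched by linear least squares on *squared* distances: stack one row per pair of individuals and solve a nonnegative least-squares problem for the variable weights. This is a simpler surrogate for the paper's majorization algorithm (which fits the distances themselves, not their squares); the data below are synthetic.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import nnls

def fit_variable_weights(X, D):
    """Estimate nonnegative variable weights w so that the weighted
    Euclidean distance sqrt(sum_k w_k * (x_ik - x_jk)**2) approximates
    the target dissimilarity D[i, j].  Solved by NNLS on squared
    distances -- a surrogate for the majorization fit in the paper."""
    pairs = list(combinations(range(X.shape[0]), 2))
    A = np.array([(X[i] - X[j]) ** 2 for i, j in pairs])   # one row per pair
    b = np.array([D[i, j] ** 2 for i, j in pairs])
    w, _ = nnls(A, b)
    return w

# Synthetic check: distances generated with known weights are recovered.
rng = np.random.default_rng(42)
X = rng.standard_normal((10, 3))
w_true = np.array([2.0, 0.5, 1.0])
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((w_true * diff**2).sum(axis=2))
w_hat = fit_variable_weights(X, D)
```

Once the weights are in hand, the biplot itself follows the classical path: apply the singular-value decomposition to the weighted data matrix and display rows and columns on the principal axes.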
Abstract:
PURPOSE: The prognostic impact of complete response (CR) achievement in multiple myeloma (MM) has been shown mostly in the context of autologous stem-cell transplantation. Other levels of response have been defined because, even with high-dose therapy, CR is a relatively rare event. The purpose of this study was to analyze the prognostic impact of very good partial response (VGPR) in patients treated with high-dose therapy. PATIENTS AND METHODS: All patients were included in the Intergroupe Francophone du Myelome 99-02 and 99-04 trials and treated with vincristine, doxorubicin, and dexamethasone (VAD) induction therapy followed by double autologous stem-cell transplantation (ASCT). Best post-ASCT response assessment was available for 802 patients. RESULTS: With a median follow-up of 67 months, median event-free survival (EFS) and 5-year EFS were 42 months and 34%, respectively, for 405 patients who achieved at least VGPR after ASCT versus 32 months and 26% in 288 patients who achieved only partial remission (P = .005). Five-year overall survival (OS) was significantly superior in patients achieving at least VGPR (74% v 61%; P = .0017). In multivariate analysis, achievement of less than VGPR was an independent factor predicting shorter EFS and OS. Response to VAD had no impact on EFS and OS. The impact of VGPR achievement on EFS and OS was significant in patients with International Staging System stages 2 to 3 and for patients with poor-risk cytogenetics t(4;14) or del(17p). CONCLUSION: In the context of ASCT, achievement of at least VGPR is a simple prognostic factor that has importance in intermediate and high-risk MM and can be informative in more patients than CR.
Abstract:
ABSTRACT Biomass is a fundamental measure for understanding the structure and functioning (e.g. fluxes of energy and nutrients in the food chain) of aquatic ecosystems. We aim to provide predictive models to estimate the biomass of Triplectides egleri Sattler, 1963, in a stream in Central Amazonia, based on body and case dimensions. We used body length, head-capsule width, interocular distance and case length and width to derive biomass estimates. Linear, exponential and power regression models were used to assess the relationship between biomass and body or case dimensions. All regression models used in the biomass estimation of T. egleri were significant. The best fit between biomass and body or case dimensions was obtained using the power model, followed by the exponential and linear models. Body length provided the best estimate of biomass. However, the dimensions of sclerotized structures (interocular distance and head-capsule width) also provided good biomass predictions, and may be useful in estimating biomass of preserved and/or damaged material. Case width was the dimension of the case that provided the best estimate of biomass. Despite the weaker relationship, case width may be useful in studies that require low stress on individuals.
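The power model that fit best above is the standard length-mass allometry, mass = a * length^b, usually fitted by ordinary least squares on the log-log scale. A minimal sketch follows; the coefficients and measurements are invented for illustration, not the values fitted for T. egleri in the study.

```python
import numpy as np

def fit_power_model(length, mass):
    """Fit the power model mass = a * length**b by ordinary least
    squares on the log-log scale, the usual allometric fit."""
    b, log_a = np.polyfit(np.log(length), np.log(mass), 1)
    return np.exp(log_a), b

# Hypothetical measurements (coefficients made up for illustration).
length = np.linspace(5.0, 15.0, 20)   # body length, mm
mass = 0.01 * length ** 2.8           # dry mass, mg
a, b = fit_power_model(length, mass)
predicted = a * length ** b
```

The same recipe applies to any of the other predictors mentioned (head-capsule width, interocular distance, case width): substitute the dimension for `length` and refit.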
Abstract:
The package HIERFSTAT for the statistical software R, created by the R Development Core Team, allows the estimation of hierarchical F-statistics from a hierarchy with any number of levels. In addition, it allows testing the statistical significance of population differentiation at these different levels, using a generalized likelihood-ratio test. The package HIERFSTAT is available at http://www.unil.ch/popgen/softwares/hierfstat.htm.
Abstract:
The 2005-2006 (FY06) edition of Iowa Public Library Statistics includes information on income, expenditures, collections, circulation, and other measures, including staff. Each section is arranged by size code, then alphabetically by city. The totals and percentiles for each size code grouping are given immediately following the alphabetical listings. Totals for all reporting libraries are given at the end of each section. There are 542 libraries included in this publication; 10 did not report. The Table of Cities and Size Codes lists the libraries alphabetically and gives their size codes. The table allows a user of this publication to locate information about a specific library. The following table lists the size code designations, the population range in each size code, the number of libraries reporting in each size code, and the total population of the reporting libraries in each size code. The total population of the 542 libraries is 2,243,396. Population data is used to determine per capita figures used throughout the publication.
Abstract:
The 2006-2007 (FY07) edition of Iowa Public Library Statistics includes information on income, expenditures, collections, circulation, and other measures, including staff. Each section is arranged by size code, then alphabetically by city. The totals and percentiles for each size code grouping are given immediately following the alphabetical listings. Totals and medians for all reporting libraries are given at the end of each section. There are 543 libraries included in this publication; 530 submitted a report. The table of size codes (page 6) lists the libraries alphabetically. The libraries in each section of the publication are listed by size code, then alphabetically by city. The following table lists the size code designations, the population range in each size code, the number of libraries reporting in each size code, and the total population of the reporting libraries in each size code. The total population served by the 543 libraries is 2,248,279. Population data is used to determine per capita figures throughout the publication.
Abstract:
Two methods were evaluated for scaling a set of semivariograms into a unified function for kriging estimation of field-measured properties. Scaling is performed using sample variances and sills of individual semivariograms as scale factors. Theoretical developments show that kriging weights are independent of the scaling factor, which appears simply as a constant multiplying both sides of the kriging equations. The scaling techniques were applied to four sets of semivariograms representing spatial scales of 30 x 30 m to 600 x 900 km. Experimental semivariograms in each set were successfully coalesced into a single curve by scaling with the variances and sills of the individual semivariograms. To evaluate the scaling techniques, kriged estimates derived from scaled semivariogram models were compared with those derived from unscaled models. Differences in kriged estimates of the order of 5% were found for the cases in which the scaling technique was not successful in coalescing the individual semivariograms, which also means that the spatial variability of these properties is different. The proposed scaling techniques enhance interpretation of semivariograms when a variety of measurements are made at the same location. They also reduce computational times for kriging estimations because kriging weights only need to be calculated for one variable. Weights remain unchanged for all other variables in the data set whose semivariograms are scaled.
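The variance-scaling idea above can be sketched in a few lines: compute a classical empirical semivariogram and divide it by the sample variance, so that semivariograms of several variables coalesce onto one dimensionless curve with a scaled sill near 1. The sketch below uses synthetic white noise on a 1-D transect (for which the semivariogram is flat at the variance), not the field data of the study.

```python
import numpy as np

def empirical_semivariogram(values, coords, lags, tol=0.5):
    """Classical (Matheron) estimator on 1-D coordinates: half the mean
    squared increment over pairs whose separation falls in each lag bin."""
    n = len(values)
    gamma = np.empty(len(lags))
    for m, lag in enumerate(lags):
        sq = [(values[i] - values[j]) ** 2
              for i in range(n) for j in range(i + 1, n)
              if abs(abs(coords[i] - coords[j]) - lag) <= tol]
        gamma[m] = 0.5 * np.mean(sq)
    return gamma

def scale_semivariogram(gamma, values):
    """Scale by the sample variance so that several variables'
    semivariograms overlay on one dimensionless curve (sill near 1)."""
    return gamma / np.var(values)

rng = np.random.default_rng(1)
values = rng.standard_normal(200)          # synthetic transect data
coords = np.arange(200.0)
gamma = empirical_semivariogram(values, coords, lags=[1.0, 2.0, 3.0])
gamma_scaled = scale_semivariogram(gamma, values)
```

Because the scale factor multiplies both sides of the kriging equations, the kriging weights computed from the scaled model are the same as from the unscaled one, which is what allows one set of weights to serve every variable in the scaled data set.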