824 results for Nonparametric Estimation
Abstract:
Estimation of economic relationships often requires imposition of constraints such as positivity or monotonicity on each observation. Methods to impose such constraints, however, vary depending upon the estimation technique employed. We describe a general methodology to impose (observation-specific) constraints for the class of linear regression estimators using a method known as constraint weighted bootstrapping. While this method has received attention in the nonparametric regression literature, we show how it can be applied for both parametric and nonparametric estimators. A benefit of this method is that imposing numerous constraints simultaneously can be performed seamlessly. We apply this method to Norwegian dairy farm data to estimate both unconstrained and constrained parametric and nonparametric models.
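One way to write down constraint weighted bootstrapping is sketched below. It is a hedged illustration, not necessarily the authors' exact formulation: the symbols $A_i(x)$, $p_i$, $l(x)$ and $u(x)$ are notation assumed here. A linear regression estimator is a weighted sum of the responses, and the method perturbs uniform observation weights as little as possible until the constraints hold at every observation.

```latex
\[
\hat{g}(x) = \sum_{i=1}^{n} A_i(x)\, y_i ,
\qquad
\hat{g}(x \mid p) = n \sum_{i=1}^{n} p_i\, A_i(x)\, y_i ,
\qquad
\hat{g}(x \mid p) = \hat{g}(x) \ \text{when}\ p_i = \tfrac{1}{n}\ \text{for all } i .
\]
\[
\min_{p}\ \sum_{i=1}^{n} \Bigl(p_i - \tfrac{1}{n}\Bigr)^{2}
\quad \text{subject to} \quad
l(x_j) \le \hat{g}(x_j \mid p) \le u(x_j), \quad j = 1, \dots, n,
\qquad
\sum_{i=1}^{n} p_i = 1 .
\]
```

Under these assumptions the objective is quadratic and the constraints are linear in $p$, so the problem is a standard quadratic program. That is consistent with the abstract's remark that many constraints can be imposed at once.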
Abstract:
2000 Mathematics Subject Classification: 62G08, 62P30.
Abstract:
Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. It is robust to outliers, which strongly affect the least squares estimator in linear regression. Instead of modeling the mean of the response, QR models the relationship between quantiles of the response and the covariates. QR can therefore be widely used to solve problems in econometrics, environmental sciences and health sciences. Sample size is an important factor in the planning stage of experimental designs and observational studies. In ordinary linear regression, sample size may be determined from either precision analysis or power analysis using closed-form formulas. There are also methods that calculate sample size for QR based on precision analysis, such as that of C. Jennen-Steinmetz and S. Wellek (2005). A method to estimate sample size for QR based on power analysis was proposed by Shao and Wang (2009). In this paper, a new method is proposed to calculate sample size based on power analysis under hypothesis tests of covariate effects. Even though no error distribution assumption is necessary for QR analysis itself, researchers have to make assumptions about the error distribution and covariate structure in the planning stage of a study to obtain a reasonable estimate of sample size. In this project, both parametric and nonparametric methods are provided to estimate the error distribution. Since the proposed method can be implemented in R, the user can choose either a parametric distribution or nonparametric kernel density estimation for the error distribution. The user also needs to specify the covariate structure and effect size to carry out sample size and power calculations. The performance of the proposed method is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers that are close to the nominal power level, for example, 80%.
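The robustness claim can be illustrated with the check (pinball) loss that quantile regression minimizes. The sketch below is a minimal, intercept-only example in Python rather than the R implementation the abstract mentions. The function names and the data are hypothetical.

```python
def check_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(values, tau):
    """Intercept-only quantile regression.

    The total check loss over a constant is piecewise linear, so a
    minimizer is always attained at one of the data points.
    """
    def total_loss(candidate):
        return sum(check_loss(v - candidate, tau) for v in values)

    return min(sorted(values), key=total_loss)


# Hypothetical responses with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = constant_quantile_fit(y, tau=0.5)
mean_fit = sum(y) / len(y)
print(median_fit, mean_fit)  # 3.0 22.0
```

The outlier drags the least squares fit (the mean) to 22, while the tau = 0.5 fit stays at the median, 3.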
Abstract:
One of the important problems in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, which corresponds to finding structures that do not actually exist in the data, while too little complexity leads to underfitting, meaning that the expressiveness of the model is insufficient to capture all the structures present in the data. For some probabilistic models, model complexity translates into the introduction of one or more hidden variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of hidden variables in a model. This thesis is concerned with Bayesian nonparametric methods for determining the number of hidden variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity adjusts in proportion to the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning hidden-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the hidden components of the model, which we evaluate on two concrete robotics applications. Our results demonstrate that the proposed approach outperforms classical learning approaches in performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model that serves as a prior probability distribution on the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of hidden variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process, for learning graphs and orders. The advantage of the new model is that it allows connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. The evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with a completely arbitrary structure.
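To give a feel for how a Pitman-Yor prior lets the number of mixture components grow with the data, the sketch below simulates the Chinese restaurant seating rule commonly associated with the Pitman-Yor process. It is not the thesis's inference algorithm. The function name, parameter names and parameter values are assumptions for illustration, and the sketch restricts itself to a positive concentration parameter.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Simulate table sizes under the Pitman-Yor Chinese restaurant process.

    With n customers seated at K tables, the next customer joins table k
    with probability (n_k - discount) / (n + concentration) and opens a
    new table with probability
    (concentration + discount * K) / (n + concentration).
    """
    if not (0.0 <= discount < 1.0) or concentration <= 0.0:
        raise ValueError("need 0 <= discount < 1 and concentration > 0")
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for _ in range(n_customers):
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * len(tables))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append(1)      # open a new table (new component)
        else:
            tables[choice] += 1   # join an existing table
    return tables


sizes = pitman_yor_crp(n_customers=200, discount=0.5, concentration=1.0)
print(len(sizes), "components; largest has", max(sizes), "points")
```

Each table plays the role of one Gaussian component. A larger discount yields more small components, which gives the prior its power-law flavor.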
Abstract:
This study aimed to estimate the probability of climatological water deficit in an experimental watershed in the Cerrado biome, located in the central plateau of Brazil. A 31-year time series (1982-2012) was used. The probable climatological water deficit was calculated as the difference between rainfall and probable reference evapotranspiration, on a ten-day (decendial) scale. The reference evapotranspiration (ET0) was estimated by the standard FAO-56 Penman-Monteith method. The water deficit was estimated using the gamma distribution and the time series of rainfall and reference evapotranspiration. The fit of the estimated probabilities to the observed data was verified with the nonparametric Kolmogorov-Smirnov test at a significance level of α = 0.05, which indicated a good fit to the distribution models. A climatological water deficit of greater or lesser intensity was observed between ten-day periods 2 and 32 of the year.
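The goodness-of-fit step can be sketched as follows, using only the Python standard library. This is a rough illustration under stated assumptions. The rainfall values are hypothetical. The gamma parameters are fitted by the method of moments, which the abstract does not specify. The critical value is the common large-sample 5% approximation 1.36/sqrt(n).

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (shape + k)
        total += term
        k += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def fit_gamma_moments(values):
    """Method-of-moments (shape, scale); an assumed fitting choice."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean * mean / var, var / mean


def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(values)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


# Hypothetical ten-day rainfall totals (mm) for one period across years.
rain = [12.0, 30.5, 8.2, 45.1, 22.3, 17.8, 60.4, 5.6, 28.9, 35.0]
shape, scale = fit_gamma_moments(rain)
d = ks_statistic(rain, lambda x: gamma_cdf(x, shape, scale))
critical = 1.36 / math.sqrt(len(rain))  # large-sample 5% approximation
print(f"D = {d:.3f}, approximate critical value = {critical:.3f}")
```

The gamma fit is not rejected at the 5% level when D falls below the critical value.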
Abstract:
The purpose of this study was to correlate the pre-operative imaging, vascularity of the proximal pole, and histology of the proximal pole bone in established scaphoid fracture non-union. This was a prospective non-controlled experimental study. Patients were evaluated pre-operatively for necrosis of the proximal scaphoid fragment by radiography, computed tomography (CT) and magnetic resonance imaging (MRI). Vascular status of the proximal scaphoid was determined intra-operatively by the presence or absence of punctate bone bleeding. Samples were harvested from the proximal scaphoid fragment and sent for pathological examination. We determined the association between the imaging, the intra-operative examination and the histological findings. We evaluated 19 male patients diagnosed with scaphoid nonunion. CT evaluation showed no correlation with necrosis of the proximal scaphoid fragment. MRI showed marked low signal intensity on T1-weighted images, which confirmed the histological diagnosis of necrosis in the proximal scaphoid fragment in all patients. Intra-operative assessment showed that 90% of bones had no punctate bone bleeding, and necrosis was confirmed in these bones by microscopic examination. In scaphoid nonunion, marked low signal intensity on T1-weighted MRI images and the absence of intra-operative punctate bone bleeding are strong indicators of osteonecrosis of the proximal fragment.
Abstract:
We present a computer program developed for estimating penetrance rates in autosomal dominant diseases from the family kinship and phenotype information contained in pedigrees. The program also determines the exact 95% credibility interval for the penetrance estimate. Both executable (PenCalc for Windows) and web (PenCalcWeb) versions of the software are available. The web version enables further calculations, such as heterozygosity probabilities and assessment of offspring risks for all individuals in the pedigrees. Both programs can be accessed and downloaded freely at http://www.ib.usp.br/~otto/software.htm.
Abstract:
It is well known that striation spacing may be related to the crack growth rate, da/dN, through the Paris equation, as well as to the maximum and minimum loads under service loading conditions. These loads define the load ratio, R, and are considered impossible to evaluate from striation spacing analysis alone. This study therefore discusses the methodology proposed by Furukawa to evaluate the maximum and minimum loads based on the experimental fact that the relative height of a striation, H, and the striation spacing, s, are strongly influenced by the load ratio, R. Fatigue tests on C(T) specimens were conducted on SAE 7475-T7351 Al alloy plates at room temperature, and the results showed a straightforward correlation between the parameters H, s, and R. Measurements of striation height, H, were performed using scanning electron microscopy and field emission gun (FEG) after sectioning the specimen at a large inclined angle to amplify the height of the striations. The results showed that the values of H/s tend to increase with increasing R. Correlations between striation height, striation spacing, and load ratio were obtained, which allow service loadings to be estimated from a survey of the fatigue fracture surface.
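The relations this abstract relies on can be written out as below. The Paris equation and the definition of the load ratio are standard. Equating striation spacing with per-cycle crack growth is an idealization assumed here, and the symbols $C$, $m$, $\Delta K$, $K_{\max}$ and $K_{\min}$ do not appear in the source.

```latex
\[
\frac{da}{dN} = C\,(\Delta K)^{m},
\qquad
\Delta K = K_{\max} - K_{\min},
\qquad
R = \frac{K_{\min}}{K_{\max}} = \frac{P_{\min}}{P_{\max}} .
\]
\[
s \approx \frac{da}{dN}
\;\Longrightarrow\;
\Delta K \approx \left(\frac{s}{C}\right)^{1/m},
\qquad
K_{\max} = \frac{\Delta K}{1 - R},
\qquad
K_{\min} = R\,K_{\max} .
\]
```

Spacing alone fixes only the range $\Delta K$. An independent estimate of $R$, here obtained from the ratio $H/s$, is what separates the maximum from the minimum load.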
Abstract:
The aim of this study was to compare REML/BLUP and least squares procedures in the prediction and estimation of genetic parameters and breeding values in soybean progenies. F(2:3) and F(4:5) progenies were evaluated in the 2005/06 growing season, and the F(2:4) and F(4:6) generations derived from them were evaluated in 2006/07. These progenies originated from two semi-early experimental lines that differ in grain yield. The experiments were conducted in a lattice design, and plots consisted of a 2 m row, spaced 0.5 m apart. Grain yield per plot was evaluated. Early selection was found to be more efficient for discriminating the best lines from the F(4) generation onwards. No practical differences were observed between the least squares and REML/BLUP procedures for the models and simplifications for REML/BLUP used here.
Abstract:
This paper presents a new statistical algorithm to estimate rainfall over the Amazon Basin region using the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI). The algorithm relies on empirical relationships derived for different raining-type systems between coincident measurements of surface rainfall rate and 85-GHz polarization-corrected brightness temperature as observed by the precipitation radar (PR) and TMI on board the TRMM satellite. The scheme includes rain/no-rain area delineation (screening) and system-type classification routines for rain retrieval. The algorithm is validated against independent measurements of the TRMM-PR and S-band dual-polarization Doppler radar (S-Pol) surface rainfall data for two different periods. Moreover, the performance of this rainfall estimation technique is evaluated against well-known methods, namely, the TRMM-2A12 [the Goddard profiling algorithm (GPROF)], the Goddard scattering algorithm (GSCAT), and the National Environmental Satellite, Data, and Information Service (NESDIS) algorithms. The proposed algorithm shows a normalized bias of approximately 23% for both PR and S-Pol ground truth datasets and a mean error of 0.244 mm h^-1 (PR) and -0.157 mm h^-1 (S-Pol). For rain volume estimates using PR as reference, a correlation coefficient of 0.939 and a normalized bias of 0.039 were found. With respect to rainfall distributions and rain area comparisons, the results showed that the proposed formulation is efficient and compatible with the physics and dynamics of the observed systems over the area of interest. The performance of the other algorithms showed that GSCAT presented low normalized bias for rain areas and rain volume [0.346 (PR) and 0.361 (S-Pol)], and GPROF showed a rainfall distribution similar to that of the PR and S-Pol but with a bimodal distribution.
Last, the five algorithms were evaluated during the TRMM-Large-Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) 1999 field campaign to verify the precipitation characteristics observed during the easterly and westerly Amazon wind flow regimes. The proposed algorithm presented a cumulative rainfall distribution similar to the observations during the easterly regime, but it underestimated rainfall during the westerly period for rainfall rates above 5 mm h^-1. NESDIS(1) overestimated rainfall in both wind regimes but gave the best representation of the westerly regime. NESDIS(2), GSCAT, and GPROF underestimated rainfall in both regimes, but GPROF was closer to the observations during the easterly flow.
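The validation statistics quoted in this abstract can be computed along the following lines. This is a sketch only. The abstract does not define its normalized bias, so the definition below (total difference relative to the reference total) is an assumption, and the rain rates are hypothetical.

```python
import math


def mean_error(estimates, reference):
    """Average signed difference, estimate minus reference."""
    return sum(e - r for e, r in zip(estimates, reference)) / len(reference)


def normalized_bias(estimates, reference):
    """Assumed definition: total difference relative to the reference total."""
    return (sum(estimates) - sum(reference)) / sum(reference)


def correlation(estimates, reference):
    """Pearson correlation coefficient."""
    n = len(reference)
    mean_e = sum(estimates) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimates, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_e * var_r)


# Hypothetical coincident rain rates (mm/h): retrieval vs. radar reference.
retrieved = [0.8, 2.1, 5.4, 0.0, 3.3]
radar = [1.0, 2.0, 6.0, 0.2, 3.0]
print(mean_error(retrieved, radar),
      normalized_bias(retrieved, radar),
      correlation(retrieved, radar))
```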
Abstract:
The reverse engineering problem addressed in the present research consists of estimating the thicknesses and the optical constants of two thin films deposited on a transparent substrate using only transmittance data through the whole stack. No functional dispersion relation assumptions are made on the complex refractive index. Instead, minimal physical constraints are employed, as in previous works of some of the authors where only one film was considered in the retrieval algorithm. To our knowledge this is the first report on the retrieval of the optical constants and the thickness of multiple film structures using only transmittance data that does not make use of dispersion relations. The same methodology may be used if the available data correspond to normal reflectance. The software used in this work is freely available through the PUMA Project web page (http://www.ime.usp.br/~egbirgin/puma/). (C) 2008 Optical Society of America
Abstract:
We consider the problem of interaction neighborhood estimation from the partial observation of a finite number of realizations of a random field. We introduce a model selection rule to choose estimators of conditional probabilities among natural candidates. Our main result is an oracle inequality satisfied by the resulting estimator. We then use this selection rule in a two-step procedure to evaluate the interacting neighborhoods. The selection rule selects a small prior set of possible interacting points, and a cutting step removes the irrelevant points from this prior set. We also prove that the Ising models satisfy the assumptions of the main theorems, without restrictions on the temperature, on the structure of the interacting graph or on the range of the interactions. They therefore provide a large class of applications for our results. We give a computationally efficient procedure for these models. Finally, we show the practical efficiency of our approach in a simulation study.
Abstract:
Objective. - The aim of this study was to propose a new method that allows for the estimation of critical power (CP) from non-exhaustive tests using ratings of perceived exertion (RPE). Methods. - Twenty-two subjects underwent two practice trials for familiarization with the ergometer and the Borg 15-point scale, and for adaptation to severe exhaustive exercise. Afterwards, four exercise bouts were performed on different days for the estimation of CP and anaerobic work capacity (AWC) by the linear work-time equation, and CP(15), CP(17), AWC(15) and AWC(17) were estimated using the work and time to attainment of RPE15 and RPE17 on the Borg 15-point scale. Results. - The CP, CP(15) and CP(17) (170-177 W) were not significantly different (P>0.05). However, AWC, AWC(15) and AWC(17) were all different from each other. The correlations of CP(15) and CP(17) with CP were strong (R=0.871 and 0.911, respectively), but AWC(15) and AWC(17) were not significantly correlated with AWC. Conclusion. - Sub-maximal RPE responses can be used for the estimation of CP from non-exhaustive exercise protocols. (C) 2009 Elsevier Masson SAS. All rights reserved.
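The linear work-time equation, W = AWC + CP * t, is fitted by regressing total work on time: the slope estimates CP and the intercept estimates AWC. The sketch below shows that fit with ordinary least squares. The function name and the four bouts are hypothetical, not values from the study.

```python
def fit_work_time(times_s, work_j):
    """Least squares fit of W = AWC + CP * t.

    Returns (cp_watts, awc_joules): slope and intercept of work on time.
    """
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_w = sum(work_j) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in zip(times_s, work_j))
    cp = sxy / sxx
    awc = mean_w - cp * mean_t
    return cp, awc


# Four hypothetical bouts: time to the chosen endpoint (s) and work done (J).
times = [180.0, 300.0, 480.0, 720.0]
work = [51000.0, 73000.0, 103500.0, 146500.0]
cp, awc = fit_work_time(times, work)
print(f"CP = {cp:.1f} W, AWC = {awc:.0f} J")
```

The same fit applies whether the endpoint of each bout is exhaustion or attainment of a given RPE. Only the recorded time and work change.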
Abstract:
Fourier transform near infrared (FT-NIR) spectroscopy was evaluated as an analytical tool for monitoring residual lignin, kappa number and hexenuronic acids (HexA) content in kraft pulps of Eucalyptus globulus. Sets of pulp samples were prepared under different cooking conditions to obtain a wide range of compound concentrations, which were characterised by conventional wet chemistry analytical methods. The sample group was also analysed using FT-NIR spectroscopy in order to establish prediction models for the pulp characteristics. Several models were applied to correlate the chemical composition of the samples with the NIR spectral data by means of PCR or PLS algorithms. Calibration curves were built using either all the spectral data or selected regions. The best calibration models for the quantification of lignin, kappa and HexA presented R^2 values of 0.99. Calibration models were used to predict pulp titers of 20 external samples in a validation set. The lignin concentration and kappa number, in the ranges of 1.4-18% and 8-62, respectively, were predicted fairly accurately (standard error of prediction, SEP, 1.1% for lignin and 2.9 for kappa). The HexA concentration (range of 5-71 mmol kg^-1 pulp) was more difficult to predict: the SEP was 7.0 mmol kg^-1 pulp in a model of HexA quantified by an ultraviolet (UV) technique and 6.1 mmol kg^-1 pulp in a model of HexA quantified by anion-exchange chromatography (AEC). Even among the wet chemical procedures used for HexA determination, there is no good agreement between methods, as demonstrated by the UV and AEC methods described in the present work. NIR spectroscopy did provide a rapid estimate of HexA content in kraft pulps prepared in routine cooking experiments.
Abstract:
The crossflow filtration process differs from conventional filtration in that the circulation flow is tangential to the filtration surface. The conventional mathematical models used to represent the process have some limitations in relation to the identification and generalization of the system behaviour. In this paper, a system based on artificial neural networks is developed to overcome the problems usually found in the conventional mathematical models. More specifically, the developed system uses an artificial neural network that simulates the behaviour of the crossflow filtration process in a robust way. Imprecisions and uncertainties associated with the measurements made on the system are automatically incorporated in the neural approach. Simulation results are presented to justify the validity of the proposed approach. (C) 2007 Elsevier B.V. All rights reserved.