74 results for Jackknife
Abstract:
One of the fundamental econometric models in finance is predictive regression. The standard least squares method produces biased coefficient estimates when the regressor is persistent and its innovations are correlated with those of the dependent variable. This article proposes a general and convenient method based on the jackknife technique to tackle the estimation problem. The proposed method reduces the bias for both single- and multiple-regressor models and for both short- and long-horizon regressions. The effectiveness of the proposed method is demonstrated by simulations. An empirical application to equity premium prediction using the dividend yield and the short rate highlights the differences between the results by the standard approach and those by the bias-reduced estimator. The significant predictive variables under the ordinary least squares become insignificant after adjusting for the finite-sample bias. These discrepancies suggest that bias reduction in predictive regressions is important in practical applications.
Abstract:
The methodology based on empirical best linear unbiased prediction, known by the acronym EBLUP, is widely used to estimate parameters for small domains. Although EBLUPs are relatively easy to derive, even under a longitudinal model, measuring their quality is a complex problem because the mean squared prediction error (MSPE) of such predictors is difficult to estimate. This work uses an estimator of the parameters of interest in small domains assisted by the Rao-Yu temporal model (Rao and Yu, 1994). The temporal EBLUP is presented, and the asymptotic analytical approximation of the MSPE of the temporal EBLUP proposed by Rao and Yu (1994) is revisited. Under the Rao-Yu model, a weighted jackknife methodology is proposed for estimating the MSPE of the EBLUP, building on the work of Chen and Lahiri (2008). A simulation study was carried out to compare the performance of the proposed estimator with that obtained from the analytical approximation of the MSPE.
Abstract:
The jackknife method is often used for variance estimation in sample surveys but has only been developed for a limited class of sampling designs. We propose a jackknife variance estimator which is defined for any without-replacement unequal probability sampling design. We demonstrate design consistency of this estimator for a broad class of point estimators. A Monte Carlo study shows how the proposed estimator may improve on existing estimators.
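For orientation, the classical delete-one jackknife variance estimator that this line of work generalises can be sketched as follows. This is a minimal illustration that assumes equal-probability sampling and a negligible sampling fraction; it is not the unequal-probability estimator proposed in the abstract, and the function name and sample values are hypothetical.

```python
from statistics import mean


def jackknife_variance(data, estimator):
    """Delete-one jackknife variance of estimator(data).

    Assumes equal-probability sampling with a negligible sampling fraction.
    """
    n = len(data)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    # Scale the spread of the leave-one-out estimates by (n - 1) / n.
    return (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)


# Hypothetical sample values, for illustration only.
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
var_of_mean = jackknife_variance(sample, mean)
```

For the sample mean, this reduces to the familiar s²/n, which makes it a convenient sanity check before applying the function to a nonlinear estimator such as a ratio.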
Abstract:
Imputation is commonly used to compensate for item non-response in sample surveys. If we treat the imputed values as if they are true values, and then compute the variance estimates by using standard methods, such as the jackknife, we can seriously underestimate the true variances. We propose a modified jackknife variance estimator which is defined for any without-replacement unequal probability sampling design in the presence of imputation and non-negligible sampling fraction. Mean, ratio and random-imputation methods will be considered. The practical advantage of the method proposed is its breadth of applicability.
Abstract:
Few studies have evaluated the reliability of lifetime sun exposure estimated from inquiring about the number of hours people spent outdoors in a given period on a typical weekday or weekend day (the time-based approach). Some investigations have suggested that women have a particularly difficult task in estimating time outdoors in adulthood due to their family and occupational roles. We hypothesized that people might gain additional memory cues and estimate lifetime hours spent outdoors more reliably if asked about time spent outdoors according to specific activities (an activity-based approach). Using self-administered, mailed questionnaires, test-retest responses to time-based and to activity-based approaches were evaluated in 124 volunteer radiologic technologist participants from the United States: 64 females and 60 males 48 to 80 years of age. Intraclass correlation coefficients (ICC) were used to evaluate the test-retest reliability of average number of hours spent outdoors in the summer estimated for each approach. We tested the differences between the two ICCs, corresponding to each approach, using a t test with the variance of the difference estimated by the jackknife method. During childhood and adolescence, the two approaches gave similar ICCs for average numbers of hours spent outdoors in the summer. By contrast, compared with the time-based approach, the activity-based approach showed significantly higher ICCs during adult ages (0.69 versus 0.43, P = 0.003) and over the lifetime (0.69 versus 0.52, P = 0.05); the higher ICCs for the activity-based questionnaire were primarily derived from the results for females. Research is needed to further improve the activity-based questionnaire approach for long-term sun exposure assessment. (Cancer Epidemiol Biomarkers Prev 2009;18(2):464–71)
Rainfall variability drives interannual variation in N2O emissions from a humid, subtropical pasture
Abstract:
Variations in interannual rainfall totals can lead to large uncertainties in annual N2O emission budget estimates from short term field studies. The interannual variation in nitrous oxide (N2O) emissions from a subtropical pasture in Queensland, Australia, was examined using continuous measurements of automated chambers over 2 consecutive years. Nitrous oxide emissions were highest during the summer months and were highly episodic, related more to the size and distribution of rain events than soil water content. Over 48% of the total N2O emitted was lost in just 16% of measurement days. Interannual variation in annual N2O estimates was high, with cumulative emissions increasing with decreasing rainfall. Cumulative emissions averaged 1826.7 ± 199.9 g N2O-N ha−1 yr−1 over the two year period, though emissions from 2008 (2148 ± 273 g N2O-N ha−1 yr−1) were 42% higher than 2007 (1504 ± 126 g N2O-N ha−1 yr−1). This increase in annual emissions coincided with summer precipitation almost halving from 2007 to 2008. Emissions dynamics were chiefly driven by the distribution and size of rain events which varied on a seasonal and annual basis. Sampling frequency effects on cumulative N2O flux estimation were assessed using a jackknife technique to inform future manual sampling campaigns. Test subsets of the daily measured data were generated for the pasture and two adjacent land-uses (rainforest and lychee orchard) by selecting measured flux values at regular time intervals ranging from 1 to 30 days. Errors associated with weekly sampling were up to 34% of the sub-daily mean and were highly biased towards overestimation if strategically sampled following rain events. Sampling time of day also played a critical role. Morning sampling best represented the 24 hour mean in the pasture, whereas sampling at noon proved the most accurate in the shaded rainforest and lychee orchard.
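The sampling-frequency test described above, which selects measured fluxes at regular intervals and compares the result with the full record, might look roughly like the sketch below. The function name, the flux values and the percent-error definition are assumptions made for illustration; the abstract does not give the exact procedure.

```python
def interval_sampling_errors(daily_flux, interval):
    """Percent error of the mean flux for each possible start day.

    Assumes 1 <= interval <= len(daily_flux).
    """
    full_mean = sum(daily_flux) / len(daily_flux)
    errors = []
    for start in range(interval):
        subset = daily_flux[start::interval]  # every `interval`-th day
        subset_mean = sum(subset) / len(subset)
        errors.append(100.0 * (subset_mean - full_mean) / full_mean)
    return errors


# Hypothetical daily fluxes with two rain-driven pulses (arbitrary units).
flux = [1.0, 1.0, 9.0, 2.0, 1.0, 1.0, 1.0,
        1.0, 12.0, 3.0, 1.0, 1.0, 1.0, 1.0]
weekly = interval_sampling_errors(flux, 7)
worst_weekly_error = max(abs(e) for e in weekly)
```

Looping over every start day shows how strongly the estimate depends on whether a sampling day happens to fall on an emission pulse, which is the bias the abstract warns about for sampling timed to follow rain events.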
Abstract:
Membrane proteins play important roles in many biochemical processes and are also attractive targets of drug discovery for various diseases. The elucidation of membrane protein types provides clues for understanding the structure and function of proteins. Recently we developed a novel system for predicting protein subnuclear localizations. In this paper, we propose a simplified version of our system for predicting membrane protein types directly from primary protein structures, which incorporates amino acid classifications and physicochemical properties into a general form of pseudo-amino acid composition. In this simplified system, we will design a two-stage multi-class support vector machine combined with a two-step optimal feature selection process, which proves very effective in our experiments. The performance of the present method is evaluated on two benchmark datasets consisting of five types of membrane proteins. The overall accuracies of prediction for five types are 93.25% and 96.61% via the jackknife test and independent dataset test, respectively. These results indicate that our method is effective and valuable for predicting membrane protein types. A web server for the proposed method is available at http://www.juemengt.com/jcc/memty_page.php
Abstract:
In this paper, we aim at predicting protein structural classes for low-homology data sets based on predicted secondary structures. We propose a new and simple kernel method, named SSEAKSVM, to predict protein structural classes. The secondary structures of all protein sequences are obtained by using the tool PSIPRED, and then a linear kernel based on secondary structure element alignment scores is constructed for training a support vector machine classifier without parameter adjustment. Our method SSEAKSVM was evaluated on two low-homology datasets, 25PDB and 1189, with sequence homology of 25% and 40%, respectively. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies on these two data sets are 86.3% and 84.5%, respectively, which are higher than those obtained by other existing methods. In particular, our method achieves higher accuracies (88.1% and 88.5%) for differentiating the α + β class and the α/β class compared to other methods. This suggests that our method is valuable for predicting protein structural classes, particularly for low-homology protein sequences. The source code of the method in this paper can be downloaded at http://math.xtu.edu.cn/myphp/math/research/source/SSEAK_source_code.rar.
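In this literature, the "jackknife test" usually means leave-one-out cross-validation: each sequence is held out in turn, the classifier is trained on the rest, and the held-out sequence is then predicted. The sketch below illustrates that loop with a toy nearest-neighbour classifier standing in for the support vector machine; the classifier, the feature vectors and the labels are all hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Toy stand-in classifier: label of the closest training point."""
    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return train_y[best]


def jackknife_accuracy(features, labels, predict):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical two-class feature vectors, for illustration only.
features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
accuracy = jackknife_accuracy(features, labels, nearest_neighbor_predict)
```

Because the classifier is passed in as a function, the same loop applies to any model that can be retrained on n − 1 samples.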
Abstract:
1 Species-accumulation curves for woody plants were calculated in three tropical forests, based on fully mapped 50-ha plots in wet, old-growth forest in Peninsular Malaysia, in moist, old-growth forest in central Panama, and in dry, previously logged forest in southern India. A total of 610 000 stems were identified to species and mapped to < 1 m accuracy. Mean species number and stem number were calculated in quadrats as small as 5 m x 5 m to as large as 1000 m x 500 m, for a variety of stem sizes above 10 mm in diameter. Species-area curves were generated by plotting species number as a function of quadrat size; species-individual curves were generated from the same data, but using stem number as the independent variable rather than area. 2 Species-area curves had different forms for stems of different diameters, but species-individual curves were nearly independent of diameter class. With < 10^4 stems, species-individual curves were concave downward on log-log plots, with curves from different forests diverging, but beyond about 10^4 stems, the log-log curves became nearly linear, with all three sites having a similar slope. This indicates an asymptotic difference in richness between forests: the Malaysian site had 2.7 times as many species as Panama, which in turn was 3.3 times as rich as India. 3 Other details of the species-accumulation relationship were remarkably similar between the three sites. Rectangular quadrats had 5-27% more species than square quadrats of the same area, with longer and narrower quadrats increasingly diverse. Random samples of stems drawn from the entire 50 ha had 10-30% more species than square quadrats with the same number of stems. At both Pasoh and BCI, but not Mudumalai, species richness was slightly higher among intermediate-sized stems (50-100 mm in diameter) than in either smaller or larger sizes. These patterns reflect aggregated distributions of individual species, plus weak density-dependent forces that tend to smooth the species abundance distribution and 'loosen' aggregations as stems grow. 4 The results provide support for the view that within each tree community, many species have their abundance and distribution guided more by random drift than deterministic interactions. The drift model predicts that the species-accumulation curve will have a declining slope on a log-log plot, reaching a slope of 0.1 in about 50 ha. No other model of community structure can make such a precise prediction. 5 The results demonstrate that diversity studies based on different stem diameters can be compared by sampling identical numbers of stems. Moreover, they indicate that stem counts < 1000 in tropical forests will underestimate the percentage difference in species richness between two diverse sites. Fortunately, standard diversity indices (Fisher's α, Shannon-Wiener) captured diversity differences in small stem samples more effectively than raw species richness, but both were sample size dependent. Two nonparametric richness estimators (Chao, jackknife) performed poorly, greatly underestimating true species richness.
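The two nonparametric richness estimators named at the end of this abstract are commonly computed as in the sketch below. The abstract does not say which variants were used, so the abundance-based Chao1 form and the incidence-based first-order jackknife are assumptions, as are the function names and the example data.

```python
from collections import Counter


def chao1(abundances):
    """Chao1 richness estimate from per-species stem counts."""
    s_obs = sum(1 for a in abundances if a > 0)
    f1 = sum(1 for a in abundances if a == 1)  # singletons
    f2 = sum(1 for a in abundances if a == 2)  # doubletons
    if f2 == 0:
        # Bias-corrected form, used when there are no doubletons.
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


def jackknife1(quadrats):
    """First-order jackknife richness from species lists per quadrat."""
    m = len(quadrats)
    occurrences = Counter(sp for q in quadrats for sp in set(q))
    uniques = sum(1 for c in occurrences.values() if c == 1)
    return len(occurrences) + uniques * (m - 1) / m


# Hypothetical data, for illustration only.
chao_estimate = chao1([10, 5, 2, 2, 1, 1, 1])
jack_estimate = jackknife1([{"a", "b"}, {"a", "c"}, {"a", "b", "d"}])
```

Both estimators add a correction driven by the rarest species in the sample, so they stay close to observed richness when few rare species have been detected.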
Abstract:
The paper studies stochastic approximation as a technique for bias reduction. The proposed method does not require approximating the bias explicitly, nor does it rely on having independent identically distributed (i.i.d.) data. The method always removes the leading bias term, under very mild conditions, as long as auxiliary samples from distributions with given parameters are available. Expectation and variance of the bias-corrected estimate are given. Examples in sequential clinical trials (non-i.i.d. case), curved exponential models (i.i.d. case) and length-biased sampling (where the estimates are inconsistent) are used to illustrate the applications of the proposed method and its small sample properties.
Abstract:
This thesis studies the diversity of juvenile trees in cacao (Theobroma cacao L.) agroforestry and in primary forest within the conservation forest of Lore Lindu National Park, Sulawesi, Indonesia. The composition of adult trees in Lore Lindu National Park is relatively well studied, but less is known about tree species diversity in seedling communities, particularly in frequently disturbed cacao agroforestry fields. Cacao production poses a potentially serious threat to keeping the conservation areas of Sulawesi pristine and forested. The impacts of cacao production on the natural environment are directly linked to the diversity and abundance of the shade trees used. The study compares the species composition of cacao agroforestry and the surrounding natural forest in the seedling and sapling size categories. The study was carried out in two parts: a biodiversity inventory of seedlings and saplings was combined with a social survey based on farmer interviews. The aim of the survey was to gain knowledge of the cacao fields and of farmers' observations and choices regarding tree species associated with cacao. Data were collected in summer 2008. The impact of environmental factors (solar radiation, weeding frequency, cacao tree planting density, distance to forest, distance to the main park road, and type of habitat) on seedling and sapling composition was assessed with Non-metric Multidimensional Scaling (NMS). Outlier analysis was used to assess distorting variables for NMS, and Multi-Response Permutation Procedures (MRPP) analysis was used to differentiate the impact of categorical variables. Sampling success was estimated with rarefaction curves and the jackknife estimate of species richness. The inventory found 135 species of trees and shrubs. Only a few agroforestry-related species were dominant. Sapling communities in forest habitat were the most species rich.
NMS showed generally low linear correlation between variation in species composition and the environmental variables. Solar radiation was the most significant explanatory variable. Cacao and forest habitats were the most clearly separated in the ordination. The results of the seedling and sapling inventory coincided only partly with farmers' knowledge of the tree species occurring in their fields. More research with frequent assessment of seedling cohorts is needed because of the natural variability of cohorts and the high mortality rate of seedlings.
Abstract:
The relationship between site characteristics and understorey vegetation composition was analysed with quantitative methods, especially from the viewpoint of site quality estimation. Theoretical models were applied to an empirical data set collected from the upland forests of southern Finland comprising 104 sites dominated by Scots pine (Pinus sylvestris L.), and 165 sites dominated by Norway spruce (Picea abies (L.) Karsten). Site index H100 was used as an independent measure of site quality. A new model for the estimation of site quality at sites with a known understorey vegetation composition was introduced. It is based on the application of Bayes' theorem to the density function of site quality within the study area combined with the species-specific presence-absence response curves. The resulting posterior probability density function may be used for calculating an estimate for the site variable. Using this method, a jackknife estimate of site index H100 was calculated separately for pine- and spruce-dominated sites. The results indicated that the cross-validation root mean squared error (RMSEcv) of the estimates improved from 2.98 m down to 2.34 m relative to the "null" model (standard deviation of the sample distribution) in pine-dominated forests. In spruce-dominated forests RMSEcv decreased from 3.94 m down to 3.16 m. In order to assess these results, four other estimation methods based on understorey vegetation composition were applied to the same data set. The results showed that none of the methods was clearly superior to the others. In pine-dominated forests, RMSEcv varied between 2.34 and 2.47 m, and the corresponding range for spruce-dominated forests was from 3.13 to 3.57 m.
Abstract:
Applying Tukey's jackknife method to MSY estimates from the surplus production models of Schaefer and Fox showed that the optimum yield for shrimps in the industrial fishery in Sierra Leone is estimated at 2,686.8 t with 15,822 fishing days. The annual catch for 1996 was 2,788 t, indicating an escalation in exploitation which, if prolonged, could bring reduced productivity as experienced in the fishery some years ago.
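As a rough illustration of how Tukey's jackknife can be applied to an MSY estimate, the sketch below fits the linear Schaefer relation between catch per unit effort (CPUE) and effort, computes MSY = −a²/(4b) from the intercept a and slope b, and then forms jackknife pseudo-values by leaving out one year at a time. The data are hypothetical and the function names are illustrative; the study's actual fitting procedure is not described in the abstract.

```python
from math import sqrt
from statistics import mean, variance


def schaefer_msy(effort, cpue):
    """MSY from a least-squares fit of CPUE = a + b * effort."""
    f_bar, u_bar = mean(effort), mean(cpue)
    slope = (sum((f - f_bar) * (u - u_bar) for f, u in zip(effort, cpue))
             / sum((f - f_bar) ** 2 for f in effort))
    intercept = u_bar - slope * f_bar
    return -intercept ** 2 / (4 * slope)


def tukey_jackknife(effort, cpue, estimator):
    """Jackknife estimate and standard error from Tukey's pseudo-values."""
    n = len(effort)
    full = estimator(effort, cpue)
    pseudo = []
    for i in range(n):
        # Refit with year i removed, then form the pseudo-value.
        reduced = estimator(effort[:i] + effort[i + 1:],
                            cpue[:i] + cpue[i + 1:])
        pseudo.append(n * full - (n - 1) * reduced)
    return mean(pseudo), sqrt(variance(pseudo) / n)


# Hypothetical effort (fishing days) and CPUE (t per day) series.
effort = [9000.0, 11000.0, 13000.0, 15000.0, 17000.0, 19000.0]
cpue = [0.260, 0.232, 0.205, 0.181, 0.150, 0.124]
msy_jack, msy_se = tukey_jackknife(effort, cpue, schaefer_msy)
```

The mean of the pseudo-values gives the bias-reduced MSY estimate, and their spread gives a standard error that does not rely on distributional assumptions about the fitted parameters.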
Abstract:
The primary objective of this study was to predict the distribution of mesophotic hard corals in the Au‘au Channel in the Main Hawaiian Islands (MHI). Mesophotic hard corals are light-dependent corals adapted to the low light conditions at approximately 30 to 150 m in depth. Several physical factors potentially influence their spatial distribution, including aragonite saturation, alkalinity, pH, currents, water temperature, hard substrate availability and the availability of light at depth. Mesophotic corals and mesophotic coral ecosystems (MCEs) have increasingly been the subject of scientific study because they are being threatened by a growing number of anthropogenic stressors. They are the focus of this spatial modeling effort because the Hawaiian Islands Humpback Whale National Marine Sanctuary (HIHWNMS) is exploring the expansion of its scope—beyond the protection of the North Pacific Humpback Whale (Megaptera novaeangliae)—to include the conservation and management of these ecosystem components. The present study helps to address this need by examining the distribution of mesophotic corals in the Au‘au Channel region. This area is located between the islands of Maui, Lanai, Molokai and Kahoolawe, and includes parts of the Kealaikahiki, Alalākeiki and Kalohi Channels. It is unique, not only in terms of its geology, but also in terms of its physical oceanography and local weather patterns. Several physical conditions make it an ideal place for mesophotic hard corals, including consistently good water quality and clarity because it is flushed by tidal currents semi-diurnally; it has low amounts of rainfall and sediment run-off from the nearby land; and it is largely protected from seasonally strong wind and wave energy. Combined, these oceanographic and weather conditions create patches of comparatively warm, calm, clear waters that remain relatively stable through time. 
Freely available Maximum Entropy modeling software (MaxEnt 3.3.3e) was used to create four separate maps of predicted habitat suitability for: (1) all mesophotic hard corals combined, (2) Leptoseris, (3) Montipora and (4) Porites genera. MaxEnt works by analyzing the distribution of environmental variables where species are present, so it can find other areas that meet all of the same environmental constraints. Several steps (Figure 0.1) were required to produce and validate four ensemble predictive models (i.e., models with 10 replicates each). Approximately 2,000 georeferenced records containing information about mesophotic coral occurrence and 34 environmental predictors describing the seafloor’s depth, vertical structure, available light, surface temperature, currents and distance from shoreline at three spatial scales were used to train MaxEnt. Fifty percent of the 1,989 records were randomly chosen and set aside to assess each model replicate’s performance using Receiver Operating Characteristic (ROC), Area Under the Curve (AUC) values. An additional 1,646 records were also randomly chosen and set aside to independently assess the predictive accuracy of the four ensemble models. Suitability thresholds for these models (denoting where corals were predicted to be present/absent) were chosen by finding where the maximum number of correctly predicted presence and absence records intersected on each ROC curve. Permutation importance and jackknife analysis were used to quantify the contribution of each environmental variable to the four ensemble models.