931 resultados para Bootstrap weights approach
Resumo:
The motivation of the study stems from the results reported in the Excellence in Research for Australia (ERA) 2010 report. The report showed that only 12 universities performed research at or above international standards, of which, the Group of Eight (G8) universities filled the top eight spots. While performance of universities was based on number of research outputs, total amount of research income and other quantitative indicators, the measure of efficiency or productivity was not considered. The objectives of this paper are twofold. First, to provide a review of the research performance of 37 Australian universities using the data envelopment analysis (DEA) bootstrap approach of Simar and Wilson (2007). Second, to determine sources of productivity drivers by regressing the efficiency scores against a set of environmental variables.
Resumo:
In this work, we investigate an alternative bootstrap approach based on a result of Ramsey [F.L. Ramsey, Characterization of the partial autocorrelation function, Ann. Statist. 2 (1974), pp. 1296-1301] and on the Durbin-Levinson algorithm to obtain a surrogate series from linear Gaussian processes with long range dependence. We compare this bootstrap method with other existing procedures in a wide Monte Carlo experiment by estimating, parametrically and semi-parametrically, the memory parameter d. We consider Gaussian and non-Gaussian processes to prove the robustness of the method to deviations from normality. The approach is also useful to estimate confidence intervals for the memory parameter d by improving the coverage level of the interval.
Resumo:
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A3 √((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
Resumo:
Between 2001 and 2005, the US airline industry faced financial turmoil. At the same time, the European airline industry entered a period of substantive deregulation. This period witnessed opportunities for low-cost carriers to become more competitive in the market as a result of these combined events. To help assess airline performance in the aftermath of these events, this paper provides new evidence of technical efficiency for 42 national and international airlines in 2006 using the data envelopment analysis (DEA) bootstrap approach first proposed by Simar and Wilson (J Econ, 136:31-64, 2007). In the first stage, technical efficiency scores are estimated using a bootstrap DEA model. In the second stage, a truncated regression is employed to quantify the economic drivers underlying measured technical efficiency. The results highlight the key role played by non-discretionary inputs in measures of airline technical efficiency.
Resumo:
This paper seeks to explain the lagging productivity in Singapore’s manufacturing noted in the statements of the Economic Strategies Committee Report 2010. Two methods are employed: the Malmquist productivity to measure total factor productivity change and Simar and Wilson’s (J Econ, 136:31–64, 2007) bootstrapped truncated regression approach. In the first stage, the nonparametric data envelopment analysis is used to measure technical efficiency. To quantify the economic drivers underlying inefficiencies, the second stage employs a bootstrapped truncated regression whereby bias-corrected efficiency estimates are regressed against explanatory variables. The findings reveal that growth in total factor productivity was attributed to efficiency change with no technical progress. Most industries were technically inefficient throughout the period except for ‘Pharmaceutical Products’. Sources of efficiency were attributed to quality of worker and flexible work arrangements while incessant use of foreign workers lowered efficiency.
Resumo:
Recent literature has argued that environmental efficiency (EE), which is built on the materials balance (MB) principle, is more suitable than other EE measures in situations where the law of mass conversation regulates production processes. In addition, the MB-based EE method is particularly useful in analysing possible trade-offs between cost and environmental performance. Identifying determinants of MB-based EE can provide useful information to decision makers but there are very few empirical investigations into this issue. This article proposes the use of data envelopment analysis and stochastic frontier analysis techniques to analyse variation in MB-based EE. Specifically, the article develops a stochastic nutrient frontier and nutrient inefficiency model to analyse determinants of MB-based EE. The empirical study applies both techniques to investigate MB-based EE of 96 rice farms in South Korea. The size of land, fertiliser consumption intensity, cost allocative efficiency, and the share of owned land out of total land are found to be correlated with MB-based EE. The results confirm the presence of a trade-off between MB-based EE and cost allocative efficiency and this finding, favouring policy interventions to help farms simultaneously achieve cost efficiency and MP-based EE.
Resumo:
This paper seeks to explain the lagging productivity in Singapore’s manufacturing noted in the statements of the Economic Strategies Committee Report 2010. Two methods are employed: the Malmquist productivity to measure total factor productivity (TFP) change and Simar and Wilson’s (2007) bootstrapped truncated regression approach which first derives bias-corrected efficiency estimates before being regressed against explanatory variables to help quantify sources of inefficiencies. The findings reveal that growth in total factor productivity was attributed to efficiency change with no technical progress. Sources of efficiency were attributed to quality of worker and flexible work arrangements while the use of foreign workers lowered efficiency.
Resumo:
Between 2001 and 2005, the US airline industry faced financial turmoil while the European airline industry entered a period of substantive deregulation. Consequently, this opened up opportunities for low-cost carriers to become more competitive in the market. To assess airline performance and identify the sources of efficiency in the immediate aftermath of these events, we employ a bootstrap data envelopment analysis truncated regression approach. The results suggest that at the time the mainstream airlines needed to significantly reorganize and rescale their operations to remain competitive. In the second-stage analysis, the results indicate that private ownership, status as a low-cost carrier, and improvements in weight load contributed to better organizational efficiency.
Resumo:
Time series classification has been extensively explored in many fields of study. Most methods are based on the historical or current information extracted from data. However, if interest is in a specific future time period, methods that directly relate to forecasts of time series are much more appropriate. An approach to time series classification is proposed based on a polarization measure of forecast densities of time series. By fitting autoregressive models, forecast replicates of each time series are obtained via the bias-corrected bootstrap, and a stationarity correction is considered when necessary. Kernel estimators are then employed to approximate forecast densities, and discrepancies of forecast densities of pairs of time series are estimated by a polarization measure, which evaluates the extent to which two densities overlap. Following the distributional properties of the polarization measure, a discriminant rule and a clustering method are proposed to conduct the supervised and unsupervised classification, respectively. The proposed methodology is applied to both simulated and real data sets, and the results show desirable properties.
Resumo:
BACKGROUND Measurement of the global burden of disease with disability-adjusted life-years (DALYs) requires disability weights that quantify health losses for all non-fatal consequences of disease and injury. There has been extensive debate about a range of conceptual and methodological issues concerning the definition and measurement of these weights. Our primary objective was a comprehensive re-estimation of disability weights for the Global Burden of Disease Study 2010 through a large-scale empirical investigation in which judgments about health losses associated with many causes of disease and injury were elicited from the general public in diverse communities through a new, standardised approach. METHODS We surveyed respondents in two ways: household surveys of adults aged 18 years or older (face-to-face interviews in Bangladesh, Indonesia, Peru, and Tanzania; telephone interviews in the USA) between Oct 28, 2009, and June 23, 2010; and an open-access web-based survey between July 26, 2010, and May 16, 2011. The surveys used paired comparison questions, in which respondents considered two hypothetical individuals with different, randomly selected health states and indicated which person they regarded as healthier. The web survey added questions about population health equivalence, which compared the overall health benefits of different life-saving or disease-prevention programmes. We analysed paired comparison responses with probit regression analysis on all 220 unique states in the study. We used results from the population health equivalence responses to anchor the results from the paired comparisons on the disability weight scale from 0 (implying no loss of health) to 1 (implying a health loss equivalent to death). Additionally, we compared new disability weights with those used in WHO's most recent update of the Global Burden of Disease Study for 2004. FINDINGS 13,902 individuals participated in household surveys and 16,328 in the web survey. Analysis of paired comparison responses indicated a high degree of consistency across surveys: correlations between individual survey results and results from analysis of the pooled dataset were 0·9 or higher in all surveys except in Bangladesh (r=0·75). Most of the 220 disability weights were located on the mild end of the severity scale, with 58 (26%) having weights below 0·05. Five (11%) states had weights below 0·01, such as mild anaemia, mild hearing or vision loss, and secondary infertility. The health states with the highest disability weights were acute schizophrenia (0·76) and severe multiple sclerosis (0·71). We identified a broad pattern of agreement between the old and new weights (r=0·70), particularly in the moderate-to-severe range. However, in the mild range below 0·2, many states had significantly lower weights in our study than previously. INTERPRETATION This study represents the most extensive empirical effort as yet to measure disability weights. By contrast with the popular hypothesis that disability assessments vary widely across samples with different cultural environments, we have reported strong evidence of highly consistent results.
Resumo:
In this paper it is demonstrated how the Bayesian parametric bootstrap can be adapted to models with intractable likelihoods. The approach is most appealing when the semi-automatic approximate Bayesian computation (ABC) summary statistics are selected. After a pilot run of ABC, the likelihood-free parametric bootstrap approach requires very few model simulations to produce an approximate posterior, which can be a useful approximation in its own right. An alternative is to use this approximation as a proposal distribution in ABC algorithms to make them more efficient. In this paper, the parametric bootstrap approximation is used to form the initial importance distribution for the sequential Monte Carlo and the ABC importance and rejection sampling algorithms. The new approach is illustrated through a simulation study of the univariate g-and- k quantile distribution, and is used to infer parameter values of a stochastic model describing expanding melanoma cell colonies.
Resumo:
We consider estimating the total load from frequent flow data but less frequent concentration data. There are numerous load estimation methods available, some of which are captured in various online tools. However, most estimators are subject to large biases statistically, and their associated uncertainties are often not reported. This makes interpretation difficult and the estimation of trends or determination of optimal sampling regimes impossible to assess. In this paper, we first propose two indices for measuring the extent of sampling bias, and then provide steps for obtaining reliable load estimates that minimizes the biases and makes use of informative predictive variables. The key step to this approach is in the development of an appropriate predictive model for concentration. This is achieved using a generalized rating-curve approach with additional predictors that capture unique features in the flow data, such as the concept of the first flush, the location of the event on the hydrograph (e.g. rise or fall) and the discounted flow. The latter may be thought of as a measure of constituent exhaustion occurring during flood events. Forming this additional information can significantly improve the predictability of concentration, and ultimately the precision with which the pollutant load is estimated. We also provide a measure of the standard error of the load estimate which incorporates model, spatial and/or temporal errors. This method also has the capacity to incorporate measurement error incurred through the sampling of flow. We illustrate this approach for two rivers delivering to the Great Barrier Reef, Queensland, Australia. One is a data set from the Burdekin River, and consists of the total suspended sediment (TSS) and nitrogen oxide (NO(x)) and gauged flow for 1997. The other dataset is from the Tully River, for the period of July 2000 to June 2008. For NO(x) Burdekin, the new estimates are very similar to the ratio estimates even when there is no relationship between the concentration and the flow. However, for the Tully dataset, by incorporating the additional predictive variables namely the discounted flow and flow phases (rising or recessing), we substantially improved the model fit, and thus the certainty with which the load is estimated.
Resumo:
There are numerous load estimation methods available, some of which are captured in various online tools. However, most estimators are subject to large biases statistically, and their associated uncertainties are often not reported. This makes interpretation difficult and the estimation of trends or determination of optimal sampling regimes impossible to assess. In this paper, we first propose two indices for measuring the extent of sampling bias, and then provide steps for obtaining reliable load estimates by minimizing the biases and making use of possible predictive variables. The load estimation procedure can be summarized by the following four steps: - (i) output the flow rates at regular time intervals (e.g. 10 minutes) using a time series model that captures all the peak flows; - (ii) output the predicted flow rates as in (i) at the concentration sampling times, if the corresponding flow rates are not collected; - (iii) establish a predictive model for the concentration data, which incorporates all possible predictor variables and output the predicted concentrations at the regular time intervals as in (i), and; - (iv) obtain the sum of all the products of the predicted flow and the predicted concentration over the regular time intervals to represent an estimate of the load. The key step to this approach is in the development of an appropriate predictive model for concentration. This is achieved using a generalized regression (rating-curve) approach with additional predictors that capture unique features in the flow data, namely the concept of the first flush, the location of the event on the hydrograph (e.g. rise or fall) and cumulative discounted flow. The latter may be thought of as a measure of constituent exhaustion occurring during flood events. The model also has the capacity to accommodate autocorrelation in model errors which are the result of intensive sampling during floods. Incorporating this additional information can significantly improve the predictability of concentration, and ultimately the precision with which the pollutant load is estimated. We also provide a measure of the standard error of the load estimate which incorporates model, spatial and/or temporal errors. This method also has the capacity to incorporate measurement error incurred through the sampling of flow. We illustrate this approach using the concentrations of total suspended sediment (TSS) and nitrogen oxide (NOx) and gauged flow data from the Burdekin River, a catchment delivering to the Great Barrier Reef. The sampling biases for NOx concentrations range from 2 to 10 times indicating severe biases. As we expect, the traditional average and extrapolation methods produce much higher estimates than those when bias in sampling is taken into account.