952 results for Schwarz Information Criterion
Abstract:
The traditional searching method for model-order selection in linear regression is a nested full-parameters-set searching procedure over the desired orders, which we call full-model order selection. On the other hand, a method for model selection searches for the best sub-model within each order. In this paper, we propose using the model-selection searching method for model-order selection, which we call partial-model order selection. We show by simulations that the proposed searching method is more accurate than the traditional one, especially at low signal-to-noise ratios, over a wide range of model-order selection criteria (both information-theoretic and bootstrap-based). We also show that for some models the performance of the bootstrap-based criterion improves significantly when the proposed partial-model selection searching method is used.
Index Terms: Model order estimation, model selection, information theoretic criteria, bootstrap
1. INTRODUCTION
Several model-order selection criteria can be applied to find the optimal order. Some of the more commonly used information-theoretic procedures include Akaike’s information criterion (AIC) [1], corrected Akaike (AICc) [2], minimum description length (MDL) [3], normalized maximum likelihood (NML) [4], Hannan-Quinn criterion (HQC) [5], conditional model-order estimation (CME) [6], and the efficient detection criterion (EDC) [7]. From a practical point of view, it is difficult to decide which model-order selection criterion to use. Many of them perform reasonably well when the signal-to-noise ratio (SNR) is high. The discrepancies in their performance, however, become more evident when the SNR is low. In those situations, the performance of a given technique is determined not only by the model structure (say a polynomial trend versus a Fourier series) but, more importantly, by the relative values of the parameters within the model. This makes comparison between model-order selection algorithms difficult, as within the same model with a given order one can find an example for which one of the methods performs favourably or fails [6, 8]. Our aim is to improve the performance of model-order selection criteria in cases where the SNR is low by considering a model-selection searching procedure that takes into account not only the full-model order search but also a partial-model order search within the given model order. Understandably, the improvement in the performance of the model-order estimation comes at the expense of additional computational complexity.
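As a rough illustration of the full-model order search described above, the following sketch scores nested polynomial regressions with AIC and with the Schwarz criterion (BIC). It is a minimal example under stated assumptions, not the paper's procedure: Gaussian errors are assumed, the criteria are written up to an additive constant, the model family is polynomial, and the helper names (`solve`, `rss_poly`, `criteria`, `select_order`) are hypothetical. The partial-model search over sub-models within each order is not implemented here.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss_poly(xs, ys, order):
    """Residual sum of squares of the least-squares polynomial of the given order."""
    k = order + 1
    # Normal equations (X^T X) beta = X^T y for the monomial basis 1, x, ..., x^order.
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    coef = solve(a, b)
    return sum(
        (y - sum(c * x ** i for i, c in enumerate(coef))) ** 2
        for x, y in zip(xs, ys)
    )


def criteria(rss, n, k):
    """AIC and BIC for Gaussian errors, up to an additive constant; k = number of coefficients."""
    fit = n * math.log(rss / n)
    return fit + 2 * k, fit + k * math.log(n)


def select_order(xs, ys, max_order):
    """Full-model search: score every order 0..max_order and return the minimisers."""
    n = len(xs)
    scores = {p: criteria(rss_poly(xs, ys, p), n, p + 1) for p in range(max_order + 1)}
    best_aic = min(scores, key=lambda p: scores[p][0])
    best_bic = min(scores, key=lambda p: scores[p][1])
    return best_aic, best_bic, scores
```

Calling `select_order(xs, ys, 5)` returns the order preferred by each criterion together with the full score table. Because BIC penalises each extra coefficient by log(n) rather than 2, it tends to choose an order no larger than AIC once n exceeds about 8.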
Abstract:
BACKGROUND: The relationship between temperature and mortality has been explored for decades and many temperature indicators have been applied separately. However, few data are available on how the effects of different temperature indicators vary across mortality categories, particularly in a typical subtropical climate. OBJECTIVE: To assess the associations between various temperature indicators and different mortality categories in Brisbane, Australia during 1996-2004. METHODS: We applied two methods to assess the threshold and temperature indicator for each age and death group: in the first, mean temperature and the threshold assessed from all-cause mortality were used for all mortality categories; in the second, the specific temperature indicator and threshold for each mortality category were identified separately by minimising AIC. We fitted a polynomial distributed lag non-linear model to estimate the effect on mortality of a one-degree temperature increase (or decrease) above (or below) the threshold, on the current day and at lags, using both methods. RESULTS: Akaike's Information Criterion was minimized when mean temperature was used for all non-external deaths and deaths at ages 75 to 84 years; when minimum temperature was used for deaths at ages 0 to 64 years, 65-74 years, ≥ 85 years, and deaths from respiratory diseases; and when maximum temperature was used for deaths from cardiovascular diseases. The effect estimates using certain temperature indicators were similar to those using mean temperature, both for current-day and lag effects. CONCLUSION: Different age groups and death categories were sensitive to different temperature indicators. However, the effect estimates from certain temperature indicators did not significantly differ from those of mean temperature.
Abstract:
Most crash severity studies have ignored severity correlations between driver-vehicle units involved in the same crash. Models that do not account for these within-crash correlations produce biased estimates of the factor effects. This study developed a Bayesian hierarchical binomial logistic model to identify the significant factors affecting the severity level of driver injury and vehicle damage in traffic crashes at signalized intersections. Crash data from Singapore were used to calibrate the model. Model fit assessment and comparison using the Intra-class Correlation Coefficient (ICC) and Deviance Information Criterion (DIC) confirmed the suitability of introducing crash-level random effects. Crashes occurring in peak time, in good street lighting conditions, or involving pedestrian injuries are associated with lower severity, while those at night, at T/Y type intersections, in the right-most lane, or at intersections with a red light camera installed have larger odds of being severe. Moreover, heavy vehicles are more resistant to severe crashes, while crashes involving two-wheel vehicles, young or aged drivers, or an offending party are more likely to result in severe injuries.
Abstract:
This paper presents a novel technique for segmenting an audio stream into homogeneous regions according to speaker identities, background noise, music, environmental and channel conditions. Audio segmentation is useful in audio diarization systems, which aim to annotate an input audio stream with information that attributes temporal regions of the audio to their specific sources. The segmentation method introduced in this paper uses the Generalized Likelihood Ratio (GLR), computed between two adjacent sliding windows over preprocessed speech. This approach is inspired by the popular segmentation method proposed in the pioneering work of Chen and Gopalakrishnan, which uses the Bayesian Information Criterion (BIC) with an expanding search window. This paper aims to identify and address the shortcomings associated with such an approach. The proposed segmentation strategy is evaluated on the 2002 Rich Transcription (RT-02) Evaluation dataset, and a miss rate of 19.47% and a false alarm rate of 16.94% are achieved at the optimal threshold.
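To make the relationship between the GLR and the BIC-based change test concrete, the sketch below compares two adjacent windows under a single-Gaussian model versus a two-Gaussian model. It is a simplified illustration, not the paper's system: it assumes one-dimensional features with maximum-likelihood variances, whereas real diarization systems use multivariate cepstral features, and the function names and the penalty weight `lam` are hypothetical.

```python
import math


def variance(xs):
    """Maximum-likelihood (biased) variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def glr(left, right):
    """Log generalized likelihood ratio: two separate Gaussians versus one pooled Gaussian.

    Larger values mean the two windows are less likely to share one source.
    """
    n1, n2 = len(left), len(right)
    pooled = left + right
    return 0.5 * (
        (n1 + n2) * math.log(variance(pooled))
        - n1 * math.log(variance(left))
        - n2 * math.log(variance(right))
    )


def delta_bic(left, right, lam=1.0):
    """GLR minus the BIC penalty for the extra Gaussian; positive suggests a change point."""
    n = len(left) + len(right)
    d = 1  # assumed feature dimension for this sketch
    extra_params = d + d * (d + 1) / 2  # one extra mean and one extra (co)variance
    return glr(left, right) - 0.5 * lam * extra_params * math.log(n)
```

A GLR-based segmenter thresholds `glr` directly, while the BIC variant subtracts a penalty that grows with the window length, so no separately tuned threshold is needed when `lam` is fixed at 1.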
Abstract:
This paper proposes a practical prediction procedure for the vertical displacement of a Rotary-wing Unmanned Aerial Vehicle (RUAV) landing deck in the presence of stochastic sea state disturbances. A time series model intended to capture the characteristics of the dynamic relationship between an observer and a landing deck is constructed, with model orders determined by a novel principle based on the Bayes Information Criterion (BIC) and coefficients identified using the Forgetting Factor Recursive Least Squares (FFRLS) method. In addition, a fast-converging online multi-step predictor is developed, which can be implemented more rapidly than the Auto-Regressive (AR) predictor as it requires fewer memory allocations when updating coefficients. Simulation results demonstrate that the proposed prediction approach exhibits satisfactory prediction performance, making it suitable for integration into ship-helicopter approach and landing guidance systems given the computational capacity of the flight computer.
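The coefficient identification step can be illustrated with a textbook recursive least squares update with a forgetting factor, applied here to a plain autoregressive model. This is a generic sketch under assumptions, not the paper's predictor: the AR model structure, the function name `ffrls_ar`, and the default values of `lam` and `delta` are illustrative choices.

```python
def ffrls_ar(series, order, lam=0.98, delta=1000.0):
    """Estimate AR coefficients with recursive least squares and forgetting factor lam.

    The model is y[t] = theta[0]*y[t-1] + ... + theta[order-1]*y[t-order].
    delta scales the initial covariance matrix P = delta * I.
    """
    theta = [0.0] * order
    p = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for t in range(order, len(series)):
        phi = [series[t - 1 - i] for i in range(order)]  # most recent sample first
        p_phi = [sum(p[i][j] * phi[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(phi[i] * p_phi[i] for i in range(order))
        gain = [v / denom for v in p_phi]
        err = series[t] - sum(theta[i] * phi[i] for i in range(order))  # a priori error
        theta = [theta[i] + gain[i] * err for i in range(order)]
        # P stays symmetric, so phi^T P equals (P phi)^T.
        p = [
            [(p[i][j] - gain[i] * p_phi[j]) / lam for j in range(order)]
            for i in range(order)
        ]
    return theta


def predict_next(series, theta):
    """One-step-ahead prediction from the latest samples and the current coefficients."""
    order = len(theta)
    return sum(theta[i] * series[-1 - i] for i in range(order))
```

A forgetting factor below 1 discounts old samples geometrically, which lets the coefficients track slowly changing sea states at the cost of noisier estimates. For a pure sinusoid sin(ωt) the exact second-order coefficients are 2cos(ω) and -1, which gives a convenient sanity check.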
Abstract:
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by the goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov Chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
Abstract:
Background. Interventions that prevent healthcare-associated infection should lead to fewer deaths and shorter hospital stays. Cleaning hands (with soap or alcohol) is an effective way to prevent the transmission of organisms, but rates of compliance with hand hygiene are sometimes disappointingly low. The National Hand Hygiene Initiative in Australia aimed to improve hand hygiene compliance among healthcare workers, with the goal of reducing rates of healthcare-associated infection. Methods. We examined whether the introduction of the National Hand Hygiene Initiative was associated with a change in infection rates. Monthly infection rates for healthcare-associated Staphylococcus aureus bloodstream infections were examined in 38 Australian hospitals across 6 states. We used Poisson regression and examined 12 possible patterns of change, with the best fitting pattern chosen using the Akaike information criterion. Monthly bed-days were included to control for increased hospital use over time. Results. The National Hand Hygiene Initiative was associated with a reduction in infection rates in 4 of the 6 states studied. Two states showed an immediate reduction in rates of 17% and 28%, 2 states showed a linear decrease in rates of 8% and 11% per year, and 2 showed no change in infection rates. Conclusions. The intervention was associated with reduced infection rates in most states. The failure in 2 states may have been because those states already had effective initiatives before the national initiative’s introduction or because infection rates were already low and could not be further reduced.
Abstract:
Spatial data are now prevalent in a wide range of fields including environmental and health science. This has led to the development of a range of approaches for analysing patterns in these data. In this paper, we compare several Bayesian hierarchical models for analysing point-based data based on the discretization of the study region, resulting in grid-based spatial data. The approaches considered include two parametric models and a semiparametric model. We highlight the methodology and computation for each approach. Two simulation studies are undertaken to compare the performance of these models for various structures of simulated point-based data which resemble environmental data. A case study of a real dataset is also conducted to demonstrate a practical application of the modelling approaches. Goodness-of-fit statistics are computed to compare estimates of the intensity functions. The deviance information criterion is also considered as an alternative model evaluation criterion. The results suggest that the adaptive Gaussian Markov random field model performs well for highly sparse point-based data where there are large variations or clustering across the space, whereas the discretized log Gaussian Cox process produces a good fit for dense and clustered point-based data. One should generally consider the nature and structure of the point-based data when choosing an appropriate method for modelling discretized spatial point-based data.
Abstract:
Research problem: Overfitting and collinearity problems commonly exist in current construction cost estimation applications and obstruct researchers and practitioners in achieving better modelling results. Research objective and method: A hybrid approach of Akaike information criterion (AIC) stepwise regression and principal component regression (PCR) is proposed to help solve overfitting and collinearity problems. Utilization of this approach in linear regression is validated by comparing it with other commonly used approaches. The mean square error obtained by leave-one-out cross validation (MSELOOCV) is used in model selection in deciding predictive variables.
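The leave-one-out cross-validation error mentioned above can be sketched for the simplest case of a single predictor. This is only an illustration of the MSE-LOOCV idea under assumptions: the abstract does not specify the model form, so a one-variable least-squares line is used, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse_loocv(xs, ys):
    """Mean squared prediction error when each point is held out in turn."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return sum(errors) / len(errors)
```

Candidate sets of predictive variables can then be compared by this error, with the lowest value preferred. Unlike the in-sample fit, it does not automatically improve as variables are added, which is why it is useful against overfitting.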
Abstract:
Aim: Large-scale patterns linking energy availability, biological productivity and diversity form a central focus of ecology. Despite evidence that the activity and abundance of animals may be limited by climatic variables associated with regional biological productivity (e.g. mean annual precipitation and annual actual evapotranspiration), it is unclear whether plant–granivore interactions are themselves influenced by these climatic factors across broad spatial extents. We evaluated whether climatic conditions that are known to alter the abundance and activity of granivorous animals also affect rates of seed removal. Location: Eleven sites across temperate North America. Methods: We used a common protocol to assess the removal of the same seed species (Avena sativa) over a 2-day period. Model selection via the Akaike information criterion was used to determine a set of candidate binomial generalized linear mixed models that evaluated the relationship between local climatic data and post-dispersal seed predation. Results: Annual actual evapotranspiration was the single best predictor of the proportion of seeds removed. Annual actual evapotranspiration and mean annual precipitation were both positively related to mean seed removal and were included in four and three of the top five models, respectively. Annual temperature range was also positively related to seed removal and was an explanatory variable in three of the top four models. Main conclusions: Our work provides the first evidence that energy and precipitation, which are known to affect consumer abundance and activity, also translate to strong, predictable patterns of seed predation across a continent. More generally, these findings suggest that future changes in temperature and precipitation could have widespread consequences for plant species composition in grasslands, through impacts on plant recruitment.
Abstract:
This thesis proposes three novel models which extend the statistical methodology for motor unit number estimation, a clinical neurology technique. Motor unit number estimation is important in the treatment of degenerative muscular diseases and, potentially, spinal injury. Additionally, a recent and untested statistic to enable statistical model choice is found to be a practical alternative for larger datasets. The existing methods for dose finding in dual-agent clinical trials are found to be suitable only for designs of modest dimensions. The model choice case-study is the first of its kind containing interesting results using so-called unit information prior distributions.
Abstract:
Non-rigid image registration is an essential tool required for overcoming the inherent local anatomical variations that exist between images acquired from different individuals or atlases. Furthermore, certain applications require this type of registration to operate across images acquired from different imaging modalities. One popular local approach for estimating this registration is a block matching procedure utilising the mutual information criterion. However, previous block matching procedures generate a sparse deformation field containing displacement estimates at uniformly spaced locations. This neglects to make use of the evidence that block matching results are dependent on the amount of local information content. This paper presents a solution to this drawback by proposing the use of a Reversible Jump Markov Chain Monte Carlo statistical procedure to optimally select grid points of interest. Three different methods are then compared to propagate the estimated sparse deformation field to the entire image including a thin-plate spline warp, Gaussian convolution, and a hybrid fluid technique. Results show that non-rigid registration can be improved by using the proposed algorithm to optimally select grid points of interest.
Abstract:
Spatial data analysis has become increasingly important in studies of ecology and economics during the last decade. One focus of spatial data analysis is how to select predictors, variance functions and correlation functions. However, in general, the true covariance function is unknown and the working covariance structure is often misspecified. In this paper, our target is to find a good strategy for identifying the best model from the candidate set using model selection criteria. We evaluate the ability of several information criteria (corrected Akaike information criterion, Bayesian information criterion (BIC) and residual information criterion (RIC)) to choose the optimal model when the working correlation function, the working variance function and the working mean function are correct or misspecified. Simulations are carried out for small to moderate sample sizes. Four candidate covariance functions (exponential, Gaussian, Matern and rational quadratic) are used in the simulation studies. From the simulation results, we find that a misspecified working correlation structure can still capture some spatial correlation information in model fitting. When the sample size is large enough, BIC and RIC perform well even if the working covariance is misspecified. Moreover, the performance of these information criteria is related to the average level of model fit, which can be indicated by the average adjusted R-squared, and overall RIC performs well.