912 resultados para model selection


Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this correspondence new robust nonlinear model construction algorithms for a large class of linear-in-the-parameters models are introduced to enhance model robustness via combined parameter regularization and new robust structural selective criteria. In parallel to parameter regularization, we use two classes of robust model selection criteria based on either experimental design criteria that optimizes model adequacy, or the predicted residual sums of squares (PRESS) statistic that optimizes model generalization capability, respectively. Three robust identification algorithms are introduced, i.e., combined A- and D-optimality with regularized orthogonal least squares algorithm, respectively; and combined PRESS statistic with regularized orthogonal least squares algorithm. A common characteristic of these algorithms is that the inherent computation efficiency associated with the orthogonalization scheme in orthogonal least squares or regularized orthogonal least squares has been extended such that the new algorithms are computationally efficient. Numerical examples are included to demonstrate effectiveness of the algorithms.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Motivation: The ability of a simple method (MODCHECK) to determine the sequence–structure compatibility of a set of structural models generated by fold recognition is tested in a thorough benchmark analysis. Four Model Quality Assessment Programs (MQAPs) were tested on 188 targets from the latest LiveBench-9 automated structure evaluation experiment. We systematically test and evaluate whether the MQAP methods can successfully detect native-likemodels. Results: We show that compared with the other three methods tested MODCHECK is the most reliable method for consistently performing the best top model selection and for ranking the models. In addition, we show that the choice of model similarity score used to assess a model's similarity to the experimental structure can influence the overall performance of these tools. Although these MQAP methods fail to improve the model selection performance for methods that already incorporate protein three dimension (3D) structural information, an improvement is observed for methods that are purely sequence-based, including the best profile–profile methods. This suggests that even the best sequence-based fold recognition methods can still be improved by taking into account the 3D structural information.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK Multi-Objective Clustering with automatic K-determination and MOCLE-Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms can often contains a high number of partitions, becoming difficult for an expert to manually analyze all of them. In order to deal with this problem, we present two selection strategies, which are based on the corrected Rand, to choose a subset of solutions. To test them, they are applied to the set of solutions produced by MOCK and MOCLE in the context of several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analysis show that both versions of selection strategy proposed are very effective. They can significantly reduce the number of solutions and, at the same time, keep the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper we deal with a Bayesian analysis for right-censored survival data suitable for populations with a cure rate. We consider a cure rate model based on the negative binomial distribution, encompassing as a special case the promotion time cure model. Bayesian analysis is based on Markov chain Monte Carlo (MCMC) methods. We also present some discussion on model selection and an illustration with a real dataset.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The predominant knowledge-based approach to automated model construction, compositional modelling, employs a set of models of particular functional components. Its inference mechanism takes a scenario describing the constituent interacting components of a system and translates it into a useful mathematical model. This paper presents a novel compositional modelling approach aimed at building model repositories. It furthers the field in two respects. Firstly, it expands the application domain of compositional modelling to systems that can not be easily described in terms of interacting functional components, such as ecological systems. Secondly, it enables the incorporation of user preferences into the model selection process. These features are achieved by casting the compositional modelling problem as an activity-based dynamic preference constraint satisfaction problem, where the dynamic constraints describe the restrictions imposed over the composition of partial models and the preferences correspond to those of the user of the automated modeller. In addition, the preference levels are represented through the use of symbolic values that differ in orders of magnitude.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Um modelo bayesiano de regressão binária é desenvolvido para predizer óbito hospitalar em pacientes acometidos por infarto agudo do miocárdio. Métodos de Monte Carlo via Cadeias de Markov (MCMC) são usados para fazer inferência e validação. Uma estratégia para construção de modelos, baseada no uso do fator de Bayes, é proposta e aspectos de validação são extensivamente discutidos neste artigo, incluindo a distribuição a posteriori para o índice de concordância e análise de resíduos. A determinação de fatores de risco, baseados em variáveis disponíveis na chegada do paciente ao hospital, é muito importante para a tomada de decisão sobre o curso do tratamento. O modelo identificado se revela fortemente confiável e acurado, com uma taxa de classificação correta de 88% e um índice de concordância de 83%.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We consider model selection uncertainty in linear regression. We study theoretically and by simulation the approach of Buckland and co-workers, who proposed estimating a parameter common to all models under study by taking a weighted average over the models, using weights obtained from information criteria or the bootstrap. This approach is compared with the usual approach in which the 'best' model is used, and with Bayesian model averaging. The weighted predictor behaves similarly to model averaging, with generally more realistic mean-squared errors than the usual model-selection-based estimator.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper, we propose a bivariate distribution for the bivariate survival times based on Farlie-Gumbel-Morgenstern copula to model the dependence on a bivariate survival data. The proposed model allows for the presence of censored data and covariates. For inferential purpose a Bayesian approach via Markov Chain Monte Carlo (MCMC) is considered. Further, some discussions on the model selection criteria are given. In order to examine outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated via a simulation study and a real dataset.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Preservation of rivers and water resources is crucial in most environmental policies and many efforts are made to assess water quality. Environmental monitoring of large river networks are based on measurement stations. Compared to the total length of river networks, their number is often limited and there is a need to extend environmental variables that are measured locally to the whole river network. The objective of this paper is to propose several relevant geostatistical models for river modeling. These models use river distance and are based on two contrasting assumptions about dependency along a river network. Inference using maximum likelihood, model selection criterion and prediction by kriging are then developed. We illustrate our approach on two variables that differ by their distributional and spatial characteristics: summer water temperature and nitrate concentration. The data come from 141 to 187 monitoring stations in a network on a large river located in the Northeast of France that is more than 5000 km long and includes Meuse and Moselle basins. We first evaluated different spatial models and then gave prediction maps and error variance maps for the whole stream network.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We consider a fully model-based approach for the analysis of distance sampling data. Distance sampling has been widely used to estimate abundance (or density) of animals or plants in a spatially explicit study area. There is, however, no readily available method of making statistical inference on the relationships between abundance and environmental covariates. Spatial Poisson process likelihoods can be used to simultaneously estimate detection and intensity parameters by modeling distance sampling data as a thinned spatial point process. A model-based spatial approach to distance sampling data has three main benefits: it allows complex and opportunistic transect designs to be employed, it allows estimation of abundance in small subregions, and it provides a framework to assess the effects of habitat or experimental manipulation on density. We demonstrate the model-based methodology with a small simulation study and analysis of the Dubbo weed data set. In addition, a simple ad hoc method for handling overdispersion is also proposed. The simulation study showed that the model-based approach compared favorably to conventional distance sampling methods for abundance estimation. In addition, the overdispersion correction performed adequately when the number of transects was high. Analysis of the Dubbo data set indicated a transect effect on abundance via Akaike’s information criterion model selection. Further goodness-of-fit analysis, however, indicated some potential confounding of intensity with the detection function.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this paper we propose a hybrid hazard regression model with threshold stress which includes the proportional hazards and the accelerated failure time models as particular cases. To express the behavior of lifetimes the generalized-gamma distribution is assumed and an inverse power law model with a threshold stress is considered. For parameter estimation we develop a sampling-based posterior inference procedure based on Markov Chain Monte Carlo techniques. We assume proper but vague priors for the parameters of interest. A simulation study investigates the frequentist properties of the proposed estimators obtained under the assumption of vague priors. Further, some discussions on model selection criteria are given. The methodology is illustrated on simulated and real lifetime data set.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The purpose of this paper is to develop a Bayesian analysis for the right-censored survival data when immune or cured individuals may be present in the population from which the data is taken. In our approach the number of competing causes of the event of interest follows the Conway-Maxwell-Poisson distribution which generalizes the Poisson distribution. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the proposed model. Also, some discussions on the model selection and an illustration with a real data set are considered.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry on this approach, we compare texts from European and Brazilian Portuguese. These texts are previously encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion which is a constant free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also make a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In the setting of high-dimensional linear models with Gaussian noise, we investigate the possibility of confidence statements connected to model selection. Although there exist numerous procedures for adaptive (point) estimation, the construction of adaptive confidence regions is severely limited (cf. Li in Ann Stat 17:1001–1008, 1989). The present paper sheds new light on this gap. We develop exact and adaptive confidence regions for the best approximating model in terms of risk. One of our constructions is based on a multiscale procedure and a particular coupling argument. Utilizing exponential inequalities for noncentral χ2-distributions, we show that the risk and quadratic loss of all models within our confidence region are uniformly bounded by the minimal risk times a factor close to one.