863 resultados para random regression model


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Submarine groundwater discharge (SGD) is an integral part of the hydrological cycle and represents an important aspect of land-ocean interactions. We used a numerical model to simulate flow and salt transport in a nearshore groundwater aquifer under varying wave conditions based on yearlong random wave data sets, including storm surge events. The results showed significant flow asymmetry with rapid response of influxes and retarded response of effluxes across the seabed to the irregular wave conditions. While a storm surge immediately intensified seawater influx to the aquifer, the subsequent return of intruded seawater to the sea, as part of an increased SGD, was gradual. Using functional data analysis, we revealed and quantified retarded, cumulative effects of past wave conditions on SGD including the fresh groundwater and recirculating seawater discharge components. The retardation was characterized well by a gamma distribution function regardless of wave conditions. The relationships between discharge rates and wave parameters were quantifiable by a regression model in a functional form independent of the actual irregular wave conditions. This statistical model provides a useful method for analyzing and predicting SGD from nearshore unconfined aquifers affected by random waves

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article is motivated by a lung cancer study where a regression model is involved and the response variable is too expensive to measure but the predictor variable can be measured easily with relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. In this article, by using the idea of ranked-set sampling (RSS), we develop sampling strategies that can reduce cost and increase efficiency of the regression analysis for the above-mentioned situation. The developed method is applied retrospectively to a lung cancer study. In the lung cancer study, the interest is to investigate the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatic exchanges. Optimal sampling schemes with different optimality criteria such as A-, D-, and integrated mean square error (IMSE)-optimality are considered in the application. With set size 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is great. For instance, by using the optimal scheme with IMSE-optimality, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred by using SRS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Semisupervised dimensionality reduction has been attracting much attention as it not only utilizes both labeled and unlabeled data simultaneously, but also works well in the situation of out-of-sample. This paper proposes an effective approach of semisupervised dimensionality reduction through label propagation and label regression. Different from previous efforts, the new approach propagates the label information from labeled to unlabeled data with a well-designed mechanism of random walks, in which outliers are effectively detected and the obtained virtual labels of unlabeled data can be well encoded in a weighted regression model. These virtual labels are thereafter regressed with a linear model to calculate the projection matrix for dimensionality reduction. By this means, when the manifold or the clustering assumption of data is satisfied, the labels of labeled data can be correctly propagated to the unlabeled data; and thus, the proposed approach utilizes the labeled and the unlabeled data more effectively than previous work. Experimental results are carried out upon several databases, and the advantage of the new approach is well demonstrated.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model for counts, and present efficient closed-form Bayesian inference; unlike conventional Poisson models, the proposed approach has two free parameters to include two different kinds of random effects, and allows the incorporation of prior information, such as sparsity in the regression coefficients. By placing a gamma distribution prior on the NB dispersion parameter r, and connecting a log-normal distribution prior with the logit of the NB probability parameter p, efficient Gibbs sampling and variational Bayes inference are both developed. The closed-form updates are obtained by exploiting conditional conjugacy via both a compound Poisson representation and a Polya-Gamma distribution based data augmentation approach. The proposed Bayesian inference can be implemented routinely, while being easily generalizable to more complex settings involving multivariate dependence structures. The algorithms are illustrated using real examples. Copyright 2012 by the author(s)/owner(s).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Discrete Conditional Phase-type (DC-Ph) models are a family of models which represent skewed survival data conditioned on specific inter-related discrete variables. The survival data is modeled using a Coxian phase-type distribution which is associated with the inter-related variables using a range of possible data mining approaches such as Bayesian networks (BNs), the Naïve Bayes Classification method and classification regression trees. This paper utilizes the Discrete Conditional Phase-type model (DC-Ph) to explore the modeling of patient waiting times in an Accident and Emergency Department of a UK hospital. The resulting DC-Ph model takes on the form of the Coxian phase-type distribution conditioned on the outcome of a logistic regression model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This project focuses on the study of different explanatory models for the behavior of CDS security, such as Fixed-Effect Model, GLS Random-Effect Model, Pooled OLS and Quantile Regression Model. After determining the best fitness model, trading strategies with long and short positions in CDS have been developed. Due to some specifications of CDS, I conclude that the quantile regression is the most efficient model to estimate the data. The P&L and Sharpe Ratio of the strategy are analyzed using a backtesting analogy, where I conclude that, mainly for non-financial companies, the model allows traders to take advantage of and profit from arbitrages.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A total of 20,065 weights recorded on 3016 Nelore animals were used to estimate covariance functions for growth from birth to 630 days of age, assuming a parametric correlation structure to model within-animal correlations. The model of analysis included fixed effects of contemporary groups and age of dam as quadratic covariable. Mean trends were taken into account by a cubic regression on orthogonal polynomials of animal age. Genetic effects of the animal and its dam and maternal permanent environmental effects were modelled by random regressions on Legendre polynomials of age at recording. Changes in direct permanent environmental effect variances were modelled by a polynomial variance function, together with a parametric correlation function to account for correlations between ages. Stationary and nonstationary models were used to model within-animal correlations between different ages. Residual variances were considered homogeneous or heterogeneous, with changes modelled by a step or polynomial function of age at recording. Based on Bayesian information criterion, a model with a cubic variance function combined with a nonstationary correlation function for permanent environmental effects, with 49 parameters to be estimated, fitted best. Modelling within-animal correlations through a parametric correlation structure can describe the variation pattern adequately. Moreover, the number of parameters to be estimated can be decreased substantially compared to a model fitting random regression on Legendre polynomial of age. © 2004 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An extension of some standard likelihood based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the heteroscedastic symmetrical nonlinear regression models (Cysneiros et al., 2010), since the random term distributions cover both symmetric as well as asymmetric and heavy-tailed distributions such as skew-t, skew-slash, skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters is presented and the observed information matrix is derived analytically. In order to examine the performance of the proposed methods, some simulation studies are presented to show the robust aspect of this flexible class against outlying and influential observations and that the maximum likelihood estimates based on the EM-type algorithm do provide good asymptotic properties. Furthermore, local influence measures and the one-step approximations of the estimates in the case-deletion model are obtained. Finally, an illustration of the methodology is given considering a data set previously analyzed under the homoscedastic skew-t nonlinear regression model. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the present work we perform an econometric analysis of the Tribal art market. To this aim, we use a unique and original database that includes information on Tribal art market auctions worldwide from 1998 to 2011. In Literature, art prices are modelled through the hedonic regression model, a classic fixed-effect model. The main drawback of the hedonic approach is the large number of parameters, since, in general, art data include many categorical variables. In this work, we propose a multilevel model for the analysis of Tribal art prices that takes into account the influence of time on artwork prices. In fact, it is natural to assume that time exerts an influence over the price dynamics in various ways. Nevertheless, since the set of objects change at every auction date, we do not have repeated measurements of the same items over time. Hence, the dataset does not constitute a proper panel; rather, it has a two-level structure in that items, level-1 units, are grouped in time points, level-2 units. The main theoretical contribution is the extension of classical multilevel models to cope with the case described above. In particular, we introduce a model with time dependent random effects at the second level. We propose a novel specification of the model, derive the maximum likelihood estimators and implement them through the E-M algorithm. We test the finite sample properties of the estimators and the validity of the own-written R-code by means of a simulation study. Finally, we show that the new model improves considerably the fit of the Tribal art data with respect to both the hedonic regression model and the classic multilevel model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We introduce a diagnostic test for the mixing distribution in a generalised linear mixed model. The test is based on the difference between the marginal maximum likelihood and conditional maximum likelihood estimates of a subset of the fixed effects in the model. We derive the asymptotic variance of this difference, and propose a test statistic that has a limiting chi-square distribution under the null hypothesis that the mixing distribution is correctly specified. For the important special case of the logistic regression model with random intercepts, we evaluate via simulation the power of the test in finite samples under several alternative distributional forms for the mixing distribution. We illustrate the method by applying it to data from a clinical trial investigating the effects of hormonal contraceptives in women.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E(expectation) and M(maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical Study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which tender the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In non-linear random effects some attention has been very recently devoted to the analysis ofsuitable transformation of the response variables separately (Taylor 1996) or not (Oberg and Davidian 2000) from the transformations of the covariates and, as far as we know, no investigation has been carried out on the choice of link function in such models. In our study we consider the use of a random effect model when a parameterized family of links (Aranda-Ordaz 1981, Prentice 1996, Pregibon 1980, Stukel 1988 and Czado 1997) is introduced. We point out the advantages and the drawbacks associated with the choice of this data-driven kind of modeling. Difficulties in the interpretation of regression parameters, and therefore in understanding the influence of covariates, as well as problems related to loss of efficiency of estimates and overfitting, are discussed. A case study on radiotherapy usage in breast cancer treatment is discussed.