994 results for thick-tailed distribution


Relevance:

100.00%

Publisher:

Abstract:

Linear mixed effects models are frequently used to analyse longitudinal data, due to their flexibility in modelling the covariance structure between and within observations. Further, it is easy to deal with unbalanced data, either with respect to the number of observations per subject or per time period, and with varying time intervals between observations. In most applications of mixed models to biological sciences, a normal distribution is assumed both for the random effects and for the residuals. This, however, makes inferences vulnerable to the presence of outliers. Here, linear mixed models employing thick-tailed distributions for robust inferences in longitudinal data analysis are described. Specific distributions discussed include the Student-t, the slash and the contaminated normal. A Bayesian framework is adopted, and the Gibbs sampler and the Metropolis-Hastings algorithms are used to carry out the posterior analyses. An example with data on orthodontic distance growth in children is discussed to illustrate the methodology. Analyses based on either the Student-t distribution or on the usual Gaussian assumption are contrasted. The thick-tailed distributions provide an appealing robust alternative to the Gaussian process for modelling distributions of the random effects and of residuals in linear mixed models, and the MCMC implementation allows the computations to be performed in a flexible manner.
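The robustness mechanism described here rests on the scale-mixture representation of the Student-t distribution: a t-distributed residual is a normal residual whose precision is rescaled by a latent Gamma-distributed weight, which makes Gibbs sampling straightforward via data augmentation. The sketch below illustrates the idea for a simple location model with t errors rather than the paper's full mixed model, assuming fixed degrees of freedom and flat/Jeffreys priors; variable names and simulation settings are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data with heavy-tailed noise.
n = 200
y = 5.0 + rng.standard_t(df=3, size=n)

nu = 3.0                 # degrees of freedom, held fixed for simplicity
mu, sigma2 = 0.0, 1.0    # initial values
draws = []
for it in range(2000):
    # w_i | rest ~ Gamma((nu + 1)/2, rate (nu + (y_i - mu)^2 / sigma2)/2)
    w = rng.gamma((nu + 1) / 2, 2.0 / (nu + (y - mu) ** 2 / sigma2))
    # mu | rest ~ N(weighted mean, sigma2 / sum(w)), under a flat prior
    mu = rng.normal((w * y).sum() / w.sum(), np.sqrt(sigma2 / w.sum()))
    # sigma2 | rest ~ Inv-Gamma(n/2, sum(w (y - mu)^2)/2), Jeffreys prior
    sigma2 = 1.0 / rng.gamma(n / 2, 2.0 / (w * (y - mu) ** 2).sum())
    if it >= 500:        # discard burn-in
        draws.append(mu)

print("posterior mean of mu:", np.mean(draws))
```

Observations with large residuals receive small weights w_i and are automatically downweighted; this is what makes the t-based analysis robust where the Gaussian one is not.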

Relevance:

100.00%

Publisher:

Abstract:

Scale mixtures of the skew-normal (SMSN) distribution form a class of asymmetric thick-tailed distributions that includes the skew-normal (SN) distribution as a special case. The main advantage of this class is that its members are easy to simulate and admit a convenient hierarchical representation that facilitates implementation of the expectation-maximization (EM) algorithm for maximum-likelihood estimation. In this paper, we assume an SMSN distribution for the unobserved value of the covariates and a symmetric scale mixture of the normal distribution for the error term of the model. This provides a robust alternative for parameter estimation in multivariate measurement error models. Specific distributions examined include univariate and multivariate versions of the SN, skew-t, skew-slash and skew-contaminated normal distributions. The results and methods are applied to a real data set.
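The "easy to simulate" claim follows directly from the hierarchical representation: a skew-normal variate is a convex combination of a half-normal and a normal variate, and dividing by the square root of a Gamma mixing variable yields the skew-t member of the SMSN family. A minimal sketch, with illustrative parameter values (the function name and defaults are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def rskew_t(n, mu=0.0, sigma=1.0, lam=3.0, nu=4.0):
    """Simulate skew-t draws via the SMSN hierarchical representation."""
    delta = lam / np.sqrt(1.0 + lam ** 2)
    t0 = np.abs(rng.standard_normal(n))               # half-normal part
    t1 = rng.standard_normal(n)                       # symmetric part
    z = delta * t0 + np.sqrt(1.0 - delta ** 2) * t1   # skew-normal variate
    u = rng.gamma(nu / 2.0, 2.0 / nu, size=n)         # Gamma(nu/2, nu/2) mixing
    return mu + sigma * z / np.sqrt(u)

x = rskew_t(10_000)
print("sample skewness:", ((x - x.mean()) ** 3).mean() / x.std() ** 3)
```

Setting lam = 0 recovers the symmetric Student-t, and fixing u = 1 recovers the skew-normal.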

Relevance:

90.00%

Publisher:

Abstract:

We present a continuous time random walk model for the scale-invariant transport found in a self-organized critical rice pile [K. Christensen et al., Phys. Rev. Lett. 77, 107 (1996)]. Our analytical results show that the dynamics of the experiment can be explained in terms of Lévy flights for the grains and a long-tailed distribution of trapping times. Scaling relations for the exponents of these distributions are obtained. The predicted microscopic behavior is confirmed by means of a cellular automaton model.
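A continuous time random walk of this kind is simple to simulate: draw heavy-tailed jump lengths for the grains and power-law trapping times between moves. The sketch below uses Pareto draws with placeholder exponents, not the values fitted in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

alpha_t, alpha_x = 1.5, 2.5    # illustrative tail exponents
n_steps = 10_000

# Power-law trapping times and symmetric heavy-tailed jump lengths.
waits = rng.pareto(alpha_t, n_steps) + 1.0
jumps = (rng.pareto(alpha_x, n_steps) + 1.0) * rng.choice([-1, 1], n_steps)

t = np.cumsum(waits)    # time at which each jump occurs
x = np.cumsum(jumps)    # grain position after each jump
print(f"after t = {t[-1]:.0f}, position x = {x[-1]:.1f}")
```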

Relevance:

90.00%

Publisher:

Abstract:

Measurement error models often arise in epidemiological and clinical research. Usually, in this setup it is assumed that the latent variable has a normal distribution. However, the normality assumption may not always be correct. The skew-normal/independent distributions form a class of asymmetric thick-tailed distributions that includes the skew-normal distribution as a special case. In this paper, we explore the use of skew-normal/independent distributions as a robust alternative in the null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) jointly follow a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of the symmetric normal distribution in this type of model. Specific distributions examined include univariate and multivariate versions of the skew-normal, skew-t, skew-slash and skew contaminated normal distributions. The methods developed are illustrated using a real data set from a dental clinical trial.

Relevance:

90.00%

Publisher:

Abstract:

Linear mixed effects models have been widely used in the analysis of data where responses are clustered around some random effects, so that it is not reasonable to assume independence between observations in the same cluster. In most biological applications, it is assumed that the distributions of the random effects and of the residuals are Gaussian. This makes inferences vulnerable to the presence of outliers. Here, linear mixed effects models with normal/independent residual distributions for robust inferences are described. Specific distributions examined include univariate and multivariate versions of the Student-t, the slash and the contaminated normal. A Bayesian framework is adopted and Markov chain Monte Carlo is used to carry out the posterior analysis. The procedures are illustrated using birth weight data on rats in a toxicological experiment. Results from the Gaussian and robust models are contrasted, and it is shown how the implementation can be used for outlier detection. The thick-tailed distributions provide an appealing robust alternative to the Gaussian process in linear mixed models, and they are easily implemented using data augmentation and MCMC techniques.
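The outlier-detection device mentioned at the end has a simple closed form under the Student-t member of the normal/independent family: conditional on a standardized residual d, the latent mixing weight has mean (nu + 1)/(nu + d²), so extreme observations receive small weights. A quick numerical illustration with arbitrary residual values:

```python
import numpy as np

nu = 4.0                                    # illustrative degrees of freedom
d = np.array([-0.5, 0.1, 1.2, 6.0])         # standardized residuals
w = (nu + 1.0) / (nu + d ** 2)              # E[w | d] under the t model
print(dict(zip(d, w.round(3))))             # d = 6.0 gets weight ~ 0.125
```

Monitoring the posterior means of these weights flags the observations the robust model is discounting.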

Relevance:

90.00%

Publisher:

Abstract:

In this thesis, we consider Bayesian inference on the detection of variance change-point models with scale mixtures of normal (SMN) distributions. This class of distributions is symmetric and thick-tailed and includes the Gaussian, Student-t, contaminated normal, and slash distributions as special cases. The proposed models provide greater flexibility for analyzing practical data, which often exhibit heavy tails and may not satisfy the normality assumption. For the Bayesian analysis, we specify prior distributions for the unknown parameters in the variance change-point models with SMN distributions. Due to the complexity of the joint posterior distribution, we propose an efficient Gibbs-type sampling algorithm with Metropolis-Hastings steps for posterior Bayesian inference. Thereafter, following the idea of [1], we consider the problems of single and multiple change-point detection. The performance of the proposed procedures is illustrated and analyzed through simulation studies. A real application to closing price data from the U.S. stock market is analyzed for illustrative purposes.
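To convey the idea of a single variance change point, here is a deliberately simplified sketch that profiles the Gaussian likelihood (the Gaussian being one SMN special case) over candidate change points; the thesis instead samples the change point jointly with the other parameters by MCMC:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Toy series: the standard deviation jumps from 1 to 3 at t = 120.
y = np.concatenate([rng.normal(0, 1, 120), rng.normal(0, 3, 80)])
n = len(y)

def loglik(tau):
    """Gaussian log-likelihood with separate scales before/after tau."""
    s1, s2 = y[:tau].std(), y[tau:].std()
    return (stats.norm.logpdf(y[:tau], 0, s1).sum()
            + stats.norm.logpdf(y[tau:], 0, s2).sum())

taus = np.arange(20, n - 20)
print("estimated change point:", taus[np.argmax([loglik(t) for t in taus])])
```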

Relevance:

90.00%

Publisher:

Abstract:

Extreme stock price movements are of great concern to both investors and the entire economy. For investors, a single negative return, or a combination of several smaller returns, can possibly wipe out so much capital that the firm or portfolio becomes illiquid or insolvent. If enough investors experience this loss, it could shock the entire economy. An example of such a case is the stock market crash of 1987. Furthermore, there has been considerable recent interest in the increasing volatility of stock prices. This study presents an analysis of extreme stock price movements. The data utilized were the daily returns for the Standard and Poor's 500 index from January 3, 1978 to May 31, 2001. Research questions were analyzed using the statistical models provided by extreme value theory. One of the difficulties in examining stock price data is that there is no consensus regarding the correct shape of the distribution function generating the data. An advantage of extreme value theory is that no detailed knowledge of this distribution function is required to apply the asymptotic theory; we focus on the tail of the distribution. Extreme value theory allows us to estimate a tail index, which we use to derive bounds on the returns that are exceeded only with very low probability. Such information is useful in evaluating the volatility of stock prices. There are three possible limit laws for the maximum: Gumbel (thin-tailed), Fréchet (thick-tailed) or Weibull (bounded tail). Results indicated that extreme returns during the time period studied follow a Fréchet distribution. Thus, this study finds that extreme value analysis is a valuable tool for examining stock price movements and can be more efficient than the usual variance in measuring risk.
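The tail index can be estimated from the largest order statistics; the abstract does not name the estimator used, but the Hill estimator is a standard choice and shows the mechanics. A positive extreme-value index corresponds to the Fréchet (thick-tailed) domain of attraction:

```python
import numpy as np

rng = np.random.default_rng(4)

def hill_gamma(x, k):
    """Hill estimator of the extreme-value index from the k largest values."""
    order = np.sort(x)[::-1]                     # descending order statistics
    return np.mean(np.log(order[:k])) - np.log(order[k])

# Illustrative heavy-tailed sample: Pareto with alpha = 3, so gamma = 1/3.
x = rng.pareto(3.0, 50_000) + 1.0
print("Hill estimate of gamma:", hill_gamma(x, k=500))   # expect about 0.33
```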

Relevance:

80.00%

Publisher:

Abstract:

We study the dynamics of the adoption of new products by agents with continuous opinions and discrete actions (CODA). The model is such that refusal to adopt a new idea or product is increasingly weighted by neighboring agents as evidence against the product. Under these rules, we study the distribution of adoption times and the final proportion of adopters in the population. We compare the case where initial adopters are clustered to the case where they are randomly scattered around the social network, and investigate small-world effects on the final proportion of adopters. The model predicts a fat-tailed distribution for late adopters, which is verified by empirical data.

Relevance:

80.00%

Publisher:

Abstract:

BACKGROUND AND PURPOSE: Intravoxel incoherent motion MRI has been proposed as an alternative method to measure brain perfusion. Our aim was to evaluate the utility of intravoxel incoherent motion perfusion parameters (the perfusion fraction, the pseudodiffusion coefficient, and the flow-related parameter) to differentiate high- and low-grade brain gliomas. MATERIALS AND METHODS: The intravoxel incoherent motion perfusion parameters were assessed in 21 brain gliomas (16 high-grade, 5 low-grade). Images were acquired by using a Stejskal-Tanner diffusion pulse sequence, with 16 b-values (0-900 s/mm²) in 3 orthogonal directions, on 3T systems equipped with 32-channel receiver head coils. The intravoxel incoherent motion perfusion parameters were derived by fitting the intravoxel incoherent motion biexponential model. Regions of interest were drawn in regions of maximum intravoxel incoherent motion perfusion fraction and in contralateral control regions. Statistical significance was assessed by using the Student t test. In addition, regions of interest were drawn around whole tumors and evaluated with the help of histograms. RESULTS: In the regions of maximum perfusion fraction, the perfusion fraction was significantly higher in the high-grade group (0.127 ± 0.031) than in the low-grade group (0.084 ± 0.016, P < .001) and in the contralateral control region (0.061 ± 0.011, P < .001). No statistically significant difference was observed for the pseudodiffusion coefficient. The perfusion fraction correlated moderately with dynamic susceptibility contrast relative CBV (r = 0.59). The histograms of the perfusion fraction showed a "heavy-tailed" distribution for high-grade but not low-grade gliomas. CONCLUSIONS: The intravoxel incoherent motion perfusion fraction is helpful for differentiating high- from low-grade brain gliomas.
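The biexponential model referred to here is S(b)/S0 = f·exp(-b·D*) + (1 - f)·exp(-b·D), with perfusion fraction f, pseudodiffusion coefficient D*, and diffusion coefficient D. A sketch of the voxelwise fit on synthetic data, with illustrative parameter values rather than the study's measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def ivim(b, f, dstar, d):
    """Normalized IVIM biexponential signal model."""
    return f * np.exp(-b * dstar) + (1 - f) * np.exp(-b * d)

b = np.linspace(0, 900, 16)                   # 16 b-values, as in the study
rng = np.random.default_rng(8)
signal = ivim(b, 0.12, 0.01, 0.0008) + rng.normal(0, 0.005, b.size)

(f, dstar, d), _ = curve_fit(ivim, b, signal, p0=[0.1, 0.01, 0.001],
                             bounds=([0, 0, 0], [1, 0.1, 0.01]))
print(f"perfusion fraction f = {f:.3f}")
```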

Relevance:

80.00%

Publisher:

Abstract:

Daily precipitation is recorded as the total amount of water collected by a rain-gauge in 24 h. Events are modelled as a Poisson process and the 24 h precipitation by a Generalized Pareto Distribution (GPD) of excesses. Hazard assessment is complete when estimates of the Poisson rate and the distribution parameters, together with a measure of their uncertainty, are obtained. The shape parameter of the GPD determines the support of the variable: the Weibull domain of attraction (DA) corresponds to finite-support variables, as should be the case for natural phenomena. However, the Fréchet DA has been reported for daily precipitation, which implies an infinite support and a heavy-tailed distribution. We use the fact that a log-scale is better suited to this type of variable to overcome the inconsistency, thus showing that using the appropriate natural scale can be extremely important for proper hazard assessment. The approach is illustrated with precipitation data from the Eastern coast of the Iberian Peninsula affected by severe convective precipitation. The estimation is carried out using Bayesian techniques.
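The peaks-over-threshold machinery described here amounts to choosing a high threshold and fitting a GPD to the excesses; the sign of the fitted shape parameter then decides the domain of attraction. A sketch on synthetic data standing in for the gauge records (threshold choice and distribution are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

rain = rng.gamma(0.4, 12.0, 20_000)       # synthetic daily amounts (mm)
u = np.quantile(rain, 0.95)               # high threshold
excess = rain[rain > u] - u

xi, _, scale = stats.genpareto.fit(excess, floc=0.0)
print(f"GPD shape xi = {xi:.3f}  (xi < 0: Weibull DA; xi > 0: Frechet DA)")
```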

Relevance:

80.00%

Publisher:

Abstract:

Daily precipitation is recorded as the total amount of water collected by a rain-gauge in 24 h. Events are modelled as a Poisson process and the 24 h precipitation by a Generalised Pareto Distribution (GPD) of excesses. Hazard assessment is complete when estimates of the Poisson rate and the distribution parameters, together with a measure of their uncertainty, are obtained. The shape parameter of the GPD determines the support of the variable: the Weibull domain of attraction (DA) corresponds to finite-support variables, as should be the case for natural phenomena. However, the Fréchet DA has been reported for daily precipitation, which implies an infinite support and a heavy-tailed distribution. Bayesian techniques are used to estimate the parameters. The approach is illustrated with precipitation data from the Eastern coast of the Iberian Peninsula affected by severe convective precipitation. The estimated GPD is mainly in the Fréchet DA, which is incompatible with the common-sense assumption that precipitation is a bounded phenomenon. The bounded character of precipitation is then taken as an a priori hypothesis. Consistency of this hypothesis with the data is checked in two cases: using the raw data (in mm) and using log-transformed data. As expected, Bayesian model checking clearly rejects the model in the raw-data case. However, the log-transformed data seem to be consistent with the model. This may be due to the adequacy of the log-scale for representing positive measurements for which differences are better expressed in relative than in absolute terms.
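The raw-versus-log comparison is easy to reproduce in sketch form: fit the GPD to excesses of the raw amounts and to excesses of the log-transformed amounts, then compare the shape estimates. Data and threshold below are synthetic stand-ins for the gauge records:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

rain = rng.gamma(0.4, 12.0, 20_000)       # synthetic daily amounts (mm)
u = np.quantile(rain, 0.95)

xi_raw, _, _ = stats.genpareto.fit(rain[rain > u] - u, floc=0.0)
xi_log, _, _ = stats.genpareto.fit(np.log(rain[rain > u] / u), floc=0.0)
print(f"raw-scale xi = {xi_raw:.3f},  log-scale xi = {xi_log:.3f}")
```

On the log scale the shape estimate typically moves toward the bounded (Weibull) domain, consistent with treating precipitation as a bounded phenomenon.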

Relevance:

80.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

80.00%

Publisher:

Abstract:

The use of group-randomized trials is particularly widespread in the evaluation of health care, educational, and screening strategies. Group-randomized trials represent a subset of a larger class of designs, often labeled nested, hierarchical, or multilevel, and are characterized by the randomization of intact social units or groups, rather than individuals. The application of random effects models to group-randomized trials requires the specification of fixed and random components of the model. The underlying assumption is usually that these random components are normally distributed. This research is intended to determine whether the Type I error rate and power are affected when the assumption of normality for the random component representing the group effect is violated. In this study, simulated data are used to examine the Type I error rate, power, bias and mean squared error of the estimates of the fixed effect and the observed intraclass correlation coefficient (ICC) when the random component representing the group effect has a distribution with non-normal characteristics, such as heavy tails or severe skewness. The simulated data are generated with various characteristics (e.g. number of schools per condition, number of students per school, and several within-school ICCs) observed in most small, school-based, group-randomized trials. The analysis is carried out using SAS PROC MIXED, Version 6.12, with random effects specified in a RANDOM statement and restricted maximum likelihood (REML) estimation. The results from the non-normally distributed data are compared to the results obtained from the analysis of data with similar design characteristics but normally distributed random effects. The results suggest that the violation of the normality assumption for the group component by a skewed or heavy-tailed distribution does not appear to influence the estimation of the fixed effect, Type I error, or power. Negative biases were detected when estimating the sample ICC and increased dramatically in magnitude as the true ICC increased. These biases were not as pronounced when the true ICC was within the range observed in most group-randomized trials (i.e. 0.00 to 0.05). The normally distributed group effect also resulted in biased ICC estimates when the true ICC was greater than 0.05; however, this may be a result of higher correlation within the data.
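A toy version of such a simulation is easy to set up: draw a skewed group effect, generate students within schools, and compare the ANOVA estimate of the ICC with its true value. Numbers of schools, students, and the target ICC below are illustrative, not the study's full design grid:

```python
import numpy as np

rng = np.random.default_rng(6)

n_g, n_i, icc = 40, 50, 0.05                 # schools, students/school, true ICC
sg2 = icc / (1 - icc)                        # group variance for unit residual
s = np.sqrt(sg2)
u = rng.exponential(s, n_g) - s              # mean-zero, skewed group effects
y = u[:, None] + rng.standard_normal((n_g, n_i))

# ANOVA (moment) estimator of the ICC.
msb = n_i * y.mean(axis=1).var(ddof=1)
msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (n_g * (n_i - 1))
print("estimated ICC:", (msb - msw) / (msb + (n_i - 1) * msw))
```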

Relevance:

80.00%

Publisher:

Abstract:

Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collecting object image datasets from web pages, using an analysis of the text around each image and of the image's appearance. The method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images), which provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg's collection of 10 animal categories, on which we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method.

Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Because image tags are noisy, the method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples; the text feature, however, may remain unchanged, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using the PASCAL VOC 2006 and 2007 datasets. The feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small.

As more and more training data are collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called the Stochastic Intersection Kernel Machine (SIKMA). The proposed training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck for processing large-scale datasets. This dissertation applies the approach to train classifiers for Flickr groups with many training examples per group. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features.

Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach using comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm that uses this category-dependent similarity regularization. Experiments on hundreds of categories show that the method can make significant improvements for categories with few or even no positive examples.
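The kernel at the heart of SIKMA is the histogram intersection kernel, k(a, b) = sum_j min(a_j, b_j), evaluated on histogram-like feature vectors. A minimal sketch of the kernel itself (the stochastic training loop is omitted):

```python
import numpy as np

def intersection_kernel(a: np.ndarray, b: np.ndarray) -> float:
    """Histogram intersection kernel between two L1-normalized histograms."""
    return float(np.minimum(a, b).sum())

rng = np.random.default_rng(7)
h1 = rng.random(8); h1 /= h1.sum()           # two toy 8-bin histograms
h2 = rng.random(8); h2 /= h2.sum()
print("k(h1, h2) =", intersection_kernel(h1, h2))
```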

Relevance:

30.00%

Publisher:

Abstract:

In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits, depending on the quantification assay. A complication arises when these continuous repeated measures have heavy-tailed behavior. For such data structures, we propose a robust censored linear model based on the multivariate Student's t-distribution. To account for the autocorrelation among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation-maximization (EM)-type algorithm is developed for computing the maximum likelihood estimates, obtaining as by-products the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
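The damped exponential correlation structure mentioned here sets corr(y_i, y_j) = phi ** (|t_i - t_j| ** theta), which interpolates between compound symmetry (theta = 0) and a continuous-time AR(1) (theta = 1) while accommodating irregular visit times. A small sketch with illustrative parameter values:

```python
import numpy as np

def dec_corr(times, phi=0.8, theta=0.5):
    """Damped exponential correlation: phi ** |t_i - t_j| ** theta."""
    lag = np.abs(np.subtract.outer(times, times))
    return phi ** (lag ** theta)

t = np.array([0.0, 0.5, 1.5, 4.0, 9.0])   # irregular visit times (months)
print(np.round(dec_corr(t), 3))
```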