997 resultados para Graphical modeling (Statistics)


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Survival models involving frailties are commonly applied in studies where correlated event time data arise due to natural or artificial clustering. In this paper we present an application of such models in the animal breeding field. Specifically, a mixed survival model with a multivariate correlated frailty term is proposed for the analysis of data from over 3611 Brazilian Nellore cattle. The primary aim is to evaluate parental genetic effects on the trait length in days that their progeny need to gain a commercially specified standard weight gain. This trait is not measured directly but can be estimated from growth data. Results point to the importance of genetic effects and suggest that these models constitute a valuable data analysis tool for beef cattle breeding.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Interval-censored survival data, in which the event of interest is not observed exactly but is only known to occur within some time interval, occur very frequently. In some situations, event times might be censored into different, possibly overlapping intervals of variable widths; however, in other situations, information is available for all units at the same observed visit time. In the latter cases, interval-censored data are termed grouped survival data. Here we present alternative approaches for analyzing interval-censored data. We illustrate these techniques using a survival data set involving mango tree lifetimes. This study is an example of grouped survival data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A four parameter generalization of the Weibull distribution capable of modeling a bathtub-shaped hazard rate function is defined and studied. The beauty and importance of this distribution lies in its ability to model monotone as well as non-monotone failure rates, which are quite common in lifetime problems and reliability. The new distribution has a number of well-known lifetime special sub-models, such as the Weibull, extreme value, exponentiated Weibull, generalized Rayleigh and modified Weibull distributions, among others. We derive two infinite sum representations for its moments. The density of the order statistics is obtained. The method of maximum likelihood is used for estimating the model parameters. Also, the observed information matrix is obtained. Two applications are presented to illustrate the proposed distribution. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Chagas disease (American trypanosomiasis) is one of the most important parasitic diseases with serious social and economic impacts mainly on Latin America. This work reports the synthesis, in vitro trypanocidal evaluation, cytotoxicity assays, and molecular modeling and SAR/QSAR studies of a new series of N-phenylpyrazole benzylidene-carbohydrazides. The results pointed 6k (X = H, Y = p-NO(2), pIC(50) = 4.55 M) and 6l (X = F, Y = p-CN, pIC(50) = 4.27 M) as the most potent derivatives compared to crystal violet (pIC(50) = 3.77 M). The halogen-benzylidene-carbohydrazide presented the lowest potency whereas 6l showed the most promising pro. le with low toxicity (0% of cell death). The best equation from the 4D-QSAR analysis (Model 1) was able to explain 85% of the activity variability. The QSAR graphical representation revealed that bulky X-substituents decreased the potency whereas hydrophobic and hydrogen bond acceptor Y-substituents increased it. (C) 2008 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kind of tissue samples or samples submitted under experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments which involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of subsamples mixtures. Consequently, there can be genes differentially expressed on sample subgroups which are missed if usual statistical approaches are used. In this paper we propose a new graphical tool which not only identifies genes with up and down regulations, but also genes with differential expression in different subclasses, that are usually missed if current statistical methods are used. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with the ones obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and different variances can be considered on both samples. Another advantage of our method is that we can analyze graphically the behavior of different kinds of differentially expressed genes. Conclusion: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aim of the present study was to test a hypothetical model to examine if dispositional optimism exerts a moderating or a mediating effect between personality traits and quality of life, in Portuguese patients with chronic diseases. A sample of 540 patients was recruited from central hospitals in various districts of Portugal. All patients completed self-reported questionnaires assessing socio-demographic and clinical variables, personality, dispositional optimism, and quality of life. Structural equation modeling (SEM) was used to analyze the moderating and mediating effects. Results suggest that dispositional optimism exerts a mediator rather than a moderator role between personality traits and quality of life, suggesting that “the expectation that good things will happen” contributes to a better general well-being and better mental functioning.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper offers a new approach to estimating time-varying covariance matrices in the framework of the diagonal-vech version of the multivariate GARCH(1,1) model. Our method is numerically feasible for large-scale problems, produces positive semidefinite conditional covariance matrices, and does not impose unrealistic a priori restrictions. We provide an empirical application in the context of international stock markets, comparing the nev^ estimator with a number of existing ones.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Empirical modeling of exposure levels has been popular for identifying exposure determinants in occupational hygiene. Traditional data-driven methods used to choose a model on which to base inferences have typically not accounted for the uncertainty linked to the process of selecting the final model. Several new approaches propose making statistical inferences from a set of plausible models rather than from a single model regarded as 'best'. This paper introduces the multimodel averaging approach described in the monograph by Burnham and Anderson. In their approach, a set of plausible models are defined a priori by taking into account the sample size and previous knowledge of variables influent on exposure levels. The Akaike information criterion is then calculated to evaluate the relative support of the data for each model, expressed as Akaike weight, to be interpreted as the probability of the model being the best approximating model given the model set. The model weights can then be used to rank models, quantify the evidence favoring one over another, perform multimodel prediction, estimate the relative influence of the potential predictors and estimate multimodel-averaged effects of determinants. The whole approach is illustrated with the analysis of a data set of 1500 volatile organic compound exposure levels collected by the Institute for work and health (Lausanne, Switzerland) over 20 years, each concentration having been divided by the relevant Swiss occupational exposure limit and log-transformed before analysis. Multimodel inference represents a promising procedure for modeling exposure levels that incorporates the notion that several models can be supported by the data and permits to evaluate to a certain extent model selection uncertainty, which is seldom mentioned in current practice.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Observations in daily practice are sometimes registered as positive values larger then a given threshold α. The sample space is in this case the interval (α,+∞), α & 0, which can be structured as a real Euclidean space in different ways. This fact opens the door to alternative statistical models depending not only on the assumed distribution function, but also on the metric which is considered as appropriate, i.e. the way differences are measured, and thus variability

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The forensic two-trace problem is a perplexing inference problem introduced by Evett (J Forensic Sci Soc 27:375-381, 1987). Different possible ways of wording the competing pair of propositions (i.e., one proposition advanced by the prosecution and one proposition advanced by the defence) led to different quantifications of the value of the evidence (Meester and Sjerps in Biometrics 59:727-732, 2003). Here, we re-examine this scenario with the aim of clarifying the interrelationships that exist between the different solutions, and in this way, produce a global vision of the problem. We propose to investigate the different expressions for evaluating the value of the evidence by using a graphical approach, i.e. Bayesian networks, to model the rationale behind each of the proposed solutions and the assumptions made on the unknown parameters in this problem.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

When the behaviour of a specific hypothesis test statistic is studied by aMonte Carlo experiment, the usual way to describe its quality is by givingthe empirical level of the test. As an alternative to this procedure, we usethe empirical distribution of the obtained \emph{p-}values and exploit itsinformation both graphically and numerically.