2 results for testing against heavy tails

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

30.00%

Abstract:

In this work we propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMRs) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies that avoid the bias introduced by aggregated analyses. Starting from the observed disease counts and the expected disease counts calculated from reference population disease rates, an SMR is derived in each area as the maximum likelihood estimate (MLE) under a Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the small population underlying the area or because of the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates across the map, with the aim of detecting anomalies and possible high-risk areas, have been proposed in the literature under both the classical and the Bayesian paradigm. Our proposal approaches this issue with a decision-oriented method that focuses on multiple testing control, without abandoning the preliminary-study perspective that an analysis of SMR indicators is expected to keep. We implement control of the false discovery rate (FDR), a quantity widely used to address multiple comparison problems in the field of microarray data analysis but not usually employed in disease mapping. Here, controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small-area issue raises difficulties in applying traditional methods for FDR estimation, which are usually based only on knowledge of the p-values (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value have weak power in small areas, where the expected number of disease cases is small. Moreover, the tests cannot be assumed independent when spatial correlation between SMRs is expected, nor are they identically distributed when the population underlying the map is heterogeneous.
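The link between a low expected count and a high standard error can be made explicit with a short derivation. The notation is an assumption introduced here and does not come from the abstract: $O_i$ is the observed count in area $i$, $E_i$ the expected count, and $\theta_i$ the relative risk.

```latex
\begin{align*}
O_i \mid \theta_i &\sim \mathrm{Poisson}(\theta_i E_i), \\
\ell(\theta_i) &= O_i \log(\theta_i E_i) - \theta_i E_i - \log(O_i!), \\
\frac{\partial \ell}{\partial \theta_i} &= \frac{O_i}{\theta_i} - E_i = 0
\quad\Longrightarrow\quad \hat{\theta}_i = \mathrm{SMR}_i = \frac{O_i}{E_i}, \\
\operatorname{Var}(\hat{\theta}_i) &= \frac{\operatorname{Var}(O_i)}{E_i^2} = \frac{\theta_i}{E_i}.
\end{align*}
```

Under this sketch the variance of the SMR is inversely proportional to $E_i$, so areas with a small population or a rare disease yield the least precise estimates.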
The Bayesian paradigm offers a way to overcome the inappropriateness of p-value-based methods. A further distinctive feature of the present work is a hierarchical fully Bayesian model for FDR estimation when testing many null hypotheses of absence of risk. We use concepts from Bayesian models for disease mapping, referring in particular to the Besag, York and Mollié model (1991), which is often used in practice for its flexible prior assumption on the distribution of risks across regions. The borrowing of strength between prior and likelihood, typical of a hierarchical Bayesian model, has the advantage of evaluating a single test (i.e. a test in a single area) by means of all observations in the map under study, rather than by means of the single observation alone. This improves the power of the test in small areas and addresses more appropriately the spatial correlation issue, which suggests that relative risks are closer in spatially contiguous regions. The proposed model aims to estimate the FDR by means of the MCMC-estimated posterior probabilities $b_i$ of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on the data ($\widehat{\mathrm{FDR}}$) can be calculated for any set of $b_i$'s relating to areas declared high-risk (where the null hypothesis is rejected) by averaging the $b_i$'s themselves. The $\widehat{\mathrm{FDR}}$ can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many areas as possible such that the $\widehat{\mathrm{FDR}}$ does not exceed a prefixed value; we call these $\widehat{\mathrm{FDR}}$-based decision (or selection) rules. The sensitivity and specificity of such rules depend on the accuracy of the FDR estimate: over-estimation of the FDR causes a loss of power, and under-estimation of the FDR produces a loss of specificity. Moreover, our model has the interesting feature of still being able to provide estimates of relative risk values, as in the Besag, York and Mollié model (1991).
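The selection rule described above can be sketched in a few lines. This is a minimal illustration and not the thesis implementation: it assumes the posterior null probabilities (written `b` here) have already been estimated, for example by MCMC, and the function names `fdr_hat` and `select_high_risk` are hypothetical.

```python
import numpy as np


def fdr_hat(b, selected):
    """Estimated FDR of a selection: the mean posterior null probability
    over the selected areas (0.0 if nothing is selected)."""
    b = np.asarray(b, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    if not selected.any():
        return 0.0
    return float(b[selected].mean())


def select_high_risk(b, alpha):
    """Select as many areas as possible while keeping the estimated FDR
    (running mean of the smallest posterior null probabilities) <= alpha."""
    b = np.asarray(b, dtype=float)
    order = np.argsort(b, kind="stable")          # most convincing areas first
    running_mean = np.cumsum(b[order]) / np.arange(1, b.size + 1)
    ok = np.flatnonzero(running_mean <= alpha)    # prefix sizes that satisfy the bound
    k = int(ok[-1]) + 1 if ok.size else 0         # largest admissible prefix
    selected = np.zeros(b.size, dtype=bool)
    selected[order[:k]] = True
    return selected


# Hypothetical posterior null probabilities for five areas.
b = [0.01, 0.02, 0.30, 0.04, 0.90]
mask = select_high_risk(b, alpha=0.05)
print(mask, fdr_hat(b, mask))
```

Sorting first matters: the mean of the k smallest probabilities is the lowest estimated FDR achievable with k areas, so the largest admissible prefix of the sorted list is also the largest admissible selection.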
A simulation study was set up to evaluate the model's performance in terms of FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of the relative risks. We chose a real map from which we generated several spatial scenarios whose disease counts vary according to the degree of spatial correlation, the size of the areas, the number of areas where the null hypothesis is true and the risk level in the latter areas. In summarizing the simulation results we always consider the FDR estimation in sets consisting of all $b_i$'s lower than a threshold $t$. We show graphs of the $\widehat{\mathrm{FDR}}$ and the true FDR (known by simulation) plotted against the threshold $t$ to assess the FDR estimation. By varying the threshold we can learn which FDR values can be accurately estimated by a practitioner willing to apply the model (judged by the closeness between $\widehat{\mathrm{FDR}}$ and true FDR). By plotting the calculated sensitivity and specificity (both known by simulation) against the $\widehat{\mathrm{FDR}}$ we can check the sensitivity and specificity of the corresponding $\widehat{\mathrm{FDR}}$-based decision rules. To investigate the level of over-smoothing of the relative risk estimates, we compare box-plots of such estimates in high-risk areas (known by simulation), obtained by both our model and the classic Besag, York and Mollié model. All the summary tools are worked out for all simulated scenarios (54 scenarios in total). Results show that the FDR is well estimated (in the worst case we get an over-estimation, hence a conservative FDR control) in scenarios with small areas, low risk levels and spatially correlated risks, which are our primary targets. In such scenarios we have good estimates of the FDR for all values less than or equal to 0.10. The sensitivity of $\widehat{\mathrm{FDR}}$-based decision rules is generally low, but the specificity is high. In such scenarios, selection rules based on $\widehat{\mathrm{FDR}} = 0.05$ or $\widehat{\mathrm{FDR}} = 0.10$ can be suggested.
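The threshold-based summaries described above can be illustrated with a small helper. This is a sketch under stated assumptions, not the code used in the thesis: the function name `summarize_threshold` and the inputs are hypothetical, and for a single simulated data set the "true" quantity computed here is the false discovery proportion, whose average over replicates approximates the true FDR.

```python
import numpy as np


def summarize_threshold(b, is_null, t):
    """For one simulated map, select areas with posterior null probability
    below t and compare the estimated FDR with the known truth."""
    b = np.asarray(b, dtype=float)
    is_null = np.asarray(is_null, dtype=bool)     # True where there is no excess risk
    selected = b < t
    n_sel = int(selected.sum())
    n_alt = int((~is_null).sum())
    n_null = int(is_null.sum())
    return {
        "n_selected": n_sel,
        # Estimated FDR: mean posterior null probability among selected areas.
        "est_fdr": float(b[selected].mean()) if n_sel else 0.0,
        # Realized proportion of false discoveries among selected areas.
        "true_fdp": float(is_null[selected].mean()) if n_sel else 0.0,
        # Share of truly high-risk areas that were selected.
        "sensitivity": float((selected & ~is_null).sum() / n_alt) if n_alt else float("nan"),
        # Share of null areas that were correctly left unselected.
        "specificity": float((~selected & is_null).sum() / n_null) if n_null else float("nan"),
    }


# Hypothetical values: two truly high-risk areas and three null areas.
b = [0.01, 0.02, 0.30, 0.04, 0.90]
is_null = [False, False, True, True, True]
for t in (0.03, 0.05, 0.50):
    print(t, summarize_threshold(b, is_null, t))
```

Evaluating the helper over a grid of thresholds gives the points needed for the two kinds of plot mentioned in the text: estimated versus true FDR against $t$, and sensitivity and specificity against the estimated FDR.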
In cases where the number of true alternative hypotheses (the number of true high-risk areas) is small, FDR values of 0.15 are also well estimated, and decision rules based on $\widehat{\mathrm{FDR}} = 0.15$ gain power while maintaining a high specificity. On the other hand, in scenarios with non-small areas and non-small risk levels the FDR is under-estimated except for very small values (much lower than 0.05), which results in a loss of specificity for a decision rule based on $\widehat{\mathrm{FDR}} = 0.05$. In such scenarios, decision rules based on $\widehat{\mathrm{FDR}} = 0.05$ or, even worse, $\widehat{\mathrm{FDR}} = 0.10$ cannot be suggested, because the true FDR is actually much higher. As regards relative risk estimation, our model achieves almost the same results as the classic Besag, York and Mollié model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and FDR control, except in scenarios with non-small areas and large risk levels. A case study is finally presented to show how the method can be used in epidemiology.

Relevance:

30.00%

Abstract:

The cathepsin enzymes represent an important family of lysosomal proteinases with a broad spectrum of functions in many, if not all, tissues and cell types. In addition to their primary role in normal protein turnover, they possess highly specific proteolytic activities, including antigen processing in the immune response and a direct role in the development of obesity and tumours. In pigs, the involvement of cathepsin enzymes in proteolytic processes has important effects during the conversion of muscle to meat, owing to their influence on meat texture and sensory characteristics, mainly in seasoned products. Their contribution is fundamental to flavour development in dry-cured hams. However, several authors have demonstrated that high cathepsin activity, in particular of cathepsin B, is correlated with defects in these products, such as excessive meat softness together with abnormal free tyrosine content, astringent or metallic aftertastes and the formation of a white film on the cut surface. Thus, investigating their genetic variability could be useful for identifying DNA markers associated with these dry-cured ham parameters, and also with meat quality, production and carcass traits in Italian heavy pigs. Unfortunately, no association has been found so far between cathepsin markers and meat quality traits, in particular cathepsin B activity, suggesting that other genes besides these affect meat quality parameters. Nevertheless, significant associations were observed with several carcass and production traits in pigs. A recent study has demonstrated that different single nucleotide polymorphisms (SNPs) located in the cathepsin D (CTSD), F (CTSF), H and Z genes were highly associated with growth, fat deposition and production traits in an Italian Large White pig population.
The aim of this thesis was to confirm some of these results in other pig populations and to identify new cathepsin markers in order to evaluate their effects on cathepsin activity and other production traits. Furthermore, starting from the data obtained in previous studies on the CTSD gene, we also analysed the known polymorphism located in the insulin-like growth factor 2 gene (IGF2 intron3-g.3072G>A). This marker is considered the causative mutation for the quantitative trait locus (QTL) affecting muscle mass and fat deposition in pigs. Since IGF2 maps very close to CTSD on porcine chromosome (SSC) 2, we wanted to clarify whether the effects of the CTSD marker were due to linkage disequilibrium with the IGF2 intron3-g.3072G>A mutation. In the first chapter, we reported the results for these two SSC2 gene markers. First of all, we evaluated the effects of the IGF2 intron3-g.3072G>A polymorphism in the Italian Large White breed, in which this marker had not previously been analysed. Highly significant associations were identified with all estimated breeding values for production and carcass traits (P<0.00001), while no effects were observed for meat quality traits. In contrast, the IGF2 intron3-g.3072G>A mutation did not show any association with the analysed traits in Italian Duroc pigs, probably because of the low level of variability at this polymorphic site in this breed. In the same Duroc pig population, significant associations were obtained for the CTSD marker for all production and carcass traits (P<0.001), after excluding possible confounding effects of the IGF2 mutation. The effects of the CTSD g.70G>A polymorphism were also confirmed in a group of Italian Large White pigs homozygous for the IGF2 intron3-g.3072G allele (IGF2 intron3-g.3072GG) and by haplotype analysis between the markers of the two genes considered.
Taken together, these data indicated that the IGF2 intron3-g.3072G>A mutation is not the only polymorphism affecting fatness and muscle deposition in pigs. In the second chapter, we reported the analysis of two new SNPs identified in the cathepsin L (CTSL) and cathepsin S (CTSS) genes and the results of association with meat quality parameters (including cathepsin B activity) and several production traits in an Italian Large White pig population. Allele frequencies of these two markers were evaluated in 7 different pig breeds. Furthermore, we mapped the CTSS gene to SSC4 using a radiation hybrid panel. Association studies with several production traits, carried out in 268 Italian Large White pigs, indicated positive effects of the CTSL polymorphism on average daily gain, weight of lean cuts and backfat thickness (P<0.05). The results for these latter traits were also confirmed using a selective genotyping approach in other Italian Large White pigs (P<0.01). In the group of 268 pigs, the CTSS polymorphism was associated with feed:gain ratio and average daily gain (P<0.05). In contrast, no association was observed between the analysed markers and meat quality parameters. Finally, we wanted to verify whether the positive results obtained for the cathepsin L and S markers and for other previously identified SNPs (cathepsin F, cathepsin Z and their inhibitor cystatin B) were confirmed in the Italian Duroc pig breed (third chapter). We analysed them in two groups of Duroc pigs: the first group consisted of 218 performance-tested pigs not selected by any phenotypic criterion, and the second group consisted of 100 Italian Duroc pigs that were extreme and divergent for the visible intermuscular fat trait. In the first group, the CTSL polymorphism was associated with weight of lean cuts (P<0.05), while suggestive associations were obtained for average daily gain and backfat thickness (P<0.10). Allele frequencies of the CTSL gene marker also differed between the two extreme tails for visible intermuscular fat.
In contrast, no positive effects were observed for the other DNA markers on the analysed traits. In conclusion, on the basis of the present data and the biological role of these enzymes, the porcine CTSD and CTSL markers a) may have a direct effect on the biological mechanisms involved in determining fat and lean meat content in pigs, or b) could be very close to the putative functional mutation(s) present in other genes. These findings have important practical applications; in particular, the CTSD and CTSL mutations could be applied in marker-assisted selection (MAS) in both the Italian Large White and Italian Duroc breeds. The efficiency of marker-assisted selection could also be increased by adding information from the cathepsin S genotype, but only in the Italian Large White breed.