4 resultados para testing method
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
In this work we aim to propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMR) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies that avoid bias carried on by aggregated analyses. Starting from collecting disease counts and calculating expected disease counts by means of reference population disease rates, in each area an SMR is derived as the MLE under the Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the low population underlying the area or the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates among the map aiming to detect anomalies and possible high-risk areas have been proposed in literature according to the classic and the Bayesian paradigm. Our proposal is approaching this issue by a decision-oriented method, which focus on multiple testing control, without however leaving the preliminary study perspective that an analysis on SMR indicators is asked to. We implement the control of the FDR, a quantity largely used to address multiple comparisons problems in the eld of microarray data analysis but which is not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small areas issue arises diculties in applying traditional methods for FDR estimation, that are usually based only on the p-values knowledge (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value provide weak power in small areas, where the expected number of disease cases is small. Moreover tests cannot be assumed as independent when spatial correlation between SMRs is expected, neither they are identical distributed when population underlying the map is heterogeneous. The Bayesian paradigm oers a way to overcome the inappropriateness of p-values based methods. Another peculiarity of the present work is to propose a hierarchical full Bayesian model for FDR estimation in testing many null hypothesis of absence of risk.We will use concepts of Bayesian models for disease mapping, referring in particular to the Besag York and Mollié model (1991) often used in practice for its exible prior assumption on the risks distribution across regions. The borrowing of strength between prior and likelihood typical of a hierarchical Bayesian model takes the advantage of evaluating a singular test (i.e. a test in a singular area) by means of all observations in the map under study, rather than just by means of the singular observation. This allows to improve the power test in small areas and addressing more appropriately the spatial correlation issue that suggests that relative risks are closer in spatially contiguous regions. The proposed model aims to estimate the FDR by means of the MCMC estimated posterior probabilities b i's of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on data (\FDR) can be calculated in any set of b i's relative to areas declared at high-risk (where thenull hypothesis is rejected) by averaging the b i's themselves. The\FDR can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many as possible areas such that the\FDR is non-lower than a prexed value; we call them\FDR based decision (or selection) rules. The sensitivity and specicity of such rule depend on the accuracy of the FDR estimate, the over-estimation of FDR causing a loss of power and the under-estimation of FDR producing a loss of specicity. Moreover, our model has the interesting feature of still being able to provide an estimate of relative risk values as in the Besag York and Mollié model (1991). A simulation study to evaluate the model performance in FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of relative risks, was set up. We chose a real map from which we generated several spatial scenarios whose counts of disease vary according to the spatial correlation degree, the size areas, the number of areas where the null hypothesis is true and the risk level in the latter areas. In summarizing simulation results we will always consider the FDR estimation in sets constituted by all b i's selected lower than a threshold t. We will show graphs of the\FDR and the true FDR (known by simulation) plotted against a threshold t to assess the FDR estimation. Varying the threshold we can learn which FDR values can be accurately estimated by the practitioner willing to apply the model (by the closeness between\FDR and true FDR). By plotting the calculated sensitivity and specicity (both known by simulation) vs the\FDR we can check the sensitivity and specicity of the corresponding\FDR based decision rules. For investigating the over-smoothing level of relative risk estimates we will compare box-plots of such estimates in high-risk areas (known by simulation), obtained by both our model and the classic Besag York Mollié model. All the summary tools are worked out for all simulated scenarios (in total 54 scenarios). Results show that FDR is well estimated (in the worst case we get an overestimation, hence a conservative FDR control) in small areas, low risk levels and spatially correlated risks scenarios, that are our primary aims. In such scenarios we have good estimates of the FDR for all values less or equal than 0.10. The sensitivity of\FDR based decision rules is generally low but specicity is high. In such scenario the use of\FDR = 0:05 or\FDR = 0:10 based selection rule can be suggested. In cases where the number of true alternative hypotheses (number of true high-risk areas) is small, also FDR = 0:15 values are well estimated, and \FDR = 0:15 based decision rules gains power maintaining an high specicity. On the other hand, in non-small areas and non-small risk level scenarios the FDR is under-estimated unless for very small values of it (much lower than 0.05); this resulting in a loss of specicity of a\FDR = 0:05 based decision rule. In such scenario\FDR = 0:05 or, even worse,\FDR = 0:1 based decision rules cannot be suggested because the true FDR is actually much higher. As regards the relative risk estimation, our model achieves almost the same results of the classic Besag York Molliè model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and the FDR control, except for non-small areas and large risk level scenarios. A case of study is nally presented to show how the method can be used in epidemiology.
Resumo:
The quality of astronomical sites is the first step to be considered to have the best performances from the telescopes. In particular, the efficiency of large telescopes in UV, IR, radio etc. is critically dependent on atmospheric transparency. It is well known that the random optical effects induced on the light propagation by turbulent atmosphere also limit telescope’s performances. Nowadays, clear appears the importance to correlate the main atmospheric physical parameters with the optical quality reachable by large aperture telescopes. The sky quality evaluation improved with the introduction of new techniques, new instrumentations and with the understanding of the link between the meteorological (or synoptical parameters and the observational conditions thanks to the application of the theories of electromagnetic waves propagation in turbulent medias: what we actually call astroclimatology. At the present the site campaigns are evolved and are performed using the classical scheme of optical seeing properties, meteorological parameters, sky transparency, sky darkness and cloudiness. New concept are added and are related to the geophysical properties such as seismicity, microseismicity, local variability of the climate, atmospheric conditions related to the ground optical turbulence and ground wind regimes, aerosol presence, use of satellite data. The purpose of this project is to provide reliable methods to analyze the atmospheric properties that affect ground-based optical astronomical observations and to correlate them with the main atmospheric parameters generating turbulence and affecting the photometric accuracy. The first part of the research concerns the analysis and interpretation of longand short-time scale meteorological data at two of the most important astronomical sites located in very different environments: the Paranal Observatory in the Atacama Desert (Chile), and the Observatorio del Roque de Los Muchachos(ORM) located in La Palma (Canary Islands, Spain). The optical properties of airborne dust at ORM have been investigated collecting outdoor data using a ground-based dust monitor. Because of its dryness, Paranal is a suitable observatory for near-IR observations, thus the extinction properties in the spectral range 1.00-2.30 um have been investigated using an empirical method. Furthermore, this PhD research has been developed using several turbulence profilers in the selection of the site for the European Extremely Large Telescope(E-ELT). During the campaigns the properties of the turbulence at different heights at Paranal and in the sites located in northern Chile and Argentina have been studied. This given the possibility to characterize the surface layer turbulence at Paranal and its connection with local meteorological conditions.
Resumo:
Weak lensing experiments such as the future ESA-accepted mission Euclid aim to measure cosmological parameters with unprecedented accuracy. It is important to assess the precision that can be obtained in these measurements by applying analysis software on mock images that contain many sources of noise present in the real data. In this Thesis, we show a method to perform simulations of observations, that produce realistic images of the sky according to characteristics of the instrument and of the survey. We then use these images to test the performances of the Euclid mission. In particular, we concentrate on the precision of the photometric redshift measurements, which are key data to perform cosmic shear tomography. We calculate the fraction of the total observed sample that must be discarded to reach the required level of precision, that is equal to 0.05(1+z) for a galaxy with measured redshift z, with different ancillary ground-based observations. The results highlight the importance of u-band observations, especially to discriminate between low (z < 0.5) and high (z ~ 3) redshifts, and the need for good observing sites, with seeing FWHM < 1. arcsec. We then construct an optimal filter to detect galaxy clusters through photometric catalogues of galaxies, and we test it on the COSMOS field, obtaining 27 lensing-confirmed detections. Applying this algorithm on mock Euclid data, we verify the possibility to detect clusters with mass above 10^14.2 solar masses with a low rate of false detections.
Resumo:
This thesis is divided in three chapters. In the first chapter we analyse the results of the world forecasting experiment run by the Collaboratory for the Study of Earthquake Predictability (CSEP). We take the opportunity of this experiment to contribute to the definition of a more robust and reliable statistical procedure to evaluate earthquake forecasting models. We first present the models and the target earthquakes to be forecast. Then we explain the consistency and comparison tests that are used in CSEP experiments to evaluate the performance of the models. Introducing a methodology to create ensemble forecasting models, we show that models, when properly combined, are almost always better performing that any single model. In the second chapter we discuss in depth one of the basic features of PSHA: the declustering of the seismicity rates. We first introduce the Cornell-McGuire method for PSHA and we present the different motivations that stand behind the need of declustering seismic catalogs. Using a theorem of the modern probability (Le Cam's theorem) we show that the declustering is not necessary to obtain a Poissonian behaviour of the exceedances that is usually considered fundamental to transform exceedance rates in exceedance probabilities in the PSHA framework. We present a method to correct PSHA for declustering, building a more realistic PSHA. In the last chapter we explore the methods that are commonly used to take into account the epistemic uncertainty in PSHA. The most widely used method is the logic tree that stands at the basis of the most advanced seismic hazard maps. We illustrate the probabilistic structure of the logic tree, and then we show that this structure is not adequate to describe the epistemic uncertainty. We then propose a new probabilistic framework based on the ensemble modelling that properly accounts for epistemic uncertainties in PSHA.