4 results for spatial correlation
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
In this work we propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMRs) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies that avoid the bias carried by aggregated analyses. Starting from the observed disease counts and the expected disease counts calculated by means of reference population disease rates, an SMR is derived in each area as the MLE under the Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the small population underlying the area or because of the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates across the map, aiming to detect anomalies and possible high-risk areas, have been proposed in the literature under both the classical and the Bayesian paradigm. Our proposal approaches this issue with a decision-oriented method that focuses on multiple testing control, without leaving the preliminary-study perspective that an analysis of SMR indicators is expected to keep. We implement control of the False Discovery Rate (FDR), a quantity widely used to address multiple comparison problems in the field of microarray data analysis but not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small-areas issue raises difficulties in applying traditional methods for FDR estimation, which are usually based only on knowledge of the p-values (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value have weak power in small areas, where the expected number of disease cases is small. Moreover, the tests cannot be assumed independent when spatial correlation between SMRs is expected, nor are they identically distributed when the population underlying the map is heterogeneous.
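The small-area problem described above can be illustrated with a minimal sketch. It assumes the standard Poisson setting, O ~ Poisson(theta * E), in which the MLE of the relative risk is O / E and its estimated standard error is sqrt(O) / E. The function names and the counts are hypothetical and are not taken from the thesis.

```python
import math


def smr(observed, expected):
    """SMR = O / E, the MLE of the relative risk when O ~ Poisson(theta * E)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def smr_standard_error(observed, expected):
    """Estimated standard error sqrt(O) / E.

    Follows from Var(O / E) = theta / E with theta replaced by its MLE O / E.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return math.sqrt(observed) / expected


# Two hypothetical areas with the same SMR but very different expected counts.
small_area = (smr(4, 2.0), smr_standard_error(4, 2.0))
large_area = (smr(400, 200.0), smr_standard_error(400, 200.0))
```

Both areas have an SMR of 2, but the standard error in the small area is ten times larger, which is why a test based on the single observation has little power there.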
The Bayesian paradigm offers a way to overcome the inappropriateness of p-value-based methods. Another peculiarity of the present work is that it proposes a fully Bayesian hierarchical model for FDR estimation when testing many null hypotheses of absence of risk. We use concepts from Bayesian models for disease mapping, referring in particular to the Besag, York and Mollié model (1991), often used in practice for its flexible prior assumption on the distribution of risks across regions. The borrowing of strength between prior and likelihood, typical of a hierarchical Bayesian model, has the advantage of evaluating a single test (i.e. a test in a single area) by means of all the observations in the map under study, rather than by means of the single observation alone. This improves the power of the test in small areas and addresses more appropriately the spatial correlation issue, which suggests that relative risks are closer in spatially contiguous regions. The proposed model aims to estimate the FDR by means of the MCMC-estimated posterior probabilities b_i of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on the data (FDR-hat) can be calculated for any set of b_i's relative to areas declared high-risk (where the null hypothesis is rejected) by averaging the b_i's themselves. FDR-hat can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many areas as possible such that FDR-hat does not exceed a prefixed value; we call these FDR-hat based decision (or selection) rules. The sensitivity and specificity of such rules depend on the accuracy of the FDR estimate: over-estimation of the FDR causes a loss of power, and under-estimation produces a loss of specificity. Moreover, our model has the interesting feature of still being able to provide estimates of the relative risk values, as in the Besag, York and Mollié model (1991).
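The selection rule just described can be sketched as follows. This is a minimal illustration, assuming the posterior null probabilities are already available (in practice they would come from the MCMC output of the model) and that the rule selects the largest set of areas whose average posterior null probability stays at or below the prefixed value. The function names and the example probabilities are hypothetical.

```python
def estimated_fdr(null_probs):
    """Estimated FDR of a rejected set: the mean posterior null probability."""
    if not null_probs:
        return 0.0
    return sum(null_probs) / len(null_probs)


def select_high_risk(null_probs, alpha):
    """Select as many areas as possible with estimated FDR <= alpha.

    Areas are added in increasing order of posterior null probability, so the
    running mean never decreases and the loop can stop at the first violation.
    Returns the selected area indices and the estimated FDR of that set.
    """
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    selected = []
    running_sum = 0.0
    for i in order:
        if (running_sum + null_probs[i]) / (len(selected) + 1) > alpha:
            break
        running_sum += null_probs[i]
        selected.append(i)
    chosen = sorted(selected)
    return chosen, estimated_fdr([null_probs[i] for i in chosen])


# Hypothetical posterior null probabilities for five areas.
posterior_null = [0.01, 0.60, 0.03, 0.20, 0.90]
areas_10, fdr_10 = select_high_risk(posterior_null, alpha=0.10)
areas_05, fdr_05 = select_high_risk(posterior_null, alpha=0.05)
```

With these made-up values, a threshold of 0.10 selects areas 0, 2 and 3 (estimated FDR about 0.08), while a threshold of 0.05 selects only areas 0 and 2 (estimated FDR about 0.02).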
A simulation study was set up to evaluate the performance of the model in terms of FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of the relative risks. We chose a real map from which we generated several spatial scenarios whose disease counts vary according to the degree of spatial correlation, the size of the areas, the number of areas where the null hypothesis is true and the risk level in the latter areas. In summarizing the simulation results we always consider the FDR estimation in sets constituted by all b_i's lower than a threshold t. We show graphs of the estimated FDR (FDR-hat) and the true FDR (known by simulation) plotted against the threshold t to assess the FDR estimation. By varying the threshold we can learn which FDR values can be accurately estimated by a practitioner willing to apply the model (judged by the closeness between FDR-hat and the true FDR). By plotting the calculated sensitivity and specificity (both known by simulation) against FDR-hat we can check the sensitivity and specificity of the corresponding FDR-hat based decision rules. To investigate the degree of over-smoothing of the relative risk estimates we compare box-plots of such estimates in high-risk areas (known by simulation), obtained by both our model and the classic Besag, York and Mollié model. All the summary tools are worked out for all simulated scenarios (54 scenarios in total). Results show that the FDR is well estimated (in the worst case we get an over-estimation, hence a conservative FDR control) in scenarios with small areas, low risk levels and spatially correlated risks, which are our primary targets. In such scenarios we have good estimates of the FDR for all values less than or equal to 0.10. The sensitivity of FDR-hat based decision rules is generally low but the specificity is high. In such scenarios the use of an FDR-hat = 0.05 or FDR-hat = 0.10 based selection rule can be suggested.
In cases where the number of true alternative hypotheses (number of true high-risk areas) is small, FDR values of 0.15 are also well estimated, and decision rules based on an estimated FDR (FDR-hat) of 0.15 gain power while maintaining a high specificity. On the other hand, in scenarios with non-small areas and non-small risk levels the FDR is under-estimated except for very small values of it (much lower than 0.05), resulting in a loss of specificity of an FDR-hat = 0.05 based decision rule. In such scenarios FDR-hat = 0.05 or, even worse, FDR-hat = 0.1 based decision rules cannot be suggested because the true FDR is actually much higher. As regards relative risk estimation, our model achieves almost the same results as the classic Besag, York and Mollié model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and FDR control, except in scenarios with non-small areas and large risk levels. A case study is finally presented to show how the method can be used in epidemiology.
Abstract:
Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is tackled in a non-Gaussian setting; in fact, hourly rainfall in the Emilia-Romagna region of Italy is characterized by an abundance of zero values and right-skewness of the distribution of positive amounts. Direct rain gauge measurements at sparsely distributed locations and hourly cumulated radar grids are provided by ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as the reference measure. Rain probability and amounts are modeled via linear relationships with radar on the log scale; spatially correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and a Gamma distribution for positive rainfall amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications addressing COSP in different ways are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C and linked to R. Communication and evaluation of probabilistic, point and interval predictions are investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Ranked Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error, addressing the predictive mean and median, respectively).
Calibration is achieved and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with uncorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.
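The two-part semicontinuous structure described in this abstract can be sketched in a few lines. This is only an illustration under strong assumptions: the spatially correlated effects and the change-of-support weighting are omitted, the radar value is assumed to be strictly positive, and every coefficient and function name is hypothetical rather than taken from the thesis. The Brier score is included because it is one of the scoring rules the abstract names.

```python
import math
import random
from statistics import NormalDist


def rain_probability(radar_mm, intercept, slope):
    """Probit link: P(rain) = Phi(intercept + slope * log(radar))."""
    return NormalDist().cdf(intercept + slope * math.log(radar_mm))


def sample_rainfall(radar_mm, rng, intercept=-0.5, slope=1.0,
                    amount_intercept=0.0, amount_slope=1.0, gamma_shape=2.0):
    """Draw one hourly rainfall value from a two-part model.

    Part 1 decides rain / no rain with a probit probability.
    Part 2 draws a positive amount from a Gamma distribution whose mean is
    log-linear in the radar value (all coefficients are hypothetical).
    """
    if rng.random() >= rain_probability(radar_mm, intercept, slope):
        return 0.0
    mean_amount = math.exp(amount_intercept + amount_slope * math.log(radar_mm))
    # gammavariate takes (shape, scale); shape * scale equals the mean.
    return rng.gammavariate(gamma_shape, mean_amount / gamma_shape)


def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    pairs = list(zip(forecast_probs, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


rng = random.Random(0)
draws = [sample_rainfall(1.0, rng) for _ in range(1000)]
share_of_zeros = sum(d == 0.0 for d in draws) / len(draws)
```

The simulated values show the two features the abstract mentions: a point mass at zero and a right-skewed distribution of positive amounts.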
Abstract:
This doctoral thesis aims to contribute to the literature on transition economies, focusing on the Russian Federation and in particular on regional income convergence and fertility patterns. The first two chapters deal with the issue of income convergence across regions. Chapter 1 provides a historical-institutional analysis of the period between the late years of the Soviet Union and the last decade of economic growth, and presents the sample with a description of gross regional product composition, agrarian or industrial vocation, and labor. Chapter 2 contributes to the literature on exploratory spatial data analysis with an application to a panel of 77 regions in the period 1994-2008. It provides an analysis of spatial patterns and extends the theoretical framework of growth regressions by controlling for spatial correlation and heterogeneity. Chapter 3 analyses the national demographic patterns since 1960 and provides a review of the policies on maternity leave and family benefits. Data sources are the Statistical Yearbooks of the USSR, the Statistical Yearbooks of the Russian Soviet Federative Socialist Republic and the Demographic Yearbooks of Russia. Chapter 4 analyses the demographic patterns in light of the theoretical framework of the Becker model, the Second Demographic Transition and an economic-crisis argument. With national data from 1960, the theoretical issue of the pro- or countercyclical relation between income and fertility is graphically analyzed and discussed, together with female employment and education. With regional data after 1994, different panel data models are tested. Individual-level data from the Russian Longitudinal Monitoring Survey are analyzed using the logit model. Chapter 5 employs data from the Generations and Gender Survey by UNECE to focus on postponement and second-birth intentions.
Postponement is studied through cohort analysis of mean maternal age at first birth, while the methodology used for second birth intentions is the ordered logit model.
Abstract:
The present study analyses rural landscape changes. In particular, it focuses on understanding the driving forces acting on the rural built environment, using a statistical spatial model implemented through GIS techniques. It is well known that the study of landscape changes is essential for informed decision making in land planning. A literature review reveals a general lack of studies dealing with the modeling of the rural built environment, and hence a theoretical modelling approach for this purpose is needed. Advances in technology and modernization in building construction and agriculture have gradually changed the rural built environment. In addition, the phenomenon of urbanization led to the construction of new volumes alongside abandoned or derelict rural buildings. Consequently, two types of transformation dynamics mainly affecting the rural built environment can be observed: the conversion of rural buildings and the increase in the number of buildings. The specific aim of the present study is to propose a methodology for the development of a spatial model that allows the identification of the driving forces that acted on building allocation. In fact, one of the most concerning dynamics nowadays is the irrational expansion of building sprawl across the landscape. The proposed methodology is composed of conceptual steps that cover different aspects of the development of a spatial model: the selection of a response variable that best describes the phenomenon under study, the identification of possible driving forces, the sampling methodology for the collection of data, the most suitable algorithm to adopt in relation to the statistical theory and method used, and the calibration and evaluation of the model.
Different combinations of factors in various parts of the territory generated more or less favourable conditions for building allocation, and the existence of buildings is the evidence of such an optimum. Conversely, the absence of buildings expresses a combination of agents which is not suitable for building allocation. Presence or absence of buildings can be adopted as an indicator of such driving conditions, since it expresses the action of driving forces in the land suitability sorting process. The existence of a correlation between site selection and hypothetical driving forces, evaluated by means of modeling techniques, provides evidence of which driving forces are involved in the allocation dynamic and insight into their level of influence on the process. GIS software, by means of spatial analysis tools, makes it possible to associate the concept of presence and absence with point features, generating a point process. Presence or absence of buildings at given site locations expresses the interaction of these driving factors. Presence points represent the locations of real existing buildings, whereas absence points represent locations where no buildings exist, and so they are generated by a stochastic mechanism. Possible driving forces are selected and the existence of a causal relationship with building allocation is assessed through a spatial model. The adoption of empirical statistical models provides a mechanism for the analysis of explanatory variables and for the identification of the key driving variables behind the site selection process for new building allocation. The model developed by following the methodology is applied to a case study to test the validity of the methodology. In particular, the study area used for testing is the New District of Imola, characterized by a prevailing agricultural production vocation and where transformation dynamics occurred intensively.
The development of the model involved the identification of predictive variables (related to the geomorphologic, socio-economic, structural and infrastructural systems of the landscape) capable of representing the driving forces responsible for landscape changes. The calibration of the model is carried out on spatial data for the periurban and rural parts of the study area within the 1975-2005 time period, by means of a generalized linear model. The output of the model fit is a continuous grid surface whose cells take values ranging from 0 to 1, expressing the probability of building occurrence across the rural and periurban parts of the study area. Hence the response variable assesses the changes in the rural built environment that occurred in this time interval and is related to the selected explanatory variables by means of a generalized linear model using logistic regression. By comparing the probability map obtained from the model to the actual rural building distribution in 2005, the interpretation capability of the model can be evaluated. The proposed model can also be applied to the interpretation of trends which occurred in other study areas, and to different time intervals, depending on the availability of data. The use of suitable data in terms of time, information, and spatial resolution and the costs related to data acquisition, pre-processing, and survey are among the most critical aspects of model implementation. Future in-depth studies can focus on using the proposed model to predict short/medium-range future scenarios for the rural built environment distribution in the study area. In order to predict future scenarios it is necessary to assume that the driving forces do not change and that their levels of influence within the model are not far from those assessed for the time interval used for the calibration.
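The probability surface described in this abstract can be illustrated with a minimal sketch of the logistic (logit-link) part of a generalized linear model. It assumes the coefficients have already been estimated. The two covariates (distance to the nearest road in km and terrain slope in percent), the coefficient values and the function names are all hypothetical and are not taken from the thesis.

```python
import math


def building_probability(covariates, coefficients, intercept):
    """Logistic regression prediction: inverse logit of the linear predictor."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-linear))


def probability_surface(grid, coefficients, intercept):
    """Apply the fitted model to every cell of a grid of covariate tuples."""
    return [
        [building_probability(cell, coefficients, intercept) for cell in row]
        for row in grid
    ]


# Hypothetical 2 x 2 grid; each cell holds (distance_to_road_km, slope_percent).
grid = [
    [(0.1, 2.0), (0.5, 5.0)],
    [(2.0, 10.0), (4.0, 25.0)],
]
# Hypothetical coefficients: probability falls with distance and with slope.
surface = probability_surface(grid, coefficients=(-0.8, -0.05), intercept=0.5)
```

Each cell of `surface` lies strictly between 0 and 1, so it can be read as a map of building occurrence probability and compared with the observed distribution of buildings.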