919 results for exploratory spatial data analysis


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This study aimed to analyze the spatial distribution of dengue risk and its association with socio-environmental conditions. This was an ecological study of the counts of autochthonous dengue cases in the municipality of Campinas, São Paulo State, Brazil, in the year 2007, aggregated according to 47 coverage areas of municipal health centers. Spatial models for mapping diseases were constructed with Bayesian hierarchical models, based on Integrated Nested Laplace Approximation (INLA). The analyses were stratified according to two age groups, 0 to 14 years and above 14 years. The results indicate that the spatial distribution of dengue risk is not associated with socio-environmental conditions in the 0 to 14 year age group. In the age group older than 14 years, the relative risk of dengue increases significantly as the level of socio-environmental deprivation increases. Mapping of socio-environmental deprivation and dengue cases proved to be a useful tool for data analysis in dengue surveillance systems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The study of important molecular compounds present in low abundance in some tissues remains a challenge for classical proteomic analysis. Such analysis requires a more exploratory investigation of small regions of a tissue or of a group of cells. MALDI imaging mass spectrometry (MSI) is an application of mass spectrometry aimed at the chemical analysis of intact tissues. Advances in MALDI mass spectrometry are being achieved by integrating histology, improved methods and automation, the main tools of data analysis. This tool has become essential for analyzing the spatial distribution of peptides and proteins throughout tissue sections, providing an enormous amount of data with minimal sample preparation. Thus, the aim of this study was to develop the MALDI imaging technique using tissue from glioblastoma multiforme (GBM), one of the most common malignant tumors of the brain. For this we used the chemical inkjet printer ChIP-1000 (Chemical Inkjet Printer, Shimadzu) and a MALDI-ToF-ToF mass spectrometer (Axima Performance, Shimadzu); identifications were searched in databases such as SwissProt. We identified more than forty proteins with diverse functions, such as F-actin-capping proteins and thymosin, involved in cellular structure and organization, and several proteins, such as tumor necrosis factor receptor, related to the development of the pathology. The development of this technique will make it possible to carry out proteomic analysis directly in the tissue, enabling earlier diagnosis of diseases as well as the identification and characterization of potential disease biomarkers.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Climatic factors directly influence the growth and productivity of plants inside greenhouses, where temperature can be considered one of the major parameters. Thus, the aim of this research was to develop a low-cost device for thermal sensing and data acquisition, and to use it to collect data and analyze the spatial variability of temperature inside a greenhouse in a tropical climate. The developed equipment showed a high degree of accuracy and fast response in measurements, proving its efficiency. The data were interpreted using variograms and three-dimensional maps generated by geostatistical software. The analysis showed that a greenhouse without thermal control has spatial variations in air temperature, both in the sampled horizontal layers and in the three analyzed vertical columns, with variations of up to 3.6 °C at certain times.
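The variogram step mentioned in this abstract can be illustrated with a minimal sketch of an empirical semivariogram. The abstract does not say which geostatistical software or estimator was used, so the classical (Matheron) estimator is assumed here, and the sensor positions, temperatures and lag bins are hypothetical.

```python
from itertools import combinations
from math import dist


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per lag bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (p, zp), (q, zq) in combinations(zip(points, values), 2):
        k = int(dist(p, q) // bin_width)  # index of the lag bin for this pair
        if k < n_bins:
            sums[k] += (zp - zq) ** 2
            counts[k] += 1
    # Bins with no pairs have no estimate, reported as None.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical sensor positions (metres) and air temperatures (degrees Celsius).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [25.0, 26.0, 27.0, 28.0]
gamma = empirical_semivariogram(points, values, bin_width=1.0, n_bins=4)
print(gamma)
```

A semivariance that keeps rising with lag distance, as in this toy transect, is the kind of pattern that indicates spatially structured temperature variation.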

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dimensionality reduction is employed in visual data analysis as a way of obtaining reduced spaces for high-dimensional data or of mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation in reduced or visual spaces, they have limited capabilities for adjusting the results according to the user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high-dimensional data, taking into account the user's input. It employs Partial Least Squares (PLS), a statistical tool that retrieves latent spaces focusing on the discriminability of the data. The method employs a training set to build a highly precise model that can then be applied very effectively to a much larger data set. The reduced data set can be displayed using various existing visualization techniques. The training data are important for encoding the user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data are available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge and is capable of working with small and unbalanced training sets.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this work we propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMR) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies that avoid the bias carried by aggregated analyses. Starting from collected disease counts and expected disease counts calculated by means of reference population disease rates, an SMR is derived in each area as the MLE under the Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the low population underlying the area or because of the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates across the map, aiming to detect anomalies and possible high-risk areas, have been proposed in the literature under both the classical and the Bayesian paradigm. Our proposal approaches this issue with a decision-oriented method, which focuses on multiple testing control, without however leaving the preliminary study perspective that an analysis of SMR indicators is asked to take. We implement control of the false discovery rate (FDR), a quantity widely used to address multiple comparison problems in the field of microarray data analysis but not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small-areas issue raises difficulties in applying traditional methods for FDR estimation, which are usually based only on knowledge of the p-values (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value provide weak power in small areas, where the expected number of disease cases is small. Moreover, tests cannot be assumed to be independent when spatial correlation between SMRs is expected, nor are they identically distributed when the population underlying the map is heterogeneous.
The Bayesian paradigm offers a way to overcome the inappropriateness of p-value-based methods. Another peculiarity of the present work is the proposal of a hierarchical fully Bayesian model for FDR estimation in testing many null hypotheses of absence of risk. We use concepts from Bayesian models for disease mapping, referring in particular to the Besag, York and Mollié model (1991), often used in practice for its flexible prior assumption on the distribution of risks across regions. The borrowing of strength between prior and likelihood typical of a hierarchical Bayesian model has the advantage of evaluating a single test (i.e. a test in a single area) by means of all observations in the map under study, rather than just by means of the single observation. This improves the power of the test in small areas and addresses more appropriately the spatial correlation issue, which suggests that relative risks are closer in spatially contiguous regions. The proposed model aims to estimate the FDR by means of the MCMC-estimated posterior probabilities $b_i$ of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on the data ($\widehat{\mathrm{FDR}}$) can be calculated for any set of $b_i$'s relative to areas declared at high risk (where the null hypothesis is rejected) by averaging the $b_i$'s themselves. The $\widehat{\mathrm{FDR}}$ can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many areas as possible such that the $\widehat{\mathrm{FDR}}$ does not exceed a prefixed value; we call these $\widehat{\mathrm{FDR}}$-based decision (or selection) rules. The sensitivity and specificity of such rules depend on the accuracy of the FDR estimate: over-estimation of the FDR causes a loss of power and under-estimation of the FDR produces a loss of specificity. Moreover, our model has the interesting feature of still being able to provide an estimate of relative risk values, as in the Besag, York and Mollié model (1991).
A simulation study was set up to evaluate the model's performance in terms of FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of relative risks. We chose a real map from which we generated several spatial scenarios whose disease counts vary according to the degree of spatial correlation, the size of the areas, the number of areas where the null hypothesis is true and the risk level in the latter areas. In summarizing the simulation results we will always consider the FDR estimation in sets constituted by all $b_i$'s lower than a threshold t. We will show graphs of the $\widehat{\mathrm{FDR}}$ and the true FDR (known by simulation) plotted against the threshold t to assess the FDR estimation. By varying the threshold we can learn which FDR values can be accurately estimated by a practitioner willing to apply the model (judged by the closeness between $\widehat{\mathrm{FDR}}$ and the true FDR). By plotting the calculated sensitivity and specificity (both known by simulation) against the $\widehat{\mathrm{FDR}}$ we can check the sensitivity and specificity of the corresponding $\widehat{\mathrm{FDR}}$-based decision rules. To investigate the over-smoothing level of the relative risk estimates we will compare box-plots of such estimates in high-risk areas (known by simulation), obtained by both our model and the classic Besag, York and Mollié model. All the summary tools are worked out for all simulated scenarios (54 scenarios in total). Results show that the FDR is well estimated (in the worst case we get an over-estimation, hence a conservative FDR control) in scenarios with small areas, low risk levels and spatially correlated risks, which are our primary aims. In such scenarios we have good estimates of the FDR for all values less than or equal to 0.10. The sensitivity of $\widehat{\mathrm{FDR}}$-based decision rules is generally low but specificity is high. In such scenarios the use of a $\widehat{\mathrm{FDR}} = 0.05$ or $\widehat{\mathrm{FDR}} = 0.10$ based selection rule can be suggested.
In cases where the number of true alternative hypotheses (number of true high-risk areas) is small, FDR values of 0.15 are also well estimated, and $\widehat{\mathrm{FDR}} = 0.15$ based decision rules gain power while maintaining a high specificity. On the other hand, in scenarios with non-small areas and non-small risk levels the FDR is under-estimated except for very small values of it (much lower than 0.05), resulting in a loss of specificity of a $\widehat{\mathrm{FDR}} = 0.05$ based decision rule. In such scenarios $\widehat{\mathrm{FDR}} = 0.05$ or, even worse, $\widehat{\mathrm{FDR}} = 0.1$ based decision rules cannot be suggested because the true FDR is actually much higher. As regards relative risk estimation, our model achieves almost the same results as the classic Besag, York and Mollié model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and the FDR control, except in scenarios with non-small areas and large risk levels. A case study is finally presented to show how the method can be used in epidemiology.
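The decision rule described in this abstract (average the posterior null probabilities of the selected areas, and select as many areas as the FDR target allows) can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: the function name `select_high_risk`, the six posterior probabilities and the target of 0.10 are hypothetical, and the probabilities are taken as given rather than estimated by MCMC.

```python
def select_high_risk(null_probs, alpha):
    """Select the largest set of areas whose mean posterior null probability
    (the estimated FDR of the selection) does not exceed alpha.

    Returns (indices of selected areas, estimated FDR of that selection).
    """
    # Visit areas from the smallest posterior null probability upwards, so the
    # running mean is non-decreasing and the first exceedance ends the search.
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    selected, running_sum, est_fdr = [], 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        candidate = (running_sum + null_probs[i]) / rank
        if candidate > alpha:
            break
        running_sum += null_probs[i]
        selected.append(i)
        est_fdr = candidate
    return selected, est_fdr


# Hypothetical posterior probabilities of "no excess risk" for six areas.
post_null = [0.01, 0.40, 0.03, 0.90, 0.08, 0.20]
areas, fdr_hat = select_high_risk(post_null, alpha=0.10)
print(areas, round(fdr_hat, 3))
```

With these hypothetical values, four areas are declared high-risk, and adding a fifth would push the estimated FDR above the target.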

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis contributes to the current debate in the literature about local economic development by considering two different topics: the quality of institutions, and the role of clusters in innovation and productivity growth. The research is built upon three papers. The first paper analyses the effect of administrative continuity on administrative efficiency. The analysis underlines the importance of different typologies of social capital. Findings reveal a positive impact of administrative continuity (AC) on administrative efficiency (AE) when it is coupled with bridging and linking social capital. On the contrary, bonding social capital negatively influences the effect of AC on AE. The second paper investigates the spatial interaction in levels of quality of government (QoG) among European regions. Notwithstanding the widely recognised role of institutions in the design of regional policies, no study has been conducted on the mechanisms of interaction and diffusion of QoG at the regional level. This research aims to fill this gap in the literature. Findings reveal heterogeneity in spatial interaction among groups of regions, i.e. ‘leader regions’ (Northern regions) and ‘lagging regions’ (Southern regions), when considering different mechanisms of interaction (learning / imitating competition and pure competition). Moreover, the effect of wealth on the levels of QoG is nonlinear. Finally, the third paper analyses the relation between specialization and productivity within the agricultural sector. In the literature, the study of cluster dynamics has long neglected agriculture. The analysis describes the changes in sectoral specialization for eight main crop groups in Italian regions (NUTS 3), assessing the existence of spatial autocorrelation by means of an exploratory data analysis. Furthermore, the effect of specialization on productivity is analysed within the main crop groups using a spatial panel data model.
Findings reveal a marked tendency towards specialization in Italian agriculture, and a heterogeneous effect of specialization on productivity.
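As an illustration of how spatial autocorrelation is commonly assessed in exploratory analysis, the sketch below computes the global Moran's I. The abstract does not name the statistic that was used, so Moran's I is an assumption here, and the four regions, their values and the binary contiguity weights are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed in n regions.

    weights[i][j] is the spatial weight between regions i and j
    (here 1 for neighbours, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * (cross / sum(d * d for d in dev))


# Four hypothetical regions in a row; each borders only its immediate neighbours.
specialization = [1.0, 2.0, 3.0, 4.0]
contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(specialization, contiguity)
print(round(moran, 4))
```

A positive value, as in this toy gradient, indicates that neighbouring regions tend to have similar values, while values near zero suggest no spatial pattern.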

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The rotational nature of shifting cultivation poses several challenges to its detection by remote sensing. Consequently, there is a lack of spatial data on the dynamics of shifting cultivation landscapes on a regional, i.e. sub-national, or national level. We present an approach based on a time series of Landsat and MODIS data and landscape metrics to delineate the dynamics of shifting cultivation landscapes. Our results reveal that shifting cultivation is a land use system still widely and dynamically utilized in northern Laos. While there is an overall reduction in the areas dominated by shifting cultivation, some regions also show an expansion. A review of relevant reports and articles indicates that policies tend to lead to a reduction while market forces can result in both expansion and reduction. For a better understanding of the different factors affecting shifting cultivation landscapes in Laos, further research should focus on spatially explicit analyses.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high throughput flow cytometry methods for data analysis and display. In particular, data quality control and quality assessment are crucial steps in processing and analyzing high throughput flow cytometry data. Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized functions and methods in the Bioconductor package rflowcyt. We demonstrate the use of these approaches by investigating two independent sets of high throughput flow cytometry data. Results: We found that graphical representations can reveal substantial non-biological differences in samples. Empirical cumulative distribution function plots and summary scatterplots were especially useful in the rapid identification of problems not identified by manual review. Conclusions: Graphical exploratory data analytic tools are a quick and useful means of assessing data quality. We propose that the described visualizations should be used as quality assessment tools and, where possible, for quality control.
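The idea of comparing samples through their empirical cumulative distribution functions can be sketched in a few lines. This is not the rflowcyt API (that package is written in R). The helper names and the two small samples are hypothetical, and using the largest vertical gap between two ECDFs as a numeric summary is an assumption added for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F where F(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_ecdf_gap(a, b):
    """Largest vertical distance between the ECDFs of two samples."""
    fa, fb = ecdf(a), ecdf(b)
    # The gap can only change at observed values, so those are the only
    # points that need checking.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


# Hypothetical fluorescence intensities from two wells that should be replicates.
well_a = [1.0, 2.0, 3.0, 4.0]
well_b = [3.0, 4.0, 5.0, 6.0]
f_a = ecdf(well_a)
gap = max_ecdf_gap(well_a, well_b)
print(f_a(2.0), gap)
```

A large gap between wells expected to behave alike would flag a sample for closer inspection.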

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Visualization and exploratory analysis is an important part of any data analysis and is made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a database of ambient air pollution measurements in the United States and to a hypothetical portfolio of stocks.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
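The correlation among repeated responses that time series methods must account for can be quantified with the sample autocorrelation. The sketch below is a minimal illustration. The function name and the five weekly counts are hypothetical, and the abstract does not single out this estimator.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    # Products of deviations that are `lag` steps apart, scaled by the
    # total sum of squared deviations.
    shifted = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return shifted / sum(d * d for d in dev)


# Hypothetical weekly admissions to an emergency department.
admissions = [1.0, 2.0, 3.0, 4.0, 5.0]
r1 = autocorrelation(admissions, lag=1)
print(round(r1, 3))
```

A lag-1 value well above zero means consecutive observations are not independent, which is why standard regression errors would be misleading for such data.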

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The primary challenge in groundwater and contaminant transport modeling is obtaining the data needed for constructing, calibrating and testing the models. Large amounts of data are necessary for describing the hydrostratigraphy in areas with complex geology. Increasingly, states are making spatial data available that can be used as input to groundwater flow models. The appropriateness of these data for large-scale flow systems has not been tested. This study focuses on modeling a plume of 1,4-dioxane in a heterogeneous aquifer system in Scio Township, Washtenaw County, Michigan. The analysis consisted of: (1) characterization of the hydrogeology of the area and construction of a conceptual model based on publicly available spatial data, (2) development and calibration of a regional flow model for the site, (3) conversion of the regional model to a more highly resolved local model, (4) simulation of the dioxane plume, and (5) evaluation of the model's ability to simulate field data and estimation of the possible dioxane sources and subsequent migration until maximum concentrations are at or below the Michigan Department of Environmental Quality's residential cleanup standard for groundwater (85 ppb). The MODFLOW-2000 and MT3D programs were used to simulate the groundwater flow and the development and movement of the 1,4-dioxane plume, respectively. MODFLOW simulates transient groundwater flow in a quasi-3-dimensional sense, subject to a variety of boundary conditions that can simulate recharge, pumping, and surface-/groundwater interactions. MT3D simulates solute advection with groundwater flow (using the flow solution from MODFLOW), dispersion, source/sink mixing, and chemical reaction of contaminants. This modeling approach was successful at simulating the groundwater flows by calibrating recharge and hydraulic conductivities.
The plume transport was adequately simulated using literature dispersivity and sorption coefficients, although the plume geometries were not well constrained.