918 results for Spatial analysis statistics -- Data processing


Relevance: 100.00%

Abstract:

Spatial variability of Vertisol properties is relevant for identifying zones with physical degradation. In this sense, one has to face the problem of identifying the origin and distribution of spatial variability patterns. The objectives of the present work were (i) to quantify the spatial structure of different physical properties collected from a Vertisol, (ii) to search for potential correlations between different spatial patterns and (iii) to identify relevant components through multivariate spatial analysis. The study was conducted on a Vertisol (Typic Hapludert) dedicated to sugarcane (Saccharum officinarum L.) production for the last sixty years. We used six soil properties measured on a square grid of 225 points: penetrometer resistance (PR), total porosity, fragmentation dimension (Df), vertical electrical conductivity (ECv), horizontal electrical conductivity (ECh) and soil water content (WC). All the original data sets were z-transformed before geostatistical analysis. Three different types of semivariogram model were necessary to fit the individual experimental semivariograms, suggesting that the spatial variability patterns have different origins. Soil water content rendered the largest nugget effect (C0 = 0.933), while soil total porosity showed the largest range of spatial correlation (A = 43.92 m). The bivariate geostatistical analysis also rendered significant cross-semivariance between different paired soil properties, although four different semivariogram models were required in that case. This indicates an underlying co-regionalization between soil properties, which is of interest for delineating management zones within sugarcane fields. Cross-semivariograms showed larger correlation ranges than the individual, univariate semivariograms (A ≥ 29 m). All the findings were supported by multivariate spatial analysis, which showed the influence of soil tillage operations, harvesting machinery and irrigation water distribution on the status of the investigated area.
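
As a minimal illustration of the first analysis step described above (not the authors' code), the following Python sketch computes an empirical semivariogram of z-transformed values on a square grid; the 15 x 15 grid, the 3 m spacing and the synthetic "penetrometer resistance" values are assumptions for the example.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_edges):
    """Empirical semivariogram: gamma(h) = mean of 0.5*(z_i - z_j)^2
    over point pairs whose separation falls in each lag bin."""
    # z-transform the property values, as done before the geostatistical analysis
    z = (values - values.mean()) / values.std()
    # pairwise separation distances and half squared differences
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)            # count each pair once
    d, sq = d[iu], sq[iu]
    return np.array([sq[(d >= lo) & (d < hi)].mean()
                     for lo, hi in zip(lag_edges[:-1], lag_edges[1:])])

# Hypothetical 15 x 15 sampling grid (225 points) with a synthetic property
rng = np.random.default_rng(0)
xy = np.array([(i, j) for i in range(15) for j in range(15)], dtype=float) * 3.0  # 3 m spacing
pr = np.sin(xy[:, 0] / 10) + rng.normal(0, 0.3, len(xy))   # stand-in for penetrometer resistance
print(empirical_semivariogram(xy, pr, np.arange(3, 33, 3)))
```

Fitting spherical, exponential or Gaussian models to such empirical values is what yields the nugget (C0) and range (A) parameters reported in the abstract.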

Relevance: 100.00%

Abstract:

The crop simulation model AquaCrop, recently developed by FAO, can be used for a wide range of purposes. However, in its present form, its use over large areas, or for applications that require a large number of simulation runs (e.g., long-term analyses), is not practical without software to facilitate such applications. Two tools for managing the inputs and outputs of AquaCrop, named AquaData and AquaGIS, have been developed for this purpose and are presented here. Both software utilities were programmed in Delphi v. 5; in addition, AquaGIS requires the Geographic Information System (GIS) programming tool MapObjects. These utilities allow the efficient management of input and output files, along with a GIS module for spatial analysis and spatial visualization of the results, facilitating knowledge dissemination. A sample application of the utilities is given here: an AquaCrop simulation analysis of the impact of climate change on wheat yield in southern Spain, which requires extensive input data preparation and output processing. The use of AquaCrop without the two utilities would have required approximately 1000 h of work, while AquaData and AquaGIS reduced that time by more than 99%. Furthermore, the use of GIS made it possible to perform a spatial analysis of the results, providing a new option for extending the use of the AquaCrop model to scales requiring spatial and temporal analyses.
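
AquaData and AquaGIS themselves are Delphi applications; purely as a language-neutral sketch of the kind of batch input/output management they automate, the following Python fragment generates one input file per scenario and gathers run outputs into a table. The file names, template fields and output layout are hypothetical, not AquaCrop's actual formats.

```python
import csv
import pathlib

# Hypothetical climate-change scenarios, one simulation run each
scenarios = [{"id": "wheat_2030_A2", "co2": 450}, {"id": "wheat_2050_A2", "co2": 520}]
template = "CO2 = {co2}\nCROP = wheat\n"

workdir = pathlib.Path("runs")
workdir.mkdir(exist_ok=True)
for s in scenarios:                          # one input file per simulation run
    (workdir / f"{s['id']}.txt").write_text(template.format(**s))

# After the model runs, collect a hypothetical yield value from each output file
with open("yields.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["scenario", "yield_t_ha"])
    for s in scenarios:
        out = workdir / f"{s['id']}.out"
        if out.exists():                     # output present only after the model has run
            w.writerow([s["id"], out.read_text().split()[-1]])
```

Automating exactly this generate-run-collect loop is what turns thousands of hours of manual file editing into minutes of processing.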

Relevance: 100.00%

Abstract:

A new methodology is proposed to produce subsidence activity maps based on the geostatistical analysis of persistent scatterer interferometry (PSI) data. PSI displacement measurements are interpolated using conditional sequential Gaussian simulation (SGS) to calculate multiple equiprobable realizations of subsidence. The result of this process is a series of interpolated subsidence values, with an estimate of the spatial variability and a confidence level for the interpolation. These maps complement the PSI displacement map, improving the identification of wide subsiding areas at a regional scale. At a local scale, they can be used to identify buildings susceptible to subsidence-related damage. To do so, it is necessary to calculate the maximum differential settlement and the maximum angular distortion for each building in the study area. Based on these PSI-derived parameters, buildings in which the serviceability limit state has been exceeded, and where in situ forensic analysis should be carried out, can be identified automatically. The methodology has been tested in the city of Orihuela (SE Spain) for the study of historical buildings damaged during the last two decades by subsidence due to aquifer overexploitation. The qualitative evaluation of the results in buildings where damage has been reported shows a success rate of 100%.
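
The screening step relies on two standard serviceability parameters: maximum differential settlement, and maximum angular distortion (differential settlement divided by the horizontal distance between the two points). A minimal sketch follows, with hypothetical PSI points and settlements, and a textbook limit value rather than one taken from the paper.

```python
import numpy as np
from itertools import combinations

def building_damage_params(xy, settlement):
    """Max differential settlement and max angular distortion over all
    pairs of PSI measurement points on one building (SI units)."""
    max_diff, max_beta = 0.0, 0.0
    for i, j in combinations(range(len(settlement)), 2):
        delta = abs(settlement[i] - settlement[j])      # differential settlement
        dist = np.linalg.norm(xy[i] - xy[j])            # horizontal separation
        max_diff = max(max_diff, delta)
        max_beta = max(max_beta, delta / dist)          # angular distortion
    return max_diff, max_beta

# Hypothetical: three PSI points on one building footprint (m), settlements in mm
xy = np.array([[0.0, 0.0], [12.0, 0.0], [12.0, 8.0]])
s_mm = np.array([-41.0, -55.0, -48.0])
d, beta = building_damage_params(xy, s_mm / 1000.0)     # work in metres
print(f"max differential settlement = {d*1000:.1f} mm, max angular distortion = {beta:.5f}")
# Flag the building for forensic inspection if beta exceeds a serviceability
# limit, e.g. the common 1/500 threshold (an assumption, not the paper's value).
```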

Relevance: 100.00%

Abstract:

The Gothic church of Santas Justa and Rufina (fourteenth century) has suffered several physical, mechanical, chemical, and biochemical pathologies throughout its history: rock alveolization, efflorescence, biological activity, and capillary rise of groundwater. During the last two decades, however, a new phenomenon has seriously affected the church: ground subsidence caused by aquifer overexploitation. Subsidence, a process that affects the whole Vega Baja of the Segura River basin, consists of gradual sinking of the ground surface caused by soil consolidation due to a decrease in pore pressure. The phenomenon has been studied by differential synthetic aperture radar interferometry techniques, which show settlements of up to 100 mm across the city of Orihuela for the 1993–2009 period. Although no differential synthetic aperture radar interferometry information is available for the church itself, owing to the loss of interferometric coherence, the spatial analysis of nearby deformation combined with fieldwork has advanced the current understanding of the mechanisms that affect the Santas Justa and Rufina church. These results show both the potential and the limitations of this remote sensing technique as a complementary tool for the forensic analysis of building structures.

Relevance: 100.00%

Abstract:

Identifying cloud interference in satellite-derived data is a critical step toward developing useful remotely sensed products. Most MODIS land products use a combination of the MODIS cloud mask (MOD35) and the 'internal' cloud mask of the surface reflectance product (MOD09) to mask clouds, but there has been little discussion of how these masks differ globally. We calculated global mean cloud frequency for both products for 2009 and found that inflated proportions of observations were flagged as cloudy in the Collection 5 MOD35 product. These erroneously categorized areas were spatially and environmentally non-random and usually occurred over high-albedo land-cover types (such as grassland and savanna) in several regions around the world. Additionally, we found that spatial variability in the processing path applied in the Collection 5 MOD35 algorithm affects the likelihood of a cloudy observation by up to 20% in some areas. These factors result in abrupt transitions in recorded cloud frequency across land-cover and processing-path boundaries, impeding their use for fine-scale spatially contiguous modeling applications. We show that, together, these artifacts have significantly decreased and spatially biased data availability for Collection 5 MOD35-derived composite MODIS land products such as land surface temperature (MOD11) and net primary productivity (MOD17). Finally, we compare our results to mean cloud frequency in the new Collection 6 MOD35 product and find that land-cover artifacts have been reduced but not eliminated. Collection 6 thus increases data availability for some regions and land-cover types in MOD35-derived products, but practitioners need to consider how the remaining artifacts might affect their analyses.
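
As a sketch of the frequency computation only (decoding the MOD35/MOD09 QA bit fields, which is where the two products actually differ, is product-specific and omitted here), per-pixel mean cloud frequency over a year of already-decoded boolean masks reduces to an average along the time axis:

```python
import numpy as np

def mean_cloud_frequency(daily_masks):
    """daily_masks: (days, rows, cols) boolean array, True = flagged cloudy.
    Returns the per-pixel fraction of cloudy observations."""
    return daily_masks.mean(axis=0)

# Hypothetical year of 4 x 4 daily masks with ~35% cloud cover
rng = np.random.default_rng(1)
masks = rng.random((365, 4, 4)) < 0.35
freq = mean_cloud_frequency(masks)
print(freq.round(2))
# Differencing the freq maps of the two products highlights pixels where
# MOD35 flags far more clouds, e.g. over bright land-cover types.
```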

Relevance: 100.00%

Abstract:

"Research was supported by the United States Air Force through the Air Force Office of Scientific Research, Air Research and Development Command."

Relevance: 100.00%

Abstract:

This paper considers a model-based approach to the clustering of tissue samples on the basis of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, clinical data are also available on the cases from which the tissue samples were obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data, with the mixing proportions also conditioned on the latter. The other takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on breast cancer data studied recently in van't Veer et al. (2002).
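
In symbols (our notation, following the standard finite-mixture formulation; the paper's own notation may differ), with microarray vector y_j and clinical vector x_j for tissue j, and g clusters, the two specifications can be written as:

```latex
% First approach: components are conditional distributions of the microarray
% data given the clinical data, with clinically dependent mixing proportions
f(\mathbf{y}_j \mid \mathbf{x}_j) = \sum_{i=1}^{g} \pi_i(\mathbf{x}_j)\, f_i(\mathbf{y}_j \mid \mathbf{x}_j)

% Second approach: components model the joint distribution of both data types
f(\mathbf{y}_j, \mathbf{x}_j) = \sum_{i=1}^{g} \pi_i\, f_i(\mathbf{y}_j, \mathbf{x}_j)
```

In both cases the i-th component plays the role of the i-th cluster, so fitting the mixture (e.g. by EM) and assigning each tissue to its most probable component yields the clustering.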

Relevance: 100.00%

Abstract:

In this second article, statistical ideas are extended to the problem of testing whether there is a true difference between two samples of measurements. First, it will be shown that the difference between the means of two samples comes from a population of such differences that is normally distributed. Second, the t-distribution, one of the most important in statistics, will be applied to a test of the difference between two means using a simple data set drawn from a clinical experiment in optometry. Third, in performing a t-test, a statistical judgement is made as to whether there is a significant difference between the means of the two samples. Before the widespread use of statistical software, this judgement was made with reference to a statistical table; even where such tables are no longer used, it is useful to understand their logical structure and how to use them. Finally, the analysis of data that are known to depart significantly from the normal distribution will be described.
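
With statistical software, the table look-up is replaced by a computed p-value. A minimal sketch with hypothetical optometry-style measurements (the data values are invented for the example), including a rank-based alternative for markedly non-normal data:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two groups in a small clinical experiment
group_a = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3])
group_b = np.array([5.9, 5.4, 6.1, 5.7, 5.8, 5.5])

# Two-sample t-test on the difference between the means; the computed p-value
# replaces the statistical-table judgement described above
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.3f}, p = {p:.4f}")           # p < 0.05 -> significant difference

# For data that depart markedly from the normal distribution, a rank-based test
u, p_mw = stats.mannwhitneyu(group_a, group_b)
print(f"Mann-Whitney U = {u:.1f}, p = {p_mw:.4f}")
```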

Relevance: 100.00%

Abstract:

The purpose of this paper is to investigate the technological development of electronic inventory solutions from the perspective of patent analysis. We first applied the International Patent Classification to identify the top categories of data processing technologies and their corresponding top patenting countries. We then identified the core technologies by calculating the citation strength of each patent and applying a standard deviation criterion. To eliminate core innovations having no reference relationships with the other core patents, the relevance strengths between core technologies were also evaluated. Our findings provide market intelligence not only for the research and development community, but also for decision making on advanced inventory solutions.
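
A sketch of the core-patent selection as we read it, with illustrative citation counts; the one-standard-deviation cut-off is our assumption, not the paper's stated threshold:

```python
import numpy as np

# Hypothetical patent IDs with forward-citation counts (citation strength)
citations = {"US001": 3, "US002": 41, "US003": 7, "US004": 55, "US005": 9}

counts = np.array(list(citations.values()), dtype=float)
threshold = counts.mean() + counts.std()     # assumed standard deviation criterion
core = [pid for pid, c in citations.items() if c > threshold]
print(core)   # candidate core technologies, before the cross-reference check
```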

Relevance: 100.00%

Abstract:

The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data; thus, effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. Classifiers based on Random Forests, neural network ensembles, and K-nearest neighbors (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species, and improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
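
Two of the four pooling techniques are available directly in SciPy. A minimal sketch with hypothetical p-values for one gene measured in three independent experiments (the experiment weights are also hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical p-values for one gene from three independent experiments
p = np.array([0.021, 0.090, 0.048])

# Fisher's inverse chi-square method: X = -2 * sum(ln p_i) ~ chi-square, 2k d.f.
chi2, p_fisher = stats.combine_pvalues(p, method="fisher")

# Liptak-Stouffer weighted Z-method, weighting e.g. by experiment sample size
z, p_liptak = stats.combine_pvalues(p, method="stouffer",
                                    weights=np.array([20, 8, 12]))

print(f"Fisher: p = {p_fisher:.4f}; weighted Stouffer: p = {p_liptak:.4f}")
```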

Relevance: 100.00%

Abstract:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limits. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. The research explores the direct processing of compressed textual data, focusing on novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes between informational and functional content and compresses only the informational content; the compressed data thus remain transparent to existing software libraries, which often rely on the functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate to the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated on a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they show substantial improvements in performance and significant reductions in system resource requirements.
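
A toy illustration of the CaPC idea (not the thesis implementation): the functional content, here the tab delimiters of a record, is kept verbatim so record-oriented tooling still works, while each informational field is encoded independently. Per-field zlib on such short strings actually expands the data; the point is only the structure-preserving layout.

```python
import base64
import zlib

def capc_encode(record: str) -> str:
    """Compress each field but leave the tab structure (functional content) intact."""
    fields = record.split("\t")
    packed = [base64.b64encode(zlib.compress(f.encode())).decode() for f in fields]
    return "\t".join(packed)      # base64 output cannot contain a tab

def capc_decode(encoded: str) -> str:
    return "\t".join(zlib.decompress(base64.b64decode(f)).decode()
                     for f in encoded.split("\t"))

row = "2009-07-14\tGET /index.html\t200"
enc = capc_encode(row)
assert capc_decode(enc) == row    # tab-based splitters see the same record shape
```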

Relevance: 100.00%

Abstract:

This paper discusses some aspects of hunter-gatherer spatial organization in southern South Patagonia in times later than 10,000 cal yr BP. Various methods of spatial analysis, implemented in a Geographic Information System (GIS), were applied to the distributional pattern of archaeological sites with radiocarbon dates. The shift in the distributional pattern of chronological information was assessed in conjunction with other lines of evidence within a biogeographic framework. Accordingly, the varying degrees of occupation and integration of coastal and interior spaces in human spatial organization are explained in relation to the adaptive strategies hunter-gatherers have used over time. Both are part of the same human response to changes in risk and uncertainty in the region, in terms of resource availability and environmental dynamics.

Relevance: 100.00%

Abstract:

This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. The study outlines the application of compositional data analysis (CoDa) to the calibration of geochemical data and to multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in raw format, requiring data processing to remove instrument bias, as well as informed sequence interpretation. The applicability of conventional calibration equations to core-scanning XRF data is further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as by matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences, focusing mainly on energy dispersive XRF (ED-XRF) core-scanning. This study applied high-resolution core-scanning XRF to Holocene sedimentary sequences from the tide-dominated Indian Sundarbans (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a subset of core-scan and conventional ED-XRF data to quantify elemental composition, providing a robust calibration scheme based on reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of the geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.
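
A small sketch of two steps named above, with made-up numbers: a centred log-ratio (clr) transform of a composition, and a reduced major axis (RMA) regression, whose slope is sign(r) times sd(y)/sd(x), of conventional ED-XRF log-ratios on core-scan log-ratios. This is a generic illustration, not the paper's LRCE itself.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of one composition (positive parts)."""
    g = np.exp(np.log(x).mean())              # geometric mean of the parts
    return np.log(x / g)

def rma(x, y):
    """Reduced major axis regression: slope = sign(r) * sd(y) / sd(x)."""
    slope = np.sign(np.corrcoef(x, y)[0, 1]) * y.std() / x.std()
    return slope, y.mean() - slope * x.mean()

scan = np.array([0.82, 1.10, 0.95, 1.30, 0.74])    # hypothetical core-scan log-ratios
edxrf = np.array([0.90, 1.21, 1.01, 1.38, 0.80])   # hypothetical conventional ED-XRF log-ratios
b, a = rma(scan, edxrf)
print(f"calibration: ED-XRF ~ {a:.3f} + {b:.3f} * scan")
print(clr(np.array([0.60, 0.25, 0.15])))           # clr of a 3-part composition
```

RMA is preferred over ordinary least squares here because both variables carry measurement error, so neither axis is a natural "predictor".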

Relevance: 100.00%

Abstract:

The aim of this experimental study is to investigate the behaviour of a 2 m x 2 m model of a masonry groin vault, built by assembling blocks made of a 3D-printed plastic skin filled with mortar. The groin vault was chosen because of the large presence of this vulnerable roofing system in the historical heritage. Shaking-table tests were carried out to explore the vault's response under two support boundary conditions, involving four lateral confinement modes. Processing of the marker displacement data made it possible to examine the collapse mechanisms of the vault, based on the deformed shapes of its arches. A numerical evaluation follows, providing the orders of magnitude of the displacements associated with these mechanisms. Given that the displacements are related to the shortening and elongation of the arches, the final objective is the definition of a critical elongation between two diagonal bricks and, consequently, of a diagonal portion. This study continues previous work and takes a further step in the research on ground motion effects on masonry structures.
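
The elongation check reduces to comparing the current distance between two diagonal markers with their rest distance. A minimal sketch with hypothetical marker coordinates; the critical value itself comes from the study's numerical evaluation and is not reproduced here.

```python
import numpy as np

def diagonal_elongation(p1, p2, rest_length):
    """Current elongation (m) of the segment between two diagonal markers."""
    return np.linalg.norm(np.asarray(p1) - np.asarray(p2)) - rest_length

L0 = 0.350                                    # hypothetical rest distance between two diagonal bricks (m)
e = diagonal_elongation([0.102, 0.004, 1.210], [0.395, 0.190, 1.205], L0)
print(f"elongation = {e*1000:+.1f} mm")       # compare against the critical elongation
```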

Relevance: 100.00%

Abstract:

This thesis is the conclusive outcome of the European Joint Doctorate programme in Law, Science & Technology funded by the European Commission through the Marie Skłodowska-Curie Innovative Training Networks action within H2020, grant agreement no. 814177. The tension between data protection and privacy on the one side, and the need to grant further uses of processed personal data on the other, is investigated, tracing the technological development of the de-anonymization/re-identification risk with an explorative survey. After assessing the extent of this risk, the thesis asks whether a certain degree of anonymity can still be guaranteed, from a double perspective: an objective perspective, which focuses on the data processing models per se, and a subjective perspective, which investigates whether the distribution of roles and responsibilities among stakeholders can ensure data anonymity.