67 results for Spatial data
Abstract:
Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date tweets whose locations are close to a user-specified location and whose text matches their interests. For example, a user may want to be updated with tweets near her home on the topic “food poisoning vomiting.” We consider the Temporal Spatial-Keyword Top-k Subscription (TaSK) query. Given a TaSK query, we continuously maintain the up-to-date top-k most relevant results over a stream of geo-textual objects (e.g., geo-tagged Tweets). The TaSK query takes into account text relevance, spatial proximity, and recency when evaluating the relevance of a geo-textual object. We propose a novel solution to efficiently process a large number of TaSK queries over a stream of geo-textual objects. We evaluate the efficiency of our approach on two real-world datasets, and the experimental results show that our solution reduces processing time by 70-80% compared with two baselines.
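As a rough illustration of how text relevance, spatial proximity, and recency might be combined into one ranking, the Python sketch below scores each object against a query and returns the top-k. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the Jaccard text score, the linear distance score, the exponential time decay, and the names `task_score`, `top_k`, `alpha`, `decay`, and `max_dist`. It also recomputes scores from scratch rather than maintaining results incrementally over a stream as the paper does.

```python
import heapq
import math

def task_score(query, obj, now, alpha=0.5, decay=0.01, max_dist=10.0):
    """Hypothetical score: weighted text + spatial relevance, damped by age."""
    q_terms, o_terms = set(query["keywords"]), set(obj["terms"])
    shared = q_terms & o_terms
    if not shared:
        return 0.0  # no keyword overlap: treated as irrelevant
    text = len(shared) / len(q_terms | o_terms)          # Jaccard similarity
    dist = math.hypot(query["x"] - obj["x"], query["y"] - obj["y"])
    spatial = max(0.0, 1.0 - dist / max_dist)            # 1 at the query point
    recency = math.exp(-decay * (now - obj["time"]))     # older objects fade
    return (alpha * text + (1 - alpha) * spatial) * recency

def top_k(query, objects, now, k=2):
    """Return the ids of the k highest-scoring objects with a positive score."""
    scored = [(task_score(query, o, now), o["id"]) for o in objects]
    return [oid for score, oid in heapq.nlargest(k, scored) if score > 0]

# Invented example data, not taken from the paper's datasets.
query = {"keywords": ["food", "poisoning"], "x": 0.0, "y": 0.0}
objects = [
    {"id": "a", "terms": ["food", "poisoning"], "x": 1.0, "y": 0.0, "time": 100},
    {"id": "b", "terms": ["food"], "x": 1.0, "y": 0.0, "time": 100},
    {"id": "c", "terms": ["weather"], "x": 0.0, "y": 0.0, "time": 100},
    {"id": "d", "terms": ["food", "poisoning"], "x": 1.0, "y": 0.0, "time": 0},
]
result = top_k(query, objects, now=100)
```

In this toy data, object `d` has the same text and location as `a` but is much older, so the decay term pushes it below `b`.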
Abstract:
The last decade has witnessed unprecedented growth in the availability of data with spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that behave significantly differently from their neighbors could be of interest in various application scenarios, such as weather modeling, analyzing the spread of disease outbreaks, and monitoring traffic congestion. In this paper, we propose an automated approach to exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is recovered. Our approach differs significantly from traditional methods of spatial outlier detection and employs two phases: (i) discovering homogeneous regions, and (ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.
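The two-phase idea can be sketched in Python on a small grid of values. This is only an illustration under stated assumptions, not the paper's method: phase one is approximated by a flood fill that groups 4-connected cells within a tolerance `tol` of a seed value, and phase two approximates the "generalized neighborhood" by the whole grid, flagging regions whose mean deviates from the grid median by more than `threshold` median absolute deviations. The function names and parameters are hypothetical.

```python
from statistics import mean, median

def homogeneous_regions(grid, tol):
    """Phase 1 (assumed): flood-fill 4-connected cells within tol of the seed."""
    rows, cols = len(grid), len(grid[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            seed = grid[r][c]
            seen.add((r, c))
            stack, cells = [(r, c)], []
            while stack:
                i, j = stack.pop()
                cells.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and (ni, nj) not in seen
                            and abs(grid[ni][nj] - seed) <= tol):
                        seen.add((ni, nj))
                        stack.append((ni, nj))
            regions.append(cells)
    return regions

def anomalous_regions(grid, regions, threshold=3.0):
    """Phase 2 (assumed): robust z-score of each region mean against the grid."""
    values = [v for row in grid for v in row]
    centre = median(values)
    # Median absolute deviation; fall back to a tiny value if it is zero.
    mad = median([abs(v - centre) for v in values]) or 1e-9
    flagged = []
    for idx, cells in enumerate(regions):
        region_mean = mean([grid[i][j] for i, j in cells])
        if abs(region_mean - centre) / mad > threshold:
            flagged.append(idx)
    return flagged

# Invented example grid: a 2x2 block of high values inside a uniform background.
grid = [
    [1.0, 1.1, 1.0, 0.9],
    [1.0, 9.0, 9.1, 1.0],
    [0.9, 9.0, 9.0, 1.1],
    [1.0, 1.0, 0.9, 1.0],
]
regions = homogeneous_regions(grid, tol=0.5)
flagged = anomalous_regions(grid, regions)
```

Using the median and the median absolute deviation keeps the small high-valued block from distorting the baseline it is compared against.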
Abstract:
Conventional practice in Regional Geochemistry includes, as a final step of any geochemical campaign, the generation of a series of maps to show the spatial distribution of each of the components considered. Such maps, though necessary, do not comply with the compositional, relative nature of the data, which unfortunately makes any conclusion based on them sensitive to spurious correlation problems. This is one of the reasons why these maps are never interpreted in isolation. This contribution aims at gathering a series of statistical methods to produce individual maps of multiplicative combinations of components (logcontrasts), much in the flavor of equilibrium constants, which are designed on purpose to capture certain aspects of the data.
We distinguish between supervised and unsupervised methods, where the former require an external, non-compositional variable (besides the compositional geochemical information) available in an analogous training set. This external variable can be a quantity (soil density, collocated magnetics, collocated ratio of Th/U spectral gamma counts, proportion of clay particle fraction, etc.) or a category (rock type, land use type, etc.). In the supervised methods, a regression-like model between the external variable and the geochemical composition is derived in the training set, and then this model is mapped on the whole region. This case is illustrated with the Tellus dataset, covering Northern Ireland at a density of 1 soil sample per 2 square km, where we map the presence of blanket peat and the underlying geology. The unsupervised methods considered include principal components and principal balances (Pawlowsky-Glahn et al., CoDaWork2013), i.e. logcontrasts of the data that are devised to capture very large variability or else be quasi-constant. Using the Tellus dataset again, it is found that geological features are highlighted by the quasi-constant ratios Hf/Nb and their ratio against SiO2; Rb/K2O and Zr/Na2O and the balance between these two groups of two variables; the balance of Al2O3 and TiO2 vs. MgO; or the balance of Cr, Ni and Co vs. V and Fe2O3. The largest variability appears to be related to the presence/absence of peat.
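For readers unfamiliar with balances, the Python sketch below computes one for a single sample using the standard definition from compositional data analysis: the log-ratio of the geometric means of two groups of parts, scaled by sqrt(rs/(r+s)) for group sizes r and s. The helper names and the concentration values are invented for illustration; they are not taken from the Tellus dataset, and the abstract does not show how the authors computed their balances.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(sample, numerator, denominator):
    """Balance (normalized logcontrast) between two groups of parts."""
    r, s = len(numerator), len(denominator)
    gm_num = geometric_mean([sample[p] for p in numerator])
    gm_den = geometric_mean([sample[p] for p in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(gm_num / gm_den)

# Hypothetical concentrations for one soil sample (arbitrary common units).
sample = {"Cr": 60.0, "Ni": 30.0, "Co": 15.0, "V": 80.0, "Fe2O3": 5.0}
b = balance(sample, ["Cr", "Ni", "Co"], ["V", "Fe2O3"])

# Rescaling every part by the same factor leaves the balance unchanged,
# which is why balances respect the relative nature of compositions.
rescaled = {part: 10.0 * value for part, value in sample.items()}
b_rescaled = balance(rescaled, ["Cr", "Ni", "Co"], ["V", "Fe2O3"])
```

Mapping such a value per sample, rather than each raw concentration, is what makes the resulting map insensitive to the closure effects mentioned above.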
Abstract:
Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date geo-textual objects (e.g., geo-tagged Tweets) whose locations meet their needs and whose text matches their interests. For example, a user may want to be updated with tweets near her home on the topic “dengue fever headache.” In this demonstration, we present SOPS, the Spatial-Keyword Publish/Subscribe System, which is capable of efficiently processing spatial keyword continuous queries. SOPS supports two types of queries: (1) the Boolean Range Continuous (BRC) query, which can be used to subscribe to geo-textual objects that satisfy a boolean keyword expression and fall within a specified spatial region; and (2) the Temporal Spatial-Keyword Top-k Continuous (TaSK) query, which continuously maintains the up-to-date top-k most relevant results over a stream of geo-textual objects. SOPS enables users to formulate their queries and view the real-time results over a stream of geo-textual objects through browser-based user interfaces. On the server side, we propose solutions for efficiently processing a large number of BRC queries (tens of millions) and TaSK queries over a stream of geo-textual objects.
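A BRC subscription can be pictured as a predicate applied to each incoming object. The Python sketch below is a minimal illustration under assumptions that the abstract does not confirm: the boolean expression is represented as a conjunction of OR-groups (`all_of`), the region is an axis-aligned rectangle, and the names `in_region` and `matches_brc` are hypothetical. It checks one query at a time and does not reflect the indexing SOPS would need to handle tens of millions of queries.

```python
def in_region(point, region):
    """True if point (x, y) lies inside the rectangle (min_x, min_y, max_x, max_y)."""
    x, y = point
    min_x, min_y, max_x, max_y = region
    return min_x <= x <= max_x and min_y <= y <= max_y

def matches_brc(query, obj):
    """Assumed BRC semantics: every OR-group must share a term with the object,
    and the object must fall inside the query rectangle."""
    terms = set(obj["terms"])
    keywords_ok = all(
        any(term in terms for term in group) for group in query["all_of"]
    )
    return keywords_ok and in_region(obj["location"], query["region"])

# Invented example: "dengue AND (fever OR headache)" within a 10x10 square.
query = {"all_of": [["dengue"], ["fever", "headache"]],
         "region": (0.0, 0.0, 10.0, 10.0)}
stream = [
    {"id": 1, "terms": ["dengue", "headache"], "location": (2.0, 3.0)},
    {"id": 2, "terms": ["dengue", "fever"], "location": (20.0, 3.0)},
    {"id": 3, "terms": ["fever"], "location": (2.0, 3.0)},
]
delivered = [obj["id"] for obj in stream if matches_brc(query, obj)]
```

Object 2 satisfies the keywords but lies outside the region, and object 3 lies inside the region but lacks the required keyword, so only object 1 is delivered.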
Abstract:
Administrative systems such as health care registration are of increasing importance in providing information for statistical, research, and policy purposes. There is thus a pressing need to better understand the detailed relationship between population characteristics as recorded in such systems and in conventional censuses. This paper explores these issues using the unique Northern Ireland Longitudinal Study (NILS). It takes the 2001 Census enumeration as a benchmark and analyses the social, demographic and spatial patterns of mismatch with the health register at the individual level. Descriptive comparison is followed by multivariate and multilevel analyses, which show that approximately 25% of individuals are recorded at different addresses in the two sources and that age, rurality, education, and housing type are all important factors. This level of mismatch appears to be maintained over time, as earlier migrants who update their address details are replaced by others who have not yet done so. In some cases, apparent mismatches seem likely to reflect complex multi-address living arrangements rather than data error.