88 results for Spatial data mining


Abstract:

In this paper we discuss the current state-of-the-art in estimating, evaluating, and selecting among non-linear forecasting models for economic and financial time series. We review theoretical and empirical issues, including predictive density, interval and point evaluation and model selection, loss functions, data-mining, and aggregation. In addition, we argue that although the evidence in favor of constructing forecasts using non-linear models is rather sparse, there is reason to be optimistic. However, much remains to be done. Finally, we outline a variety of topics for future research, and discuss a number of areas which have received considerable attention in the recent literature, but where many questions remain.

Abstract:

A glance along the finance shelves at any bookshop reveals a large number of books that seek to show readers how to ‘make a million’ or ‘beat the market’ with allegedly highly profitable equity trading strategies. This paper investigates whether useful trading strategies can be derived from popular books of investment strategy, using What Works on Wall Street by James P. O'Shaughnessy as an example. Specifically, we test whether this strategy would have performed as spectacularly in the UK as the author demonstrated for the US market. As part of our investigation, we highlight a general methodology for determining whether the observed superior performance of a trading rule could be attributed, in part or in its entirety, to data mining. Overall, we find that the O'Shaughnessy rule performs reasonably well in the UK equity market, yielding higher returns than the FTSE All-Share Index but lower returns than an equally weighted benchmark.

Abstract:

Global communication requirements and load imbalance in some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm in which the requirement for global communication can be relaxed while still yielding the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution, which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, allowing a further reduction in communication costs.
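The straightforward formulation described above can be sketched as follows: each worker computes per-cluster partial sums and counts over its own partition, and a global reduction combines them before the centroids are updated. This is a minimal single-process simulation under stated assumptions (2-D points, given initial centroids, a fixed number of iterations); the function names are hypothetical, and it shows the baseline whose per-iteration reduction the paper seeks to relax, not the paper's own formulation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stats = Tuple[List[List[float]], List[int]]  # per-cluster coordinate sums, counts


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: (p[0] - centroids[j][0]) ** 2
                             + (p[1] - centroids[j][1]) ** 2)


def local_stats(partition: List[Point], centroids: List[Point]) -> Stats:
    """Worker step: partial sums and counts for the points in one partition."""
    k = len(centroids)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = nearest(p, centroids)
        sums[j][0] += p[0]
        sums[j][1] += p[1]
        counts[j] += 1
    return sums, counts


def global_reduce(stats: List[Stats]) -> Stats:
    """Global reduction: element-wise sum of all workers' partial statistics."""
    k = len(stats[0][1])
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in stats:
        for j in range(k):
            sums[j][0] += part_sums[j][0]
            sums[j][1] += part_sums[j][1]
            counts[j] += part_counts[j]
    return sums, counts


def parallel_kmeans(partitions: List[List[Point]], centroids: List[Point],
                    iterations: int = 10) -> List[Point]:
    centroids = list(centroids)
    for _ in range(iterations):
        # Simulated here in one process; each call would run on its own worker.
        stats = [local_stats(part, centroids) for part in partitions]
        sums, counts = global_reduce(stats)  # the per-iteration global step
        centroids = [
            (sums[j][0] / counts[j], sums[j][1] / counts[j]) if counts[j]
            else centroids[j]  # keep an empty cluster's centroid unchanged
            for j in range(len(centroids))
        ]
    return centroids
```

Because the reduction sums exact partial statistics, the result does not depend on how points are split across workers: for example, `parallel_kmeans([[(0, 0), (0, 1)], [(10, 10), (10, 11)]], [(0, 0), (10, 10)])` returns `[(0.0, 0.5), (10.0, 10.5)]`, the same as running on a single partition holding all four points.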

Abstract:

Twitter is both a micro-blogging service and a platform for public conversation. Direct conversation is facilitated on Twitter through the use of @’s (mentions) and replies. While the conversational element of Twitter is of particular interest to the marketing sector, relatively few data-mining studies have focused on this area. We analyse conversations associated with reciprocated mentions in a data set of approximately 4 million tweets, each containing at least one mention, collected over a period of 28 days. We ignore tweet content and instead use the structure of the mention network and its dynamical properties to identify and characterise Twitter conversations between pairs of users and within larger groups. We consider conversational balance, meaning the fraction of content contributed by each party. The goal of this work is to draw out some of the mechanisms driving conversation on Twitter, with the potential aim of developing conversational models.
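Conversational balance can be made concrete with a small calculation. The sketch below assumes, as a simplification not stated in the abstract, that each party's contribution is measured by the number of tweets in which they mention the other; the function name and the (sender, recipient) input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def conversational_balance(mentions: Iterable[Tuple[str, str]],
                           a: str, b: str) -> Dict[str, float]:
    """Fraction of the a<->b exchange contributed by each user.

    `mentions` holds one (sender, recipient) pair per tweet (an assumed format).
    Returns an empty dict if the mentions between a and b are not reciprocated.
    """
    # Count only tweets exchanged between a and b, keyed by sender.
    counts = Counter(sender for sender, recipient in mentions
                     if {sender, recipient} == {a, b})
    if counts[a] == 0 or counts[b] == 0:
        return {}  # one-directional mentions: not a conversation
    total = counts[a] + counts[b]
    return {a: counts[a] / total, b: counts[b] / total}
```

For the pair alice/bob, `[("alice", "bob"), ("bob", "alice"), ("alice", "bob"), ("alice", "carol"), ("alice", "bob")]` yields `{"alice": 0.75, "bob": 0.25}`; a perfectly balanced exchange would give 0.5 to each party.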

Abstract:

To analyze patterns in marine productivity, harmful algal blooms, thermal stress in coral reefs, and oceanographic processes, optical and biophysical marine parameters, such as sea surface temperature, and ocean color products, such as chlorophyll-a concentration, diffuse attenuation coefficient, total suspended matter concentration, chlorophyll fluorescence line height, and remote sensing reflectance, are required. In this paper we present a novel automatic Satellite-based Ocean Monitoring System (SATMO) developed to provide, in near real-time, continuous spatial data sets of the above-mentioned variables for marine-coastal ecosystems in the Gulf of Mexico, northeastern Pacific Ocean, and western Caribbean Sea, with 1 km spatial resolution. The products are obtained from Moderate Resolution Imaging Spectroradiometer (MODIS) images received at the Direct Readout Ground Station (located at CONABIO) after each overpass of the Aqua and Terra satellites. In addition, at the end of each week and month the system provides composite images for several ocean products, as well as weekly and monthly anomaly composites for chlorophyll-a concentration and sea surface temperature. These anomaly data are reported for the first time for the study region and represent valuable information for analyzing time series of ocean color data for the study of coastal and marine ecosystems in Mexico, Central America, and the western Caribbean.
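The weekly and monthly anomaly composites mentioned above rest on two simple per-pixel operations. As a sketch only, assuming that a composite is the mean of the cloud-free observations in the period and that an anomaly is the plain difference from a long-term mean for the same period (the abstract does not specify SATMO's actual compositing or anomaly definitions), with missing pixels represented as `None`:

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]  # None marks a missing (e.g. cloudy) pixel


def composite(images: List[Grid]) -> Grid:
    """Per-pixel mean of the valid (non-None) values across a stack of images."""
    rows, cols = len(images[0]), len(images[0][0])
    out: Grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [img[r][c] for img in images if img[r][c] is not None]
            if vals:
                out[r][c] = sum(vals) / len(vals)
    return out


def anomaly(current: Grid, climatology: Grid) -> Grid:
    """Pixel-wise difference between a composite and the long-term mean."""
    return [
        [None if x is None or m is None else x - m
         for x, m in zip(row_x, row_m)]
        for row_x, row_m in zip(current, climatology)
    ]
```

For two 1×2 SST images `[[20.0, None]]` and `[[22.0, 25.0]]`, the composite is `[[21.0, 25.0]]`, and against a hypothetical climatology `[[20.5, 24.0]]` the anomaly is `[[0.5, 1.0]]`.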

Abstract:

Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When regression problems involve covariates of more complex form, such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, tensor regression models offer a promising way to reduce the model’s dimensionality to a manageable level, thus enabling efficient estimation. Most existing tensor-based methods estimate each individual regression problem independently, based on a tensor decomposition that allows an input tensor to be projected simultaneously onto more than one direction along each mode. In practice, however, multi-dimensional data are often collected under the same or very similar conditions, so the data share some common latent components while also having their own independent parameters for each regression task. It is therefore beneficial to analyse the regression parameters of all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker decomposition that simultaneously identifies both the components of the parameters that are common to all regression tasks and the independent factors contributing to each particular task. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modelling further reduce the total number of parameters, resulting in lower memory cost than other tensor-based methods. The effectiveness of the new method is demonstrated on real data sets.
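To see why a Tucker structure reduces dimensionality, consider a generic Tucker decomposition of a single third-order coefficient tensor. This is the standard (unlinked) decomposition, not the linked common/independent model proposed in the paper, and the sizes in the last line are purely illustrative.

```latex
\mathcal{W} = \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)},
\qquad \mathcal{W} \in \mathbb{R}^{I_1 \times I_2 \times I_3},\;
\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3},\;
U^{(n)} \in \mathbb{R}^{I_n \times R_n}

\#\text{params}_{\text{full}} = \prod_{n=1}^{3} I_n,
\qquad
\#\text{params}_{\text{Tucker}} = \prod_{n=1}^{3} R_n + \sum_{n=1}^{3} I_n R_n

I_n = 100,\; R_n = 5:\quad 100^3 = 1\,000\,000 \;\text{vs.}\; 5^3 + 3 \cdot 100 \cdot 5 = 1\,625
```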

Abstract:

The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as in commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well to large training data. This paper presents a thorough discussion of Random Prism and of a recently proposed parallel version of it called Parallel Random Prism, which is based on the MapReduce programming paradigm. The paper provides, for the first time, a theoretical analysis of the proposed technique and an in-depth experimental study showing that Parallel Random Prism scales well with the number of training examples, the number of data features and the number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications, where informed decision making increases the user’s trust in the system.
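For readers unfamiliar with the Prism family, the sketch below shows the basic separate-and-conquer rule induction behind the original Prism algorithm: for each class, a rule is specialised one attribute-value test at a time, choosing the test with the highest proportion of covered instances belonging to that class, until the rule covers only that class; the covered instances are then removed and the process repeats. It assumes categorical attributes and a hypothetical dictionary-based data format, and the tie-breaking (coverage, then lexicographic order) is an assumption; it does not implement Random Prism's ensemble or its MapReduce parallelisation.

```python
from typing import Dict, Iterable, List, Tuple

Instance = Dict[str, str]
Condition = Tuple[str, str]               # (attribute, value)
Rule = Tuple[Tuple[Condition, ...], str]  # (conditions, predicted class)


def covers(conditions: Iterable[Condition], inst: Instance) -> bool:
    """True if the instance satisfies every attribute-value test."""
    return all(inst[a] == v for a, v in conditions)


def score(covered: List[Instance], pair: Condition, target: str, cls: str):
    """Precision of a candidate test for `cls`, with coverage as tie-breaker."""
    subset = [i for i in covered if i[pair[0]] == pair[1]]
    hits = sum(i[target] == cls for i in subset)
    return (hits / len(subset), hits)


def prism(data: List[Instance], target: str) -> List[Rule]:
    """Induce rules with basic Prism-style separate-and-conquer."""
    rules: List[Rule] = []
    attributes = [a for a in data[0] if a != target]
    for cls in sorted({inst[target] for inst in data}):
        remaining = list(data)  # each class starts from the full training set
        while any(inst[target] == cls for inst in remaining):
            conditions: List[Condition] = []
            covered = remaining
            # Specialise the rule until it covers only `cls` or no tests are left.
            while any(inst[target] != cls for inst in covered):
                used = {a for a, _ in conditions}
                candidates = {(a, inst[a]) for inst in covered
                              for a in attributes if a not in used}
                if not candidates:
                    break
                # sorted() makes any remaining ties deterministic (an assumption).
                best = max(sorted(candidates),
                           key=lambda p: score(covered, p, target, cls))
                conditions.append(best)
                covered = [i for i in covered if covers([best], i)]
            rules.append((tuple(conditions), cls))
            # "Separate": drop the instances the new rule covers.
            remaining = [i for i in remaining if not covers(conditions, i)]
    return rules
```

On a toy table with attributes `outlook` and `windy` and class `play`, where play is "yes" exactly when windy is "no", the sketch returns the two rules IF windy = yes THEN play = no and IF windy = no THEN play = yes.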

Abstract:

A basic data requirement of a river flood inundation model is a Digital Terrain Model (DTM) of the reach being studied. The scale at which modeling is required determines the accuracy required of the DTM. For modeling floods in urban areas, a high resolution DTM such as that produced by airborne LiDAR (Light Detection And Ranging) is most useful, and large parts of many developed countries have now been mapped using LiDAR. In remoter areas, it is possible to model flooding on a larger scale using a lower resolution DTM, and in the near future the DTM of choice is likely to be that derived from the TanDEM-X Digital Elevation Model (DEM). A variable-resolution global DTM obtained by combining existing high and low resolution data sets would be useful for modeling flood water dynamics globally, at high resolution wherever possible and at lower resolution over larger rivers in remote areas. A further important data resource used in flood modeling is the flood extent, commonly derived from Synthetic Aperture Radar (SAR) images. Flood extents become more useful when they are intersected with the DTM, because water level observations (WLOs) at the flood boundary can then be estimated at various points along the river reach. To illustrate the utility of such a global DTM, two examples of recent research involving WLOs at opposite ends of the spatial scale are discussed. The first requires high resolution spatial data, and involves the assimilation of WLOs from a real sequence of high resolution SAR images into a flood model to update the model state with observations over time, and to estimate river discharge and model parameters, including river bathymetry and friction. The results indicate the feasibility of such an Earth Observation-based flood forecasting system. The second example is at a larger scale, and uses SAR-derived WLOs to improve the lower-resolution TanDEM-X DEM in the area covered by the flood extents. The resulting reduction in random height error is significant.
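The intersection step can be sketched on a raster grid: flooded cells from the SAR-derived extent that touch at least one dry cell are taken as boundary cells, and the DTM height at each one gives a candidate water level observation. This is a simplified illustration under stated assumptions: the extent and DTM are co-registered grids of equal size, 4-connectivity defines the boundary, the grid edge is not treated as a boundary, and no filtering of noisy heights is applied; the function name is hypothetical.

```python
from typing import List, Tuple


def boundary_wlos(flood: List[List[bool]],
                  dtm: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Return (row, col, height) for flooded cells adjacent to a dry cell."""
    rows, cols = len(flood), len(flood[0])
    out: List[Tuple[int, int, float]] = []
    for r in range(rows):
        for c in range(cols):
            if not flood[r][c]:
                continue
            # 4-connected neighbours; cells outside the grid are ignored.
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and not flood[nr][nc]
                   for nr, nc in neighbours):
                out.append((r, c, dtm[r][c]))  # DTM height as water level
    return out
```

In practice the resulting heights would be noisy, so an operational workflow would likely filter or aggregate them along the reach before use; that step is omitted here.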

Abstract:

Sparse coding aims to find a more compact representation based on a set of dictionary atoms. A well-known technique for 2D sparsity is low rank representation (LRR). However, in many computer vision applications, data often originate from a manifold equipped with a Riemannian geometry. In this case, the existing LRR is inappropriate for modeling and incorporating the intrinsic geometry of the manifold, which is potentially critical to applications. In this paper, we generalize LRR from Euclidean space to an LRR model over a specific Riemannian manifold: the manifold of symmetric positive definite (SPD) matrices. Experiments on several computer vision datasets showcase its noise robustness and superior performance on classification and segmentation compared with state-of-the-art approaches.

Abstract:

The contraction of a species’ distribution range, which results from the extirpation of local populations, generally precedes its extinction. Therefore, understanding the drivers of range contraction is important for conservation and management. Although many processes can potentially lead to local extirpation and range contraction, three main null models have been proposed: demographic, contagion, and refuge. The first two models postulate that the probability of local extirpation for a given area depends on its relative position within the range, but they generate distinct spatial predictions because they assume either a ubiquitous (demographic) or a clinal (contagion) distribution of threats. The third model (refuge) postulates that extirpations are determined by the intensity of human impacts, leading to heterogeneous spatial predictions potentially compatible with those made by the other two null models. A few previous studies have explored the generality of some of these null models, but we present here the first comprehensive evaluation of all three. Using descriptive indices and regression analyses, we contrast the predictions made by each of the null models using empirical spatial data describing range contraction in 386 terrestrial vertebrates (mammals, birds, amphibians, and reptiles) distributed across the world. Observed contraction patterns do not consistently conform to the predictions of any of the three models, suggesting that these may not be adequate null models for evaluating range contraction dynamics among terrestrial vertebrates. Instead, our results support alternative null models that account for both relative position and intensity of human impacts. These new models provide a better multifactorial baseline for describing range contraction patterns in vertebrates.
This general baseline can be used to explore how additional factors influence contraction, and ultimately extinction for particular areas or species as well as to predict future changes in light of current and new threats.