89 results for Data Mining, Rough Sets, Multi-Dimension, Association Rules, Constraint
Abstract:
We present projections of winter storm-induced insured losses in the German residential building sector for the 21st century. To this end, two structurally independent downscaling methods and one hybrid downscaling method are applied to a 3-member ensemble of ECHAM5/MPI-OM1 A1B scenario simulations. The first method uses dynamical downscaling of intense winter storm events in the global model and a transfer function to relate regional wind speeds to losses. The second method is based on a reshuffling of present-day weather situations and sequences, taking into account changes in their frequencies according to the linear temperature trends of the global runs. The third method uses statistical-dynamical downscaling, considering frequency changes in the occurrence of storm-prone weather patterns, and translates them into losses using empirical statistical distributions. The A1B scenario ensemble was downscaled by all three methods until 2070, and by the (statistical-) dynamical methods until 2100. All methods assume a constant statistical relationship between meteorology and insured losses and no developments other than climate change, such as changes in building construction or claims management. The study utilizes data provided by the German Insurance Association encompassing 24 years at district-scale resolution. Compared to 1971–2000, the downscaling methods indicate an increase of 10-year return values (i.e. loss ratios per return period) of 6–35 % for 2011–2040, of 20–30 % for 2041–2070, and of 40–55 % for 2071–2100. Combining various sources of uncertainty in one confidence statement (data, loss model, storm realization, and Pareto fit uncertainty) widens the return-level confidence interval for a return period of 15 years by more than a factor of two. Finally, we suggest how practitioners can deal with alternative scenarios or possible natural excursions of observed losses.
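The abstract does not spell out the fitting procedure, but a peaks-over-threshold approach with a generalized Pareto fit is the standard way to turn loss records into return levels. The Python sketch below illustrates the idea on synthetic loss ratios; the data, threshold choice, and record length are illustrative assumptions, not the study's actual GDV data or method.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
losses = rng.pareto(3.0, size=24 * 365) * 0.01    # synthetic daily loss ratios

u = np.quantile(losses, 0.99)                     # exceedance threshold
exc = losses[losses > u] - u
xi, _, sigma = stats.genpareto.fit(exc, floc=0)   # shape xi, scale sigma

lam = len(exc) / 24.0   # mean exceedances per year (24-year record assumed)
T = 10.0                # target return period in years
# Standard GPD return-level formula (assumes xi != 0):
z_T = u + (sigma / xi) * ((lam * T) ** xi - 1.0)
print(f"estimated {T:.0f}-year return level (loss ratio): {z_T:.4f}")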
Abstract:
In this paper we discuss the current state of the art in estimating, evaluating, and selecting among non-linear forecasting models for economic and financial time series. We review theoretical and empirical issues, including predictive density, interval, and point evaluation; model selection; loss functions; data mining; and aggregation. In addition, we argue that although the evidence in favor of constructing forecasts using non-linear models is rather sparse, there is reason to be optimistic. However, much remains to be done. Finally, we outline a variety of topics for future research and discuss a number of areas which have received considerable attention in the recent literature but where many questions remain.
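As a concrete illustration of out-of-sample point evaluation under a loss function, the sketch below compares a linear and a non-linear one-step forecast of a synthetic series with a simple Diebold-Mariano-type statistic. The series, the forecast forms, and the neglect of serial correlation in the variance estimate are all simplifying assumptions, not the paper's setup.

import numpy as np

rng = np.random.default_rng(1)
n = 500
y = np.zeros(n)
for t in range(1, n):                        # a mildly non-linear AR process
    y[t] = 0.6 * y[t - 1] + 0.3 * np.tanh(y[t - 1]) + rng.normal(scale=0.5)

f_lin = 0.8 * y[:-1]                          # naive linear one-step forecast
f_nl = 0.6 * y[:-1] + 0.3 * np.tanh(y[:-1])   # "true-form" non-linear forecast
target = y[1:]

d = (target - f_lin) ** 2 - (target - f_nl) ** 2    # squared-error differential
dm = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))   # ignores serial correlation
print(f"DM-type statistic: {dm:.2f} (>1.96 favours the non-linear model)")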
Abstract:
A glance along the finance shelves at any bookshop reveals a large number of books that seek to show readers how to ‘make a million’ or ‘beat the market’ with allegedly highly profitable equity trading strategies. This paper investigates whether useful trading strategies can be derived from popular books of investment strategy, with What Works on Wall Street by James P. O'Shaughnessy used as an example. Specifically, we test whether this strategy would have produced as spectacular a performance in the UK context as the author demonstrated for the US market. As part of our investigation, we highlight a general methodology for determining whether the observed superior performance of a trading rule could be attributed, in part or in entirety, to data mining. Overall, we find that the O'Shaughnessy rule performs reasonably well in the UK equity market, yielding higher returns than the FTSE All-Share Index but lower returns than an equally weighted benchmark.
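The paper's own data-mining methodology is not reproduced here, but a minimal bootstrap check of the same flavour asks how often pure luck would match the rule's observed excess return over the benchmark. A sketch with synthetic monthly returns:

import numpy as np

rng = np.random.default_rng(2)
rule_ret = rng.normal(0.008, 0.05, 240)     # monthly rule returns (synthetic)
bench_ret = rng.normal(0.006, 0.05, 240)    # benchmark returns (synthetic)

observed_excess = (rule_ret - bench_ret).mean()

# Null hypothesis: the rule has no genuine edge. Recentre the excess
# returns and bootstrap the mean to get the luck-only distribution.
excess = rule_ret - bench_ret - observed_excess
boot = np.array([rng.choice(excess, excess.size, replace=True).mean()
                 for _ in range(5000)])
p_value = (boot >= observed_excess).mean()
print(f"excess return {observed_excess:.4%}, bootstrap p-value {p_value:.3f}")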
Abstract:
Twitter is both a micro-blogging service and a platform for public conversation. Direct conversation is facilitated in Twitter through the use of @’s (mentions) and replies. While the conversational element of Twitter is of particular interest to the marketing sector, relatively few data-mining studies have focused on this area. We analyse conversations associated with reciprocated mentions that take place in a data-set consisting of approximately 4 million tweets collected over a period of 28 days that contain at least one mention. We ignore tweet content and instead use the mention network structure and its dynamical properties to identify and characterise Twitter conversations between pairs of users and within larger groups. We consider conversational balance, meaning the fraction of content contributed by each party. The goal of this work is to draw out some of the mechanisms driving conversation in Twitter, with the potential aim of developing conversational models.
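A minimal sketch of the conversational-balance measure, assuming tweets are reduced to (author, mentioned_user) pairs; the field layout and toy data are assumptions, not the paper's pipeline.

from collections import Counter

# (author, mentioned_user) records; a real pipeline would parse tweets
tweets = [("a", "b"), ("b", "a"), ("a", "b"), ("c", "d"), ("a", "b")]

counts = Counter(tweets)
balance = {}
for (u, v), n_uv in counts.items():
    n_vu = counts.get((v, u), 0)
    if u < v and n_vu > 0:                  # reciprocated pair, counted once
        balance[(u, v)] = max(n_uv, n_vu) / (n_uv + n_vu)

print(balance)   # 0.5 = perfectly balanced, near 1 = one-sided exchange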
Abstract:
The England and Wales precipitation (EWP) dataset is a homogeneous time series of daily accumulations from 1931 to 2014, composed from rain gauge observations spanning the region. The daily regional-average precipitation statistics are shown to be well described by a Weibull distribution, which is used to define extremes in terms of percentiles. Computed trends in annual and seasonal precipitation are sensitive to the period chosen, due to large variability on interannual and decadal timescales. Atmospheric circulation patterns associated with seasonal precipitation variability are identified. These patterns project onto known leading modes of variability, all of which involve displacements of the jet stream and storm-track over the eastern Atlantic. The intensity of daily precipitation for each calendar season is investigated by partitioning all observations into eight intensity categories contributing equally to the total precipitation in the dataset. Contrary to previous results based on shorter periods, no significant trends of the most intense categories are found between 1931 and 2014. The regional-average precipitation is found to share statistical properties common to the majority of individual stations across England and Wales used in previous studies. Statistics of the EWP data are examined for multi-day accumulations up to 10 days, which are more relevant for river flooding. Four recent years (2000, 2007, 2008 and 2012) have a greater number of extreme events in the 3- and 5-day accumulations than any previous year in the record. It is the duration of precipitation events in these years that is remarkable, rather than the magnitude of the daily accumulations.
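A short sketch of the kind of analysis described, on synthetic data rather than the EWP series: fit a Weibull distribution to daily totals, define extremes via a high percentile, and form multi-day accumulations with moving sums.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
daily = rng.weibull(0.8, size=84 * 365) * 2.5   # mm/day; 84 years ~ 1931-2014

wet = daily[daily > 0.1]                        # fit to wet days only
c, _, scale = stats.weibull_min.fit(wet, floc=0)
p99 = stats.weibull_min.ppf(0.99, c, loc=0, scale=scale)
print(f"shape {c:.2f}, scale {scale:.2f} mm, 99th percentile {p99:.1f} mm")

# 3- and 5-day accumulations via moving sums (more relevant to flooding)
acc3 = np.convolve(daily, np.ones(3), mode="valid")
acc5 = np.convolve(daily, np.ones(5), mode="valid")
print(f"max 3-day {acc3.max():.1f} mm, max 5-day {acc5.max():.1f} mm")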
Abstract:
Sparse coding aims to find a more compact representation based on a set of dictionary atoms. A well-known technique looking at 2D sparsity is low-rank representation (LRR). However, in many computer vision applications, data often originate from a manifold, which is equipped with some Riemannian geometry. In this case, the existing LRR becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold, which is potentially important and critical to applications. In this paper, we generalize the LRR over the Euclidean space to an LRR model over a specific Riemannian manifold: the manifold of symmetric positive definite (SPD) matrices. Experiments on several computer vision datasets showcase its noise robustness and superior performance on classification and segmentation compared with state-of-the-art approaches.
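For reference, the standard Euclidean LRR that the paper generalizes solves the nuclear-norm-regularized self-expression problem below, where the columns of X are the data points, \|Z\|_* is the nuclear norm, and \|E\|_{2,1} penalizes column-wise noise. The paper's contribution, as described in the abstract, is to replace the linear reconstruction X = XZ with one that respects the SPD manifold geometry.

\min_{Z,\,E}\ \|Z\|_{*} + \lambda \|E\|_{2,1}
\quad \text{subject to} \quad X = XZ + E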
Abstract:
TIGGE was a major component of THORPEX (The Observing System Research and Predictability Experiment), a research program whose aim was to accelerate improvements in forecasting high-impact weather. By providing ensemble prediction data from leading operational forecast centers, TIGGE has enhanced collaboration between the research and operational meteorological communities and enabled research studies on a wide range of topics. The paper covers the objective evaluation of the TIGGE data. For a range of forecast parameters, it is shown to be beneficial to combine ensembles from several data providers in a Multi-model Grand Ensemble. Alternative methods to correct systematic errors, including the use of reforecast data, are also discussed. TIGGE data have been used for a range of research studies on predictability and dynamical processes. Tropical cyclones are the most destructive weather systems in the world and are a focus of multi-model ensemble research. Their extra-tropical transition also has a major impact on the skill of mid-latitude forecasts. We also review how TIGGE has added to our understanding of the dynamics of extra-tropical cyclones and storm tracks. Although TIGGE is a research project, it has proved invaluable for the development of products for future operational forecasting. Examples include the forecasting of tropical cyclone tracks, heavy rainfall, strong winds, and flood prediction through coupling hydrological models to ensembles. Finally, the paper considers the legacy of TIGGE. We discuss the priorities and key issues in predictability and ensemble forecasting, including the new opportunities of convective-scale ensembles, links with ensemble data assimilation methods, and extension of the range of useful forecast skill.
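A toy sketch of the multi-model grand-ensemble idea with reforecast-based bias correction: pool members from several centres after removing each centre's estimated mean bias. The centre names, biases, and bias-estimation step are synthetic placeholders, not TIGGE data or any centre's actual statistics.

import numpy as np

rng = np.random.default_rng(4)
truth = 15.0                                  # verifying 2-m temperature, degC

centres = {                                   # (bias, spread, n_members)
    "centre_A": (1.0, 1.2, 20),
    "centre_B": (-0.5, 0.9, 30),
    "centre_C": (0.8, 1.5, 15),
}

pooled = []
for name, (bias, spread, n) in centres.items():
    members = truth + bias + rng.normal(0.0, spread, n)
    est_bias = bias + rng.normal(0.0, 0.1)    # stand-in for a reforecast estimate
    pooled.append(members - est_bias)         # bias-corrected members

grand = np.concatenate(pooled)
print(f"grand-ensemble mean {grand.mean():.2f}, spread {grand.std():.2f}")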
Abstract:
This paper proposes a novel adaptive multiple-modelling algorithm for non-linear and non-stationary systems. This simple modelling paradigm comprises K candidate sub-models, all of which are linear. With data available in an online fashion, the performance of all candidate sub-models is monitored based on the most recent data window, and the M best sub-models are selected from the K candidates. The weight coefficients of the selected sub-models are adapted via the recursive least squares (RLS) algorithm, while the coefficients of the remaining sub-models are left unchanged. These M model predictions are then optimally combined to produce the multi-model output. We propose to minimise the mean square error based on a recent data window and apply a sum-to-one constraint to the combination parameters, leading to a closed-form solution, so that maximal computational efficiency can be achieved. In addition, at each time step, the model prediction is chosen from either the resultant multiple model or the best sub-model, whichever performs better. Simulation results are given in comparison with some typical alternatives, including the linear RLS algorithm and a number of online non-linear approaches, in terms of modelling performance and time consumption.
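The closed-form combination step follows from a standard Lagrangian argument: with E the (window x M) matrix of recent sub-model prediction errors and C = E^T E, minimising the windowed MSE of the combined error Ew subject to the weights summing to one gives w = C^{-1} 1 / (1^T C^{-1} 1). A Python sketch on synthetic errors, with the RLS adaptation of the sub-models themselves omitted; the exact formulation in the paper may differ in detail.

import numpy as np

rng = np.random.default_rng(5)
window, M = 50, 3
E = rng.normal(0.0, [0.2, 0.5, 0.4], size=(window, M))  # sub-model errors

C = E.T @ E + 1e-8 * np.eye(M)      # regularised error Gram matrix
ones = np.ones(M)
w = np.linalg.solve(C, ones)
w /= ones @ w                       # enforce the sum-to-one constraint
print("combination weights:", np.round(w, 3))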