2 results for pre-filtering

in Aston University Research Archive


Relevance:

60.00%

Publisher:

Abstract:

We report an empirical analysis of long-range dependence in the returns of eight stock market indices, using the Rescaled Range Analysis (RRA) to estimate the Hurst exponent. Monte Carlo and bootstrap simulations are used to construct critical values for the null hypothesis of no long-range dependence. The issue of disentangling short-range and long-range dependence is examined. Pre-filtering by fitting a (short-range) autoregressive model eliminates part of the long-range dependence when the latter is present, while failure to pre-filter leaves open the possibility of conflating short-range and long-range dependence. There is strong evidence of long-range dependence for the small central European Czech stock market index PX-glob, and weaker evidence for two smaller western European stock market indices, MSE (Spain) and SWX (Switzerland). There is little or no evidence of long-range dependence for the other five indices, including those with the largest capitalizations among those considered, DJIA (US) and FTSE350 (UK). These results are generally consistent with prior expectations concerning the relative efficiency of the stock markets examined. © 2011 Elsevier Inc.
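The two steps the abstract names — estimating the Hurst exponent via Rescaled Range Analysis, and pre-filtering with a short-range autoregressive model — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window scheme, the AR(1)-only pre-filter, and the function names are assumptions, and the paper additionally builds Monte Carlo / bootstrap critical values, which are omitted here.

```python
import numpy as np

def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent by Rescaled Range (R/S) analysis.

    For each window size n, the series is split into blocks; in each block
    we take the range R of the mean-adjusted cumulative sum and divide by
    the block standard deviation S. The slope of log(mean R/S) against
    log(n) estimates H; H > 0.5 suggests long-range dependence.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Window sizes: powers of two up to half the sample (an assumed scheme).
    sizes = [w for w in (2 ** k for k in range(3, 20)) if min_window <= w <= n // 2]
    log_n, log_rs = [], []
    for w in sizes:
        rs_vals = []
        for start in range(0, n - w + 1, w):
            block = x[start:start + w]
            dev = np.cumsum(block - block.mean())   # mean-adjusted cumulative sum
            r = dev.max() - dev.min()               # range of cumulative deviations
            s = block.std(ddof=1)                   # block standard deviation
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_vals)))
    # Least-squares slope of log(R/S) on log(n) is the Hurst estimate.
    h, _ = np.polyfit(log_n, log_rs, 1)
    return h

def ar1_prefilter(series):
    """Pre-filter: fit an AR(1) model and return its residuals.

    As the abstract notes, this removes short-range dependence but can
    also absorb part of any genuine long-range dependence.
    """
    x = np.asarray(series, dtype=float)
    xc = x - x.mean()
    phi = np.dot(xc[:-1], xc[1:]) / np.dot(xc[:-1], xc[:-1])  # lag-1 autocorrelation
    return x[1:] - phi * x[:-1]
```

On pre-filtered returns (`hurst_rs(ar1_prefilter(returns))`), an estimate well above 0.5, judged against simulated critical values, would point to long-range dependence beyond what the AR term explains.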

Relevance:

30.00%

Publisher:

Abstract:

Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweet data. A popular procedure to reduce the noise of textual data is to remove stopwords by using pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations in the level of data sparsity, the size of the classifier's feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, the dynamic generation of stopword lists, by removing those infrequent terms appearing only once in the corpus, appears to be the optimal method for maintaining a high classification performance while reducing the data sparsity and substantially shrinking the feature space.
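The dynamic method the abstract favours — treating terms that appear only once in the corpus as stopwords — can be sketched as below. This is an illustrative reading, not the paper's code: the function names are invented, tweets are assumed to be pre-tokenized, and the paper's six identification methods and two classifiers are not reproduced.

```python
from collections import Counter

def singleton_stopwords(tokenized_docs):
    """Dynamic stopword list: terms occurring exactly once in the corpus.

    Removing these singletons shrinks the feature space and reduces data
    sparsity, since each contributes a feature seen in only one document.
    """
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return {term for term, c in counts.items() if c == 1}

def remove_stopwords(tokenized_docs, stopwords):
    """Filter the given stopwords out of every tokenized document."""
    return [[t for t in doc if t not in stopwords] for doc in tokenized_docs]

# Hypothetical mini-corpus of tokenized tweets.
tweets = [
    ["good", "movie", "spectacular"],
    ["good", "film"],
    ["bad", "movie"],
]
singletons = singleton_stopwords(tweets)          # {'spectacular', 'film', 'bad'}
filtered = remove_stopwords(tweets, singletons)
vocab_before = {t for doc in tweets for t in doc}
vocab_after = {t for doc in filtered for t in doc}
```

Here the vocabulary drops from five terms to two, illustrating the feature-space shrinkage the abstract reports; with a pre-compiled list, by contrast, frequent but sentiment-bearing function words could be discarded, which is one plausible reading of why those lists hurt performance.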