923 results for Sentiment classification
Abstract:
Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.
Abstract:
We evaluate a number of real estate sentiment indices to ascertain current and forward-looking information content that may be useful for forecasting demand and supply activities. Our focus lies on sector-specific surveys targeting the players from the supply side of both residential and non-residential real estate markets. Analyzing the dynamic relationships within a Vector Auto-Regression (VAR) framework, we test the efficacy of these indices by comparing them with other coincident indicators in predicting real estate returns. Overall, our analysis suggests that sentiment indicators convey important information which should be embedded in the modeling exercise to predict real estate market returns. Generally, sentiment indices show better information content than broad economic indicators. The goodness of fit of our models is higher for the residential market than for the non-residential real estate sector. The impulse responses, in general, conform to our theoretical expectations. Variance decompositions and out-of-sample predictions generally show the desired contribution and reasonable improvement respectively, thus upholding our hypothesis. Quite remarkably, consistent with the theory, the predictability swings when we look through different phases of the cycle. This perhaps suggests that, e.g. during recessions, market players’ expectations may be a more accurate predictor of future performance, conceivably indicating a ‘negative’ information processing bias and thus conforming to the precautionary motive of consumer behaviour.
Abstract:
Deep Brain Stimulation has been used in the study of and for treating Parkinson’s Disease (PD) tremor symptoms since the 1980s. In the research reported here we have carried out a comparative analysis to classify tremor onset based on intraoperative microelectrode recordings of a PD patient’s brain Local Field Potential (LFP) signals. In particular, we compared the performance of a Support Vector Machine (SVM) with two well-known artificial neural network classifiers, namely a Multiple Layer Perceptron (MLP) and a Radial Basis Function Network (RBN). The results show that in this study, using specifically PD data, the SVM provided an overall better classification rate, achieving a recognition accuracy of 81%.
Abstract:
We look through both the demand and supply side information to understand the dynamics of price determination in the real estate market and examine how accurately investors’ attitudes predict market returns and thereby flag the extent of any demand-supply mismatch. Our hypothesis is based on the possibility that investors’ call for action in terms of their buy/sell decisions and adjustments in reservation/offer prices may indicate impending demand-supply imbalances in the market. In the process, we study several real estate sectors to inform our analysis. The timeframe of our analysis (1995-2010) allows us to observe market dynamics over several economic cycles and in various stages of those cycles. We also seek to understand how investors’ attitude, or sentiment, affects market activity over the cycles through asymmetric responses. We test our hypothesis using a number of measures of market activity and attitude indicators within several model specifications. The empirical models are estimated using a Vector Error Correction framework. Our analysis suggests that investors’ attitudes exert strong and statistically significant feedback effects in price determination. Moreover, these effects reveal heterogeneous responses across the real estate sectors. Interestingly, our results indicate asymmetric responses during boom, normal and recessionary periods. These results are consistent with the theoretical underpinnings.
Abstract:
Obesity is a key factor in the development of the metabolic syndrome (MetS), which is associated with increased cardiometabolic risk. We investigated whether obesity classification by body mass index (BMI) and body fat percentage (BF%) influences cardiometabolic profile and dietary responsiveness in 486 MetS subjects (LIPGENE dietary intervention study). Anthropometric measures, markers of inflammation and glucose metabolism, lipid profiles, adhesion molecules and haemostatic factors were determined at baseline and after 12 weeks of 4 dietary interventions (high saturated fat (SFA), high monounsaturated fat (MUFA) and 2 low fat high complex carbohydrate (LFHCC) diets, 1 supplemented with long chain n-3 polyunsaturated fatty acids (LC n-3 PUFAs)). Of the subjects classified as normal weight and overweight by BMI, 39% and 87%, respectively, were obese according to their BF%. Individuals classified as obese by BMI (≥ 30 kg/m²) and BF% (≥ 25% (men) and ≥ 35% (women)) (OO, n = 284) had larger waist and hip measurements, higher BMI and were heavier (P < 0.001) than those classified as non-obese by BMI but obese by BF% (NOO, n = 92). OO individuals displayed a more pro-inflammatory (higher C reactive protein (CRP) and leptin), pro-thrombotic (higher plasminogen activator inhibitor-1 (PAI-1)), pro-atherogenic (higher leptin/adiponectin ratio) and more insulin resistant (higher HOMA-IR) metabolic profile relative to the NOO group (P < 0.001). Interestingly, tumour necrosis factor alpha (TNF-α) concentrations were lower post-intervention in NOO individuals compared to OO subjects (P < 0.001). In conclusion, assessing BF% and BMI as part of a metabotype may help identify individuals at greater cardiometabolic risk than BMI alone.
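The two-criterion grouping described in this abstract can be sketched as follows. This is only an illustration, taking the cut-offs as a BMI of at least 30 kg/m² and a BF% of at least 25% (men) or 35% (women); the function names and the "other" label for everyone outside the OO and NOO groups are hypothetical and do not come from the study.

```python
# Illustrative sketch only: cut-offs follow the abstract, names are hypothetical.
BMI_CUTOFF = 30.0                              # kg/m^2
BF_CUTOFFS = {"male": 25.0, "female": 35.0}    # body fat percentage


def obese_by_bmi(bmi):
    """True when BMI reaches the obesity cut-off."""
    return bmi >= BMI_CUTOFF


def obese_by_bf(bf_percent, sex):
    """True when body fat percentage reaches the sex-specific cut-off."""
    return bf_percent >= BF_CUTOFFS[sex]


def obesity_group(bmi, bf_percent, sex):
    """Return 'OO', 'NOO', or the hypothetical catch-all label 'other'."""
    by_bmi = obese_by_bmi(bmi)
    by_bf = obese_by_bf(bf_percent, sex)
    if by_bmi and by_bf:
        return "OO"    # obese by both criteria
    if by_bf:
        return "NOO"   # non-obese by BMI but obese by BF%
    return "other"     # not a group named in the abstract


print(obesity_group(27.0, 36.0, "female"))
```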
Abstract:
This paper reviews the ways that quality can be assessed in standing waters, a subject that has hitherto attracted little attention but which is now a legal requirement in Europe. It describes a scheme for the assessment and monitoring of water and ecological quality in standing waters greater than about 1 ha in area in England & Wales, although it is generally relevant to North-west Europe. Thirteen hydrological, chemical and biological variables are used to characterise the standing water body in any current sampling. These are lake volume, maximum depth, conductivity, Secchi disc transparency, pH, total alkalinity, calcium ion concentration, total N concentration, winter total oxidised inorganic nitrogen (effectively nitrate) concentration, total P concentration, potential maximum chlorophyll a concentration, a score based on the nature of the submerged and emergent plant community, and the presence or absence of a fish community. Inter alia, these variables are key indicators of the state of eutrophication, acidification, salinisation and infilling of a water body.
Abstract:
In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.
Abstract:
The Distributed Rule Induction (DRI) project at the University of Portsmouth is concerned with distributed data mining algorithms for automatically generating rules of all kinds. In this paper we present a system architecture and its implementation for inducing modular classification rules in parallel in a local area network using a distributed blackboard system. We present initial results of a prototype implementation based on the Prism algorithm.
Abstract:
Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules to classify previously unseen data. Alternative algorithms have been developed, such as the Prism algorithm. Prism constructs modular rules that are qualitatively better than rules induced by TDIDT. However, along with the increasing size of databases, many existing rule learning algorithms have proved to be computationally expensive on large datasets. To tackle the problem of scalability, parallel classification rule induction algorithms have been introduced. As TDIDT is the most popular classifier, even though there are strongly competitive alternative algorithms, most parallel approaches to inducing classification rules are based on TDIDT. In this paper we describe work on a distributed classifier that induces classification rules in a parallel manner based on Prism.
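For readers unfamiliar with modular rule induction, the following is a minimal serial sketch of the basic Prism idea: for each class, grow a rule by repeatedly adding the attribute-value term with the highest conditional probability of that class, then remove the covered instances and repeat. It is a simplified illustration for categorical data, not the distributed classifier described in the abstract; the function names, tie-breaking order and toy dataset are assumptions made for the example.

```python
# Simplified serial Prism-style sketch; names and tie-breaking are assumptions.

def induce_rule(rows, target, label, attributes):
    """Grow one rule for `label` by greedily adding the best attribute-value term."""
    covered = rows
    terms = {}
    # Specialise until the rule covers only `label` or no attributes are left.
    while any(r[target] != label for r in covered) and len(terms) < len(attributes):
        best = None  # (probability, positive count, attribute, value)
        for attr in attributes:
            if attr in terms:
                continue
            for value in sorted({r[attr] for r in covered}):
                subset = [r for r in covered if r[attr] == value]
                positives = sum(1 for r in subset if r[target] == label)
                candidate = (positives / len(subset), positives, attr, value)
                # Prefer higher probability, then higher coverage; first seen wins ties.
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
        _, _, attr, value = best
        terms[attr] = value
        covered = [r for r in covered if r[attr] == value]
    return terms, covered


def prism(rows, target, attributes):
    """Return a list of (terms, label) rules, induced class by class."""
    rules = []
    for label in sorted({r[target] for r in rows}):
        remaining = list(rows)  # each class starts again from the full dataset
        while any(r[target] == label for r in remaining):
            terms, covered = induce_rule(remaining, target, label, attributes)
            rules.append((terms, label))
            # Remove the instances covered by the new rule.
            remaining = [r for r in remaining if r not in covered]
    return rules


# Toy data, invented for illustration.
toy = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
for terms, label in prism(toy, "play", ["outlook", "windy"]):
    print(terms, "->", label)
```

Each rule stands alone, so the resulting rule set need not fit into a single tree, which is the sense in which the rules are modular.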
Abstract:
Induction of classification rules is one of the most important technologies in data mining. Most of the work in this field has concentrated on the Top Down Induction of Decision Trees (TDIDT) approach. However, alternative approaches have been developed such as the Prism algorithm for inducing modular rules. Prism often produces qualitatively better rules than TDIDT but suffers from higher computational requirements. We investigate approaches that have been developed to minimize the computational requirements of TDIDT, in order to find analogous approaches that could reduce the computational requirements of Prism.
Abstract:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms, most of the work has concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard-based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Abstract:
The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve an accuracy comparable to that of decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and noisy datasets, and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and post-pruning methods exist; however, for Prism algorithms only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms and develops a new pruning method, Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
Abstract:
The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However, in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in order to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. There have been many pre-pruning mechanisms developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning not only works on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
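As background for the pruning facilities named in this abstract, the sketch below computes the J-measure of a rule "IF Y = y THEN X = x" in its standard information-theoretic form, J = p(y) · [p(x|y) · log2(p(x|y)/p(x)) + (1 − p(x|y)) · log2((1 − p(x|y))/(1 − p(x)))]. The abstract does not spell out how Jmax-pruning works, so the two cut-off helpers are only an assumed illustration of the difference between stopping at the first drop in J and cutting at the overall maximum; their names and the example J values are hypothetical.

```python
import math


def j_measure(p_y, p_x, p_x_given_y):
    """J-measure of the rule IF Y=y THEN X=x; assumes 0 < p_x < 1."""
    def part(posterior, prior):
        # By convention 0 * log(0) contributes nothing.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    cross_entropy = part(p_x_given_y, p_x) + part(1.0 - p_x_given_y, 1.0 - p_x)
    return p_y * cross_entropy


def terms_kept_first_decrease(j_values):
    """Hypothetical helper: keep terms until J first drops."""
    keep = 1
    for i in range(1, len(j_values)):
        if j_values[i] < j_values[i - 1]:
            break
        keep = i + 1
    return keep


def terms_kept_global_max(j_values):
    """Hypothetical helper: keep terms up to the highest J seen anywhere."""
    return j_values.index(max(j_values)) + 1


# Invented J values recorded after appending each successive rule term.
history = [0.10, 0.08, 0.25, 0.20]
print(terms_kept_first_decrease(history), terms_kept_global_max(history))
```

With these invented values, stopping at the first decrease keeps a single term, whereas cutting at the overall maximum keeps three, which shows how an early dip in J can hide a later, better rule.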
Abstract:
Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of available data streams that smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision making system. This emerging area of study has been shown to be feasible in an earlier study using the technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of collaborative data mining with the two classifiers.
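Hoeffding trees decide when a stream has supplied enough examples to commit to a split by using the Hoeffding bound, ε = sqrt(R² · ln(1/δ) / (2n)), where R is the range of the split criterion, δ the allowed error probability and n the number of examples seen. The sketch below shows only that test; it is not the PDM framework itself, and the function and parameter names are chosen for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon of the observed
    mean with probability 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the lead of the best attribute exceeds the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# The bound shrinks as more stream examples arrive (illustrative numbers).
for n in (100, 1000, 10000):
    print(n, round(hoeffding_bound(1.0, 1e-6, n), 4))
```

Because the bound depends only on counts, a node needs to keep summary statistics rather than the examples themselves, which suits devices with limited memory.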