98 results for data mining


Relevance:

100.00%

Abstract:

Price promotions (also called discount promotions), i.e. short-term temporary price reductions for selected items (Hermann 1989), are frequently used in sales promotions. Their main objective is to boost sales and increase profits. Quantitative evaluation of the effects of price promotions (QEEPP) is essential for sales managers who analyse historical price promotions, and it informs the design of more effective promotional strategies in the future. However, most previous studies provide insights into the effects of discount promotions only from specific perspectives, and no approach has yet been proposed for a comprehensive evaluation of these effects. For example, Hinkle [1965] found that price promotions are more favourable in the off-season and that their effects are stronger for new products. Peckham [1973] found that price promotions have no impact on the long-term trend. Blattberg et al. [1978] showed that different segments respond to price promotions in different ways. Rockney [1991] identified three basic types of effects: effects on discounted items, effects on substitutes and effects on complementary items.

Relevance:

100.00%

Abstract:

This paper presents a novel data mining framework for exploring and extracting actionable knowledge from data generated by electricity meters. Although electricity meters are a rich source of information for energy consumption analysis, they produce a voluminous, fast-paced, transient stream of data that conventional approaches cannot fully handle. To overcome these issues, a data mining framework should incorporate functionality for interim summarization and incremental analysis using intelligent techniques. The proposed Incremental Summarization and Pattern Characterization (ISPC) framework demonstrates this capability. Stream data is structured in a data warehouse along key dimensions, enabling rapid interim summarization. Independently, the IPCL algorithm incrementally characterizes patterns in the stream data and correlates them across time. Finally, the characterized patterns are consolidated with the interim summaries to support an overall analysis and prediction of energy consumption trends. Experiments conducted on actual electricity meter data confirm the applicability of the ISPC framework.
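The interim-summarization idea can be sketched as follows. This is a minimal illustration, not the ISPC implementation or the IPCL algorithm: it assumes readings arrive as hypothetical `(meter_id, hour, kwh)` tuples and that meter and hour of day are the key dimensions. Each reading updates a running aggregate in constant time, and a roll-up over the meter dimension yields an hourly summary without revisiting the raw stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Summary:
    """Running aggregate for one warehouse cell (hypothetical measures)."""
    count: int = 0
    total_kwh: float = 0.0
    peak_kwh: float = 0.0

    def update(self, kwh: float) -> None:
        self.count += 1
        self.total_kwh += kwh
        self.peak_kwh = max(self.peak_kwh, kwh)

    @property
    def mean_kwh(self) -> float:
        return self.total_kwh / self.count if self.count else 0.0


class InterimSummarizer:
    """Hypothetical sketch: one running Summary per (meter_id, hour_of_day) cell."""

    def __init__(self) -> None:
        self.cells = defaultdict(Summary)

    def ingest(self, meter_id: str, hour: int, kwh: float) -> None:
        # O(1) update per reading; the raw reading can be discarded afterwards.
        self.cells[(meter_id, hour)].update(kwh)

    def rollup_by_hour(self) -> dict:
        # Aggregate away the meter dimension, as a warehouse roll-up would.
        out = defaultdict(Summary)
        for (_, hour), cell in self.cells.items():
            agg = out[hour]
            agg.count += cell.count
            agg.total_kwh += cell.total_kwh
            agg.peak_kwh = max(agg.peak_kwh, cell.peak_kwh)
        return dict(out)
```

Keeping only additive measures (count, total) plus a max makes every roll-up exact, which is what allows summaries to be produced while the stream is still arriving.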

Relevance:

100.00%

Abstract:

In this paper we demonstrate our signature-based detector for self-propagating worms. We use a set of worm and benign traffic traces from several endpoints to build benign and worm profiles, which are arranged into separate n-ary trees. We also demonstrate our anomaly detector, which is used to resolve tied matches between the worm and benign trees. We analyze the performance of each detector individually and of their integration. Results show that our signature-based detector achieves a very high true positive rate, whereas the anomaly detector does not. Both detectors, when used independently, suffer from high false positive rates. However, when the two detectors are integrated, they maintain a high true positive rate and minimize false positives.
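Matching against two profile trees can be sketched as below. The tokens, the longest-prefix matching rule and the `anomaly` callback are hypothetical stand-ins, not the authors' features or anomaly detector; the sketch assumes traffic has already been reduced to token sequences. Each profile is an n-ary tree (a trie), a sequence is assigned to whichever tree it matches more deeply, and ties are handed to the anomaly detector, mirroring the division of labour described above.

```python
from typing import Callable, Dict, Sequence


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}


class ProfileTree:
    """n-ary tree (trie): every root-to-node path is a prefix of a training signature."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, signature: Sequence[str]) -> None:
        node = self.root
        for token in signature:
            node = node.children.setdefault(token, Node())

    def match_depth(self, sequence: Sequence[str]) -> int:
        # Length of the longest prefix of `sequence` that exists in the tree.
        node, depth = self.root, 0
        for token in sequence:
            if token not in node.children:
                break
            node = node.children[token]
            depth += 1
        return depth


def classify(seq: Sequence[str], worm: ProfileTree, benign: ProfileTree,
             anomaly: Callable[[Sequence[str]], bool]) -> str:
    w, b = worm.match_depth(seq), benign.match_depth(seq)
    if w > b:
        return "worm"
    if b > w:
        return "benign"
    # Tied match: defer to the (hypothetical) anomaly detector.
    return "worm" if anomaly(seq) else "benign"
```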

Relevance:

100.00%

Abstract:

Autism spectrum disorder (ASD) is increasingly recognized as a major public health issue, affecting approximately 0.5-0.6% of the population. Promoting general awareness of the disorder, increasing engagement with affected individuals and their carers, and understanding how successfully current clinical recommendations have penetrated the target communities are crucial in driving research as well as policy. The aim of the present work is to investigate whether Twitter, as a highly popular platform for information exchange, can be used as a data mining source to help address these challenges. Specifically, using a large data set of harvested tweets, we present a series of experiments that examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their kind in the published scientific literature, strongly motivate further research on this topic and provide a methodological basis for future work.

Relevance:

100.00%

Abstract:

Cancer remains a major challenge in modern medicine. The increasing prevalence of cancer, particularly in developing countries, demands a better understanding of the effectiveness and adverse consequences of different cancer treatment regimens in real patient populations. Current understanding of cancer treatment toxicities is often derived either from “clean” patient cohorts or from coarse population statistics, so it is difficult to obtain an up-to-date, local assessment of treatment toxicities for specific cancer centres. In this paper, we apply an Apriori-based method to discover toxicity progression patterns in the form of temporal association rules. Our experiments demonstrate the effectiveness of the proposed method in discovering major toxicity patterns, compared with pairwise association analysis. Our method is applicable to most cancer centres with even rudimentary electronic medical records and has the potential to provide real-time surveillance and quality assurance in cancer care.
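The core of an Apriori-style search for temporal rules can be sketched as below. This is a simplified two-item version under our own assumptions, not the paper's method: each patient is a hypothetical list of `(day, toxicity)` events, a rule A -> B holds for a patient if some occurrence of A is strictly earlier than some occurrence of B, support is counted per patient, and confidence is support(A -> B) / support(A). As in Apriori, infrequent single toxicities are pruned before candidate pairs are generated.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Timeline = List[Tuple[int, str]]  # (day, toxicity event); hypothetical record format


def temporal_rules(patients: List[Timeline], min_support: float, min_conf: float):
    n = len(patients)
    # Level 1: patient-level support of each single toxicity.
    item_counts: Counter = Counter()
    for tl in patients:
        item_counts.update({event for _, event in tl})
    frequent = {e for e, c in item_counts.items() if c / n >= min_support}

    # Level 2: candidate rules A -> B built only from frequent items (Apriori pruning).
    pair_counts: Counter = Counter()
    for tl in patients:
        first_seen: Dict[str, int] = {}
        last_seen: Dict[str, int] = {}
        for day, e in tl:
            if e in frequent:
                first_seen[e] = min(day, first_seen.get(e, day))
                last_seen[e] = max(day, last_seen.get(e, day))
        # A precedes B if A's earliest occurrence is before B's latest occurrence.
        pairs = {(a, b) for a, b in product(first_seen, repeat=2)
                 if a != b and first_seen[a] < last_seen[b]}
        pair_counts.update(pairs)

    rules = []
    for (a, b), c in pair_counts.items():
        support, confidence = c / n, c / item_counts[a]
        if support >= min_support and confidence >= min_conf:
            rules.append((a, b, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

Extending this to longer rules would repeat the same generate-and-prune step on frequent pairs, which is where the Apriori property pays off.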

Relevance:

100.00%

Abstract:

Mobile Health (mHealth) is now emerging alongside the Internet of Things (IoT), Cloud computing and big data, driven by the prevalence of smart wearable devices and sensors. Smart environments such as smart homes, cars, highways, cities, factories and grids are also emerging. At present, it is difficult to forecast or prevent urgent health situations in real time, because health data are analyzed offline by a physician. Sensors are also expected to be overloaded by demands for health data from IoT networks and smart environments. This paper proposes to resolve these problems by introducing an inference system that can prevent life-threatening situations in advance based on short- and long-term health status prediction. This prediction is inferred from personal health information built from big data in the Cloud. The inference system can also relieve data overload in sensor nodes by reducing data volume and frequency, thereby reducing their workload. This paper presents a novel approach to tracking and predicting personal health status, together with intelligent inference functionality in sensor nodes for interfacing with IoT networks.
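One generic way a sensor node can reduce data volume and frequency is send-on-delta reporting, sketched below. This is a swapped-in illustration, not the inference system proposed in the paper: the node transmits a reading only when it differs from the last transmitted value by at least a hypothetical `delta`, or when a hypothetical `max_silence` interval has elapsed, so stable vital signs generate little traffic while abrupt changes are still reported immediately.

```python
from typing import Iterable, List, Optional, Tuple


def send_on_delta(readings: Iterable[Tuple[int, float]], delta: float,
                  max_silence: int) -> List[Tuple[int, float]]:
    """Return only the (time, value) readings a node would transmit.

    A reading is sent when it differs from the last transmitted value by at
    least `delta`, or when `max_silence` time units have passed since the last
    transmission (a heartbeat so the receiver knows the node is alive).
    """
    sent: List[Tuple[int, float]] = []
    last_value: Optional[float] = None
    last_time: Optional[int] = None
    for t, value in readings:
        if (last_value is None
                or abs(value - last_value) >= delta
                or t - last_time >= max_silence):
            sent.append((t, value))
            last_value, last_time = value, t
    return sent
```

For example, nine heart-rate samples that jump once from about 72 to about 90 bpm shrink to three transmissions with `delta=10` and `max_silence=5`: the first sample, the jump, and one heartbeat.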

Relevance:

100.00%

Abstract:

The low accuracy of text-shape dividers for digital ink diagrams is hindering their use in real-world applications. While handwriting recognition is well advanced and many recognition approaches have been proposed for hand-drawn sketches, less attention has been paid to dividing text ink from drawing ink. Feature-based recognition is a common approach to text-shape division; however, the choice of features and algorithms is critical to its success. We propose using data mining techniques to build more accurate text-shape dividers, and we use a comparative study to systematically identify the algorithms best suited to this problem. We generated dividers with data mining, using diagrams from three domains and a comprehensive ink feature library. An extensive evaluation on diagrams from six different domains shows that our resulting dividers, based on LADTree and LogitBoost, are significantly more accurate than three existing dividers.

Relevance:

100.00%

Abstract:

BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms show promise for the analysis of large epidemiological datasets. This study illustrates the use of a hybrid variable selection methodology that accounts for missing data and complex survey design to identify key biomarkers associated with depression in a large epidemiological study.

METHODS: The study used a three-step methodology combining multiple imputation, a machine learning boosted regression algorithm and logistic regression to identify key biomarkers associated with depression in the National Health and Nutrition Examination Survey (2009-2010). Depression was measured using the Patient Health Questionnaire-9, and 67 biomarkers were analysed. Covariates included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed, weighted multiple logistic regression model included possible confounders and moderators.

RESULTS: After 20 imputed data sets were created from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, the hybrid variable selection methodology selected a final set of three biomarkers: red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016) and between total bilirubin and current smokers (p < 0.001).

CONCLUSION: The systematic use of a hybrid variable selection methodology, fusing data mining techniques based on a machine learning algorithm with traditional statistical modelling, accounted for missing data and the complex survey sampling methodology, and proved to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
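For reading the odds ratios reported in this abstract: in a logistic regression, the odds ratio for a predictor is its exponentiated coefficient, and a Wald 95% interval is the exponentiated coefficient interval. The worked check below assumes the reported intervals have this form (the study's design-based standard errors are not given in the abstract) and uses the total bilirubin figures; small discrepancies come from rounding of the reported values.

```latex
\mathrm{OR}_j = e^{\beta_j}, \qquad
\mathrm{CI}_{95\%} = \left[ e^{\beta_j - 1.96\,\mathrm{SE}(\beta_j)},\ e^{\beta_j + 1.96\,\mathrm{SE}(\beta_j)} \right]

% Total bilirubin: OR 0.12, 95% CI (0.05, 0.28)
\hat\beta = \ln 0.12 \approx -2.120, \qquad
\tfrac{1}{2}\left(\ln 0.05 + \ln 0.28\right) \approx \tfrac{1}{2}(-2.996 - 1.273) \approx -2.134

\mathrm{SE}(\hat\beta) \approx \frac{\ln 0.28 - \ln 0.05}{2 \times 1.96} = \frac{\ln 5.6}{3.92} \approx \frac{1.723}{3.92} \approx 0.44
```

The midpoint of the log-interval matches ln 0.12 up to rounding, as expected for an interval that is symmetric on the log-odds scale; because the interval excludes 1, the association is significant at the 5% level.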

Relevance:

100.00%

Abstract:

An accurate estimate of the pressure drop caused by vehicles inside an urban tunnel plays a pivotal role in tunnel ventilation. The main aim of the present study is to use computational intelligence techniques to predict the pressure drop caused by cars in congested traffic in urban tunnels. A supervised feed-forward backpropagation neural network is used to estimate this pressure drop. The performance of the proposed network structure is examined on a dataset obtained from Computational Fluid Dynamics (CFD) simulation. The input data comprise two variables, tunnel velocity and tunnel length, which are fed to each algorithm to predict the pressure drop. 10-fold cross-validation is applied to three data mining methods, namely a multi-layer perceptron, support vector machine regression and linear regression, and the methods are compared to identify the most accurate. Simulation results show that the multi-layer perceptron accurately estimates the pressure drop.
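The 10-fold cross-validation comparison can be sketched in plain Python as follows. It is a minimal illustration under stated assumptions: the data are synthetic (not the CFD dataset), only two models are shown (ordinary least squares on the two inputs and a mean-predictor baseline), and the study's MLP and SVR models would plug into the same hypothetical `fit` interface. Each fold is held out once, a model is fitted on the other nine folds, and the pooled root-mean-square error (RMSE) is reported.

```python
import math
import random
from typing import Callable, List, Tuple

Row = Tuple[float, float]  # (tunnel velocity, tunnel length); hypothetical units
Model = Callable[[List[Row], List[float]], Callable[[Row], float]]


def fit_mean(X: List[Row], y: List[float]) -> Callable[[Row], float]:
    # Baseline: always predict the training mean.
    m = sum(y) / len(y)
    return lambda row: m


def fit_linear(X: List[Row], y: List[float]) -> Callable[[Row], float]:
    # OLS for y ~ b0 + b1*x1 + b2*x2 via the normal equations (A^T A) b = A^T y.
    A = [[1.0, x1, x2] for x1, x2 in X]
    M = [[sum(r[i] * r[j] for r in A) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * t for r, t in zip(A, y)) for i in range(3)]
    aug = [M[i] + [v[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented system.
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(3):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    b = [aug[i][3] / aug[i][i] for i in range(3)]
    return lambda row: b[0] + b[1] * row[0] + b[2] * row[1]


def k_fold_rmse(fit: Model, X: List[Row], y: List[float], k: int = 10, seed: int = 0) -> float:
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err, n = 0.0, 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            sq_err += (predict(X[i]) - y[i]) ** 2
            n += 1
    return math.sqrt(sq_err / n)
```

The model with the lower cross-validated RMSE is preferred; on exactly linear synthetic data, linear regression wins trivially, which makes the sketch easy to check.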

Relevance:

100.00%

Abstract:

Clustering is applied in wireless sensor networks to increase energy efficiency. Clustering methods in wireless sensor networks differ from those in traditional data mining systems. This paper proposes a novel clustering algorithm, named MSTME, based on a Minimal Spanning Tree (MST) and the Maximum Energy resource of sensors. The specific constraints of clustering in wireless sensor networks and several evaluation metrics are also given. Under these evaluation metrics, MSTME performs better than the well-known clustering methods Low Energy Adaptive Clustering Hierarchy (LEACH) and Base Station Controlled Dynamic Clustering Protocol (BCDCP) in wireless sensor networks. Simulation results show that MSTME increases energy efficiency and network lifetime compared with LEACH and BCDCP in two-hop and multi-hop networks, respectively. © World Scientific Publishing Company.
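A generic MST-based clustering step can be sketched as follows. It illustrates the two ingredients named in the abstract, a spanning tree and residual energy, but it is not the published MSTME procedure: it assumes known 2-D node positions and a hypothetical target cluster count `k`, builds the minimum spanning tree with Prim's algorithm, removes the `k - 1` longest tree edges to split the network, and picks the member with the most residual energy as each cluster's head.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # hypothetical (x, y) node coordinates


def prim_mst(pos: List[Position]) -> List[Tuple[float, int, int]]:
    """Return MST edges as (length, parent, child) using O(n^2) Prim's algorithm."""
    n = len(pos)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[float, int, int]] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(pos[u], pos[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(pos: List[Position], energy: List[float], k: int) -> Dict[int, List[int]]:
    edges = sorted(prim_mst(pos))
    kept = edges[: len(edges) - (k - 1)]  # drop the k-1 longest edges
    root = list(range(len(pos)))

    def find(i: int) -> int:
        # Union-find with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, a, b in kept:
        root[find(a)] = find(b)
    clusters: Dict[int, List[int]] = {}
    for i in range(len(pos)):
        clusters.setdefault(find(i), []).append(i)
    # Cluster head = member with the most residual energy.
    return {max(m, key=lambda i: energy[i]): sorted(m) for m in clusters.values()}
```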

Relevance:

100.00%

Abstract:

The linkage between healthcare services and cloud computing techniques has drawn much attention lately. To date, most work has focused on IT system migration and the management of distributed healthcare data rather than on exploiting the information hidden in the data. In this paper, we propose to explore healthcare data via cloud-based healthcare data mining services. Specifically, we propose a cloud-based healthcare data mining framework for developing healthcare data mining services. Under this framework, we further develop a cloud-based healthcare data mining service to predict patients' future length of stay in hospital.

Relevance:

80.00%

Abstract:

Automated adversarial detection systems can fail when under attack by adversaries. As part of a resilient data stream mining system that reduces the possibility of such failure, adaptive spike detection performs attribute ranking and selection without class labels. The first part of adaptive spike detection weighs all attributes for spikiness in order to rank them. The second part filters out some attributes with extreme weights to choose the best ones for computing each example's suspicion score. Within an identity crime detection domain, adaptive spike detection is validated on a few million real credit applications containing adversarial activity. The results comprise F-measure curves for eleven experiments and a discussion of the relative weights in the best experiment. They reinforce the effectiveness of adaptive spike detection for class-label-free attribute ranking and selection.
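The two parts of adaptive spike detection can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: an attribute value "spikes" if it already occurred at least a hypothetical `threshold` number of times in the previous `window` applications; an attribute's weight is the fraction of applications in which it spiked; attributes whose weights fall outside a hypothetical `[low, high]` band are filtered out; and the suspicion score is the number of selected attributes that spiked. No class labels are used at any step.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple


def spike_flags(stream: List[Dict[str, str]], attrs: List[str],
                window: int, threshold: int) -> List[Dict[str, int]]:
    """Flag 1 if an example's value already occurred >= threshold times in the previous `window` examples."""
    recent: deque = deque()
    counts = {a: Counter() for a in attrs}
    flags = []
    for ex in stream:
        flags.append({a: int(counts[a][ex[a]] >= threshold) for a in attrs})
        recent.append(ex)
        for a in attrs:
            counts[a][ex[a]] += 1
        if len(recent) > window:  # slide the window forward
            old = recent.popleft()
            for a in attrs:
                counts[a][old[a]] -= 1
    return flags


def select_attributes(flags: List[Dict[str, int]], attrs: List[str],
                      low: float, high: float) -> Tuple[List[str], Dict[str, float]]:
    # Part 1: weight = how often the attribute spikes; rank by weight.
    n = len(flags)
    weights = {a: sum(f[a] for f in flags) / n for a in attrs}
    ranked = sorted(attrs, key=lambda a: -weights[a])
    # Part 2: drop attributes with extreme weights (near-never or near-always spiky).
    return [a for a in ranked if low <= weights[a] <= high], weights


def suspicion_scores(flags: List[Dict[str, int]], selected: List[str]) -> List[int]:
    return [sum(f[a] for a in selected) for f in flags]
```

In this toy setting a low-cardinality attribute such as gender spikes almost constantly and is filtered out, while a reused phone number or address drives the score up.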

Relevance:

80.00%

Abstract:

Most algorithms for discovering frequent patterns in data streams assume that the system can process all incoming transactions without delay and without needing to drop any. However, this assumption is often impractical given the inherent characteristics of data stream environments. Especially under high load, system resources are often insufficient to process the incoming transactions. This causes unwanted latencies that in turn affect the applicability of the resulting data mining models, which often have a small window of opportunity. We propose a load shedding algorithm to address this issue. The algorithm adaptively detects overload situations and drops transactions from data streams using a probabilistic model. We tested the algorithm on both synthetic and real-life datasets to verify its feasibility.
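A basic probabilistic load shedder can be sketched as below. It is a generic random-drop scheme, not the authors' probabilistic model: it assumes transactions arrive in per-tick batches and that the system can process a hypothetical fixed `capacity` of transactions per tick. Overload is detected when a batch exceeds capacity, and each transaction is then kept with probability `capacity / load`, so the expected number processed matches capacity.

```python
import random
from typing import Iterable, List, Sequence


def shed_load(batches: Iterable[Sequence[frozenset]], capacity: int,
              seed: int = 0) -> List[List[frozenset]]:
    """Keep each transaction in a tick's batch with probability min(1, capacity / batch_size)."""
    rng = random.Random(seed)
    kept_per_tick = []
    for batch in batches:
        load = len(batch)
        # Overload detected when arrivals exceed what can be processed this tick.
        keep_p = 1.0 if load <= capacity else capacity / load
        kept_per_tick.append([t for t in batch if rng.random() < keep_p])
    return kept_per_tick
```

Because the drop is uniform within a tick, itemset supports measured on the kept transactions remain unbiased estimates of supports in the full batch, which is what lets a frequent-pattern miner run downstream without modification.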