151 results for Incremental mining


Relevance:

20.00% 20.00%

Publisher:

Abstract:

In recent years, evaluating the influence of nodes and finding the top-k influential nodes in social networks has drawn wide attention and become a hot-spot research issue. Considering the characteristics of social networks, we present a novel mechanism to mine the top-k influential nodes in mobile social networks. The proposed mechanism is based on behavior analysis of SMS/MMS (short message service / multimedia messaging service) communication between mobile users. We introduce complex network theory to build a social relation graph, which is used to reveal the relationship between people's social contacts and message sending. Moreover, an intimacy degree is introduced to characterize the social frequency among nodes. An election mechanism is employed to find the most influential node, and then a heap sorting algorithm is used to sort the voting results to find the k most influential nodes. The experimental results show that the mechanism can find the top-k most influential nodes efficiently and effectively. © 2013 IEEE.
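The voting-then-heap step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the intimacy-degree formula or the election rules, so everything here is an assumption: the `(sender, receiver, count)` message format, the function name `top_k_influential`, and the rule that each sender splits one vote among receivers in proportion to message counts.

```python
import heapq
from collections import defaultdict

def top_k_influential(messages, k):
    """Return the k nodes with the highest vote totals as (node, score) pairs.

    messages: iterable of (sender, receiver, count) tuples (hypothetical format).
    Each sender splits one vote among its receivers in proportion to the
    share of its messages sent to each of them. This share is only a
    stand-in for the intimacy degree, which the abstract does not define.
    """
    messages = list(messages)

    # Total messages sent by each node, used to normalize its votes.
    sent_total = defaultdict(int)
    for sender, _receiver, count in messages:
        sent_total[sender] += count

    # Accumulate the votes each node receives.
    votes = defaultdict(float)
    for sender, receiver, count in messages:
        votes[receiver] += count / sent_total[sender]

    # heapq.nlargest selects the top k with a heap instead of a full sort.
    return heapq.nlargest(k, votes.items(), key=lambda item: (item[1], item[0]))

# Toy data, invented purely for illustration.
example_messages = [
    ("a", "b", 3), ("a", "c", 1),
    ("b", "c", 2),
    ("c", "b", 4),
    ("d", "b", 1),
]
print(top_k_influential(example_messages, 2))
```

Using a heap keeps the selection at roughly O(n log k) rather than sorting every node, which matters when k is small relative to the network size.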

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The thesis has studied a number of critical problems in data mining for customer behavior analysis and has proposed novel techniques for better modeling of the customers’ decision-making process, more efficient analysis of their travel behavior, and more effective identification of their emerging preferences.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper examines the relationship between the output levels in the mining sector and various non-mining sectors in an attempt to understand the role of the mining sector in Australia. The unobserved components time series model is used to estimate the effects of the output gap and the growth regime in the mining sector on the output level of each of several non-mining sectors. Overall, the estimates obtained do not suggest an overwhelmingly positive effect running from the mining sector to other production and services sectors, implying that the trickle-down effect of the mining boom may be a myth.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Although agricultural productivity is critical for economic development, very little is known about the causes of the large dispersion in agricultural productivity across the world. Microeconomic studies increasingly stress the lack of land rights in many poor countries as an important source of low productivity. This paper examines the role played by land titles in explaining differences in agricultural productivity for 93 countries. Using the per capita accumulated value of gold and silver production in the 16th and 17th centuries as instruments for land rights, it is shown that enforcement of land titles is a significant source of agricultural productivity inequality across the world.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An Association Rule (AR) is a common knowledge model in data mining that describes an implicative co-occurring relationship between two disjoint sets of binary-valued transaction database attributes (items), expressed in the form of an "antecedent ⇒ consequent" rule. A variant of the AR is the Weighted Association Rule (WAR). With regard to a marketing context, this paper introduces a new knowledge model in data mining: the ALlocating Pattern (ALP). An ALP is a special form of WAR, where each rule item is associated with a weighting score between 0 and 1, and the sum of all rule item scores is 1. It can not only indicate the implicative co-occurring relationship between two (disjoint) sets of items in a weighted setting, but also inform the "allocating" relationship among rule items. ALPs can be demonstrated to be applicable in marketing and possibly a surprising variety of other areas. We further propose an Apriori-based algorithm to extract hidden and interesting ALPs from a "one-sum" weighted transaction database. The experimental results show the effectiveness of the proposed algorithm. © 2008 IEEE.
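To make the "one-sum" constraint concrete, the sketch below checks that a weighted transaction has item scores in [0, 1] that sum to 1, and then computes, for a candidate itemset, its support and a normalized share per item. This is not the paper's Apriori-based algorithm, which the abstract does not detail. The dictionary representation, the function names, and the averaging-then-renormalizing rule are all assumptions made for illustration.

```python
def is_one_sum(transaction, tol=1e-9):
    """True if every item weight lies in [0, 1] and the weights sum to 1."""
    weights = list(transaction.values())
    return all(0.0 <= w <= 1.0 for w in weights) and abs(sum(weights) - 1.0) <= tol

def allocation(transactions, items):
    """Return (support, shares) for an itemset.

    support: fraction of transactions containing every item in `items`.
    shares:  average weight of each item over those transactions,
             renormalized so the shares sum to 1 (assumed scoring rule).
    """
    covering = [t for t in transactions if all(i in t for i in items)]
    if not transactions or not covering:
        return 0.0, {}
    support = len(covering) / len(transactions)
    mean = {i: sum(t[i] for t in covering) / len(covering) for i in items}
    total = sum(mean.values())
    if total == 0:
        return support, {}
    return support, {i: mean[i] / total for i in items}

# Toy marketing-budget transactions, invented for illustration.
budgets = [
    {"tv": 0.75, "radio": 0.25},
    {"tv": 0.25, "radio": 0.25, "web": 0.5},
    {"web": 1.0},
]
assert all(is_one_sum(t) for t in budgets)
print(allocation(budgets, ("tv", "radio")))
```

In this toy data, "tv" and "radio" co-occur in two of three transactions, and the shares indicate that "tv" receives about twice the allocation of "radio" when the two appear together.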

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Hotel managers continue to find ways to understand traveler preferences, with the aim of improving their strategic planning, marketing, and product development. Traveler preference is unpredictable; for example, hotel guests used to prefer having a telephone in the room, but now favor a fast Internet connection. Changes in preference influence the performance of hotel businesses, thus creating the need to identify and address the demands of their guests. Most existing studies focus on current demand attributes and not on emerging ones. Thus, hotel managers may find it difficult to make appropriate decisions in response to changes in travelers' concerns. To address these challenges, this paper adopts the Emerging Pattern Mining technique to identify emergent hotel features of interest to international travelers. Data are derived from 118,000 records of online reviews. The methods and findings can help hotel managers gain insights into travelers' interests, enabling the former to gain a better understanding of the rapid changes in tourist preferences.
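Emerging patterns are commonly defined through a growth rate: the ratio of a pattern's support in a newer dataset to its support in an older one. The sketch below applies that definition to the telephone versus Internet example from the abstract. The review sets, the function names, and the threshold are hypothetical, and the abstract does not state which growth-rate threshold or mining algorithm the authors used.

```python
def support(reviews, pattern):
    """Fraction of reviews that mention every feature in `pattern`."""
    if not reviews:
        return 0.0
    wanted = set(pattern)
    return sum(1 for review in reviews if wanted <= set(review)) / len(reviews)

def growth_rate(old_reviews, new_reviews, pattern):
    """Support in the new period divided by support in the old period."""
    s_old = support(old_reviews, pattern)
    s_new = support(new_reviews, pattern)
    if s_old == 0:
        # A pattern absent before and present now has unbounded growth.
        return float("inf") if s_new > 0 else 0.0
    return s_new / s_old

def emerging_patterns(old_reviews, new_reviews, patterns, min_growth):
    """Keep the candidate patterns whose growth rate reaches `min_growth`."""
    return [p for p in patterns
            if growth_rate(old_reviews, new_reviews, p) >= min_growth]

# Toy review data (sets of mentioned features), invented for illustration.
old = [{"telephone"}, {"telephone", "internet"}, {"telephone"}, {"pool"}]
new = [{"internet"}, {"internet", "pool"}, {"internet"}, {"telephone"}]

candidates = [("internet",), ("telephone",)]
print(emerging_patterns(old, new, candidates, min_growth=2.0))
```

A full miner would also enumerate multi-feature candidates rather than take them as input; this sketch only scores candidates it is given.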

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Cancer remains a major challenge in modern medicine. The increasing prevalence of cancer, particularly in developing countries, demands a better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understanding of cancer treatment toxicities is often derived from either “clean” patient cohorts or coarse population statistics. It is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centres. In this paper, we applied an Apriori-based method for discovering toxicity progression patterns in the form of temporal association rules. Our experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with pairwise association analysis. Our method is applicable to most cancer centres with even rudimentary electronic medical records and has the potential to provide real-time surveillance and quality assurance in cancer care.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Mobile Health (mHealth) is now emerging with the Internet of Things (IoT), Cloud and big data, along with the prevalence of smart wearable devices and sensors. There is also the emergence of smart environments such as smart homes, cars, highways, cities, factories and grids. Presently, it is difficult to quickly forecast or prevent urgent health situations in real time, as health data are analyzed offline by a physician. Sensors are expected to be overloaded by the demands of providing health data from IoT networks and smart environments. This paper proposes to resolve these problems by introducing an inference system so that life-threatening situations can be prevented in advance based on short- and long-term health status prediction. This prediction is inferred from personal health information that is built from big data in the Cloud. The inference system can also resolve the problem of data overload in sensor nodes by reducing data volume and frequency, and thereby the workload of the sensor nodes. This paper presents a novel idea of tracking and predicting a personal health status, as well as intelligent inference functionality in sensor nodes to interface with IoT networks.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The low accuracy rates of text-shape dividers for digital ink diagrams are hindering their use in real-world applications. While recognition of handwriting is well advanced and many recognition approaches have been proposed for hand-drawn sketches, less attention has been paid to the division of text and drawing ink. Feature-based recognition is a common approach for text-shape division. However, the choice of features and algorithms is critical to the success of the recognition. We propose the use of data mining techniques to build more accurate text-shape dividers. A comparative study is used to systematically identify the algorithms best suited to the specific problem. We have generated dividers using data mining with diagrams from three domains and a comprehensive ink feature library. The extensive evaluation on diagrams from six different domains has shown that our resulting dividers, using LADTree and LogitBoost, are significantly more accurate than three existing dividers.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

METHODS: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

RESULTS: After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and between total bilirubin and current smokers (p < 0.001).

CONCLUSION: The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Conflicts between resources in stockyards cost mining companies millions of dollars a year. An effective planning strategy needs to be established in order to reduce these operational conflicts. In this research, a stockyard simulation model of a mining operation is proposed. The simulation uses discrete-event and continuous strategies to create a highly detailed visualization and animation that closely resemble actual stockyard operation. The proposed simulation model is tightly integrated with a stockpile planner and is used to evaluate the feasibility of a given production plan. The highly detailed visualization of the simulation model allows the planner to determine the source of a conflict, which can be used to guide the elimination of these conflicts.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An accurate estimation of the pressure drop due to vehicles inside an urban tunnel plays a pivotal role in tunnel ventilation. The main aim of the present study is to utilize computational intelligence techniques for predicting the pressure drop due to cars in traffic congestion in urban tunnels. A supervised feed-forward back-propagation neural network is utilized to estimate this pressure drop. The performance of the proposed network structure is examined on a dataset obtained from Computational Fluid Dynamics (CFD) simulation. The input data include two variables, tunnel velocity and tunnel length, which are fed to the corresponding algorithm in order to predict the pressure drop. A 10-fold cross-validation technique is utilized for three data mining methods, namely the multi-layer perceptron algorithm, support vector machine regression, and linear regression. A comparison is made to identify the most accurate results. Simulation results illustrate that the multi-layer perceptron algorithm is able to accurately estimate the pressure drop.
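The k-fold cross-validation protocol mentioned in this abstract can be sketched in plain Python. As a simplification, the example evaluates a single-variable least-squares line, whereas the study compares a multi-layer perceptron, support vector regression and linear regression on two inputs. The data are synthetic, and the function names are assumptions.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def cross_val_mae(xs, ys, k=10):
    """Mean absolute error over all held-out points across k folds."""
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(abs(ys[i] - (a + b * xs[i])) for i in test)
    return sum(errors) / len(errors)

# Synthetic, exactly linear data (not CFD output), for illustration only.
velocity = [float(v) for v in range(20)]
pressure_drop = [2.0 * v + 1.0 for v in velocity]
print(cross_val_mae(velocity, pressure_drop, k=10))
```

Each point is held out exactly once, so the averaged error reflects predictions on data the model did not see during fitting. Real data would normally be shuffled before splitting into folds.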

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents a framework for motion capture and musculoskeletal analysis of underground mining procedures. The framework discusses suitable motion capture solutions, musculoskeletal modelling and best practices. Preliminary analysis was conducted to assess quantitative musculoskeletal risks of rod handling and fitting with the drilling rig. The preliminary results of the analysis provide recommendations to minimise risks of potential muscular injuries.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

As people have unique tastes, whether to satisfy a small group of targeted customers or to be generic enough to meet most people's preferences has been a long-standing question for many fashion designers and website developers. This study examined the relationship between individuals' personality differences and their web design preferences. Each individual's personality is represented by a combination of five traits, and 15 website design-related features are considered to test the users' preferences. We introduced a data mining technique called targeted positive and negative association rule mining to analyze a dataset containing the survey results collected from undergraduate students. The results of this study not only suggest the importance of providing specific designs to attract individual customers, but also provide valuable input on the Big Five personality traits in their entirety.