999 results for contrast mining


Relevance: 20.00%

Abstract:

Streams of short text, such as news titles, enable us to learn effectively and efficiently about real-world events that occur anywhere and at any time. Short text messages, which are accompanied by timestamps and generally describe events in only a few words, differ from longer text documents such as web pages, news stories, blogs, technical papers and books. For example, few words repeat within the same news title, so term frequency (TF) is less important in a short text corpus than in a longer text corpus. Therefore, analysis of short text faces new challenges. In addition, detecting and tracking events through short text analysis requires reliably identifying events from constant topic clusters; however, existing methods, such as Latent Dirichlet Allocation (LDA), generate different topic results for the same corpus on different executions. In this paper, we provide a Finding Topic Clusters using Co-occurring Terms (FTCCT) algorithm to automatically generate topics from a short text corpus, and develop an Event Evolution Mining (EEM) algorithm to discover hot events and their evolutions (i.e., how the popularity of events changes over time). In FTCCT, a term (i.e., a single word or a multi-word phrase) belongs to only one topic in a corpus. Experiments on news titles from 157 countries over 4 months (July to October 2013) demonstrate that our FTCCT-based method (combining FTCCT and EEM) achieves far higher quality of event content and description words than the LDA-based method (combining LDA and EEM) for analysis of streams of short text. Our method also visualizes the evolutions of the hot events. The discovered world-wide event evolutions reveal some interesting correlations among world-wide events; for example, successive extreme weather phenomena occurred in different locations: a typhoon in Hong Kong and the Philippines followed a hurricane and storm flooding in Mexico in September 2013. © 2014 Springer Science+Business Media New York.

Relevance: 20.00%

Abstract:

Autism spectrum disorder (ASD) is increasingly being recognized as a major public health issue that affects approximately 0.5-0.6% of the population. Promoting general awareness of the disorder, increasing engagement with affected individuals and their carers, and understanding how successfully current clinical recommendations have penetrated the target communities are crucial in driving research as well as policy. The aim of the present work is to investigate whether Twitter, as a highly popular platform for information exchange, can be used as a data-mining source to help address these challenges. Specifically, using a large data set of harvested tweets, we present a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.

Relevance: 20.00%

Abstract:

In big data analysis, frequent itemset mining plays a key role in mining associations, correlations and causality. Since some traditional frequent itemset mining algorithms handle massive small-file datasets poorly, suffering from high memory cost, high I/O overhead, and low computing performance, we propose a novel parallel frequent itemset mining algorithm based on the FP-Growth algorithm and discuss its applications in this paper. First, we introduce a small-file processing strategy for massive small-file datasets to compensate for the low read-write speed and low processing efficiency of Hadoop. Moreover, we use MapReduce to redesign the FP-Growth algorithm for parallel computing, thereby improving the overall performance of frequent itemset mining. Finally, we apply the proposed algorithm to the association analysis of data from the national college entrance examination and admission of China. The experimental results show that the proposed algorithm is feasible, achieves good speedup and higher mining efficiency, and can meet the actual requirements of frequent itemset mining for massive small-file datasets. © 2014 ISSN 2185-2766.
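
As a rough illustration of the MapReduce idea mentioned in this abstract, the sketch below shows only the first pass that FP-Growth needs: counting item frequencies shard by shard (map) and merging the partial counts (reduce). It is not the authors' algorithm; the function names, the in-memory "shards" and the `min_support` threshold are all assumptions made for the example, and no Hadoop API is used.

```python
from collections import Counter
from functools import reduce


def map_count(shard):
    """Map step (hypothetical): count items within one shard of transactions."""
    counts = Counter()
    for transaction in shard:
        # Count each item at most once per transaction.
        counts.update(set(transaction))
    return counts


def reduce_counts(left, right):
    """Reduce step (hypothetical): merge two partial count tables."""
    return left + right


def frequent_items(shards, min_support):
    """Return items whose total count reaches min_support (an absolute count)."""
    total = reduce(reduce_counts, map(map_count, shards), Counter())
    return {item: n for item, n in total.items() if n >= min_support}


# Two shards standing in for two groups of small files.
shards = [
    [["a", "b"], ["b", "c"]],
    [["a", "b", "c"], ["b"]],
]
print(frequent_items(shards, min_support=3))
```

In a real deployment the map step would run on separate workers; here `map` and `reduce` run sequentially, which is enough to show how the partial counts combine.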

Relevance: 20.00%

Abstract:

In recent years, evaluating the influence of nodes and finding the top-k influential nodes in social networks has drawn wide attention and become a hot research topic. Considering the characteristics of social networks, we present a novel mechanism to mine the top-k influential nodes in mobile social networks. The proposed mechanism is based on behavioral analysis of SMS/MMS (short message service / multimedia messaging service) communication between mobile users. We introduce complex network theory to build a social relation graph, which is used to reveal the relationship between people's social contacts and the messages they send. Moreover, an intimacy degree is introduced to characterize the frequency of social interaction among nodes. An election mechanism is employed to find the most influential node, and then a heap sorting algorithm is used to sort the voting results to find the k most influential nodes. The experimental results show that the mechanism finds the top-k most influential nodes efficiently and effectively. © 2013 IEEE.
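
The voting-then-heap pipeline described in this abstract can be sketched as follows. This is only an illustration under simple assumptions, not the authors' mechanism: intimacy is taken to be the raw message count between a sender and a receiver, each sender casts one vote for the contact they message most, and `heapq.nlargest` stands in for the heap-based selection of the k most voted nodes. The function name and the data layout are hypothetical.

```python
import heapq
from collections import Counter


def top_k_influential(messages, k):
    """messages: iterable of (sender, receiver) pairs; returns [(node, votes), ...]."""
    # Assumed intimacy measure: number of messages sent from sender to receiver.
    intimacy = Counter(messages)

    # Each sender votes for the contact with the highest intimacy
    # (the first such contact seen wins a tie).
    best = {}
    for (sender, receiver), n in intimacy.items():
        if sender not in best or n > best[sender][1]:
            best[sender] = (receiver, n)

    votes = Counter(receiver for receiver, _ in best.values())

    # Heap-based selection of the k nodes with the most votes.
    return heapq.nlargest(k, votes.items(), key=lambda kv: kv[1])


messages = [
    ("a", "c"), ("a", "c"), ("a", "b"),
    ("b", "c"), ("d", "c"), ("c", "a"),
]
print(top_k_influential(messages, k=2))
```

Using a heap avoids fully sorting the vote table when k is much smaller than the number of nodes.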

Relevance: 20.00%

Abstract:

The thesis has studied a number of critical problems in data mining for customer behavior analysis and has proposed novel techniques for better modeling of customers' decision-making processes, more efficient analysis of their travel behavior, and more effective identification of their emerging preferences.

Relevance: 20.00%

Abstract:

This paper examines the relationship between the output levels in the mining sector and various non-mining sectors in an attempt to understand the role of the mining sector in Australia. The unobserved components time series model is used to estimate the effects of the output gap and the growth regime in the mining sector on the output level of each of several non-mining sectors. Overall, the estimates obtained do not suggest an overwhelmingly positive effect running from the mining sector to other production and services sectors, implying that the trickle-down effect of the mining boom may be a myth.

Relevance: 20.00%

Abstract:

Although agricultural productivity is critical for economic development, very little is known about the causes of the large dispersion in agricultural productivity across the world. Microeconomic studies increasingly stress the lack of land rights in many poor countries as an important source of low productivity. This paper examines the role played by land titles in explaining differences in agricultural productivity for 93 countries. Using the per capita accumulated value of gold and silver production in the 16th and 17th centuries as instruments for land rights, it is shown that enforcement of land titles is a significant source of agricultural productivity inequality across the world.

Relevance: 20.00%

Abstract:

An Association Rule (AR) is a common knowledge model in data mining that describes an implicative co-occurring relationship between two disjoint sets of binary-valued transaction database attributes (items), expressed in the form of an "antecedent ⇒ consequent" rule. A variant of the AR is the Weighted Association Rule (WAR). With regard to a marketing context, this paper introduces a new knowledge model in data mining: the ALlocating Pattern (ALP). An ALP is a special form of WAR, where each rule item is associated with a weighting score between 0 and 1, and the sum of all rule item scores is 1. It can not only indicate the implicative co-occurring relationship between two (disjoint) sets of items in a weighted setting, but also inform the "allocating" relationship among rule items. ALPs can be demonstrated to be applicable in marketing and possibly a surprising variety of other areas. We further propose an Apriori-based algorithm to extract hidden and interesting ALPs from a "one-sum" weighted transaction database. The experimental results show the effectiveness of the proposed algorithm. © 2008 IEEE.
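
To make the terms in this abstract concrete, the sketch below computes the standard support and confidence of an "antecedent ⇒ consequent" rule over unweighted transactions, and checks the "one-sum" property stated for ALPs (every score between 0 and 1, scores summing to 1). It does not implement the paper's ALP extraction algorithm; the function names and the sample data are assumptions made for illustration.

```python
import math


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


def is_one_sum(weights, tol=1e-9):
    """Check the one-sum property: each score in [0, 1] and scores sum to 1."""
    in_range = all(0 <= w <= 1 for w in weights.values())
    return in_range and math.isclose(sum(weights.values()), 1.0, abs_tol=tol)


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
print(is_one_sum({"bread": 0.6, "milk": 0.4}))
```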

Relevance: 20.00%

Abstract:

This project focuses on the development of zinc-doped ferrite nanoparticle-based MRI contrast agents with enhanced contrast and site-specific targeting for atherosclerosis diagnosis. The engineered nanocomplexes were validated through MRI scans using rat models and show potential for multimodal imaging and effective therapy.

Relevance: 20.00%

Abstract:

The koala (Phascolarctos cinereus), one of the world's most iconic faunal species, was recently listed under Australian government legislation as vulnerable in the northern states of Queensland and New South Wales and in the Australian Capital Territory, but not in the southern states of Victoria and South Australia. This review synthesises empirical evidence of regional koala population trends, their conservation outlook, and associated policy challenges. Population declines are common in the northern half of the koala's range, where habitat loss, hotter droughts, disease, dog attacks and vehicle collisions are the major threats. In contrast, some southern populations are locally overabundant and are now subject to managed declines. The koala presents the problem of managing a wide-ranging species that now primarily occurs in human-modified landscapes, some of which are rapidly urbanising or subject to large-scale agricultural and mining developments. Climate change is a major threat to both northern and southern populations. The implementation of policy to conserve remaining koala habitat and restore degraded habitat is critical to the success of koala conservation strategies, but habitat conservation alone will not resolve the issues of koala conservation. There needs to be a concerted effort to reduce the incidence of dog attacks and road-related mortality, to reduce disease prevalence and severity, and to take into account the new threats of climate change and mining. Many of the complex conservation and policy challenges identified here have broader significance for other species whose population trends, and the nature of the threatening processes, vary from region to region and through time.

Relevance: 20.00%

Abstract:

Hotel managers continue to look for ways to understand traveler preferences, with the aim of improving their strategic planning, marketing, and product development. Traveler preference is unpredictable; for example, hotel guests used to prefer having a telephone in the room, but now favor a fast Internet connection. Changes in preference influence the performance of hotel businesses, creating the need for hotels to identify and address the demands of their guests. Most existing studies focus on current demand attributes and not on emerging ones. Thus, hotel managers may find it difficult to make appropriate decisions in response to changes in travelers' concerns. To address these challenges, this paper adopts the Emerging Pattern Mining technique to identify emergent hotel features of interest to international travelers. Data are derived from 118,000 records of online reviews. The methods and findings can help hotel managers gain insights into travelers' interests and better understand the rapid changes in tourist preferences.
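
Emerging patterns are commonly defined through a growth rate: the ratio of a pattern's support in a newer dataset to its support in an older one. The sketch below illustrates that definition on toy review data; it is not the paper's implementation, and the function names, the candidate list, the `min_growth` threshold and the sample records are assumptions made for the example.

```python
def support(records, pattern):
    """Fraction of records that mention every feature in pattern."""
    if not records:
        return 0.0
    pattern = set(pattern)
    return sum(pattern <= set(r) for r in records) / len(records)


def growth_rate(old, new, pattern):
    """Support in the newer data divided by support in the older data."""
    s_old = support(old, pattern)
    s_new = support(new, pattern)
    if s_old == 0:
        # Pattern absent before: infinite growth if it appears now, else none.
        return float("inf") if s_new > 0 else 0.0
    return s_new / s_old


def emerging_patterns(old, new, candidates, min_growth):
    """Keep candidate patterns whose growth rate reaches min_growth."""
    return [p for p in candidates if growth_rate(old, new, p) >= min_growth]


# Hypothetical features mentioned in older and newer reviews.
old_reviews = [{"telephone"}, {"telephone", "wifi"}, {"telephone"}, {"pool"}]
new_reviews = [{"wifi"}, {"wifi", "pool"}, {"wifi", "telephone"}, {"wifi"}]
candidates = [("wifi",), ("telephone",), ("pool",)]

print(emerging_patterns(old_reviews, new_reviews, candidates, min_growth=2.0))
```

Here only the candidates are single features; a fuller miner would also enumerate feature combinations before filtering by growth rate.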

Relevance: 20.00%

Abstract:

Cancer remains a major challenge in modern medicine. The increasing prevalence of cancer, particularly in developing countries, demands a better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understanding of cancer treatment toxicities is often derived from either "clean" patient cohorts or coarse population statistics. It is difficult to obtain up-to-date, local assessments of treatment toxicities for specific cancer centres. In this paper, we apply an Apriori-based method for discovering toxicity progression patterns in the form of temporal association rules. Our experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with pairwise association analysis. Our method is applicable to most cancer centres with even rudimentary electronic medical records and has the potential to provide real-time surveillance and quality assurance in cancer care.