991 resultados para Frequent itemset


Relevância:

100.00% 100.00%

Publicador:

Resumo:

To efficiently and yet accurately cluster Web documents is of great interests to Web users and is a key component of the searching accuracy of a Web search engine. To achieve this, this paper introduces a new approach for the clustering of Web documents, which is called maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. MFI approach firstly locates the center points of high density clusters precisely. These center points then are used as initial points for the K-means algorithm. Our experimental results tested on 3 Web document sets show that our MFI approach outperforms the other methods we compared in most cases, particularly in the case of large number of categories in Web document sets.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Data perturbation is a popular method to achieve privacy-preserving data mining. However, distorted databases bring enormous overheads to mining algorithms as compared to original databases. In this paper, we present the GrC-FIM algorithm to address the efficiency problem in mining frequent itemsets from distorted databases. Two measures are introduced to overcome the weakness in existing work: firstly, the concept of independent granule is introduced, and granule inference is used to distinguish between non-independent itemsets and independent itemsets. We further prove that the support counts of non-independent itemsets can be directly derived from subitemsets, so that the error-prone reconstruction process can be avoided. This could improve the efficiency of the algorithm, and bring more accurate results; secondly, through the granular-bitmap representation, the support counts can be calculated in an efficient way. The empirical results on representative synthetic and real-world databases indicate that the proposed GrC-FIM algorithm outperforms the popular EMASK algorithm in both the efficiency and the support count reconstruction accuracy.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Frequent Itemsets mining is well explored for various data types, and its computational complexity is well understood. There are methods to deal effectively with computational problems. This paper shows another approach to further performance enhancements of frequent items sets computation. We have made a series of observations that led us to inventing data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper shows results from several experiments that confirmed our general and formally presented observations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Apriori algorithm’s frequent itemset approach has become the standard approach to discovering association rules. However, the computation requirements of the frequent itemset approach are infeasible for dense data and the approach is unable to discover infrequent associations. OPUS AR is an efficient algorithm for association rule discovery that does not utilize frequent itemsets and hence avoids these problems. It can reduce search time by using additional constraints on the search space as well as constraints on itemset frequency. However, the effectiveness of the pruning rules used during search will determine the efficiency of its search. This paper presents and analyses pruning rules for use with OPUS AR. We demonstrate that application of OPUS AR is feasible for a number of datasets for which application of the frequent itemset approach is infeasible and that the new pruning rules can reduce compute time by more than 40%.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In today’s high speed networks it is becoming increasingly challenging for network managers to understand the nature of the traffic that is carried in their network. A major problem for traffic analysis in this context is how to extract a concise yet accurate summary of the relevant aggregate traffic flows that are present in network traces. In this paper, we present two summarization techniques to minimize the size of the traffic flow report that is generated by a hierarchical cluster analysis tool. By analyzing the accuracy and compaction gain of our approach on a standard benchmark dataset, we demonstrate that our approach achieves more accurate summaries than those of an existing tool that is based on frequent itemset mining.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background
AMP-activated protein kinase (AMPK) has emerged as a significant signaling intermediary that regulates metabolisms in response to energy demand and supply. An investigation into the degree of activation and deactivation of AMPK subunits under exercise can provide valuable data for understanding AMPK. In particular, the effect of AMPK on muscle cellular energy status makes this protein a promising pharmacological target for disease treatment. As more AMPK regulation data are accumulated, data mining techniques can play an important role in identifying frequent patterns in the data. Association rule mining, which is commonly used in market basket analysis, can be applied to AMPK regulation.

Results
This paper proposes a framework that can identify the potential correlation, either between the state of isoforms of α, β and γ subunits of AMPK, or between stimulus factors and the state of isoforms. Our approach is to apply item constraints in the closed interpretation to the itemset generation so that a threshold is specified in terms of the amount of results, rather than a fixed threshold value for all itemsets of all sizes. The derived rules from experiments are roughly analyzed. It is found that most of the extracted association rules have biological meaning and some of them were previously unknown. They indicate direction for further research.

Conclusion
Our findings indicate that AMPK has a great impact on most metabolic actions that are related to energy demand and supply. Those actions are adjusted via its subunit isoforms under specific physical training. Thus, there are strong co-relationships between AMPK subunit isoforms and exercises. Furthermore, the subunit isoforms are correlated with each other in some cases. The methods developed here could be used when predicting these essential relationships and enable an understanding of the functions and metabolic pathways regarding AMPK.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we discuss our participation to the INEX 2008 Link-the-Wiki track. We utilized a sliding window based algorithm to extract the frequent terms and phrases. Using the extracted phrases and term as descriptive vectors, the anchors and relevant links (both incoming and outgoing) are recognized efficiently.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Endometrial carcinoma is the most common gynecological malignancy in the United States. Although most women present with early disease confined to the uterus, the majority of persistent or recurrent tumors are refractory to current chemotherapies. We have identified a total of 11 different FGFR2 mutations in 3/10 (30%) of endometrial cell lines and 19/187 (10%) of primary uterine tumors. Mutations were seen primarily in tumors of the endometrioid histologic subtype (18/115 cases investigated, 16%). The majority of the somatic mutations identified were identical to germline activating mutations in FGFR2 and FGFR3 that cause Apert Syndrome, Beare-Stevenson Syndrome, hypochondroplasia, achondroplasia and SADDAN syndrome. The two most common somatic mutations identified were S252W (in eight tumors) and N550K (in five samples). Four novel mutations were identified, three of which are also likely to result in receptor gain-of-function. Extensive functional analyses have already been performed on many of these mutations, demonstrating they result in receptor activation through a variety of mechanisms. The discovery of activating FGFR2 mutations in endometrial carcinoma raises the possibility of employing anti-FGFR molecularly targeted therapies in patients with advanced or recurrent endometrial carcinoma.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Using a genome-scanning approach to search for oncogenes, a recent report identifies somatic mutations in the signaling gene BRAF that are particularly prevalent in melanoma.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background & aims The Australasian Nutrition Care Day Survey (ANCDS) ascertained if malnutrition and poor food intake are independent risk factors for health-related outcomes in Australian and New Zealand hospital patients. Methods Phase 1 recorded nutritional status (Subjective Global Assessment) and 24-h food intake (0, 25, 50, 75, 100% intake). Outcomes data (Phase 2) were collected 90-days post-Phase 1 and included length of hospital stay (LOS), readmissions and in-hospital mortality. Results Of 3122 participants (47% females, 65 ± 18 years) from 56 hospitals, 32% were malnourished and 23% consumed ≤ 25% of the offered food. Malnourished patients had greater median LOS (15 days vs. 10 days, p < 0.0001) and readmissions rates (36% vs. 30%, p = 0.001). Median LOS for patients consuming ≤ 25% of the food was higher than those consuming ≤ 50% (13 vs. 11 days, p < 0.0001). The odds of 90-day in-hospital mortality were twice greater for malnourished patients (CI: 1.09–3.34, p = 0.023) and those consuming ≤ 25% of the offered food (CI: 1.13–3.51, p = 0.017), respectively. Conclusion The ANCDS establishes that malnutrition and poor food intake are independently associated with in-hospital mortality in the Australian and New Zealand acute care setting.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

What do we know? • Customer Experience is increasingly becoming the new standard for differentiation in both offline and online retailing, and offers a sustainable competitive advantage. o The economic value of a company’s offering has been observed to increase when the customer has a fulfilling shopping experience (Pine & Gilmore, 1998) o Crafting engaging and customer experience is a known method of generating loyalty, advocacy and word of mouth (Tynan & McKechnie, 2009). o A good experience can entice consumers to shop for longer and spend more (Kim, 2001). • The customer’s experience is made up of diverse elements occurring before, during and after the purchase itself. (Discussed further on page 5). It is cumulative over time and can be influenced by touch points across multiple channels. What remains unclear? • How do Coles customers respond to the elements of online customer experience? • How does the online customer experience differ for frequent and infrequent purchasers? • Do differences between genders and age cohorts for online customer experience exist?

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Extracting frequent subtrees from the tree structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on the real datasets compare the efficiency of BOSTER over the two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an algorithm for mining unordered embedded subtrees using the balanced-optimal-search canonical form (BOCF). A tree structure guided scheme based enumeration approach is defined using BOCF for systematically enumerating the valid subtrees only. Based on this canonical form and enumeration technique, the balanced optimal search embedded subtree mining algorithm (BEST) is introduced for mining embedded subtrees from a database of labelled rooted unordered trees. The extensive experiments on both synthetic and real datasets demonstrate the efficiency of BEST over the two state-of-the-art algorithms for mining embedded unordered subtrees, SLEUTH and U3.