24 results for FP

in Deakin Research Online - Australia


Relevance:

20.00%

Abstract:

This paper presents a real application of Web-content mining using an incremental FP-Growth approach. We first restructure the semi-structured data retrieved from the web pages of the Chinese car market to fit into a local database, and then employ an incremental algorithm to discover association rules that identify car preferences. To find more general regularities, a method of attribute-oriented induction is also used to find customers' consumption preferences. Experimental results show some interesting consumption preference patterns that may help the government make policy to encourage and guide car consumption.
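
As a rough illustration of what an association rule measures, the sketch below computes the support and confidence of one rule by brute-force counting. It is not the incremental FP-Growth algorithm from the paper, and the transactions and attribute values are invented for illustration.

```python
from typing import FrozenSet, List

# Hypothetical transactions: each is the set of attribute values of one car listing.
transactions: List[FrozenSet[str]] = [
    frozenset({"sedan", "price<100k", "manual"}),
    frozenset({"sedan", "price<100k", "automatic"}),
    frozenset({"suv", "price>=100k", "automatic"}),
    frozenset({"sedan", "price>=100k", "automatic"}),
]


def support(itemset: FrozenSet[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in data if itemset <= t) / len(data)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               data: List[FrozenSet[str]]) -> float:
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent union consequent) / support(antecedent);
    returns 0.0 when the antecedent never occurs.
    """
    base = support(antecedent, data)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, data) / base


# Rule under test (hypothetical): sedan -> price<100k
rule_support = support(frozenset({"sedan", "price<100k"}), transactions)
rule_confidence = confidence(frozenset({"sedan"}), frozenset({"price<100k"}), transactions)
```

A rule is usually kept only when both values exceed user-chosen thresholds; FP-Growth obtains the same counts without scanning the data once per itemset.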

Relevance:

20.00%

Abstract:

Despite many studies of family presence during resuscitation, no validated tool exploring the attitudes and beliefs of healthcare staff towards family presence has been published. The aim of this paper is to describe the development of a tool to accurately measure the attitudes and beliefs of emergency department staff towards family presence in the deteriorating adult patient, present the results of validity and reliability testing, and present the final validated tool. Twenty-nine items were developed, informed by themes from the literature and unvalidated published tools related to family presence during resuscitation. The tool was piloted on a sample of 68 emergency nursing and medical staff. Content validity and face validity were established using feedback from participants. Reliability was established by unidimensionality, exploratory factor analysis and internal consistency. Sixteen items were deleted from the original tool due to low item-to-total correlations and low communalities. Exploratory factor analysis of the remaining items revealed four factors with acceptable correlation coefficients and appropriate explanation of variance. Cronbach's alpha for each factor was >0.7 indicating a high degree of internal consistency. The four factors were labelled and arranged in a logical order to form the final tool, the Emergency Department Family Presence Survey.
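
For readers unfamiliar with the internal-consistency statistic cited above, here is a minimal sketch of Cronbach's alpha, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The response matrix is invented and is not the study's data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for scores[r][i] = answer of respondent r to item i.

    Sample variances are used for both the items and the total score.
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses: 5 respondents, 3 items.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
```

In the abstract's terms, each of the four factors would be scored separately and retained if its alpha exceeded 0.7.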

Relevance:

10.00%

Abstract:

This paper introduces an incremental FP-Growth approach for Web-content-based data mining and its application to a real-world problem. The problem is solved as follows. First, we obtain semi-structured data from the Web pages of the Chinese car market, structure it, and save it in a local database. Second, we use an incremental FP-Growth algorithm for mining association rules to discover Chinese consumers' car consumption preferences. To find more general regularities, an attribute-oriented induction method is also used to find customers' consumption preferences among a range of car categories. Experimental results reveal some interesting consumption preferences that are useful to decision makers setting policy to encourage and guide car consumption. Although the data we used may not be the best representation of the actual market, it is still good enough for decision-making purposes, in that it reflects the real situation of car consumption preferences under the two assumptions made in this context.

Relevance:

10.00%

Abstract:

Data mining refers to extracting or "mining" knowledge from large amounts of data. It is also described as a method of "knowledge presentation", in which visualization and knowledge representation techniques are used to present the mined knowledge to the user. Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, several methods have been proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, making them inadequate for deriving association rules. The Pattern Decomposition (PD) algorithm, which can significantly reduce the size of the dataset on each pass, makes it more efficient to mine all frequent patterns in a large dataset. This algorithm avoids the costly process of candidate set generation and, by evaluating support on reduced datasets, saves a large amount of counting time. In this paper, some existing frequent pattern generation algorithms are explored and compared. The results show that the PD algorithm outperforms an improved version of Apriori named Direct Count of candidates & Prune transactions (DCP) by one order of magnitude and is faster than an improved FP-tree algorithm named Predictive Item Pruning (PIP). Further, PD is also more scalable than both DCP and PIP.
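
To make the candidate generation-and-test approach concrete, the following is a minimal sketch of plain Apriori. It is not PD, DCP, or PIP; the function name `apriori` and the sample baskets are illustrative only. Each level requires a full pass over the data to count candidates, which is the cost the abstract says PD avoids.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset that occurs in at least `min_count` transactions."""
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        # Test step: count each candidate with a full pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Generate step: join frequent k-itemsets into (k+1)-item candidates and
        # keep only those whose k-item subsets are all frequent.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market baskets.
baskets = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
]
result = apriori(baskets, min_count=3)
```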

Relevance:

10.00%

Abstract:

Spam is commonly defined as unsolicited email messages, and the goal of spam filtering is to distinguish between spam and legitimate email messages. Much work has been done to filter spam from legitimate emails using machine learning algorithms, and substantial performance has been achieved at the cost of some false positive (FP) tradeoffs. In spam detection, false positives are sometimes unacceptable. In this paper, an adaptive spam filtering model is proposed, based on machine learning (ML) algorithms, that achieves better accuracy by reducing FP problems. The model combines individual and combined filtering approaches built from existing well-known ML algorithms. It considers both individual and collective outputs and passes them to an analyzer. A dynamic feature selection (DFS) technique is also proposed in this paper to improve accuracy.

Relevance:

10.00%

Abstract:

Spam is commonly defined as unsolicited email messages, and the goal of spam filtering is to differentiate spam from legitimate email. Much work has been done to filter spam from legitimate emails using machine learning algorithms, and substantial performance has been achieved at the cost of some false positive (FP) tradeoffs. In this paper, a spam filtering architecture based on the support vector machine (SVM) is proposed, which achieves better accuracy by reducing FP problems. Within this architecture, an innovative feature selection technique called dynamic feature selection (DFS) is proposed, which enhances the overall performance of the architecture while reducing FP problems. The experimental results show that the proposed technique performs better than similar existing techniques.

Relevance:

10.00%

Abstract:

This paper presents an innovative email categorization technique using serialized multi-stage classification ensembles. Many approaches are used in practice for email categorization to control the menace of spam emails in different ways. Content-based email categorization employs filtering techniques that use classification algorithms to learn to predict spam emails given a corpus of training emails. This process achieves substantial performance at the cost of some false positive (FP) tradeoffs. It has been investigated with different classification algorithms, and the outputs were found to vary from one classifier to another on the same email corpora. In this paper we propose a multi-stage classification technique that uses different popular learning algorithms together with an analyser, which substantially reduces FP problems and increases classification accuracy compared to similar existing techniques.

Relevance:

10.00%

Abstract:

In this paper we propose a new technique of email classification based on grey list (GL) analysis of user emails. The technique analyses the output emails of an integrated model that uses multiple classifiers based on statistical learning algorithms. The GL is a list of classifier outputs that are considered neither true positive (TP) nor true negative (TN) but fall between the two. Much work has been done to filter spam from legitimate emails using classification algorithms, and substantial performance has been achieved at the cost of some false positive (FP) tradeoffs. In spam detection, false positives are sometimes unacceptable. The proposed technique provides a list of output emails, called the "grey list (GL)", to the analyser, which decides the status of these emails. It is shown that the proposed technique performs much better than existing email classification systems in terms of reducing FP problems and improving accuracy.

Relevance:

10.00%

Abstract:

Background/Aims
Familial clustering of hepatitis B virus (HBV) infection is related to perinatal transmission, and is the main cause of familial-type hepatocellular carcinoma (HCC). The route of HBV transmission differs between the children and siblings of patients with HCC. This study examined the differences in HBV carrier rates and HCC-related mortality between two generations in HCC families.
Methods
From 1992 to 1997, relatives of individuals with HCC were screened prospectively with ultrasonography, alpha-fetoprotein, liver biochemistry tests and viral markers. Total HCC-related deaths during a 9-year period were compared between the generations of index patients and their children.
Results
The study included a total of 13 676 relatives in two generations. More HCC-related deaths occurred in the index patient generation than in the child generation. Furthermore, children of female index patients had higher rates of liver cancer related mortality than children of male index patients. The same was true when the analysis was limited to male HBV carriers. The prevalence of HBsAg in the offspring of HBsAg positive mothers was 66% in the child generation and 72% in the index patient generation. These high prevalences indicated high maternal HBV replication status.
Conclusions
Perinatal transmission and maternal viral load are important risk factors in hepatocarcinogenesis.

Relevance:

10.00%

Abstract:

In this paper we propose a spam filtering technique using a (2+1)-tier classification approach. The main focus of this paper is to reduce the false positive (FP) rate, which is considered an important research issue in spam filtering. In our approach, the email message is first classified by the first two tier classifiers, and their outputs are passed to the analyzer. When the two predictions are identical, the analyzer checks the labeling of the output emails and sends them to the corresponding mailboxes. If any misclassification occurs in the first two tier classifiers, the analyzer invokes the tier-3 classifier, which makes the final decision. This technique reduces the analysis complexity of our previous work. It is also shown that the proposed technique performs better in terms of both reducing false positives and improving accuracy.
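
The control flow described above can be sketched as follows. This is a minimal illustration, assuming that "misclassification" means the two first-tier predictions disagree. The function `classify_two_plus_one` and the keyword-based toy classifiers are hypothetical stand-ins, not the learners used in the paper.

```python
from typing import Callable

# A classifier maps an email body to a label: "spam" or "ham".
Classifier = Callable[[str], str]


def classify_two_plus_one(email: str, tier1: Classifier, tier2: Classifier,
                          tier3: Classifier) -> str:
    """Accept the label when tiers 1 and 2 agree; otherwise defer to tier 3."""
    first, second = tier1(email), tier2(email)
    if first == second:
        return first      # identical prediction: route straight to the mailbox
    return tier3(email)   # disagreement: tier 3 makes the final decision


# Toy stand-in classifiers (assumptions for illustration only).
def tier1(email: str) -> str:
    return "spam" if "free" in email.lower() else "ham"


def tier2(email: str) -> str:
    return "spam" if "winner" in email.lower() else "ham"


def tier3(email: str) -> str:
    return "spam" if email.count("!") >= 3 else "ham"


label = classify_two_plus_one("Free lunch on Friday", tier1, tier2, tier3)
```

Because tier 3 runs only on disagreements, the extra classification cost is paid for a subset of messages rather than for every email.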

Relevance:

10.00%

Abstract:

In the last decade, Internet email has become one of the primary methods of communication, used by everyone for the exchange of ideas and information. However, in recent years, along with the rapid growth of the Internet and email, there has been a dramatic growth in spam. Classification algorithms have been used successfully to filter spam, but with a certain amount of false positive trade-offs. This problem is mainly caused by the dynamic nature of spam content, spam delivery strategies, and the diversification of the classification algorithms. This paper presents an email classification approach that overcomes the burden of the GL (grey list) analyser's analyzing technique, as a further refinement of our previous multi-classifier based email classification [10]. In this approach, we introduce a “majority voting grey list (MVGL)” analyzing technique, with two different variations, which analyzes only the GL emails. Our empirical evidence demonstrates the improvements of this approach, in terms of complexity and cost, over the existing GL analyser. This approach also overcomes the existing analyzing technique's dependence on human interaction.

Relevance:

10.00%

Abstract:

Spam, or unwanted email, is one of the potential issues of Internet security, and correctly classifying user emails to prevent the penetration of spam is an important research issue for anti-spam researchers. In this paper we present an effective and efficient spam classification technique that uses a clustering approach to categorize the features. In our clustering technique we incorporate the VAT (Visual Assessment of cluster Tendency) approach into our training model to categorize the extracted features, and then pass the information to the classification engine. We used the WEKA (www.cs.waikato.ac.nz/ml/weka/) interface to classify the data using different classification algorithms, including tree-based classifiers, nearest neighbor algorithms, statistical algorithms and AdaBoost. Our empirical results show that we can achieve a detection rate of over 97%.

Relevance:

10.00%

Abstract:

Zero-day or unknown malware is created using code obfuscation techniques that modify the parent code to produce offspring copies with the same functionality but different signatures. Current techniques reported in the literature lack the capability to detect zero-day malware with the required accuracy and efficiency. In this paper, we propose and evaluate a novel method that employs several data mining techniques to detect and classify zero-day malware with high levels of accuracy and efficiency, based on the frequency of Windows API calls. The paper describes the methodology used to collect large data sets for training the classifiers, and analyses the performance of the various data mining algorithms adopted for the study, using a fully automated tool developed in this research to conduct the experimental investigations and evaluation. From the performance results of these algorithms, we evaluate and discuss the advantages of one data mining algorithm over another for accurately detecting zero-day malware. The data mining framework employed in this research learns by analysing the behavior of existing malicious and benign code in large datasets. We employed robust classifiers, namely the Naïve Bayes (NB) algorithm, the k-Nearest Neighbor (kNN) algorithm, the Sequential Minimal Optimization (SMO) algorithm with four different kernels (SMO-Normalized PolyKernel, SMO-PolyKernel, SMO-Puk, and SMO-Radial Basis Function (RBF)), the Backpropagation Neural Networks algorithm, and the J48 decision tree, and evaluated their performance. Overall, the automated data mining system implemented for this study achieved a high true positive (TP) rate of more than 98.5% and a low false positive (FP) rate of less than 0.025, which has not been achieved in the literature so far. This is much higher than the required commercial acceptance level, indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers exploring different aspects of the obfuscations that are affecting the IT world today.
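
As a reminder of how the two reported metrics are defined, the sketch below computes the TP rate, TP / (TP + FN), and the FP rate, FP / (FP + TN), from predicted and true labels. The labels are invented, and the helper `tp_fp_rates` is illustrative rather than part of the paper's tool.

```python
from typing import Sequence, Tuple


def tp_fp_rates(y_true: Sequence[str], y_pred: Sequence[str],
                positive: str = "malware") -> Tuple[float, float]:
    """Return (TP rate, FP rate) for the given positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1   # malware correctly flagged
            else:
                fn += 1   # malware missed
        else:
            if pred == positive:
                fp += 1   # benign code wrongly flagged
            else:
                tn += 1   # benign code correctly passed
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels: 4 malware samples and 6 benign samples.
y_true = ["malware"] * 4 + ["benign"] * 6
y_pred = ["malware", "malware", "malware", "benign",
          "benign", "benign", "benign", "benign", "benign", "malware"]
tpr, fpr = tp_fp_rates(y_true, y_pred)
```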