1000 results for Incremental mining


Relevance:

20.00%

Publisher:

Abstract:

The objective is to measure the utility of real-time commercial decision making. This is important because of the higher possibility of mistakes in real-time decisions, problems with recording actual occurrences, and the significant costs associated with predictions produced by algorithms. The first contribution is to use overall utility and to represent individual utility with a monetary value instead of a prediction. The second is to calculate the benefit from predictions using a utility-based decision threshold. The third is to incorporate the cost of predictions. In the experiments, overall utility is used to evaluate communal and spike detection, and their adaptive versions. The overall utility results show that with fewer alerts, communal detection is better than spike detection. With more alerts, adaptive communal and spike detection are better than their static versions. To maximise overall utility with all algorithms, only the highest 1% to 4% of predictions should be raised as alerts.
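
To make the utility-based decision threshold concrete, the following minimal Python sketch (a toy illustration, not the authors' implementation; the scores, the $500 benefit per detected case, and the $25 cost per alert are assumed values) computes overall utility as the monetary benefit recovered by alerting only the top fraction of predictions, minus the cost of raising those alerts, and sweeps that fraction.

```python
import numpy as np

def overall_utility(scores, benefits, cost_per_alert, alert_fraction):
    """Overall utility when only the top `alert_fraction` of predictions become
    alerts: the monetary benefit recovered by those alerts minus their cost."""
    n_alerts = max(1, int(len(scores) * alert_fraction))
    alerted = np.argsort(scores)[::-1][:n_alerts]   # highest-scoring cases first
    return benefits[alerted].sum() - n_alerts * cost_per_alert

# Hypothetical data: 10,000 cases, roughly 2% of which carry a recoverable
# $500 benefit, and scores that tend to be higher for those cases.
rng = np.random.default_rng(0)
benefits = np.where(rng.random(10_000) < 0.02, 500.0, 0.0)
scores = rng.random(10_000) + 0.8 * (benefits > 0)

for frac in (0.01, 0.02, 0.04, 0.10, 0.50):
    u = overall_utility(scores, benefits, cost_per_alert=25.0, alert_fraction=frac)
    print(f"alerting top {frac:4.0%}: overall utility = {u:10.2f}")
```

In this toy setting the utility peaks at a small alert fraction; where exactly it peaks depends entirely on the assumed benefit and cost figures.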

Relevance:

20.00%

Publisher:

Abstract:

Research into the prevalence of hospitalisation among childhood asthma cases was undertaken using a data set local to the Barwon region of Victoria. Participants were parents/guardians responding on behalf of children aged 5-11 years. Various data mining techniques, including segmentation, association and classification, are used to assist in predicting and exploring instances of childhood hospitalisation due to asthma. Results from this study indicate that children in inner city and metropolitan areas may overutilise emergency department services. In addition, the study found that the predicted likelihood of hospitalisation for asthma in children was greater for those with a written asthma management plan.

Relevance:

20.00%

Publisher:

Abstract:

The VO2-power regression and estimated total energy demand for a 6-minute supramaximal exercise test were predicted from a continuous incremental exercise test. Sub-maximal VO2-power co-ordinates were established from the last 40 seconds of each 150-second exercise stage. The precision of the estimated total energy demand was determined using its 95% confidence interval (95% CI). The linearity of the individual VO2-power regression equations was determined using Pearson's correlation coefficient. The mean 95% CI of the estimated total energy demand was 5.9±2.5 mL O2 Eq·kg-1·min-1, and the mean correlation coefficient was 0.9942±0.0042. The current study contends that sub-maximal VO2-power co-ordinates from a continuous incremental exercise test can be used to estimate supra-maximal energy demand without compromising the precision of the accumulated oxygen deficit (AOD) method.
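
As a rough numerical illustration of this procedure (hypothetical sub-maximal co-ordinates and an assumed supramaximal work rate, not the study's data), the sketch below fits an individual VO2-power regression, reports its Pearson correlation, extrapolates to a supra-maximal power output, and attaches a 95% CI to the estimated demand.

```python
import numpy as np
from scipy import stats

# Hypothetical sub-maximal co-ordinates (power in W, VO2 in mL·kg-1·min-1),
# standing in for values averaged over the last 40 s of each 150-s stage.
power = np.array([100, 130, 160, 190, 220, 250], dtype=float)
vo2 = np.array([18.0, 22.5, 27.1, 31.8, 36.2, 40.9])

# Individual VO2-power regression and its linearity (Pearson's r).
slope, intercept, r, _, _ = stats.linregress(power, vo2)

# Extrapolate to an assumed 6-min supramaximal work rate.
supra_power = 400.0
demand = intercept + slope * supra_power   # estimated demand, mL O2 Eq·kg-1·min-1

# 95% CI of the estimated demand (CI of the mean response at supra_power).
n = len(power)
resid = vo2 - (intercept + slope * power)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))                     # residual SD
se = s * np.sqrt(1 / n + (supra_power - power.mean()) ** 2
                 / np.sum((power - power.mean()) ** 2))
ci = stats.t.ppf(0.975, n - 2) * se

print(f"r = {r:.4f}, estimated demand = {demand:.1f} ± {ci:.1f} mL O2 Eq·kg-1·min-1")
# The AOD would then be (demand × exercise time) minus the oxygen uptake
# actually accumulated during the 6-min supramaximal bout.
```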

Relevance:

20.00%

Publisher:

Abstract:

In this paper we investigate an approach to eliciting practitioners’ problem-solving experience across an application domain. The approach is based on the well-known ‘pattern mining’ process, which commonly results in a collection of sharable and reusable ‘design patterns’. While pattern mining has been recognised to work effectively in numerous domains, its main problem is the degree of technical proficiency it demands, which few domain practitioners are prepared to master. In our approach to pattern mining, patterns are induced indirectly from designers’ experience, as determined by analysing their past projects, the problems encountered, and the solutions applied to rectify them. Through cycles of hermeneutic revision, the pattern mining process has been refined and its deficiencies ultimately addressed. The hermeneutic method used in the study is described in the paper and illustrated with examples drawn from the multimedia domain. The resulting approach to experience elicitation provides opportunities for active participation of multimedia practitioners in capturing and sharing their design experience.

Relevance:

20.00%

Publisher:

Abstract:

This thesis proposes three effective strategies to address the significant performance-bias problem in imbalanced text mining: (1) the creation of a novel inexact field learning algorithm to overcome the dual-imbalance problem; (2) the introduction of a one-class classification framework to optimise classifier parameters; and (3) the proposal of a maximal-frequent-itemset discovery approach to achieve higher accuracy and efficiency.

Relevance:

20.00%

Publisher:

Abstract:

Coverage is a range that covers only positive samples in attribute (or feature) space. Finding coverage is the core problem in induction algorithms, because coverage can be used as rules to describe positive samples. To reflect the characteristics of the training samples, it is desirable to find large coverage that covers more positive samples. However, finding large coverage is difficult because the attribute space is usually of very high dimensionality. Many heuristic methods, such as ID3, AQ and CN2, have been proposed to find large coverage. A robust algorithm has also been proposed to find the largest coverage, but its time and space complexity becomes costly as the dimensionality grows. To overcome this drawback, this paper proposes an algorithm that adopts incremental feature combinations to find the largest coverage effectively. In this algorithm, irrelevant coverage can be pruned away at early stages because potentially large coverage can be found earlier. Experiments show that the space and time needed to find the largest coverage are significantly reduced.
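
The notion of coverage as a positive-only region, and of pruning candidate expansions that would admit a negative sample, can be illustrated with the simplified greedy sketch below (a toy box-growing heuristic on hypothetical 2-D data, not the incremental feature combination algorithm proposed in the paper).

```python
import numpy as np

def covers(lower, upper, X):
    """Boolean mask of samples lying inside the axis-aligned box [lower, upper]."""
    return np.all((X >= lower) & (X <= upper), axis=1)

def grow_coverage(X_pos, X_neg, seed_idx=0):
    """Greedily grow a box around one positive seed, adding nearby positives one
    at a time and rejecting any expansion that would let a negative sample in."""
    lower = X_pos[seed_idx].copy()
    upper = X_pos[seed_idx].copy()
    order = np.argsort(np.abs(X_pos - X_pos[seed_idx]).sum(axis=1))  # nearest first
    for i in order:
        new_lower = np.minimum(lower, X_pos[i])
        new_upper = np.maximum(upper, X_pos[i])
        if not covers(new_lower, new_upper, X_neg).any():  # still positive-only?
            lower, upper = new_lower, new_upper
    return (lower, upper), covers(lower, upper, X_pos).sum()

# Toy 2-D data: positives clustered in one rectangle, negatives in another.
rng = np.random.default_rng(1)
X_pos = rng.uniform([0.0, 0.0], [1.0, 1.0], size=(50, 2))
X_neg = rng.uniform([1.2, 0.0], [2.0, 1.0], size=(50, 2))
box, n_covered = grow_coverage(X_pos, X_neg)
print(f"box {box} covers {n_covered} of {len(X_pos)} positive samples")
```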

Relevance:

20.00%

Publisher:

Abstract:

Cluster analysis has played a key role in data understanding. When such an important data mining task is extended to the context of data streams, it becomes more challenging, since the data arrive at the mining system in a one-pass manner. The problem is even more difficult when the clustering task is considered in a sliding window model, in which the elimination of outdated data must be dealt with properly. We propose the SWEM algorithm, which exploits the Expectation Maximization technique to address these challenges. SWEM is not only able to process the stream in an incremental manner, but is also capable of adapting to changes in the underlying stream distribution.
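
A heavily simplified sketch of the sliding-window idea follows (not SWEM itself, which summarises the window with compact statistics rather than retaining raw points): recent points are held in a fixed-size window, expired points drop out automatically, and a Gaussian mixture is refined incrementally by a few warm-started EM iterations per arriving batch.

```python
from collections import deque

import numpy as np
from sklearn.mixture import GaussianMixture

window = deque(maxlen=2000)                        # sliding window of recent points
gm = GaussianMixture(n_components=3, warm_start=True, max_iter=5, random_state=0)

rng = np.random.default_rng(0)
for t in range(50):                                # 50 arriving batches
    # The underlying distribution drifts: one component's mean slowly moves.
    batch = rng.normal(loc=[0.0 + 0.05 * t, 5.0, 10.0], scale=1.0, size=(40, 3)).ravel()
    window.extend(batch)                           # oldest points fall out automatically
    X = np.asarray(window).reshape(-1, 1)
    gm.fit(X)                                      # a few EM steps, warm-started

print("current cluster means:", np.sort(gm.means_.ravel()))
```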

Relevance:

20.00%

Publisher:

Abstract:

Cluster analysis has played a key role in data stream understanding. The problem is difficult when the clustering task is considered in a sliding window model, in which the requirement to eliminate outdated data must be dealt with properly. We propose the SWEM algorithm, which is designed based on the Expectation Maximization technique to address these challenges. SWEM is equipped with the capability to compute clusters incrementally, using a small number of statistics summarised over the stream, and the capability to adapt to changes in the stream’s distribution. The feasibility of SWEM has been verified via a number of experiments, and we show that it is superior to the Clustream algorithm on both synthetic and real datasets.
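
The "small number of statistics summarised over the stream" can be illustrated as follows (a minimal sketch assuming each cluster or time slot keeps a count, a linear sum and a squared sum; this is the standard sufficient-statistics trick rather than SWEM's exact data structures): means and variances are recomputed from the statistics, and the statistics of an expired slot are simply subtracted when it slides out of the window.

```python
import numpy as np

def empty_stats(d):
    return {"n": 0.0, "s": np.zeros(d), "ss": np.zeros(d)}

def add(stats, x):
    stats["n"] += 1.0
    stats["s"] += x
    stats["ss"] += x * x

def subtract(total, expired):
    total["n"] -= expired["n"]
    total["s"] -= expired["s"]
    total["ss"] -= expired["ss"]

def mean_var(stats):
    m = stats["s"] / stats["n"]
    return m, stats["ss"] / stats["n"] - m * m

# Usage: accumulate two time slots, then let the first slide out of the window.
slot1, slot2, window_total = empty_stats(2), empty_stats(2), empty_stats(2)
rng = np.random.default_rng(0)
for x in rng.normal(0.0, 1.0, size=(100, 2)):
    add(slot1, x); add(window_total, x)
for x in rng.normal(3.0, 1.0, size=(100, 2)):
    add(slot2, x); add(window_total, x)
subtract(window_total, slot1)              # slot1 expires from the sliding window
print(mean_var(window_total)[0])           # mean of what remains, roughly [3, 3]
```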

Relevance:

20.00%

Publisher:

Abstract:

Data perturbation is a popular method for achieving privacy-preserving data mining. However, distorted databases impose enormous overheads on mining algorithms compared to the original databases. In this paper, we present the GrC-FIM algorithm to address the efficiency problem of mining frequent itemsets from distorted databases. Two measures are introduced to overcome weaknesses in existing work. Firstly, the concept of the independent granule is introduced, and granule inference is used to distinguish non-independent itemsets from independent itemsets. We further prove that the support counts of non-independent itemsets can be derived directly from their sub-itemsets, so that the error-prone reconstruction process can be avoided; this improves the efficiency of the algorithm and yields more accurate results. Secondly, through the granular-bitmap representation, the support counts can be calculated efficiently. Empirical results on representative synthetic and real-world databases indicate that the proposed GrC-FIM algorithm outperforms the popular EMASK algorithm in both efficiency and support-count reconstruction accuracy.
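
The granular-bitmap idea can be sketched very simply (a toy illustration of bitmap-based support counting only; the granule-inference step that derives the supports of non-independent itemsets from their sub-itemsets is not shown): each item is stored as a bitmap over transactions, and an itemset's support count is the popcount of the AND of its items' bitmaps.

```python
# Toy transaction database.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]

# One bitmap per item: bit i is set if transaction i contains the item.
items = sorted(set().union(*transactions))
bitmaps = {
    item: sum(1 << tid for tid, t in enumerate(transactions) if item in t)
    for item in items
}

def support(itemset):
    """Support count = number of transactions containing every item in the set."""
    bits = ~0
    for item in itemset:
        bits &= bitmaps[item]
    bits &= (1 << len(transactions)) - 1   # keep only valid transaction bits
    return bin(bits).count("1")

print(support({"a", "c"}))   # -> 3
print(support({"b", "d"}))   # -> 2
```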

Relevance:

20.00%

Publisher:

Abstract:

This paper proposes applying multiagent-based data mining technologies to biological data analysis. The rationale is justified from multiple perspectives, with an emphasis on the biological context. Following that, an initial multiagent-based bio-data mining framework is presented. Based on this framework, we developed a prototype system to demonstrate how it helps biologists perform a comprehensive mining task to answer biological questions. The system offers a new way to reuse biological datasets and available data mining algorithms with ease.
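
Purely as an illustration of the framework's premise of wrapping datasets and reusable mining algorithms as cooperating agents (all class and method names below are hypothetical, not the authors' prototype), a minimal sketch might look like this:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DataAgent:
    """Serves one biological dataset on request."""
    name: str
    records: List[dict]

    def fetch(self) -> List[dict]:
        return self.records

@dataclass
class MiningAgent:
    """Wraps one reusable data mining algorithm."""
    name: str
    algorithm: Callable[[List[dict]], dict]

    def run(self, data: List[dict]) -> dict:
        return self.algorithm(data)

class Coordinator:
    """Routes a mining request to a data agent and a mining agent."""

    def __init__(self) -> None:
        self.data_agents: Dict[str, DataAgent] = {}
        self.mining_agents: Dict[str, MiningAgent] = {}

    def register(self, agent) -> None:
        pool = self.data_agents if isinstance(agent, DataAgent) else self.mining_agents
        pool[agent.name] = agent

    def answer(self, dataset: str, method: str) -> dict:
        return self.mining_agents[method].run(self.data_agents[dataset].fetch())

# Usage: a toy gene-expression table summarised by a trivial "algorithm".
coord = Coordinator()
coord.register(DataAgent("expression", [{"gene": "g1", "level": 2.3},
                                        {"gene": "g2", "level": 0.4}]))
coord.register(MiningAgent("summary", lambda rows: {"n_records": len(rows)}))
print(coord.answer("expression", "summary"))   # {'n_records': 2}
```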