Biblioteca Digital

58 resultados para mining algorithm

em University of Queensland eSpace - Australia

Improvements of IncSpan: Incremental mining of sequential patterns in large database

Relevância:

70.00% 70.00%

Publicador:

Veja mais

Knowledge discovery and data mining in biological databases

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.

Veja mais

Data mining and simulation: a grey relationship demonstration

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.

Veja mais

Improvements in the data partitioning approach for frequent itemsets mining

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Frequent Itemsets mining is well explored for various data types, and its computational complexity is well understood. There are methods to deal effectively with computational problems. This paper shows another approach to further performance enhancements of frequent items sets computation. We have made a series of observations that led us to inventing data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper shows results from several experiments that confirmed our general and formally presented observations.

Veja mais

Diagnosis of diabetes mellitus: case of different cutoff values: A data mining approach

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed by use of the SPSS Clementine data mining system. Decision Tree Learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. The fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques used were tested by use of crossvalidation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision trees: FPG >=110.7 mg/dl. C- The C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and the individual risk factors such as age and BMI should be considered in defining the new cut-off value.

Veja mais

Special issue on advances in data mining and its applications

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).

Veja mais

A framework for electricity price spike analysis with advanced data mining methods

Relevância:

20.00% 20.00%

Publicador:

Resumo:

There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been observed in the literature so far. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. In this paper, feature selection techniques are firstly described to identify the attributes relevant to the occurrence of spikes. A simple introduction to the classification techniques is given for completeness. Two algorithms: support vector machine and probability classifier are chosen to be the spike occurrence predictors and are discussed in details. Realistic market data are used to test the proposed model with promising results.

Veja mais

An improved seeded region growing algorithm

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recently Adams and Bischof (1994) proposed a novel region growing algorithm for segmenting intensity images. The inputs to the algorithm are the intensity image and a set of seeds - individual points or connected components - that identify the individual regions to be segmented. The algorithm grows these seed regions until all of the image pixels have been assimilated. Unfortunately the algorithm is inherently dependent on the order of pixel processing. This means, for example, that raster order processing and anti-raster order processing do not, in general, lead to the same tessellation. In this paper we propose an improved seeded region growing algorithm that retains the advantages of the Adams and Bischof algorithm fast execution, robust segmentation, and no tuning parameters - but is pixel order independent. (C) 1997 Elsevier Science B.V.

Veja mais

Prediction of MHC class II-binding peptides using an evolutionary algorithm and artificial neural network

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binder-a were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotheraaeutic peptides.

Veja mais

A consistent point-searching algorithm for solution interpolation in unstructured meshes consisting of 4-node bilinear quadrilateral elements

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To translate and transfer solution data between two totally different meshes (i.e. mesh 1 and mesh 2), a consistent point-searching algorithm for solution interpolation in unstructured meshes consisting of 4-node bilinear quadrilateral elements is presented in this paper. The proposed algorithm has the following significant advantages: (1) The use of a point-searching strategy allows a point in one mesh to be accurately related to an element (containing this point) in another mesh. Thus, to translate/transfer the solution of any particular point from mesh 2 td mesh 1, only one element in mesh 2 needs to be inversely mapped. This certainly minimizes the number of elements, to which the inverse mapping is applied. In this regard, the present algorithm is very effective and efficient. (2) Analytical solutions to the local co ordinates of any point in a four-node quadrilateral element, which are derived in a rigorous mathematical manner in the context of this paper, make it possible to carry out an inverse mapping process very effectively and efficiently. (3) The use of consistent interpolation enables the interpolated solution to be compatible with an original solution and, therefore guarantees the interpolated solution of extremely high accuracy. After the mathematical formulations of the algorithm are presented, the algorithm is tested and validated through a challenging problem. The related results from the test problem have demonstrated the generality, accuracy, effectiveness, efficiency and robustness of the proposed consistent point-searching algorithm. Copyright (C) 1999 John Wiley & Sons, Ltd.

Veja mais

Trial-of-antibiotic algorithm for the diagnosis of tuberculosis in a district hospital in a developing country with high HIV prevalence

Relevância:

20.00% 20.00%

Publicador:

Resumo:

OBJECTIVE: To evaluate a diagnostic algorithm for pulmonary tuberculosis based on smear microscopy and objective response to trial of antibiotics. SETTING: Adult medical wards, Hlabisa Hospital, South Africa, 1996-1997. METHODS: Adults with chronic chest symptoms and abnormal chest X-ray had sputum examined for Ziehl-Neelsen stained acid-fast bacilli by light microscopy. Those with negative smears were treated with amoxycillin for 5 days and assessed. Those who had not improved were treated with erythromycin for 5 days and reassessed. Response was compared with mycobacterial culture. RESULTS: Of 280 suspects who completed the diagnostic pathway, 160 (57%) had a positive smear, 46 (17%) responded to amoxycillin, 34 (12%) responded to erythromycin and 40 (14%) were treated as smear-negative tuberculosis. The sensitivity (89%) and specificity (84%) of the full algorithm for culture-positive tuberculosis were high. However, 11 patients (positive predictive value [PPV] 95%) were incorrectly diagnosed with tuberculosis, and 24 cases of tuberculosis (negative predictive value [NPV] 70%) were not identified. NPV improved to 75% when anaemia was included as a predictor. Algorithm performance was independent of human immunodeficiency virus status. CONCLUSION: Sputum smear microscopy plus trial of antibiotic algorithm among a selected group of tuberculosis suspects may increase diagnostic accuracy in district hospitals in developing countries.

Veja mais

Integrating attribute and space characteristics in choropleth display and spatial data mining

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper develops an interactive approach for exploratory spatial data analysis. Measures of attribute similarity and spatial proximity are combined in a clustering model to support the identification of patterns in spatial information. Relationships between the developed clustering approach, spatial data mining and choropleth display are discussed. Analysis of property crime rates in Brisbane, Australia is presented. A surprising finding in this research is that there are substantial inconsistencies in standard choropleth display options found in two widely used commercial geographical information systems, both in terms of definition and performance. The comparative results demonstrate the usefulness and appeal of the developed approach in a geographical information system environment for exploratory spatial data analysis.

Veja mais

Minimum-order stable recursive filter design via the genetic algorithm approach

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, the minimum-order stable recursive filter design problem is proposed and investigated. This problem is playing an important role in pipeline implementation sin signal processing. Here, the existence of a high-order stable recursive filter is proved theoretically, in which the upper bound for the highest order of stable filters is given. Then the minimum-order stable linear predictor is obtained via solving an optimization problem. In this paper, the popular genetic algorithm approach is adopted since it is a heuristic probabilistic optimization technique and has been widely used in engineering designs. Finally, an illustrative example is sued to show the effectiveness of the proposed algorithm.

Veja mais

Spatial data mining for enhanced soil map modelling

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The principle of using induction rules based on spatial environmental data to model a soil map has previously been demonstrated Whilst the general pattern of classes of large spatial extent and those with close association with geology were delineated small classes and the detailed spatial pattern of the map were less well rendered Here we examine several strategies to improve the quality of the soil map models generated by rule induction Terrain attributes that are better suited to landscape description at a resolution of 250 m are introduced as predictors of soil type A map sampling strategy is developed Classification error is reduced by using boosting rather than cross validation to improve the model Further the benefit of incorporating the local spatial context for each environmental variable into the rule induction is examined The best model was achieved by sampling in proportion to the spatial extent of the mapped classes boosting the decision trees and using spatial contextual information extracted from the environmental variables.

Veja mais

An equivalent algorithm for simulating thermal effects of magma intrusion problems in porous rocks

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An equivalent algorithm is proposed to simulate thermal effects of the magma intrusion in geological systems, which are composed of porous rocks. Based on the physical and mathematical equivalence, the original magma solidification problem with a moving boundary between the rock and intruded magma is transformed into a new problem without the moving boundary but with a physically equivalent heat source. From the analysis of an ideal solidification model, the physically equivalent heat source has been determined in this paper. The major advantage in using the proposed equivalent algorithm is that the fixed finite element mesh with a variable integration time step can be employed to simulate the thermal effect of the intruded magma solidification using the conventional finite element method. The related numerical results have demonstrated the correctness and usefulness of the proposed equivalent algorithm for simulating the thermal effect of the intruded magma solidification in geological systems. (C) 2003 Elsevier B.V. All rights reserved.

Veja mais

58 resultados para mining algorithm

em University of Queensland eSpace - Australia

Filtro por publicador