980 resultados para Information Mining


Relevância:

30.00% 30.00%

Publicador:

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This special issue is a collection of the selected papers published on the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA) held in Wuhan, China in 2005. The articles focus on the innovative applications of data mining approaches to the problems that involve large data sets, incomplete and noise data, or demand optimal solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The road to electric rope shovel automation is marked with technological innovations that include an increase in operational information available to mining operations. The CRCMining Shovel Operator Information System not only collects machine operational data but also provides the operator with knowledge-of-performance and influences his/her performance to achieve higher productivity with reduced machine duty. The operator’s behaviour is one of the most important aspects of the man-machine interaction to be considered before semi- or fully-automated shovel systems can be realised. This paper presents the results of the rope shovel studies conducted by CRCMining between 2002 and 2004, provides information on current research to improve shovel performance and briefly discusses the implications of human-system interactions on future designs of autonomous machines.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed by use of the SPSS Clementine data mining system. Decision Tree Learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. The fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques used were tested by use of crossvalidation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision trees: FPG >=110.7 mg/dl. C- The C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and the individual risk factors such as age and BMI should be considered in defining the new cut-off value.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from customer information system (CIS) can be used for NTL analysis. However, in order to accurately and efficiently perform NTL analysis, the original data from CIS need to be pre-processed before any detailed NTL analysis can be carried out. In this paper, we propose a feature selection based method for CIS data pre-processing in order to extract the most relevant information for further analysis such as clustering and classifications. By removing irrelevant and redundant features, feature selection is an essential step in data mining process in finding optimal subset of features to improve the quality of result by giving faster time processing, higher accuracy and simpler results with fewer features. Detailed feature selection analysis is presented in the paper. Both time-domain and load shape data are compared based on the accuracy, consistency and statistical dependencies between features.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A study of information available on the settlement characteristics of backfill in restored opencast coal mining sites and other similar earthworks projects has been undertaken. In addition, the methods of opencast mining, compaction controls, monitoring and test methods have been reviewed. To consider and develop the methods of predicting the settlement of fill, three sites in the West Midlands have been examined; at each, the backfill had been placed in a controlled manner. In addition, use has been made of a finite element computer program to compare a simple two-dimensional linear elastic analysis with field observations of surface settlements in the vicinity of buried highwalls. On controlled backfill sites, settlement predictions have been accurately made, based on a linear relationship between settlement (expressed as a percentage of fill height) against logarithm of time. This `creep' settlement was found to be effectively complete within 18 months of restoration. A decrease of this percentage settlement was observed with increasing fill thickness; this is believed to be related to the speed with which the backfill is placed. A rising water table within the backfill is indicated to cause additional gradual settlement. A prediction method, based on settlement monitoring, has been developed and used to determine the pattern of settlement across highwalls and buried highwalls. The zone of appreciable differential settlement was found to be mainly limited to the highwall area, the magnitude was dictated by the highwall inclination. With a backfill cover of about 15 metres over a buried highwall the magnitude of differential settlement was negligible. Use has been made of the proposed settlement prediction method and monitoring to control the re-development of restored opencase sites. The specifications, tests and monitoring techniques developed in recent years have been used to aid this. Such techniques have been valuable in restoring land previously derelict due to past underground mining.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To date, more than 16 million citations of published articles in biomedical domain are available in the MEDLINE database. These articles describe the new discoveries which accompany a tremendous development in biomedicine during the last decade. It is crucial for biomedical researchers to retrieve and mine some specific knowledge from the huge quantity of published articles with high efficiency. Researchers have been engaged in the development of text mining tools to find knowledge such as protein-protein interactions, which are most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in biomedical domain, such as protein name recognition and discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured-text are summarized. Current work in biomedical information extracting is categorized. Challenges in the field are also presented and possible solutions are discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query based IR on documents and clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query based IR shows that our LRD method performed significantly better than traditional vector space model and other five standard statistical methods for vector expansion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this paper is to explain the notion of clustering and a concrete clustering method- agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks, traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool that is called XLMiner™ for Microsoft Excel Office.