7 resultados para Information Mining

em Aston University Research Archive


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. Most existing systems concentrate either on mining algorithms or on visualization techniques. Though visual methods developed in information visualization have been helpful, for improved understanding of a complex large high-dimensional dataset, there is a need for an effective projection of such a dataset onto a lower-dimension (2D or 3D) manifold. This paper introduces a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualization domain. The framework follows Shneiderman’s mantra to provide an effective user interface. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection methods, such as Generative Topographic Mapping (GTM) and Hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, billboarding, and user interaction facilities, to provide an integrated visual data mining framework. Results on a real life high-dimensional dataset from the chemoinformatics domain are also reported and discussed. Projection results of GTM are analytically compared with the projection results from other traditional projection methods, and it is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We introduce a flexible visual data mining framework which combines advanced projection algorithms from the machine learning domain and visual techniques developed in the information visualization domain. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection algorithms, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates and billboarding, to provide a visual data mining framework. Results on a real-life chemoinformatics dataset using GTM are promising and have been analytically compared with the results from the traditional projection methods. It is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework. Copyright 2006 ACM.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We present in this article an automated framework that extracts product adopter information from online reviews and incorporates the extracted information into feature-based matrix factorization formore effective product recommendation. In specific, we propose a bootstrapping approach for the extraction of product adopters from review text and categorize them into a number of different demographic categories. The aggregated demographic information of many product adopters can be used to characterize both products and users in the form of distributions over different demographic categories. We further propose a graphbased method to iteratively update user- and product-related distributions more reliably in a heterogeneous user-product graph and incorporate them as features into the matrix factorization approach for product recommendation. Our experimental results on a large dataset crawled from JINGDONG, the largest B2C e-commerce website in China, show that our proposed framework outperforms a number of competitive baselines for product recommendation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A study of information available on the settlement characteristics of backfill in restored opencast coal mining sites and other similar earthworks projects has been undertaken. In addition, the methods of opencast mining, compaction controls, monitoring and test methods have been reviewed. To consider and develop the methods of predicting the settlement of fill, three sites in the West Midlands have been examined; at each, the backfill had been placed in a controlled manner. In addition, use has been made of a finite element computer program to compare a simple two-dimensional linear elastic analysis with field observations of surface settlements in the vicinity of buried highwalls. On controlled backfill sites, settlement predictions have been accurately made, based on a linear relationship between settlement (expressed as a percentage of fill height) against logarithm of time. This `creep' settlement was found to be effectively complete within 18 months of restoration. A decrease of this percentage settlement was observed with increasing fill thickness; this is believed to be related to the speed with which the backfill is placed. A rising water table within the backfill is indicated to cause additional gradual settlement. A prediction method, based on settlement monitoring, has been developed and used to determine the pattern of settlement across highwalls and buried highwalls. The zone of appreciable differential settlement was found to be mainly limited to the highwall area, the magnitude was dictated by the highwall inclination. With a backfill cover of about 15 metres over a buried highwall the magnitude of differential settlement was negligible. Use has been made of the proposed settlement prediction method and monitoring to control the re-development of restored opencase sites. The specifications, tests and monitoring techniques developed in recent years have been used to aid this. Such techniques have been valuable in restoring land previously derelict due to past underground mining.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To date, more than 16 million citations of published articles in biomedical domain are available in the MEDLINE database. These articles describe the new discoveries which accompany a tremendous development in biomedicine during the last decade. It is crucial for biomedical researchers to retrieve and mine some specific knowledge from the huge quantity of published articles with high efficiency. Researchers have been engaged in the development of text mining tools to find knowledge such as protein-protein interactions, which are most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in biomedical domain, such as protein name recognition and discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured-text are summarized. Current work in biomedical information extracting is categorized. Challenges in the field are also presented and possible solutions are discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query based IR on documents and clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query based IR shows that our LRD method performed significantly better than traditional vector space model and other five standard statistical methods for vector expansion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Case-based Reasoning's (CBR) origins were stimulated by a desire to understand how people remember information and are in turn reminded of information, and that subsequently it was recognized that people commonly solve problems by remembering how they solved similar problems in the past. Thus CBR became an appropriate way to find out the most suitable solution method for a new problem based on the old methods for the same or even similar problems. The research highlights how to use CBR to aid biologists in finding the best method to cryo preserve algae. The study found CBR could be used successfully to find the similarity percentage between the new algae and old cases in the case base. The prediction result showed approximately 93.75% accuracy, which proves the CBR system can offer appropriate recommendations for most situations. © 2011 IEEE.