3 results for Information Mining

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance:

30.00%

Publisher:

Abstract:

In the last decade, data mining has emerged as one of the most dynamic and lively areas in information technology. Although many algorithms and techniques for data mining have been proposed, they focus either on domain-independent techniques or on very specific domain problems. A general requirement in bridging the gap between academia and business is to cater to general domain-related issues surrounding real-life applications, such as constraints, organizational factors, domain expert knowledge, domain adaptation, and operational knowledge. Unfortunately, these have either not been addressed, or not been sufficiently addressed, in current data mining research and development. Domain-Driven Data Mining (D3M) aims to develop general principles, methodologies, and techniques for modeling and merging comprehensive domain-related factors and synthesized ubiquitous intelligence surrounding problem domains with the data mining process, and for discovering knowledge to support business decision-making. This paper aims to report original, cutting-edge, and state-of-the-art progress in D3M. It covers theoretical and applied contributions aiming to: 1) propose next-generation data mining frameworks and processes for actionable knowledge discovery, 2) investigate effective (automated, human-centered, machine-centered and/or human-machine-cooperated) principles and approaches for acquiring, representing, modeling, and engaging ubiquitous intelligence in real-world data mining, and 3) develop workable and operational systems that balance technical significance and application concerns, and that convert and deliver actionable knowledge as operational application rules to seamlessly engage application processes and systems.

Relevance:

30.00%

Publisher:

Abstract:

Seafloor massive sulfides (SMS) contain commercially viable quantities of high-grade ores, making them attractive prospect sites for marine mining. SMS deposits may also contain hydrothermal vent ecosystems populated by high conservation value vent-endemic species. Responsible environmental management of these resources is best achieved by the adoption of a precautionary approach. Part of this precautionary approach involves the Environmental Impact Assessment (EIA) of exploration and exploitation activities at SMS deposits. The VentBase 2012 workshop provided a forum for stakeholders and scientists to discuss issues surrounding SMS exploration and exploitation. This forum recognised the requirement for a primer which would relate concepts underpinning EIA at SMS deposits. The purpose of this primer is to inform policy makers about EIA at SMS deposits in order to aid management decisions. The primer offers a basic introduction to SMS deposits and their associated ecology, and to the basic requirements for EIA at SMS deposits, including initial data and information scoping, environmental survey, and ecological risk assessment. © 2013 Elsevier Ltd.

Relevance:

30.00%

Publisher:

Abstract:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout, either ‘semi-structured’ or ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at predicting the chunks of report text that contain information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested on sets of 635 and 163 manually classified or annotated reports, respectively, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k-nearest neighbour algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports, performance ranged from 0.97 to 0.94 in precision and from 0.92 to 0.83 in recall, while for unstructured reports it ranged from 0.91 to 0.64 in precision and from 0.68 to 0.41 in recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured ones.
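
The layout classifier described in the Results can be illustrated with a minimal pure-Python sketch. It is not the authors' RapidMiner process: the stopword list, the pruning threshold (`min_df=2`), the choice of `k=3`, the use of Jaccard distance, and the six toy reports are all assumptions made for illustration. It only shows the general idea of binary term occurrence vectors with a stopword filter and pruning, fed to a k-nearest neighbour vote.

```python
from collections import Counter

# Hypothetical stopword list; the study's actual filter is not given.
STOPWORDS = {"the", "of", "and", "a", "is", "in", "to", "was", "with"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    tokens = (w.strip(".,:;()").lower() for w in text.split())
    return {t for t in tokens if t and t not in STOPWORDS}


def build_vocabulary(docs, min_df=2):
    """Pruning: keep only terms found in at least min_df documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(tokenize(doc))
    return {term for term, n in doc_freq.items() if n >= min_df}


def vectorize(text, vocab):
    """Binary term occurrence: the set of vocabulary terms present."""
    return tokenize(text) & vocab


def jaccard_distance(a, b):
    """Distance between two binary vectors stored as sets (assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(text, train, vocab, k=3):
    """Majority vote among the k training vectors closest to the query."""
    query = vectorize(text, vocab)
    ranked = sorted(train, key=lambda item: jaccard_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy reports, not data from the Northern Ireland Cancer Registry.
REPORTS = [
    ("Specimen: left breast. Tumour size: 22 mm. Nodes positive: 2.",
     "semi-structured"),
    ("Specimen: right breast. Tumour size: 15 mm. Nodes positive: 0.",
     "semi-structured"),
    ("Specimen: left breast. Tumour size: 30 mm. Nodes positive: 5.",
     "semi-structured"),
    ("Sections show an invasive ductal carcinoma measuring about 22 mm "
     "with two involved nodes.", "unstructured"),
    ("Sections show invasive lobular carcinoma measuring roughly 15 mm "
     "and no involved nodes.", "unstructured"),
    ("Sections show a carcinoma measuring 30 mm, five involved nodes "
     "were seen.", "unstructured"),
]

VOCAB = build_vocabulary([text for text, _ in REPORTS])
TRAIN = [(vectorize(text, VOCAB), label) for text, label in REPORTS]

QUERY_FORM = "Specimen: left breast. Tumour size: 18 mm. Nodes positive: 1."
QUERY_FREE = "Sections show a ductal carcinoma measuring 18 mm; one involved node."

print(knn_predict(QUERY_FORM, TRAIN, VOCAB))
print(knn_predict(QUERY_FREE, TRAIN, VOCAB))
```

On this toy data the field-label vocabulary ("specimen", "tumour", "size") separates the two layouts cleanly, which is consistent with the intuition that layout is easier to predict than content. Nothing here reproduces the reported 99.4% accuracy.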

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
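
For readers unfamiliar with the precision and recall figures quoted in the Results, the following short sketch shows how the two measures are computed for chunk recognition. The chunk identifiers are made up, and the abstract does not say how chunks were matched against the manual annotation, so exact-match scoring is an assumption here.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted chunk ids against gold ids."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: sentence indices annotated as containing tumour size.
gold_chunks = {2, 5, 7, 9}
predicted_chunks = {2, 5, 8}

p, r = precision_recall(predicted_chunks, gold_chunks)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

A classifier with high precision but low recall, as reported for unstructured reports, flags few wrong chunks but misses many of the annotated ones.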