89 results for Data mining and knowledge discovery


Relevance: 100.00%

Abstract:

Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools, and their adoption is growing in popularity. Statistical methods, machine learning and data mining algorithms have successfully been adopted to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically also requires pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a major problem for the integrated pre-processing and mining of MRI data is the lack of comprehensive platforms that avoid the manual invocation of pre-processing and mining tools, which leads to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in for the Konstanz Information Miner (KNIME) workbench that automates the pre-processing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automates the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflows for neuroimage data by leveraging the rich functionality available in the KNIME workbench.
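
As a flavour of the import step that K-Surfer automates, the sketch below (not K-Surfer's own code) reads subcortical volumes from a FreeSurfer aseg.stats file into a plain Python dictionary, assuming the standard layout in which comment lines start with '#' and data rows carry the volume in the fourth column and the structure name in the fifth; the file path is a placeholder.

# A flavour of the import step K-Surfer automates (this is not K-Surfer's
# code): read subcortical volumes from a FreeSurfer aseg.stats file,
# assuming the standard layout where '#' lines are comments and data rows
# hold Volume_mm3 in column 4 and StructName in column 5.
def read_aseg_volumes(path):
    volumes = {}
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue  # skip blanks and comment/header lines
            cols = line.split()
            volumes[cols[4]] = float(cols[3])  # StructName -> Volume_mm3
    return volumes

vols = read_aseg_volumes("subject01/stats/aseg.stats")  # placeholder path
print(vols.get("Left-Hippocampus"))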

Relevance: 100.00%

Abstract:

There has been a clear lack of common data exchange semantics for inter-organisational workflow management systems, where research has mainly focused on technical issues rather than language constructs. This paper presents the neutral data exchange semantics required for workflow integration within the AXAEDIS framework and presents the mechanism for object discovery from the object repository where little or no knowledge about the object is available. The paper also presents a workflow-independent integration architecture with the AXAEDIS Framework.
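
The abstract does not spell out AXAEDIS's concrete message format, so the following is only an illustrative sketch of what a neutral, engine-independent exchange record might look like; every field name here is hypothetical.

import json

# Illustrative only: a neutral, engine-independent record for handing a
# work item between two organisations' workflow systems. Every field name
# is hypothetical; AXAEDIS's actual semantics are defined in the paper.
message = {
    "message_id": "wf-000123",
    "source_engine": "org-a/engine-1",
    "target_engine": "org-b/engine-2",
    "activity": "approve_purchase_order",
    "payload": {"order_id": "PO-77", "amount": 1400.0},
    "discovery_hints": ["purchase", "approval"],  # keywords for repository lookup
}
print(json.dumps(message, indent=2))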

Relevance: 100.00%

Abstract:

This paper describes a proposed new approach to knowledge processing in the Computer Network Security Intrusion Detection Systems (NIDS) application domain, focused on a topic map technology-enabled representation of the features of the threat pattern space, as well as knowledge of the situated efficacy of alternative candidate algorithms for pattern recognition within the NIDS domain. An integrative knowledge representation framework for virtualisation, data intelligence and learning-loop architecting in the NIDS domain is thus described, together with specific aspects of its deployment.
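
Topic maps (ISO/IEC 13250) model subjects as topics linked by typed associations. As a rough illustration of the representation described above, and not the paper's actual model, the sketch below links threat-pattern topics to detection-algorithm topics through an "effective-against" association; all names are invented.

# Rough illustration of the topic-map idea (ISO/IEC 13250) with plain
# Python structures: topics for threat patterns and detection algorithms,
# and an association recording which algorithm works against which
# pattern. All names are invented, not taken from the paper.
topics = {
    "t1": {"type": "threat-pattern", "name": "SYN flood"},
    "t2": {"type": "threat-pattern", "name": "port scan"},
    "a1": {"type": "detection-algorithm", "name": "entropy-based detector"},
}
associations = [
    # (association type, role -> topic id)
    ("effective-against", {"algorithm": "a1", "pattern": "t1"}),
]
for assoc_type, roles in associations:
    alg = topics[roles["algorithm"]]["name"]
    pat = topics[roles["pattern"]]["name"]
    print(f"{alg} is {assoc_type} {pat}")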

Relevance: 100.00%

Abstract:

OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remain among the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.
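
The paper's warehouse schema is not given in the abstract, so the sketch below is only a minimal illustration of how trajectory meta-data of the kind described (protein, variant, simulation length) might be organised for querying; all table and column names, and the inserted values, are hypothetical.

import sqlite3

# Minimal, hypothetical sketch of a trajectory meta-data table; this is
# not the paper's actual schema and the values are invented examples.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trajectory (
        id INTEGER PRIMARY KEY,
        protein TEXT,        -- e.g. 'human TTR monomer'
        variant TEXT,        -- e.g. 'WT' or 'L55P'
        temperature_k REAL,  -- simulation temperature
        length_ns REAL       -- simulated time in nanoseconds
    )
""")
conn.executemany(
    "INSERT INTO trajectory (protein, variant, temperature_k, length_ns)"
    " VALUES (?, ?, ?, ?)",
    [("human TTR monomer", "WT", 500.0, 10.0),
     ("human TTR monomer", "L55P", 500.0, 10.0)],
)
# Example query: all unfolding runs of the amyloidogenic variant.
for row in conn.execute("SELECT id, length_ns FROM trajectory WHERE variant = 'L55P'"):
    print(row)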

Relevance: 100.00%

Abstract:

Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting the seamless communication among handheld devices performing data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide on the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM dealing with concept drift is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains, including security, business and telemedicine, are discussed.
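
The book details PDM's actual agent architecture; purely as an illustration of the collaborative principle, the sketch below combines the predictions of several devices' local stream classifiers by weighted majority voting, with all labels and weights invented.

from collections import defaultdict

# Illustration of collaborative prediction across devices (not the PDM
# architecture itself): each device owns a locally trained stream
# classifier plus a confidence weight, and a query is answered by
# weighted majority voting over the devices' predictions.
def weighted_majority(predictions):
    """predictions: list of (predicted_label, weight), one per device."""
    score = defaultdict(float)
    for label, weight in predictions:
        score[label] += weight
    return max(score, key=score.get)

# Three hypothetical devices classify the same record from their local
# views of the distributed stream.
votes = [("attack", 0.9), ("normal", 0.4), ("attack", 0.6)]
print(weighted_majority(votes))  # -> 'attack'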

Relevance: 100.00%

Abstract:

Guest Editorial

Relevance: 100.00%

Abstract:

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in and reliant on social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance on social network sites causes them to generate massive data characterised by three computational issues: size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses the different data mining techniques used in mining diverse aspects of social networks over the decades, going from the historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed and the names of their authors.
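
As an illustration of the rule-change intuition behind techniques such as TRCM (this is not TRCM itself), the sketch below compares frequent hashtags across two time slices of invented tweets and reports which are emerging, retained or dead.

import re
from collections import Counter

# Illustration of the rule-change idea (not TRCM itself): compare frequent
# hashtags between two time slices of tweets and report new, retained and
# vanished ones. All tweet data are invented.
def frequent_hashtags(tweets, min_count=2):
    tags = Counter(t.lower() for tweet in tweets
                   for t in re.findall(r"#(\w+)", tweet))
    return {t for t, c in tags.items() if c >= min_count}

week1 = ["Loving #python for #datamining", "#python tips", "great #datamining talk"]
week2 = ["#datamining with #spark", "#spark tutorial", "#datamining rocks #spark"]

old, new = frequent_hashtags(week1), frequent_hashtags(week2)
print("emerging:", new - old)  # frequent now, not before -> {'spark'}
print("retained:", new & old)  # frequent in both slices  -> {'datamining'}
print("dead:", old - new)      # no longer frequent       -> {'python'}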

Relevance: 100.00%

Abstract:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation was carried out on a parallel frequent subgraph mining algorithm, which showed good scalability.
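
As a much-simplified illustration of dynamic load balancing only (not the paper's framework), the sketch below lets idle workers pull irregular tasks from a shared pool, the pool standing in for the peer-to-peer work exchange; task sizes and peer names are invented.

import queue
import threading
import time

# Simplified stand-in for peer-to-peer load balancing: idle workers pull
# the next task whenever they finish, so irregular tasks spread across
# peers automatically. Processed totals vary from run to run.
tasks = queue.Queue()
for size in [1, 5, 2, 8, 3, 1, 4]:  # irregular task sizes
    tasks.put(size)

def peer(name):
    done = 0
    while True:
        try:
            size = tasks.get(timeout=0.1)  # ask for more work when idle
        except queue.Empty:
            break
        time.sleep(0.01 * size)  # simulate an irregular mining subtask
        done += size
        tasks.task_done()
    print(f"{name} processed {done} work units")

threads = [threading.Thread(target=peer, args=(f"peer-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()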

Relevance: 100.00%

Abstract:

The “butterfly effect” is a popularly known paradigm: commonly it is said that when a butterfly flaps its wings in Brazil, it may cause a tornado in Texas. This essentially describes how weather forecasts can be extremely sensitive to small changes in the given atmospheric data, or initial conditions, used in computer model simulations. In 1961 Edward Lorenz found, when running a weather model, that small changes in the initial conditions given to the model can, over time, lead to entirely different forecasts (Lorenz, 1963). This discovery highlights one of the major challenges in modern weather forecasting: providing the computer model with the most accurately specified initial conditions possible. A process known as data assimilation seeks to minimize the errors in the given initial conditions and was, in 1911, described by Bjerknes as “the ultimate problem in meteorology” (Bjerknes, 1911).
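
Lorenz (1963) demonstrated this sensitivity with his three-variable convection model, dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz, with the classic parameters σ = 10, ρ = 28, β = 8/3. A minimal sketch: integrating the system from two initial states that differ by 10⁻⁸ shows the trajectories drifting apart until they bear no resemblance to one another.

# Integrate the Lorenz-63 system (forward Euler, dt = 0.001) from two
# initial states differing by 1e-8 in x; the gap grows until the two
# trajectories are entirely different, as Lorenz reported in 1963.
def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-8, 1.0, 1.0)    # tiny perturbation: the butterfly
for step in range(1, 40001):  # 40 units of model time
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 10000 == 0:
        print(f"t = {step / 1000:4.0f}  |x_a - x_b| = {abs(a[0] - b[0]):.3e}")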

Relevance: 100.00%

Abstract:

This paper reports the findings of a small-scale research project, which investigated the levels of awareness and knowledge of written standard English of 10- and 11-year-old children in two English primary schools over a six-year period, coinciding with the implementation in the schools of the National Literacy Strategy (NLS). A questionnaire was used to provide quantitative and qualitative data relating to: features of writing which were recognised as standard or non-standard; children's understanding of technical terminology; variations between boys' and girls' performance; and the impact of the NLS over time. The findings reveal variations in levels of recognition of different non-standard features, differences between girls' and boys' recognition, possible examples of language change, but no evidence of a positive impact of the NLS. The implications of these findings are discussed both in terms of changes in educational standards and changes to standard English.

Relevance: 100.00%

Abstract:

This is a report on the data-mining of two chess databases, the objective being to compare their sub-7-man content with perfect play as documented in Nalimov endgame tables. Van der Heijden’s ENDGAME STUDY DATABASE IV is a definitive collection of 76,132 studies in which White should have an essentially unique route to the stipulated goal. ChessBase’s BIG DATABASE 2010 holds some 4.5 million games. Insight gained into both database content and data-mining has led to some delightful surprises and created a further agenda.
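
This kind of mining can now be reproduced with open tooling; the sketch below uses the python-chess library to scan a PGN database for games reaching sub-7-man positions and probes them against Syzygy endgame tables, a modern stand-in for the Nalimov tables used in the report. Both file paths are placeholders.

import chess.pgn
import chess.syzygy

# Scan a PGN database for games that reach a sub-7-man position and probe
# the first such position against Syzygy tables (standing in for Nalimov).
with open("big_database.pgn") as pgn, \
        chess.syzygy.open_tablebase("./syzygy") as tb:
    while (game := chess.pgn.read_game(pgn)) is not None:
        board = game.board()
        for move in game.mainline_moves():
            board.push(move)
            if len(board.piece_map()) <= 6:  # first sub-7-man position
                try:
                    wdl = tb.probe_wdl(board)  # +2 win, 0 draw, -2 loss (side to move)
                except KeyError:
                    break  # no table for this material
                print(game.headers.get("White"), "vs",
                      game.headers.get("Black"), "WDL:", wdl)
                break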

Relevance: 100.00%

Abstract:

There are a number of challenges associated with managing knowledge and information in construction organizations delivering major capital assets. These include the ever-increasing volumes of information, losing people through retirement or to competitors, the continuously changing nature of information, the lack of methods for eliciting useful knowledge, the development of new information technologies, and changes in management and innovation practices. Existing tools and methodologies for valuing intangible assets in fields such as engineering, project management, finance and accounting do not fully address the issues associated with the valuation of information and knowledge. Information is rarely recorded in a way that allows a document to be valued, either when produced or when subsequently retrieved and re-used. In addition, there is a wealth of tacit personal knowledge which, if codified into documentary information, may prove to be very valuable to operators of the finished asset or to future designers. This paper addresses the problem of information overload and identifies the differences between data, information and knowledge. An exploratory study was conducted with a leading construction consultant, examining three perspectives (business, project management and document management) through structured interviews, and specifically how to value information in practical terms. Major challenges in information management are identified. A through-life Information Evaluation Methodology (IEM) is presented to reduce information overload and to make information more valuable in the future.

Relevance: 100.00%

Abstract:

This paper provides an account of the changing livelihood dynamics unfolding in diamond-rich territories of rural Liberia. In these areas, many farm families are using the rice harvested on their plots to attract and feed labourers recruited specifically to mine for diamonds. The monies accrued from the sales of all recovered stones are divided evenly between the family and the hired hands, an arrangement which, for thousands of people, has proved to be an effective short-term buffer against poverty. A deepened knowledge of these dynamics could be an important step towards facilitating lasting development in Liberia’s highly impoverished rural areas.

Relevance: 100.00%

Abstract:

Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on raw-data-based quotation systems to select the best suppliers for their customers (airlines). Data quantity and quality become a key issue in determining the success of an MRO job, since cost and quality benchmarks need to be achieved. This paper introduces a data mining approach to create an MRO quotation system that enhances data quantity and data quality, and enables significantly more precise MRO job quotations. Regular expressions were utilized to analyse descriptive textual feedback (i.e. engineers’ reports) in order to extract more referable, highly normalised data for job quotation. A text mining based key-influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that the system data would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.
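
The abstract does not reveal the paper's actual extraction patterns, so the following is a toy illustration of the regular-expression idea only: turning one line of invented engineer's-report text into structured fields that a quotation system could query.

import re

# Toy sketch of regex-based normalisation of an engineer's report
# (invented text and pattern, not the paper's actual rules): turn free
# text into structured fields usable by a quotation system.
REPORT = "Inspected fuel pump; found corrosion on housing; replaced seal."

PATTERN = re.compile(
    r"Inspected (?P<part>[\w ]+); "
    r"found (?P<defect>[\w ]+); "
    r"(?P<action>replaced [\w ]+)\.",
    re.IGNORECASE,
)

m = PATTERN.search(REPORT)
if m:
    print(m.groupdict())
    # {'part': 'fuel pump', 'defect': 'corrosion on housing', 'action': 'replaced seal'}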