151 results for Incremental mining


Relevance:

20.00%

Publisher:

Abstract:

Patterns can appear in different modalities of a domain, where the modalities are the data sources that constitute different aspects of that domain. In our discussion the domain is crime, and the modalities are data sources such as offender data, weapon data, etc. Patterns also exist at different levels of granularity within each modality. To understand a domain thoroughly, it is important to reveal the hidden patterns by exploring the data at different levels of granularity and for each modality. This paper therefore presents a new model for identifying patterns that exist at different levels of granularity in different modes of crime data. A hierarchical clustering approach, growing self-organising maps (GSOM), has been deployed. Furthermore, the model is accompanied by experiments that show the significance of exploring data at different granularities.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we discuss a special case of knowledge creation via pattern mining that was studied using a hermeneutic approach. The reported study explores the nature of knowledge creation by domain practitioners who do not communicate directly. The focus of this paper extends the traditional view of a knowledge creation process beyond organisational boundaries. The proposed knowledge creation framework explains the facilitated process of knowledge creation by its qualification, combination, socialisation, externalisation, internalisation and introspection, thus allowing the transformation of individual experience and knowledge into formalised shareable domain knowledge.

Relevance:

20.00%

Publisher:

Abstract:

As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell’s growth and function [1]. Previous studies on proteomics indicate that the functions of different proteins can be assigned based on protein structures [2,3]. Knowledge of protein structures gives us an overview of protein fold space and helps us understand the evolutionary principles behind structure. By observing the architectures and topologies of the protein families, biological processes can be investigated more directly, with much higher resolution and finer detail. For this reason, the analysis of proteins, their structures and their interactions with other materials is emerging as an important problem in bioinformatics. However, determining protein structures experimentally is expensive and time consuming, so scientists currently depend largely on sequence, rather than on more general structure, to infer the function of a protein. Data mining technology has therefore been introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications, which lack available data, the study of protein structure determination and protein interaction can draw on a vast amount of biologically relevant information, such as the protein data bank (PDB) [4], the structural classification of proteins (SCOP) databases [5], CATH databases [6], UniProt [7], and others. The difficulty of predicting protein structures, especially 3D structures, and the interactions between proteins, as shown in Figure 6.1, lies in the computational complexity of the data. Although a large number of approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we will introduce a state-of-the-art data mining technique, graph mining, which is good at defining and discovering interesting structural patterns in graphical data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). The current graph pattern mining methods will be described, and typical algorithms will be presented, together with their applications in the protein structure analysis.

The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamentals of proteins, the publicly accessible protein data resources and the current state of research in protein analysis; Section 6.3 turns to one of the state-of-the-art data mining methods, graph mining; Section 6.4 surveys existing work from the past decade on protein structure analysis using advanced graph mining methods; finally, Section 6.5 concludes and outlines potential further work.

Relevance:

20.00%

Publisher:

Abstract:

The identification of RNA secondary structures has been among the most exciting recent developments in biology and medical science. It has been recognized that there is an abundance of functional structures with frameshifting, regulation of translation, and splicing functions. However, the inherent signal for secondary structures is weak and generally not straightforward due to complex interleaving substrings. This makes it difficult to explore their potential functions from various structure data. Our approach, based on a collection of predicted RNA secondary structures, allows us to efficiently capture interesting characteristic relations in RNA and bring out the top-ranked rules for specified association groups. Our results not only point to a number of interesting associations but also include a brief biological interpretation of them. The approach assists biologists in sorting out the most significant characteristic structure patterns and predicting structure-function relationships in RNA.

Relevance:

20.00%

Publisher:

Abstract:

Analysing how themes develop and evolve in the literature is a significant tool for helping scholars find and study frontier problems more efficiently. This paper designs and develops a visual mining system for theme development and evolution analysis that can handle a large volume of literature information. To improve the accuracy of the system, it analyses related themes on the basis of their sub-themes and adopts a dynamic threshold strategy. Experimental results show that the theme correlations obtained from the system are accurate and have better practical effect than those of our earlier work.

Relevance:

20.00%

Publisher:

Abstract:

Education is a complex, systematic undertaking: it underpins the training of high-quality talent, helps society make full use of educational outcomes, and promotes the healthy development of education itself. Within education, students' scores are a very important quantitative evaluation indicator; they objectively reflect the effects of the educational system and are an important basis for many decisions. This paper uses a clustering algorithm and a decision tree to analyse students' scores comprehensively and obtains useful results. The results are shown to be valuable for teaching and management.

Relevance:

20.00%

Publisher:

Abstract:

This thesis examined the application of data mining techniques to predicting the pilling propensity of wool knitwear. Using real industrial data, a pilling propensity prediction tool with embedded trained support vector machines was developed to provide high-accuracy predictions for wool knitwear even before the yarn is spun.

Relevance:

20.00%

Publisher:

Abstract:

Much has been written and researched about transformational change and the exogenous events that result in radical institutional transformation. This paper examines institutions as building blocks of social order, comprising power, politics and shared understanding, that bring about change. Thelen and Mahoney (2010) go beyond a general model of change that describes the collapse of one set of institutional norms and its replacement by another. Their proposed model of change treats both exogenous and endogenous factors as sources of institutional change. They go on to state that the view of transformational change as the result of abrupt, wholesale breakdown needs to be rethought to include incremental, endogenous shifts in thinking that can often result in fundamental transformations. This paper considers these issues and proposes the Australian Higher Education sector as a unique sample in which to investigate this type of change.

Relevance:

20.00%

Publisher:

Abstract:

This paper reports on the preparation and management of inconsistent data on damage to residential houses in Victoria, Australia. No specific and fully relevant databases are readily available; there are only incomplete paper-based and electronic reports. Extracting from these reports all the information needed to analyse damage to residential houses founded on expansive soils is therefore complicated and time consuming. Data mining is adopted to develop a database. Statistical methods and Artificial Intelligence methods are used to quantify the quality of the data. The paper concludes that the development of such a database could enable BHC to evaluate the usefulness of the reports prepared on the reported damaged properties for further analysis.

Relevance:

20.00%

Publisher:

Abstract:

Purpose – The purpose of this study is to examine the exposures of Australian gold mining firms in the highly volatile period from 1995 to 2000. This period has been characterized by significant changes in gold price due to bulk sale of gold by collective central banks. Specifically, the paper aims to investigate several firm-specific factors that are hypothesized to carry substantial influence on gold beta.

Design/methodology/approach – To estimate gold beta, we use the following multifactor model: $R_{g,t} = \alpha + \beta_g GPR_t + \beta_x FXR_t + \beta_m R_{m,t} + \varepsilon_t$, where $R_{g,t}$ is the return on the gold stock index at time $t$, $GPR_t$ is the gold price return denominated in US dollars at time $t$, $FXR_t$ is the foreign exchange return of the Australian dollar in terms of the US dollar at time $t$, $R_{m,t}$ is the market return at time $t$, and $\varepsilon_t$ is the random error term at time $t$.

Findings – The paper finds that the values of gold beta are consistently greater than one, implying that firms’ stock returns are sensitive to gold price changes. This also suggests that investors holding gold mining stock would receive a higher percentage increase in stock returns from a given percentage increase in gold price returns than investors holding gold bullion. Furthermore, these values have changed substantially over time with significant changes in gold price volatility. The most important and consistent relationship that we find is the impact of firms’ hedging behavior on their respective gold betas. This is consistent with Tufano’s study. It implies that firms that hedge a greater proportion of their gold reserves are less sensitive to movements in gold prices. The finding therefore supports the risk management theory that hedging increases shareholders’ wealth. However, cash operating costs, cash reserves and the level of gold production seem to have very little influence on the firms’ exposure to gold price changes.

Originality/value – This study is of interest and importance to mining companies and investors because the extent to which gold price movements affect the stock returns of gold mining companies has a significant impact on returns for both firms and investors, especially in their risk management and investment decisions, respectively.
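The multifactor model in the Design/methodology/approach paragraph above can be illustrated with a small ordinary least squares fit. This is only a sketch under stated assumptions: the abstract does not say how the model was estimated, so plain OLS is assumed here, the data are synthetic, and the names `fit_multifactor` and `solve_linear_system` are hypothetical helpers, not anything from the study.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multifactor(r_gold, gpr, fxr, r_mkt):
    """Return OLS estimates (intercept, beta_g, beta_x, beta_m).

    Assumption: the model is fitted by ordinary least squares via the
    normal equations (X'X) b = X'y.
    """
    rows = [[1.0, g, f, m] for g, f, m in zip(gpr, fxr, r_mkt)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, r_gold)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic returns (illustrative only, not the study's data).
rng = random.Random(0)
n_obs = 200
gpr = [rng.gauss(0, 0.02) for _ in range(n_obs)]     # gold price return
fxr = [rng.gauss(0, 0.01) for _ in range(n_obs)]     # AUD/USD return
r_mkt = [rng.gauss(0, 0.015) for _ in range(n_obs)]  # market return
true_coeffs = (0.001, 1.5, -0.4, 0.8)                # made-up values
r_gold = [
    true_coeffs[0] + true_coeffs[1] * g + true_coeffs[2] * f
    + true_coeffs[3] * m + rng.gauss(0, 0.001)
    for g, f, m in zip(gpr, fxr, r_mkt)
]

intercept, beta_g, beta_x, beta_m = fit_multifactor(r_gold, gpr, fxr, r_mkt)
print(f"estimated gold beta: {beta_g:.3f}")
```

A fitted `beta_g` above one corresponds to the reading in the Findings paragraph: the stock return moves by more than one percent for a one percent gold price return, holding the exchange rate and market return fixed.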

Relevance:

20.00%

Publisher:

Abstract:

Researchers have been endeavoring to discover concise sets of episode rules, rather than complete sets, in sequences. Existing approaches, however, are not able to process complex sequences and cannot guarantee the accuracy of the resulting sets because the frequency metric violates anti-monotonicity. In some real applications, episode rules need to be extracted from complex sequences in which multiple items may appear in a time slot. This paper investigates the discovery of concise episode rules in complex sequences. We define a concise representation called non-derivable episode rules and formalize the mining problem. Adopting a novel anti-monotonic frequency metric, we then develop a fast approach to discover non-derivable episode rules in complex sequences. Experimental results demonstrate the utility of the proposed approach, which substantially reduces the number of rules and achieves fast processing.

Relevance:

20.00%

Publisher:

Abstract:

Class imbalance in textual data is an important factor affecting the reliability of text mining. On imbalanced textual data, conventional classifiers tend to have a strong performance bias, which results in a high accuracy rate on the majority class but a very low rate on the minority classes. An extreme strategy for imbalanced learning is to discard the majority instances and apply one-class classification to the minority class. However, this can easily cause another type of bias, which increases the accuracy rate on the minority classes at the expense of the majority class. This chapter investigates approaches that reduce these two types of performance bias and improve the reliability of discovered classification rules. Experimental results show that the inexact field learning method and parameter-optimized one-class classifiers achieve more balanced performance than the standard approaches.
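The two kinds of performance bias described above can be made concrete with a toy calculation. The sketch below is not the chapter's inexact field learning method or its one-class classifiers; it only assumes a hypothetical 95/5 label split and two degenerate predictors, and compares overall accuracy with per-class recall and their mean (often called balanced accuracy).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions / instances of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical imbalanced labels: 95 majority documents, 5 minority ones.
y_true = ["majority"] * 95 + ["minority"] * 5

# Bias 1: a classifier that always predicts the majority class.
always_majority = ["majority"] * 100
# Bias 2: a classifier that always predicts the minority class.
always_minority = ["minority"] * 100

for name, y_pred in [("always majority", always_majority),
                     ("always minority", always_minority)]:
    print(name,
          "accuracy:", accuracy(y_true, y_pred),
          "recall:", per_class_recall(y_true, y_pred),
          "balanced:", balanced_accuracy(y_true, y_pred))
```

Overall accuracy rewards the first predictor and punishes the second, while the per-class recalls show that each one fails completely on one class, which is why a balanced measure is a more useful target when comparing approaches on imbalanced data.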