32 results for Information Mining

in Deakin Research Online - Australia


Relevance:

70.00%

Abstract:

Literature analysis has recently become an active topic in academic research. To quantify the importance of journals and help researchers identify target venues for their work, this poster proposes a novel approach based on social information, namely the potential relationship between journal quality and authors’ affiliations. Using the formula proposed in this work, the importance of journals can be estimated and ranked.

Relevance:

40.00%

Abstract:

Whilst atom probe tomography (APT) is a powerful technique with the capacity to gather information on hundreds of millions of atoms from a single specimen, using this information effectively poses significant challenges. The main technological bottleneck lies in handling the extremely large amounts of data on spatial-chemical correlations, as well as in developing new quantitative computational foundations for image reconstruction that target critical and transformative problems in materials science. The power to explore materials at the atomic scale with the extraordinary detection sensitivity offered by atom probe tomography has not been fully harnessed because of the challenges of dealing with missing, sparse and often noisy data. Hence there is a profound need to couple analytical tools that address these data challenges with the experimental issues associated with this instrument. In this paper we summarize some key issues associated with these challenges, along with solutions for extracting, or "mining", fundamental materials science information from the data.

Relevance:

30.00%

Abstract:

Protein kinases, a family of enzymes, serve as important signaling intermediaries that living organisms use to regulate critical biological processes such as memory, hormone response and cell growth. Unbalanced kinases are known to cause cancer and other diseases. The increasing effort to collect, store and disseminate information about the entire kinase family not only yields valuable data sets for understanding cell regulation but also poses a major challenge: extracting valuable knowledge about metabolic pathways from the data. Data mining techniques that have been widely used to find frequent patterns in large datasets can be extended and adapted to kinase data as well. This paper proposes a framework for mining frequent itemsets from the collected kinase dataset. An experiment using AMPK regulation data demonstrates that the proposed approaches are useful and efficient in analyzing kinase regulation data.
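
For readers unfamiliar with the term, frequent itemset mining can be illustrated with a minimal sketch. This is not the framework from the paper: it is a brute-force enumeration rather than an optimized algorithm, and the function name `frequent_itemsets`, the `max_size` cap and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Brute-force illustration: every subset up to max_size items is counted,
    with no candidate pruning.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy records (placeholder labels, not data from the paper).
records = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
]
result = frequent_itemsets(records, min_support=2)
```

Here each single item and each pair meets the support threshold of 2, while the full triple appears only once and is dropped.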

Relevance:

30.00%

Abstract:

The development of the Internet has boosted the prosperity of the World Wide Web, which is now a huge information source. Because of the characteristics of the web, traditional database-based technologies are in most cases no longer suitable for web information retrieval and management. To manage web information effectively, it is necessary to reveal the intrinsic relationships and structures among web information objects by eliminating noise factors. This paper proposes a mechanism that could be widely used in information processing, including web information processing and noise factor elimination, to obtain more intrinsic relationships. As an application case of this mechanism, a relevant web page finding algorithm is proposed to uncover intrinsic relationships among web pages from their hyperlink patterns and to find more semantically relevant web pages. The experimental evaluation shows the feasibility and effectiveness of the algorithm and demonstrates the potential of the proposed mechanism in web applications.

Relevance:

30.00%

Abstract:

In this paper, we propose a model for discovering frequent sequential patterns (phrases) that can be used as profile descriptors of documents. Data mining algorithms can undoubtedly produce numerous phrases. However, it is difficult to use these phrases effectively to determine what users want. Therefore, we present a pattern taxonomy extraction model that extracts descriptive frequent sequential patterns by pruning the meaningless ones. The model is then extended and tested by applying it to an information filtering system. The results of the experiment show that pattern-based methods outperform keyword-based methods. The results also indicate that removing meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.

Relevance:

30.00%

Abstract:

Data mining plays an important role in decision making for business activities and governmental administration. Since many organizations or their divisions do not possess the in-house expertise and infrastructure for data mining, it is beneficial to delegate data mining tasks to external service providers. However, the organizations or divisions may lose private information during the delegation process. In this paper, we present a Bloom filter based solution that enables organizations or their divisions to delegate the task of mining association rules while protecting data privacy. Our approach can achieve high precision in data mining by trading off only storage requirements, rather than the level of privacy preservation.
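
As background on the data structure named above, a Bloom filter can be sketched in a few lines. This is a generic illustration only, not the privacy-preserving scheme from the paper; the class name, the default sizes and the choice of SHA-256 for hashing are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch: a bit array plus several hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly present" (false positives can occur);
        # False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
bloom.add("milk")
```

A larger `num_bits` lowers the false-positive rate at the cost of more storage, which is the general kind of storage-for-precision trade-off the abstract refers to.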

Relevance:

30.00%

Abstract:

Over the past decade, advances in the Internet and media technology have literally brought people closer than ever before. Traditional sociological definitions of a community have become outmoded, for community now extends far beyond the geographical boundaries assumed by those definitions (Wellman & Gulia, 1999). The term virtual or online community was defined in this context to describe various forms of computer-mediated communication (CMC). Although virtual communities do not necessarily arise from the Internet, the overwhelming popularity of the Internet is one of the main reasons that virtual communities receive so much attention (Rheingold, 1999). The beginning of virtual communities is attributed to scientists who exchanged information and cooperatively conducted research during the 1970s. Participants in a virtual community have four needs: member interest, social interaction, imagination, and transaction (Hagel & Armstrong, 1997). The first two focus on information exchange and knowledge discovery; imagination is for entertainment; and transaction is for commerce strategy. In this article, we investigate the function of information exchange and knowledge discovery in virtual communities. There are two important inherent properties embedded in virtual communities (Wellman, 2001):

Relevance:

30.00%

Abstract:

Findings from informetric research represent an important background resource to add to the mix of information useful for resolving difficult and ongoing problems in specific library environments or information service settings. This paper provides examples of informetric research that can be useful input to decision-making in the field of library management and information service provision. This overview takes four of the challenges that Michael Buckland outlined for library research as a way of guiding the discussion of ways that informetric work can be used to inform library decision-making. (1) References are made to relevant informetric work undertaken or conducted in Australia, by Australian researchers, or with Australian data.

Informetrics includes both quantitative and qualitative methods, which when used in combination can provide a rounded set of findings that has great validity for management, policy and service applications. Quantitative methodologies are generally based on bibliometric techniques, such as mining and analysis of data from various bibliographic and textual databases. Qualitative methods include survey, case study and historical approaches. Used in combination, each set of findings adds richness and other perspectives to an analysis.

Relevance:

30.00%

Abstract:

This thesis proposes three effective strategies to solve the significant performance-bias problem in imbalanced text mining: (1) creation of a novel inexact field learning algorithm to overcome the dual-imbalance problem; (2) introduction of the one-class classification framework to optimize classifier parameters; and (3) proposal of a maximal-frequent-item-set discovery approach to achieve higher accuracy and efficiency.

Relevance:

30.00%

Abstract:

Based on the knowledge sharing model by Nonaka (1994), this study examines the relative efficacy of various Information Communication Technology (ICT) applications in facilitating the sharing of explicit and tacit knowledge among professional accountants in Malaysia. The results indicate that ICTs generally facilitate all modes of knowledge sharing. Best-Practice Repositories are effective for sharing both explicit and tacit knowledge, while internet/e-mail facilities are effective for tacit knowledge sharing. Data warehousing/mining, on the other hand, is effective in facilitating self-learning through the tacit-to-tacit and explicit-to-explicit modes. ICT facilities used mainly for office administration are ineffective for knowledge sharing purposes. The implications of the findings are discussed.

Relevance:

30.00%

Abstract:

Selecting a suitable proximity measure is one of the fundamental tasks in clustering. How to effectively utilize all available side information, including instance-level information in the form of pair-wise constraints and attribute-level information in the form of attribute order preferences, is an essential problem in metric learning. In this paper, we propose a learning framework in which both the pair-wise constraints and the attribute order preferences can be incorporated simultaneously. The theory behind it and the related parameter adjustment technique are described in detail. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method.

Relevance:

30.00%

Abstract:

As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell’s growth and function [1]. Previous studies on proteomics indicate that the functions of different proteins could be assigned based upon protein structures [2,3]. Knowledge of protein structures gives us an overview of protein fold space and helps in understanding the evolutionary principles behind structure. By observing the architectures and topologies of the protein families, biological processes can be investigated more directly with much higher resolution and finer detail. For this reason, the analysis of proteins, their structures and their interactions with other materials is emerging as an important problem in bioinformatics. However, determining protein structures is experimentally expensive and time consuming, which at present makes scientists largely dependent on sequence, rather than more general structure, to infer the function of a protein. Data mining technology has therefore been introduced into this area to provide more efficient data processing and knowledge discovery approaches.

Unlike many data mining applications that lack available data, the study of protein structure determination and protein interaction can draw on a vast amount of biologically relevant information, such as the protein data bank (PDB) [4], the structural classification of proteins (SCOP) databases [5], CATH databases [6], UniProt [7], and others. The difficulty of predicting protein structures, especially their 3D structures, and the interactions between proteins as shown in Figure 6.1, lies in the computational complexity of the data. Although a large number of approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed.

In this chapter, we will introduce a state-of-the-art data mining technique, graph mining, which is good at defining and discovering interesting structural patterns in graphical data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). The current graph pattern mining methods will be described, and typical algorithms will be presented, together with their applications in the protein structure analysis.

The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamentals of proteins, the publicly accessible protein data resources and the current research status of protein analysis; Section 6.3 focuses on one of the state-of-the-art data mining methods, graph mining; Section 6.4 surveys existing work from the recent decade on protein structure analysis using advanced graph mining methods; finally, Section 6.5 presents a conclusion together with potential further work.

Relevance:

30.00%

Abstract:

Theme development evolution analysis of literature is a significant tool that helps scholars find and study frontier problems more efficiently. This paper designs and develops a visual mining system for theme development evolution analysis to deal with the large volume of literature information. The analysis of related themes based on sub-themes, together with a dynamic threshold strategy, is adopted to improve the accuracy of the system. Experimental results show that the correlations of themes obtained from the system are accurate and achieve a better practical effect than those of our earlier work.

Relevance:

30.00%

Abstract:

This paper reports on the preparation and management of inconsistent data on damage to residential houses in Victoria, Australia. No specific and fully relevant databases are readily available; only incomplete paper-based and electronic reports exist. Extracting from these reports all the information needed to analyze damage to residential houses founded on expansive soils is therefore complicated and time consuming. Data mining is adopted to develop a database. Statistical methods and Artificial Intelligence methods are used to quantify the quality of the data. The paper concludes that the development of such a database could enable BHC to evaluate the usefulness of the reports prepared on the reported damaged properties for further analysis.