980 results for Information Mining





There is growing interest in the ways in which the location of a person can be utilized by new applications and services. Recent advances in mobile technologies have meant that the technical capability to record and transmit location data for processing is appearing in off-the-shelf handsets. This opens possibilities to profile people based on the places they visit, people they associate with, or other aspects of their complex routines determined through persistent tracking. It is possible that services offering customized information based on the results of such behavioral profiling could become commonplace. However, it may not be immediately apparent to the user that a wealth of information about them, potentially unrelated to the service, can be revealed. Further issues occur if the user agreed, while subscribing to the service, for data to be passed to third parties where it may be used to their detriment. Here, we report in detail on a short case study tracking four people, in three European member states, persistently for six weeks using mobile handsets. The GPS locations of these people have been mined to reveal places of interest and to create simple profiles. The information drawn from the profiling activity ranges from intuitive through special cases to insightful. In this paper, these results and further extensions to the technology are considered in light of European legislation to assess the privacy implications of this emerging technology.





In a world where data is captured on a large scale, the major challenge for data mining algorithms is to scale up to large datasets. There are two main approaches to inducing classification rules: the divide and conquer approach, also known as the top down induction of decision trees, and the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.





In recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest, as well as active research in the area, that aims to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NNs, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.





This article clarifies what was done with the sub-7-man positions in data-mining Harold van der Heijden's 'HHdbIV' database of chess studies prior to its publication. It emphasises that only positions in the main lines of studies were examined and that the information about uniqueness of move was not incorporated in HHdbIV. There is some reflection on the separate technical and artistic dimensions of study evaluation.





In the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used to complement each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.





Purpose: This paper explores the extent of site-specific and geographic segmental social, environmental and ethical reporting by mining companies operating in Ghana. We aim to: (i) establish a picture of corporate transparency relating to geographic segmentation of social, environmental and ethical reporting which is specific to operating sites and country of operation; and (ii) gauge the impact of the introduction of integrated reporting on site-specific social, environmental and ethical reporting. Methodology/Approach: We conducted an interpretive content analysis of the annual/integrated reports of mining companies for the years 2009, 2010 and 2011 in order to extract site-specific social, environmental and ethical information relating to the companies’ mining operations in Ghana. Findings and Implications: We found that site-specific social, environmental and ethical reporting is extremely patchy and inconsistent across the company reports studied. We also found that there was no information relating to certain sites which, according to the Ghana Minerals Commission, were in operation. This could simply be because operations were not in progress. Alternatively, it could be that decisions are made concerning which site-specific information is reported according to a certain benchmark. One policy implication arising from this research is that IFRS should require geographic segmental reporting of material social, environmental and ethical information in order to bring IFRS into line with global developments in integrated reporting. Originality: Although there is a wealth of sustainability reporting research and an emergent literature on integrated reporting, there is currently no academic research exploring site-specific social, environmental and ethical reporting.





Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, which makes computational means of analysis necessary. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.





Protein kinases, a family of enzymes, are used by living organisms as important signaling intermediaries for regulating critical biological processes such as memory, hormone response and cell growth. Unbalanced kinases are known to cause cancer and other diseases. The increasing effort to collect, store and disseminate information about the entire kinase family not only yields a valuable dataset for understanding cell regulation but also poses a big challenge: extracting valuable knowledge about metabolic pathways from the data. Data mining techniques that have been widely used to find frequent patterns in large datasets can be extended and adapted to kinase data as well. This paper proposes a framework for mining frequent itemsets from the collected kinase dataset. An experiment using AMPK regulation data demonstrates that our approaches are useful and efficient in analyzing kinase regulation data.
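As a rough illustration of what mining frequent itemsets means, the sketch below counts every itemset up to a small size by brute-force enumeration and keeps those that occur in at least `min_count` transactions. It is not the framework proposed in the paper; the function name, the parameters and the placeholder items "A", "B", "C" are assumptions made for this example and do not come from the kinase dataset.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Return {itemset: count} for itemsets seen in at least min_count transactions.

    Brute-force enumeration (not Apriori or FP-growth): every subset of each
    transaction up to max_size items is counted. Suitable only for small data.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical transactions; each set stands for one record of observed items.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
result = frequent_itemsets(transactions, min_count=2)
```

With these placeholder transactions, the pairs ("A", "B") and ("A", "C") each occur twice and are kept, while ("B", "C") occurs once and is dropped.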





The development of the Internet has boosted the prosperity of the World Wide Web, which is now a huge information source. Because of the characteristics of the web, in most cases, traditional database-based technologies are no longer suitable for web information retrieval and management. To effectively manage web information, it is necessary to reveal intrinsic relationships/structures among web information objects by eliminating noise factors. This paper proposes a mechanism that could be widely used in information processing, including web information processing and noise factor elimination, to uncover more intrinsic relationships. As an application case of this mechanism, a relevant web page finding algorithm is proposed to uncover intrinsic relationships among web pages from their hyperlink patterns, and to find more semantically relevant web pages. The experimental evaluation shows the feasibility and effectiveness of the algorithm and demonstrates the potential of the proposed mechanism in web applications.





In this paper, we propose a model for discovering frequent sequential patterns (phrases) that can be used as profile descriptors of documents. Data mining algorithms can undoubtedly produce numerous phrases. However, it is difficult to use these phrases effectively to answer what users want. Therefore, we present a pattern taxonomy extraction model which performs the task of extracting descriptive frequent sequential patterns by pruning the meaningless ones. The model is then extended and tested by applying it to an information filtering system. The results of the experiment show that pattern-based methods outperform keyword-based methods. The results also indicate that removal of meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.
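The idea of extracting frequent sequential patterns and then pruning some of them can be sketched as follows. The abstract does not say which patterns count as meaningless; this example assumes, purely for illustration, that a pattern is pruned when a longer frequent pattern contains it and has the same support (that is, the pattern is not closed). All function names and the toy documents are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if pattern's tokens appear in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(token in remaining for token in pattern)


def frequent_patterns(documents, min_count, max_len=3):
    """Return {pattern: support} for ordered patterns found in >= min_count documents."""
    candidates = set()
    for doc in documents:
        for length in range(1, max_len + 1):
            for positions in combinations(range(len(doc)), length):
                candidates.add(tuple(doc[i] for i in positions))
    supports = {}
    for pattern in candidates:
        support = sum(1 for doc in documents if is_subsequence(pattern, doc))
        if support >= min_count:
            supports[pattern] = support
    return supports


def prune_non_closed(patterns):
    """Drop patterns that a longer pattern with equal support already covers.

    Assumption: "meaningless" is approximated here by "not closed".
    """
    return {
        p: s
        for p, s in patterns.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in patterns.items()
        )
    }


# Toy documents as token lists (hypothetical data).
documents = [["data", "mining", "model"], ["data", "mining"], ["mining", "model"]]
closed = prune_non_closed(frequent_patterns(documents, min_count=2))
```

Here the single-word patterns ("data",) and ("model",) are dropped because the two-word patterns ("data", "mining") and ("mining", "model") occur in exactly the same number of documents, so the shorter ones add no information under this assumption.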





Data mining is playing an important role in decision making for business activities and governmental administration. Since many organizations or their divisions do not possess the in-house expertise and infrastructure for data mining, it is beneficial to delegate data mining tasks to external service providers. However, the organizations or divisions may lose private information during the delegation process. In this paper, we present a Bloom filter based solution that enables organizations or their divisions to delegate the task of mining association rules while protecting data privacy. Our approach can achieve high precision in data mining by trading off only storage requirements, rather than the level of privacy preservation.
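For readers unfamiliar with Bloom filters, the sketch below shows the basic data structure and how itemset support could be estimated from filter-encoded transactions. It is a minimal illustration, not the protocol described in the paper: the class, the function names, the SHA-256 based hashing and the default sizes are all assumptions chosen for this example. A Bloom filter never reports a stored item as absent, but it can report an absent item as present, so the estimated support can only overshoot; enlarging `num_bits` lowers that error at the cost of storage.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; one byte per bit position for readability."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def encode_transaction(items, num_bits=1024, num_hashes=3):
    """Encode one transaction as a Bloom filter (hypothetical helper)."""
    bloom = BloomFilter(num_bits, num_hashes)
    for item in items:
        bloom.add(item)
    return bloom


def estimated_support(itemset, encoded_transactions):
    """Fraction of encoded transactions that appear to contain every item.

    False positives make this an upper bound on the true support.
    """
    hits = sum(
        1 for bloom in encoded_transactions if all(item in bloom for item in itemset)
    )
    return hits / len(encoded_transactions)


# Hypothetical transactions; the party holding only the filters sees bit arrays.
raw = [["bread", "milk"], ["bread", "butter"], ["milk"]]
encoded = [encode_transaction(t) for t in raw]
support_bread = estimated_support(["bread"], encoded)
```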





Over the past decade, advances in the Internet and media technology have brought people closer than ever before. Traditional sociological definitions of a community have been outmoded, for community has extended far beyond the geographical boundaries that were held by traditional definitions (Wellman & Gulia, 1999). Virtual or online community was defined in such a context to describe various forms of computer-mediated communication (CMC). Although virtual communities do not necessarily arise from the Internet, the overwhelming popularity of the Internet is one of the main reasons that virtual communities receive so much attention (Rheingold, 1999). The beginning of virtual communities is attributed to scientists who exchanged information and cooperatively conducted research during the 1970s. There are four needs of participants in a virtual community: member interest, social interaction, imagination, and transaction (Hagel & Armstrong, 1997). The first two focus more on information exchange and knowledge discovery; imagination is for entertainment; and transaction is for commerce strategy. In this article, we investigate the function of information exchange and knowledge discovery in virtual communities. There are two important inherent properties embedded in virtual communities (Wellman, 2001):





Findings from informetric research represent an important background resource to add to the mix of information useful for resolving difficult and ongoing problems in specific library environments or information service settings. This paper provides examples of informetric research that can be useful input to decision-making in the field of library management and information service provision. This overview takes four of the challenges that Michael Buckland outlined for library research as a way of guiding the discussion of ways that informetric work can be used to inform library decision-making. (1) References are made to relevant informetric work undertaken or conducted in Australia, by Australian researchers, or with Australian data.

Informetrics includes both quantitative and qualitative methods, which when used in combination can provide a rounded set of findings that has great validity for management, policy and service applications. Quantitative methodologies are generally based on bibliometric techniques, such as mining and analysis of data from various bibliographic and textual databases. Qualitative methods include survey, case study and historical approaches. Used in combination, each set of findings adds richness and other perspectives to an analysis.





This thesis proposes three effective strategies to solve the significant performance-bias problem in imbalanced text mining: (1) creation of a novel inexact field learning algorithm to overcome the dual-imbalance problem; (2) introduction of the one-class classification framework to optimize classifier parameters; and (3) proposal of a maximal frequent itemset discovery approach to achieve higher accuracy and efficiency.