Digital Library

797 results for Data Mining

The New Software Package for Dynamic Hierarchical Clustering for Circles Types of Shapes

Relevance:

60.00%

Publisher:

Abstract:

In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active research themes include the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. One of the most accurate approaches, based on dynamic modeling of cluster similarity, is called Chameleon. In this paper we present a modified hierarchical clustering algorithm that uses the main idea of Chameleon; the effectiveness of the suggested approach is demonstrated by experimental results.

A Framework for Fast Classification Algorithms

Relevance:

60.00%

Publisher:

Abstract:

Today, with globalization, data sets keep growing, and it has become necessary to discover the knowledge they contain. Discovered knowledge typically takes the form of association rules, classification rules, clustering, frequent episodes, and deviation detection. Building fast and accurate classifiers for large databases is an important task in data mining. There is growing evidence that integrating classification and association rule mining can outperform classification approaches based on heuristic, greedy search, such as decision tree induction. Emerging associative classification algorithms have shown good promise in producing accurate classifiers. In this paper we focus on the performance of associative classification and present a parallel model for classifier building. Some parallel and distributed algorithms have been proposed for building decision tree classifiers, but so far no such work has been reported for associative classification.

Software Testing and Documenting Automation

Relevance:

60.00%

Publisher:

Abstract:

This article describes some approaches to the problem of automating testing and documentation in information systems with a graphical user interface. A combination of data mining methods and the theory of finite state machines is used to automate testing. Automated creation of software documentation relies on metadata in the documented system; the metadata is built on a graph model. The described approaches improve the performance and quality of the testing and documentation processes.

Solving a Direct Marketing Problem by Three Types of ARTMAP Neural Networks

Relevance:

60.00%

Publisher:

Abstract:

An important task for a direct mailing company is to detect potential customers in order to avoid unnecessary and unwanted mailing. This paper describes a non-linear method to predict profiles of potential customers using dARTMAP, ARTMAP-IC, and Fuzzy ARTMAP neural networks. The paper discusses advantages of the proposed approaches over similar techniques based on MLP neural networks.

An improved memory management scheme for large scale graph computing engine GraphChi

Relevance:

60.00%

Publisher:

Abstract:

GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs efficiently on a single PC. It can execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs. Using the novel technique of parallel sliding windows (PSW) to load subgraphs from disk into memory for updating vertices and edges, it achieves data processing performance close to, and sometimes better than, that of mainstream distributed graph engines. The GraphChi authors noted that its memory is not used effectively with large datasets, which leads to suboptimal computation performance. In this paper, motivated by the concepts of 'pin' from TurboGraph and 'ghost' from GraphLab, we propose a new memory utilization mode for GraphChi, called Part-in-memory mode, to improve the performance of GraphChi algorithms. The main idea is to pin a fixed part of the data in memory during the whole computing process. Part-in-memory mode was implemented by adding only about 40 lines of code to the original GraphChi engine. Extensive experiments were performed with large real datasets (including a Twitter graph with 1.4 billion edges). The preliminary results show that the Part-in-memory approach to memory management reduces GraphChi running time by up to 60% for the PageRank algorithm. Interestingly, pinning a larger portion of the data in memory does not always lead to better performance when the whole dataset cannot fit in memory. There is an optimal portion of the data that should be kept in memory to achieve the best computational performance.
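The pinning idea in this abstract can be illustrated with a toy model. The sketch below is not GraphChi code: `ShardStore`, `run_iterations`, and the tiny in-memory "disk" are hypothetical names invented for illustration, and the only thing it shows is how keeping a fixed set of shards resident reduces the number of simulated disk loads across iterations.

```python
from typing import Dict, Iterable, List, Sequence


class ShardStore:
    """Hypothetical stand-in for on-disk shards; counts simulated disk loads."""

    def __init__(self, shards: Dict[int, List[int]]) -> None:
        self._shards = dict(shards)
        self.disk_loads = 0

    def load(self, shard_id: int) -> List[int]:
        # Every call models one read of a shard from disk.
        self.disk_loads += 1
        return self._shards[shard_id]


def run_iterations(
    store: ShardStore,
    shard_ids: Sequence[int],
    iterations: int,
    pinned_ids: Iterable[int] = (),
) -> int:
    """Sweep all shards `iterations` times; pinned shards are loaded only once."""
    # Assumption: the pinned part is chosen up front and never evicted.
    pinned = {sid: store.load(sid) for sid in pinned_ids}
    total = 0
    for _ in range(iterations):
        for sid in shard_ids:
            data = pinned[sid] if sid in pinned else store.load(sid)
            total += sum(data)  # placeholder for the real vertex/edge update
    return total


shards = {0: [1, 2], 1: [3], 2: [4, 5, 6]}

baseline = ShardStore(shards)
run_iterations(baseline, [0, 1, 2], iterations=3)

partly_pinned = ShardStore(shards)
run_iterations(partly_pinned, [0, 1, 2], iterations=3, pinned_ids=[0, 1])

print(baseline.disk_loads, partly_pinned.disk_loads)
```

This toy model ignores memory pressure, so it cannot reproduce the abstract's observation that pinning more data is not always better; it only shows where the saved disk reads come from.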

Instantaneous Database Access

Relevance:

60.00%

Publisher:

Abstract:

The biggest threat to any business is a lack of timely and accurate information. Without all the facts, businesses are pressured to make critical decisions and assess risks and opportunities based largely on guesswork, sometimes resulting in financial losses and missed opportunities. The meteoric rise of databases (DB) appears to confirm the adage that “information is power”, but the stark reality is that information is useless if one has no way to find what one needs to know. It is perhaps more accurate to say that “the ability to find information is power”. In this paper we show how the Instantaneous Database Access System (IDAS) can make a crucial difference by pulling data together and allowing users to summarise information quickly from all areas of a business organisation.

Using Sensitivity as a Method for Ranking the Test Cases Classified by Binary Decision Trees

Relevance:

60.00%

Publisher:

Abstract:

Data mining projects that use decision trees to classify test cases usually rank the classified cases by the probabilities those trees provide. A better method is needed for ranking test cases that have already been classified by a binary decision tree, because these probabilities are not always accurate or reliable enough. One reason is that the probability estimates computed by existing decision tree algorithms are the same for every case in a given leaf of the tree. This is only one reason why the probability estimates given by decision tree algorithms cannot be used as an accurate means of deciding whether a test case has been correctly classified. Isabelle Alvarez has proposed a new method that could be used to rank the test cases classified by a binary decision tree [Alvarez, 2004]. In this paper we give the results of a comparison of different ranking methods based on the probability estimate, the sensitivity of a particular case, or both.
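The tie problem described in this abstract can be made concrete with a one-split tree. The sketch below is only an illustration under stated assumptions: the functions `leaf_probabilities` and `rank_cases` are hypothetical, and the tie-breaker (distance of a case from the split threshold) is a simple stand-in for a per-case sensitivity measure, not necessarily the method of [Alvarez, 2004].

```python
from typing import List, Sequence, Tuple


def leaf_probabilities(
    xs: Sequence[float], ys: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Fraction of positive training cases in each leaf of a one-split tree."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sum(left) / len(left), sum(right) / len(right)


def rank_cases(
    cases: Sequence[float], threshold: float, p_left: float, p_right: float
) -> List[float]:
    """Rank cases by confidence in their predicted class, most confident first.

    The leaf estimate is identical for every case in a leaf, so the distance
    from the threshold (an assumed stand-in for sensitivity) breaks the ties.
    """

    def key(x: float) -> Tuple[float, float]:
        p = p_left if x <= threshold else p_right
        confidence = max(p, 1.0 - p)  # confidence in the predicted class
        return (confidence, abs(x - threshold))

    return sorted(cases, key=key, reverse=True)


train_x = [1, 2, 3, 4, 5, 6, 7, 8]
train_y = [0, 0, 1, 0, 1, 1, 0, 1]
threshold = 4.5  # hypothetical split point

p_left, p_right = leaf_probabilities(train_x, train_y, threshold)
print(p_left, p_right)
print(rank_cases([5.0, 9.0, 4.2, 0.5], threshold, p_left, p_right))
```

With these toy data every case receives the same confidence from its leaf, so the leaf estimate alone cannot order them; only the second component of the sort key does.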

Extreme Situations Prediction by Multidimensional Heterogeneous Time Series Using Logical Decision Functions

Relevance:

60.00%

Publisher:

Abstract:

* The work is supported by RFBR, grant 04-01-00858-a

Technology of Classification of Electronic Documents Based on the Theory of Disturbance of Pseudoinverse Matrices

Relevance:

60.00%

Publisher:

Abstract:

Technology of classification of electronic documents based on the theory of disturbance of pseudoinverse matrices was proposed.

Defining Network Activity Patterns Using First Order Temporal Logics

Relevance:

60.00%

Publisher:

Abstract:

Part of network management is collecting information about the activities that go on around a distributed system and analyzing it in real time, at a deferred moment, or both. Such information may be stored in log files and analyzed later in order to data-mine it, so that interesting, unusual, or abnormal patterns can be discovered. In this paper we propose defining patterns in network activity logs using a dialect of First Order Temporal Logics (FOTL), called First Order Temporal Logic with Duration Constraints (FOTLDC). This logic is powerful enough to describe most network activity patterns because it can handle both causal and temporal correlations. Existing results for data-mining patterns with similar structure give us confidence that discovering DFOTL patterns in network activity logs can be done efficiently.

Verbal Dialogue versus Written Dialogue

Relevance:

60.00%

Publisher:

Abstract:

Modern technology has moved on and completely changed the way people can use a telephone or mobile to dialogue with information held on computers. Well-developed “written speech analysis” does not work with “verbal speech”. The main purpose of our article is, firstly, to highlight the problems and, secondly, to show possible ways to solve them.

Multi-agent Systems in the Harvest Prognosis

Relevance:

60.00%

Publisher:

Abstract:

The paper presents a case study of geo-monitoring a region, which consists of capturing and encoding human expertise into a knowledge-based system. Once the maps have been processed, data patterns are detected by knowledge-based agents for the harvest prognosis.

The Knowledge: Its Presentation and Role in Recognition Systems

Relevance:

60.00%

Publisher:

Abstract:

The concept of knowledge is central to solving the various problems of data mining and pattern recognition in finite spaces of Boolean or multi-valued attributes. A special form of knowledge representation, called implicative regularities, is proposed for use with two powerful tools of modern logic: inductive inference and deductive inference. The first is used to extract knowledge from data. The second is applied when that knowledge is used to calculate the values of goal attributes. A set of efficient algorithms was developed for this purpose, dealing with Boolean functions and finite predicates represented by logical vectors and matrices.

Pre-adoption market reaction to IFRS 9: a cross-country event-study

Relevance:

60.00%

Publisher:

Abstract:

We are the first to examine the market reaction to 13 announcement dates related to IFRS 9 for over 5400 European listed firms. We find an overall positive reaction to the introduction of IFRS 9. The regulation is particularly beneficial to shareholders of firms in countries with weaker rule of law and a smaller divergence between local GAAP and IAS 39. Bootstrap simulations rule out the possibility that sampling error or data mining are driving our findings. Our main findings are also robust to confounding events and the extent of the media coverage for each event. These results suggest that investors perceive the new regulation as shareholder-wealth enhancing and support the view that stronger comparability across accounting standards of European firms is beneficial to international investors and outweighs the costs of poorer firm-specific information.

Defining Interestingness for Association Rules

Relevance:

60.00%

Publisher:

Abstract:

Interestingness in association rules has been a major topic of research in the past decade. The reason is that the strength of association rules, i.e. their ability to discover ALL patterns given some thresholds on support and confidence, is also their weakness. Indeed, a typical association rules analysis on real data often yields hundreds or thousands of patterns, creating a data mining problem of the second order. In other words, it is not straightforward to determine which of those rules are interesting for the end user. This paper provides an overview of some existing measures of interestingness and comments on their properties. In general, interestingness measures can be divided into objective and subjective measures. Objective measures tend to express interestingness by means of statistical or mathematical criteria, whereas subjective measures aim at capturing more practical criteria that should be taken into account, such as the unexpectedness or actionability of rules. This paper focuses only on objective measures of interestingness.
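As a small illustration of what an objective measure looks like, the sketch below computes support and confidence (the two thresholds named in the abstract) plus lift for a rule A → B. Lift is included here only as a commonly used example; the abstract does not say which measures the paper covers. The function names and the toy transactions are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, Sequence


def support(transactions: Sequence[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    itemset = frozenset(items)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(
    transactions: Sequence[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    a = frozenset(antecedent)
    base = support(transactions, a)
    if base == 0:
        return 0.0
    return support(transactions, a | frozenset(consequent)) / base


def lift(
    transactions: Sequence[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Confidence divided by the consequent's support; 1.0 suggests independence."""
    base = support(transactions, consequent)
    if base == 0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / base


# Toy data, invented for this example.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
print(lift(transactions, {"bread"}, {"milk"}))
```

A rule can pass both the support and confidence thresholds and still have a lift below 1, which is one reason additional objective measures are studied.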
