9 results for proximity query, collision test, distance test, data compression, triangle test
in Bulgarian Digital Mathematics Library at IMI-BAS
Abstract:
Questions of forming learning sets for artificial neural networks in lossless data compression problems are considered. Methods for constructing and using learning sets are studied. A way of forming the learning set while training an artificial neural network on the data stream is proposed.
Abstract:
The present paper is devoted to providing cryptographic data security and implementing the packet mode in a distributed information measurement and control system that implements methods of optical spectroscopy for research in plasma physics and atomic collisions. The system provides remote access to information and instrument (hardware) resources for the natural sciences within Intranet/Internet networks. Access to the physical equipment is realized through standard interface servers (PXI, CAMAC, and GPIB), a server providing access to Ethernet devices, and a communication server, which integrates the equipment servers into a uniform information system. The system is used to carry out research tasks in optical spectroscopy, as well as to support teaching at the Department of Physics and Engineering of Petrozavodsk State University.
Abstract:
AMS Subj. Classification: H.3.7 Digital Libraries, K.6.5 Security and Protection
Abstract:
2000 Mathematics Subject Classification: 62H12, 62P99
Abstract:
The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dialects. The similarity metric is defined on the basis of the notion of Kolmogorov complexity of a file (or binary string). Kolmogorov complexity cannot be applied directly in practice because computing it for a file is an undecidable problem. Thus, the actual similarity metric is based on a real-life compressor, which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects, we first represent the dialectological data in such a way that the metric is applicable. We propose two such representations, which are compared to a baseline distance between dialects. We conclude the paper with an outline of our future work.
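The idea of replacing Kolmogorov complexity with the output size of a real compressor can be sketched as follows. The abstract does not say which formula or compressor the authors use, so this sketch assumes the commonly used normalized compression distance and Python's standard `zlib` module; the function names `compressed_size` and `ncd` are illustrative only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed formula, not confirmed by the abstract).

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share very little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

With such a function, two text encodings of dialect data could be compared by calling `ncd(a, b)` on their byte strings. How the dialectological data should be encoded before comparison is exactly what the paper's two proposed representations address, and it is not reproduced here.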
Abstract:
Data mining projects that classify test cases with decision trees usually rank the classified cases by the probabilities that these trees provide. A better method is needed for ranking test cases that have already been classified by a binary decision tree, because these probabilities are not always accurate and reliable enough. One reason is that the probability estimates computed by existing decision tree algorithms are always the same for all the different cases in a particular leaf of the decision tree. This is only one reason why the probability estimates given by decision tree algorithms cannot be used as an accurate means of deciding whether a test case has been correctly classified. Isabelle Alvarez has proposed a new method that could be used to rank the test cases that were classified by a binary decision tree [Alvarez, 2004]. In this paper we give the results of a comparison of different ranking methods that are based on the probability estimate, the sensitivity of a particular case, or both.
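The point about identical estimates within a leaf can be illustrated with a small sketch. Everything here is hypothetical: the one-split tree, its counts, and the `margin` tie-breaker, which only stands in loosely for a per-case sensitivity measure and is not the method of [Alvarez, 2004].

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    positives: int  # training cases of the positive class that reached this leaf
    total: int      # all training cases that reached this leaf

    def probability(self) -> float:
        # Frequency estimate shared by every case that falls into this leaf.
        return self.positives / self.total


@dataclass
class Stump:
    threshold: float
    left: Leaf   # cases with x <= threshold
    right: Leaf  # cases with x > threshold

    def probability(self, x: float) -> float:
        leaf = self.left if x <= self.threshold else self.right
        return leaf.probability()

    def margin(self, x: float) -> float:
        # Hypothetical per-case signal: distance from the split threshold.
        return abs(x - self.threshold)


tree = Stump(threshold=5.0, left=Leaf(2, 10), right=Leaf(9, 10))
cases = [5.1, 9.7]  # both land in the right leaf

# The leaf estimate cannot tell the two cases apart ...
same_estimate = tree.probability(5.1) == tree.probability(9.7)

# ... so ranking needs a second key; here the assumed margin breaks the tie.
ranked = sorted(cases, key=lambda x: (-tree.probability(x), -tree.margin(x)))
```

The case at 5.1 sits just past the split and the case at 9.7 lies far from it, yet both receive the same leaf probability, which is why a ranking based on that estimate alone cannot separate them.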
Abstract:
The paper treats the task of cluster analysis of a given set of objects on the basis of the information contained in the description table of these objects. Various methods of cluster analysis are briefly considered. A heuristic method and rules for classifying the given set of objects are presented for the cases when neither the division into classes nor the number of classes is known. The algorithm is checked on a test example and on two program products (PP): learning systems and software for company management. An analysis of the results is presented.
Abstract:
We present a test for identifying clusters in high dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, when no such clusters exist. Furthermore, in a case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
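The claim that projections can suggest well-separated clusters where none exist can be explored with a small simulation. This is not the authors' test. It is a sketch under stated assumptions: data drawn from a single spherical normal distribution, a plain 2-means (Lloyd's algorithm) seeded from the first two points, and an ad hoc separation score (the gap between the projected cluster means divided by the pooled within-cluster standard deviation). All function names are illustrative.

```python
import math
import random


def two_means(points, iterations=15):
    """Plain Lloyd's algorithm with k = 2, seeded from the first two points."""
    centers = [list(points[0]), list(points[1])]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min((0, 1), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def projected_separation(n, dim, seed=0):
    """Separation of the two k-means clusters along the line joining their centers."""
    rng = random.Random(seed)
    # One spherical normal population: there are no true clusters.
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    labels, (c0, c1) = two_means(points)
    direction = [a - b for a, b in zip(c1, c0)]
    norm = math.sqrt(sum(d * d for d in direction))
    proj = [sum(x * d for x, d in zip(p, direction)) / norm for p in points]
    groups = [[v for v, lab in zip(proj, labels) if lab == c] for c in (0, 1)]
    means = [sum(g) / len(g) for g in groups]
    pooled_var = sum((v - means[lab]) ** 2 for v, lab in zip(proj, labels)) / (n - 2)
    return abs(means[1] - means[0]) / math.sqrt(pooled_var)


low_dim = projected_separation(n=50, dim=2)
high_dim = projected_separation(n=50, dim=500)
```

Comparing `low_dim` with `high_dim` gives a rough feel for the effect the abstract describes: with the number of points fixed, the projected clusters tend to look more separated as the dimensionality grows, even though the data contain no clusters.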
Abstract:
The “trial and error” method is fundamental for Master Mind decision algorithms. On the basis of Master Mind games and strategies we consider some data mining methods for tests using students as teachers. Voting, twins, opposite, simulate and observer methods are investigated. For a pure data base these combinatorial algorithms are faster than many AI and Master Mind methods. The complexities of these algorithms are compared with basic combinatorial methods in AI. ACM Computing Classification System (1998): F.3.2, G.2.1, H.2.1, H.2.8, I.2.6.