80 resultados para Mining law
Resumo:
The standard Gibbs energy of formation of Rh203 at high temperature has been determined recently with high precision. The new data are significantly different from those given in thermodynamic compilations.Accurate values for enthalpy and entropy of formation at 298.15 K could not be evaluated from the new data,because reliable values for heat capacity of Rh2O3 were not available. In this article, a new measurement of the high temperature heat capacity of Rh2O3 using differential scanning calorimetry (DSC) is presented.The new values for heat capacity also differ significantly from those given in compilations. The information on heat capacity is coupled with standard Gibbs energy of formation to evaluate values for standard enthalpy and entropy of formation at 289.15 K using a multivariate analysis. The results suggest a major revision in thermodynamic data for Rh2O3. For example, it is recommended that the standard entropy of Rh203 at 298.15 K be changed from 106.27 J mol-' K-'given in the compilations of Barin and Knacke et al. to 75.69 J mol-' K". The recommended revision in the standard enthalpy of formation is from -355.64 kJ mol-'to -405.53 kJ mol".
Resumo:
The Generalized Distributive Law (GDL) is a message passing algorithm which can efficiently solve a certain class of computational problems, and includes as special cases the Viterbi's algorithm, the BCJR algorithm, the Fast-Fourier Transform, Turbo and LDPC decoding algorithms. In this paper GDL based maximum-likelihood (ML) decoding of Space-Time Block Codes (STBCs) is introduced and a sufficient condition for an STBC to admit low GDL decoding complexity is given. Fast-decoding and multigroup decoding are the two algorithms used in the literature to ML decode STBCs with low complexity. An algorithm which exploits the advantages of both these two is called Conditional ML (CML) decoding. It is shown in this paper that the GDL decoding complexity of any STBC is upper bounded by its CML decoding complexity, and that there exist codes for which the GDL complexity is strictly less than the CML complexity. Explicit examples of two such families of STBCs is given in this paper. Thus the CML is in general suboptimal in reducing the ML decoding complexity of a code, and one should design codes with low GDL complexity rather than low CML complexity.
Resumo:
Mining association rules from a large collection of databases is based on two main tasks. One is generation of large itemsets; and the other is finding associations between the discovered large itemsets. Existing formalism for association rules are based on a single transaction database which is not sufficient to describe the association rules based on multiple database environment. In this paper, we give a general characterization of association rules and also give a framework for knowledge-based mining of multiple databases for association rules.
Resumo:
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining.We mainly concentrate on algorithms for pattern discovery in sequential data streams.We also describe some recent results regarding statistical analysis of pattern discovery methods.
Resumo:
A method, system, and computer program product for fault data correlation in a diagnostic system are provided. The method includes receiving the fault data including a plurality of faults collected over a period of time, and identifying a plurality of episodes within the fault data, where each episode includes a sequence of the faults. The method further includes calculating a frequency of the episodes within the fault data, calculating a correlation confidence of the faults relative to the episodes as a function of the frequency of the episodes, and outputting a report of the faults with the correlation confidence.
Resumo:
A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series having events with start times and end times, a set of allowed dwelling times and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.
Resumo:
In this paper, a new proportional-navigation guidance law, called retro-proportional-navigation, is proposed. The guidance law is designed to intercept targets that are of higher speeds than the interceptor. This is a typical scenario in a ballistic target interception. The capture region analysis for both proportional-navigation and retro-proportional-navigation guidance laws are presented. The study shows that, at the cost of a higher intercept time, the retro-proportional-navigation guidance law demands lower terminal lateral acceleration than proportional navigation and can intercept high-velocity targets from many initial conditions that the classical proportional navigation cannot. Also, the capture region with the retro-proportional-navigation guidance law is shown to be larger compared with the classical proportional-navigation guidance law.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Resumo:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
Resumo:
This article does not have an abstract.
Resumo:
Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.
Resumo:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X. Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
Resumo:
In this paper, we discuss the issues related to word recognition in born-digital word images. We introduce a novel method of power-law transformation on the word image for binarization. We show the improvement in image binarization and the consequent increase in the recognition performance of OCR engine on the word image. The optimal value of gamma for a word image is automatically chosen by our algorithm with fixed stroke width threshold. We have exhaustively experimented our algorithm by varying the gamma and stroke width threshold value. By varying the gamma value, we found that our algorithm performed better than the results reported in the literature. On the ICDAR Robust Reading Systems Challenge-1: Word Recognition Task on born digital dataset, as compared to the recognition rate of 61.5% achieved by TH-OCR after suitable pre-processing by Yang et. al. and 63.4% by ABBYY Fine Reader (used as baseline by the competition organizers without any preprocessing), we achieved 82.9% using Omnipage OCR applied on the images after being processed by our algorithm.
Resumo:
The rapid growth in the field of data mining has lead to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
Resumo:
The problem of designing good space-time block codes (STBCs) with low maximum-likelihood (ML) decoding complexity has gathered much attention in the literature. All the known low ML decoding complexity techniques utilize the same approach of exploiting either the multigroup decodable or the fast-decodable (conditionally multigroup decodable) structure of a code. We refer to this well-known technique of decoding STBCs as conditional ML (CML) decoding. In this paper, we introduce a new framework to construct ML decoders for STBCs based on the generalized distributive law (GDL) and the factor-graph-based sum-product algorithm. We say that an STBC is fast GDL decodable if the order of GDL decoding complexity of the code, with respect to the constellation size, is strictly less than M-lambda, where lambda is the number of independent symbols in the STBC. We give sufficient conditions for an STBC to admit fast GDL decoding, and show that both multigroup and conditionally multigroup decodable codes are fast GDL decodable. For any STBC, whether fast GDL decodable or not, we show that the GDL decoding complexity is strictly less than the CML decoding complexity. For instance, for any STBC obtained from cyclic division algebras which is not multigroup or conditionally multigroup decodable, the GDL decoder provides about 12 times reduction in complexity compared to the CML decoder. Similarly, for the Golden code, which is conditionally multigroup decodable, the GDL decoder is only half as complex as the CML decoder.