88 results for association rule mining
at Indian Institute of Science - Bangalore - India
Abstract:
Mining association rules from a large collection of databases involves two main tasks: generating large itemsets, and finding associations between the discovered large itemsets. Existing formalisms for association rules are based on a single transaction database, which is not sufficient to describe association rules in a multiple-database environment. In this paper, we give a general characterization of association rules and a framework for knowledge-based mining of multiple databases for association rules.
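The two tasks named above can be illustrated for the familiar single-database case with a minimal Apriori-style sketch. It first finds large itemsets, meaning those whose support (the fraction of transactions containing them) reaches a threshold. It then derives rules X -> Y whose confidence, support(X ∪ Y) / support(X), reaches a second threshold. The thresholds `min_support` and `min_confidence`, the function names, and the sample baskets are illustrative assumptions. This is not the multi-database framework the paper proposes.

```python
from itertools import combinations

def large_itemsets(transactions, min_support):
    """Task 1: find every itemset whose support is at least min_support,
    growing candidates one item at a time (Apriori)."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]
    current = [frozenset([i]) for i in {i for t in tsets for i in t}]
    large, k = {}, 1
    while current:
        counts = {c: sum(1 for t in tsets if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        large.update(level)
        k += 1
        # Join large (k-1)-itemsets; keep a candidate only if all its
        # (k-1)-subsets are large (downward-closure pruning).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in level for s in combinations(union, k - 1)):
                candidates.add(union)
        current = list(candidates)
    return large

def association_rules(large, min_confidence):
    """Task 2: split each large itemset into X -> Y and keep confident rules.
    Every subset of a large itemset is itself large, so large[lhs] exists."""
    rules = []
    for itemset, supp in large.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / large[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

# Hypothetical example data.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
large = large_itemsets(baskets, min_support=0.5)
rules = association_rules(large, min_confidence=0.6)
```

With these baskets, {milk, butter} occurs in only one of four transactions, so it is not large. The four surviving rules come from {bread, milk} and {bread, butter}.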
Abstract:
The disclosure and misuse of information in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems, data is distributed among multiple parties that collaborate to achieve cumulative mining accuracy. The vertically partitioned data held by each party cannot, on its own, provide mining results as accurate as those of collaborative mining. To address the privacy issue in data disclosure, this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the local association rules generated by the parties are published. The association rules are securely combined into a combined rule set using the Commutative RSA algorithm. The combined rule set is then used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated by the C4.5-based KDLPPDM system and the C5.0-based KDLPPDM system using receiver operating characteristic (ROC) curves.
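The property that lets rule sets be merged without exposing them is commutativity: encrypting with key A and then key B gives the same ciphertext as encrypting with B and then A. The sketch below illustrates this with a commutative exponentiation cipher over a shared prime modulus (Pohlig-Hellman/SRA style). This is an assumption standing in for the paper's Commutative RSA construction, whose exact details may differ. The modulus, the rule encoding, and the helpers `keygen`, `encode`, `enc`, `dec`, and `combine_rule_sets` are all hypothetical. Because the cipher is deterministic, it reveals which rules the parties share, so this is an illustration of the idea rather than a secure protocol. It requires Python 3.8+ for `pow(e, -1, m)`.

```python
import math
import secrets

# Assumption: both parties share a public prime modulus; the Mersenne prime
# 2^521 - 1 is used purely for illustration.
P = 2**521 - 1

def keygen():
    """Secret key: exponent e coprime to P-1 and its inverse d modulo P-1."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)

def encode(rule: str) -> int:
    """Rule text -> integer in [1, P); a 0x01 prefix keeps the mapping invertible."""
    data = b"\x01" + rule.encode("utf-8")
    if len(data) > 65:
        raise ValueError("rule too long for this modulus")
    return int.from_bytes(data, "big")

def decode(x: int) -> str:
    return x.to_bytes((x.bit_length() + 7) // 8, "big")[1:].decode("utf-8")

def enc(x: int, key) -> int:
    return pow(x, key[0], P)          # x^e mod P

def dec(y: int, key) -> int:
    return pow(y, key[1], P)          # y^d mod P undoes x^e because e*d = 1 mod (P-1)

def combine_rule_sets(rules_a, rules_b, key_a, key_b):
    """Each party's rules end up encrypted under both keys. Since
    enc(enc(x, a), b) == enc(enc(x, b), a), identical rules collide and the
    set union removes duplicates without comparing plaintexts."""
    from_a = {enc(enc(encode(r), key_a), key_b) for r in rules_a}
    from_b = {enc(enc(encode(r), key_b), key_a) for r in rules_b}
    # Both parties strip their layer (the order does not matter) to publish the result.
    return {decode(dec(dec(c, key_a), key_b)) for c in from_a | from_b}
```

For example, combining {"bread->milk", "milk->bread"} with {"bread->milk", "butter->bread"} yields the three distinct rules. The shared rule is deduplicated while still encrypted.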
Abstract:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over the cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y}_R, where R is a convex region in the cube and X -> Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using the notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
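The definition of a targeted rule {X -> Y}_R can be made concrete by evaluating a single rule only over the transactions whose cell falls inside R. The sketch assumes that R is an axis-aligned box of cells, which is one kind of convex region. It also assumes that support and confidence are measured relative to the transactions in R. The `Transaction` type, `targeted_rule_stats`, and the sample data are hypothetical. The sketch shows what is being evaluated, not how TOARM, CellUnion, or IceCube search for such rules efficiently.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class Transaction:
    items: FrozenSet[str]
    dims: Tuple[int, ...]        # cell coordinates in the n-dimensional cube

Box = List[Tuple[int, int]]      # assumed region shape: inclusive [lo, hi] per dimension

def in_region(cell: Tuple[int, ...], box: Box) -> bool:
    return all(lo <= c <= hi for c, (lo, hi) in zip(cell, box))

def targeted_rule_stats(transactions, x, y, box: Box):
    """Support and confidence of X -> Y, counted only over transactions in R."""
    region = [t for t in transactions if in_region(t.dims, box)]
    if not region:
        return 0.0, 0.0
    n_x = sum(1 for t in region if x <= t.items)
    n_xy = sum(1 for t in region if (x | y) <= t.items)
    return n_xy / len(region), (n_xy / n_x if n_x else 0.0)

# Hypothetical data: dims = (store, month).
ts = [
    Transaction(frozenset({"beer", "chips"}), (0, 1)),
    Transaction(frozenset({"beer", "chips"}), (0, 2)),
    Transaction(frozenset({"beer"}), (1, 1)),
    Transaction(frozenset({"beer"}), (1, 2)),
]
```

Over the whole cube, beer -> chips has support and confidence 0.5. Restricted to the region store = 0, both rise to 1.0. A rule that looks weak globally can therefore hold strongly inside a region, which is the kind of pattern targeted rules are meant to capture.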
Abstract:
In many IEEE 802.11 WLAN deployments, wireless clients have a choice of access points (APs) to connect to. In current systems, a client associates with the access point offering the strongest signal-to-noise ratio. However, this association mechanism can lead to unequal load sharing and hence diminished system performance. In this paper, we first provide a numerical approach based on stochastic dynamic programming to find the optimal client-AP association algorithm for a small topology consisting of two access points. Using the value iteration algorithm, we determine the optimal association rule for the two-AP topology. Next, using the insights obtained from the optimal association rule for the two-AP case, we propose a near-optimal heuristic that we call RAT. We test the efficacy of RAT by considering more realistic arrival patterns and a larger topology. Our results show that RAT also performs very well in these scenarios. Moreover, RAT lends itself to a fairly simple implementation.
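Value iteration, the tool named above, repeatedly applies the Bellman optimality update until the state values stop changing. It then reads off the best action in each state. The sketch below is a generic tabular version for a discounted Markov decision process. The dictionary encoding of the MDP, the discount factor, and the toy "ap1"/"ap2" model are hypothetical and far simpler than the paper's two-AP model.

```python
def value_iteration(mdp, gamma=0.9, tol=1e-9):
    """mdp[state][action] is a list of (probability, next_state, reward) outcomes.
    Returns the optimal state values and a greedy optimal policy."""
    values = {s: 0.0 for s in mdp}

    def q(s, a):
        # Expected immediate reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in mdp[s][a])

    while True:
        delta = 0.0
        for s in mdp:                      # in-place (Gauss-Seidel) sweep
            best = max(q(s, a) for a in mdp[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(mdp[s], key=lambda a: q(s, a)) for s in mdp}
    return values, policy

# Hypothetical toy: at each arrival the client joins "ap1" or "ap2", which yield
# different per-step rewards, and the process repeats.
toy = {"arrival": {"ap1": [(1.0, "arrival", 1.0)], "ap2": [(1.0, "arrival", 2.0)]}}
values, policy = value_iteration(toy, gamma=0.5)
```

Here the value converges to 2 / (1 - 0.5) = 4 and the policy picks "ap2". In a real association model the states would encode AP loads and the transition lists would encode client arrivals and departures.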
Abstract:
This paper describes the design and implementation of a high-level query language called Generalized Query-By-Rule (GQBR), which supports retrieval, insertion, deletion and update operations. This language, based on the formalism of database logic, enables users to access each database in a distributed heterogeneous environment without having to learn all the different data manipulation languages. The compiler has been implemented in Pascal on a DEC 1090 system.
Abstract:
An efficient geometrical design rule checker is proposed, based on operations on quadtrees that represent VLSI mask layouts. The time complexity of the design rule checker is O(N), where N is the number of polygons in the mask. A pseudo-Pascal description of all the important algorithms for geometrical design rule verification is provided.
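The abstract gives no algorithmic detail beyond the use of quadtrees, so the following is only a sketch of the general idea under stated assumptions. Mask shapes are approximated by axis-aligned rectangles (an assumption; the paper works with polygons). They are stored in an MX-CIF-style quadtree, with each rectangle at the deepest node whose quadrant fully contains it. A single minimum-spacing rule is checked by querying each shape's expanded neighbourhood, so subtrees far from the shape are pruned. All shapes are assumed to lie within the root bounds. The names `QuadNode`, `gap`, and `spacing_violations` are hypothetical, and the sketch is not claimed to achieve the O(N) bound stated above.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (x0, y0, x1, y1) with x0 < x1, y0 < y1

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

@dataclass
class QuadNode:
    bounds: Rect
    depth: int = 0
    rects: List[Rect] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def insert(self, r: Rect, max_depth: int = 8) -> None:
        """Push r down to the deepest quadrant that fully contains it."""
        if self.depth < max_depth:
            if not self.children:
                x0, y0, x1, y1 = self.bounds
                mx, my = (x0 + x1) / 2, (y0 + y1) / 2
                quads = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
                self.children = [QuadNode(q, self.depth + 1) for q in quads]
            for child in self.children:
                if contains(child.bounds, r):
                    child.insert(r, max_depth)
                    return
        self.rects.append(r)

    def query(self, window: Rect) -> List[Rect]:
        """All stored rectangles touching window; disjoint subtrees are skipped."""
        if not intersects(self.bounds, window):
            return []
        found = [r for r in self.rects if intersects(r, window)]
        for child in self.children:
            found.extend(child.query(window))
        return found

def gap(a: Rect, b: Rect) -> float:
    """Euclidean distance between two rectangles (0 if they touch or overlap)."""
    dx = max(0.0, b[0] - a[2], a[0] - b[2])
    dy = max(0.0, b[1] - a[3], a[1] - b[3])
    return hypot(dx, dy)

def spacing_violations(rects: List[Rect], min_space: float, bounds: Rect):
    """Pairs of separate shapes closer than min_space (a single spacing rule)."""
    root = QuadNode(bounds)
    for r in rects:
        root.insert(r)
    bad = set()
    for a in rects:
        s = min_space
        for b in root.query((a[0] - s, a[1] - s, a[2] + s, a[3] + s)):
            if a != b and 0 < gap(a, b) < min_space:
                bad.add(tuple(sorted((a, b))))
    return sorted(bad)
```

Touching or overlapping shapes (gap 0) are treated as one connected region and are not reported. A width rule or an enclosure rule would need further queries of the same kind.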
Abstract:
The paper presents, in three parts, a new approach to improving the detection and tracking performance of a track-while-scan (TWS) radar. Part 1 presents a review of the current status. This part, Part 2, shows how detection can be improved by utilising information from the tracker. A new multitarget tracking algorithm, capable of tracking manoeuvring targets in clutter, is then presented. The algorithm is specifically tailored so that the solution to the combinatorial problem presented in a companion paper can be applied. Implementation aspects are discussed, and a multiprocessor architecture is identified to realise the full potential of the algorithm. Part 3 presents analytical derivations for the quantitative assessment of the performance of the TWS radar system and shows how that performance can be optimised.
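The abstract does not spell out how tracker information improves detection. As background only, one widely used mechanism is a validation gate. The tracker predicts where a target's next measurement should fall, and only detections whose Mahalanobis distance from that prediction is small enough are considered for association. The 2-D position measurement model, the gate probability `p_gate`, and the function names are assumptions. This sketch is not the paper's algorithm. For 2 degrees of freedom, the chi-square quantile has the closed form -2 ln(1 - p), which avoids a statistics library.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]

def mahalanobis_sq(z: Vec2, z_pred: Vec2, S: Mat2) -> float:
    """d^2 = v^T S^-1 v for the innovation v = z - z_pred and 2x2 covariance S."""
    vx, vy = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    # S^-1 = [[d, -b], [-c, a]] / det
    return (vx * (d * vx - b * vy) + vy * (-c * vx + a * vy)) / det

def gate_threshold_2d(p_gate: float) -> float:
    """Chi-square quantile with 2 degrees of freedom: -2 ln(1 - p)."""
    return -2.0 * math.log(1.0 - p_gate)

def gate(detections: Sequence[Vec2], z_pred: Vec2, S: Mat2, p_gate: float = 0.99) -> List[Vec2]:
    """Keep only detections that fall inside the tracker's validation gate."""
    g = gate_threshold_2d(p_gate)
    return [z for z in detections if mahalanobis_sq(z, z_pred, S) <= g]
```

With S = diag(1, 4), a detection at (1, 2) has d^2 = 2 and is kept. A detection at (4, 0) has d^2 = 16, which exceeds the 99% threshold of about 9.21, so it is rejected as likely clutter.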
Abstract:
Molecular association of porphyrins and their metal derivatives has been recognized as important for many of their biological functions. The association is classified into (i) self-aggregation, (ii) intermolecular association and (iii) intramolecular association. The presence of metal ions in the porphyrin cavity is shown to alter the magnitudes of the binding constants and thermodynamic parameters of complexation. The interaction between the porphyrin unit and the acceptor is described in terms of π-π interaction. The manifestation of charge transfer states in both the ground and excited states of these complexes is shown to influence the rates of excited-state electron transfer reactions. Owing to the paucity of crystal structure data, the time-averaged geometries of many of these complexes have been derived from magnetic resonance data.
Abstract:
Transition protein 1 (TP1) and TP2 replace histones during midspermiogenesis (stages 12-15) and are finally replaced by protamines. TPs play a predominant role in DNA condensation and chromatin remodeling during mammalian spermiogenesis. TP2 is a zinc metalloprotein with two novel zinc finger modules that condenses DNA in vitro with a preference for GC-rich sequences. TP2 also localizes to the nucleolus in transfected HeLa and Cos-7 cells, suggesting a GC-rich preference even in vivo. We have now studied the localization pattern of TP2 in the rat spermatid nucleus. Colocalization studies using the GC-selective DNA-binding dyes chromomycin A3 and 7-amino actinomycin D and the AT-selective dye 4',6-diamidino-2-phenylindole indicate that TP2 is preferentially localized to GC-rich sequences. Interestingly, as spermatids mature, TP2 and GC-rich DNA move toward the nuclear periphery, and in the late stages of spermatid maturation, TP2 is predominantly localized at the nuclear periphery. Another interesting observation is the mutually exclusive localization of GC- and AT-rich DNA in elongating and elongated spermatids. A combined immunofluorescence experiment with anti-TP2 and anti-TP1 antibodies revealed several foci of overlapping localization, indicating that TP1 and TP2 may have concerted functional roles during chromatin remodeling in mammalian spermiogenesis.
Abstract:
In this paper, the pattern classification problem in tool wear monitoring is solved using nature-inspired techniques such as Genetic Programming (GP) and Ant-Miner (AM). The main advantage of GP and AM is their ability to learn the underlying data relationships and express them as mathematical equations or simple rules. The knowledge extracted from the training data set takes the form of a Genetic Programming Classifier Expression (GPCE) for GP and of rules for AM. The GPCE and the AM-extracted rules are then applied to the data in the testing/validation set to obtain the classification accuracy. A major attraction of GP-evolved GPCE and AM-based classification is the possibility of obtaining expert-system-like rules that the user can subsequently apply directly in his/her application. The classification accuracy obtained using GP and AM is as good as that obtained in the earlier study.
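Applying extracted rules to a test set and scoring them can be sketched as follows. The sketch assumes the rules are stored as an ordered IF-THEN list with a default class, the form in which Ant-Miner typically reports its rules. The condition format, the feature name `f1`, the thresholds, and the class labels are hypothetical. A GPCE would instead be evaluated as an expression whose output is thresholded.

```python
import operator
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]
Condition = Tuple[str, str, float]        # (feature, "<=" or ">", threshold)
Rule = Tuple[List[Condition], str]        # (conjunction of conditions, predicted class)

OPS = {"<=": operator.le, ">": operator.gt}

def matches(sample: Sample, conditions: List[Condition]) -> bool:
    """True if every condition in the rule antecedent holds for the sample."""
    return all(OPS[op](sample[feature], threshold) for feature, op, threshold in conditions)

def classify(sample: Sample, rules: Sequence[Rule], default: str) -> str:
    """Ordered rule list: the first rule whose antecedent holds decides."""
    for conditions, label in rules:
        if matches(sample, conditions):
            return label
    return default

def accuracy(test_set: Sequence[Tuple[Sample, str]], rules: Sequence[Rule], default: str) -> float:
    """Fraction of test samples whose predicted class equals the true class."""
    hits = sum(classify(x, rules, default) == y for x, y in test_set)
    return hits / len(test_set)

# Hypothetical rule and test data.
rules = [([("f1", ">", 0.3)], "worn")]
test_set = [({"f1": 0.5}, "worn"), ({"f1": 0.1}, "fresh"), ({"f1": 0.35}, "fresh")]
acc = accuracy(test_set, rules, default="fresh")
```

Here two of the three test samples are classified correctly, so `acc` is 2/3. Because each rule is a readable conjunction of threshold tests, the same list can be handed to a user as the expert-system-like rules the abstract describes.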
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record the activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns in vast amounts of symbolic time series data. In this paper we show that frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of frequent episodes discovered from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies, we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
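Frequent episode discovery rests on counting how often a candidate episode occurs. The sketch below counts non-overlapped occurrences of one serial episode A -> B -> ... The time delay between each pair of consecutive events is constrained to an interval lo < delay <= hi, as an inter-spike delay constraint might be. The interval form of the constraint, the non-overlapped frequency measure, and the function name are assumptions, not necessarily the paper's exact definitions or algorithm. Parallel episodes are not covered.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float]   # (event type, time of occurrence)

def count_serial_episode(events: Sequence[Event], episode: Sequence[str],
                         gaps: Sequence[Tuple[float, float]]) -> int:
    """Count non-overlapped occurrences of episode[0] -> episode[1] -> ...,
    where the delay between consecutive events satisfies lo < delay <= hi
    for the matching (lo, hi) in gaps."""
    assert len(gaps) == len(episode) - 1 and all(lo >= 0 for lo, _ in gaps)
    k = len(episode)
    # partial[j] holds end times of partial occurrences matching episode[:j + 1].
    partial: List[List[float]] = [[] for _ in range(k)]
    count, last_end = 0, float("-inf")
    for etype, t in sorted(events, key=lambda e: e[1]):
        if t <= last_end:
            continue                          # would overlap the occurrence just counted
        for j in range(k - 1, 0, -1):         # deepest stage first
            if etype != episode[j]:
                continue
            lo, hi = gaps[j - 1]
            # Drop partial occurrences whose delay window has already expired.
            partial[j - 1] = [s for s in partial[j - 1] if t - s <= hi]
            if any(lo < t - s for s in partial[j - 1]):
                partial[j].append(t)
        if etype == episode[0]:
            partial[0].append(t)
        if partial[k - 1]:
            # Earliest-ending occurrence found: count it and restart after it,
            # which maximizes the number of non-overlapped occurrences.
            count += 1
            last_end = t
            partial = [[] for _ in range(k)]
    return count
```

For example, with the events A@1, B@3, A@4, B@12, A@20, B@24 and the constraint 0 < delay <= 5 for A -> B, two occurrences are counted. The pair A@4, B@12 is rejected because its delay of 8 exceeds the window. A discovery loop would run such a counter for every candidate episode and keep those whose count exceeds a frequency threshold.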
Abstract:
The role of the Acidithiobacillus group of bacteria in acid generation and heavy metal dissolution was studied with relevance to some Indian mines. Microorganisms implicated in acid generation, such as Acidithiobacillus thiooxidans and Leptospirillum ferrooxidans, were isolated from abandoned mines, waste rocks and tailing dumps. Arsenite-oxidizing bacteria of the Thiomonas and Bacillus groups were isolated, and their ability to oxidize As(III) to As(V) was established. Sulfate-reducing bacteria isolated from the mines were used to remove dissolved copper, zinc, iron and arsenic from solutions.
Abstract:
Single-stranded DNA-binding proteins (SSB) play an important role in most aspects of DNA metabolism, including DNA replication, repair, and recombination. We report here the identification and characterization of the SSB proteins of Mycobacterium smegmatis and Mycobacterium tuberculosis. Sequence comparison revealed that M. smegmatis SSB is homologous to M. tuberculosis SSB, except for a small spacer connecting the larger amino-terminal domain with the extreme carboxyl-terminal tail. The purified SSB proteins of mycobacteria bound single-stranded DNA with high affinity, and their association and dissociation constants were similar to those of the prototype SSB. The proteolytic signatures of the free and bound forms of the SSB proteins showed that DNA binding was associated with structural changes in the carboxyl-terminal domain. Significantly, SSB proteins from mycobacteria displayed high affinity for cognate RecA, whereas Escherichia coli SSB did not under comparable experimental conditions. Accordingly, SSB and RecA were coimmunoprecipitated from cell lysates, further supporting an interaction between these proteins in vivo. The carboxyl-terminal domain of M. smegmatis SSB, which is not essential for interaction with ssDNA, is the site of binding of its cognate RecA. These studies provide the first evidence for stable association of eubacterial SSB proteins with their cognate RecA, suggesting that these two proteins might function together during DNA repair and/or recombination.
Abstract:
The annual cycle of rainfall over the Korean Peninsula is marked by two peaks, one in July and the other in August. Since the mid-1970s, the maximum rainfall over the Korean Peninsula has shifted from July to August. This shift in the rainfall peak was caused by a significant increase in August rainfall after the mid-1970s. The basic reason for this shift has been traced to a change in the teleconnection between the El Nino-Southern Oscillation (ENSO) and August rainfall. The relationship between August rainfall over Korea and ENSO changed from 1954-1975 (PI) to 1976-2002 (PII). The variability of August rainfall was significantly associated with sea surface temperature (SST) variation over the eastern equatorial Pacific during PI, but this relationship was absent during PII. In El Nino years during PI, low-level westerly and southerly wind anomalies were dominant around the East China Sea and were associated with heavy August rainfall. In La Nina years during PI, easterly and northerly wind anomalies were dominant. During PII, however, westerly and southerly wind anomalies around the East China Sea were responsible for the high August rainfall over the East Asian region, even though La Nina SST conditions were in effect over the eastern Pacific.