68 results for Association mining
Abstract:
Mining association rules from a large collection of databases involves two main tasks: generating large itemsets, and finding associations between the discovered large itemsets. Existing formalisms for association rules are based on a single transaction database, which is not sufficient to describe association rules in a multiple-database environment. In this paper, we give a general characterization of association rules and a framework for knowledge-based mining of multiple databases for association rules.
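The two tasks named in this abstract can be illustrated on a single database with a minimal Python sketch. It is not the paper's multi-database framework: the function names, the level-wise candidate search, and the support and confidence thresholds are assumptions made only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Task 1: return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(level)
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == size})
    return frequent


def association_rules(frequent, min_confidence):
    """Task 2: derive rules lhs -> rhs from the frequent itemsets."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[lhs] is always present.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules
```

Each transaction is expected to be a Python set of items; calling `association_rules(frequent_itemsets(transactions, 0.5), 0.9)` returns tuples of (antecedent, consequent, support, confidence).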
Abstract:
The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems, data is distributed among multiple parties that collaborate to achieve cumulative mining accuracy. The vertically partitioned data held by each party cannot, on its own, provide mining results as accurate as the collaborative mining results. To overcome the privacy issue in data disclosure, this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the local association rules generated by the parties are published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets are then used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4.5-based KDLPPDM system and the C5.0-based KDLPPDM system using receiver operating characteristic (ROC) curves.
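The property that makes a commutative cipher useful for combining rule sets is that the order in which parties encrypt does not matter. The Python sketch below shows that property with a toy exponentiation cipher over a shared prime modulus. It is an assumption-laden illustration, not the KDLPPDM protocol: the modulus `P`, the exponents, the hash-based `encode` helper, and all function names are hypothetical, and the parameters are far too small for real use.

```python
import hashlib
from math import gcd

# Shared prime modulus (the Mersenne prime 2**127 - 1); toy size, assumed for illustration.
P = 2**127 - 1


def make_key(e):
    """Return (e, d) with e*d = 1 mod (P-1), so that decryption inverts encryption."""
    if gcd(e, P - 1) != 1:
        raise ValueError("exponent must be coprime with P - 1")
    return e, pow(e, -1, P - 1)


def encode(rule):
    """Map a rule string to an integer below P (hypothetical encoding)."""
    return int.from_bytes(hashlib.sha256(rule.encode()).digest(), "big") % P


def encrypt(m, key):
    return pow(m, key[0], P)


def decrypt(c, key):
    return pow(c, key[1], P)
```

Because (m^a)^b = (m^b)^a mod P, two parties that each encrypt the other's already-encrypted rules obtain identical ciphertexts for identical rules, which lets them compare or merge rules without revealing the plaintext to each other.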
Abstract:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X -> Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
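As a rough illustration of what a targeted rule {X -> Y} R measures, the Python sketch below evaluates the support and confidence of X -> Y using only the transactions whose cell falls inside R. It does not implement TOARM, CellUnion, or IceCube; the axis-aligned box used for R (one kind of convex region), the data layout, and the function names are assumptions.

```python
def in_region(cell, region):
    """True if the cell coordinates lie inside the axis-aligned box `region`.

    `region` is a list of (low, high) bounds, one pair per dimension.
    """
    return all(lo <= c <= hi for c, (lo, hi) in zip(cell, region))


def targeted_rule_stats(transactions, x, y, region):
    """Return (support, confidence) of X -> Y restricted to `region`.

    `transactions` is a list of (cell, items) pairs, where `cell` is a tuple of
    dimension coordinates and `items` is a set of purchased items.
    """
    x, y = frozenset(x), frozenset(y)
    local = [items for cell, items in transactions if in_region(cell, region)]
    if not local:
        return 0.0, 0.0
    n_x = sum(1 for items in local if x <= items)
    n_xy = sum(1 for items in local if (x | y) <= items)
    support = n_xy / len(local)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence
```

A rule that is weak over the whole cube can be strong inside a small region, which is why the region is treated as part of the rule.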
Abstract:
The paper presents, in three parts, a new approach to improve the detection and tracking performance of a track-while-scan (TWS) radar. Part 1 presents a review of the current status. In this part, Part 2, it is shown how detection can be improved by utilising information from the tracker. A new multitarget tracking algorithm, capable of tracking manoeuvring targets in clutter, is then presented. The algorithm is specifically tailored so that the solution to the combinatorial problem presented in a companion paper can be applied. The implementation aspects are discussed and a multiprocessor architecture is identified to realise the full potential of the algorithm. Part 3 presents analytical derivations for quantitative assessment of the performance of the TWS radar system. It also shows how the performance can be optimised.
Abstract:
Molecular association of porphyrins and their metal derivatives has been recognized as one of the important properties underlying many of their biological functions. The association is classified into (i) self-aggregation, (ii) intermolecular association and (iii) intramolecular association. The presence of metal ions in the porphyrin cavity is shown to alter the magnitudes of binding constants and thermodynamic parameters of complexation. The interaction between the porphyrin unit and the acceptor is described in terms of π-π interaction. The manifestation of charge transfer states both in the ground and excited states of these complexes is shown to influence the rates of excited state electron transfer reactions. Owing to the paucity of crystal structure data, the time-averaged geometries of many of these complexes have been derived from magnetic resonance data.
Abstract:
Transition protein 1 (TP1) and TP2 replace histones during midspermiogenesis (stages 12-15) and are finally replaced by protamines. TPs play a predominant role in DNA condensation and chromatin remodeling during mammalian spermiogenesis. TP2 is a zinc metalloprotein with two novel zinc finger modules that condenses DNA in vitro in a GC-preference manner. TP2 also localizes to the nucleolus in transfected HeLa and Cos-7 cells, suggesting a GC-rich preference, even in vivo. We have now studied the localization pattern of TP2 in the rat spermatid nucleus. Colocalization studies using GC-selective DNA-binding dyes chromomycin A3 and 7-amino actinomycin D and an AT-selective dye, 4',6-diamidino-2-phenylindole, indicate that TP2 is preferentially localized to GC-rich sequences. Interestingly, as spermatids mature, TP2 and GC-rich DNA move toward the nuclear periphery, and in the late stages of spermatid maturation, TP2 is predominantly localized at the nuclear periphery. Another interesting observation is the mutually exclusive localization of GC- and AT-rich DNA in the elongating and elongated spermatids. A combined immunofluorescence experiment with anti-TP2 and anti-TP1 antibodies revealed several foci of overlapping localization, indicating that TP1 and TP2 may have concerted functional roles during chromatin remodeling in mammalian spermiogenesis.
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record the activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns in vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
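The idea of counting a serial episode under a temporal constraint can be sketched as follows. This Python example is not the paper's algorithm: it assumes that the constraint is an inter-event delay between `t_low` and `t_high` (inclusive), that occurrences are counted greedily without overlap, and that events are (type, time) pairs. All names are hypothetical.

```python
def _extend(events, episode, idx, pos, t_low, t_high):
    """events[idx] matches episode[pos]; try to match the rest of the episode.

    Returns the list of event indices of one occurrence, or None if none exists.
    """
    if pos == len(episode) - 1:
        return [idx]
    t0 = events[idx][1]
    for j in range(idx + 1, len(events)):
        etype, t = events[j]
        gap = t - t0
        if gap > t_high:
            break  # events are time-sorted, so later events are even further away
        if etype == episode[pos + 1] and gap >= t_low:
            rest = _extend(events, episode, j, pos + 1, t_low, t_high)
            if rest is not None:
                return [idx] + rest
    return None


def count_serial_episode(events, episode, t_low, t_high):
    """Greedily count non-overlapped occurrences of a serial episode.

    Consecutive events of an occurrence must be separated by a delay d with
    t_low <= d <= t_high.
    """
    events = sorted(events, key=lambda e: e[1])
    count, i = 0, 0
    while i < len(events):
        if events[i][0] == episode[0]:
            occ = _extend(events, episode, i, 0, t_low, t_high)
            if occ is not None:
                count += 1
                i = occ[-1] + 1  # resume after the end of this occurrence
                continue
        i += 1
    return count
```

In the spike-train setting, an event type would be a neuron label and an episode such as ("A", "B", "C") occurring often with short, consistent delays would hint at a chain of connections A to B to C.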
Abstract:
The role of the Acidithiobacillus group of bacteria in acid generation and heavy metal dissolution was studied with relevance to some Indian mines. Microorganisms implicated in acid generation, such as Acidithiobacillus thiooxidans and Leptospirillum ferrooxidans, were isolated from abandoned mines, waste rocks and tailing dumps. Arsenite-oxidizing Thiomonas and Bacillus groups of bacteria were isolated and their ability to oxidize As(III) to As(V) was established. Mine-isolated sulfate-reducing bacteria were used to remove dissolved copper, zinc, iron and arsenic from solutions.
Abstract:
Single-stranded DNA-binding proteins (SSB) play an important role in most aspects of DNA metabolism including DNA replication, repair, and recombination. We report here the identification and characterization of SSB proteins of Mycobacterium smegmatis and Mycobacterium tuberculosis. Sequence comparison of M. smegmatis SSB revealed that it is homologous to M. tuberculosis SSB, except for a small spacer connecting the larger amino-terminal domain with the extreme carboxyl-terminal tail. The purified SSB proteins of mycobacteria bound single-stranded DNA with high affinity, and the association and dissociation constants were similar to those of the prototype SSB. The proteolytic signatures of free and bound forms of SSB proteins disclosed that DNA binding was associated with structural changes at the carboxyl-terminal domain. Significantly, SSB proteins from mycobacteria displayed high affinity for cognate RecA, whereas Escherichia coli SSB did not under comparable experimental conditions. Accordingly, SSB and RecA were coimmunoprecipitated from cell lysates, further supporting an interaction between these proteins in vivo. The carboxyl-terminal domain of M. smegmatis SSB, which is not essential for interaction with ssDNA, is the site of binding of its cognate RecA. These studies provide the first evidence for stable association of eubacterial SSB proteins with their cognate RecA, suggesting that these two proteins might function together during DNA repair and/or recombination.
Abstract:
The annual cycle of rainfall over the Korean Peninsula is marked by two peaks: one during July and the other during August. Since the mid-1970s, the maximum rainfall over the Korean Peninsula has shifted from July to August. This shift in rainfall peak was caused by a significant increase of August rainfall after the mid-1970s. The basic reason for this shift has been traced to a change in teleconnection between El Nino-Southern Oscillation (ENSO) and August rainfall. The relationship between August rainfall over Korea and ENSO changed from 1954-1975 (PI) to 1976-2002 (PII). The variability of August rainfall was significantly associated with sea surface temperature (SST) variation over the eastern equatorial Pacific during PI, but this relationship is absent during the PII period. In El Nino years during PI, low-level westerly and southerly wind anomalies are dominant around the East China Sea, which relates to strong August rainfall. In La Nina years during PI, easterly and northerly wind anomalies are dominant. During the PII period, however, westerly and southerly wind anomalies around the East China Sea were responsible for the high August rainfall over the East Asian region, even though La Nina SST conditions were in effect over the eastern Pacific.
Abstract:
Data mining involves a nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust searching and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), where the population size, the number of crossover points and the mutation rate for each population are adaptively fixed. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method, stating and showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification data mining problems. A Michigan-style classifier was used to build the classifier, and the system was tested with machine learning databases including the Pima Indian Diabetes database, the Wisconsin Breast Cancer database and a few others. The performance of our algorithm is better than that of the others.
Abstract:
A combined base station association and power control problem is studied for the uplink of multichannel multicell cellular networks, in which each channel is used by exactly one cell (i.e., base station). A distributed association and power update algorithm is proposed and shown to converge to a Nash equilibrium of a noncooperative game. We consider network models with discrete mobiles (yielding an atomic congestion game), as well as a continuum of mobiles (yielding a population game). We find that the equilibria need not be Pareto efficient, nor need they be system optimal. To address the lack of system optimality, we propose pricing mechanisms. It is shown that these mechanisms can be implemented in a distributed fashion.
Abstract:
Peroxisome proliferator activated receptor-gamma 2 (PPARG2) is a nuclear hormone receptor of ligand-dependent transcription factor involved in adipogenesis and a molecular target of the insulin sensitizers thiazolidinediones. We addressed the question of whether the 3 variants (-1279G/A, Pro12Ala, and His478His) in the PPARG2 gene are associated with type 2 diabetes mellitus and its related traits in a South Indian population. The study subjects (1000 type 2 diabetes mellitus and 1000 normal glucose-tolerant subjects) were chosen randomly from the Chennai Urban Rural Epidemiology Study, an ongoing population-based study in southern India. The variants were screened by single-stranded conformational variant, direct sequencing, and restriction fragment length polymorphism. Linkage disequilibrium was estimated from the estimates of haplotypic frequencies. The -1279G/A, Pro12Ala, and His478His variants of the PPARG2 gene were not associated with type 2 diabetes mellitus. However, the 2-loci analyses showed that, in the presence of Pro/Pro genotype of the Pro12Ala variant, the -1279G/A promoter variant showed increased susceptibility to type 2 diabetes mellitus (odds ratio, 2.092; 95% confidence interval, 1.22-3.59; P = .008), whereas in the presence of 12Ala allele, the -1279G/A showed a protective effect against type 2 diabetes mellitus (odds ratio, 0.270; 95% confidence interval, 0.15-0.49; P < .0001). The 3-loci haplotype analysis showed that the A-Ala-T (-1279G/A-Pro12Ala-His478His) haplotype was associated with a reduced risk of type 2 diabetes mellitus (P < .0001). Although our data indicate that the PPARG2 gene variants, independently, have no association with type 2 diabetes mellitus, the 2-loci genotype analysis involving -1279G/A and Pro12Ala variants and the 3-loci haplotype analysis have shown a significant association with type 2 diabetes mellitus in this South Indian population. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.