119 results for Frequent Sequential Patterns
in Indian Institute of Science - Bangalore - India
Abstract:
We consider the problem of detecting statistically significant sequential patterns in multineuronal spike trains. These patterns are characterized by ordered sequences of spikes from different neurons with specific delays between spikes. We have previously proposed a data-mining scheme to efficiently discover such patterns, which occur often enough in the data. Here we propose a method to determine the statistical significance of such repeating patterns. The novelty of our approach is that we use a compound null hypothesis that not only includes models of independent neurons but also models where neurons have weak dependencies. The strength of interaction among the neurons is represented in terms of certain pair-wise conditional probabilities. We specify our null hypothesis by putting an upper bound on all such conditional probabilities. We construct a probabilistic model that captures the counting process and use this to derive a test of significance for rejecting such a compound null hypothesis. The structure of our null hypothesis also allows us to rank-order different significant patterns. We illustrate the effectiveness of our approach using spike trains generated with a simulator.
Abstract:
Nature has used the all-alpha-polypeptide backbone of proteins to create a remarkable diversity of folded structures. Sequential patterns of 20 distinct amino acids, which differ only in their side chains, determine the shape and form of proteins. Our understanding of these specific secondary structures is over half a century old and is based primarily on the fundamental elements: the Pauling alpha-helix and beta-sheet. Researchers can also generate structural diversity through the synthesis of polypeptide chains containing homologated (omega) amino acid residues, which contain a variable number of backbone atoms. However, incorporating amino acids with more atoms within the backbone introduces additional torsional freedom into the structure, which can complicate the structural analysis. Fortunately, gabapentin (Gpn), a readily available bulk drug, is an achiral beta,beta-disubstituted gamma amino acid residue that contains a cyclohexyl ring at the C-beta carbon atom, which dramatically limits the range of torsion angles that can be obtained about the flanking C-C bonds. Limiting conformational flexibility also has the desirable effect of increasing peptide crystallinity, which permits unambiguous structural characterization by X-ray diffraction methods. This Account describes studies carried out in our laboratory that establish Gpn as a valuable residue in the design of specifically folded hybrid peptide structures. The insertion of additional atoms into polypeptide backbones facilitates the formation of intramolecular hydrogen bonds whose directionality is opposite to that observed in canonical alpha-peptide helices. If hybrid structures mimic proteins and biologically active peptides, the proteolytic stability conferred by unusual backbones can be a major advantage in the area of medicinal chemistry.
We have demonstrated a variety of internally hydrogen-bonded structures in the solid state for Gpn-containing peptides, including the characterization of the C-7 and C-9 hydrogen bonds, which can lead to ribbons in homo-oligomeric sequences. In hybrid alpha gamma sequences, distinct C-12 hydrogen-bonded turn structures support formation of peptide helices and hairpins in longer sequences. Some peptides that include the Gpn residue have hydrogen-bond directionality that matches alpha-peptide helices, while others have the opposite directionality. We expect that expansion of the polypeptide backbone will lead to new classes of foldamer structures, which are thus far unknown to the world of alpha-polypeptides. The diversity of internally hydrogen-bonded structures observed in hybrid sequences containing Gpn shows promise for the rational design of novel peptide structures incorporating hybrid backbones.
Abstract:
Sequential firings with fixed time delays are frequently observed in simultaneous recordings from multiple neurons. Such temporal patterns are potentially indicative of underlying microcircuits and it is important to know when a repeatedly occurring pattern is statistically significant. These sequences are typically identified through correlation counts. In this paper we present a method for assessing the significance of such correlations. We specify the null hypothesis in terms of a bound on the conditional probabilities that characterize the influence of one neuron on another. This method of testing significance is more general than the currently available methods since under our null hypothesis we do not assume that the spiking processes of different neurons are independent. The structure of our null hypothesis also allows us to rank order the detected patterns. We demonstrate our method on simulated spike trains.
Abstract:
In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is that of counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-interleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they differ from the frequency counts of Mannila et al. [1], which count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time durations as well.
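To make the idea of counting occurrences directly (rather than counting windows) concrete, here is a minimal sketch of a non-overlapped count for a single serial episode. It is not taken from the paper: the function name `count_non_overlapped` is hypothetical, events are assumed to be instantaneous and already sorted by time, and no span or gap constraints are applied. It tracks one partial occurrence at a time and starts over as soon that occurrence completes.

```python
from typing import Hashable, Sequence


def count_non_overlapped(events: Sequence[Hashable],
                         episode: Sequence[Hashable]) -> int:
    """Count non-overlapped occurrences of a serial episode.

    Assumption (not from the abstract): `events` is a time-ordered list of
    event types and `episode` is the ordered list of event types to match.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")

    count = 0
    pos = 0  # index of the episode event we are currently waiting for
    for event in events:
        if event == episode[pos]:
            pos += 1
            if pos == len(episode):
                # A full occurrence has just completed; count it and
                # start looking for the next one from scratch.
                count += 1
                pos = 0
    return count


if __name__ == "__main__":
    stream = list("AXBCABACBC")
    print(count_non_overlapped(stream, ["A", "B", "C"]))
```

Because each occurrence is closed at the earliest possible event before the next one is started, a single pass over the sequence is enough, which is what makes this kind of count cheap compared with sliding a window over the data.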
Abstract:
Discovering patterns in temporal data is an important task in Data Mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so-called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework (and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodes that we want to discover may need to contain information regarding event durations etc. In this paper we extend Mannila et al.’s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes.
Abstract:
Frequent episode discovery is a popular framework for pattern discovery from sequential data. It has found many applications in domains like alarm management in telecommunication networks, fault analysis in manufacturing plants, prediction of user behavior in web click streams, and so on. In this paper, we address the discovery of serial episodes. In the episodes context, there have been multiple ways to quantify the frequency of an episode. Most of the current algorithms for episode discovery under various frequencies are apriori-based level-wise methods. These methods essentially perform a breadth-first search of the pattern space. However, there are currently no depth-first methods of pattern discovery in the frequent episode framework under many of the frequency definitions. In this paper, we try to bridge this gap. We provide new depth-first algorithms for serial episode discovery under non-overlapped and total frequencies. Under non-overlapped frequency, we present algorithms that can handle span and gap constraints on episode occurrences. Under total frequency, we present an algorithm that can handle a span constraint. We provide proofs of correctness for the proposed algorithms. We demonstrate the effectiveness of the proposed algorithms by extensive simulations. We also give detailed run-time comparisons with the existing apriori-based methods and illustrate scenarios under which the proposed pattern-growth algorithms perform better than their apriori counterparts. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event types are not allowed to repeat. Later it is pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulation studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated. (C) 2014 Elsevier Inc. All rights reserved.
Abstract:
A new exciting era in the study of rapidly solidified alloys has been ushered in by the discovery of a quasicrystalline phase in an Al-10%Mn alloy by Shechtman et al. (1). The fact that a quasicrystal diffracts electrons and X-rays like a single crystal provides a powerful approach for exploring the atomic configuration in these alloys. Shechtman et al. deduced the icosahedral point group symmetry exhibited by quasicrystals on the basis of a set of three electron diffraction patterns showing 5-fold, 3-fold and 2-fold axes of symmetry with appropriate angular relationships. The exotic crystallography of quasicrystals has recently been reviewed by Nelson and Halperin (2).
Abstract:
Thyristor forced-commutated AC/DC converters are useful for improving the power factor and waveform of the AC-side line current. These are controlled through pulse-width modulation (PWM) schemes for best performance. However, the 3-phase versions impose restrictions on the PWM strategies that can be implemented for excellent harmonic rejection. This paper presents new PWM control strategies for the 3-phase converters and compares them with the conventional 4-pulse PWM strategy for harmonic elimination. Finally, two new PWM strategies are shown to be the best, and oscillograms from an actual implementation are presented for them.
Abstract:
An adaptive learning scheme, based on a fuzzy approximation to the gradient descent method for training a pattern classifier using unlabeled samples, is described. The objective function defined for the fuzzy ISODATA clustering procedure is used as the loss function for computing the gradient. Learning is based on simultaneous fuzzy decision-making and estimation. It uses conditional fuzzy measures on unlabeled samples. An exponential membership function is assumed for each class, and the parameters constituting these membership functions are estimated, using the gradient, in a recursive fashion. The induced possibility of occurrence of each class is useful for estimation and is computed using 1) the membership of the new sample in that class and 2) the previously computed average possibility of occurrence of the same class. An inductive entropy measure is defined in terms of the induced possibility distribution to measure the extent of learning. The method is illustrated with relevant examples.
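For orientation, the fuzzy ISODATA objective mentioned above is usually written in the following form. The notation is an assumption, since the abstract does not give the paper's own symbols: $x_k$ are the $n$ samples, $v_i$ the $c$ class prototypes, $u_{ik}$ the membership of sample $k$ in class $i$, and $m > 1$ a fuzziness exponent (Dunn's original formulation corresponds to $m = 2$).

```latex
J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \, \lVert x_k - v_i \rVert^{2},
\qquad \text{subject to} \quad \sum_{i=1}^{c} u_{ik} = 1 \quad \text{for each } k .
```

Using this quantity as a loss means that the gradient with respect to the membership-function parameters drives the recursive updates; the exact parameterisation used in the paper cannot be recovered from the abstract.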
Abstract:
In bovines, characterization of biochemical and molecular determinants of the dominant follicle before and during different time intervals after the gonadotrophin surge requires precise identification of the dominant follicle from a follicular wave. The objectives of the present study were to standardize an experimental model in buffalo cows for accurately identifying the dominant follicle of the first wave of follicular growth and to characterize changes in follicular fluid hormone concentrations as well as expression patterns of various genes associated with the process of ovulation. From the day of estrus (day 0), animals were subjected to blood sampling and ultrasonography for monitoring circulating progesterone levels and follicular growth. On day 7 of the cycle, animals were administered a PGF2α analogue (Tiaprost Trometamol, 750 μg i.m.) followed by an injection of hCG (2000 IU i.m.) 36 h later. Circulating progesterone levels progressively increased from day 1 of the cycle to 2.26 ± 0.17 ng/ml on day 7 of the cycle, but declined significantly after PGF2α injection. A progressive increase in the size of the dominant follicle was observed by ultrasonography. The follicular fluid estradiol and progesterone concentrations in the dominant follicle were 600 ± 16.7 and 38 ± 7.6 ng/ml, respectively, before hCG injection; 24 h post-hCG injection, the concentration of estradiol decreased to 125.8 ± 25.26 ng/ml, while the concentration of progesterone increased to 195 ± 24.6 ng/ml. Inh-α and Cyp19A1 expressions in granulosa cells were maximal in the dominant follicle and declined in response to hCG treatment. Progesterone receptor, oxytocin and cyclooxygenase-2 expressions in granulosa cells, regarded as markers of ovulation, were maximal at 24 h post-hCG. The expressions of genes belonging to the superfamily of proteases were also examined; Cathepsin L expression decreased, while ADAMTS 3 and 5 expressions increased 24 h post-hCG treatment.
The results of the current study indicate that sequential treatments of PGF2α and hCG during the early estrous cycle in the buffalo cow lead to follicular growth that culminates in ovulation. The model system reported in the present study would be valuable for examining temporo-spatial changes in the periovulatory follicle immediately before and after the onset of the gonadotrophin surge.
Abstract:
This paper is a condensed version of the final report of a detailed field study of rural energy consumption patterns in six villages located west of Bangalore in the dry belt of Karnataka State in India. The study was carried out in two phases; first, a pilot study of four villages and second, the detailed study of six villages, the populations of which varied from around 350 to about 950. The pilot survey ended in late 1976, and most of the data was collected for the main project in 1977. Processing of the collected data was completed in 1980. The aim was to carry out a census survey, rather than a sample study. Hence, considerable effort was expended in production of both a suitable questionnaire, ensuring that all respondents were contacted, and devising methods which would accurately reflect the actual energy use in various energy-utilising activities. In the end, 560 households out of 578 (97%) were surveyed. The following ranking was found for the various energy sources in order of average percentage contribution to the annual total energy requirement: firewood, 81·6%; human energy, 7·7%; animal energy, 2·7%; kerosene, 2·1%; electricity, 0·6% and all other sources (rice husks, agro-wastes, coal and diesel fuel), 5·3%. In other words commercial fuels made only a small contribution to the overall energy use. It should be noted that dung cakes are not burned in this region. The average energy use pattern, sector by sector, again on a percentage basis, was as follows: domestic, 88·3%; industry, 4·7%; agriculture, 4·3%; lighting, 2·2% and transport, 0·5%. The total annual per capita energy consumption was 12·6 ± 1·2 GJ, giving an average annual household consumption of around 78·6 GJ.
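The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers from the abstract; the variable and function names are hypothetical, and the "implied household size" is an inference (household consumption divided by per capita consumption), not a value reported in the abstract.

```python
# Percentage contribution of each energy source, as quoted in the abstract.
source_share = {
    "firewood": 81.6,
    "human energy": 7.7,
    "animal energy": 2.7,
    "kerosene": 2.1,
    "electricity": 0.6,
    "all other sources": 5.3,
}

# Percentage of energy use by sector, as quoted in the abstract.
sector_share = {
    "domestic": 88.3,
    "industry": 4.7,
    "agriculture": 4.3,
    "lighting": 2.2,
    "transport": 0.5,
}


def coverage_percent(surveyed: int, total: int) -> float:
    """Share of households actually surveyed, in percent."""
    return 100.0 * surveyed / total


def implied_household_size(household_gj: float, per_capita_gj: float) -> float:
    """Assumption: household use is per capita use times persons per household."""
    return household_gj / per_capita_gj


if __name__ == "__main__":
    print(f"sources sum to {sum(source_share.values()):.1f}%")
    print(f"sectors sum to {sum(sector_share.values()):.1f}%")
    print(f"coverage: {coverage_percent(560, 578):.1f}%")
    print(f"implied persons per household: {implied_household_size(78.6, 12.6):.2f}")
```

Both breakdowns add up to 100%, the survey coverage rounds to the stated 97%, and the two consumption figures are consistent with an average household of roughly six people.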
Abstract:
A simple yet efficient method for the minimization of incompletely specified sequential machines (ISSMs) is proposed. Precise theorems are developed, as a consequence of which several compatibles can be deleted from consideration at the very first stage in the search for a minimal closed cover. Thus, the computational work is significantly reduced. Initial cardinality of the minimal closed cover is further reduced by a consideration of the maximal compatibles (MC's) only; as a result the method converges to the solution faster than the existing procedures. "Rank" of a compatible is defined. It is shown that ordering the compatibles, in accordance with their rank, reduces the number of comparisons to be made in the search for exclusion of compatibles. The new method is simple, systematic, and programmable. It does not involve any heuristics or intuitive procedures. For small- and medium-sized machines, it can be used for hand computation as well. For one of the illustrative examples used in this paper, 30 out of 40 compatibles can be ignored in accordance with the proposed rules, and only the remaining 10 compatibles need be considered for obtaining a minimal solution.