55 results for frequency based knowledge discovery
at Indian Institute of Science - Bangalore - India
Abstract:
Time-frequency analysis of various simulated and experimental signals due to elastic wave scattering from damage is performed using the wavelet transform (WT) and the Hilbert-Huang transform (HHT), and their performances are compared in the context of quantifying damage. The spectral finite element method is employed for numerical simulation of wave scattering. An analytical study is carried out to examine the effects of higher-order damage parameters on the wave reflected from a damage. Based on this study, error bounds are computed for the signals in the spectral as well as the time-frequency domains. It is shown how such an error bound can provide an estimate of the error in modelling wave propagation in a structure with damage. Measures of damage based on WT and HHT are derived to quantify the damage information hidden in the signal. The aim of this study is to obtain detailed insights into the problems of (1) identifying localised damage, (2) dispersion of multifrequency non-stationary signals after they interact with various types of damage, and (3) quantifying the damage. Sensitivity analysis of the scattered-wave signal based on its time-frequency representation helps to correlate the variation of the damage index measures with damage parameters such as damage size and material degradation factors.
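As a purely illustrative sketch of the kind of time-frequency damage measure described above (the specific WT- and HHT-based indices derived in the paper are not reproduced here), a wavelet-based damage index is often taken as the energy of the scattered signal's wavelet coefficients normalized by that of the incident or baseline signal:

\[
  D_{\mathrm{WT}} \;=\; \frac{\sum_{a,b} \lvert W_{\mathrm{scat}}(a,b)\rvert^{2}}{\sum_{a,b} \lvert W_{\mathrm{inc}}(a,b)\rvert^{2}},
\]

where \(W(a,b)\) denotes the wavelet coefficient at scale \(a\) and translation \(b\); an analogous index can be built from the HHT marginal spectrum. This is a generic energy-ratio construction, not the exact measure of the study.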
Abstract:
Interest in low bit-rate video coding has increased considerably. Despite rapid progress in storage density and digital communication system performance, demand for data-transmission bandwidth and storage capacity continues to exceed the capabilities of available technologies. The growth of data-intensive digital audio and video applications, and the increased use of bandwidth-limited media such as video conferencing and full-motion video, have not only sustained the need for efficient ways to encode analog signals but also made signal compression central to digital communication and data-storage technology. In this paper we explore techniques for compression of image sequences in a manner that optimizes the results for the human receiver. We propose a new motion estimator using two novel block match algorithms based on human perception. Simulations with image sequences have shown an improved bit rate while maintaining image quality when compared to conventional motion estimation techniques using the MAD block match criterion.
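For reference, the conventional MAD (mean absolute difference) block match criterion that the proposed perceptual estimator is compared against can be sketched as a full-search block matcher; the function and parameter names below are illustrative, not taken from the paper.

    import numpy as np

    def mad(block, candidate):
        """Mean absolute difference between two equally sized blocks."""
        return np.mean(np.abs(block.astype(float) - candidate.astype(float)))

    def full_search_mad(cur, ref, y, x, block=16, radius=7):
        """Exhaustive search for the motion vector of the block at (y, x) in the
        current frame, minimizing MAD over a +/- radius window in the reference frame."""
        b = cur[y:y + block, x:x + block]
        best_cost, best_mv = np.inf, (0, 0)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy and yy + block <= ref.shape[0] and 0 <= xx and xx + block <= ref.shape[1]:
                    cost = mad(b, ref[yy:yy + block, xx:xx + block])
                    if cost < best_cost:
                        best_cost, best_mv = cost, (dy, dx)
        return best_mv, best_cost

A perception-based matcher, as proposed in the paper, would replace the plain MAD cost with a perceptually weighted criterion.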
Abstract:
Service discovery is vital in ubiquitous applications, where a large number of devices and software components collaborate unobtrusively and provide numerous services without user intervention. Existing service discovery schemes use a service matching process in order to offer services of interest to the users. Potentially, the context information of the users and the surrounding environment can be used to improve the quality of service matching. To make use of context information in service matching, a service discovery technique needs to address certain challenges. Firstly, the context information must have an unambiguous representation. Secondly, the devices in the environment must be able to disseminate high-level and low-level context information seamlessly across different networks. Thirdly, the dynamic nature of the context information must be taken into account. We propose a C-IOB (Context-Information, Observation and Belief) based service discovery model which deals with the above challenges by processing the context information and formulating beliefs based on observations. With these formulated beliefs, the required services are provided to the users. The method has been tested with a typical ubiquitous museum guide application over different cases. The simulation results show that the method is time efficient and are quite encouraging.
Abstract:
Frequent episode discovery is a popular framework for mining data available as a long sequence of events. An episode is essentially a short ordered sequence of event types, and the frequency of an episode is some suitable measure of how often the episode occurs in the data sequence. Recently, we proposed a new frequency measure for episodes based on the notion of non-overlapped occurrences of episodes in the event sequence, and showed that such a definition, in addition to yielding computationally efficient algorithms, has some important theoretical properties in connecting frequent episode discovery with HMM learning. This paper presents some new algorithms for frequent episode discovery under this non-overlapped occurrences-based frequency definition. The algorithms presented here are better (by a factor of N, where N denotes the size of episodes being discovered) in terms of both time and space complexities when compared to existing methods for frequent episode discovery. We show through simulation experiments that our algorithms are very efficient. The new algorithms presented here have arguably the least possible orders of space and time complexities for the task of frequent episode discovery.
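To illustrate the non-overlapped occurrences-based frequency, here is a minimal sketch of the counting idea for a single serial episode (the paper's algorithms handle many candidate episodes simultaneously): a greedy left-to-right scan with one automaton, which counts an occurrence only when the episode completes and then resets, so counted occurrences never share events.

    def count_nonoverlapped(event_stream, serial_episode):
        """Count non-overlapped occurrences of a serial episode
        (an ordered tuple of event types) in an event stream.
        Greedy left-to-right scan: reset after each completed occurrence."""
        count, state = 0, 0
        for event_type in event_stream:
            if event_type == serial_episode[state]:
                state += 1
                if state == len(serial_episode):  # episode completed
                    count += 1
                    state = 0  # reset so the next occurrence shares no events
        return count

    # Example: episode (A, B, C) occurs twice (non-overlapped) in this stream.
    print(count_nonoverlapped("AXBYCABZC", ("A", "B", "C")))  # -> 2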
Abstract:
This paper presents a novel Second Order Cone Programming (SOCP) formulation for large-scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with high probability. This leads to a large-margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well to large datasets when compared to state-of-the-art classifiers such as Support Vector Machines (SVMs). Experiments on real-world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in terms of training time while achieving similar accuracies.
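As a hedged illustration of how a Chebyshev-Cantelli bound yields a second order cone constraint (the exact formulation in the paper may differ in details), consider a cluster with label \(y_j\), mean \(\mu_j\) and spherical covariance \(\sigma_j^2 I\). Requiring every point of the cluster to be classified correctly with probability at least \(\eta\) leads to a constraint of the form

\[
  y_j\left(w^{\top}\mu_j + b\right) \;\ge\; 1 + \kappa\,\sigma_j\,\lVert w\rVert_2,
  \qquad \kappa = \sqrt{\frac{\eta}{1-\eta}},
\]

which is a second order cone constraint in \((w, b)\). One such constraint per cluster, rather than per training point, is what keeps the formulation small.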
Abstract:
Among the many different objectives of large scale structural genomics projects are expanding the protein fold space, enhancing understanding of a model or disease-related organism, and providing foundations for structure-based drug discovery. Systematic analysis of protein structures of Mycobacterium tuberculosis has been ongoing towards meeting some of these objectives. Indian participation in these efforts has been enthusiastic and substantial. The proteins of M. tuberculosis chosen for structural analysis by the Indian groups span almost all the functional categories. The structures determined by the Indian groups have led to significant improvement in the biochemical knowledge on these proteins and consequently have started providing useful insights into the biology of M. tuberculosis. Moreover, these structures form starting points for inhibitor design studies, early results of which are encouraging. The progress made by Indian structural biologists in determining structures of M. tuberculosis proteins is highlighted in this review. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
The problem of on-line recognition and retrieval of relatively weak industrial signals such as partial discharges (PD), buried in excessive noise, is addressed in this paper. The major bottleneck is the recognition and suppression of stochastic pulsive interference (PI), owing to the overlapping broadband frequency spectra of PI and PD pulses; as a result, on-line, on-site PD measurement is hardly possible with conventional frequency-based DSP techniques. The observed PD signal is modeled as a linear combination of systematic and random components employing probabilistic principal component analysis (PPCA), and the pdf of the underlying stochastic process is obtained. The PD/PI pulses are taken as the mean of the process and modeled using non-parametric methods based on smooth FIR filters, with a maximum a posteriori probability (MAP) procedure employed to estimate the filter coefficients. The classification of the pulses is undertaken using a simple PCA classifier. The methods proposed by the authors were found to be effective in automatic retrieval of PD pulses while completely rejecting PI.
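For readers unfamiliar with PPCA, the underlying generative model (stated here in its standard textbook form, not necessarily in the paper's notation) expresses each observed signal vector as a low-dimensional systematic component plus isotropic noise around a mean pulse:

\[
  x \;=\; \mu + W z + \varepsilon,
  \qquad z \sim \mathcal{N}(0, I_q),
  \quad \varepsilon \sim \mathcal{N}(0, \sigma^{2} I),
  \qquad\text{so that}\qquad
  x \sim \mathcal{N}\!\left(\mu,\; W W^{\top} + \sigma^{2} I\right),
\]

where \(\mu\) plays the role of the PD/PI pulse shape, the columns of \(W\) span the systematic (correlated) subspace, and \(\sigma^{2}\) absorbs the remaining random component.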
Abstract:
We address the problem of recognition and retrieval of relatively weak industrial signals, such as partial discharges (PD), buried in excessive noise. The major bottleneck is the recognition and suppression of stochastic pulsive interference (PI), which has time-frequency characteristics similar to those of PD pulses; conventional frequency-based DSP techniques are therefore not useful in retrieving PD pulses. We employ statistical signal modeling based on a combination of a long-memory process and probabilistic principal component analysis (PPCA). A parametric analysis of the signal is carried out to extract the features of the desired pulses. We incorporate a wavelet-based bootstrap method for obtaining the noise training vectors from observed data. The procedure adopted in this work is completely different from that reported in the literature, which is generally based on the desired signal frequency and the noise frequency.
Abstract:
Frequent episode discovery is a popular framework for temporal pattern discovery in event streams. An episode is a partially ordered set of nodes, with each node associated with an event type. Currently, algorithms exist for episode discovery only when the associated partial order is a total order (serial episode) or trivial (parallel episode). In this paper, we propose efficient algorithms for discovering frequent episodes with unrestricted partial orders when the associated event types are unique. These algorithms can easily be specialized to discover only serial or parallel episodes. The algorithms are also flexible enough to be specialized for mining in the space of certain interesting subclasses of partial orders. We point out that frequency alone is not a sufficient measure of interestingness in the context of partial order mining. We propose a new interestingness measure for episodes with unrestricted partial orders which, when used along with frequency, results in an efficient scheme of data mining. Simulations are presented to demonstrate the effectiveness of our algorithms.
Abstract:
Users can rarely reveal their information need in full detail to a search engine within 1-2 words, so search engines need to "hedge their bets" and present diverse results within the precious 10 response slots. Diversity in ranking is of much recent interest. Most existing solutions estimate the marginal utility of an item given a set of items already in the response, and then use variants of greedy set cover. Others design graphs with the items as nodes and choose diverse items based on visit rates (PageRank). Here we introduce a radically new and natural formulation of diversity as finding centers in resistive graphs. Unlike in PageRank, we do not specify the edge resistances (equivalently, conductances) and ask for node visit rates. Instead, we look for a sparse set of center nodes so that the effective conductance from the center to the rest of the graph has maximum entropy. We give a cogent semantic justification for turning PageRank thus on its head. In marked deviation from prior work, our edge resistances are learnt from training data. Inference and learning are NP-hard, but we give practical solutions. In extensive experiments with subtopic retrieval, social network search, and document summarization, our approach convincingly surpasses recently published diversity algorithms like subtopic cover, max-marginal relevance (MMR), Grasshopper, DivRank, and SVMdiv.
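For comparison, the classical max-marginal relevance (MMR) criterion mentioned above (stated here in its usual textbook form) greedily picks the next document to balance relevance to the query against similarity to documents already selected:

\[
  \mathrm{MMR} \;=\; \arg\max_{D_i \in R \setminus S}
  \left[ \lambda\, \mathrm{Sim}_1(D_i, Q) \;-\; (1-\lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \right],
\]

where R is the retrieved set, S the documents already chosen, and \(\lambda\) trades off relevance against diversity. The resistive-graph formulation proposed in the paper replaces this greedy marginal-utility view with center selection under learned edge conductances.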
Abstract:
In this paper, we develop a game theoretic approach for clustering features in a learning problem. Feature clustering can serve as an important preprocessing step in many problems such as feature selection and dimensionality reduction. In this approach, we view features as rational players of a coalitional game in which they form coalitions (or clusters) among themselves in order to maximize their individual payoffs. We show how a Nash Stable Partition (NSP), a well known concept in coalitional game theory, provides a natural way of clustering features. Through this approach, one can obtain some desirable properties of the clusters by choosing appropriate payoff functions. For a small number of features, the NSP-based clustering can be found by solving an integer linear program (ILP). However, for a large number of features, the ILP-based approach does not scale well and hence we propose a hierarchical approach. Interestingly, a key result that we prove on the equivalence between a k-size NSP of a coalitional game and a minimum k-cut of an appropriately constructed graph comes in handy for large scale problems. In this paper, we use the feature selection problem (in a classification setting) as a running example to illustrate our approach. We conduct experiments to illustrate the efficacy of our approach.
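For context, the standard definition of Nash stability in hedonic coalition formation games (stated here generically; the paper's specific payoff functions for features are not reproduced) is that no player can strictly gain by unilaterally moving to another coalition or going alone:

\[
  \Pi \text{ is Nash stable} \;\iff\; \forall\, i:\;
  v_i\big(\Pi(i)\big) \;\ge\; v_i\big(C \cup \{i\}\big)
  \quad \forall\, C \in \Pi \cup \{\emptyset\},
\]

where \(\Pi(i)\) is the coalition containing player (feature) \(i\) and \(v_i(\cdot)\) is \(i\)'s payoff from belonging to a coalition.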
Abstract:
Rich data bearing on the structural and evolutionary principles of protein-protein interactions are paving the way to a better understanding of the regulation of function in the cell. This is particularly the case when these interactions are considered in the framework of key pathways. Knowledge of the interactions may provide insights into the mechanisms of crucial 'driver' mutations in oncogenesis. They also provide the foundation for the design of protein-protein interfaces and inhibitors that can abrogate their formation or enhance them. The main features to learn from known 3-D structures of protein-protein complexes, and from the extensive literature which analyzes them computationally and experimentally, include the interaction details which permit undertaking structure-based drug discovery, the evolution of complexes and their interactions, the consequences of alterations such as post-translational modifications, ligand binding, disease-causing mutations, host-pathogen interactions, oligomerization and aggregation, and the roles of disorder, dynamics, allostery and more for the protein and the cell. This review highlights some of the recent advances in these areas, including the design, inhibition and prediction of protein-protein complexes. The field is broad, and much work has been carried out in these areas, making it challenging to cover it in its entirety. Much of this progress is due to the fast increase in the number of molecules whose structures have been determined experimentally and the vast increase in computational power. Here we provide a concise overview. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
In this paper we consider the task of prototype selection whose primary goal is to reduce the storage and computational requirements of the Nearest Neighbor classifier while achieving better classification accuracies. We propose a solution to the prototype selection problem using techniques from cooperative game theory and show its efficacy experimentally.
Abstract:
A plethora of indices have been proposed and used to construct dominance hierarchies in a variety of vertebrate and invertebrate societies, although the rationale for choosing a particular index for a particular species is seldom explained. In this study, we analysed and compared three such indices, viz Clutton-Brock et al.'s index (CBI), originally developed for red deer, Cervus elaphus, David's score (DS) originally proposed by the statistician H. A. David and the frequency-based index of dominance (FDI) developed and routinely used by our group for the primitively eusocial wasps Ropalidia marginata and Ropalidia cyathiformis. Dominance ranks attributed by all three indices were strongly and positively correlated for both natural data sets from the wasp colonies and for artificial data sets generated for the purpose. However, the indices differed in their ability to yield unique (untied) ranks in the natural data sets. This appears to be caused by the presence of noninteracting individuals and reversals in the direction of dominance in some of the pairs in the natural data sets. This was confirmed by creating additional artificial data sets with noninteracting individuals and with reversals. Based on the criterion of yielding the largest proportion of unique ranks, we found that FDI is best suited for societies such as the wasps belonging to Ropalidia, DS is best suited for societies with reversals and CBI remains a suitable index for societies such as red deer in which multiple interactions are uncommon. (C) 2009 The Association for the Study of Animal Behaviour. Published by Elsevier Ltd. All rights reserved.
Abstract:
It is important to identify the "correct" number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as this determines the quality of the features that are presented to classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors M1 and M2, as given by C_{d x w} = M1_{d x t} x M2_{t x w}, where d is the number of documents in the corpus and w is the size of the vocabulary. The quality of the split depends on t, the number of topics chosen. The measure is computed in terms of the symmetric KL-divergence of salient distributions derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics; this is shown by a 'dip' at the right value of t.
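As a rough sketch of how such a symmetric KL-divergence measure can be computed from the two factors (this follows one plausible reading of the construction, in which the salient distributions are the normalized singular values of the topic-word factor and the document-length-weighted topic proportions from the document-topic factor; the exact distributions used in the paper may differ), the computation could look like:

    import numpy as np

    def symmetric_kl(p, q, eps=1e-12):
        """Symmetric KL-divergence between two discrete distributions."""
        p = p / p.sum()
        q = q / q.sum()
        return float(np.sum(p * np.log((p + eps) / (q + eps))) +
                     np.sum(q * np.log((q + eps) / (p + eps))))

    def topic_split_divergence(M1, M2, doc_lengths):
        """M1: d x t document-topic factor, M2: t x w topic-word factor,
        doc_lengths: length-d vector of document lengths.
        Scanning this value over t and looking for a dip suggests the
        natural number of topics."""
        # Distribution 1: singular values of the topic-word factor.
        sv = np.linalg.svd(M2, compute_uv=False)
        # Distribution 2: document-length-weighted topic proportions.
        topic_mass = doc_lengths @ M1
        return symmetric_kl(np.sort(sv)[::-1], np.sort(topic_mass)[::-1])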