67 resultados para Graph mining
em Indian Institute of Science - Bangalore - Índia
Resumo:
Users can rarely reveal their information need in full detail to a search engine within 1--2 words, so search engines need to "hedge their bets" and present diverse results within the precious 10 response slots. Diversity in ranking is of much recent interest. Most existing solutions estimate the marginal utility of an item given a set of items already in the response, and then use variants of greedy set cover. Others design graphs with the items as nodes and choose diverse items based on visit rates (PageRank). Here we introduce a radically new and natural formulation of diversity as finding centers in resistive graphs. Unlike in PageRank, we do not specify the edge resistances (equivalently, conductances) and ask for node visit rates. Instead, we look for a sparse set of center nodes so that the effective conductance from the center to the rest of the graph has maximum entropy. We give a cogent semantic justification for turning PageRank thus on its head. In marked deviation from prior work, our edge resistances are learnt from training data. Inference and learning are NP-hard, but we give practical solutions. In extensive experiments with subtopic retrieval, social network search, and document summarization, our approach convincingly surpasses recently-published diversity algorithms like subtopic cover, max-marginal relevance (MMR), Grasshopper, DivRank, and SVMdiv.
Resumo:
Query suggestion is an important feature of the search engine with the explosive and diverse growth of web contents. Different kind of suggestions like query, image, movies, music and book etc. are used every day. Various types of data sources are used for the suggestions. If we model the data into various kinds of graphs then we can build a general method for any suggestions. In this paper, we have proposed a general method for query suggestion by combining two graphs: (1) query click graph which captures the relationship between queries frequently clicked on common URLs and (2) query text similarity graph which finds the similarity between two queries using Jaccard similarity. The proposed method provides literally as well as semantically relevant queries for users' need. Simulation results show that the proposed algorithm outperforms heat diffusion method by providing more number of relevant queries. It can be used for recommendation tasks like query, image, and product suggestion.
Resumo:
A desalination system is a complex multi energy domain system comprising power/energy flow across several domains such as electrical, thermal, and hydraulic. The dynamic modeling of a desalination system that comprehensively addresses all these multi energy domains is not adequately addressed in the literature. This paper proposes to address the issue of modeling the various energy domains for the case of a single stage flash evaporation desalination system. This paper presents a detailed bond graph modeling of a desalination unit with seamless integration of the power flow across electrical, thermal, and hydraulic domains. The paper further proposes a performance index function that leads to the tracking of the optimal chamber pressure giving the optimal flow rate for a given unit of energy expended. The model has been validated in steady state conditions by simulation and experimentation.
Resumo:
Bond graph is an apt modelling tool for any system working across multiple energy domains. Power electronics system modelling is usually the study of the interplay of energy in the domains of electrical, mechanical, magnetic and thermal. The usefulness of bond graph modelling in power electronic field has been realised by researchers. Consequently in the last couple of decades, there has been a steadily increasing effort in developing simulation tools for bond graph modelling that are specially suited for power electronic study. For modelling rotating magnetic fields in electromagnetic machine models, a support for vector variables is essential. Unfortunately, all bond graph simulation tools presently provide support only for scalar variables. We propose an approach to provide complex variable and vector support to bond graph such that it will enable modelling of polyphase electromagnetic and spatial vector systems. We also introduced a rotary gyrator element and use it along with the switched junction for developing the complex/vector variable's toolbox. This approach is implemented by developing a complex S-function tool box in Simulink inside a MATLAB environment This choice has been made so as to synthesise the speed of S-function, the user friendliness of Simulink and the popularity of MATLAB.
Resumo:
Flow-graph techniques are applied in this article for the analysis of an epicyclic gear train. A gear system based on this is designed and constructed for use in Numerical Control Systems.
Resumo:
A geodesic-based approach using Lamb waves is proposed to locate the acoustic emission (AE) source and damage in an isotropic metallic structure. In the case of the AE (passive) technique, the elastic waves take the shortest path from the source to the sensor array distributed in the structure. The geodesics are computed on the meshed surface of the structure using graph theory based on Dijkstra's algorithm. By propagating the waves in reverse virtually from these sensors along the geodesic path and by locating the first intersection point of these waves, one can get the AE source location. The same approach is extended for detection of damage in a structure. The wave response matrix of the given sensor configuration for the healthy and the damaged structure is obtained experimentally. The healthy and damage response matrix is compared and their difference gives the information about the reflection of waves from the damage. These waves are backpropagated from the sensors and the above method is used to locate the damage by finding the point where intersection of geodesics occurs. In this work, the geodesic approach is shown to be suitable to obtain a practicable source location solution in a more general set-up on any arbitrary surface containing finite discontinuities. Experiments were conducted on aluminum specimens of simple and complex geometry to validate this new method.
Resumo:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent d evelopments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
Resumo:
The role of Acidithiobacillus group of bacteria in acid generation and heavy metal dissolution was studied with relevance to some Indian mines. Microorganisms implicated in acid generation such as Acidithiobacillus Acidithicibacillus thiooxidans and Leptospirillum ferrooxidans were isolated from abandoned mines, waste rocks and tailing dumps. Arsenite oxidizing Thiomonas and Bacillus group of bacteria were isolated and their ability to oxidize As (111) to As (V) established. Mine isolated Sulfate reducing bacteria were used to remove dissolved copper, zinc, iron and arsenic from solutions.
Resumo:
Data mining involves nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust searching and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), where parameters of population size, the number of points of crossover and mutation rate for each population are adaptively fixed. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method stating and showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification datamining problems. Michigan style of classifier was used to build the classifier and the system was tested with machine learning databases of Pima Indian Diabetes database, Wisconsin Breast Cancer database and few others. The performance of our algorithm is better than others.
Resumo:
Induction motor is a typical member of a multi-domain, non-linear, high order dynamic system. For speed control a three phase induction motor is modelled as a d–q model where linearity is assumed and non-idealities are ignored. Approximation of the physical characteristic gives a simulated behaviour away from the natural behaviour. This paper proposes a bond graph model of an induction motor that can incorporate the non-linearities and non-idealities thereby resembling the physical system more closely. The model is validated by applying the linearity and idealities constraints which shows that the conventional ‘abc’ model is a special case of the proposed generalised model.
Resumo:
In this article we study the one-dimensional random geometric (random interval) graph when the location of the nodes are independent and exponentially distributed. We derive exact results and limit theorems for the connectivity and other properties associated with this random graph. We show that the asymptotic properties of a graph with a truncated exponential distribution can be obtained using the exponential random geometric graph. © 2007 Wiley Periodicals, Inc. Random Struct. Alg., 2008.
Resumo:
We consider a multicommodity flow problem on a complete graph whose edges have random, independent, and identically distributed capacities. We show that, as the number of nodes tends to infinity, the maximumutility, given by the average of a concave function of each commodity How, has an almost-sure limit. Furthermore, the asymptotically optimal flow uses only direct and two-hop paths, and can be obtained in a distributed manner.
Resumo:
Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.
Resumo:
Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.
Resumo:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVM are used for the former while latent semantic analysis is used for the latter The techniques are demonstrated for the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.