849 resultados para Incremental mining


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series having events with start times and end times, a set of allowed dwelling times and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present two online algorithms for maintaining a topological order of a directed n-vertex acyclic graph as arcs are added, and detecting a cycle when one is created. Our first algorithm handles m arc additions in O(m(3/2)) time. For sparse graphs (m/n = O(1)), this bound improves the best previous bound by a logarithmic factor, and is tight to within a constant factor among algorithms satisfying a natural locality property. Our second algorithm handles an arbitrary sequence of arc additions in O(n(5/2)) time. For sufficiently dense graphs, this bound improves the best previous bound by a polynomial factor. Our bound may be far from tight: we show that the algorithm can take Omega(n(2)2 root(2lgn)) time by relating its performance to a generalization of the k-levels problem of combinatorial geometry. A completely different algorithm running in Theta (n(2) log n) time was given recently by Bender, Fineman, and Gilbert. We extend both of our algorithms to the maintenance of strong components, without affecting the asymptotic time bounds.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an experimental study on damage assessment of reinforced concrete (RC) beams subjected to incremental cyclic loading. During testing acoustic emissions (AEs) were recorded. The analysis of the AE released was carried out by using parameters relaxation ratio, load ratio and calm ratio. Digital image correlation (DIC) technique and tracking with available MATLAB program were used to measure the displacement and surface strains in concrete. Earlier researchers classified the damage in RC beams using Kaiser effect, crack mouth opening displacement and proposed a standard. In general (or in practical situations), multiple cracks occur in reinforced concrete beams. In the present study damage assessment in RC beams was studied according to different limit states specified by the code of practice IS-456:2000 and AE technique. Based on the two ratios namely load ratio and calm ratio and when the deflection reached approximately 85% of the maximum allowable deflection it was observed that the RC beams were heavily damaged. The combination of AE and DIC techniques has the potential to provide the state of damage in RC structures.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X. Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The rapid growth in the field of data mining has lead to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper primarily intends to develop a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining installed capacities for setting up distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power, minimizing the cost of transportation of biomass from dispersed sources to power generation system, and cost of distribution of electricity from the power generation system to demand centers or villages. The methodology was validated by using it for developing an optimal plan for implementing distributed biomass-based power systems for meeting the rural electricity needs of Tumkur district in India consisting of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm for the entire search space for different values of k along with demand-supply matching constraints. The optimal value of the k is chosen such that it minimizes the total cost of system installation, costs of transportation of biomass, and transmission and distribution. A smaller region, consisting of 293 villages was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mycobacterium tuberculosis owes its high pathogenic potential to its ability to evade host immune responses and thrive inside the macrophage. The outcome of infection is largely determined by the cellular response comprising a multitude of molecular events. The complexity and inter-relatedness in the processes makes it essential to adopt systems approaches to study them. In this work, we construct a comprehensive network of infection-related processes in a human macrophage comprising 1888 proteins and 14,016 interactions. We then compute response networks based on available gene expression profiles corresponding to states of health, disease and drug treatment. We use a novel formulation for mining response networks that has led to identifying highest activities in the cell. Highest activity paths provide mechanistic insights into pathogenesis and response to treatment. The approach used here serves as a generic framework for mining dynamic changes in genome-scale protein interaction networks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In today's API-rich world, programmer productivity depends heavily on the programmer's ability to discover the required APIs. In this paper, we present a technique and tool, called MATHFINDER, to discover APIs for mathematical computations by mining unit tests of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code to compute the expression by mapping its subexpressions to API method calls. For each subexpression, MATHFINDER searches for a method such that there is a mapping between method inputs and variables of the subexpression. The subexpression, when evaluated on the test inputs of the method under this mapping, should produce results that match the method output on a large number of tests. We implemented MATHFINDER as an Eclipse plugin for discovery of third-party Java APIs and performed a user study to evaluate its effectiveness. In the study, the use of MATHFINDER resulted in a 2x improvement in programmer productivity. In 96% of the subexpressions queried for in the study, MATHFINDER retrieved the desired API methods as the top-most result. The top-most pseudo-code snippet to implement the entire expression was correct in 93% of the cases. Since the number of methods and unit tests to mine could be large in practice, we also implement MATHFINDER in a MapReduce framework and evaluate its scalability and response time.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs both during an initial coding phase, as well as during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code comprised of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library. Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems data is available amongst multiple parties collaborating to achieve cumulative mining accuracy. The vertically partitioned data available with the parties involved cannot provide accurate mining results when compared to the collaborative mining results. To overcome the privacy issue in data disclosure this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the publication of local association rules generated by the parties is published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets established are used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4. 5 based KDLPPDM system and the CS. 0 based KDLPPDM system using receiver operating characteristics curves (ROC).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Online Social Networks (OSNs) facilitate to create and spread information easily and rapidly, influencing others to participate and propagandize. This work proposes a novel method of profiling Influential Blogger (IB) based on the activities performed on one's blog documents who influences various other bloggers in Social Blog Network (SBN). After constructing a social blogging site, a SBN is analyzed with appropriate parameters to get the Influential Blog Power (IBP) of each blogger in the network and demonstrate that profiling IB is adequate and accurate. The proposed Profiling Influential Blogger (PIB) Algorithm survival rate of IB is high and stable. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A general incremental micromechanical scheme for the nonlinear behavior of particulate composites is presented in this paper. The advantage of this scheme is that it can reflect partly the effects of the third invariant of the stress on the overall mechanical behavior of nonlinear composites. The difficulty involved is the determination of the effective compliance tensors of the anisotropic multiphase composites. This is completed by making use of the generalized self-consistent Mori-Tanaka method which was recently developed by Dai et al. (Polymer Composites 19(1998) 506-513; Acta Mechanica Solida 18 (1998) 199-208). Comparison with existing theoretical and numerical results demonstrates that the present incremental scheme is quite satisfactory. Based on this incremental scheme, the overall mechanical behavior of a hard-particle reinforced metal matrix composite with progressive particle debonding damage is investigated.