47 results for stream mining


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
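
The windowed n-gram and perplexity computation mentioned in this abstract can be illustrated with a minimal sketch. It is not the toolkit's implementation: it uses plain dictionary counts rather than a suffix array, and the function names, the add-one smoothing, and the default window sizes are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def scan_perplexity(seq, n=3, window=20, step=10, alphabet_size=4):
    """Return (start, perplexity) per window, scored by a whole-sequence model."""
    counts = ngram_counts(seq, n)
    # Count of each (n-1)-symbol context, taken as the prefix of an n-gram.
    contexts = Counter()
    for gram, c in counts.items():
        contexts[gram[:-1]] += c

    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        log_prob, m = 0.0, 0
        for i in range(len(chunk) - n + 1):
            gram = chunk[i:i + n]
            # Add-one smoothing (an assumption) avoids zero probabilities.
            p = (counts[gram] + 1) / (contexts[gram[:-1]] + alphabet_size)
            log_prob += math.log2(p)
            m += 1
        if m:
            results.append((start, 2 ** (-log_prob / m)))
    return results
```

Under this sketch, windows with unusually low perplexity are the ones the sequence-wide model predicts well, which is one plausible way to flag repetitive regions.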

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Dynamic Voltage and Frequency Scaling (DVFS) offers a huge potential for designing trade-offs involving energy, power, temperature and performance of computing systems. In this paper, we evaluate three different DVFS schemes - our enhancement of a Petri net performance model based DVFS method for sequential programs to stream programs, a simple profile based Linear Scaling method, and an existing hardware based DVFS method for multithreaded applications - using multithreaded stream applications, in a full system Chip Multiprocessor (CMP) simulator. From our evaluation, we find that the software based methods achieve significant Energy/Throughput² (ET⁻²) improvements. The hardware based scheme degrades performance heavily and suffers ET⁻² loss. Our results indicate that the simple profile based scheme achieves the benefits of the complex Petri net based scheme for stream programs, and present a strong case for the need for independent voltage/frequency control for different cores of CMPs, which is lacking in most of the state-of-the-art CMPs. This is in contrast to the conclusions of a recent evaluation of per-core DVFS schemes for multithreaded applications for CMPs.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X -> Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
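
To make the notion of a rule restricted to a region R concrete, here is a minimal sketch that computes the support and confidence of one rule inside an axis-aligned box. It is not TOARM, CellUnion, or IceCube: the data layout, the function names, and the choice of a box as the convex region are assumptions for illustration.

```python
def in_region(dims, region):
    """True if every constrained dimension falls inside its [lo, hi] range."""
    return all(lo <= dims[name] <= hi for name, (lo, hi) in region.items())


def rule_stats(transactions, x, y, region):
    """Return (support, confidence) of X -> Y over transactions inside region.

    transactions: list of (dims, items), dims a dict of dimension values,
    items a set of purchased items.
    """
    x, y = frozenset(x), frozenset(y)
    inside = [items for dims, items in transactions if in_region(dims, region)]
    n_x = sum(1 for items in inside if x <= items)
    n_xy = sum(1 for items in inside if (x | y) <= items)
    support = n_xy / len(inside) if inside else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical toy data: two dimensions (age, month) per transaction.
transactions = [
    ({"age": 25, "month": 1}, {"bread", "milk"}),
    ({"age": 30, "month": 2}, {"bread", "milk", "eggs"}),
    ({"age": 35, "month": 2}, {"bread"}),
    ({"age": 60, "month": 7}, {"bread", "milk"}),
]
region = {"age": (20, 40), "month": (1, 3)}
support, confidence = rule_stats(transactions, {"bread"}, {"milk"}, region)
```

Only the three transactions inside the box count toward the statistics, so the same rule can be strong in one region and weak in another.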

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
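
The abstract does not spell out its two ranking schemes, so the sketch below substitutes a generic attribute-value-frequency score as a stand-in for the frequency-based ranking only. It is not the proposed two-phase algorithm, and it omits the clustering phase entirely; the function names are hypothetical.

```python
from collections import Counter


def avf_scores(rows):
    """Mean frequency of each row's attribute values; low scores are unusual."""
    n_attrs = len(rows[0])
    # One frequency table per attribute (column).
    freq = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [sum(freq[j][row[j]] for j in range(n_attrs)) / n_attrs
            for row in rows]


def rank_outliers(rows, top=1):
    """Indices of the `top` rows with the lowest frequency score."""
    scores = avf_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:top]


# Hypothetical categorical records: (colour, size).
rows = [("red", "small"), ("red", "small"), ("red", "large"), ("blue", "huge")]
```

In this toy data the last record scores lowest because both of its values occur only once.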

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper primarily intends to develop a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining installed capacities for setting up distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power, minimizing the cost of transportation of biomass from dispersed sources to power generation system, and cost of distribution of electricity from the power generation system to demand centers or villages. The methodology was validated by using it for developing an optimal plan for implementing distributed biomass-based power systems for meeting the rural electricity needs of Tumkur district in India consisting of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm for the entire search space for different values of k along with demand-supply matching constraints. The optimal value of the k is chosen such that it minimizes the total cost of system installation, costs of transportation of biomass, and transmission and distribution. A smaller region, consisting of 293 villages was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.
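
The k-medoid step and the cost-based choice of k described above can be sketched as follows. This is a simplified illustration under stated assumptions: Euclidean distance stands in for biomass transport and distribution cost, `install_cost` is a hypothetical per-plant cost, the initialisation is naive, and the demand-supply matching constraints from the abstract are not modelled.

```python
import math


def transport_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids; assumes all points are distinct."""
    medoids = list(range(k))  # naive, deterministic initialisation
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the member with the least total distance becomes medoid.
        new_medoids = [
            min(members,
                key=lambda c: sum(math.dist(points[c], points[j])
                                  for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids), transport_cost(points, medoids)


def choose_k(points, k_max, install_cost):
    """Pick the k that minimises installation plus transport cost."""
    best = None
    for k in range(1, k_max + 1):
        medoids, cost = k_medoids(points, k)
        total = k * install_cost + cost
        if best is None or total < best[2]:
            best = (k, medoids, total)
    return best


# Hypothetical village coordinates forming two obvious groups.
villages = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
```

With these toy coordinates, adding a second plant removes most of the transport cost, while a third costs more to install than it saves.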

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Mycobacterium tuberculosis owes its high pathogenic potential to its ability to evade host immune responses and thrive inside the macrophage. The outcome of infection is largely determined by the cellular response comprising a multitude of molecular events. The complexity and inter-relatedness in the processes makes it essential to adopt systems approaches to study them. In this work, we construct a comprehensive network of infection-related processes in a human macrophage comprising 1888 proteins and 14,016 interactions. We then compute response networks based on available gene expression profiles corresponding to states of health, disease and drug treatment. We use a novel formulation for mining response networks that has led to identifying highest activities in the cell. Highest activity paths provide mechanistic insights into pathogenesis and response to treatment. The approach used here serves as a generic framework for mining dynamic changes in genome-scale protein interaction networks.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The current understanding of wildfire effects on water chemistry is limited by the quantification of the elemental dissolution rates from ash and element release rate from the plant litter, as well as quantification of the specific ash contribution to stream water chemistry. The main objective of the study was to provide such knowledge through combination of experimental modelling, field data and end-member mixing analysis (EMMA) of wildfire impact on a watershed scale. The study concerns watershed effects of fire in the Indian subcontinent, a region that is typically not well represented in the fire science literature. In plant litter ash, major elements are either hosted in readily-soluble phases (K, Mg) such as salts, carbonates and oxides or in less-soluble carrier-phases (Si, Ca) such as amorphous silica, quartz and calcite. Accordingly, elemental release rates, inferred from ash leaching experiments in batch reactor, indicated that the element release into solution followed the order K > Mg > Na > Si > Ca. Experiments on plant litter leaching in mixed-flow reactor indicated two dissolution regimes: rapid, over the week and slower over the month. The mean dissolution rates at steady-state (R-ss) indicated that the release of major elements from plant litter followed the order Ca > Si > Cl > Mg > K > Na. R-ss for Si and Ca for tree leaves and herbaceous species are similar to those reported for boreal and European tree species and are higher than that from the dissolution of soil clay minerals. This identifies tropical plant litters as important source of Si and Ca for tropical surface waters. In the wildfire-impacted year 2004, the EMMA indicated that the streamflow composition (Ca, K, Mg, Na, Si, Cl) was controlled by four main sources: rainwater, throughfall, ash leaching and soil solution. 
The influence of the ash end-member was maximal early in the rainy season (the first two storm events) and decreased later in the rainy season, when the stream was dominated by the throughfall end-member. The contribution of plant litter decay to the streamwater composition for a year not impacted by wildfire is significant: the estimated solute fluxes originating from this decay greatly exceed, for most major elements, the annual elemental dissolved fluxes at the Mule Hole watershed outlet. This highlighted the importance of solute retention and vegetation back uptake processes within the soil profile. Overall, the fire increased the mobility and export of major elements from the soils to the stream. It also shifted the vegetation-related contribution to the elemental fluxes at the watershed outlet from long-term (seasonal) to short-term (daily to monthly). (C) 2014 Elsevier B.V. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Reproductive modes are diverse and unique in anurans. Selective pressures of evolution, ecology and environment are attributed to such diverse reproductive modes. Globally forty different reproductive modes in anurans have been described to date. The genus Nyctibatrachus has been recently revised and belongs to an ancient lineage of frog families in the Western Ghats of India. Species of this genus are known to exhibit mountain associated clade endemism and novel breeding behaviours. The purpose of this study is to present unique reproductive behaviour, oviposition and parental care in a new species Nyctibatrachus kumbara sp. nov. which is described in the paper. Nyctibatrachus kumbara sp. nov. is a medium sized stream dwelling frog. It is distinct from the congeners based on a suite of morphological characters and substantially divergent in DNA sequences of the mitochondrial 16S rRNA gene. Males exhibit parental care by mud packing the egg clutch. Such parental care has so far not been described from any other frog species worldwide. Besides this, we emphasize that three co-occurring congeneric species of Nyctibatrachus, namely N. jog, N. kempholeyensis and Nyctibatrachus kumbara sp. nov. from the study site differ in breeding behaviour, which could represent a case of reproductive character displacement. These three species are distinct in their size, call pattern, reproductive behaviour, maximum number of eggs in a clutch, oviposition and parental care, which was evident from the statistical analysis. The study throws light on the reproductive behaviour of Nyctibatrachus kumbara sp. nov. and associated species to understand the evolution and adaptation of reproductive modes of anurans in general, and Nyctibatrachus in particular from the Western Ghats.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The shape dynamics of droplets exposed to an air jet at intermediate droplet Reynolds numbers is investigated. High speed imaging and hot-wire anemometry are employed to examine the mechanism of droplet oscillation. The theory that the vortex shedding behind the droplet induces oscillation is examined. In these experiments, no particular dominant frequency is found in the wake region of the droplet. Hence the inherent free-stream disturbances prove to be driving the droplet oscillations. The modes of droplet oscillation show a band of dominant frequencies near the corresponding natural frequency, further proving that there is no particular forcing frequency involved. In the frequency spectrum of the lowest mode of oscillation for glycerol at the highest Reynolds number, no response is observed below the threshold frequency corresponding to the viscous dissipation time scale. This selective suppression of lower frequencies in the case of glycerol is corroborated by scaling arguments. The influence of surface tension on the droplet oscillation is studied using ethanol as a test fluid. Since a lower surface tension reduces the natural frequency, ethanol shows lower excited frequencies. The oscillation levels of different fluids are quantified using the droplet aspect ratio and correlated in terms of Weber number and Ohnesorge number. (C) 2014 Elsevier Ltd. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In today's API-rich world, programmer productivity depends heavily on the programmer's ability to discover the required APIs. In this paper, we present a technique and tool, called MATHFINDER, to discover APIs for mathematical computations by mining unit tests of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code to compute the expression by mapping its subexpressions to API method calls. For each subexpression, MATHFINDER searches for a method such that there is a mapping between method inputs and variables of the subexpression. The subexpression, when evaluated on the test inputs of the method under this mapping, should produce results that match the method output on a large number of tests. We implemented MATHFINDER as an Eclipse plugin for discovery of third-party Java APIs and performed a user study to evaluate its effectiveness. In the study, the use of MATHFINDER resulted in a 2x improvement in programmer productivity. In 96% of the subexpressions queried for in the study, MATHFINDER retrieved the desired API methods as the top-most result. The top-most pseudo-code snippet to implement the entire expression was correct in 93% of the cases. Since the number of methods and unit tests to mine could be large in practice, we also implement MATHFINDER in a MapReduce framework and evaluate its scalability and response time.
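
The matching idea in this abstract (evaluate a subexpression on a method's unit-test inputs under some mapping, then compare with the recorded outputs) can be sketched as below. This is not MATHFINDER's implementation: the function name, the brute-force search over argument permutations, and the test data for a hypothetical `power(exponent, base)` method are assumptions for illustration.

```python
import math
from itertools import permutations


def match_method(expr, n_vars, tests, tol=1e-9):
    """Find the variable-to-input mapping under which expr best fits the tests.

    expr:  callable taking n_vars numeric arguments (the subexpression).
    tests: list of (inputs, output) pairs recorded from a method's unit tests.
    Returns (mapping, fraction_passed); mapping[j] is the method input index
    bound to variable j, or None if no mapping passes any test.
    """
    best = (None, 0.0)
    for perm in permutations(range(n_vars)):
        passed = 0
        for inputs, output in tests:
            if len(inputs) != n_vars:
                continue
            try:
                value = expr(*(inputs[i] for i in perm))
            except (ArithmeticError, ValueError):
                continue  # expression undefined on this input
            if math.isclose(value, output, rel_tol=tol, abs_tol=tol):
                passed += 1
        score = passed / len(tests) if tests else 0.0
        if score > best[1]:
            best = (perm, score)
    return best


# Unit tests of a hypothetical method power(exponent, base) -> base ** exponent.
power_tests = [((2, 3), 9.0), ((3, 2), 8.0), ((2, 5), 25.0)]
mapping, score = match_method(lambda x, y: x ** y, 2, power_tests)
```

Here the subexpression x ** y matches only when x is bound to the method's second input and y to its first, which is the kind of argument reordering a user would otherwise have to discover from documentation.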

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Thrust-generating flapping foils are known to produce jets inclined to the free stream at high Strouhal numbers $St = fA/U_\infty$, where $f$ is the frequency and $A$ is the amplitude of flapping and $U_\infty$ is the free-stream velocity. Our experiments, in the limiting case of $St \to \infty$ (zero free-stream speed), show that a purely oscillatory pitching motion of a chordwise flexible foil produces a coherent jet composed of a reverse Bénard-Kármán vortex street along the centreline, albeit over a specific range of effective flap stiffnesses. We obtain flexibility by attaching a thin flap to the trailing edge of a rigid NACA0015 foil; length of flap is $0.79c$ where $c$ is rigid foil chord length. It is the time-varying deflections of the flexible flap that suppress the meandering found in the jets produced by a pitching rigid foil for zero free-stream condition. Recent experiments (Marais et al., J. Fluid Mech., vol. 710, 2012, p. 659) have also shown that the flexibility increases the St at which non-deflected jets are obtained. Analysing the near-wake vortex dynamics from flow visualization and particle image velocimetry (PIV) measurements, we identify the mechanisms by which flexibility suppresses jet deflection and meandering. A convenient characterization of flap deformation, caused by fluid-flap interaction, is through a non-dimensional 'effective stiffness', $EI^* = 8EI/(\rho V_{TE\max}^2 s_f c_f^3/2)$, representing the inverse of the flap deflection due to the fluid-dynamic loading; here, $EI$ is the bending stiffness of flap, $\rho$ is fluid density, $V_{TE\max}$ is the maximum velocity of rigid foil trailing edge, $s_f$ is span and $c_f$ is chord length of the flexible flap. By varying the amplitude and frequency of pitching, we obtain a variation in $EI^*$ over nearly two orders of magnitude and show that only moderate $EI^*$ ($0.1 \lesssim EI^* \lesssim 1$) generates a sustained, coherent, orderly jet.
Relatively 'stiff' flaps ($EI^* \gtrsim 1$), including the extreme case of no flap, produce meandering jets, whereas highly 'flexible' flaps ($EI^* \lesssim 0.1$) produce spread-out jets. Obtained from the measured mean velocity fields, we present values of thrust coefficients for the cases for which orderly jets are observed.
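
As a worked illustration of the effective-stiffness definition in this abstract, the sketch below evaluates the ratio and maps it onto the three reported jet regimes. The function names and the numeric inputs are hypothetical, and the thresholds 0.1 and 1 are treated as sharp cut-offs here even though the abstract states them only approximately.

```python
def effective_stiffness(EI, rho, v_te_max, s_f, c_f):
    """Non-dimensional effective stiffness: 8 EI / (rho V^2 s_f c_f^3 / 2)."""
    return 8.0 * EI / (rho * v_te_max ** 2 * s_f * c_f ** 3 / 2.0)


def jet_regime(ei_star, low=0.1, high=1.0):
    """Classify the jet using the approximate ranges quoted in the abstract."""
    if ei_star < low:
        return "spread-out jet"
    if ei_star > high:
        return "meandering jet"
    return "orderly jet"


# Hypothetical SI values, not taken from the experiments.
ei_star = effective_stiffness(EI=2.5e-5, rho=1000.0, v_te_max=0.1,
                              s_f=0.1, c_f=0.1)
```

Because the trailing-edge velocity enters squared, changing the pitching amplitude or frequency shifts the effective stiffness quickly, consistent with the abstract's remark that these two parameters were used to span nearly two orders of magnitude.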

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs both during an initial coding phase, as well as during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code comprised of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library. 
Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The proportion of chemical elements passing through vegetation prior to being exported in a stream was quantified for a forested tropical watershed (Mule Hole, South India) using an extensive hydrological and geochemical monitoring at several scales. First, a solute annual mass balance was established at the scale of the soil-plant profile for assessing the contribution of canopy interaction and litter decay to the solute fluxes of soil inputs (overland flow) and soil outputs (pore water flow as seepages). Second, based on the respective contributions of overland flow and seepages to the stream flow as estimated by a hydrological lumped model, we assigned the proportion of chemical elements in the stream that transited through the vegetation at both flood event (End Member Mixing Analysis) and seasonal scales. At the scale of the 1D soil-plant profile, leaching from the canopy constituted the main source of K above the ground surface. Litter decay was the main source of Si, whereas alkalinity, Ca and Mg originated in the same proportions from both sources. The contribution of vegetation was negligible for Na. Within the soil, all elements but Na were removed from the pore water in proportions varying from 20% for Cl to 95% for K: The soil output fluxes corresponded to a residual fraction of the infiltration fluxes. The behavior of K, Cl, Ca and Mg in the soil-plant profile can be explained by internal cycling, as their soil output fluxes were similar to the atmospheric inputs. Na was released from soils as a result of Na-plagioclase weathering and accompanied by additional release of Si. Concentration of soil pore water by evapotranspiration might limit the chemical weathering in the soil. Overall, the solute K, Ca, Mg, alkalinity and Si fluxes associated with the vegetation turnover within the small experimental watershed represented 10-15 times the solute fluxes exported by the stream, of which 83-97% transited through the vegetation.
One important finding is that alkalinity and Si fluxes at the outlet were not linked to the "current weathering" of silicates in this watershed. These results highlight the dual effect of the vegetation cover on the solute fluxes exported from the watershed: On one hand the runoff was limited by evapotranspiration and represented only 10% of the annual rainfall, while on the other hand, 80-90% of the overall solute flux exported by the stream transited through the vegetation. The approach combining geochemical monitoring and accurate knowledge of the watershed hydrological budget provided detailed understanding of several effects of vegetation on stream fluxes: (1) evapotranspiration (limiting), (2) vertical transfer through vegetation from vadose zone to ground surface (enhancing) and (3) redistribution by throughfalls and litter decay. It provides a good basis for calibrating geochemical models and more precisely assessing the role of vegetation on soil processes. (C) 2014 Elsevier Ltd. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems, data is distributed amongst multiple parties collaborating to achieve cumulative mining accuracy. The vertically partitioned data available with the parties involved cannot provide accurate mining results when compared to the collaborative mining results. To overcome the privacy issue in data disclosure, this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the local association rules generated by the parties are published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets established are used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4.5 based KDLPPDM system and the C5.0 based KDLPPDM system using receiver operating characteristic (ROC) curves.
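
The property that makes a commutative cipher useful for combining rule sets is that two parties can encrypt in either order and obtain the same value. The sketch below demonstrates that property with modular exponentiation over a shared prime, in the style of the SRA / Pohlig-Hellman scheme. It is a stand-in, not the paper's Commutative RSA construction: the modulus, exponents, and function names are assumptions, and the toy parameters are not a security recommendation. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
import math

# Shared public modulus: the Mersenne prime 2^127 - 1 (toy choice).
P = 2 ** 127 - 1


def make_key(e):
    """Return (encrypt_exponent, decrypt_exponent) for a secret exponent e."""
    if math.gcd(e, P - 1) != 1:
        raise ValueError("exponent must be coprime to P - 1")
    return e, pow(e, -1, P - 1)


def encode_rule(rule):
    """Map a rule string to an integer in [1, P - 1] via SHA-256."""
    digest = hashlib.sha256(rule.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def apply_key(value, exponent):
    """Encrypt or decrypt by modular exponentiation."""
    return pow(value, exponent, P)


# Two parties with hypothetical secret exponents.
enc_a, dec_a = make_key(65537)
enc_b, dec_b = make_key(101)

m = encode_rule("{bread} -> {milk}")
ab = apply_key(apply_key(m, enc_a), enc_b)  # A encrypts, then B
ba = apply_key(apply_key(m, enc_b), enc_a)  # B encrypts, then A
```

Since `ab` equals `ba`, doubly encrypted rules from different parties can be compared for equality without either party revealing its key.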