861 results for Mega-mining


Relevance:

20.00%

Abstract:

Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.
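As a rough illustration of mining "topics" from lyrics, the sketch below implements a small collapsed Gibbs sampler for LDA in plain Python. The hyperparameters (`alpha`, `beta`), the iteration count, and the toy lyrics are assumptions made for this example and do not come from the abstract; a real study would more likely use an established LDA library.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic mixtures."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Smoothed topic proportions: each song as a mixture of "moods".
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

# Toy, made-up lyrics, already tokenized.
songs = [
    "sun smile dance happy sun".split(),
    "tears rain alone sad rain".split(),
    "happy dance smile sun".split(),
    "sad alone tears night".split(),
]
theta = lda_gibbs(songs, n_topics=2)
# Group songs by their dominant topic, a stand-in for "similar mood".
dominant = [max(range(2), key=row.__getitem__) for row in theta]
```

Songs sharing a dominant topic would then be candidates for the same mood group; whether the topics actually line up with annotated moods is what a validation set would have to establish.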

Relevance:

20.00%

Abstract:

We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y}_R, where R is a convex region in the cube and X -> Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution-time performance, especially for large and complex data cubes.
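To make the notion of a targeted rule concrete, the following sketch computes the support and confidence of a rule X -> Y restricted to a region R. It is not TOARM, CellUnion, or IceCube; the region is assumed to be an axis-aligned box (one kind of convex region), and the dimension names and transactions are invented for illustration.

```python
def in_region(dims, region):
    """True if every dimension value lies inside the region's closed interval."""
    return all(lo <= dims[name] <= hi for name, (lo, hi) in region.items())

def rule_stats(transactions, region, x, y):
    """Return (support, confidence) of X -> Y over transactions inside the region."""
    inside = [items for dims, items in transactions if in_region(dims, region)]
    if not inside:
        return 0.0, 0.0
    n_x = sum(1 for items in inside if x <= items)          # baskets containing X
    n_xy = sum(1 for items in inside if (x | y) <= items)   # baskets containing X and Y
    support = n_xy / len(inside)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence

# Each transaction: (dimension attributes, purchased items). Hypothetical data.
transactions = [
    ({"month": 1, "store": 1}, frozenset({"bread", "butter"})),
    ({"month": 1, "store": 2}, frozenset({"bread"})),
    ({"month": 2, "store": 1}, frozenset({"bread", "butter", "milk"})),
    ({"month": 5, "store": 3}, frozenset({"bread", "butter"})),
]
region = {"month": (1, 2), "store": (1, 2)}  # box-shaped region R of the cube
support, confidence = rule_stats(
    transactions, region, frozenset({"bread"}), frozenset({"butter"})
)
```

Here three transactions fall inside R and two of them contain both items, so the rule's support and confidence within R are both 2/3, even though the fourth transaction (outside R) also satisfies the rule.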

Relevance:

20.00%

Abstract:

The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data; this is followed by a ranking phase that determines the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
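One of the two ranking ideas mentioned above, ranking by attribute value frequencies, can be sketched as follows. This is a generic frequency-based score, not the paper's definition of outliers, and it leaves out the clustering-based ranking entirely; the scoring formula and the sample records are assumptions made for this example.

```python
from collections import Counter

def frequency_scores(records):
    """Mean relative frequency of each record's attribute values.

    Lower scores mean rarer value combinations, i.e. more outlier-like.
    """
    n = len(records)
    n_attrs = len(records[0])
    counts = [Counter(rec[j] for rec in records) for j in range(n_attrs)]
    return [
        sum(counts[j][rec[j]] / n for j in range(n_attrs)) / n_attrs
        for rec in records
    ]

def rank_outliers(records, top_k):
    """Indices of the top_k records with the lowest frequency score."""
    scores = frequency_scores(records)
    order = sorted(range(len(records)), key=lambda i: scores[i])
    return order[:top_k]

# Hypothetical categorical records: (colour, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]
likely_outliers = rank_outliers(records, top_k=2)
```

The scores are computed for every record in one pass, so asking for more or fewer outliers only changes how much of the sorted list is returned.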

Relevance:

20.00%

Abstract:

This paper develops a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining the installed capacities of distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power, minimizing the cost of transporting biomass from dispersed sources to the power generation system, and minimizing the cost of distributing electricity from the power generation system to demand centers or villages. The methodology was validated by using it to develop an optimal plan for implementing distributed biomass-based power systems to meet the rural electricity needs of Tumkur district in India, which consists of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm over the entire search space for different values of k along with demand-supply matching constraints. The optimal value of k is chosen such that it minimizes the total cost of system installation, biomass transportation, and transmission and distribution. A smaller region, consisting of 293 villages, was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.
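The idea of choosing k by total cost can be sketched on a toy example. The code below solves k-medoids by brute force (feasible only for a handful of points) and uses a simplified cost of `k * install_cost + cost_per_km * total distance`. The cost model, the parameter names, and the village coordinates are assumptions for illustration; the demand-supply matching constraints described above are omitted.

```python
import math
from itertools import combinations

def distance_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def best_medoids(points, k):
    """Exhaustively pick the k points that minimize the total distance."""
    return min(combinations(points, k), key=lambda ms: distance_cost(points, ms))

def choose_k(points, k_values, install_cost, cost_per_km):
    """Return (cost, k, medoids) with the lowest combined cost over k_values."""
    best = None
    for k in k_values:
        medoids = best_medoids(points, k)
        cost = k * install_cost + cost_per_km * distance_cost(points, medoids)
        if best is None or cost < best[0]:
            best = (cost, k, medoids)
    return best

# Hypothetical village coordinates in km: two well-separated groups.
villages = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
cost, k, medoids = choose_k(villages, range(1, 4), install_cost=5.0, cost_per_km=1.0)
```

With these numbers a single plant is penalized by long haul distances and a third plant costs more to install than it saves, so the search settles on one plant per group. A district-scale problem would need a heuristic k-medoids solver rather than exhaustive search.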

Relevance:

20.00%

Abstract:

Mycobacterium tuberculosis owes its high pathogenic potential to its ability to evade host immune responses and thrive inside the macrophage. The outcome of infection is largely determined by the cellular response comprising a multitude of molecular events. The complexity and inter-relatedness of these processes make it essential to adopt systems approaches to study them. In this work, we construct a comprehensive network of infection-related processes in a human macrophage comprising 1888 proteins and 14,016 interactions. We then compute response networks based on available gene expression profiles corresponding to states of health, disease and drug treatment. We use a novel formulation for mining response networks that identifies the highest activities in the cell. Highest-activity paths provide mechanistic insights into pathogenesis and response to treatment. The approach used here serves as a generic framework for mining dynamic changes in genome-scale protein interaction networks.

Relevance:

20.00%

Abstract:

In today's API-rich world, programmer productivity depends heavily on the programmer's ability to discover the required APIs. In this paper, we present a technique and tool, called MATHFINDER, to discover APIs for mathematical computations by mining unit tests of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code to compute the expression by mapping its subexpressions to API method calls. For each subexpression, MATHFINDER searches for a method such that there is a mapping between method inputs and variables of the subexpression. The subexpression, when evaluated on the test inputs of the method under this mapping, should produce results that match the method output on a large number of tests. We implemented MATHFINDER as an Eclipse plugin for discovery of third-party Java APIs and performed a user study to evaluate its effectiveness. In the study, the use of MATHFINDER resulted in a 2x improvement in programmer productivity. In 96% of the subexpressions queried for in the study, MATHFINDER retrieved the desired API methods as the top-most result. The top-most pseudo-code snippet to implement the entire expression was correct in 93% of the cases. Since the number of methods and unit tests to mine could be large in practice, we also implement MATHFINDER in a MapReduce framework and evaluate its scalability and response time.
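The matching step described above (find a mapping between method inputs and subexpression variables under which the subexpression reproduces the recorded test outputs) can be sketched as follows. This is not MATHFINDER's implementation: the test-record format, the `match_score` helper, and the two example methods are hypothetical, and all input-to-variable mappings are simply enumerated.

```python
import math
from itertools import permutations

def match_score(expr, n_vars, tests, tol=1e-9):
    """Best fraction of tests on which expr reproduces the recorded output.

    Tries every way of mapping the method's inputs to the expression's
    variables and returns (score, mapping) for the best one.
    """
    best = (0.0, None)
    n_inputs = len(tests[0][0])
    for mapping in permutations(range(n_inputs), n_vars):
        hits = 0
        for inputs, expected in tests:
            try:
                value = expr(*(inputs[i] for i in mapping))
            except (ArithmeticError, ValueError):
                continue  # the expression is undefined on this test input
            if math.isclose(value, expected, rel_tol=tol, abs_tol=tol):
                hits += 1
        score = hits / len(tests)
        if score > best[0]:
            best = (score, mapping)
    return best

# Hypothetical unit-test records per method: (inputs, expected output).
tests_by_method = {
    "hypot": [((3.0, 4.0), 5.0), ((5.0, 12.0), 13.0), ((8.0, 15.0), 17.0)],
    "pow": [((2.0, 3.0), 8.0), ((3.0, 2.0), 9.0), ((2.0, 5.0), 32.0)],
}

# Query subexpression y^x; its variable order differs from pow's argument order.
def query(x, y):
    return y ** x

ranking = sorted(
    ((match_score(query, 2, tests)[0], name) for name, tests in tests_by_method.items()),
    reverse=True,
)
pow_score, pow_mapping = match_score(query, 2, tests_by_method["pow"])
```

The mapping returned for `pow` records that the query's first variable corresponds to the method's second input and vice versa, which is the information needed to emit a correctly ordered call.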

Relevance:

20.00%

Abstract:

Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs, both during an initial coding phase and during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code composed of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library.
Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.

Relevance:

20.00%

Abstract:

Rapid and invasive urbanization has been associated with depletion of natural resources (vegetation and water resources), which in turn deteriorates the landscape structure and conditions in the local environment. Rapid increase in population due to migration from rural areas is one of the critical issues of urban growth. Urbanisation in India is drastically changing the land cover and often results in sprawl. The sprawl regions often lack basic amenities such as treated water supply, sanitation, etc. This necessitates regular monitoring and understanding of the rate of urban development in order to ensure the sustenance of natural resources. Urban sprawl is the extent of urbanization which leads to the development of urban forms with the destruction of ecology and natural landforms. The rate of change of land use and the extent of urban sprawl can be efficiently visualized and modelled with the help of geo-informatics. Knowledge of the urban area, especially the growth magnitude, shape geometry, and spatial pattern, is essential to understand the growth and characteristics of the urbanization process. Urban pattern, shape and growth can be quantified using spatial metrics. This communication quantifies the urbanisation and associated growth pattern in Delhi. Spatial data of four decades were analysed to understand land cover and land use dynamics. Further, the region was divided into 4 zones and into circles of 1 km incrementing radius to understand and quantify the local spatial changes. Results of the landscape metrics indicate that the urban center was highly aggregated and that the outskirts and the buffer regions were on the verge of aggregating urban patches. Shannon's Entropy index clearly depicted the outgrowth of sprawl areas in different zones of Delhi. (C) 2014 Elsevier Ltd. All rights reserved.
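Shannon's entropy, as commonly used in sprawl studies, measures how evenly built-up area is spread across spatial units such as the concentric circles mentioned above. The sketch below assumes the usual form H = -sum(p_i * ln p_i), where p_i is the share of built-up area in unit i, and a normalized variant H / ln(n); the sample areas are invented and are not the paper's data.

```python
import math

def shannon_entropy(built_up_by_zone):
    """H = -sum(p_i * ln p_i) over zones with non-zero built-up area."""
    total = sum(built_up_by_zone)
    h = 0.0
    for area in built_up_by_zone:
        if area > 0:
            p = area / total
            h -= p * math.log(p)
    return h

def relative_entropy(built_up_by_zone):
    """Entropy scaled to [0, 1] by dividing by its maximum, ln(n)."""
    return shannon_entropy(built_up_by_zone) / math.log(len(built_up_by_zone))

# Hypothetical built-up areas (sq. km) in four concentric rings.
compact = [100.0, 0.0, 0.0, 0.0]     # everything in the core
dispersed = [25.0, 25.0, 25.0, 25.0]  # evenly spread outwards
compact_h = relative_entropy(compact)
dispersed_h = relative_entropy(dispersed)
```

Values near 0 indicate compact development concentrated in a few units, while values near 1 (entropy close to ln n) indicate dispersed growth, which is the usual reading of sprawl under this index.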

Relevance:

20.00%

Abstract:

The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems, data is distributed amongst multiple parties collaborating to achieve cumulative mining accuracy. The vertically partitioned data available to each party cannot, on its own, provide mining results as accurate as the collaborative mining results. To overcome the privacy issue in data disclosure, this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the local association rules generated by the parties are published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets are then used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4.5 based KDLPPDM system and the C5.0 based KDLPPDM system using receiver operating characteristic (ROC) curves.
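The property that makes a commutative cipher useful for combining rule sets is that the order in which two parties apply their keys does not matter. The toy sketch below shows this for RSA-style exponentiation under a shared modulus. It is an illustration of the commutative property only, not the KDLPPDM protocol: the shared modulus, the tiny primes, the exponents, and the hashing of a rule to an integer are all assumptions, and the parameters are far too small to be secure.

```python
import hashlib
import math

# Toy shared modulus built from two Mersenne primes (insecure, illustration only).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
PHI = (P - 1) * (Q - 1)

def make_key(e):
    """Return (e, d) with e * d = 1 mod phi(N); e must be coprime to phi(N)."""
    if math.gcd(e, PHI) != 1:
        raise ValueError("exponent must be coprime to phi(N)")
    return e, pow(e, -1, PHI)

def encode_rule(rule):
    """Map a rule string to an integer below N via SHA-256 (hypothetical encoding)."""
    digest = hashlib.sha256(rule.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N

def apply_key(value, exponent):
    """Modular exponentiation; used for both encryption and decryption."""
    return pow(value, exponent, N)

e_a, d_a = make_key(65537)  # party A's exponents
e_b, d_b = make_key(257)    # party B's exponents

m = encode_rule("bread -> butter")
a_then_b = apply_key(apply_key(m, e_a), e_b)
b_then_a = apply_key(apply_key(m, e_b), e_a)
recovered = apply_key(apply_key(a_then_b, d_a), d_b)
```

Because both orders yield the same ciphertext, two parties can each encrypt the other's already-encrypted rules and compare the doubly encrypted values for equality without exchanging keys or revealing the underlying rules.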

Relevance:

20.00%

Abstract:

Online Social Networks (OSNs) make it easy to create and spread information rapidly, influencing others to participate and propagandize. This work proposes a novel method of profiling an Influential Blogger (IB), that is, a blogger who influences various other bloggers in a Social Blog Network (SBN), based on the activities performed on that blogger's blog documents. After constructing a social blogging site, the SBN is analyzed with appropriate parameters to obtain the Influential Blog Power (IBP) of each blogger in the network and to demonstrate that profiling IBs is adequate and accurate. Under the proposed Profiling Influential Blogger (PIB) algorithm, the survival rate of IBs is high and stable. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Relevance:

20.00%

Abstract:

Studying the economic impacts of climate change control policies requires the use of suitable models. This article presents a Ramsey-type Dynamic Applied General Equilibrium Model. The model implements a perfect emission permit market that guarantees an efficient and effective reduction in emissions, allowing us to calculate the minimum economic costs associated with controlling greenhouse gas emissions. It also makes the most of the data available in Spain by (1) using an energy social accounting matrix (SAM) that integrates the economic information in the Input-Output Tables with the energy information in the Energy Balances and (2) considering all emissions subject to control in addition to CO2. Dynamic MEGAs (applied general equilibrium models, from the Spanish acronym) are unprecedented in terms of their development and application in Spain, and they make it possible to investigate ex ante the effects of public policies in the medium and long term.

Relevance:

20.00%

Abstract:

Four fungal species, F71PJ Acremonium sp., F531 Cylindrocarpon sp., F542 Botrytis sp., and F964 Fusarium culmorum [Wm. G. Sm.] Sacc., were recovered from hydrilla [Hydrilla verticillata (L. f.) Royle] shoots or from soil and water surrounding hydrilla growing in ponds and lakes in Florida and shown to be capable of killing hydrilla in a bioassay. The isolates were tested singly and in combination with the leaf-mining fly, Hydrellia pakistanae (Diptera: Ephydridae), for their capability to kill or severely damage hydrilla in a bioassay.

Relevance:

20.00%

Abstract:

Progress report from the Mining Biodiversity project, Digging into Data Challenge round 3.

Relevance:

20.00%

Abstract:

In order to obtain the distribution rules of in situ stress and mining-induced stress at Beiminghe Iron Mine, the stress relief method by overcoring was used to measure the in situ stress, and the MC type bore-hole stress gauge was adopted to measure the mining-induced stress. For the in situ stress measurement, improved hollow inclusion cells were used, which achieve complete temperature compensation. Based on the measurement results, the distribution model of in situ stress was established and analyzed. The in situ stress measurements show that the maximum horizontal stress is 1.75-2.45 times the vertical stress and almost 1.83 times the minimum horizontal stress in this mineral field. The mining-induced stress measurements show that, according to the magnitude of the front abutment pressure, the stress region can be separated into a stress-relaxed area, a stress-concentrated area and an initial stress area. At the -50 m mining level of this mine, the stress-relaxed area extends 0-3 m ahead of the mining face and the stress-concentrated area extends 3-55 m ahead of the mining face; the maximum mining-induced stress is 16.5-17.5 MPa, occurring 15-20 m from the mining face. The coefficient of stress concentration is 1.85.