32 resultados para landfill mining discarica recupero
Resumo:
A method, system, and computer program product for fault data correlation in a diagnostic system are provided. The method includes receiving the fault data including a plurality of faults collected over a period of time, and identifying a plurality of episodes within the fault data, where each episode includes a sequence of the faults. The method further includes calculating a frequency of the episodes within the fault data, calculating a correlation confidence of the faults relative to the episodes as a function of the frequency of the episodes, and outputting a report of the faults with the correlation confidence.
Resumo:
A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series having events with start times and end times, a set of allowed dwelling times and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Resumo:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
Resumo:
Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.
Resumo:
The stability of a bioreactor landfill slope is influenced by the quantity and method of leachate recirculation as well as on the degree of decomposition. Other factors include properties variation of waste material and geometrical configurations, i.e., height and slope of landfills. Conventionally, the stability of slopes is evaluated using factor of safety approach, in which the variability in the engineering properties of MSW is not considered directly and stability issues are resolved from past experiences and good engineering judgments. On the other hand, probabilistic approach considers variability in mathematical framework and provides stability in a rational manner that helps in decision making. The objective of the present study is to perform a parametric study on the stability of a bioreactor landfill slope in probabilistic framework considering important influencing factors, such as, variation in MSW properties, amount of leachate recirculation, and age of degradation, in a systematic manner. The results are discussed in the light of existing relevant regulations, design and operation issues.
Resumo:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X. Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
Resumo:
The rapid growth in the field of data mining has lead to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
Resumo:
This paper primarily intends to develop a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining installed capacities for setting up distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power, minimizing the cost of transportation of biomass from dispersed sources to power generation system, and cost of distribution of electricity from the power generation system to demand centers or villages. The methodology was validated by using it for developing an optimal plan for implementing distributed biomass-based power systems for meeting the rural electricity needs of Tumkur district in India consisting of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm for the entire search space for different values of k along with demand-supply matching constraints. The optimal value of the k is chosen such that it minimizes the total cost of system installation, costs of transportation of biomass, and transmission and distribution. A smaller region, consisting of 293 villages was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.
Resumo:
Mycobacterium tuberculosis owes its high pathogenic potential to its ability to evade host immune responses and thrive inside the macrophage. The outcome of infection is largely determined by the cellular response comprising a multitude of molecular events. The complexity and inter-relatedness in the processes makes it essential to adopt systems approaches to study them. In this work, we construct a comprehensive network of infection-related processes in a human macrophage comprising 1888 proteins and 14,016 interactions. We then compute response networks based on available gene expression profiles corresponding to states of health, disease and drug treatment. We use a novel formulation for mining response networks that has led to identifying highest activities in the cell. Highest activity paths provide mechanistic insights into pathogenesis and response to treatment. The approach used here serves as a generic framework for mining dynamic changes in genome-scale protein interaction networks.
Resumo:
In the analysis and design of municipal solid waste (MSW) landfills, there are many uncertainties associated with the properties of MSW during and after MSW placement. Several studies are performed involving different laboratory and field tests to understand the complex behavior and properties of MSW, and based on these studies, different models are proposed for the analysis of time dependent settlement response of MSW. For the analysis of MSW settlement, it is very important to account for the variability of model parameters that reflect different processes such as primary compression under loading, mechanical creep and biodegradation. In this paper, regression equations based on response surface method (RSM) are used to represent the complex behavior of MSW using a newly developed constitutive model. An approach to assess landfill capacities and develop landfill closure plans based on prediction of landfill settlements is proposed. The variability associated with model parameters relating to primary compression, mechanical creep and biodegradation are used to examine their influence on MSW settlement using reliability analysis framework and influence of various parameters on the settlement of MSW are estimated through sensitivity analysis. (C) 2013 Elsevier Ltd. All rights reserved.
Resumo:
The current study analyzes the leachate distribution in the Orchard Hills Landfill, Davis Junction, Illinois, using a two-phase flow model to assess the influence of variability in hydraulic conductivity on the effectiveness of the existing leachate recirculation system and its operations through reliability analysis. Numerical modeling, using finite-difference code, is performed with due consideration to the spatial variation of hydraulic conductivity of the municipal solid waste (MSW). The inhomogeneous and anisotropic waste condition is assumed because it is a more realistic representation of the MSW. For the reliability analysis, the landfill is divided into 10 MSW layers with different mean values of vertical and horizontal hydraulic conductivities (decreasing from top to bottom), and the parametric study is performed by taking the coefficients of variation (COVs) as 50, 100, 150, and 200%. Monte Carlo simulations are performed to obtain statistical information (mean and COV) of output parameters of the (1) wetted area of the MSW, (2) maximum induced pore pressure, and (3) leachate outflow. The results of the reliability analysis are used to determine the influence of hydraulic conductivity on the effectiveness of the leachate recirculation and are discussed in the light of a deterministic approach. The study is useful in understanding the efficiency of the leachate recirculation system. (C) 2013 American Society of Civil Engineers.
Resumo:
In today's API-rich world, programmer productivity depends heavily on the programmer's ability to discover the required APIs. In this paper, we present a technique and tool, called MATHFINDER, to discover APIs for mathematical computations by mining unit tests of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code to compute the expression by mapping its subexpressions to API method calls. For each subexpression, MATHFINDER searches for a method such that there is a mapping between method inputs and variables of the subexpression. The subexpression, when evaluated on the test inputs of the method under this mapping, should produce results that match the method output on a large number of tests. We implemented MATHFINDER as an Eclipse plugin for discovery of third-party Java APIs and performed a user study to evaluate its effectiveness. In the study, the use of MATHFINDER resulted in a 2x improvement in programmer productivity. In 96% of the subexpressions queried for in the study, MATHFINDER retrieved the desired API methods as the top-most result. The top-most pseudo-code snippet to implement the entire expression was correct in 93% of the cases. Since the number of methods and unit tests to mine could be large in practice, we also implement MATHFINDER in a MapReduce framework and evaluate its scalability and response time.
Resumo:
Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs both during an initial coding phase, as well as during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code comprised of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library. Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.
Resumo:
The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems data is available amongst multiple parties collaborating to achieve cumulative mining accuracy. The vertically partitioned data available with the parties involved cannot provide accurate mining results when compared to the collaborative mining results. To overcome the privacy issue in data disclosure this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the publication of local association rules generated by the parties is published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets established are used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4. 5 based KDLPPDM system and the CS. 0 based KDLPPDM system using receiver operating characteristics curves (ROC).
Resumo:
Seismic design of landfills requires an understanding of the dynamic properties of municipal solid waste (MSW) and the dynamic site response of landfill waste during seismic events. The dynamic response of the Mavallipura landfill situated in Bangalore, India, is investigated using field measurements, laboratory studies and recorded ground motions from the intraplate region. The dynamic shear modulus values for the MSW were established on the basis of field measurements of shear wave velocities. Cyclic triaxial testing was performed on reconstituted MSW samples and the shear modulus reduction and damping characteristics of MSW were studied. Ten ground motions were selected based on regional seismicity and site response parameters have been obtained considering one-dimensional non-linear analysis in the DEEPSOIL program. The surface spectral response varied from 0.6 to 2g and persisted only for a period of 1s for most of the ground motions. The maximum peak ground acceleration (PGA) obtained was 0.5g and the minimum and maximum amplifications are 1.35 and 4.05. Amplification of the base acceleration was observed at the top surface of the landfill underlined by a composite soil layer and bedrock for all ground motions. Dynamic seismic properties with amplification and site response parameters for MSW landfill in Bangalore, India, are presented in this paper. This study shows that MSW has less shear stiffness and more amplification due to loose filling and damping, which need to be accounted for seismic design of MSW landfills in India.