42 results for Modern mining
Abstract:
In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.
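The single-scan construction described above can be illustrated with a small prefix tree that keeps a count at every node. This is only a minimal sketch under stated assumptions, not the authors' PC-tree: it assumes each pattern is a list of feature identifiers placed in a fixed (sorted) order, and the names `PCNode`, `build_pc_tree` and `count_nodes` are hypothetical.

```python
class PCNode:
    """One node of the sketch tree: a feature, a count, and child nodes."""

    def __init__(self, feature=None):
        self.feature = feature
        self.count = 0
        self.children = {}


def build_pc_tree(patterns):
    """Build the tree in a single pass over the patterns (one 'scan')."""
    root = PCNode()
    for pattern in patterns:
        node = root
        # Assumption: features are ordered consistently so shared prefixes merge.
        for feature in sorted(pattern):
            child = node.children.get(feature)
            if child is None:
                child = PCNode(feature)
                node.children[feature] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of nodes below `node`, a rough measure of the tree's size."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Toy data set: four patterns, 11 stored features in total.
patterns = [[1, 2, 3], [1, 2, 4], [1, 2, 3], [2, 5]]
tree = build_pc_tree(patterns)
stored_features = sum(len(p) for p in patterns)
tree_nodes = count_nodes(tree)
```

In this toy example the patterns that share the prefix (1, 2) share nodes, so the tree needs fewer nodes than the number of features stored in the raw patterns.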
Abstract:
With the emergence of large-volume and high-speed streaming data, recent techniques for stream mining of CFIs (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high-speed data streams, the rate of change of information across different sliding windows is negligible, so the user will not miss changes in information if we slide the window by multiple transactions at a time. Therefore, we propose a novel approach for mining CFIs cumulatively over high-speed data streams with a sliding width ≥ 1. However, it is nontrivial to mine CFIs cumulatively over a stream, because such growth may lead to the generation of an exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIs over a stream by exploring some interesting properties. Our performance study reveals that stream-close achieves good scalability and has promising results.
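To make the terms concrete, the sketch below mines closed frequent itemsets in each window of a stream while sliding by more than one transaction at a time. It is a brute-force illustration, not the stream-close algorithm, whose details the abstract does not give. The function names, the toy stream, and the parameter values are assumptions.

```python
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Return {itemset: support} for the closed frequent itemsets in `window`.

    Brute force: enumerate every subset of every transaction, so this is
    only suitable for tiny examples.
    """
    support = {}
    for txn in window:
        items = sorted(set(txn))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                support[key] = support.get(key, 0) + 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    # An itemset is closed if no proper superset has the same support.
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < other and c == oc for other, oc in frequent.items())
    }


def slide(stream, window_size, slide_width):
    """Yield successive windows, advancing `slide_width` transactions each time."""
    for start in range(0, len(stream) - window_size + 1, slide_width):
        yield stream[start:start + window_size]


stream = [
    ["a", "b"], ["a", "b", "c"], ["a", "c"],
    ["a", "b"], ["b", "c"], ["a", "b", "c"],
]
windows = list(slide(stream, window_size=4, slide_width=2))
results = [closed_frequent_itemsets(w, min_support=2) for w in windows]
```

Here the window advances two transactions per step, so six transactions yield two windows rather than three.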
Abstract:
Rapid urbanisation in India has posed serious challenges to decision makers in regional planning, involving a plethora of issues including the provision of basic amenities (electricity, water, sanitation, transport, etc.). Urban planning entails an understanding of landscape and urban dynamics along with their causal factors. Identifying, delineating and mapping landscapes on a temporal scale provides an opportunity to monitor the changes, which is important for natural resource management and sustainable planning activities. Multi-source, multi-sensor, multi-temporal, multi-frequency or multi-polarization remote sensing data, together with efficient classification algorithms and pattern recognition techniques, aid in capturing these dynamics. This paper analyses the landscape dynamics of Greater Bangalore by: (i) characterisation of direct impervious surface, (ii) computation of forest fragmentation indices and (iii) modeling to quantify and categorise urban changes. Linear unmixing is used to solve the mixed pixel problem of coarse resolution super spectral MODIS data for impervious surface characterisation. Fragmentation indices were used to classify forests as interior, perforated, edge, transitional, patch and undetermined. Based on this, an urban growth model was developed to determine the type of urban growth: infill, expansion and outlying growth. This helped in visualising urban growth poles and the consequences of earlier policy decisions, which can help in evolving strategies for effective land use policies.
Abstract:
This article is concerned with a study of an unusual effect due to the density of biomass pellets in modern stoves based on a close-coupled gasification-combustion process. The two processes, namely flaming with volatiles and glowing of the char, show different effects. The mass flux of the fuel bears a constant ratio to the air flow rate of gasification during the flaming process and is independent of particle density; the char glowing process shows a distinct effect of density. The bed temperatures have similar features: during flaming they are identical, but they are distinct in the char burn (gasification) regime. For wood char and pellet char, the densities are 350 and 990 kg/m³, the burn rates are 2.5 and 3.5 g/min, and the bed temperatures are 1380 and 1502 K, respectively. A number of experiments on practical stoves showed wood char combustion rates of 2.5 ± 0.5 g/min and pellet char burn rates of 3.5 ± 0.5 g/min. In pursuit of the resolution of the differences, experimental data on single particle combustion for forced convection and ambient temperature effects have been obtained. Single particle char combustion rates with air show a near-d² law, and surface and core temperatures are identical for both wood and pellet char. A model based on a diffusion-controlled heat release-radiation-convection balance is set up. Explanation of the observed results needs to include the ash build-up over the char. This model is then used to explain the observed behavior in the packed bed; the different packing densities of the biomass chars, leading to different heat release rates per unit bed volume, are deduced as the cause of the differences in burn rate and bed temperatures.
Abstract:
Optimal control laws are obtained for the elevator and the ailerons for a modern fighter aircraft in a rolling pullout maneuver. The problem is solved for three flight conditions using the conjugate gradient method.
Abstract:
Expanding energy access to the rural population of India presents a critical challenge for its government. The presence of 364 million people without access to electricity and 726 million who rely on biomass for cooking indicate both the failure of past policies and programs, and the need for a radical redesign of the current system. We propose an integrated implementation framework with recommendations for adopting business principles with innovative institutional, regulatory, financing and delivery mechanisms. The framework entails establishment of rural energy access authorities and energy access funds, both at the national and regional levels, to be empowered with enabling regulatory policies, capital resources and the support of multi-stakeholder partnership. These institutions are expected to design, lead, manage and monitor the rural energy interventions. At the other end, trained entrepreneurs would be expected to establish bioenergy-based micro-enterprises that will produce and distribute energy carriers to rural households at an affordable cost. The ESCOs will function as intermediaries between these enterprises and the international carbon market both in aggregating carbon credits and in trading them under CDM. If implemented, such a program could address the challenges of rural energy empowerment by creating access to modern energy carriers and climate change mitigation. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
Mining association rules from a large collection of databases is based on two main tasks. One is the generation of large itemsets; the other is finding associations between the discovered large itemsets. Existing formalisms for association rules are based on a single transaction database, which is not sufficient to describe association rules in a multiple-database environment. In this paper, we give a general characterization of association rules and also give a framework for knowledge-based mining of multiple databases for association rules.
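The two tasks named above can be sketched for the conventional single-database case using the standard support and confidence measures. This does not reproduce the paper's multiple-database framework. The function names, thresholds, and toy transactions are assumptions, and the enumeration is brute force for readability.

```python
from itertools import combinations


def large_itemsets(transactions, min_support):
    """Task 1: itemsets whose relative support reaches `min_support`."""
    n = len(transactions)
    counts = {}
    for txn in transactions:
        items = sorted(set(txn))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(itemsets, min_confidence):
    """Task 2: rules antecedent -> consequent derived from the large itemsets."""
    rules = []
    for itemset, supp in itemsets.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                # Every subset of a large itemset is itself large, so this
                # lookup cannot fail.
                confidence = supp / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, confidence))
    return rules


transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
itemsets = large_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
```

With these toy thresholds the only surviving rule is butter -> bread, since every transaction containing butter also contains bread.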
Abstract:
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships, which in turn lead to a better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.
Abstract:
A method, system, and computer program product for fault data correlation in a diagnostic system are provided. The method includes receiving the fault data including a plurality of faults collected over a period of time, and identifying a plurality of episodes within the fault data, where each episode includes a sequence of the faults. The method further includes calculating a frequency of the episodes within the fault data, calculating a correlation confidence of the faults relative to the episodes as a function of the frequency of the episodes, and outputting a report of the faults with the correlation confidence.
Abstract:
A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series having events with start times and end times, a set of allowed dwelling times and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
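As a rough illustration of n-gram frequencies and windowed perplexity, the sketch below counts n-grams with a plain `Counter` and scores each window with an add-one smoothed model estimated from the whole sequence. It is not the suffix-array based toolkit mentioned above. The smoothing choice, function names, and window parameters are all assumptions.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every length-n substring of `seq`."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(window, n, ngrams, contexts, alphabet_size):
    """Perplexity of `window` under an add-one smoothed n-gram model."""
    log_prob = 0.0
    total = 0
    for i in range(len(window) - n + 1):
        gram = window[i:i + n]
        # P(last symbol | preceding n-1 symbols) with add-one smoothing.
        p = (ngrams[gram] + 1) / (contexts[gram[:-1]] + alphabet_size)
        log_prob += math.log(p)
        total += 1
    return math.exp(-log_prob / total) if total else float("nan")


def scan(seq, n, window_size, step):
    """Return (start, perplexity) for windows taken every `step` positions."""
    ngrams = ngram_counts(seq, n)
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count
    alphabet_size = len(set(seq))
    return [
        (start, window_perplexity(seq[start:start + window_size], n,
                                  ngrams, contexts, alphabet_size))
        for start in range(0, len(seq) - window_size + 1, step)
    ]


bigrams = ngram_counts("ACGTACGT", 2)
profile = scan("ACGTACGT" * 4, n=2, window_size=8, step=8)
```

Windows whose perplexity is unusually low relative to the rest of the profile would be candidates for repetitive regions. Any threshold for "unusually low" would have to be chosen for the data at hand.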
Abstract:
Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.