818 resultados para MULTI-RELATIONAL DATA MINING


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining involves nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust searching and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), where parameters of population size, the number of points of crossover and mutation rate for each population are adaptively fixed. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method stating and showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification datamining problems. Michigan style of classifier was used to build the classifier and the system was tested with machine learning databases of Pima Indian Diabetes database, Wisconsin Breast Cancer database and few others. The performance of our algorithm is better than others.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Solar flares were first observed by plain eye in white light by William Carrington in England in 1859. Since then these eruptions in the solar corona have intrigued scientists. It is known that flares influence the space weather experienced by the planets in a multitude of ways, for example by causing aurora borealis. Understanding flares is at the epicentre of human survival in space, as astronauts cannot survive the highly energetic particles associated with large flares in high doses without contracting serious radiation disease symptoms, unless they shield themselves effectively during space missions. Flares may be at the epicentre of man s survival in the past as well: it has been suggested that giant flares might have played a role in exterminating many of the large species on Earth, including dinosaurs. Having said that prebiotic synthesis studies have shown lightning to be a decisive requirement for amino acid synthesis on the primordial Earth. Increased lightning activity could be attributed to space weather, and flares. This thesis studies flares in two ways: in the spectral and the spatial domain. We have extracted solar spectra using three different instruments, namely GOES (Geostationary Operational Environmental Satellite), RHESSI (Reuven Ramaty High Energy Solar Spectroscopic Imager) and XSM (X-ray Solar Monitor) for the same flares. The GOES spectra are low resolution obtained with a gas proportional counter, the RHESSI spectra are higher resolution obtained with Germanium detectors and the XSM spectra are very high resolution observed with a silicon detector. It turns out that the detector technology and response influence the spectra we see substantially, and are important to understanding what conclusions to draw from the data. With imaging data, there was not such a luxury of choice available. We used RHESSI imaging data to observe the spatial size of solar flares. In the present work the focus was primarily on current solar flares. However, we did make use of our improved understanding of solar flares to observe young suns in NGC 2547. The same techniques used with solar monitors were applied with XMM-Newton, a stellar X-ray monitor, and coupled with ground based Halpha observations these techniques yielded estimates for flare parameters in young suns. The material in this thesis is therefore structured from technology to application, covering the full processing path from raw data and detector responses to concrete physical parameter results, such as the first measurement of the length of plasma flare loops in young suns.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVM are used for the former while latent semantic analysis is used for the latter The techniques are demonstrated for the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a detailed description of the hardware design and implementation of PROMIDS: a PROtotype Multi-rIng Data flow System for functional programming languages. The hardware constraints and the design trade-offs are discussed. The design of the functional units is described in detail. Finally, we report our experience with PROMIDS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The memory subsystem is a major contributor to the performance, power, and area of complex SoCs used in feature rich multimedia products. Hence, memory architecture of the embedded DSP is complex and usually custom designed with multiple banks of single-ported or dual ported on-chip scratch pad memory and multiple banks of off-chip memory. Building software for such large complex memories with many of the software components as individually optimized software IPs is a big challenge. In order to obtain good performance and a reduction in memory stalls, the data buffers of the application need to be placed carefully in different types of memory. In this paper we present a unified framework (MODLEX) that combines different data layout optimizations to address the complex DSP memory architectures. Our method models the data layout problem as multi-objective genetic algorithm (GA) with performance and power being the objectives and presents a set of solution points which is attractive from a platform design viewpoint. While most of the work in the literature assumes that performance and power are non-conflicting objectives, our work demonstrates that there is significant trade-off (up to 70%) that is possible between power and performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining.We mainly concentrate on algorithms for pattern discovery in sequential data streams.We also describe some recent results regarding statistical analysis of pattern discovery methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A method, system, and computer program product for fault data correlation in a diagnostic system are provided. The method includes receiving the fault data including a plurality of faults collected over a period of time, and identifying a plurality of episodes within the fault data, where each episode includes a sequence of the faults. The method further includes calculating a frequency of the episodes within the fault data, calculating a correlation confidence of the faults relative to the episodes as a function of the frequency of the episodes, and outputting a report of the faults with the correlation confidence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper primarily intends to develop a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining installed capacities for setting up distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power, minimizing the cost of transportation of biomass from dispersed sources to power generation system, and cost of distribution of electricity from the power generation system to demand centers or villages. The methodology was validated by using it for developing an optimal plan for implementing distributed biomass-based power systems for meeting the rural electricity needs of Tumkur district in India consisting of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm for the entire search space for different values of k along with demand-supply matching constraints. The optimal value of the k is chosen such that it minimizes the total cost of system installation, costs of transportation of biomass, and transmission and distribution. A smaller region, consisting of 293 villages was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The disclosure of information and its misuse in Privacy Preserving Data Mining (PPDM) systems is a concern to the parties involved. In PPDM systems data is available amongst multiple parties collaborating to achieve cumulative mining accuracy. The vertically partitioned data available with the parties involved cannot provide accurate mining results when compared to the collaborative mining results. To overcome the privacy issue in data disclosure this paper describes a Key Distribution-Less Privacy Preserving Data Mining (KDLPPDM) system in which the publication of local association rules generated by the parties is published. The association rules are securely combined to form the combined rule set using the Commutative RSA algorithm. The combined rule sets established are used to classify or mine the data. The results discussed in this paper compare the accuracy of the rules generated using the C4. 5 based KDLPPDM system and the CS. 0 based KDLPPDM system using receiver operating characteristics curves (ROC).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Anthropogenic aerosols play a crucial role in our environment, climate, and health. Assessment of spatial and temporal variation in anthropogenic aerosols is essential to determine their impact. Aerosols are of natural and anthropogenic origin and together constitute a composite aerosol system. Information about either component needs elimination of the other from the composite aerosol system. In the present work we estimated the anthropogenic aerosol fraction (AF) over the Indian region following two different approaches and inter-compared the estimates. We espouse multi-satellite data analysis and model simulations (using the CHIMERE Chemical transport model) to derive natural aerosol distribution, which was subsequently used to estimate AF over the Indian subcontinent. These two approaches are significantly different from each other. Natural aerosol satellite-derived information was extracted in terms of optical depth while model simulations yielded mass concentration. Anthropogenic aerosol fraction distribution was studied over two periods in 2008: premonsoon (March-May) and winter (November-February) in regard to the known distinct seasonality in aerosol loading and type over the Indian region. Although both techniques have derived the same property, considerable differences were noted in temporal and spatial distribution. Satellite retrieval of AF showed maximum values during the pre-monsoon and summer months while lowest values were observed in winter. On the other hand, model simulations showed the highest concentration of AF in winter and the lowest during pre-monsoon and summer months. Both techniques provided an annual average AF of comparable magnitude (similar to 0.43 +/- 0.06 from the satellite and similar to 0.48 +/- 0.19 from the model). For winter months the model-estimated AF was similar to 0.62 +/- 0.09, significantly higher than that (0.39 +/- 0.05) estimated from the satellite, while during pre-monsoon months satellite-estimated AF was similar to 0.46 +/- 0.06 and the model simulation estimation similar to 0.53 +/- 0.14. Preliminary results from this work indicate that model-simulated results are nearer to the actual variation as compared to satellite estimation in view of general seasonal variation in aerosol concentrations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Online Social Networks (OSNs) facilitate to create and spread information easily and rapidly, influencing others to participate and propagandize. This work proposes a novel method of profiling Influential Blogger (IB) based on the activities performed on one's blog documents who influences various other bloggers in Social Blog Network (SBN). After constructing a social blogging site, a SBN is analyzed with appropriate parameters to get the Influential Blog Power (IBP) of each blogger in the network and demonstrate that profiling IB is adequate and accurate. The proposed Profiling Influential Blogger (PIB) Algorithm survival rate of IB is high and stable. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).