Biblioteca Digital

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Veja mais

Distributed mining of molecular fragments

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Veja mais

Synthesis of a systolic array genetic algorithm

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The paper presents a design for a hardware genetic algorithm which uses a pipeline of systolic arrays. These arrays have been designed using systolic synthesis techniques which involve expressing the algorithm as a set of uniform recurrence relations. The final design divorces the fitness function evaluation from the hardware and can process chromosomes of different lengths, giving the design a generic quality. The paper demonstrates the design methodology by progressively re-writing a simple genetic algorithm, expressed in C code, into a form from which systolic structures can be deduced. This paper extends previous work by introducing a simplification to a previous systolic design for the genetic algorithm. The simplification results in the removal of 2N 2 + 4N cells and reduces the time complexity by 3N + 1 cycles.

Veja mais

Implementing a generic systolic array for genetic algorithms

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We have designed a highly parallel design for a simple genetic algorithm using a pipeline of systolic arrays. The systolic design provides high throughput and unidirectional pipelining by exploiting the implicit parallelism in the genetic operators. The design is significant because, unlike other hardware genetic algorithms, it is independent of both the fitness function and the particular chromosome length used in a problem. We have designed and simulated a version of the mutation array using Xilinix FPGA tools to investigate the feasibility of hardware implementation. A simple 5-chromosome mutation array occupies 195 CLBs and is capable of performing more than one million mutations per second. I. Introduction Genetic algorithms (GAs) are established search and optimization techniques which have been applied to a range of engineering and applied problems with considerable success [1]. They operate by maintaining a population of trial solutions encoded, using a suitable encoding scheme.

Veja mais

The inclusion of non-conservative forcing into a conservative contour advection algorithm

Relevância:

20.00% 20.00%

Publicador:

Veja mais

Time patterns in UK demand for alcohol and tobacco: an application of the EM algorithm

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Capturing the pattern of structural change is a relevant task in applied demand analysis, as consumer preferences may vary significantly over time. Filtering and smoothing techniques have recently played an increasingly relevant role. A dynamic Almost Ideal Demand System with random walk parameters is estimated in order to detect modifications in consumer habits and preferences, as well as changes in the behavioural response to prices and income. Systemwise estimation, consistent with the underlying constraints from economic theory, is achieved through the EM algorithm. The proposed model is applied to UK aggregate consumption of alcohol and tobacco, using quarterly data from 1963 to 2003. Increased alcohol consumption is explained by a preference shift, addictive behaviour and a lower price elasticity. The dynamic and time-varying specification is consistent with the theoretical requirements imposed at each sample point. (c) 2005 Elsevier B.V. All rights reserved.

Veja mais

Use of a novel Hill-climbing genetic algorithm in protein folding simulations

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We have developed a novel Hill-climbing genetic algorithm (GA) for simulation of protein folding. The program (written in C) builds a set of Cartesian points to represent an unfolded polypeptide's backbone. The dihedral angles determining the chain's configuration are stored in an array of chromosome structures that is copied and then mutated. The fitness of the mutated chain's configuration is determined by its radius of gyration. A four-helix bundle was used to optimise simulation conditions, and the program was compared with other, larger, genetic algorithms on a variety of structures. The program ran 50% faster than other GA programs. Overall, tests on 100 non-redundant structures gave comparable results to other genetic algorithms, with the Hill-climbing program running from between 20 and 50% faster. Examples including crambin, cytochrome c, cytochrome B and hemerythrin gave good secondary structure fits with overall alpha carbon atom rms deviations of between 5 and 5.6 Angstrom with an optimised hydrophobic term in the fitness function. (C) 2003 Elsevier Ltd. All rights reserved.

Veja mais

Chromatographic alignment of LC-MS and LC-MS/MS datasets by genetic algorithm feature extraction

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Liquid chromatography-mass spectrometry (LC-MS) datasets can be compared or combined following chromatographic alignment. Here we describe a simple solution to the specific problem of aligning one LC-MS dataset and one LC-MS/MS dataset, acquired on separate instruments from an enzymatic digest of a protein mixture, using feature extraction and a genetic algorithm. First, the LC-MS dataset is searched within a few ppm of the calculated theoretical masses of peptides confidently identified by LC-MS/MS. A piecewise linear function is then fitted to these matched peptides using a genetic algorithm with a fitness function that is insensitive to incorrect matches but sufficiently flexible to adapt to the discrete shifts common when comparing LC datasets. We demonstrate the utility of this method by aligning ion trap LC-MS/MS data with accurate LC-MS data from an FTICR mass spectrometer and show how hybrid datasets can improve peptide and protein identification by combining the speed of the ion trap with the mass accuracy of the FTICR, similar to using a hybrid ion trap-FTICR instrument. We also show that the high resolving power of FTICR can improve precision and linear dynamic range in quantitative proteomics. The alignment software, msalign, is freely available as open source.

Veja mais

Intelligent wireless web services: context-aware computing in construction-logistics supply chain

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The construction industry has incurred a considerable amount of waste as a result of poor logistics supply chain network management. Therefore, managing logistics in the construction industry is critical. An effective logistic system ensures delivery of the right products and services to the right players at the right time while minimising costs and rewarding all sectors based on value added to the supply chain. This paper reports on an on-going research study on the concept of context-aware services delivery in the construction project supply chain logistics. As part of the emerging wireless technologies, an Intelligent Wireless Web (IWW) using context-aware computing capability represents the next generation ICT application to construction-logistics management. This intelligent system has the potential of serving and improving the construction logistics through access to context-specific data, information and services. Existing mobile communication deployments in the construction industry rely on static modes of information delivery and do not take into account the worker’s changing context and dynamic project conditions. The major problems in these applications are lack of context-specificity in the distribution of information, services and other project resources, and lack of cohesion with the existing desktop based ICT infrastructure. The research works focus on identifying the context dimension such as user context, environmental context and project context, selection of technologies to capture context-parameters such wireless sensors and RFID, selection of supporting technologies such as wireless communication, Semantic Web, Web Services, agents, etc. The process of integration of Context-Aware Computing and Web-Services to facilitate the creation of intelligent collaboration environment for managing construction logistics will take into account all the necessary critical parameters such as storage, transportation, distribution, assembly, etc. within off and on-site project.

Veja mais

236 resultados para Computing algorithm

Filtro por publicador