986 resultados para Parallel methods


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The whole research of the current Master Thesis project is related to Big Data transfer over Parallel Data Link and my main objective is to assist the Saint-Petersburg National Research University ITMO research team to accomplish this project and apply Green IT methods for the data transfer system. The goal of the team is to transfer Big Data by using parallel data links with SDN Openflow approach. My task as a team member was to compare existing data transfer applications in case to verify which results the highest data transfer speed in which occasions and explain the reasons. In the context of this thesis work a comparison between 5 different utilities was done, which including Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts where developed which consist of creating random binary data to be incompressible to have fair comparison between utilities, execute the Utilities with specified parameters, create log files, results, system parameters, and plot graphs to compare the results. Transferring such an enormous variety of data can take a long time, and hence, the necessity appears to reduce the energy consumption to make them greener. In the context of Green IT approach, our team used Cloud Computing infrastructure called OpenStack. It’s more efficient to allocated specific amount of hardware resources to test different scenarios rather than using the whole resources from our testbed. Testing our implementation with OpenStack infrastructure results that the virtual channel does not consist of any traffic and we can achieve the highest possible throughput. After receiving the final results we are in place to identify which utilities produce faster data transfer in different scenarios with specific TCP parameters and we can use them in real network data links.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In otrder to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Naïvement perçu, le processus d’évolution est une succession d’événements de duplication et de mutations graduelles dans le génome qui mènent à des changements dans les fonctions et les interactions du protéome. La famille des hydrolases de guanosine triphosphate (GTPases) similaire à Ras constitue un bon modèle de travail afin de comprendre ce phénomène fondamental, car cette famille de protéines contient un nombre limité d’éléments qui diffèrent en fonctionnalité et en interactions. Globalement, nous désirons comprendre comment les mutations singulières au niveau des GTPases affectent la morphologie des cellules ainsi que leur degré d’impact sur les populations asynchrones. Mon travail de maîtrise vise à classifier de manière significative différents phénotypes de la levure Saccaromyces cerevisiae via l’analyse de plusieurs critères morphologiques de souches exprimant des GTPases mutées et natives. Notre approche à base de microscopie et d’analyses bioinformatique des images DIC (microscopie d’interférence différentielle de contraste) permet de distinguer les phénotypes propres aux cellules natives et aux mutants. L’emploi de cette méthode a permis une détection automatisée et une caractérisation des phénotypes mutants associés à la sur-expression de GTPases constitutivement actives. Les mutants de GTPases constitutivement actifs Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V ont été analysés avec succès. En effet, l’implémentation de différents algorithmes de partitionnement, permet d’analyser des données qui combinent les mesures morphologiques de population native et mutantes. Nos résultats démontrent que l’algorithme Fuzzy C-Means performe un partitionnement efficace des cellules natives ou mutantes, où les différents types de cellules sont classifiés en fonction de plusieurs facteurs de formes cellulaires obtenus à partir des images DIC. Cette analyse démontre que les mutations Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V induisent respectivement des phénotypes amorphe, allongé, rond et large qui sont représentés par des vecteurs de facteurs de forme distincts. Ces distinctions sont observées avec différentes proportions (morphologie mutante / morphologie native) dans les populations de mutants. Le développement de nouvelles méthodes automatisées d’analyse morphologique des cellules natives et mutantes s’avère extrêmement utile pour l’étude de la famille des GTPases ainsi que des résidus spécifiques qui dictent leurs fonctions et réseau d’interaction. Nous pouvons maintenant envisager de produire des mutants de GTPases qui inversent leur fonction en ciblant des résidus divergents. La substitution fonctionnelle est ensuite détectée au niveau morphologique grâce à notre nouvelle stratégie quantitative. Ce type d’analyse peut également être transposé à d’autres familles de protéines et contribuer de manière significative au domaine de la biologie évolutive.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The work is intended to study the following important aspects of document image processing and develop new methods. (1) Segmentation ofdocument images using adaptive interval valued neuro-fuzzy method. (2) Improving the segmentation procedure using Simulated Annealing technique. (3) Development of optimized compression algorithms using Genetic Algorithm and parallel Genetic Algorithm (4) Feature extraction of document images (5) Development of IV fuzzy rules. This work also helps for feature extraction and foreground and background identification. The proposed work incorporates Evolutionary and hybrid methods for segmentation and compression of document images. A study of different neural networks used in image processing, the study of developments in the area of fuzzy logic etc is carried out in this work

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Chemical methods to predict the bioavailable fraction of organic contaminants are usually validated in the literature by comparison with established bioassays. A soil spiked with polycyclic aromatic hydrocarbons (PAHs) was aged over six months and subjected to butanol, cyclodextrin and tenax extractions as well as an exhaustive extraction to determine total PAH concentrations at several time points. Earthworm (Eisenia fetida) and rye grass root (Lolium multiflorum) accumulation bioassays were conducted in parallel. Butanol extractions gave the best relationship with earthworm accumulation (r2 ≤ 0.54, p ≤ 0.01); cyclodextrin, butanol and acetone–hexane extractions all gave good predictions of accumulation in rye grass roots (r2 ≤ 0.86, p ≤ 0.01). However, the profile of the PAHs extracted by the different chemical methods was significantly different (p < 0.01) to that accumulated in the organisms. Biota accumulated a higher proportion of the heavier 4-ringed PAHs. It is concluded that bioaccumulation is a complex process that cannot be predicted by measuring the bioavailable fraction alone. The ability of chemical methods to predict PAH accumulation in Eisenia fetida and Lolium multiflorum was hindered by the varied metabolic fate of the different PAHs within the organisms.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statistical approaches have been applied to examine amino acid pairing preferences within parallel beta-sheets. The main chain hydrogen bonding pattern in parallel beta-sheets means that, for each residue pair, only one of the residues is involved in main chain hydrogen bonding with the strand containing the partner residue. We call this the hydrogen bonded (HB) residue and the partner residue the non-hydrogen bonded (nHB) residue, and differentiate between the favorability of a pair and that of its reverse pair, e.g. Asn(HB)-Thr(nHB)versus Thr(HB)-Asn(nHB). Significantly (p < or = 0.000001) favoured pairings were rationalised using stereochemical arguments. For instance, Asn(HB)-Thr(nHB) and Arg(HB)-Thr(nHB) were favoured pairs, where the residues adopted favoured chi1 rotamer positions that allowed side-chain interactions to occur. In contrast, Thr(HB)-Asn(nHB) and Thr(HB)-Arg(nHB) were not significantly favoured, and could only form side-chain interactions if the residues involved adopted less favourable chi1 conformations. The favourability of hydrophobic pairs e.g. Ile(HB)-Ile(nHB), Val(HB)-Val(nHB) and Leu(HB)-Ile(nHB) was explained by the residues adopting their most preferred chi1 and chi2 conformations, which enabled them to form nested arrangements. Cysteine-cysteine pairs are significantly favoured, although these do not form intrasheet disulphide bridges. Interactions between positively and negatively charged residues were asymmetrically preferred: those with the negatively charged residue at the HB position were more favoured. This trend was accounted for by the presence of general electrostatic interactions, which, based on analysis of distances between charged atoms, were likely to be stronger when the negatively charged residue is the HB partner. The Arg(HB)-Asp(nHB) interaction was an exception to this trend and its favorability was rationalised by the formation of specific side-chain interactions. This research provides rules that could be applied to protein structure prediction, comparative modelling and protein engineering and design. The methods used to analyse the pairing preferences are automated and detailed results are available (http://www.rubic.rdg.ac.uk/betapairprefsparallel/).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statistical approaches have been applied to examine amino acid pairing preferences within parallel beta-sheets. The main chain hydrogen bonding pattern in parallel beta-sheets means that, for each residue pair, only one of the residues is involved in main chain hydrogen bonding with the strand containing the partner residue. We call this the hydrogen bonded (HB) residue and the partner residue the non-hydrogen bonded (nHB) residue, and differentiate between the favourability of a pair and that of its reverse pair, e.g. Asn(HB)-Thr(nHB) versus Thr(HB)-Asn(nHB). Significantly (p <= 0.000001) favoured pairings were rationalised using stereochemical arguments. For instance, Asn(HB)-Thr(nHB) and Arg(HB)-Thr(nHB) were favoured pairs, where the residues adopted favoured chi(1) rotamer positions that allowed side-chain interactions to occur. In contrast, Thr(HB)-Asn(nHB) and Thr(HB)-Arg(nHB) were not significantly favoured, and could only form side-chain interactions if the residues involved adopted less favourable chi(1) conformations. The favourability of hydrophobic pairs e.g. Ile(HB)-Ile(nHB), Val(HB)-Val(nHB) and Leu(HB)-Ile(nHB) was explained by the residues adopting their most preferred chi(1) and chi(2) conformations, which enabled them to form nested arrangements. Cysteine-cysteine pairs are significantly favoured, although these do not form intrasheet disulphide bridges. Interactions between positively and negatively charged residues were asymmetrically preferred: those with the negatively charged residue at the HB position were more favoured. This trend was accounted for by the presence of general electrostatic interactions, which, based on analysis of distances between charged atoms, were likely to be stronger when the negatively charged residue is the HB partner. The Arg(HB)-Asp(nHB) interaction was an exception to this trend and its favourability was rationalised by the formation of specific side-chain interactions. This research provides rules that could be applied to protein structure prediction, comparative modelling and protein engineering and design. The methods used to analyse the pairing preferences are automated and detailed results are available (http:// www.rubic.rdg.ac.uk/betapairprefsparallel/). (c) 2005 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper is addressed to the numerical solving of the rendering equation in realistic image creation. The rendering equation is integral equation describing the light propagation in a scene accordingly to a given illumination model. The used illumination model determines the kernel of the equation under consideration. Nowadays, widely used are the Monte Carlo methods for solving the rendering equation in order to create photorealistic images. In this work we consider the Monte Carlo solving of the rendering equation in the context of the parallel sampling scheme for hemisphere. Our aim is to apply this sampling scheme to stratified Monte Carlo integration method for parallel solving of the rendering equation. The domain for integration of the rendering equation is a hemisphere. We divide the hemispherical domain into a number of equal sub-domains of orthogonal spherical triangles. This domain partitioning allows to solve the rendering equation in parallel. It is known that the Neumann series represent the solution of the integral equation as a infinity sum of integrals. We approximate this sum with a desired truncation error (systematic error) receiving the fixed number of iteration. Then the rendering equation is solved iteratively using Monte Carlo approach. At each iteration we solve multi-dimensional integrals using uniform hemisphere partitioning scheme. An estimate of the rate of convergence is obtained using the stratified Monte Carlo method. This domain partitioning allows easy parallel realization and leads to convergence improvement of the Monte Carlo method. The high performance and Grid computing of the corresponding Monte Carlo scheme are discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The sampling of certain solid angle is a fundamental operation in realistic image synthesis, where the rendering equation describing the light propagation in closed domains is solved. Monte Carlo methods for solving the rendering equation use sampling of the solid angle subtended by unit hemisphere or unit sphere in order to perform the numerical integration of the rendering equation. In this work we consider the problem for generation of uniformly distributed random samples over hemisphere and sphere. Our aim is to construct and study the parallel sampling scheme for hemisphere and sphere. First we apply the symmetry property for partitioning of hemisphere and sphere. The domain of solid angle subtended by a hemisphere is divided into a number of equal sub-domains. Each sub-domain represents solid angle subtended by orthogonal spherical triangle with fixed vertices and computable parameters. Then we introduce two new algorithms for sampling of orthogonal spherical triangles. Both algorithms are based on a transformation of the unit square. Similarly to the Arvo's algorithm for sampling of arbitrary spherical triangle the suggested algorithms accommodate the stratified sampling. We derive the necessary transformations for the algorithms. The first sampling algorithm generates a sample by mapping of the unit square onto orthogonal spherical triangle. The second algorithm directly compute the unit radius vector of a sampling point inside to the orthogonal spherical triangle. The sampling of total hemisphere and sphere is performed in parallel for all sub-domains simultaneously by using the symmetry property of partitioning. The applicability of the corresponding parallel sampling scheme for Monte Carlo and Quasi-D/lonte Carlo solving of rendering equation is discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we consider hybrid (fast stochastic approximation and deterministic refinement) algorithms for Matrix Inversion (MI) and Solving Systems of Linear Equations (SLAE). Monte Carlo methods are used for the stochastic approximation, since it is known that they are very efficient in finding a quick rough approximation of the element or a row of the inverse matrix or finding a component of the solution vector. We show how the stochastic approximation of the MI can be combined with a deterministic refinement procedure to obtain MI with the required precision and further solve the SLAE using MI. We employ a splitting A = D – C of a given non-singular matrix A, where D is a diagonal dominant matrix and matrix C is a diagonal matrix. In our algorithm for solving SLAE and MI different choices of D can be considered in order to control the norm of matrix T = D –1C, of the resulting SLAE and to minimize the number of the Markov Chains required to reach given precision. Further we run the algorithms on a mini-Grid and investigate their efficiency depending on the granularity. Corresponding experimental results are presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In any data mining applications, automated text and text and image retrieval of information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on the latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use Monte Carlo method to sample and to build much smaller size term-by-document matrix (e.g. we build k x k matrix) from where we then find the first "k" triplets using standard deterministic methods. Second, we investigate how we can reduce the problem to finding the "k"-largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms are running on a cluster of workstations under MPI and results of the experiments arising in textual retrieval of Web documents as well as comparison of the stochastic methods proposed are presented. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many scientific and engineering applications involve inverting large matrices or solving systems of linear algebraic equations. Solving these problems with proven algorithms for direct methods can take very long to compute, as they depend on the size of the matrix. The computational complexity of the stochastic Monte Carlo methods depends only on the number of chains and the length of those chains. The computing power needed by inherently parallel Monte Carlo methods can be satisfied very efficiently by distributed computing technologies such as Grid computing. In this paper we show how a load balanced Monte Carlo method for computing the inverse of a dense matrix can be constructed, show how the method can be implemented on the Grid, and demonstrate how efficiently the method scales on multiple processors. (C) 2007 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In models of complicated physical-chemical processes operator splitting is very often applied in order to achieve sufficient accuracy as well as efficiency of the numerical solution. The recently rediscovered weighted splitting schemes have the great advantage of being parallelizable on operator level, which allows us to reduce the computational time if parallel computers are used. In this paper, the computational times needed for the weighted splitting methods are studied in comparison with the sequential (S) splitting and the Marchuk-Strang (MSt) splitting and are illustrated by numerical experiments performed by use of simplified versions of the Danish Eulerian model (DEM).