975 resultados para Massively parallel sequencing
Resumo:
Gadget-2 is a massively parallel structure formation code for cosmological simulations. In this paper, we present a Java version of Gadget-2. We evaluated the performance of the Java version by running colliding galaxies simulation and found that it can achieve around 70% of C Gadget-2's performance.
Resumo:
This paper makes a contribution in bridging the theory and practice of the polyhedral model for designing parallel algorithms. Although the theory of polyhedral model is well developed, designers of massively parallel algorithms are unable to benefit from the theory due to the lack of software tools that incorporate the wide range of transformations that are possible in the model. The Uniformization tool that we developed was the first to integrate a number of techniques and to completely automate the transformation step allowing designers to explore a wide range of feasible designs from high-level specifications.
Resumo:
Since its introduction in 1993, the Message Passing Interface (MPI) has become a de facto standard for writing High Performance Computing (HPC) applications on clusters and Massively Parallel Processors (MPPs). The recent emergence of multi-core processor systems presents a new challenge for established parallel programming paradigms, including those based on MPI. This paper presents a new Java messaging system called MPJ Express. Using this system, we exploit multiple levels of parallelism - messaging and threading - to improve application performance on multi-core processors. We refer to our approach as nested parallelism. This MPI-like Java library can support nested parallelism by using Java or Java OpenMP (JOMP) threads within an MPJ Express process. Practicality of this approach is assessed by porting to Java a massively parallel structure formation code from Cosmology called Gadget-2. We introduce nested parallelism in the Java version of the simulation code and report good speed-ups. To the best of our knowledge it is the first time this kind of hybrid parallelism is demonstrated in a high performance Java application. (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
With the transition to multicore processors almost complete, the parallel processing community is seeking efficient ways to port legacy message passing applications on shared memory and multicore processors. MPJ Express is our reference implementation of Message Passing Interface (MPI)-like bindings for the Java language. Starting with the current release, the MPJ Express software can be configured in two modes: the multicore and the cluster mode. In the multicore mode, parallel Java applications execute on shared memory or multicore processors. In the cluster mode, Java applications parallelized using MPJ Express can be executed on distributed memory platforms like compute clusters and clouds. The multicore device has been implemented using Java threads in order to satisfy two main design goals of portability and performance. We also discuss the challenges of integrating the multicore device in the MPJ Express software. This turned out to be a challenging task because the parallel application executes in a single JVM in the multicore mode. On the contrary in the cluster mode, the parallel user application executes in multiple JVMs. Due to these inherent architectural differences between the two modes, the MPJ Express runtime is modified to ensure correct semantics of the parallel program. Towards the end, we compare performance of MPJ Express (multicore mode) with other C and Java message passing libraries---including mpiJava, MPJ/Ibis, MPICH2, MPJ Express (cluster mode)---on shared memory and multicore processors. We found out that MPJ Express performs signicantly better in the multicore mode than in the cluster mode. Not only this but the MPJ Express software also performs better in comparison to other Java messaging libraries including mpiJava and MPJ/Ibis when used in the multicore mode on shared memory or multicore processors. We also demonstrate effectiveness of the MPJ Express multicore device in Gadget-2, which is a massively parallel astrophysics N-body siimulation code.
Resumo:
Monitoring Earth's terrestrial water conditions is critically important to many hydrological applications such as global food production; assessing water resources sustainability; and flood, drought, and climate change prediction. These needs have motivated the development of pilot monitoring and prediction systems for terrestrial hydrologic and vegetative states, but to date only at the rather coarse spatial resolutions (∼10–100 km) over continental to global domains. Adequately addressing critical water cycle science questions and applications requires systems that are implemented globally at much higher resolutions, on the order of 1 km, resolutions referred to as hyperresolution in the context of global land surface models. This opinion paper sets forth the needs and benefits for a system that would monitor and predict the Earth's terrestrial water, energy, and biogeochemical cycles. We discuss six major challenges in developing a system: improved representation of surface‐subsurface interactions due to fine‐scale topography and vegetation; improved representation of land‐atmospheric interactions and resulting spatial information on soil moisture and evapotranspiration; inclusion of water quality as part of the biogeochemical cycle; representation of human impacts from water management; utilizing massively parallel computer systems and recent computational advances in solving hyperresolution models that will have up to 109 unknowns; and developing the required in situ and remote sensing global data sets. We deem the development of a global hyperresolution model for monitoring the terrestrial water, energy, and biogeochemical cycles a “grand challenge” to the community, and we call upon the international hydrologic community and the hydrological science support infrastructure to endorse the effort.
Resumo:
The DNA G-qadruplexes are one of the targets being actively explored for anti-cancer therapy by inhibiting them through small molecules. This computational study was conducted to predict the binding strengths and orientations of a set of novel dimethyl-amino-ethyl-acridine (DACA) analogues that are designed and synthesized in our laboratory, but did not diffract in Synchrotron light.Thecrystal structure of DNA G-Quadruplex(TGGGGT)4(PDB: 1O0K) was used as target for their binding properties in our studies.We used both the force field (FF) and QM/MM derived atomic charge schemes simultaneously for comparing the predictions of drug binding modes and their energetics. This study evaluates the comparative performance of fixed point charge based Glide XP docking and the quantum polarized ligand docking schemes. These results will provide insights on the effects of including or ignoring the drug-receptor interfacial polarization events in molecular docking simulations, which in turn, will aid the rational selection of computational methods at different levels of theory in future drug design programs. Plenty of molecular modelling tools and methods currently exist for modelling drug-receptor or protein-protein, or DNA-protein interactionssat different levels of complexities.Yet, the capasity of such tools to describevarious physico-chemical propertiesmore accuratelyis the next step ahead in currentresearch.Especially, the usage of most accurate methods in quantum mechanics(QM) is severely restricted by theirtedious nature. Though the usage of massively parallel super computing environments resulted in a tremendous improvement in molecular mechanics (MM) calculations like molecular dynamics,they are still capable of dealing with only a couple of tens to hundreds of atoms for QM methods. One such efficient strategy that utilizes thepowers of both MM and QM are the QM/MM hybrid methods. Lately, attempts have been directed towards the goal of deploying several different QM methods for betterment of force field based simulations, but with practical restrictions in place. One of such methods utilizes the inclusion of charge polarization events at the drug-receptor interface, that is not explicitly present in the MM FF.
Resumo:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
Resumo:
The last years have presented an increase in the acceptance and adoption of the parallel processing, as much for scientific computation of high performance as for applications of general intention. This acceptance has been favored mainly for the development of environments with massive parallel processing (MPP - Massively Parallel Processing) and of the distributed computation. A common point between distributed systems and MPPs architectures is the notion of message exchange, that allows the communication between processes. An environment of message exchange consists basically of a communication library that, acting as an extension of the programming languages that allow to the elaboration of applications parallel, such as C, C++ and Fortran. In the development of applications parallel, a basic aspect is on to the analysis of performance of the same ones. Several can be the metric ones used in this analysis: time of execution, efficiency in the use of the processing elements, scalability of the application with respect to the increase in the number of processors or to the increase of the instance of the treat problem. The establishment of models or mechanisms that allow this analysis can be a task sufficiently complicated considering parameters and involved degrees of freedom in the implementation of the parallel application. An joined alternative has been the use of collection tools and visualization of performance data, that allow the user to identify to points of strangulation and sources of inefficiency in an application. For an efficient visualization one becomes necessary to identify and to collect given relative to the execution of the application, stage this called instrumentation. In this work it is presented, initially, a study of the main techniques used in the collection of the performance data, and after that a detailed analysis of the main available tools is made that can be used in architectures parallel of the type to cluster Beowulf with Linux on X86 platform being used libraries of communication based in applications MPI - Message Passing Interface, such as LAM and MPICH. This analysis is validated on applications parallel bars that deal with the problems of the training of neural nets of the type perceptrons using retro-propagation. The gotten conclusions show to the potentiality and easinesses of the analyzed tools.
Resumo:
Unzipping carbon nanotubes (CNTs) is considered one of the most promising approaches for the controlled and large-scale production of graphene nanoribbons (GNR). These structures are considered of great importance for the development of nanoelectronics because of its dimensions and intrinsic nonzero band gap value. Despite many years of investigations some details on the dynamics of the CNT fracture/unzipping processes remain unclear. In this work we have investigated some of these process through molecular dynamics simulations using reactive force fields (ReaxFF), as implemented in the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) code. We considered multi-walled CNTs of different dimensions and chiralities and under induced mechanical stretching. Our preliminary results show that the unzipping mechanisms are highly dependent on CNT chirality. Well-defined and distinct fracture patterns were observed for the different chiralities. Armchair CNTs favor the creation of GNRs with well-defined armchair edges, while zigzag and chiral ones produce GNRs with less defined and defective edges. © 2012 Materials Research Society.
Resumo:
A implementação convencional do método de migração por diferenças finitas 3D, usa a técnica de splitting inline e crossline para melhorar a eficiência computacional deste algoritmo. Esta abordagem torna o algoritmo eficiente computacionalmente, porém cria anisotropia numérica. Esta anisotropia numérica por sua vez, pode levar a falsos posicionamentos de refletores inclinados, especialmente refletores com grandes ângulos de mergulho. Neste trabalho, como objetivo de evitar o surgimento da anisotropia numérica, implementamos o operador de extrapolação do campo de onda para baixo sem usar a técnica splitting inline e crossline no domínio frequência-espaço via método de diferenças finitas implícito, usando a aproximação de Padé complexa. Comparamos a performance do algoritmo iterativo Bi-gradiente conjugado estabilizado (Bi-CGSTAB) com o multifrontal massively parallel solver (MUMPS) para resolver o sistema linear oriundo do método de migração por diferenças finitas. Verifica-se que usando a expansão de Padé complexa ao invés da expansão de Padé real, o algoritmo iterativo Bi-CGSTAB fica mais eficientes computacionalmente, ou seja, a expansão de Padé complexa atua como um precondicionador para este algoritmo iterativo. Como consequência, o algoritmo iterativo Bi-CGSTAB é bem mais eficiente computacionalmente que o MUMPS para resolver o sistema linear quando usado apenas um termo da expansão de Padé complexa. Para aproximações de grandes ângulos, métodos diretos são necessários. Para validar e avaliar as propriedades desses algoritmos de migração, usamos o modelo de sal SEG/EAGE para calcular a sua resposta ao impulso.
Resumo:
Implementações dos métodos de migração diferença finita e Fourier (FFD) usam fatoração direcional para acelerar a performance e economizar custo computacional. Entretanto essa técnica introduz anisotropia numérica que podem erroneamente posicionar os refletores em mergulho ao longo das direções em que o não foi aplicado a fatoração no operador de migração. Implementamos a migração FFD 3D, sem usar a técnica do fatoração direcional, no domínio da frequência usando aproximação de Padé complexa. Essa aproximação elimina a anisotropia numérica ao preço de maior custo computacional buscando a solução do campo de onda para um sistema linear de banda larga. Experimentos numéricos, tanto no modelo homogêneo e heterogêneo, mostram que a técnica da fatoração direcional produz notáveis erros de posicionamento dos refletores em meios com forte variação lateral de velocidade. Comparamos a performance de resolução do algoritmo de FFD usando o método iterativo gradiente biconjugado estabilizado (BICGSTAB) e o multifrontal massively parallel direct solver (MUMPS). Mostrando que a aproximação de Padé complexa é um eficiente precondicionador para o BICGSTAB, reduzindo o número de iterações em relação a aproximação de Padé real. O método iterativo BICGSTAB é mais eficiente que o método direto MUMPS, quando usamos apenas um termo da expansão de Padé complexa. Para maior ângulo de abertura do operador, mais termos da série são requeridos no operador de migração, e neste caso, a performance do método direto é mais eficiente. A validação do algoritmo e as propriedades da evolução computacional foram avaliadas para a resposta ao impulso do modelo de sal SEG/EAGE.
Resumo:
Techniques of image combination, with extraction of objects to set a final scene, are very used in applications from photos montages to cinematographic productions. These techniques are called digital matting. With them is possible to decrease the cost of productions, because it is not necessary for the actor to be filmed in the location where the final scene occurs. This feature also favors its use in programs made to digital television, which demands a high quality image. Many digital matting algorithms use markings done on the images, to demarcate what is the foreground, the background and the uncertainty areas. This marking is called trimap, which is a triple map containing these three informations. The trimap is done, typically, from manual markings. In this project, methods were created that can be used in digital matting algorithms, with restriction of time and without human interaction, that is, the creation of an algorithm that generates the trimap automatically. This last one can be generated from the difference between a color of an arbitrary background and the foreground, or by using a depth map. It was also created a matting method, based on the Geodesic Matting (BAI; SAPIRO, 2009), which has an inferior processing time then the original one. Aiming to improve the performance of the applications that generates the trimap and of the algorithms that generates the alphamap (map that associates a value to the transparency of each pixel of the image), allowing its use in applications with time restrictions, it was used the CUDA architecture. Taking advantage, this way, of the computational power and the features of the GPGPU, which is massively parallel
Resumo:
Pós-graduação em Patologia - FMB
Resumo:
Pós-graduação em Patologia - FMB
Resumo:
Nowadays microfluidic is becoming an important technology in many chemical and biological processes and analysis applications. The potential to replace large-scale conventional laboratory instrumentation with miniaturized and self-contained systems, (called lab-on-a-chip (LOC) or point-of-care-testing (POCT)), offers a variety of advantages such as low reagent consumption, faster analysis speeds, and the capability of operating in a massively parallel scale in order to achieve high-throughput. Micro-electro-mechanical-systems (MEMS) technologies enable both the fabrication of miniaturized system and the possibility of developing compact and portable systems. The work described in this dissertation is towards the development of micromachined separation devices for both high-speed gas chromatography (HSGC) and gravitational field-flow fractionation (GrFFF) using MEMS technologies. Concerning the HSGC, a complete platform of three MEMS-based GC core components (injector, separation column and detector) is designed, fabricated and characterized. The microinjector consists of a set of pneumatically driven microvalves, based on a polymeric actuating membrane. Experimental results demonstrate that the microinjector is able to guarantee low dead volumes, fast actuation time, a wide operating temperature range and high chemical inertness. The microcolumn consists of an all-silicon microcolumn having a nearly circular cross-section channel. The extensive characterization has produced separation performances very close to the theoretical ideal expectations. A thermal conductivity detector (TCD) is chosen as most proper detector to be miniaturized since the volume reduction of the detector chamber results in increased mass and reduced dead volumes. The microTDC shows a good sensitivity and a very wide dynamic range. Finally a feasibility study for miniaturizing a channel suited for GrFFF is performed. The proposed GrFFF microchannel is at early stage of development, but represents a first step for the realization of a highly portable and potentially low-cost POCT device for biomedical applications.