171 resultados para Speedup


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also,a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide its convenience. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, specially for systems with medium to large parallel execution overheads.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also, a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide its convenience. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, specially for systems with médium to large parallel execution overheads.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An Independent And-Parallel Prolog model and implementation, &-Prolog, are described. The description includes a summary of the system's architecture, some details of its execution model (based on the RAP-WAM model), and most importantly, its performance on sequential workstations and shared memory multiprocessors as compared with state-of-the-art Prolog systems. Speedup curves are provided for a collection of benchmark programs which demónstrate significant speed advantages over state-of the art sequential systems.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper describes the current prototype of the distributed CIAO system. It introduces the concepts of "teams" and "active modules" (or active objects), which conveniently encapsulate different types of functionalities desirable from a distributed system, from parallelism for achieving speedup to client-server applications. The user primitives available are presented and their implementation described. This implementation uses attributed variables and, as an example of a communication abstraction, a blackboard that follows the Linda model. Finally, the CIAO WWW interface is also briefly described. The unctionalities of the system are illustrated through examples, using the implemented primitives.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper describes the current prototype of the distributed CIAO system. It introduces the concepts of "teams" and "active modules" (or active objects), which conveniently encapsulate different types of functionalities desirable from a distributed system, from parallelism for achieving speedup to client-server applications. It presents the user primitives available and describes their implementation. This implementation uses attributed variables and, as an example of a communication abstraction, a blackboard that follows the Linda model. The functionalities of the system are illustrated through examples, using the implemented primitives. The implementation of most of the primitives is also described in detail.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Performance studies of actual parallel systems usually tend to concéntrate on the effectiveness of a given implementation. This is often done in the absolute, without quantitave reference to the potential parallelism contained in the programs from the point of view of the execution paradigm. We feel that studying the parallelism inherent to the programs is interesting, as it gives information about the best possible behavior of any implementation and thus allows contrasting the results obtained. We propose a method for obtaining ideal speedups for programs through a combination of sequential or parallel execution and simulation, and the algorithms that allow implementing the method. Our approach is novel and, we argüe, more accurate than previously proposed methods, in that a crucial part of the data - the execution times of tasks - is obtained from actual executions, while speedup is computed by simulation. This allows obtaining speedup (and other) data under controlled and ideal assumptions regarding issues such as number of processor, scheduling algorithm and overheads, etc. The results obtained can be used for example to evalúate the ideal parallelism that a program contains for a given model of execution and to compare such "perfect" parallelism to that obtained by a given implementation of that model. We also present a tool, IDRA, which implements the proposed method, and results obtained with IDRA for benchmark programs, which are then compared with those obtained in actual executions on real parallel systems.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Esta tesis constituye un gran avance en el conocimiento del estudio y análisis de inestabilidades hidrodinámicas desde un punto de vista físico y teórico, como consecuencia de haber desarrollado innovadoras técnicas para la resolución computacional eficiente y precisa de la parte principal del espectro correspondiente a los problemas de autovalores (EVP) multidimensionales que gobiernan la inestabilidad de flujos con dos o tres direcciones espaciales inhomogéneas, denominados problemas de estabilidad global lineal. En el contexto del trabajo de desarrollo de herramientas computacionales presentado en la tesis, la discretización mediante métodos de diferencias finitas estables de alto orden de los EVP bidimensionales y tridimensionales que se derivan de las ecuaciones de Navier-Stokes linealizadas sobre flujos con dos o tres direcciones espaciales inhomogéneas, ha permitido una aceleración de cuatro órdenes de magnitud en su resolución. Esta mejora de eficiencia numérica se ha conseguido gracias al hecho de que usando estos esquemas de diferencias finitas, técnicas eficientes de resolución de problemas lineales son utilizables, explotando el alto nivel de dispersión o alto número de elementos nulos en las matrices involucradas en los problemas tratados. Como más notable consecuencia cabe destacar que la resolución de EVPs multidimensionales de inestabilidad global, que hasta la fecha necesitaban de superordenadores, se ha podido realizar en ordenadores de sobremesa. Además de la solución de problemas de estabilidad global lineal, el mencionado desarrollo numérico facilitó la extensión de las ecuaciones de estabilidad parabolizadas (PSE) lineales y no lineales para analizar la inestabilidad de flujos que dependen fuertemente en dos direcciones espaciales y suavemente en la tercera con las ecuaciones de estabilidad parabolizadas tridimensionales (PSE-3D). Precisamente la capacidad de extensión del novedoso algoritmo PSE-3D para el estudio de interacciones no lineales de los modos de estabilidad, desarrollado íntegramente en esta tesis, permite la predicción de transición en flujos complejos de gran interés industrial y por lo tanto extiende el concepto clásico de PSE, el cuál ha sido empleado exitosamente durante las pasadas tres décadas en el mismo contexto para problemas de capa límite bidimensional. Típicos ejemplos de flujos incompresibles se han analizado en este trabajo sin la necesidad de recurrir a restrictivas presuposiciones usadas en el pasado. Se han estudiado problemas vorticales como es el caso de un vórtice aislado o sistemas de vórtices simulando la estela de alas, en los que la homogeneidad axial no se impone y así se puede considerar la difusión viscosa del flujo. Además, se ha estudiado el chorro giratorio turbulento, cuya inestabilidad se utiliza para mejorar las características de funcionamiento de combustores. En la tesis se abarcan adicionalmente problemas de flujos compresibles. Se presenta el estudio de inestabilidad de flujos de borde de ataque a diferentes velocidades de vuelo. También se analiza la estela formada por un elemento rugoso aislado en capa límite supersónica e hipersónica, mostrando excelentes comparaciones con resultados obtenidos mediante simulación numérica directa. Finalmente, nuevas inestabilidades se han identificado en el flujo hipersónico a Mach 7 alrededor de un cono elíptico que modela el vehículo de pruebas en vuelo HIFiRE-5. Los resultados comparan favorablemente con experimentos en vuelo, lo que subraya aún más el potencial de las metodologías de análisis de estabilidad desarrolladas en esta tesis. ABSTRACT The present thesis constitutes a step forward in advancing the frontiers of knowledge of fluid flow instability from a physical point of view, as a consequence of having been successful in developing groundbreaking methodologies for the efficient and accurate computation of the leading part of the spectrum pertinent to multi-dimensional eigenvalue problems (EVP) governing instability of flows with two or three inhomogeneous spatial directions. In the context of the numerical work presented in this thesis, the discretization of the spatial operator resulting from linearization of the Navier-Stokes equations around flows with two or three inhomogeneous spatial directions by variable-high-order stable finite-difference methods has permitted a speedup of four orders of magnitude in the solution of the corresponding two- and three-dimensional EVPs. This improvement of numerical performance has been achieved thanks to the high-sparsity level offered by the high-order finite-difference schemes employed for the discretization of the operators. This permitted use of efficient sparse linear algebra techniques without sacrificing accuracy and, consequently, solutions being obtained on typical workstations, as opposed to the previously employed supercomputers. Besides solution of the two- and three-dimensional EVPs of global linear instability, this development paved the way for the extension of the (linear and nonlinear) Parabolized Stability Equations (PSE) to analyze instability of flows which depend in a strongly-coupled inhomogeneous manner on two spatial directions and weakly on the third. Precisely the extensibility of the novel PSE-3D algorithm developed in the framework of the present thesis to study nonlinear flow instability permits transition prediction in flows of industrial interest, thus extending the classic PSE concept which has been successfully employed in the same context to boundary-layer type of flows over the last three decades. Typical examples of incompressible flows, the instability of which was analyzed in the present thesis without the need to resort to the restrictive assumptions used in the past, range from isolated vortices, and systems thereof, in which axial homogeneity is relaxed to consider viscous diffusion, as well as turbulent swirling jets, the instability of which is exploited in order to improve flame-holding properties of combustors. The instability of compressible subsonic and supersonic leading edge flows has been solved, and the wake of an isolated roughness element in a supersonic and hypersonic boundary-layer has also been analyzed with respect to its instability: excellent agreement with direct numerical simulation results has been obtained in all cases. Finally, instability analysis of Mach number 7 ow around an elliptic cone modeling the HIFiRE-5 flight test vehicle has unraveled flow instabilities near the minor-axis centerline, results comparing favorably with flight test predictions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

El objetivo de este proyecto es evaluar la mejora de rendimiento que aporta la paralelización de algoritmos de procesamiento de imágenes, para su ejecución en una tarjeta gráfica. Para ello, una vez seleccionados los algoritmos a estudio, fueron desarrollados en lenguaje C++ bajo el paradigma secuencial. A continuación, tomando como base estas implementaciones, se paralelizaron siguiendo las directivas de la tecnología CUDA (Compute Unified Device Architecture) desarrollada por NVIDIA. Posteriormente, se desarrolló un interfaz gráfico de usuario en Visual C#, para una utilización más sencilla de la herramienta. Por último, se midió el rendimiento de cada uno de los algoritmos, en términos de tiempo de ejecución paralela y speedup, mediante el procesamiento de una serie de imágenes de distintos tamaños.---ABSTRACT---The aim of this Project is to evaluate the performance improvement provided by the parallelization of image processing algorithms, which will be executed on a graphics processing unit. In order to do this, once the algorithms to study were selected, each of them was developed in C++ under sequential paradigm. Then, based on these implementations, these algorithms were implemented using the compute unified device architecture (CUDA) programming model provided by NVIDIA. After that, a graphical user interface (GUI) was developed to increase application’s usability. Finally, performance of each algorithm was measured in terms of parallel execution time and speedup by processing a set of images of different sizes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper three p-adaptation strategies based on the minimization of the truncation error are presented for high order discontinuous Galerkin methods. The truncation error is approximated by means of a ? -estimation procedure and enables the identification of mesh regions that require adaptation. Three adaptation strategies are developed and termed a posteriori, quasi-a priori and quasi-a priori corrected. All strategies require fine solutions, which are obtained by enriching the polynomial order, but while the former needs time converged solutions, the last two rely on non-converged solutions, which lead to faster computations. In addition, the high order method permits the spatial decoupling for the estimated errors and enables anisotropic p-adaptation. These strategies are verified and compared in terms of accuracy and computational cost for the Euler and the compressible Navier?Stokes equations. It is shown that the two quasi- a priori methods achieve a significant reduction in computational cost when compared to a uniform polynomial enrichment. Namely, for a viscous boundary layer flow, we obtain a speedup of 6.6 and 7.6 for the quasi-a priori and quasi-a priori corrected approaches, respectively.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper focuses on the parallelization of an ocean model applying current multicore processor-based cluster architectures to an irregular computational mesh. The aim is to maximize the efficiency of the computational resources used. To make the best use of the resources offered by these architectures, this parallelization has been addressed at all the hardware levels of modern supercomputers: firstly, exploiting the internal parallelism of the CPU through vectorization; secondly, taking advantage of the multiple cores of each node using OpenMP; and finally, using the cluster nodes to distribute the computational mesh, using MPI for communication within the nodes. The speedup obtained with each parallelization technique as well as the combined overall speedup have been measured for the western Mediterranean Sea for different cluster configurations, achieving a speedup factor of 73.3 using 256 processors. The results also show the efficiency achieved in the different cluster nodes and the advantages obtained by combining OpenMP and MPI versus using only OpenMP or MPI. Finally, the scalability of the model has been analysed by examining computation and communication times as well as the communication and synchronization overhead due to parallelization.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this work a p-adaptation (modification of the polynomial order) strategy based on the minimization of the truncation error is developed for high order discontinuous Galerkin methods. The truncation error is approximated by means of a truncation error estimation procedure and enables the identification of mesh regions that require adaptation. Three truncation error estimation approaches are developed and termed a posteriori, quasi-a priori and quasi-a priori corrected. Fine solutions, which are obtained by enriching the polynomial order, are required to solve the numerical problem with adequate accuracy. For the three truncation error estimation methods the former needs time converged solutions, while the last two rely on non-converged solutions, which lead to faster computations. Based on these truncation error estimation methods, algorithms for mesh adaptation were designed and tested. Firstly, an isotropic adaptation approach is presented, which leads to equally distributed polynomial orders in different coordinate directions. This first implementation is improved by incorporating a method to extrapolate the truncation error. This results in a significant reduction of computational cost. Secondly, the employed high order method permits the spatial decoupling of the estimated errors and enables anisotropic p-adaptation. The incorporation of anisotropic features leads to meshes with different polynomial orders in the different coordinate directions such that flow-features related to the geometry are resolved in a better manner. These adaptations result in a significant reduction of degrees of freedom and computational cost, while the amount of improvement depends on the test-case. Finally, this anisotropic approach is extended by using error extrapolation which leads to an even higher reduction in computational cost. These strategies are verified and compared in terms of accuracy and computational cost for the Euler and the compressible Navier-Stokes equations. The main result is that the two quasi-a priori methods achieve a significant reduction in computational cost when compared to a uniform polynomial enrichment. Namely, for a viscous boundary layer flow, we obtain a speedup of a factor of 6.6 and 7.6 for the quasi-a priori and quasi-a priori corrected approaches, respectively. RESUMEN En este trabajo se ha desarrollado una estrategia de adaptación-p (modificación del orden polinómico) para métodos Galerkin discontinuo de alto orden basada en la minimización del error de truncación. El error de truncación se estima utilizando el método tau-estimation. El estimador permite la identificación de zonas de la malla que requieren adaptación. Se distinguen tres técnicas de estimación: a posteriori, quasi a priori y quasi a priori con correción. Todas las estrategias requieren una solución obtenida en una malla fina, la cual es obtenida aumentando de manera uniforme el orden polinómico. Sin embargo, mientras que el primero requiere que esta solución esté convergida temporalmente, el resto utiliza soluciones no convergidas, lo que se traduce en un menor coste computacional. En este trabajo se han diseñado y probado algoritmos de adaptación de malla basados en métodos tau-estimation. En primer lugar, se presenta un algoritmo de adaptacin isótropo, que conduce a discretizaciones con el mismo orden polinómico en todas las direcciones espaciales. Esta primera implementación se mejora incluyendo un método para extrapolar el error de truncación. Esto resulta en una reducción significativa del coste computacional. En segundo lugar, el método de alto orden permite el desacoplamiento espacial de los errores estimados, permitiendo la adaptación anisotropica. Las mallas obtenidas mediante esta técnica tienen distintos órdenes polinómicos en cada una de las direcciones espaciales. La malla final tiene una distribución óptima de órdenes polinómicos, los cuales guardan relación con las características del flujo que, a su vez, depenen de la geometría. Estas técnicas de adaptación reducen de manera significativa los grados de libertad y el coste computacional. Por último, esta aproximación anisotropica se extiende usando extrapolación del error de truncación, lo que conlleva un coste computational aún menor. Las estrategias se verifican y se comparan en téminors de precisión y coste computacional utilizando las ecuaciones de Euler y Navier Stokes. Los dos métodos quasi a priori consiguen una reducción significativa del coste computacional en comparación con aumento uniforme del orden polinómico. En concreto, para una capa límite viscosa, obtenemos una mejora en tiempo de computación de 6.6 y 7.6 respectivamente, para las aproximaciones quasi-a priori y quasi-a priori con corrección.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many bacteria live only within animal cells and infect hosts through cytoplasmic inheritance. These endosymbiotic lineages show distinctive population structure, with small population size and effectively no recombination. As a result, endosymbionts are expected to accumulate mildly deleterious mutations. If these constitute a substantial proportion of new mutations, endosymbionts will show (i) faster sequence evolution and (ii) a possible shift in base composition reflecting mutational bias. Analyses of 16S rDNA of five independently derived endosymbiont clades show, in every case, faster evolution in endosymbionts than in free-living relatives. For aphid endosymbionts (genus Buchnera), coding genes exhibit accelerated evolution and unusually low ratios of synonymous to nonsynonymous substitutions compared to ratios for the same genes for enterics. This concentration of the rate increase in nonsynonymous substitutions is expected under the hypothesis of increased fixation of deleterious mutations. Polypeptides for all Buchnera genes analyzed have accumulated amino acids with codon families rich in A+T, supporting the hypothesis that substitutions are deleterious in terms of polypeptide function. These observations are best explained as the result of Muller's ratchet within small asexual populations, combined with mutational bias. In light of this explanation, two observations reported earlier for Buchnera, the apparent loss of a repair gene and the overproduction of a chaperonin, may reflect compensatory evolution. An alternative hypothesis, involving selection on genomic base composition, is contradicted by the observation that the speedup is concentrated at nonsynonymous sites.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Em Bioinformática são frequentes problemas cujo tratamento necessita de considerável poder de processamento/cálculo e/ou grande capacidade de armazenamento de dados e elevada largura de banda no acesso aos mesmos (de forma não comprometer a eficiência do seu processamento). Um exemplo deste tipo de problemas é a busca de regiões de similaridade em sequências de amino-ácidos de proteínas, ou em sequências de nucleótidos de DNA, por comparação com uma dada sequência fornecida (query sequence). Neste âmbito, a ferramenta computacional porventura mais conhecida e usada é o BLAST (Basic Local Alignment Search Tool) [1]. Donde, qualquer incremento no desempenho desta ferramenta tem impacto considerável (desde logo positivo) na atividade de quem a utiliza regularmente (seja para investigação, seja para fins comerciais). Precisamente, desde que o BLAST foi inicialmente introduzido, foram surgindo diversas versões, com desempenho melhorado, nomeadamente através da aplicação de técnicas de paralelização às várias fases do algoritmo (e. g., partição e distribuição das bases de dados a pesquisar, segmentação das queries, etc. ), capazes de tirar partido de diferentes ambientes computacionais de execução paralela, como: máquinas multi-core (BLAST+ 2), clusters de nós multi-core (mpiBLAST3J e, mais recentemente, co-processadores aceleradores como GPUs" ou FPGAs. É também possível usar as ferramentas da família BLAST através de um interface/sítio WEB5, que permite, de forma expedita, a pesquisa de uma variedade de bases de dados conhecidas (e em permanente atualização), com tempos de resposta suficientemente pequenos para a maioria dos utilizadores, graças aos recursos computacionais de elevado desempenho que sustentam o seu backend. Ainda assim, esta forma de utilização do BLAST poderá não ser a melhor opção em algumas situações, como por exemplo quando as bases de dados a pesquisar ainda não são de domínio público, ou, sendo-o, não estão disponíveis no referido sitio WEB. Adicionalmente, a utilização do referido sitio como ferramenta de trabalho regular pressupõe a sua disponibilidade permanente (dependente de terceiros) e uma largura de banda de qualidade suficiente, do lado do cliente, para uma interacção eficiente com o mesmo. Por estas razões, poderá ter interesse (ou ser mesmo necessário) implantar uma infra-estrutura BLAST local, capaz de albergar as bases de dados pertinentes e de suportar a sua pesquisa da forma mais eficiente possível, tudo isto levando em conta eventuais constrangimentos financeiros que limitam o tipo de hardware usado na implementação dessa infra-estrutura. Neste contexto, foi realizado um estudo comparativo de diversas versões do BLAST, numa infra-estrutura de computação paralela do IPB, baseada em componentes commodity: um cluster de 8 nós (virtuais, sob VMWare ESXi) de computação (com CPU Í7-4790K 4GHz, 32GB RAM e 128GB SSD) e um nó dotado de uma GPU (CPU Í7-2600 3.8GHz, 32GB RAM, 128 GB SSD, 1 TB HD, NVIDIA GTX 580). Assim, o foco principal incidiu na avaliação do desempenho do BLAST original e do mpiBLAST, dado que são fornecidos de base na distribuição Linux em que assenta o cluster [6]. Complementarmente, avaliou-se também o BLAST+ e o gpuBLAST no nó dotado de GPU. A avaliação contemplou diversas configurações de recursos, incluindo diferentes números de nós utilizados e diferentes plataformas de armazenamento das bases de dados (HD, SSD, NFS). As bases de dados pesquisadas correspondem a um subconjunto representativo das disponíveis no sitio WEB do BLAST, cobrindo uma variedade de dimensões (desde algumas dezenas de MBytes, até à centena de GBytes) e contendo quer sequências de amino-ácidos (env_nr e nr), quer de nucleótidos (drosohp. nt, env_nt, mito. nt, nt e patnt). Para as pesquisas foram 'usadas sequências arbitrárias de 568 letras em formato FASTA, e adoptadas as opções por omissão dos vários aplicativos BLAST. Salvo menção em contrário, os tempos de execução considerados nas comparações e no cálculo de speedups são relativos à primeira execução de uma pesquisa, não sendo assim beneficiados por qualquer efeito de cache; esta opção assume um cenário real em que não é habitual que uma mesma query seja executada várias vezes seguidas (embora possa ser re-executada, mais tarde). As principais conclusões do estudo comparativo realizado foram as seguintes: - e necessário acautelar, à priori, recursos de armazenamento com capacidade suficiente para albergar as bases de dados nas suas várias versões (originais/compactadas, descompactadas e formatadas); no nosso cenário de teste a coexistência de todas estas versões consumiu 600GBytes; - o tempo de preparação (formataçâo) das bases de dados para posterior pesquisa pode ser considerável; no nosso cenário experimental, a formatação das bases de dados mais pesadas (nr, env_nt e nt) demorou entre 30m a 40m (para o BLAST), e entre 45m a 55m (para o mpiBLAST); - embora economicamente mais onerosos, a utilização de discos de estado sólido, em alternativa a discos rígidos tradicionais, permite melhorar o tempo da formatação das bases de dados; no entanto, os benefícios registados (à volta de 9%) ficam bastante aquém do inicialmente esperado; - o tempo de execução do BLAST é fortemente penalizado quando as bases de dados são acedidas através da rede, via NFS; neste caso, nem sequer compensa usar vários cores; quando as bases de dados são locais e estão em SSD, o tempo de execução melhora bastante, em especial com a utilização de vários cores; neste caso, com 4 cores, o speedup chega a atingir 3.5 (sendo o ideal 4) para a pesquisa de BDs de proteínas, mas não passa de 1.8 para a pesquisa de BDs de nucleótidos; - o tempo de execução do mpiBLAST é muito prejudicado quando os fragmentos das bases de dados ainda não se encontram nos nós do cluster, tendo que ser distribuídos previamente à pesquisa propriamente dita; após a distribuição, a repetição das mesmas queries beneficia de speedups de 14 a 70; porém, como a mesma base de dados poderá ser usada para responder a diferentes queries, então não é necessário repetir a mesma query para amortizar o esforço de distribuição; - no cenário de teste, a utilização do mpiBLAST com 32+2 cores, face ao BLAST com 4 cores, traduz-se em speedups que, conforme a base de dados pesquisada (e previamente distribuída), variam entre 2 a 5, valores aquém do máximo teórico de 6.5 (34/4), mas ainda assim demonstradores de que, havendo essa possibilidade, compensa realizar as pesquisas em cluster; explorar vários cores) e com o gpuBLAST, realizada no nó com GPU (representativo de uma workstation típica), permite aferir qual a melhor opção no caso de não serem possíveis pesquisas em cluster; as observações realizadas indicam que não há diferenças significativas entre o BLAST e o BLAST+; adicionalmente, o desempenho do gpuBLAST foi sempre pior (aproximadmente em 50%) que o do BLAST e BLAST+, o que pode encontrar explicação na longevidade do modelo da GPU usada; - finalmente, a comparação da melhor opção no nosso cenário de teste, representada pelo uso do mpiBLAST, com o recurso a pesquisa online, no site do BLAST5, revela que o mpiBLAST apresenta um desempenho bastante competitivo com o BLAST online, chegando a ser claramente superior se se considerarem os tempos do mpiBLAST tirando partido de efeitos de cache; esta assunção acaba por se justa, Já que BLAST online também rentabiliza o mesmo tipo de efeitos; no entanto, com tempos de pequisa tão reduzidos (< 30s), só é defensável a utilização do mpiBLAST numa infra-estrutura local se o objetivo for a pesquisa de Bds não pesquisáveis via BLAS+ online.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Partial differential equation (PDE) solvers are commonly employed to study and characterize the parameter space for reaction-diffusion (RD) systems while investigating biological pattern formation. Increasingly, biologists wish to perform such studies with arbitrary surfaces representing ‘real’ 3D geometries for better insights. In this paper, we present a highly optimized CUDA-based solver for RD equations on triangulated meshes in 3D. We demonstrate our solver using a chemotactic model that can be used to study snakeskin pigmentation, for example. We employ a finite element based approach to perform explicit Euler time integrations. We compare our approach to a naive GPU implementation and provide an in-depth performance analysis, demonstrating the significant speedup afforded by our optimizations. The optimization strategies that we exploit could be generalized to other mesh based processing applications with PDE simulations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Photogrammetric reanalysis of 1985 aerial photos has revealed substantial submarine melting of the floating ice tongue of Jakobshavn Isbrae, west Greenland. The thickness of the floating tongue determined from hydrostatic equilibrium tapers from ~940 m near the grounding zone to ~600 m near the terminus. Feature tracking on orthophotos shows speeds on the July 1985 ice tongue to be nearly constant (~18.5 m/d), indicating negligible dynamic thinning. The thinning of the ice tongue is mostly due to submarine melting with average rates of 228 ± 49 m/yr (0.62 ± 0.13 m/d) between the summers of 1984 and 1985. The cause of the high melt rate is the circulation of warm seawater (thermal forcing of up to 4.2°C) beneath the tongue with convection driven by the substantial discharge of subglacial freshwater from the grounding zone. We believe that this buoyancy-driven convection is responsible for a deep channel incised into the sole of the floating tongue. A dramatic thinning, retreat, and speedup began in 1998 and continues today. The timing of the change is coincident with a 1.1°C warming of deep ocean waters entering the fjord after 1997. Assuming a linear relationship between thermal forcing and submarine melt rate, average melt rates should have increased by ~25% (~57 m/yr), sufficient to destabilize the ice tongue and initiate the ice thinning and the retreat that followed.