929 resultados para scientific computation
Resumo:
The modern GPUs are well suited for intensive computational tasks and massive parallel computation. Sparse matrix multiplication and linear triangular solver are the most important and heavily used kernels in scientific computation, and several challenges in developing a high performance kernel with the two modules is investigated. The main interest it to solve linear systems derived from the elliptic equations with triangular elements. The resulting linear system has a symmetric positive definite matrix. The sparse matrix is stored in the compressed sparse row (CSR) format. It is proposed a CUDA algorithm to execute the matrix vector multiplication using directly the CSR format. A dependence tree algorithm is used to determine which variables the linear triangular solver can determine in parallel. To increase the number of the parallel threads, a coloring graph algorithm is implemented to reorder the mesh numbering in a pre-processing phase. The proposed method is compared with parallel and serial available libraries. The results show that the proposed method improves the computation cost of the matrix vector multiplication. The pre-processing associated with the triangular solver needs to be executed just once in the proposed method. The conjugate gradient method was implemented and showed similar convergence rate for all the compared methods. The proposed method showed significant smaller execution time.
Resumo:
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. We detail some of the design decisions, software paradigms and operational strategies that have allowed a small number of researchers to provide a wide variety of innovative, extensible, software solutions in a relatively short time. The use of an object oriented programming paradigm, the adoption and development of a software package system, designing by contract, distributed development and collaboration with other projects are elements of this project's success. Individually, each of these concepts are useful and important but when combined they have provided a strong basis for rapid development and deployment of innovative and flexible research software for scientific computation. A primary objective of this initiative is achievement of total remote reproducibility of novel algorithmic research results.
Resumo:
As is well known B.E.M. is obtained as a mixture of the integral representation formula of classical elasticity and the discretization philosophy of the finite element method (F.E.M.). The paper presents the application of B.E.M. to elastodynamic problems. Both the transient and steady state solutions are presented as well as some techniques to simplify problems with a free-stress boundary.
Resumo:
El uso de aritmética de punto fijo es una opción de diseño muy extendida en sistemas con fuertes restricciones de área, consumo o rendimiento. Para producir implementaciones donde los costes se minimicen sin impactar negativamente en la precisión de los resultados debemos llevar a cabo una asignación cuidadosa de anchuras de palabra. Encontrar la combinación óptima de anchuras de palabra en coma fija para un sistema dado es un problema combinatorio NP-hard al que los diseñadores dedican entre el 25 y el 50 % del ciclo de diseño. Las plataformas hardware reconfigurables, como son las FPGAs, también se benefician de las ventajas que ofrece la aritmética de coma fija, ya que éstas compensan las frecuencias de reloj más bajas y el uso más ineficiente del hardware que hacen estas plataformas respecto a los ASICs. A medida que las FPGAs se popularizan para su uso en computación científica los diseños aumentan de tamaño y complejidad hasta llegar al punto en que no pueden ser manejados eficientemente por las técnicas actuales de modelado de señal y ruido de cuantificación y de optimización de anchura de palabra. En esta Tesis Doctoral exploramos distintos aspectos del problema de la cuantificación y presentamos nuevas metodologías para cada uno de ellos: Las técnicas basadas en extensiones de intervalos han permitido obtener modelos de propagación de señal y ruido de cuantificación muy precisos en sistemas con operaciones no lineales. Nosotros llevamos esta aproximación un paso más allá introduciendo elementos de Multi-Element Generalized Polynomial Chaos (ME-gPC) y combinándolos con una técnica moderna basada en Modified Affine Arithmetic (MAA) estadístico para así modelar sistemas que contienen estructuras de control de flujo. Nuestra metodología genera los distintos caminos de ejecución automáticamente, determina las regiones del dominio de entrada que ejercitarán cada uno de ellos y extrae los momentos estadísticos del sistema a partir de dichas soluciones parciales. Utilizamos esta técnica para estimar tanto el rango dinámico como el ruido de redondeo en sistemas con las ya mencionadas estructuras de control de flujo y mostramos la precisión de nuestra aproximación, que en determinados casos de uso con operadores no lineales llega a tener tan solo una desviación del 0.04% con respecto a los valores de referencia obtenidos mediante simulación. Un inconveniente conocido de las técnicas basadas en extensiones de intervalos es la explosión combinacional de términos a medida que el tamaño de los sistemas a estudiar crece, lo cual conlleva problemas de escalabilidad. Para afrontar este problema presen tamos una técnica de inyección de ruidos agrupados que hace grupos con las señales del sistema, introduce las fuentes de ruido para cada uno de los grupos por separado y finalmente combina los resultados de cada uno de ellos. De esta forma, el número de fuentes de ruido queda controlado en cada momento y, debido a ello, la explosión combinatoria se minimiza. También presentamos un algoritmo de particionado multi-vía destinado a minimizar la desviación de los resultados a causa de la pérdida de correlación entre términos de ruido con el objetivo de mantener los resultados tan precisos como sea posible. La presente Tesis Doctoral también aborda el desarrollo de metodologías de optimización de anchura de palabra basadas en simulaciones de Monte-Cario que se ejecuten en tiempos razonables. Para ello presentamos dos nuevas técnicas que exploran la reducción del tiempo de ejecución desde distintos ángulos: En primer lugar, el método interpolativo aplica un interpolador sencillo pero preciso para estimar la sensibilidad de cada señal, y que es usado después durante la etapa de optimización. En segundo lugar, el método incremental gira en torno al hecho de que, aunque es estrictamente necesario mantener un intervalo de confianza dado para los resultados finales de nuestra búsqueda, podemos emplear niveles de confianza más relajados, lo cual deriva en un menor número de pruebas por simulación, en las etapas iniciales de la búsqueda, cuando todavía estamos lejos de las soluciones optimizadas. Mediante estas dos aproximaciones demostramos que podemos acelerar el tiempo de ejecución de los algoritmos clásicos de búsqueda voraz en factores de hasta x240 para problemas de tamaño pequeño/mediano. Finalmente, este libro presenta HOPLITE, una infraestructura de cuantificación automatizada, flexible y modular que incluye la implementación de las técnicas anteriores y se proporciona de forma pública. Su objetivo es ofrecer a desabolladores e investigadores un entorno común para prototipar y verificar nuevas metodologías de cuantificación de forma sencilla. Describimos el flujo de trabajo, justificamos las decisiones de diseño tomadas, explicamos su API pública y hacemos una demostración paso a paso de su funcionamiento. Además mostramos, a través de un ejemplo sencillo, la forma en que conectar nuevas extensiones a la herramienta con las interfaces ya existentes para poder así expandir y mejorar las capacidades de HOPLITE. ABSTRACT Using fixed-point arithmetic is one of the most common design choices for systems where area, power or throughput are heavily constrained. In order to produce implementations where the cost is minimized without negatively impacting the accuracy of the results, a careful assignment of word-lengths is required. The problem of finding the optimal combination of fixed-point word-lengths for a given system is a combinatorial NP-hard problem to which developers devote between 25 and 50% of the design-cycle time. Reconfigurable hardware platforms such as FPGAs also benefit of the advantages of fixed-point arithmetic, as it compensates for the slower clock frequencies and less efficient area utilization of the hardware platform with respect to ASICs. As FPGAs become commonly used for scientific computation, designs constantly grow larger and more complex, up to the point where they cannot be handled efficiently by current signal and quantization noise modelling and word-length optimization methodologies. In this Ph.D. Thesis we explore different aspects of the quantization problem and we present new methodologies for each of them: The techniques based on extensions of intervals have allowed to obtain accurate models of the signal and quantization noise propagation in systems with non-linear operations. We take this approach a step further by introducing elements of MultiElement Generalized Polynomial Chaos (ME-gPC) and combining them with an stateof- the-art Statistical Modified Affine Arithmetic (MAA) based methodology in order to model systems that contain control-flow structures. Our methodology produces the different execution paths automatically, determines the regions of the input domain that will exercise them, and extracts the system statistical moments from the partial results. We use this technique to estimate both the dynamic range and the round-off noise in systems with the aforementioned control-flow structures. We show the good accuracy of our approach, which in some case studies with non-linear operators shows a 0.04 % deviation respect to the simulation-based reference values. A known drawback of the techniques based on extensions of intervals is the combinatorial explosion of terms as the size of the targeted systems grows, which leads to scalability problems. To address this issue we present a clustered noise injection technique that groups the signals in the system, introduces the noise terms in each group independently and then combines the results at the end. In this way, the number of noise sources in the system at a given time is controlled and, because of this, the combinato rial explosion is minimized. We also present a multi-way partitioning algorithm aimed at minimizing the deviation of the results due to the loss of correlation between noise terms, in order to keep the results as accurate as possible. This Ph.D. Thesis also covers the development of methodologies for word-length optimization based on Monte-Carlo simulations in reasonable times. We do so by presenting two novel techniques that explore the reduction of the execution times approaching the problem in two different ways: First, the interpolative method applies a simple but precise interpolator to estimate the sensitivity of each signal, which is later used to guide the optimization effort. Second, the incremental method revolves on the fact that, although we strictly need to guarantee a certain confidence level in the simulations for the final results of the optimization process, we can do it with more relaxed levels, which in turn implies using a considerably smaller amount of samples, in the initial stages of the process, when we are still far from the optimized solution. Through these two approaches we demonstrate that the execution time of classical greedy techniques can be accelerated by factors of up to ×240 for small/medium sized problems. Finally, this book introduces HOPLITE, an automated, flexible and modular framework for quantization that includes the implementation of the previous techniques and is provided for public access. The aim is to offer a common ground for developers and researches for prototyping and verifying new techniques for system modelling and word-length optimization easily. We describe its work flow, justifying the taken design decisions, explain its public API and we do a step-by-step demonstration of its execution. We also show, through an example, the way new extensions to the flow should be connected to the existing interfaces in order to expand and improve the capabilities of HOPLITE.
Resumo:
Acknowledgments The authors acknowledge the support from Engineering and Physical Sciences Research Council, grant number EP/M002322/1. The authors would also like to thank Numerical Analysis Group at the Rutherford Appleton Laboratory for their FORTRAN HSL packages (HSL, a collection of Fortran codes for large-scale scientific computation. See http://www.hsl.rl.ac.uk/).
Resumo:
This paper compares three alternative numerical algorithms applied to a nonlinear metal cutting problem. One algorithm is based on an explicit method and the other two are implicit. Domain decomposition (DD) is used to break the original domain into subdomains, each containing a properly connected, well-formulated and continuous subproblem. The serial version of the explicit algorithm is implemented in FORTRAN and its parallel version uses MPI (Message Passing Interface) calls. One implicit algorithm is implemented by coupling the state-of-the-art PETSc (Portable, Extensible Toolkit for Scientific Computation) software with in-house software in order to solve the subproblems. The second implicit algorithm is implemented completely within PETSc. PETSc uses MPI as the underlying communication library. Finally, a 2D example is used to test the algorithms and various comparisons are made.
Resumo:
A constante evolução da tecnologia disponibilizou, atualmente, ferramentas computacionais que eram apenas expectativas há 10 anos atrás. O aumento do potencial computacional aplicado a modelos numéricos que simulam a atmosfera permitiu ampliar o estudo de fenômenos atmosféricos, através do uso de ferramentas de computação de alto desempenho. O trabalho propôs o desenvolvimento de algoritmos com base em arquiteturas SIMT e aplicação de técnicas de paralelismo com uso da ferramenta OpenACC para processamento de dados de previsão numérica do modelo Weather Research and Forecast. Esta proposta tem forte conotação interdisciplinar, buscando a interação entre as áreas de modelagem atmosférica e computação científica. Foram testadas a influência da computação do cálculo de microfísica de nuvens na degradação temporal do modelo. Como a entrada de dados para execução na GPU não era suficientemente grande, o tempo necessário para transferir dados da CPU para a GPU foi maior do que a execução da computação na CPU. Outro fator determinante foi a adição de código CUDA dentro de um contexto MPI, causando assim condições de disputa de recursos entre os processadores, mais uma vez degradando o tempo de execução. A proposta do uso de diretivas para aplicar computação de alto desempenho em uma estrutura CUDA parece muito promissora, mas ainda precisa ser utilizada com muita cautela a fim de produzir bons resultados. A construção de um híbrido MPI + CUDA foi testada, mas os resultados não foram conclusivos.
Resumo:
This thesis deals with tensor completion for the solution of multidimensional inverse problems. We study the problem of reconstructing an approximately low rank tensor from a small number of noisy linear measurements. New recovery guarantees, numerical algorithms, non-uniform sampling strategies, and parameter selection algorithms are developed. We derive a fixed point continuation algorithm for tensor completion and prove its convergence. A restricted isometry property (RIP) based tensor recovery guarantee is proved. Probabilistic recovery guarantees are obtained for sub-Gaussian measurement operators and for measurements obtained by non-uniform sampling from a Parseval tight frame. We show how tensor completion can be used to solve multidimensional inverse problems arising in NMR relaxometry. Algorithms are developed for regularization parameter selection, including accelerated k-fold cross-validation and generalized cross-validation. These methods are validated on experimental and simulated data. We also derive condition number estimates for nonnegative least squares problems. Tensor recovery promises to significantly accelerate N-dimensional NMR relaxometry and related experiments, enabling previously impractical experiments. Our methods could also be applied to other inverse problems arising in machine learning, image processing, signal processing, computer vision, and other fields.
Resumo:
Although tyrosine kinase inhibitors (TKIs) such as imatinib have transformed chronic myelogenous leukemia (CML) into a chronic condition, these therapies are not curative in the majority of cases. Most patients must continue TKI therapy indefinitely, a requirement that is both expensive and that compromises a patient's quality of life. While TKIs are known to reduce leukemic cells' proliferative capacity and to induce apoptosis, their effects on leukemic stem cells, the immune system, and the microenvironment are not fully understood. A more complete understanding of their global therapeutic effects would help us to identify any limitations of TKI monotherapy and to address these issues through novel combination therapies. Mathematical models are a complementary tool to experimental and clinical data that can provide valuable insights into the underlying mechanisms of TKI therapy. Previous modeling efforts have focused on CML patients who show biphasic and triphasic exponential declines in BCR-ABL ratio during therapy. However, our patient data indicates that many patients treated with TKIs show fluctuations in BCR-ABL ratio yet are able to achieve durable remissions. To investigate these fluctuations, we construct a mathematical model that integrates CML with a patient's autologous immune response to the disease. In our model, we define an immune window, which is an intermediate range of leukemic concentrations that lead to an effective immune response against CML. While small leukemic concentrations provide insufficient stimulus, large leukemic concentrations actively suppress a patient's immune system, thus limiting it's ability to respond. Our patient data and modeling results suggest that at diagnosis, a patient's high leukemic concentration is able to suppress their immune system. TKI therapy drives the leukemic population into the immune window, allowing the patient's immune cells to expand and eventually mount an efficient response against the residual CML. This response drives the leukemic population below the immune window, causing the immune population to contract and allowing the leukemia to partially recover. The leukemia eventually reenters the immune window, thus stimulating a sequence of weaker immune responses as the two populations approach equilibrium. We hypothesize that a patient's autologous immune response to CML may explain the fluctuations in BCR-ABL ratio that are regularly seen during TKI therapy. These fluctuations may serve as a signature of a patient's individual immune response to CML. By applying our modeling framework to patient data, we are able to construct an immune profile that can then be used to propose patient-specific combination therapies aimed at further reducing a patient's leukemic burden. Our characterization of a patient's anti-leukemia immune response may be especially valuable in the study of drug resistance, treatment cessation, and combination therapy.
Resumo:
We present a detailed analysis of the application of a multi-scale Hierarchical Reconstruction method for solving a family of ill-posed linear inverse problems. When the observations on the unknown quantity of interest and the observation operators are known, these inverse problems are concerned with the recovery of the unknown from its observations. Although the observation operators we consider are linear, they are inevitably ill-posed in various ways. We recall in this context the classical Tikhonov regularization method with a stabilizing function which targets the specific ill-posedness from the observation operators and preserves desired features of the unknown. Having studied the mechanism of the Tikhonov regularization, we propose a multi-scale generalization to the Tikhonov regularization method, so-called the Hierarchical Reconstruction (HR) method. First introduction of the HR method can be traced back to the Hierarchical Decomposition method in Image Processing. The HR method successively extracts information from the previous hierarchical residual to the current hierarchical term at a finer hierarchical scale. As the sum of all the hierarchical terms, the hierarchical sum from the HR method provides an reasonable approximate solution to the unknown, when the observation matrix satisfies certain conditions with specific stabilizing functions. When compared to the Tikhonov regularization method on solving the same inverse problems, the HR method is shown to be able to decrease the total number of iterations, reduce the approximation error, and offer self control of the approximation distance between the hierarchical sum and the unknown, thanks to using a ladder of finitely many hierarchical scales. We report numerical experiments supporting our claims on these advantages the HR method has over the Tikhonov regularization method.
Resumo:
Theories of sparse signal representation, wherein a signal is decomposed as the sum of a small number of constituent elements, play increasing roles in both mathematical signal processing and neuroscience. This happens despite the differences between signal models in the two domains. After reviewing preliminary material on sparse signal models, I use work on compressed sensing for the electron tomography of biological structures as a target for exploring the efficacy of sparse signal reconstruction in a challenging application domain. My research in this area addresses a topic of keen interest to the biological microscopy community, and has resulted in the development of tomographic reconstruction software which is competitive with the state of the art in its field. Moving from the linear signal domain into the nonlinear dynamics of neural encoding, I explain the sparse coding hypothesis in neuroscience and its relationship with olfaction in locusts. I implement a numerical ODE model of the activity of neural populations responsible for sparse odor coding in locusts as part of a project involving offset spiking in the Kenyon cells. I also explain the validation procedures we have devised to help assess the model's similarity to the biology. The thesis concludes with the development of a new, simplified model of locust olfactory network activity, which seeks with some success to explain statistical properties of the sparse coding processes carried out in the network.
Resumo:
This dissertation concerns the well-posedness of the Navier-Stokes-Smoluchowski system. The system models a mixture of fluid and particles in the so-called bubbling regime. The compressible Navier-Stokes equations governing the evolution of the fluid are coupled to the Smoluchowski equation for the particle density at a continuum level. First, working on fixed domains, the existence of weak solutions is established using a three-level approximation scheme and based largely on the Lions-Feireisl theory of compressible fluids. The system is then posed over a moving domain. By utilizing a Brinkman-type penalization as well as penalization of the viscosity, the existence of weak solutions of the Navier-Stokes-Smoluchowski system is proved over moving domains. As a corollary the convergence of the Brinkman penalization is proved. Finally, a suitable relative entropy is defined. This relative entropy is used to establish a weak-strong uniqueness result for the Navier-Stokes-Smoluchowski system over moving domains, ensuring that strong solutions are unique in the class of weak solutions.
Resumo:
The dissertation is devoted to the study of problems in calculus of variation, free boundary problems and gradient flows with respect to the Wasserstein metric. More concretely, we consider the problem of characterizing the regularity of minimizers to a certain interaction energy. Minimizers of the interaction energy have a somewhat surprising relationship with solutions to obstacle problems. Here we prove and exploit this relationship to obtain novel regularity results. Another problem we tackle is describing the asymptotic behavior of the Cahn-Hilliard equation with degenerate mobility. By framing the Cahn-Hilliard equation with degenerate mobility as a gradient flow in Wasserstein metric, in one space dimension, we prove its convergence to a degenerate parabolic equation under the framework recently developed by Sandier-Serfaty.
Resumo:
This dissertation investigates the connection between spectral analysis and frame theory. When considering the spectral properties of a frame, we present a few novel results relating to the spectral decomposition. We first show that scalable frames have the property that the inner product of the scaling coefficients and the eigenvectors must equal the inverse eigenvalues. From this, we prove a similar result when an approximate scaling is obtained. We then focus on the optimization problems inherent to the scalable frames by first showing that there is an equivalence between scaling a frame and optimization problems with a non-restrictive objective function. Various objective functions are considered, and an analysis of the solution type is presented. For linear objectives, we can encourage sparse scalings, and with barrier objective functions, we force dense solutions. We further consider frames in high dimensions, and derive various solution techniques. From here, we restrict ourselves to various frame classes, to add more specificity to the results. Using frames generated from distributions allows for the placement of probabilistic bounds on scalability. For discrete distributions (Bernoulli and Rademacher), we bound the probability of encountering an ONB, and for continuous symmetric distributions (Uniform and Gaussian), we show that symmetry is retained in the transformed domain. We also prove several hyperplane-separation results. With the theory developed, we discuss graph applications of the scalability framework. We make a connection with graph conditioning, and show the in-feasibility of the problem in the general case. After a modification, we show that any complete graph can be conditioned. We then present a modification of standard PCA (robust PCA) developed by Cand\`es, and give some background into Electron Energy-Loss Spectroscopy (EELS). We design a novel scheme for the processing of EELS through robust PCA and least-squares regression, and test this scheme on biological samples. Finally, we take the idea of robust PCA and apply the technique of kernel PCA to perform robust manifold learning. We derive the problem and present an algorithm for its solution. There is also discussion of the differences with RPCA that make theoretical guarantees difficult.
Resumo:
This work is devoted to creating an abstract framework for the study of certain spectral properties of parabolic systems. Specifically, we determine under which general conditions to expect the presence of absolutely continuous spectral measures. We use these general conditions to derive results for spectral properties of time-changes of unipotent flows on homogeneous spaces of semisimple groups regarding absolutely continuous spectrum as well as maximal spectral type; the time-changes of the horocycle flow are special cases of this general category of flows. In addition we use the general conditions to derive spectral results for twisted horocycle flows and to rederive spectral results for skew products over translations and Furstenberg transformations.