Biblioteca Digital

904 resultados para 291605 Processor Architectures

Modelling and verifying smell-free architectures with the Archery language

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Architectural (bad) smells are design decisions found in software architectures that degrade the ability of systems to evolve. This paper presents an approach to verify that a software architecture is smellfree using the Archery architectural description language. The language provides a core for modelling software architectures and an extension for specifying constraints. The approach consists in precisely specifying architectural smells as constraints, and then verifying that software architectures do not satisfy any of them. The constraint language is based on a propositional modal logic with recursion that includes: a converse operator for relations among architectural concepts, graded modalities for describing the cardinality in such relations, and nominals referencing architectural elements. Four architectural smells illustrate the approach.

Aceleración de Algoritmos en Procesadores Multicore, GPGPU y FPGA. Algorithm Acceleration with Multicore, GPGPU and FPGA Processors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

El avance en la potencia de cómputo en nuestros días viene dado por la paralelización del procesamiento, dadas las características que disponen las nuevas arquitecturas de hardware. Utilizar convenientemente este hardware impacta en la aceleración de los algoritmos en ejecución (programas). Sin embargo, convertir de forma adecuada el algoritmo en su forma paralela es complejo, y a su vez, esta forma, es específica para cada tipo de hardware paralelo. En la actualidad los procesadores de uso general más comunes son los multicore, procesadores paralelos, también denominados Symmetric Multi-Processors (SMP). Hoy en día es difícil hallar un procesador para computadoras de escritorio que no tengan algún tipo de paralelismo del caracterizado por los SMP, siendo la tendencia de desarrollo, que cada día nos encontremos con procesadores con mayor numero de cores disponibles. Por otro lado, los dispositivos de procesamiento de video (Graphics Processor Units - GPU), a su vez, han ido desarrollando su potencia de cómputo por medio de disponer de múltiples unidades de procesamiento dentro de su composición electrónica, a tal punto que en la actualidad no es difícil encontrar placas de GPU con capacidad de 200 a 400 hilos de procesamiento paralelo. Estos procesadores son muy veloces y específicos para la tarea que fueron desarrollados, principalmente el procesamiento de video. Sin embargo, como este tipo de procesadores tiene muchos puntos en común con el procesamiento científico, estos dispositivos han ido reorientándose con el nombre de General Processing Graphics Processor Unit (GPGPU). A diferencia de los procesadores SMP señalados anteriormente, las GPGPU no son de propósito general y tienen sus complicaciones para uso general debido al límite en la cantidad de memoria que cada placa puede disponer y al tipo de procesamiento paralelo que debe realizar para poder ser productiva su utilización. Los dispositivos de lógica programable, FPGA, son dispositivos capaces de realizar grandes cantidades de operaciones en paralelo, por lo que pueden ser usados para la implementación de algoritmos específicos, aprovechando el paralelismo que estas ofrecen. Su inconveniente viene derivado de la complejidad para la programación y el testing del algoritmo instanciado en el dispositivo. Ante esta diversidad de procesadores paralelos, el objetivo de nuestro trabajo está enfocado en analizar las características especificas que cada uno de estos tienen, y su impacto en la estructura de los algoritmos para que su utilización pueda obtener rendimientos de procesamiento acordes al número de recursos utilizados y combinarlos de forma tal que su complementación sea benéfica. Específicamente, partiendo desde las características del hardware, determinar las propiedades que el algoritmo paralelo debe tener para poder ser acelerado. Las características de los algoritmos paralelos determinará a su vez cuál de estos nuevos tipos de hardware son los mas adecuados para su instanciación. En particular serán tenidos en cuenta el nivel de dependencia de datos, la necesidad de realizar sincronizaciones durante el procesamiento paralelo, el tamaño de datos a procesar y la complejidad de la programación paralela en cada tipo de hardware. Today´s advances in high-performance computing are driven by parallel processing capabilities of available hardware architectures. These architectures enable the acceleration of algorithms when thes ealgorithms are properly parallelized and exploit the specific processing power of the underneath architecture. Most current processors are targeted for general pruposes and integrate several processor cores on a single chip, resulting in what is known as a Symmetric Multiprocessing (SMP) unit. Nowadays even desktop computers make use of multicore processors. Meanwhile, the industry trend is to increase the number of integrated rocessor cores as technology matures. On the other hand, Graphics Processor Units (GPU), originally designed to handle only video processing, have emerged as interesting alternatives to implement algorithm acceleration. Current available GPUs are able to implement from 200 to 400 threads for parallel processing. Scientific computing can be implemented in these hardware thanks to the programability of new GPUs that have been denoted as General Processing Graphics Processor Units (GPGPU).However, GPGPU offer little memory with respect to that available for general-prupose processors; thus, the implementation of algorithms need to be addressed carefully. Finally, Field Programmable Gate Arrays (FPGA) are programmable devices which can implement hardware logic with low latency, high parallelism and deep pipelines. Thes devices can be used to implement specific algorithms that need to run at very high speeds. However, their programmability is harder that software approaches and debugging is typically time-consuming. In this context where several alternatives for speeding up algorithms are available, our work aims at determining the main features of thes architectures and developing the required know-how to accelerate algorithm execution on them. We look at identifying those algorithms that may fit better on a given architecture as well as compleme

Implementation of Multi-Core Processor Based on PLASMA (Most MIPS I) IP Core

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Multi-core processors is a design philosophy that has become mainstream in scientific and engineering applications. Increasing performance and gate capacity of recent FPGA devices has permitted complex logic systems to be implemented on a single programmable device. By using VHDL here we present an implementation of one multi-core processor by using the PLASMA IP core based on the (most) MIPS I ISA and give an overview of the processor architecture and share theexecution results.

Software for Explicitly Parallel Memory-Centric Processor Architecture

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Advances in computer memory technology justify research towards new and different views on computer organization. This paper proposes a novel memory-centric computing architecture with the goal to merge memory and processing elements in order to provide better conditions for parallelization and performance. The paper introduces the architectural concepts and afterwards shows the design and implementation of a corresponding assembler and simulator.

Efficient query processing in co-processor-accelerated databases

Relevância:

20.00% 20.00%

Publicador:

Resumo:

...Diese Dissertation zeigt, wie wir Datenbankmanagementsysteme bauen können, die heterogene Prozessoren effizient und zuverlässig zur Beschleunigung der Anfrageverarbeitung nutzen können. Daher untersuchen wir typische Entwurfsentscheidungen von coprozessorbeschleunigten Datenbankmanagementsystemen und leiten darauf aufbauend eine generische Architektur für solche Systeme ab. Unsere Untersuchungen zeigen, dass eines der wichtigsten Probleme für solche Datenbankmanagementsysteme die Entscheidung ist, welche Operatoren einer Anfrage auf welchem Prozessor ausgeführt werden sollen...

Addressing Sea Surface Salinity retrieval at different spatial resolution. Feasibility study using OCCAM data within SEPS simulator and L2 processor tools

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Projecte de recerca elaborat a partir d’una estada a la National Oceanography Centre of Southampton (NOCS), Gran Bretanya, entre maig i juliol del 2006. La possibilitat d’obtenir una estimació precissa de la salinitat marina (SSS) és important per a investigar i predir l’extensió del fenòmen del canvi climàtic. La missió Soil Moisture and Ocean Salinity (SMOS) va ser seleccionada per l’Agència Espacial Europea (ESA) per a obtenir mapes de salinitat de la superfície marina a escala global i amb un temps de revisita petit. Abans del llençament de SMOS es preveu l’anàlisi de la variabilitat horitzontal de la SSS i del potencial de les dades recuperades a partir de mesures de SMOS per a reproduir comportaments oceanogràfics coneguts. L’objectiu de tot plegat és emplenar el buit existent entre les fonts de dades d’entrada/auxiliars fiables i les eines desenvolupades per a simular i processar les dades adquirides segons la configuració de SMOS. El SMOS End-to-end Performance Simulator (SEPS) és un simulador adhoc desenvolupat per la Universitat Politècnica de Catalunya (UPC) per a generar dades segons la configuració de SMOS. Es va utilitzar dades d’entrada a SEPS procedents del projecte Ocean Circulation and Climate Advanced Modeling (OCCAM), utilitzat al NOCS, a diferents resolucions espacials. Modificant SEPS per a poder fer servir com a entrada les dades OCCAM es van obtenir dades de temperatura de brillantor simulades durant un mes amb diferents observacions ascendents que cobrien la zona seleccionada. Les tasques realitzades durant l’estada a NOCS tenien la finalitat de proporcionar una tècnica fiable per a realitzar la calibració externa i per tant cancel•lar el bias, una metodologia per a promitjar temporalment les diferents adquisicions durant les observacions ascendents, i determinar la millor configuració de la funció de cost abans d’explotar i investigar les posibiltats de les dades SEPS/OCCAM per a derivar la SSS recuperada amb patrons d’alta resolució.

Simulations of parasitic and intrinsic capacitances related to Multiple-Gate-Field-Effect Transistor architectures

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Report for the scientific sojourn carried out at the Université Catholique de Louvain, Belgium, from March until June 2007. In the first part, the impact of important geometrical parameters such as source and drain thickness, fin spacing, spacer width, etc. on the parasitic fringing capacitance component of multiple-gate field-effect transistors (MuGFET) is deeply analyzed using finite element simulations. Several architectures such as single gate, FinFETs (double gate), triple-gate represented by Pi-gate MOSFETs are simulated and compared in terms of channel and fringing capacitances for the same occupied die area. Simulations highlight the great impact of diminishing the spacing between fins for MuGFETs and the trade-off between the reduction of parasitic source and drain resistances and the increase of fringing capacitances when Selective Epitaxial Growth (SEG) technology is introduced. The impact of these technological solutions on the transistor cut-off frequencies is also discussed. The second part deals with the study of the effect of the volume inversion (VI) on the capacitances of undoped Double-Gate (DG) MOSFETs. For that purpose, we present simulation results for the capacitances of undoped DG MOSFETs using an explicit and analytical compact model. It monstrates that the transition from volume inversion regime to dual gate behaviour is well simulated. The model shows an accurate dependence on the silicon layer thickness,consistent withtwo dimensional numerical simulations, for both thin and thick silicon films. Whereas the current drive and transconductance are enhanced in volume inversion regime, our results show thatintrinsic capacitances present higher values as well, which may limit the high speed (delay time) behaviour of DG MOSFETs under volume inversion regime.

Benchmarks i mètriques d'avaluació de plataformes encastades multimèdia

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aquest projecte compara les possibilitats per a aplicacions multimèdia d'algunes de les arquitectures de processador que podem trobar en sistemes encastats. Per fer-ho s'ha seleccionat una sèrie de benchmarks que inclouen una mostra d'aplicacions multimèdia, així com un conjunt de benchmarks que ens permet mesurar aspectes d'un sistema operatiu GNU/Linux. També s'ha determinat quines haurien de ser les principals mètriques a considerar en el context dels sistemes encastats.

Análisis de prestaciones de procesadores multi-core y multi-thread

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Los procesadores multi-core y el multi-threading por hardware permiten aumentar el rendimiento de las aplicaciones. Por un lado, los procesadores multi-core combinan 2 o más procesadores en un mismo chip. Por otro lado, el multi-threading por hardware es una técnica que incrementa la utilización de los recursos del procesador. Este trabajo presenta un análisis de rendimiento de los resultados obtenidos en dos aplicaciones, multiplicación de matrices densas y transformada rápida de Fourier. Ambas aplicaciones se han ejecutado en arquitecturas multi-core que explotan el paralelismo a nivel de thread pero con un modelo de multi-threading diferente. Los resultados obtenidos muestran la importancia de entender y saber analizar el efecto del multi-core y multi-threading en el rendimiento.

Gestión de recursos en nodos multi-core de memoria compartida

Relevância:

20.00% 20.00%

Publicador:

Resumo:

La gestión de recursos en los procesadores multi-core ha ganado importancia con la evolución de las aplicaciones y arquitecturas. Pero esta gestión es muy compleja. Por ejemplo, una misma aplicación paralela ejecutada múltiples veces con los mismos datos de entrada, en un único nodo multi-core, puede tener tiempos de ejecución muy variables. Hay múltiples factores hardware y software que afectan al rendimiento. La forma en que los recursos hardware (cómputo y memoria) se asignan a los procesos o threads, posiblemente de varias aplicaciones que compiten entre sí, es fundamental para determinar este rendimiento. La diferencia entre hacer la asignación de recursos sin conocer la verdadera necesidad de la aplicación, frente a asignación con una meta específica es cada vez mayor. La mejor manera de realizar esta asignación és automáticamente, con una mínima intervención del programador. Es importante destacar, que la forma en que la aplicación se ejecuta en una arquitectura no necesariamente es la más adecuada, y esta situación puede mejorarse a través de la gestión adecuada de los recursos disponibles. Una apropiada gestión de recursos puede ofrecer ventajas tanto al desarrollador de las aplicaciones, como al entorno informático donde ésta se ejecuta, permitiendo un mayor número de aplicaciones en ejecución con la misma cantidad de recursos. Así mismo, esta gestión de recursos no requeriría introducir cambios a la aplicación, o a su estrategia operativa. A fin de proponer políticas para la gestión de los recursos, se analizó el comportamiento de aplicaciones intensivas de cómputo e intensivas de memoria. Este análisis se llevó a cabo a través del estudio de los parámetros de ubicación entre los cores, la necesidad de usar la memoria compartida, el tamaño de la carga de entrada, la distribución de los datos dentro del procesador y la granularidad de trabajo. Nuestro objetivo es identificar cómo estos parámetros influyen en la eficiencia de la ejecución, identificar cuellos de botella y proponer posibles mejoras. Otra propuesta es adaptar las estrategias ya utilizadas por el Scheduler con el fin de obtener mejores resultados.

LittleProc 2.0. : l'evolució als 16 bits, amb arquitectura superescalar i amb pipeline

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Aquest projecte consisteix en evolucionar el LittleProc 1.0, un processador simple dissenyat per ser destinat al món de la docència per tres professors de la UAB. Aquestes evolucions consisteixen en aplicar diversos mètodes i arquitectures diferents per tal d’obtenir un millor rendiment del processador, arribant a executar programes amb la meitat de temps que tardava el LittleProc 1.0. Un cop implementades les diferents arquitectures per tal de millorar el rendiment, es realitzarà un estudi de quin tant per cent de millora ha sigut aquest rendiment.

Microfluidic processor allows rapid HER2 immunohistochemistry of breast carcinomas and significantly reduces ambiguous (2+) read-outs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Biomarker analysis is playing an essential role in cancer diagnosis, prognosis, and prediction. Quantitative assessment of immunohistochemical biomarker expression on tumor tissues is of clinical relevance when deciding targeted treatments for cancer patients. Here, we report a microfluidic tissue processor that permits accurate quantification of the expression of biomarkers on tissue sections, enabled by the ultra-rapid and uniform fluidic exchange of the device. An important clinical biomarker for invasive breast cancer is human epidermal growth factor receptor 2 [(HER2), also known as neu], a transmembrane tyrosine kinase that connotes adverse prognostic information for the patients concerned and serves as a target for personalized treatment using the humanized antibody trastuzumab. Unfortunately, when using state-of-the-art methods, the intensity of an immunohistochemical signal is not proportional to the extent of biomarker expression, causing ambiguous outcomes. Using our device, we performed tests on 76 invasive breast carcinoma cases expressing various levels of HER2. We eliminated more than 90% of the ambiguous results (n = 27), correctly assigning cases to the amplification status as assessed by in situ hybridization controls, whereas the concordance for HER2-negative (n = 31) and -positive (n = 18) cases was 100%. Our results demonstrate the clinical potential of microfluidics for accurate biomarker expression analysis. We anticipate our technique will be a diagnostic tool that will provide better and more reliable data, onto which future treatment regimes can be based.

Probabilistically analysable real-time systems (PROARTIS)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Critical real-time ebedded (CRTE) Systems require safe and tight worst-case execution time (WCET) estimations to provide required safety levels and keep costs low. However, CRTE Systems require increasing performance to satisfy performance needs of existing and new features. Such performance can be only achieved by means of more agressive hardware architectures, which are much harder to analyze from a WCET perspective. The main features considered include cache memòries and multi-core processors.Thus, althoug such features provide higher performance, corrent WCET analysis methods are unable to provide tight WCET estimations. In fact, WCET estimations become worse than for simple rand less powerful hardware. The main reason is the fact that hardware behavior is deterministic but unknown and, therefore, the worst-case behavior must be assumed most of the time, leading to large WCET estimations. The purpose of this project is developing new hardware designs together with WCET analysis tools able to provide tight and safe WCET estimations. In order to do so, those pieces of hardware whose behavior is not easily analyzable due to lack of accurate information during WCET analysis will be enhanced to produce a probabilistically analyzable behavior. Thus, even if the worst-case behavior cannot be removed, its probabilty can be bounded, and hence, a safe and tight WCET can be provided for a particular safety level in line with the safety levels of the remaining components of the system. During the first year the project we have developed molt of the evaluation infraestructure as well as the techniques hardware techniques to analyze cache memories. During the second year those techniques have been evaluated, and new purely-softwar techniques have been developed.

Capacity-approaching rate function for layered multiantenna architectures

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The simultaneous use of multiple transmit and receive antennas can unleash very large capacity increases in rich multipath environments. Although such capacities can be approached by layered multi-antenna architectures with per-antenna rate control, the need for short-term feedback arises as a potential impediment, in particular as the number of antennas—and thus the number of rates to be controlled—increases. What we show, however, is that the need for short-term feedback in fact vanishes as the number of antennas and/or the diversity order increases. Specifically, the rate supported by each transmit antenna becomes deterministic and a sole function of the signal-to-noise, the ratio of transmit and receive antennas, and the decoding order, all of which are either fixed or slowly varying. More generally, we illustrate -through this specific derivation— the relevance of some established random CDMA results to the single-user multi-antenna problem.

Per-antenna rate and power control for MIMO layered architectures in the low- and high-power regimes

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In a MIMO layered architecture, several codewordsare transmitted from a multiplicity of antennas. Although thespectral efficiency is maximized if the rates of these codewordsare separately controlled, the feedback rate within the linkadaptation loop is reduced if they are constrained to be identical.This poses a direct tradeoff between performance andfeedback overhead. This paper provides analytical expressionsthat quantify the difference in spectral efficiency between bothapproaches for arbitrary numbers of antennas. Specifically, thecharacterization takes place in the realm of the low- and highpowerregimes via expansions that are shown to have a widerange of validity.In addition, the possibility of adjusting the transmit powerof each codeword individually is considered as an alternative tothe separate control of their rates. Power allocation, however,turns out to be inferior to rate control within the context of thisproblem.

«
1
2
3
4
5
6
7
8
...
60
61
»