962 resultados para ubiquitous multi-core framework
Resumo:
Recently a new recipe for developing and deploying real-time systems has become increasingly adopted in the JET tokamak. Powered by the advent of x86 multi-core technology and the reliability of the JET’s well established Real-Time Data Network (RTDN) to handle all real-time I/O, an official Linux vanilla kernel has been demonstrated to be able to provide realtime performance to user-space applications that are required to meet stringent timing constraints. In particular, a careful rearrangement of the Interrupt ReQuests’ (IRQs) affinities together with the kernel’s CPU isolation mechanism allows to obtain either soft or hard real-time behavior depending on the synchronization mechanism adopted. Finally, the Multithreaded Application Real-Time executor (MARTe) framework is used for building applications particularly optimised for exploring multicore architectures. In the past year, four new systems based on this philosophy have been installed and are now part of the JET’s routine operation. The focus of the present work is on the configuration and interconnection of the ingredients that enable these new systems’ real-time capability and on the impact that JET’s distributed real-time architecture has on system engineering requirements, such as algorithm testing and plant commissioning. Details are given about the common real-time configuration and development path of these systems, followed by a brief description of each system together with results regarding their real-time performance. A cycle time jitter analysis of a user-space MARTe based application synchronising over a network is also presented. The goal is to compare its deterministic performance while running on a vanilla and on a Messaging Real time Grid (MRG) Linux kernel.
Resumo:
We present a framework for the analysis of the decoding delay in multiview video coding (MVC). We show that in real-time applications, an accurate estimation of the decoding delay is essential to achieve a minimum communication latency. As opposed to single-view codecs, the complexity of the multiview prediction structure and the parallel decoding of several views requires a systematic analysis of this decoding delay, which we solve using graph theory and a model of the decoder hardware architecture. Our framework assumes a decoder implementation in general purpose multi-core processors with multi-threading capabilities. For this hardware model, we show that frame processing times depend on the computational load of the decoder and we provide an iterative algorithm to compute jointly frame processing times and decoding delay. Finally, we show that decoding delay analysis can be applied to design decoders with the objective of minimizing the communication latency of the MVC system.
Resumo:
Los tipos de datos concurrentes son implementaciones concurrentes de las abstracciones de datos clásicas, con la diferencia de que han sido específicamente diseñados para aprovechar el gran paralelismo disponible en las modernas arquitecturas multiprocesador y multinúcleo. La correcta manipulación de los tipos de datos concurrentes resulta esencial para demostrar la completa corrección de los sistemas de software que los utilizan. Una de las mayores dificultades a la hora de diseñar y verificar tipos de datos concurrentes surge de la necesidad de tener que razonar acerca de un número arbitrario de procesos que invocan estos tipos de datos de manera concurrente. Esto requiere considerar sistemas parametrizados. En este trabajo estudiamos la verificación formal de propiedades temporales de sistemas concurrentes parametrizados, poniendo especial énfasis en programas que manipulan estructuras de datos concurrentes. La principal dificultad a la hora de razonar acerca de sistemas concurrentes parametrizados proviene de la interacción entre el gran nivel de concurrencia que éstos poseen y la necesidad de razonar al mismo tiempo acerca de la memoria dinámica. La verificación de sistemas parametrizados resulta en sí un problema desafiante debido a que requiere razonar acerca de estructuras de datos complejas que son accedidas y modificadas por un numero ilimitado de procesos que manipulan de manera simultánea el contenido de la memoria dinámica empleando métodos de sincronización poco estructurados. En este trabajo, presentamos un marco formal basado en métodos deductivos capaz de ocuparse de la verificación de propiedades de safety y liveness de sistemas concurrentes parametrizados que manejan estructuras de datos complejas. Nuestro marco formal incluye reglas de prueba y técnicas especialmente adaptadas para sistemas parametrizados, las cuales trabajan en colaboración con procedimientos de decisión especialmente diseñados para analizar complejas estructuras de datos concurrentes. Un aspecto novedoso de nuestro marco formal es que efectúa una clara diferenciación entre el análisis del flujo de control del programa y el análisis de los datos que se manejan. El flujo de control del programa se analiza utilizando reglas de prueba y técnicas de verificación deductivas especialmente diseñadas para lidiar con sistemas parametrizados. Comenzando a partir de un programa concurrente y la especificación de una propiedad temporal, nuestras técnicas deductivas son capaces de generar un conjunto finito de condiciones de verificación cuya validez implican la satisfacción de dicha especificación temporal por parte de cualquier sistema, sin importar el número de procesos que formen parte del sistema. Las condiciones de verificación generadas se corresponden con los datos manipulados. Estudiamos el diseño de procedimientos de decisión especializados capaces de lidiar con estas condiciones de verificación de manera completamente automática. Investigamos teorías decidibles capaces de describir propiedades de tipos de datos complejos que manipulan punteros, tales como implementaciones imperativas de pilas, colas, listas y skiplists. Para cada una de estas teorías presentamos un procedimiento de decisión y una implementación práctica construida sobre SMT solvers. Estos procedimientos de decisión son finalmente utilizados para verificar de manera automática las condiciones de verificación generadas por nuestras técnicas de verificación parametrizada. Para concluir, demostramos como utilizando nuestro marco formal es posible probar no solo propiedades de safety sino además de liveness en algunas versiones de protocolos de exclusión mutua y programas que manipulan estructuras de datos concurrentes. El enfoque que presentamos en este trabajo resulta ser muy general y puede ser aplicado para verificar un amplio rango de tipos de datos concurrentes similares. Abstract Concurrent data types are concurrent implementations of classical data abstractions, specifically designed to exploit the great deal of parallelism available in modern multiprocessor and multi-core architectures. The correct manipulation of concurrent data types is essential for the overall correctness of the software system built using them. A major difficulty in designing and verifying concurrent data types arises by the need to reason about any number of threads invoking the data type simultaneously, which requires considering parametrized systems. In this work we study the formal verification of temporal properties of parametrized concurrent systems, with a special focus on programs that manipulate concurrent data structures. The main difficulty to reason about concurrent parametrized systems comes from the combination of their inherently high concurrency and the manipulation of dynamic memory. This parametrized verification problem is very challenging, because it requires to reason about complex concurrent data structures being accessed and modified by threads which simultaneously manipulate the heap using unstructured synchronization methods. In this work, we present a formal framework based on deductive methods which is capable of dealing with the verification of safety and liveness properties of concurrent parametrized systems that manipulate complex data structures. Our framework includes special proof rules and techniques adapted for parametrized systems which work in collaboration with specialized decision procedures for complex data structures. A novel aspect of our framework is that it cleanly differentiates the analysis of the program control flow from the analysis of the data being manipulated. The program control flow is analyzed using deductive proof rules and verification techniques specifically designed for coping with parametrized systems. Starting from a concurrent program and a temporal specification, our techniques generate a finite collection of verification conditions whose validity entails the satisfaction of the temporal specification by any client system, in spite of the number of threads. The verification conditions correspond to the data manipulation. We study the design of specialized decision procedures to deal with these verification conditions fully automatically. We investigate decidable theories capable of describing rich properties of complex pointer based data types such as stacks, queues, lists and skiplists. For each of these theories we present a decision procedure, and its practical implementation on top of existing SMT solvers. These decision procedures are ultimately used for automatically verifying the verification conditions generated by our specialized parametrized verification techniques. Finally, we show how using our framework it is possible to prove not only safety but also liveness properties of concurrent versions of some mutual exclusion protocols and programs that manipulate concurrent data structures. The approach we present in this work is very general, and can be applied to verify a wide range of similar concurrent data types.
Resumo:
La tesis está focalizada en la resolución de problemas de optimización combinatoria, haciendo uso de las opciones tecnológicas actuales que ofrecen las tecnologías de la información y las comunicaciones, y la investigación operativa. Los problemas de optimización combinatoria se resuelven en general mediante programación lineal y metaheurísticas. La aplicación de las técnicas de resolución de los problemas de optimización combinatoria requiere de una elevada carga computacional, y los algoritmos deben diseñarse, por un lado pensando en la efectividad para encontrar buenas soluciones del problema, y por otro lado, pensando en un uso adecuado de los recursos informáticos disponibles. La programación lineal y las metaheurísticas son técnicas de resolución genéricas, que se pueden aplicar a diferentes problemas, partiendo de una base común que se particulariza para cada problema concreto. En el campo del desarrollo de software, los frameworks cumplen esa función de comenzar un proyecto con el trabajo general ya disponible, con la opción de cambiar o extender ese comportamiento base o genérico, para construir el sistema concreto, lo que permite reducir el tiempo de desarrollo, y amplía las posibilidades de éxito del proyecto. En esta tesis se han desarrollado dos frameworks de desarrollo. El framework ILP permite modelar y resolver problemas de programación lineal, de forma independiente al software de resolución de programación lineal que se utilice. El framework LME permite resolver problemas de optimización combinatoria mediante metaheurísticas. Tradicionalmente, las aplicaciones de resolución de problemas de optimización combinatoria son aplicaciones de escritorio que permiten gestionar toda la información de entrada del problema y resuelven el problema en local, con los recursos hardware disponibles. Recientemente ha aparecido un nuevo paradigma de despliegue y uso de aplicaciones que permite compartir recursos informáticos especializados por Internet. Esta nueva forma de uso de recursos informáticos es la computación en la nube, que presenta el modelo de software como servicio (SaaS). En esta tesis se ha construido una plataforma SaaS, para la resolución de problemas de optimización combinatoria, que se despliega sobre arquitecturas compuestas por procesadores multi-núcleo y tarjetas gráficas, y dispone de algoritmos de resolución basados en frameworks de programación lineal y metaheurísticas. Toda la infraestructura es independiente del problema de optimización combinatoria a resolver, y se han desarrollado tres problemas que están totalmente integrados en la plataforma SaaS. Estos problemas se han seleccionado por su importancia práctica. Uno de los problemas tratados en la tesis, es el problema de rutas de vehículos (VRP), que consiste en calcular las rutas de menor coste de una flota de vehículos, que reparte mercancías a todos los clientes. Se ha partido de la versión más clásica del problema y se han hecho estudios en dos direcciones. Por un lado se ha cuantificado el aumento en la velocidad de ejecución de la resolución del problema en tarjetas gráficas. Por otro lado, se ha estudiado el impacto en la velocidad de ejecución y en la calidad de soluciones, en la resolución por la metaheurística de colonias de hormigas (ACO), cuando se introduce la programación lineal para optimizar las rutas individuales de cada vehículo. Este problema se ha desarrollado con los frameworks ILP y LME, y está disponible en la plataforma SaaS. Otro de los problemas tratados en la tesis, es el problema de asignación de flotas (FAP), que consiste en crear las rutas de menor coste para la flota de vehículos de una empresa de transporte de viajeros. Se ha definido un nuevo modelo de problema, que engloba características de problemas presentados en la literatura, y añade nuevas características, lo que permite modelar los requerimientos de las empresas de transporte de viajeros actuales. Este nuevo modelo resuelve de forma integrada el problema de definir los horarios de los trayectos, el problema de asignación del tipo de vehículo, y el problema de crear las rotaciones de los vehículos. Se ha creado un modelo de programación lineal para el problema, y se ha resuelto por programación lineal y por colonias de hormigas (ACO). Este problema se ha desarrollado con los frameworks ILP y LME, y está disponible en la plataforma SaaS. El último problema tratado en la tesis es el problema de planificación táctica de personal (TWFP), que consiste en definir la configuración de una plantilla de trabajadores de menor coste, para cubrir una demanda de carga de trabajo variable. Se ha definido un modelo de problema muy flexible en la definición de contratos, que permite el uso del modelo en diversos sectores productivos. Se ha definido un modelo matemático de programación lineal para representar el problema. Se han definido una serie de casos de uso, que muestran la versatilidad del modelo de problema, y permiten simular el proceso de toma de decisiones de la configuración de una plantilla de trabajadores, cuantificando económicamente cada decisión que se toma. Este problema se ha desarrollado con el framework ILP, y está disponible en la plataforma SaaS. ABSTRACT The thesis is focused on solving combinatorial optimization problems, using current technology options offered by information technology and communications, and operations research. Combinatorial optimization problems are solved in general by linear programming and metaheuristics. The application of these techniques for solving combinatorial optimization problems requires a high computational load, and algorithms are designed, on the one hand thinking to find good solutions to the problem, and on the other hand, thinking about proper use of the available computing resources. Linear programming and metaheuristic are generic resolution techniques, which can be applied to different problems, beginning with a common base that is particularized for each specific problem. In the field of software development, frameworks fulfill this function that allows you to start a project with the overall work already available, with the option to change or extend the behavior or generic basis, to build the concrete system, thus reducing the time development, and expanding the possibilities of success of the project. In this thesis, two development frameworks have been designed and developed. The ILP framework allows to modeling and solving linear programming problems, regardless of the linear programming solver used. The LME framework is designed for solving combinatorial optimization problems using metaheuristics. Traditionally, applications for solving combinatorial optimization problems are desktop applications that allow the user to manage all the information input of the problem and solve the problem locally, using the available hardware resources. Recently, a new deployment paradigm has appeared, that lets to share hardware and software resources by the Internet. This new use of computer resources is cloud computing, which presents the model of software as a service (SaaS). In this thesis, a SaaS platform has been built for solving combinatorial optimization problems, which is deployed on architectures, composed of multi-core processors and graphics cards, and has algorithms based on metaheuristics and linear programming frameworks. The SaaS infrastructure is independent of the combinatorial optimization problem to solve, and three problems are fully integrated into the SaaS platform. These problems have been selected for their practical importance. One of the problems discussed in the thesis, is the vehicle routing problem (VRP), which goal is to calculate the least cost of a fleet of vehicles, which distributes goods to all customers. The VRP has been studied in two directions. On one hand, it has been quantified the increase in execution speed when the problem is solved on graphics cards. On the other hand, it has been studied the impact on execution speed and quality of solutions, when the problem is solved by ant colony optimization (ACO) metaheuristic, and linear programming is introduced to optimize the individual routes of each vehicle. This problem has been developed with the ILP and LME frameworks, and is available in the SaaS platform. Another problem addressed in the thesis, is the fleet assignment problem (FAP), which goal is to create lower cost routes for a fleet of a passenger transport company. It has been defined a new model of problem, which includes features of problems presented in the literature, and adds new features, allowing modeling the business requirements of today's transport companies. This new integrated model solves the problem of defining the flights timetable, the problem of assigning the type of vehicle, and the problem of creating aircraft rotations. The problem has been solved by linear programming and ACO. This problem has been developed with the ILP and LME frameworks, and is available in the SaaS platform. The last problem discussed in the thesis is the tactical planning staff problem (TWFP), which is to define the staff of lower cost, to cover a given work load. It has been defined a very rich problem model in the definition of contracts, allowing the use of the model in various productive sectors. It has been defined a linear programming mathematical model to represent the problem. Some use cases has been defined, to show the versatility of the model problem, and to simulate the decision making process of setting up a staff, economically quantifying every decision that is made. This problem has been developed with the ILP framework, and is available in the SaaS platform.
Resumo:
Devido às tendências de crescimento da quantidade de dados processados e a crescente necessidade por computação de alto desempenho, mudanças significativas estão acontecendo no projeto de arquiteturas de computadores. Com isso, tem-se migrado do paradigma sequencial para o paralelo, com centenas ou milhares de núcleos de processamento em um mesmo chip. Dentro desse contexto, o gerenciamento de energia torna-se cada vez mais importante, principalmente em sistemas embarcados, que geralmente são alimentados por baterias. De acordo com a Lei de Moore, o desempenho de um processador dobra a cada 18 meses, porém a capacidade das baterias dobra somente a cada 10 anos. Esta situação provoca uma enorme lacuna, que pode ser amenizada com a utilização de arquiteturas multi-cores heterogêneas. Um desafio fundamental que permanece em aberto para estas arquiteturas é realizar a integração entre desenvolvimento de código embarcado, escalonamento e hardware para gerenciamento de energia. O objetivo geral deste trabalho de doutorado é investigar técnicas para otimização da relação desempenho/consumo de energia em arquiteturas multi-cores heterogêneas single-ISA implementadas em FPGA. Nesse sentido, buscou-se por soluções que obtivessem o melhor desempenho possível a um consumo de energia ótimo. Isto foi feito por meio da combinação de mineração de dados para a análise de softwares baseados em threads aliadas às técnicas tradicionais para gerenciamento de energia, como way-shutdown dinâmico, e uma nova política de escalonamento heterogeneity-aware. Como principais contribuições pode-se citar a combinação de técnicas de gerenciamento de energia em diversos níveis como o nível do hardware, do escalonamento e da compilação; e uma política de escalonamento integrada com uma arquitetura multi-core heterogênea em relação ao tamanho da memória cache L1.
Resumo:
In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphic processor units). It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, analysis and monitoring of the movement. These features allow the construction of a robust representation of the environment and interpret the behavior of mobile agents in the scene. It is also necessary to integrate the vision module into a global system that operates in a complex environment by receiving images from multiple acquisition devices at video frequency. Offering relevant information to higher level systems, monitoring and making decisions in real time, it must accomplish a set of requirements, such as: time constraints, high availability, robustness, high processing speed and re-configurability. We have built a system able to represent and analyze the motion in video acquired by a multi-camera network and to process multi-source data in parallel on a multi-GPU architecture.
Resumo:
Pro-cyclical fiscal tightening might be one reason for the anaemic economic recovery in Europe, raising questions about the effectiveness of the EU’s fiscal framework in achieving its two main objectives: public debt sustainability and fiscal stabilisation. • In theory, the current EU fiscal rules, with cyclically adjusted targets, flexibility clauses and the option to enter an excessive deficit procedure, allow for large-scale fiscal stabilisation during a recession. However, implementation of the rules is hindered by the badly-measured structural balance indicator and incorrect forecasts, leading to erroneous policy recommendations. The large number of flexibility clauses makes the system opaque. • The current inefficient European fiscal framework should be replaced with a system based on rules that are more conducive to the two objectives, more transparent, easier to implement and which have a higher potential to be complied with. • The best option, re-designing the fiscal framework from scratch, is currently unrealistic. Therefore we propose to eliminate the structural balance rules and to introduce a new public expenditure rule with debt-correction feedback, embodied in a multi-annual framework, which would also support the central bank’s inflation target. A European Fiscal Council could oversee the system.
Resumo:
L'obiettivo principale della politica di sicurezza alimentare è quello di garantire la salute dei consumatori attraverso regole e protocolli di sicurezza specifici. Al fine di rispondere ai requisiti di sicurezza alimentare e standardizzazione della qualità, nel 2002 il Parlamento Europeo e il Consiglio dell'UE (Regolamento (CE) 178/2002 (CE, 2002)), hanno cercato di uniformare concetti, principi e procedure in modo da fornire una base comune in materia di disciplina degli alimenti e mangimi provenienti da Stati membri a livello comunitario. La formalizzazione di regole e protocolli di standardizzazione dovrebbe però passare attraverso una più dettagliata e accurata comprensione ed armonizzazione delle proprietà globali (macroscopiche), pseudo-locali (mesoscopiche), ed eventualmente, locali (microscopiche) dei prodotti alimentari. L'obiettivo principale di questa tesi di dottorato è di illustrare come le tecniche computazionali possano rappresentare un valido supporto per l'analisi e ciò tramite (i) l’applicazione di protocolli e (ii) miglioramento delle tecniche ampiamente applicate. Una dimostrazione diretta delle potenzialità già offerte dagli approcci computazionali viene offerta nel primo lavoro in cui un virtual screening basato su docking è stato applicato al fine di valutare la preliminare xeno-androgenicità di alcuni contaminanti alimentari. Il secondo e terzo lavoro riguardano lo sviluppo e la convalida di nuovi descrittori chimico-fisici in un contesto 3D-QSAR. Denominata HyPhar (Hydrophobic Pharmacophore), la nuova metodologia così messa a punto è stata usata per esplorare il tema della selettività tra bersagli molecolari strutturalmente correlati e ha così dimostrato di possedere i necessari requisiti di applicabilità e adattabilità in un contesto alimentare. Nel complesso, i risultati ci permettono di essere fiduciosi nel potenziale impatto che le tecniche in silico potranno avere nella identificazione e chiarificazione di eventi molecolari implicati negli aspetti tossicologici e nutrizionali degli alimenti.
Resumo:
Various streams of organizational research have examined the relationship between creativity and leadership, albeit using slightly different names such as “creative leadership”, “leading for creativity and innovation”, and “managing creatives”. In this article, we review this dispersed body of knowledge and synthesize it under a global construct of creative leadership, which refers to leading others toward the attainment of a creative outcome. Under this unifying construct, we classify three more narrow conceptualizations that we observe in the literature: facilitating employee creativity; directing the materialization of a leader's creative vision; and integrating heterogeneous creative contributions. After examining the contextual characteristics associated with the three conceptualizations, we suggest that they represent three distinct collaborative contexts of creative leadership. We discuss the theoretical implications of a multi-context framework of creative leadership, especially in terms of resolving three persisting problems in the extant literature: lack of definitional clarity, shortage of nuanced theories, and low contextual sensitivity.
Resumo:
Ageing of the population is a worldwide phenomenon. Numerous ICT-based solutions have been developed for elderly care but mainly connected to the physiological and nursing aspects in services for the elderly. Social work is a profession that should pay attention to the comprehensive wellbeing and social needs of the elderly. Many people experience loneliness and depression in their old age, either as a result of living alone or due to a lack of close family ties and reduced connections with their culture of origin, which results in an inability to participate actively in community activities (Singh & Misra, 2009). Participation in society would enhance the quality of life. With the development of information technology, the use of technology in social work practice has risen dramatically. The aim of this literature review is to map out the state of the art of knowledge about the usage of ICT in elderly care and to figure out research-based knowledge about the usability of ICT for the prevention of loneliness and social isolation of elderly people. The data for the current research comes from the core collection of the Web of Science and the data searching was performed using Boolean? The searching resulted in 216 published English articles. After going through the topics and abstracts, 34 articles were selected for the data analysis that is based on a multi approach framework. The analysis of the research approach is categorized according to some aspects of using ICT by older adults from the adoption of ICT to the impact of usage, and the social services for them. This literature review focused on the function of communication by excluding the applications that mainly relate to physical nursing. The results show that the so-called ‘digital divide’ still exists, but the older adults have the willingness to learn and utilise ICT in daily life, especially for communication. The data shows that the usage of ICT can prevent the loneliness and social isolation of older adults, and they are eager for technical support in using ICT. The results of data analysis on theoretical frames and concepts show that this research field applies different theoretical frames from various scientific fields, while a social work approach is lacking. However, a synergic frame of applied theories will be suggested from the perspective of social work.
Resumo:
In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of the Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both.
Resumo:
With security and surveillance, there is an increasing need to process image data efficiently and effectively either at source or in a large data network. Whilst a Field-Programmable Gate Array (FPGA) has been seen as a key technology for enabling this, the design process has been viewed as problematic in terms of the time and effort needed for implementation and verification. The work here proposes a different approach of using optimized FPGA-based soft-core processors which allows the user to exploit the task and data level parallelism to achieve the quality of dedicated FPGA implementations whilst reducing design time. The paper also reports some preliminary
progress on the design flow to program the structure. An implementation for a Histogram of Gradients algorithm is also reported which shows that a performance of 328 fps can be achieved with this design approach, whilst avoiding the long design time, verification and debugging steps associated with conventional FPGA implementations.
Resumo:
We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.
Resumo:
As the semiconductor industry struggles to maintain its momentum down the path following the Moore's Law, three dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance, and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with primary focus on two areas: low power 3D clock tree design, and reliability degradation modeling and management. Clock trees are essential parts for digital system which dissipate a large amount of power due to high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow which directly optimizes for clock power. Besides, we also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activities. Different from the common assumption in 2D ICs that shutdown gates are cheap thus can be applied at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore the design methodologies to produce the optimal allocation and placement for clock and control TSVs so that the clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for available TSV placement area. Vertical integration also brings new reliability challenges including TSV's electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability aware placement scheme enables co-design and co-optimization of both the electrical and reliability property, thus improves both the circuit's performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles or application of ad-hoc fixes that lead to sub-optimal performance. Vertical integration also enables stacking DRAM on top of CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspot in CPU layers are coupled into DRAM layers, causing a non-uniform bit-cell leakage (thereby bit flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft error in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes CPU's operating points to adjust DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances DRAM's resilience without sacrificing performance. The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future.
Resumo:
La eliminación de barreras entre países es una consecuencia que llega con la globalización y con los acuerdos de TLC firmados en los últimos años. Esto implica un crecimiento significativo del comercio exterior, lo cual se ve reflejado en un aumento de la complejidad de la cadena de suministro de las empresas. Debido a lo anterior, se hace necesaria la búsqueda de alternativas para obtener altos niveles de productividad y competitividad dentro de las empresas en Colombia, ya que el entorno se ha vuelto cada vez más complejo, saturado de competencia no sólo nacional, sino también internacional. Para mantenerse en una posición competitiva favorable, las compañías deben enfocarse en las actividades que le agregan valor a su negocio, por lo cual una de las alternativas que se están adoptando hoy en día es la tercerización de funciones logísticas a empresas especializadas en el manejo de estos servicios. Tales empresas son los Proveedores de servicios logísticos (LSP), quienes actúan como agentes externos a la organización al gestionar, controlar y proporcionar actividades logísticas en nombre de un contratante. Las actividades realizadas pueden incluir todas o parte de las actividades logísticas, pero como mínimo la gestión y ejecución del transporte y almacenamiento deben estar incluidos (Berglund, 2000). El propósito del documento es analizar el papel de los Operadores Logísticos de Tercer nivel (3PL) como promotores del desempeño organizacional en las empresas colombianas, con el fin de informar a las MIPYMES acerca de los beneficios que se obtienen al trabajar con LSP como un medio para mejorar la posición competitiva del país.