974 results for Software-reconfigurable array processing architectures


Relevance:

30.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

30.00%

Publisher:

Abstract:

Developing a theoretical framework for pervasive information environments is an enormous goal. This paper aims to provide a small step towards such a goal. The following pages report on our initial investigations to devise a framework that will continue to support locative, experiential and evaluative data from ‘user feedback’ in an increasingly pervasive information environment. We loosely outline this framework by developing a methodology capable of moving from rapid deployment of software and hardware technologies towards a realistic immersive experience of pervasive information. We propose various technical solutions and address a range of problems, such as information capture through a novel model of sensing, processing, visualization and cognition.

Relevance:

30.00%

Publisher:

Abstract:

It is now clear that the concept of an HPC compiler which automatically produces highly efficient parallel implementations is a pipe dream. Another route is to recognise from the outset that user information is required and to develop tools that embed user interaction in the transformation of code from scalar to parallel form, and then use conventional compilers with a set of communication calls. This is the key idea underlying the development of the CAPTools software environment. The initial version of CAPTools is focused upon single block structured mesh computational mechanics codes. The capability for unstructured mesh codes is under test now and block structured meshes will be included next. The parallelisation process can be completed rapidly for modest codes, and the parallel performance approaches that delivered by hand parallelisations.

Relevance:

30.00%

Publisher:

Abstract:

Recent advances in the massively parallel computational abilities of graphical processing units (GPUs) have increased their use for general purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of software. The ability to carry out reverse-engineering of software is of great importance within the security and forensics fields, particularly when investigating malicious software or carrying out forensic analysis following a successful security breach. Due to the complexity of the Nvidia CUDA (Compute Unified Device Architecture) framework, it is not clear how best to approach the reverse engineering of a piece of CUDA software. We carry out a review of the different binary output formats which may be encountered from the CUDA compiler, and their implications for reverse engineering. We then demonstrate the process of carrying out disassembly of an example CUDA application, to establish the various techniques available to forensic investigators carrying out black-box disassembly and reverse engineering of CUDA binaries. We show that the Nvidia compiler, using default settings, leaks useful information. Finally, we demonstrate techniques to better protect intellectual property in CUDA algorithm implementations from reverse engineering.

Relevance:

30.00%

Publisher:

Abstract:

BRITTO, Ricardo S.; MEDEIROS, Adelardo A. D.; ALSINA, Pablo J. Uma arquitetura distribuída de hardware e software para controle de um robô móvel autônomo [A distributed hardware and software architecture for the control of an autonomous mobile robot]. In: SIMPÓSIO BRASILEIRO DE AUTOMAÇÃO INTELIGENTE, 8., 2007, Florianópolis. Anais... Florianópolis: SBAI, 2007.

Relevance:

30.00%

Publisher:

Abstract:

Modern software application testing, such as the testing of software driven by graphical user interfaces (GUIs) or leveraging event-driven architectures in general, requires paying careful attention to context. Model-based testing (MBT) approaches first acquire a model of an application, then use the model to construct test cases covering relevant contexts. A major shortcoming of state-of-the-art automated model-based testing is that many test cases proposed by the model are not actually executable. These infeasible test cases threaten the integrity of the entire model-based suite, and any coverage of contexts the suite aims to provide. In this research, I develop and evaluate a novel approach for classifying the feasibility of test cases. I identify a set of pertinent features for the classifier, and develop novel methods for extracting these features from the outputs of MBT tools. I use a supervised logistic regression approach to obtain a model of test case feasibility from a randomly selected training suite of test cases. I evaluate this approach with a set of experiments. The outcomes of this investigation are as follows: I confirm that infeasibility is prevalent in MBT, even for test suites designed to cover a relatively small number of unique contexts. I confirm that the frequency of infeasibility varies widely across applications. I develop and train a binary classifier for feasibility with average overall error, false positive, and false negative rates under 5%. I find that unique event IDs are key features of the feasibility classifier, while model-specific event types are not. I construct three types of features from the event IDs associated with test cases, and evaluate the relative effectiveness of each within the classifier.
To support this study, I also develop a number of tools and infrastructure components for scalable execution of automated jobs, which use state-of-the-art container and continuous integration technologies to enable parallel test execution and the persistence of all experimental artifacts.
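
The idea of a logistic-regression feasibility classifier built on event IDs can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the binary "event ID present" encoding, the toy training suite (in which the hypothetical event `e3` makes a test case infeasible), the learning rate and the epoch count.

```python
import math
from typing import List, Tuple


def featurize(test_case: List[str], vocabulary: List[str]) -> List[float]:
    """Binary vector: 1.0 where an event ID from the vocabulary occurs in the test case."""
    present = set(test_case)
    return [1.0 if event_id in present else 0.0 for event_id in vocabulary]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[List[float]], labels: List[int],
          epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(x: List[float], weights: List[float], bias: float) -> bool:
    """True means the test case is predicted to be feasible."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5


# Hypothetical training suite: (event-ID sequence, 1 = feasible, 0 = infeasible).
vocabulary = ["e1", "e2", "e3", "e4"]
training_suite = [
    (["e1", "e2"], 1), (["e1", "e3"], 0), (["e2", "e4"], 1),
    (["e3", "e4"], 0), (["e1", "e4"], 1), (["e2", "e3"], 0),
]
samples = [featurize(tc, vocabulary) for tc, _ in training_suite]
labels = [label for _, label in training_suite]
weights, bias = train(samples, labels)

# Two unseen test cases, one without and one with the problematic event.
feasible_prediction = predict(featurize(["e1", "e2", "e4"], vocabulary), weights, bias)
infeasible_prediction = predict(featurize(["e3"], vocabulary), weights, bias)
```

In practice the feature vectors would be extracted from the outputs of an MBT tool and the classifier would be evaluated on held-out test cases, as the abstract describes.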

Relevance:

30.00%

Publisher:

Abstract:

Advances in FPGA technology and demands for higher processing capability have driven the emergence of All Programmable Systems-on-Chip, which combine a hard processing system with programmable logic and so enable the development of specialized computer systems for a wide range of practical applications, including data and signal processing, high performance computing and embedded systems, among many others. To provide an infrastructure capable of exploiting the benefits of such a reconfigurable system, the main goal of the thesis is to implement an infrastructure composed of hardware, software and network resources that incorporates the services necessary for the operation, management and interfacing of peripherals, which compose the basic building blocks for the execution of applications. The project will be developed using a chip from the Zynq-7000 All Programmable Systems-on-Chip family.

Relevance:

30.00%

Publisher:

Abstract:

The philosophy of minimalism in robotics promotes gaining an understanding of sensing and computational requirements for solving a task. This minimalist approach lies in contrast to the common practice of first taking an existing sensory motor system, and only afterwards determining how to apply the robotic system to the task. While it may seem convenient to simply apply existing hardware systems to the task at hand, this design philosophy often proves to be wasteful in terms of energy consumption and cost, along with unnecessary complexity and decreased reliability. While impressive in terms of their versatility, complex robots such as the PR2 (which cost hundreds of thousands of dollars) are impractical for many common applications. Instead, if a specific task is required, sensing and computational requirements can be determined specific to that task, and a clever hardware implementation can be built to accomplish the task. Since this minimalist hardware would be designed around accomplishing the specified task, significant reductions in hardware complexity can be obtained. This can lead to huge advantages in battery life, cost, and reliability. Even if cost is of no concern, battery life is often a limiting factor in many applications. Thus, a minimalist hardware system is critical in achieving the system requirements. In this thesis, we will discuss an implementation of a counting, tracking, and actuation system as it relates to ergodic bodies to illustrate a minimalist design methodology.

Relevance:

30.00%

Publisher:

Abstract:

This thesis presents the achievements and scientific work conducted using a previously designed 64 x 64-pixel ion camera fabricated in a 0.35 μm CMOS technology. We used an array of Ion Sensitive Field Effect Transistors (ISFETs) to monitor and measure chemical and biochemical reactions in real time. The area of our observation was a 4.2 x 4.3 mm silicon chip, while the actual ISFET array covered an area of 715.8 x 715.8 μm, consisting of 4096 ISFET pixels in total with a 1 μm separation between them. The ion sensitive layer, the locus where all reactions took place, was a silicon nitride layer, the final top layer of the austriamicrosystems 0.35 μm CMOS technology used. Our final measurements showed an average sensitivity of 30 mV/pH. With the addition of extra layers we were able to monitor a 65 mV voltage difference during our experiments with glucose and hexokinase, whereas a difference of 85 mV was reported in the literature for a similar glucose reaction. We also monitored a 55 mV voltage difference while performing photosynthesis experiments with a biofilm made from cyanobacteria, whereas a voltage difference of 33.7 mV was reported in the literature for a similar cyanobacterial species using voltammetric detection methods. To monitor our experiments, PXIe-6358 measurement cards were used and measurements were controlled by LabVIEW software. The chip was packaged and encapsulated using a PGA-100 chip carrier and a two-component commercial epoxy. A printed circuit board (PCB) had also been previously designed to provide the interface between the chip and the measurement cards.

Relevance:

30.00%

Publisher:

Abstract:

Cache-coherent non-uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications use the system’s distributed memory banks to allocate data, and the automatic memory manager collects garbage left in these memory banks. The garbage collector may need to access remote memory banks, which entails access latency overhead and potential bandwidth saturation for the interconnection between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation using the Hotspot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies a richness of locality that exists naturally in connected objects consisting of a root object and its reachable set, termed ‘rooted sub-graphs’. Second, this dissertation leverages the locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism. A garbage collector thread processes a local root and its reachable set, which is likely to have a large number of objects in the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks from the established real-world DaCapo benchmark suite. In addition, the evaluation involves the widely used SPECjbb benchmark and a Neo4J graph database Java benchmark, as well as an artificial benchmark. The results of the NUMA-aware garbage collector on a multi-hop NUMA architecture show an average performance improvement of 15%.
Furthermore, this performance gain is shown to result from improved NUMA memory access in a ccNUMA system. Fourth, the existing Hotspot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy uses outdated assumptions and generates a constant thread count. In fact, the Hotspot JVM still uses this policy in the production version. This research shows that the optimal number of garbage collection threads is application-specific and that configuring the optimal number of garbage collection threads yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique that uses heuristics based on dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average improvement of 21% in garbage collection performance for the DaCapo benchmarks.
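
The third contribution, stealing references only from sibling threads on the same NUMA node, can be pictured with a small single-threaded simulation. This is a sketch under stated assumptions rather than the dissertation's Hotspot implementation: the `GCThread` class, the `next_reference` function and the string "references" are hypothetical, and a real collector would use concurrent task queues instead of plain deques.

```python
from collections import deque
from typing import Deque, List, Optional


class GCThread:
    """Hypothetical collector thread with a private queue of references to trace."""

    def __init__(self, thread_id: int, numa_node: int) -> None:
        self.thread_id = thread_id
        self.numa_node = numa_node
        self.queue: Deque[str] = deque()


def next_reference(thread: GCThread, all_threads: List[GCThread]) -> Optional[str]:
    """Return the next reference to trace, stealing only from same-node siblings."""
    if thread.queue:
        return thread.queue.pop()  # own work is taken from the tail of the queue
    for sibling in all_threads:
        same_node = sibling.numa_node == thread.numa_node
        if sibling is not thread and same_node and sibling.queue:
            return sibling.queue.popleft()  # steal from the head of the sibling's queue
    return None  # nothing local to steal; remote-node queues are left untouched


threads = [GCThread(0, numa_node=0), GCThread(1, numa_node=0), GCThread(2, numa_node=1)]
threads[1].queue.extend(["a", "b"])  # pending work on node 0
threads[2].queue.extend(["x"])       # pending work on node 1

first = next_reference(threads[0], threads)   # stolen from thread 1 (same node)
second = next_reference(threads[0], threads)  # stolen from thread 1 again
third = next_reference(threads[0], threads)   # node 0 is drained, so no steal happens
```

Thread 0 never touches the queue of thread 2, which sits on another node; that restriction is what keeps traced objects local in this simplified picture.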

Relevance:

30.00%

Publisher:

Abstract:

This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed context-aware mobile application. The investigation focuses on reducing the work required to integrate new sensed context streams in an existing context-aware architecture. Current architectures require integration work for new streams and new contexts that are encountered. This means of operation is acceptable for current fixed architectures. However, as systems become more mobile the number of discoverable streams increases. Without the ability to discover and use these new streams, the functionality of any given device will be limited to the streams that it knows how to decode. The integration of new streams requires that the sensed context data be understood by the current application. If the new source provides data of a type that an application currently requires, then the new source should be connected to the application without any prior knowledge of the new source. If the type is similar and can be converted, then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) for semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. Such applications must handle input from a variety of sensors, refining the data locally and managing its communication from the device in volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls. This local control can determine how the information is communicated to others. This investigation focuses on the evaluation of three approaches to sensor data management. The first system is characterised by its static management based on the pre-pended metadata. This was the reference system.
Developed for a mobile system, it processed the data based on the attached metadata. The code that performed the processing was static. The second system was developed to move away from the static processing and introduce greater freedom in handling the data stream; this resulted in a heavyweight approach. The approach focused on pushing the processing of the data into a number of networked nodes rather than the monolithic design of the previous system. By creating a separate communication channel for the metadata, it is possible to be more flexible with the amount and type of data transmitted. The final system pulled the benefits of the other systems together. By providing a small management class that would load a separate handler based on the incoming data, dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches. The first is a quantitative analysis of the code to understand the relative complexity of the three systems. This was done by evaluating what changes to the system were involved for the new context. The second approach takes a qualitative view of the work required by the software engineer to reconfigure the systems to provide support for a new data stream. The evaluation highlights the various scenarios to which each of the three systems is best suited. There is always a trade-off in the development of a system. The three approaches highlight this fact. The creation of a statically bound system can be quick to develop but may need to be completely re-written if the requirements move too far. Alternatively, a highly dynamic system may be able to cope with new requirements, but the developer time to create such a system may be greater than the creation of several simpler systems.
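
The final design, a small management class that selects a separate handler according to the incoming data and converts similar types where possible, might look roughly like the sketch below. All names are hypothetical and chosen only for illustration: the `StreamManager` class, its `register`, `register_converter` and `dispatch` methods, the `{"type": ..., "value": ...}` record layout and the temperature streams do not come from the thesis.

```python
from typing import Any, Callable, Dict, Tuple

Handler = Callable[[Any], Any]
Converter = Callable[[Any], Any]


class StreamManager:
    """Small manager that picks a handler from the type tag of an incoming record."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._converters: Dict[str, Tuple[str, Converter]] = {}

    def register(self, stream_type: str, handler: Handler) -> None:
        """Attach the handler used for records of the given stream type."""
        self._handlers[stream_type] = handler

    def register_converter(self, source_type: str, target_type: str,
                           convert: Converter) -> None:
        """Declare that source_type data can be converted into target_type data."""
        self._converters[source_type] = (target_type, convert)

    def dispatch(self, record: Dict[str, Any]) -> Any:
        """Route a record to its handler, converting a similar type if needed."""
        stream_type, value = record["type"], record["value"]
        if stream_type not in self._handlers and stream_type in self._converters:
            target_type, convert = self._converters[stream_type]
            stream_type, value = target_type, convert(value)
        handler = self._handlers.get(stream_type)
        if handler is None:
            raise KeyError(f"no handler for stream type {stream_type!r}")
        return handler(value)


manager = StreamManager()
manager.register("temperature_c", lambda v: f"{v:.1f} C")
manager.register_converter("temperature_f", "temperature_c",
                           lambda f: (f - 32) * 5 / 9)

celsius_reading = manager.dispatch({"type": "temperature_c", "value": 21.5})
converted_reading = manager.dispatch({"type": "temperature_f", "value": 212.0})
```

Supporting a new stream then means registering one more handler or converter, which reflects the ease-of-integration argument the abstract makes for the third system.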

Relevance:

30.00%

Publisher:

Abstract:

Reconfigurable platforms are a promising technology that offers an interesting trade-off between flexibility and performance, which many recent embedded system applications demand, especially in fields such as multimedia processing. These applications typically involve multiple ad-hoc tasks for hardware acceleration, which are usually represented using formalisms such as Data Flow Diagrams (DFDs), Data Flow Graphs (DFGs), Control and Data Flow Graphs (CDFGs) or Petri Nets. However, none of these models can capture at the same time the pipeline behavior between tasks (which can therefore coexist in order to minimize the application execution time), their communication patterns, and their data dependencies. This paper shows that knowledge of all this information can be effectively exploited to reduce the resource requirements and improve the timing performance of modern reconfigurable systems, where a set of hardware accelerators is used to support the computation. For this purpose, this paper proposes a novel task representation model, named Temporal Constrained Data Flow Diagram (TCDFD), which includes all this information. This paper also presents a mapping-scheduling algorithm that is able to take advantage of the new TCDFD model. It aims at minimizing the dynamic reconfiguration overhead while meeting the communication requirements among the tasks. Experimental results show that the presented approach achieves resource savings of up to 75% and a reduction in reconfiguration overhead of up to 89% with respect to other state-of-the-art techniques for reconfigurable platforms.

Relevance:

30.00%

Publisher:

Abstract:

Whole-body bone scintigraphy is one of the most frequently performed imaging examinations in nuclear medicine. Among other applications, this procedure can provide the diagnosis of bone metastases. In oncology patients, the presence of bone metastases is a strong prognostic indicator of patient survival. In addition, the presence or absence of bone metastases influences treatment planning, which therefore requires an accurate interpretation of the imaging results. Problem: Since bone metastasis is considered a severe complication associated with increased morbidity and reduced patient survival, the concept of patient care becomes even more imperative in these situations. The best imaging practices should therefore be implemented in order to obtain the best possible result from the procedure with minimal patient discomfort. One likely technique for achieving this in the specific case of whole-body bone scintigraphy is to reduce the acquisition time; however, the images obtained would, on their own, be of such reduced quality that the results could be biased. New techniques have recently emerged, notably in image processing, that make it possible to generate reduced-count scintigraphic images of a quality comparable to that obtained with the protocol regarded as standard. Even so, some of these methods remain associated with uncertainties, particularly regarding whether diagnostic confidence is maintained after routine protocols are modified. Objectives: This work aims to evaluate the performance of the Pixon image-processing algorithm by means of a phantom study. The objective is to compare the image quality and detectability provided by unprocessed images with those of images subjected to this processing technique.
In addition, it also aims to evaluate the effect of this algorithm on reducing acquisition time. To this end, images obtained with the standard protocol are compared with those acquired using faster protocols and subsequently subjected to the processing method. Materials and Methods: This research was carried out in the Department of Radiology and Nuclear Medicine of the Radboud University Nijmegen Medical Centre, in the Netherlands. A cylindrical phantom containing a set of six spheres of different sizes, suited to planar imaging, was used. The phantom was prepared with different sphere-to-background activity ratios (4:1, 8:1, 17:1, 22:1, 32:1 and 71:1). Then, for each experimental test, the phantom was imaged with several acquisition protocols at different acquisition speeds: 8 cm/min, 12 cm/min, 16 cm/min and 20 cm/min. All images were acquired on the same gamma camera, an e.cam Signature Dual Detector System (Siemens Medical Solutions USA, Inc.), using the same technical acquisition parameters except for speed. Twenty-four images were acquired, all of which were post-processed using Siemens software (Siemens Medical Solutions USA, Inc.) that includes the tool required for processing whole-body scintigraphic images. The reconstruction parameters were the same for each image series and were set to automatic mode. The collected data were analysed by means of an objective assessment (using physical image-quality parameters) and a subjective assessment (by two observers). Statistical analysis was performed with SPSS version 22 for Windows.
Results: The subjective analysis of each activity ratio showed that, overall, the detectability of the spheres increased after the images were processed. Inter-observer agreement for this analysis was substantial for both unprocessed and processed images. The physical image-quality parameters were likewise shown to improve after the processing algorithm was applied. In addition, comparing the standard images (acquired at 8 cm/min) with processed images acquired with faster protocols showed that: images acquired at 12 cm/min can provide improved results, with superior image-quality parameters and detectability; images acquired at 16 cm/min provide results comparable to the standard, with similar image-quality and detectability values; and images acquired at 20 cm/min yield lower image-quality values as well as reduced detectability. Discussion: The results were also confirmed by a clinical study in an independent investigation in the same department. Fifty-one patients referred with breast and prostate carcinomas were included in order to study the impact of this technique on clinical practice. The patients underwent the standard protocol followed by an additional acquisition at 16 cm/min. After the images had been blindly evaluated by three specialist physicians, it was concluded that image quality and detectability were comparable between the images, corroborating the results of this investigation.
Conclusion: With the aim of reducing acquisition time by applying an image-processing algorithm, it was shown that the protocol with an acquisition speed of 16 cm/min is the limit for increasing that speed. After processing, this protocol provides the results most equivalent to those obtained with the standard protocol. Given that this technique has been successfully established in clinical practice, it can be concluded that, at least in patients referred with breast and prostate carcinomas, the acquisition time can be halved by doubling the acquisition speed from 8 to 16 cm/min.