611 results for SCIL processor


Relevance:

10.00%

Publisher:

Abstract:

Performance modelling is a useful tool in the lifecycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These predictions are compared with real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. The method has some success in relating source code to achieved performance for the K10 series of Opterons, but is found to be inadequate for the next-generation Interlagos processor. This experience leads to the investigation of a data-driven application benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow water model as an example.
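
The kind of source-derived prediction described above can be illustrated with a simple analytic model. The sketch below is a generic roofline-style estimate in Python, not the paper's actual model; the grid size, operation counts and hardware figures are illustrative assumptions.

```python
# Minimal sketch of a source-derived performance model: estimate the time per
# timestep of a stencil-style shallow water kernel from counted floating-point
# operations and memory traffic, and take the larger of the compute and memory
# bounds. All hardware numbers are illustrative assumptions, not measured values.

def predict_step_time(nx, ny, flops_per_point, bytes_per_point,
                      peak_gflops, mem_bandwidth_gbs):
    points = nx * ny
    compute_time = points * flops_per_point / (peak_gflops * 1e9)      # seconds
    memory_time = points * bytes_per_point / (mem_bandwidth_gbs * 1e9)
    return max(compute_time, memory_time)   # roofline-style bound

# Example: a 1024 x 1024 grid, ~30 flops and ~72 bytes moved per grid point,
# on a hypothetical 9.6 GFLOP/s core with 12 GB/s of memory bandwidth.
t = predict_step_time(1024, 1024, 30, 72, 9.6, 12.0)
print(f"predicted time per timestep: {t * 1e3:.2f} ms")
```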

Relevance:

10.00%

Publisher:

Abstract:

A virtual system that emulates an ARM-based processor machine has been created to replace a traditional hardware-based system for teaching assembly language. The proposed virtual system integrates, in a single environment, all the development tools necessary to deliver introductory or advanced courses on modern assembly language programming. The virtual system runs a Linux operating system, in either graphical or console mode, on a Windows or Linux host machine. No software licenses or extra hardware are required, so students are free to carry their own ARM emulator with them on a USB memory stick. Institutions adopting this, or a similar, virtual system can also benefit by reducing capital investment in hardware-based development kits and by enabling distance learning courses.

Relevance:

10.00%

Publisher:

Abstract:

A large body of psycholinguistic research has revealed that during sentence interpretation adults coordinate multiple sources of information. In particular, they draw both on linguistic properties of the message and on information from the context to constrain their interpretations. Relatively little, however, is known about how this integrative processor develops during language acquisition and about how children process language. In this study, two on-line picture verification tasks were used to examine how 1st, 2nd and 4th/5th grade monolingual Greek children resolve pronoun ambiguities during sentence interpretation, and how their performance compares to that of adults on the same tasks. Specifically, we manipulated the type of subject pronoun (null or overt) and examined how this affected participants' preferences for competing antecedents in subject or object position. The results revealed both similarities and differences in how adults and the various child groups comprehended ambiguous pronominal forms. In particular, although adults and children alike showed sensitivity to the distribution of overt and null subject pronouns, this did not always lead to convergent interpretation preferences.

Relevance:

10.00%

Publisher:

Abstract:

Wireless Sensor Networks (WSNs) have been an exciting topic in recent years. The services offered by a WSN can be classified into three major categories: monitoring, alerting, and information on demand. WSNs have been used for a variety of applications related to the environment (agriculture, water and forest fire detection), the military, buildings, health (elderly people and home monitoring), disaster relief, and area or industrial monitoring. In most WSNs, tasks such as processing the sensed data, making decisions and generating emergency messages are carried out by a remote server, hence the need for efficient means of transferring data across the network. Because of the range of applications and types of WSN, different kinds of MAC and routing protocols are needed in order to guarantee delivery of data from the source nodes to the server (or sink). In order to minimize energy consumption and increase performance in areas such as reliability of data delivery, extensive research has been conducted and documented in the literature on designing energy-efficient protocols for each individual layer. The most common way to conserve energy in WSNs is for the MAC layer to put the transceiver and the processor of the sensor node into a low-power sleep state when they are not being used, so that the energy wasted on collisions, overhearing and idle listening is reduced. As a result of this strategy for saving energy, routing protocols need new solutions that take into account the sleep state of some nodes and that extend the lifetime of the entire network by distributing energy usage between nodes over time. A combined MAC and routing protocol could therefore significantly improve WSNs, because the interaction between the MAC and network layers allows nodes to be active at the same time when data must be transmitted. In the research presented in this thesis, a cross-layer protocol based on MAC and routing protocols was designed in order to improve the capability of WSNs for a range of different applications. Simulation results, based on a range of realistic scenarios, show that these new protocols improve WSNs by reducing their energy consumption as well as enabling them to support mobile nodes where necessary. A number of conference and journal papers have been published to disseminate these results for a range of applications.
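
The energy argument for MAC-layer sleep scheduling can be made concrete with a back-of-the-envelope model. The sketch below is generic and uses hypothetical current-draw and battery figures; it is not tied to any particular protocol from the thesis.

```python
# Minimal sketch of why MAC-layer duty cycling saves energy: compare a node that
# keeps its radio and processor active against one that sleeps between short
# active windows. The current-draw and battery figures are hypothetical,
# order-of-magnitude values.

def battery_lifetime_days(duty_cycle, active_ma=20.0, sleep_ma=0.02,
                          battery_mah=2500.0):
    """Average current is a weighted mix of the active and sleep currents."""
    avg_ma = duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma
    return battery_mah / avg_ma / 24.0

for dc in (1.0, 0.10, 0.01):   # always-on, 10% and 1% duty cycles
    print(f"duty cycle {dc:>4.0%}: ~{battery_lifetime_days(dc):8.1f} days")
```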

Relevance:

10.00%

Publisher:

Abstract:

We present a new technique for obtaining model fittings to very long baseline interferometric images of astrophysical jets. The method minimizes a performance function proportional to the sum of the squared differences between the model and observed images. The model image is constructed by summing N(s) elliptical Gaussian sources, each characterized by six parameters: two-dimensional peak position, peak intensity, eccentricity, amplitude, and orientation angle of the major axis. We present results for the fitting of two benchmark jets: the first constructed from three individual Gaussian sources, the second formed by five Gaussian sources. Both jets were analyzed by our cross-entropy technique in finite and infinite signal-to-noise regimes, with the background noise chosen to mimic that found in interferometric radio maps. Those images were constructed to simulate most of the conditions encountered in interferometric images of active galactic nuclei. We show that the cross-entropy technique is capable of recovering the parameters of the sources with an accuracy similar to that obtained from the traditional Astronomical Image Processing System package task IMFIT when the image is relatively simple (e.g., a few components). For more complex interferometric maps, our method displays superior performance in recovering the parameters of the jet components. Our methodology is also able to show quantitatively the number of individual components present in an image. An additional application of the cross-entropy technique to a real image of a BL Lac object is shown and discussed. Our results indicate that our cross-entropy model-fitting technique should be used in situations involving the analysis of complex emission regions with more than three sources, even though it is substantially slower than current model-fitting tasks (at least 10,000 times slower on a single processor, depending on the number of sources to be optimized). As in the case of any model fitting performed in the image plane, caution is required when analyzing images constructed from a poorly sampled (u, v) plane.
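
The core of the fitting procedure can be sketched compactly: sample candidate parameter vectors, score them by the squared difference between model and observed images, and re-fit the sampling distribution to the best candidates. The Python sketch below fits a single elliptical Gaussian with an illustrative parameterization; it is a toy version of the idea, not the paper's implementation.

```python
# Cross-entropy model fitting, minimal sketch: draw candidate parameter sets from
# a Gaussian sampling distribution, rank them by the summed squared difference
# between the model and the "observed" image, and refit the sampling distribution
# to the elite candidates. One component is fitted here for brevity.
import numpy as np

ny, nx = 64, 64
yy, xx = np.mgrid[0:ny, 0:nx]

def elliptical_gaussian(p):
    x0, y0, peak, sig_maj, ratio, theta = p
    sig_maj = abs(sig_maj) + 1e-3                 # keep widths positive
    sig_min = (abs(ratio) + 1e-3) * sig_maj
    ct, st = np.cos(theta), np.sin(theta)
    u = (xx - x0) * ct + (yy - y0) * st           # rotated coordinates
    v = -(xx - x0) * st + (yy - y0) * ct
    return peak * np.exp(-0.5 * ((u / sig_maj) ** 2 + (v / sig_min) ** 2))

rng = np.random.default_rng(0)
true_p = np.array([30.0, 40.0, 5.0, 6.0, 0.5, 0.6])
observed = elliptical_gaussian(true_p) + rng.normal(0, 0.05, (ny, nx))

def cost(p):
    return np.sum((elliptical_gaussian(p) - observed) ** 2)

# Cross-entropy loop: mean/std of the sampling distribution shrink onto the optimum.
mu = np.array([32.0, 32.0, 1.0, 4.0, 0.8, 0.0])
sigma = np.array([10.0, 10.0, 3.0, 3.0, 0.3, 1.0])
n_samples, n_elite = 200, 20
for _ in range(60):
    samples = rng.normal(mu, sigma, size=(n_samples, 6))
    elite = samples[np.argsort([cost(s) for s in samples])[:n_elite]]
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6

print("recovered parameters:", np.round(mu, 2))
```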

Relevance:

10.00%

Publisher:

Abstract:

This paper proposes a parallel hardware architecture for image feature detection based on the Scale Invariant Feature Transform (SIFT) algorithm, applied to the Simultaneous Localization And Mapping (SLAM) problem. The work also proposes specific hardware optimizations considered fundamental to embed such a robotic control system on a chip. The proposed architecture is completely stand-alone: it reads the input data directly from a CMOS image sensor and provides the results via a field-programmable gate array coupled to an embedded processor. The results may either be used directly in an on-chip application or accessed through an Ethernet connection. The system is able to detect features at up to 30 frames per second (320 x 240 pixels) and has accuracy similar to a PC-based implementation. The achieved system performance is at least one order of magnitude better than a PC-based solution, a result achieved by investigating the impact of several hardware-oriented optimizations on performance, area and accuracy.
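
As a point of reference for what the hardware accelerates, the scale-space stage that SIFT-style detectors build on can be sketched in software. The NumPy/SciPy fragment below finds difference-of-Gaussian extrema in a synthetic frame; it is a plain software illustration, not the FPGA pipeline described above.

```python
# Software sketch of the scale-space stage behind SIFT-style detectors: blur the
# image at successive scales, take differences of Gaussians (DoG), and keep
# strong local extrema as candidate keypoints.
import numpy as np
from scipy import ndimage

def dog_keypoints(image, sigmas=(1.0, 1.6, 2.56, 4.1), threshold=0.02):
    img = image.astype(float) / max(image.max(), 1e-9)
    blurred = [ndimage.gaussian_filter(img, s) for s in sigmas]
    keypoints = []
    for a, b in zip(blurred, blurred[1:]):
        dog = b - a
        # a pixel is a candidate if it equals the max (or min) of its 3x3 patch
        maxima = (dog == ndimage.maximum_filter(dog, size=3)) & (dog > threshold)
        minima = (dog == ndimage.minimum_filter(dog, size=3)) & (dog < -threshold)
        keypoints.extend(zip(*np.nonzero(maxima | minima)))
    return keypoints

# Example on a synthetic 320 x 240 frame containing a few bright blobs.
rr, cc = np.mgrid[0:240, 0:320]
frame = sum(np.exp(-((rr - r) ** 2 + (cc - c) ** 2) / (2 * 4.0 ** 2))
            for r, c in [(60, 80), (120, 200), (200, 50)])
print(f"{len(dog_keypoints(frame))} candidate keypoints found")
```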

Relevance:

10.00%

Publisher:

Abstract:

In 2006 the Route load balancing algorithm was proposed and compared with other techniques aimed at optimizing process allocation in grid environments. This algorithm schedules tasks of parallel applications considering computer neighborhoods (where distance is defined by network latency). Route presents good results for large environments, although there are cases where the neighbors have neither enough computational capacity nor a communication system capable of serving the application. In those situations Route migrates tasks until they stabilize in a grid area with enough resources. This migration may take a long time, which reduces overall performance. In order to improve this stabilization time, this paper proposes RouteGA (Route with Genetic Algorithm support), which considers historical information on parallel application behavior as well as computer capacities and loads to optimize the scheduling. This information is extracted by monitors and summarized in a knowledge base used to quantify the occupation of tasks. It is then used to parameterize a genetic algorithm responsible for optimizing the task allocation. Results confirm that RouteGA outperforms the load balancing carried out by the original Route, which had previously outperformed other scheduling algorithms from the literature.
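
The genetic-algorithm step can be illustrated with a generic task-allocation search: a chromosome assigns each task to a machine, and fitness is derived from the resulting load distribution. The sketch below is a minimal GA of that kind; the encoding, task costs and machine capacities are illustrative assumptions, not the actual RouteGA parameterization.

```python
# Minimal GA sketch for task allocation: each chromosome maps tasks to machines,
# fitness is the makespan computed from (historical) task costs and machine
# capacities, and selection, one-point crossover and mutation evolve the
# population toward allocations with lower makespan.
import random

random.seed(1)
task_cost = [4, 7, 3, 9, 2, 6, 5, 8]     # e.g. summarized historical costs
machine_speed = [1.0, 1.5, 0.8]          # relative machine capacities

def makespan(assign):
    load = [0.0] * len(machine_speed)
    for task, m in enumerate(assign):
        load[m] += task_cost[task] / machine_speed[m]
    return max(load)

def evolve(pop_size=40, generations=200, mutation_rate=0.1):
    n_tasks, n_machines = len(task_cost), len(machine_speed)
    pop = [[random.randrange(n_machines) for _ in range(n_tasks)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=makespan)                   # lower makespan is fitter
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_tasks)   # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:  # mutate one gene
                child[random.randrange(n_tasks)] = random.randrange(n_machines)
            children.append(child)
        pop = parents + children
    return min(pop, key=makespan)

best = evolve()
print("best assignment:", best, "makespan:", round(makespan(best), 2))
```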

Relevance:

10.00%

Publisher:

Abstract:

This paper presents the use of a multiprocessor architecture to improve the performance of tomographic image reconstruction. Image reconstruction in computed tomography (CT) is an intensive task for single-processor systems. We investigate the suitability of filtered image reconstruction on DSPs organized for parallel processing and compare it with an implementation based on the Message Passing Interface (MPI) library. The experimental results show that the speedups observed for both platforms increased with image resolution. In addition, the execution-time to communication-time ratios (Rt/Rc) as a function of sample size showed a narrower variation for the DSP platform than for the MPI platform, which indicates its better performance for parallel image reconstruction.
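
The filter-then-back-project structure of the reconstruction, whose loop over projection angles is the natural unit to distribute across DSPs or MPI ranks, can be sketched serially as below. This is a generic illustration with a synthetic sinogram, not the paper's implementation.

```python
# Serial sketch of filtered back-projection (FBP): ramp-filter each projection in
# the Fourier domain, then smear it back across the image grid. In a parallel
# version, the loop over angles would be split across DSPs or MPI ranks.
import numpy as np

def filtered_back_projection(sinogram, angles_deg):
    n_ang, n_det = sinogram.shape
    ramp = np.abs(np.fft.fftfreq(n_det))                 # Ram-Lak filter
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))

    recon = np.zeros((n_det, n_det))
    mid = n_det // 2
    ys, xs = np.mgrid[0:n_det, 0:n_det] - mid
    for proj, theta in zip(filtered, np.deg2rad(angles_deg)):
        t = xs * np.cos(theta) + ys * np.sin(theta)      # detector coordinate
        idx = np.clip(np.round(t).astype(int) + mid, 0, n_det - 1)
        recon += proj[idx]                               # back-project one view
    return recon * np.pi / (2 * n_ang)

# Tiny example: a point at the centre projects to the same detector bin at every
# angle, so its sinogram is a constant column.
n = 64
angles = np.linspace(0.0, 180.0, 90, endpoint=False)
sino = np.zeros((len(angles), n))
sino[:, n // 2] = 1.0
image = filtered_back_projection(sino, angles)
print("reconstruction peaks at:", np.unravel_index(image.argmax(), image.shape))
```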

Relevance:

10.00%

Publisher:

Abstract:

In order to achieve high performance, we need an efficient schedule of a parallel program onto the processors of a multiprocessor system that minimizes the overall execution time. This multiprocessor scheduling problem can be stated as finding a schedule for a general task graph to be executed on a multiprocessor system such that the schedule length is minimized [10]; the problem is known to be NP-hard. In multiprocessor task scheduling, a number of tasks must be scheduled onto a number of CPUs so that the program's execution time is minimized. According to [10], the task scheduling problem is a key factor for a parallel multiprocessor system to attain good performance. A task can be partitioned into a group of subtasks and represented as a DAG (Directed Acyclic Graph), so the problem becomes finding a schedule for a DAG to be executed on a parallel multiprocessor system such that the schedule length is minimized. This helps to reduce processing time and increase processor utilization. The aim of this thesis work is to check and compare the results obtained by the Bee Colony algorithm against the best known results already reported in the multiprocessor task scheduling domain.
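
The schedule-length objective can be made concrete with a simple list-scheduling baseline of the kind metaheuristics are usually compared against. The sketch below uses an illustrative DAG and ignores communication costs; it is not the Bee Colony algorithm itself.

```python
# Minimal sketch of DAG scheduling: a list scheduler assigns each ready task to
# the processor that can start it earliest; the schedule length (makespan) is the
# quantity a metaheuristic such as Bee Colony would try to minimize.
# DAG encoding: task -> (computation cost, list of predecessor tasks)
dag = {
    "A": (3, []),
    "B": (4, ["A"]),
    "C": (2, ["A"]),
    "D": (5, ["B", "C"]),
    "E": (3, ["C"]),
    "F": (2, ["D", "E"]),
}

def list_schedule(dag, n_procs=2):
    finish = {}                      # task -> finish time
    proc_free = [0.0] * n_procs      # time at which each processor becomes free
    remaining = dict(dag)
    while remaining:                 # simple topological pass over ready tasks
        ready = [t for t, (_, preds) in remaining.items()
                 if all(p in finish for p in preds)]
        for task in sorted(ready):
            cost, preds = remaining.pop(task)
            earliest = max([finish[p] for p in preds], default=0.0)
            p = min(range(n_procs), key=lambda i: max(proc_free[i], earliest))
            start = max(proc_free[p], earliest)
            finish[task] = start + cost
            proc_free[p] = finish[task]
    return max(finish.values())      # schedule length (makespan)

print("makespan on 2 processors:", list_schedule(dag))
```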

Relevance:

10.00%

Publisher:

Abstract:

The movement of graphics and audio programming towards three dimensions aims to better simulate the way we experience our world. In this project I looked at methods for coming closer to such simulation via realistic graphics and sound combined with a natural interface. I did most of my work on a Dell OptiPlex with an 800 MHz Pentium III processor and an NVIDIA GeForce 256 AGP Plus graphics accelerator, high-end products in the consumer market as of April 2000. For graphics, I used OpenGL [1], an open-source, multi-platform set of graphics libraries that is relatively easy to use and is coded in C. The basic engine I first put together was a system to place objects in a scene and to navigate around the scene in real time. Once I accomplished this, I was able to investigate specific techniques for making parts of a scene more appealing.
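
The navigation part of such an engine amounts to rebuilding a view transform every frame from the camera's position and orientation. The sketch below shows the standard "look-at" matrix construction in NumPy; it is generic 3D-graphics math, not code from the project described above.

```python
# Standard "look-at" view matrix: built from the camera position, a target point
# and an up vector. A real-time engine recomputes this every frame as the user
# navigates the scene.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    eye, target, up = (np.asarray(x, dtype=float) for x in (eye, target, up))
    forward = normalize(target - eye)            # viewing direction
    right = normalize(np.cross(forward, up))     # camera x axis
    true_up = np.cross(right, forward)           # re-orthogonalized y axis
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = right, true_up, -forward
    view[:3, 3] = -view[:3, :3] @ eye            # move the world into camera space
    return view

# Example: camera at (0, 2, 5) looking at the origin.
print(np.round(look_at((0.0, 2.0, 5.0), (0.0, 0.0, 0.0)), 3))
```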

Relevance:

10.00%

Publisher:

Abstract:

Audio coding is used to compress digital audio signals, thereby reducing the number of bits needed to transmit or store an audio signal. This is useful when network bandwidth or storage capacity is very limited. Audio compression algorithms are based on an encoding and decoding process. In the encoding step, the uncompressed audio signal is transformed into a coded representation, thereby compressing the audio signal. Thereafter, the coded audio signal eventually needs to be restored (e.g., for playback) through decoding: the decoder receives the bitstream and reconverts it into an uncompressed signal. ISO-MPEG is a standard for high-quality, low bit-rate video and audio coding. The audio part of the standard consists of algorithms for high-quality, low bit-rate audio coding, i.e. algorithms that reduce the original bit-rate while guaranteeing high quality of the audio signal. The audio coding algorithms comprise MPEG-1 (with three different layers), MPEG-2, MPEG-2 AAC, and MPEG-4. This work presents a study of the MPEG-4 AAC audio coding algorithm. In addition, it presents implementations of the AAC algorithm on different platforms and comparisons among them. The implementations are in C, in Intel Pentium assembly, in C for a DSP processor, and in HDL. Since each implementation has its own application niche, each one is valid as a final solution. Moreover, another purpose of this work is the comparison among these implementations, considering estimated costs, execution time, and the advantages and disadvantages of each one.
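
The encode/decode structure described above can be illustrated with generic block transform coding. The sketch below uses a DCT per block and uniform quantization; it shows the principle only and is not the AAC algorithm, which additionally uses an MDCT filter bank, psychoacoustic modelling and entropy coding.

```python
# Generic transform coding sketch: DCT per block, uniform quantization of the
# coefficients (the lossy, bit-saving step), then dequantization and inverse DCT
# on the decoder side.
import numpy as np
from scipy.fft import dct, idct

def encode(signal, block=256, step=0.02):
    pad = (-len(signal)) % block
    blocks = np.pad(signal, (0, pad)).reshape(-1, block)
    coeffs = dct(blocks, type=2, norm="ortho", axis=1)
    return np.round(coeffs / step).astype(np.int16), step    # quantized integers

def decode(qcoeffs, step, length):
    coeffs = qcoeffs.astype(float) * step                     # dequantize
    return idct(coeffs, type=2, norm="ortho", axis=1).ravel()[:length]

# Example: compress and reconstruct one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
x = 0.5 * np.sin(2 * np.pi * 440 * t)
q, step = encode(x)
y = decode(q, step, len(x))
print("max reconstruction error:", float(np.max(np.abs(x - y))))
```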

Relevance:

10.00%

Publisher:

Abstract:

In the era of complex embedded systems, directly interfacing devices and integrated systems with the real world demands the use of sensors and their supporting analog circuits. Since most physical characteristics of a sensor require some kind of calibration, this work compares and discusses four digital calibration techniques adapted for application in embedded systems. For comparison purposes, these calibration methods were implemented in Matlab 5.3 and on a DSP (Digital Signal Processor). From measurements taken during steady-state operation of the DSP, important design parameters such as power dissipation and processing time can be determined. Other comparison criteria, such as consumed area, processing time, ease of automation, and the growth rate of the area and speed costs with increasing resolution, were also analysed. The results of the implementations are presented and discussed with the goal of identifying the best calibration method for embedded systems applications.
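
One widely used digital calibration technique of the general kind compared in the work is a least-squares polynomial correction that maps raw sensor codes to reference values. The sketch below is a generic example with made-up data; it is not necessarily one of the four specific methods evaluated in the work.

```python
# Polynomial sensor calibration sketch: fit a low-order polynomial, by least
# squares, mapping raw ADC codes to reference readings taken during calibration,
# then apply the fitted curve at run time. Data and polynomial order are
# illustrative only.
import numpy as np

raw_codes = np.array([112.0, 348.0, 592.0, 821.0, 1010.0])   # raw ADC codes
reference = np.array([10.0, 25.0, 40.0, 55.0, 68.0])         # trusted readings

coeffs = np.polyfit(raw_codes, reference, deg=2)              # calibration curve
calibrate = np.poly1d(coeffs)

# Run-time use: correct a new raw reading with the fitted curve.
new_code = 700.0
print(f"raw code {new_code:.0f} -> calibrated value {calibrate(new_code):.2f}")
```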

Relevance:

10.00%

Publisher:

Abstract:

The telecommunications sector is one of the most important segments of the Brazilian social system. It emerged, developed, and reached highly significant levels both in Brazil and among other nations, always marked by the challenge and the human need to communicate and to be heard over ever greater distances. Today, as the sector in Brazil consolidates yet another stage of its evolution, grounded in competition among organizations as a result of the recent break-up of the state monopoly on which it rested for a long period of its existence, it is worth investigating and seeking to understand its development process and identifying the various shifts in direction that occurred along this trajectory. The Historical Context of the Institutionalization of Telecommunications in Brazil is the theme of this dissertation, which, unlike institutionalization studies focused on the development of individual organizations, deals with a sector as a whole, with the dynamics of its internal variables, and with its relationship with society, the beneficiary of the services it provides. In this context, assumptions related to a systemic view are applied, in which the sector is positioned as a processor that receives inputs from the external environment and returns results in the form of products or services. Among the many classifications used to define the concept of a system, this work concentrates on characterizing the Brazilian telecommunications sector as a complex, open social system. To this end, different stages of its institutional development process were studied, identifying moments of its evolution in the context of its political, economic, social and technological interaction with the nation as a whole. The objective of this dissertation is to describe this process of sectoral development using theoretical material and a subjective (empirical) comparative analysis with the development of the country. The theme was chosen because of the need to move beyond existing studies, which are merely historical accounts, towards a broader, integrated approach that relates, among other things, the sector's responsibilities to the environment in which it operates.

Relevance:

10.00%

Publisher:

Abstract:

The objective of this work is to characterize the pattern of competition in the fluid milk market (UHT and pasteurized) in the city of São Paulo, based on evidence of retail price movements and the behaviour of marketing margins. The model originally proposed by Houck (1977) was used, augmented with the observations made by Carman and Sexton (2005). This approach separates the explanatory variables into increases and decreases in the prices paid to producers. Besides offering a clearer structure, this construction makes it possible to compare the lag between these two movements and to study agents' pricing strategies through the intermediaries' margins. The period analysed runs from December 1999 to December 2005, with consumer price data from FIPE and producer price data from CEPEA/USP. The pattern of competition for UHT (long-life) milk was found to be quite different from that for pasteurized milk: while for UHT milk the pattern is close to the competitive model, for pasteurized milk the pattern found was one of little competition. To understand these differences, the locational aspect of retailing and the importance of the relevant geographic market are discussed. The results allow some inferences for sectoral analyses and for public policies aimed at milk production. The sharp growth in sales of UHT milk, absorbing much of the market previously supplied by pasteurized milk, brought greater competition in the processing and distribution segments, as well as faster price transmission along the production chain. However, the fixed-percentage markup pricing observed for UHT milk indicates that processing and distribution enjoy some market power and that raw-material cost variations are passed on to the final consumer more than proportionally in absolute terms.
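
The asymmetric specification can be sketched directly: the cumulative change in the retail price is regressed on the cumulative sums of producer-price increases and decreases, so the two pass-through slopes may differ. The Python sketch below follows the spirit of the Houck (1977) construction on synthetic data; the numbers are illustrative, not the FIPE/CEPEA series used in the study.

```python
# Asymmetric price-transmission regression sketch in the spirit of Houck (1977):
# regress the cumulative retail price change on the cumulative sums of producer
# price increases and decreases, allowing different slopes for each. Data are
# synthetic.
import numpy as np

rng = np.random.default_rng(0)
T = 72                                               # e.g. six years of months
producer = 50.0 + np.cumsum(rng.normal(0, 1.0, T))   # synthetic producer price

d = np.diff(producer)
rises = np.concatenate([[0.0], np.cumsum(np.maximum(d, 0.0))])  # cumulative increases
falls = np.concatenate([[0.0], np.cumsum(np.minimum(d, 0.0))])  # cumulative decreases

# Synthetic retail price that passes increases through more strongly (1.2)
# than decreases (0.6), plus noise.
retail = 80.0 + 1.2 * rises + 0.6 * falls + rng.normal(0, 0.5, T)

# OLS of (retail_t - retail_0) on the two cumulative components.
X = np.column_stack([np.ones(T), rises, falls])
beta, *_ = np.linalg.lstsq(X, retail - retail[0], rcond=None)
print("estimated pass-through of increases: %.2f, of decreases: %.2f"
      % (beta[1], beta[2]))
```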

Relevance:

10.00%

Publisher:

Abstract:

In this thesis, we present a novel approach that combines both reuse and prediction of dynamic instruction sequences, called Reuse through Speculation on Traces (RST). Our technique allows the dynamic identification of instruction traces that are redundant or predictable, and the reuse (speculative or not) of these traces. RST addresses an issue present in Dynamic Trace Memoization (DTM): traces are not reused when some of their inputs are not ready for the reuse test. Such traces were measured to be 69% of all reusable traces in previous studies. One of the main advantages of RST over simply combining a value prediction technique with an unrelated reuse technique is that RST does not require extra tables to store the values to be predicted; applying reuse and value prediction in unrelated mechanisms at the same time may require a prohibitive amount of table storage. In RST, the values are already stored in the Trace Memoization Table, and there is no extra cost in reading them compared with a non-speculative trace reuse technique. The input context of each trace (the input values of all instructions in the trace) already stores the values for the reuse test, which may also be used for prediction. Our main contributions include: (i) a speculative trace reuse framework that can be adapted to different processor architectures; (ii) a specification of the modifications needed in a superscalar, superpipelined processor to implement our mechanism; (iii) a study of implementation issues related to this architecture; (iv) a study of the performance limits of our technique; (v) a performance study of a realistic, constrained implementation of RST; and (vi) simulation tools, representing a superscalar, superpipelined processor in detail, that can be used in other studies. In a constrained architecture with realistic confidence, our RST technique achieves average speedups (harmonic means) of 1.29 over the baseline architecture without reuse and 1.09 over a non-speculative trace reuse technique (DTM).
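
The reuse-or-speculate decision at the heart of RST can be illustrated behaviourally: a memoization table maps a trace's input context to its recorded outputs, and a lookup either reuses the outputs outright or treats them as a prediction to be validated. The Python sketch below is a toy illustration of that idea, not the microarchitectural mechanism specified in the thesis.

```python
# Toy behavioural sketch of trace reuse with speculation: the memoization table
# stores, per trace, the input context and the corresponding outputs. Matching
# inputs allow non-speculative reuse; otherwise the stored outputs serve as a
# value prediction that must be validated by executing the trace.

class TraceMemoTable:
    def __init__(self):
        self.table = {}   # trace_id -> (input_context, outputs)

    def record(self, trace_id, inputs, outputs):
        self.table[trace_id] = (tuple(inputs), tuple(outputs))

    def lookup(self, trace_id, inputs):
        entry = self.table.get(trace_id)
        if entry is None:
            return "miss", None
        stored_inputs, outputs = entry
        if stored_inputs == tuple(inputs):
            return "reuse", outputs           # non-speculative reuse
        return "predict", outputs             # speculative reuse: validate later

def run_trace(inputs):
    """Stand-in for actually executing the trace's instructions."""
    a, b = inputs
    return (a + b, a * b)

memo = TraceMemoTable()
memo.record("trace7", (2, 3), run_trace((2, 3)))

for inputs in [(2, 3), (2, 4)]:
    status, predicted = memo.lookup("trace7", inputs)
    actual = run_trace(inputs)
    if status == "predict":
        status += " (validated)" if predicted == actual else " (mispredicted)"
    print(inputs, "->", status, predicted, "actual:", actual)
```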