Biblioteca Digital

38 resultados para efficient algorithm

em Repositório Científico do Instituto Politécnico de Lisboa - Portugal

Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Recent integrated circuit technologies have opened the possibility to design parallel architectures with hundreds of cores on a single chip. The design space of these parallel architectures is huge with many architectural options. Exploring the design space gets even more difficult if, beyond performance and area, we also consider extra metrics like performance and area efficiency, where the designer tries to design the architecture with the best performance per chip area and the best sustainable performance. In this paper we present an algorithm-oriented approach to design a many-core architecture. Instead of doing the design space exploration of the many core architecture based on the experimental execution results of a particular benchmark of algorithms, our approach is to make a formal analysis of the algorithms considering the main architectural aspects and to determine how each particular architectural aspect is related to the performance of the architecture when running an algorithm or set of algorithms. The architectural aspects considered include the number of cores, the local memory available in each core, the communication bandwidth between the many-core architecture and the external memory and the memory hierarchy. To exemplify the approach we did a theoretical analysis of a dense matrix multiplication algorithm and determined an equation that relates the number of execution cycles with the architectural parameters. Based on this equation a many-core architecture has been designed. The results obtained indicate that a 100 mm(2) integrated circuit design of the proposed architecture, using a 65 nm technology, is able to achieve 464 GFLOPs (double precision floating-point) for a memory bandwidth of 16 GB/s. This corresponds to a performance efficiency of 71 %. Considering a 45 nm technology, a 100 mm(2) chip attains 833 GFLOPs which corresponds to 84 % of peak performance These figures are better than those obtained by previous many-core architectures, except for the area efficiency which is limited by the lower memory bandwidth considered. The results achieved are also better than those of previous state-of-the-art many-cores architectures designed specifically to achieve high performance for matrix multiplication.

Low Complexity Intra Mode Selection for Efficient Distributed Video Coding

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Motion compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, it creates SI with rather significant motion compensated errors for some frame regions while rather small for some other regions depending on the video content. In this paper, a low complexity Infra mode selection algorithm is proposed to select the most 'critical' blocks in the WZ frame and help the decoder with some reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra based and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance with improvements up to 1.2 dB regarding a WZ coding mode only solution.

An Efficient Long Distance Echo Canceller

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes an implementation of a long distance echo canceller, operating on full-duplex with hands-free and in real-time with a single Digital Signal Processor (DSP). The proposed solution is based on short length adaptive filters centered on the positions of the most significant echoes, which are tracked by time delay estimators, for which we use a new approach. To deal with double talking situations a speech detector is employed. The floating-point DSP TMS320C6713 from Texas Instruments is used with software written in C++, with compiler optimizations for fast execution. The resulting algorithm enables long distance echo cancellation with low computational requirements, suited for embbeded systems. It reaches greater echo return loss enhancement and shows faster convergence speed when compared to the conventional approach. The experimental results approach the CCITT G.165 recommendation levels.

Coding mode decision algorithm for binary descriptor coding

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In visual sensor networks, local feature descriptors can be computed at the sensing nodes, which work collaboratively on the data obtained to make an efficient visual analysis. In fact, with a minimal amount of computational effort, the detection and extraction of local features, such as binary descriptors, can provide a reliable and compact image representation. In this paper, it is proposed to extract and code binary descriptors to meet the energy and bandwidth constraints at each sensing node. The major contribution is a binary descriptor coding technique that exploits the correlation using two different coding modes: Intra, which exploits the correlation between the elements that compose a descriptor; and Inter, which exploits the correlation between descriptors of the same image. The experimental results show bitrate savings up to 35% without any impact in the performance efficiency of the image retrieval task. © 2014 EURASIP.

Efficient feature selection filters for high-dimensional data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be cornputationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional. datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, being able to act as pre-processors for computationally intensive methods to focus their attention on smaller subsets of promising features. The experimental results, with up to 10(5) features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.

New developments on VCA unmixing algorithm

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hyperspectral sensors are being developed for remote sensing applications. These sensors produce huge data volumes which require faster processing and analysis tools. Vertex component analysis (VCA) has become a very useful tool to unmix hyperspectral data. It has been successfully used to determine endmembers and unmix large hyperspectral data sets without the use of any a priori knowledge of the constituent spectra. Compared with other geometric-based approaches VCA is an efficient method from the computational point of view. In this paper we introduce new developments for VCA: 1) a new signal subspace identification method (HySime) is applied to infer the signal subspace where the data set live. This step also infers the number of endmembers present in the data set; 2) after the projection of the data set onto the signal subspace, the algorithm iteratively projects the data set onto several directions orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to these extreme of the projections. The capability of VCA to unmix large hyperspectral scenes (real or simulated), with low computational complexity, is also illustrated.

A fast parallel hyperspectral coded aperture algorithm for compressive sensing using OpenCL

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we develop a fast implementation of an hyperspectral coded aperture (HYCA) algorithm on different platforms using OpenCL, an open standard for parallel programing on heterogeneous systems, which includes a wide variety of devices, from dense multicore systems from major manufactures such as Intel or ARM to new accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), the Intel Xeon Phi and other custom devices. Our proposed implementation of HYCA significantly reduces its computational cost. Our experiments have been conducted using simulated data and reveal considerable acceleration factors. This kind of implementations with the same descriptive language on different architectures are very important in order to really calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.

On the Suitability of Suffix Arrays for Lempel-Ziv Data Compression

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.

Tuning Iris Recognition for Noisy Images

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The use of iris recognition for human authentication has been spreading in the past years. Daugman has proposed a method for iris recognition, composed by four stages: segmentation, normalization, feature extraction, and matching. In this paper we propose some modifications and extensions to Daugman's method to cope with noisy images. These modifications are proposed after a study of images of CASIA and UBIRIS databases. The major modification is on the computationally demanding segmentation stage, for which we propose a faster and equally accurate template matching approach. The extensions on the algorithm address the important issue of pre-processing that depends on the image database, being mandatory when we have a non infra-red camera, like a typical WebCam. For this scenario, we propose methods for reflection removal and pupil enhancement and isolation. The tests, carried out by our C# application on grayscale CASIA and UBIRIS images show that the template matching segmentation method is more accurate and faster than the previous one, for noisy images. The proposed algorithms are found to be efficient and necessary when we deal with non infra-red images and non uniform illumination.

Conversor estático de potência tolerante a falhas

Relevância:

20.00% 20.00%

Publicador:

Resumo:

O trabalho apresentado nesta dissertação refere-se à concepção, projecto e realização experimental de um conversor estático de potência tolerante a falhas. Foram analisados trabalhos de investigação sobre modos de falha de conversores electrónicos de potência, topologias de conversores tolerantes a falhas, métodos de detecção de falhas, entre outros. Com vista à concepção de uma solução, foram nomeados e analisados os principais modos de falhas para três soluções propostas de conversores com topologias tolerantes a falhas onde existem elementos redundantes em modo de espera. Foram analisados os vários aspectos de natureza técnica dos circuitos de potência e guiamento de sinais onde se salientam a necessidade de tempos mortos entre os sinais de disparo de IGBT do mesmo ramo, o isolamento galvânico entre os vários andares de disparo, a necessidade de minimizar as auto-induções entre o condensador DC e os braços do conversor de potência. Com vista a melhorar a fiabilidade e segurança de funcionamento do conversor estático de potência tolerante a falhas, foi concebido um circuito electrónico permitindo a aceleração da actuação normal de contactores e outro circuito responsável pelo encaminhamento e inibição dos sinais de disparo. Para a aplicação do conversor estático de potência tolerante a falhas desenvolvido num accionamento com um motor de corrente contínua, foi implementado um algoritmo de controlo numa placa de processamento digital de sinais (DSP), sendo a supervisão e actuação do sistema realizados em tempo-real, para a detecção de falhas e actuação de contactores e controlo de corrente e velocidade do motor utilizando uma estratégia de comando PWM. Foram realizados ensaios que, mediante uma detecção adequada de falhas, realiza a comutação entre blocos de conversores de potência. São apresentados e discutidos resultados experimentais, obtidos usando o protótipo laboratorial.

Manutenção proactiva de sistemas AVAC com recurso aos sistemas inteligentes multiagente

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Nos tempos actuais os equipamentos para Aquecimento Ventilação e Ar Condicionado (AVAC) ocupam um lugar de grande importância na concepção, desenvolvimento e manutenção de qualquer edifício por mais pequeno que este seja. Assim, surge a necessidade premente de racionalizar os consumos energéticos optimizando-os. A alta fiabilidade desejada nestes sistemas obriga-nos cada vez mais a descobrir formas de tornar a sua manutenção mais eficiente, pelo que é necessário prevenir de uma forma proactiva todas as falhas que possam prejudicar o bom desempenho destas instalações. Como tal, torna-se necessário detectar estas falhas/anomalias, sendo imprescíndivel que nos antecipemos a estes eventos prevendo o seu acontecimento num horizonte temporal pré-definido, permitindo actuar o mais cedo possível. É neste domínio que a presente dissertação tenta encontrar soluções para que a manutenção destes equipamentos aconteça de uma forma proactiva e o mais eficazmente possível. A ideia estruturante é a de tentar intervir ainda numa fase incipiente do problema, alterando o comportamento dos equipamentos monitorizados, de uma forma automática, com recursos a agentes inteligentes de diagnóstico de falhas. No caso em estudo tenta-se adaptar de forma automática o funcionamento de uma Unidade de Tratamento de Ar (UTA) aos desvios/anomalias detectadas, promovendo a paragem integral do sistema apenas como último recurso. A arquitectura aplicada baseia-se na utilização de técnicas de inteligência artificial, nomeadamente dos sistemas multiagente. O algoritmo utilizado e testado foi construído em Labview®, utilizando um kit de ferramentas de controlo inteligente para Labview®. O sistema proposto é validado através de um simulador com o qual se conseguem reproduzir as condições reais de funcionamento de uma UTA.

A checkpointing-enabled and resource-aware Java Virtual Machine for efficient and robust e-Science applications in grid environments

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Object-oriented programming languages presently are the dominant paradigm of application development (e. g., Java,. NET). Lately, increasingly more Java applications have long (or very long) execution times and manipulate large amounts of data/information, gaining relevance in fields related with e-Science (with Grid and Cloud computing). Significant examples include Chemistry, Computational Biology and Bio-informatics, with many available Java-based APIs (e. g., Neobio). Often, when the execution of such an application is terminated abruptly because of a failure (regardless of the cause being a hardware of software fault, lack of available resources, etc.), all of its work already performed is simply lost, and when the application is later re-initiated, it has to restart all its work from scratch, wasting resources and time, while also being prone to another failure and may delay its completion with no deadline guarantees. Our proposed solution to address these issues is through incorporating mechanisms for checkpointing and migration in a JVM. These make applications more robust and flexible by being able to move to other nodes, without any intervention from the programmer. This article provides a solution to Java applications with long execution times, by extending a JVM (Jikes research virtual machine) with such mechanisms. Copyright (C) 2011 John Wiley & Sons, Ltd.

Augmented LDPC Graph for Distributed Video Coding with Multiple Side Information

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The advances made in channel-capacity codes, such as turbo codes and low-density parity-check (LDPC) codes, have played a major role in the emerging distributed source coding paradigm. LDPC codes can be easily adapted to new source coding strategies due to their natural representation as bipartite graphs and the use of quasi-optimal decoding algorithms, such as belief propagation. This paper tackles a relevant scenario in distributedvideo coding: lossy source coding when multiple side information (SI) hypotheses are available at the decoder, each one correlated with the source according to different correlation noise channels. Thus, it is proposed to exploit multiple SI hypotheses through an efficient joint decoding technique withmultiple LDPC syndrome decoders that exchange information to obtain coding efficiency improvements. At the decoder side, the multiple SI hypotheses are created with motion compensated frame interpolation and fused together in a novel iterative LDPC based Slepian-Wolf decoding algorithm. With the creation of multiple SI hypotheses and the proposed decoding algorithm, bitrate savings up to 8.0% are obtained for similar decoded quality.

IOPT Petri Net State Space Generation Algorithm with Maximal-Step Execution Semantics

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an algorithm to efficiently generate the state-space of systems specified using the IOPT Petri-net modeling formalism. IOPT nets are a non-autonomous Petri-net class, based on Place-Transition nets with an extended set of features designed to allow the rapid prototyping and synthesis of system controllers through an existing hardware-software co-design framework. To obtain coherent and deterministic operation, IOPT nets use a maximal-step execution semantics where, in a single execution step, all enabled transitions will fire simultaneously. This fact increases the resulting state-space complexity and can cause an arc "explosion" effect. Real-world applications, with several million states, will reach a higher order of magnitude number of arcs, leading to the need for high performance state-space generator algorithms. The proposed algorithm applies a compilation approach to read a PNML file containing one IOPT model and automatically generate an optimized C program to calculate the corresponding state-space.

Comparação do cálculo de dose com os algoritmos Analytical Anisotropic Algorithm e Pencil Beam Convolution na patologia de pulmão

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mestrado em Radioterapia.

«
1
2
3
»