873 results for Audio-visual Speech Recognition, Visual Feature Extraction, Free-parts, Monolithic, ROI


Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the growth of the information available on the Web and in personal and professional archives, driven both by increased data storage capacity and by the exponential increase in computer processing power, and with easy access to that information, an enormous flow of audiovisual content production and distribution has been generated. However, although mechanisms exist for indexing this content to enable search and access, they usually have high algorithmic complexity or require hiring highly qualified staff to verify and categorize the content. This dissertation studies solutions for collaborative content annotation and develops a tool that facilitates the annotation of an archive of audiovisual content. The approach implemented is based on the concept of “Games With a Purpose” (GWAP) and lets users create tags (metadata in the form of keywords) that assign a meaning to an object being categorized. Accordingly, as a first objective, a game was developed whose purpose is not only entertainment but also the creation of audiovisual annotations for the videos presented to the player, thereby improving their indexing and categorization. The application also allows the categorized content and metadata to be viewed and, to create one more informative element, allows a like to be inserted at a given time instant of the video. The great advantage of the application developed is that it adds annotations to specific points of the video, more precisely to its time instants. This is a new feature, not available in other applications for collaborative annotation of audiovisual content.
As a result, access to content will be considerably more effective, since it will be possible to reach, through search, specific points within a video.
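
The idea of attaching tags to time instants and then searching for specific points inside a video can be illustrated with a small data structure. This is only a minimal sketch: the names `Annotation` and `AnnotationIndex`, the video identifiers, and the case-insensitive tag matching are all assumptions made for illustration, not details of the tool described in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    video_id: str  # hypothetical identifier of the annotated video
    time_s: float  # instant within the video, in seconds
    tag: str       # keyword supplied by a player


@dataclass
class AnnotationIndex:
    # Maps a normalized tag to every annotation that uses it.
    _by_tag: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, annotation: Annotation) -> None:
        self._by_tag[annotation.tag.strip().lower()].append(annotation)

    def search(self, tag: str) -> list:
        """Return (video_id, time_s) pairs for a tag, ordered by video and time."""
        hits = self._by_tag.get(tag.strip().lower(), [])
        return sorted((a.video_id, a.time_s) for a in hits)


index = AnnotationIndex()
index.add(Annotation("video-1", 12.5, "Goal"))
index.add(Annotation("video-1", 3.0, "goal"))
index.add(Annotation("video-2", 40.0, "interview"))
print(index.search("goal"))
```

A search returns playback positions rather than whole videos, which is the property the abstract highlights. A like could be stored the same way, for example as an annotation with a reserved tag.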

Abstract:

The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixture of components originating from the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], the spectral signature matching [21], the spectral angle mapper [22], and the subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures.
As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, target signature space orthogonal projection). Other approaches, using the maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27], have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, to feature extraction, and to unsupervised recognition [28, 29]. ICA consists of finding a linear decomposition of observed data yielding statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed of the abundance fractions, and in references 9, 25, and 31–38, where sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) the number of samples is limited to the number of channels and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case for hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises ICA applicability to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance.
Independent factor analysis (IFA) [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Considering the linear mixing model, hyperspectral observations are in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. The minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and the N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requirement that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collect spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, including unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR).
Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performance. The study considers simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications, namely signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to data. The MOG parameters (number of components, means, covariances, and weights) are inferred using the minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces positivity and constant sum sources (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm.
This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need to have pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates the spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief review of ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.
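
The linear mixing model and the sum-to-one abundance constraint discussed in this abstract can be made concrete with a two-endmember toy case. The sketch below is an illustration under stated assumptions, not the chapter's method: the signatures `m1` and `m2` are hypothetical four-band spectra, the endmembers are assumed known, and the function names `mix` and `unmix` are invented here. With only two endmembers, constrained least squares reduces to a one-dimensional projection followed by clipping.

```python
import random


def mix(abundance, m1, m2, noise_std=0.0, rng=random):
    """Linear mixing model for two endmembers: x = a*m1 + (1 - a)*m2 + noise."""
    return [abundance * p + (1.0 - abundance) * q + rng.gauss(0.0, noise_std)
            for p, q in zip(m1, m2)]


def unmix(x, m1, m2):
    """Least-squares abundances under sum-to-one and nonnegativity.

    Substituting a2 = 1 - a1 leaves a one-dimensional problem whose
    unconstrained minimizer is the projection of (x - m2) onto (m1 - m2);
    clipping to [0, 1] enforces nonnegativity of both abundances.
    """
    d = [p - q for p, q in zip(m1, m2)]
    num = sum((xi - q) * di for xi, q, di in zip(x, m2, d))
    den = sum(di * di for di in d)
    a1 = min(1.0, max(0.0, num / den))
    return a1, 1.0 - a1


# Hypothetical 4-band signatures, chosen only for illustration.
m1 = [0.10, 0.40, 0.80, 0.60]
m2 = [0.70, 0.50, 0.20, 0.10]

rng = random.Random(0)
pixel = mix(0.3, m1, m2, noise_std=0.01, rng=rng)
a1, a2 = unmix(pixel, m1, m2)
print(round(a1, 2), round(a2, 2))
```

Because `a2` is always `1 - a1`, the two abundances are fully dependent. This is the property that conflicts with the source-independence assumption of ICA and IFA noted in the abstract.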

Abstract:

Master's degree in Informatics Engineering, specialization area in Knowledge and Decision Technologies

Abstract:

In this work an adaptive modeling and spectral estimation scheme based on a dual Discrete Kalman Filtering (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure which provides an underlying time frame dependency and improves time-frequency resolution. The model parameters are arranged to obtain a combined state-space model and are also used to calculate instantaneous power spectral density estimates. The speech enhancement is performed by a dual discrete Kalman filter that simultaneously gives estimates for the models and the signals. This approach is particularly useful as a pre-processing module for parametric based speech recognition systems that rely on spectral time dependent models. The system performance has been evaluated by a set of human listeners and by spectral distances. In both cases the use of this pre-processing module has led to improved results.
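
The predict/update cycle that underlies Kalman-filter enhancement of an autoregressive signal can be sketched in a much simplified setting. The code below assumes a first-order AR signal in white noise with known parameters `a`, `q` and `r`; the dual filter in the abstract estimates the model parameters jointly with the signals, which this sketch does not attempt. The function names and the synthetic data are illustrative assumptions.

```python
import random


def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for s[k] = a*s[k-1] + w[k], y[k] = s[k] + v[k].

    q and r are the variances of the process noise w and measurement noise v.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict with the AR(1) model.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update with the noisy observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(x)
    return estimates


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((i - j) ** 2 for i, j in zip(u, v)) / len(u)


# Synthetic AR(1) "clean" signal observed in white noise.
rng = random.Random(1)
a, q, r = 0.95, 0.1, 1.0
clean, noisy, s = [], [], 0.0
for _ in range(2000):
    s = a * s + rng.gauss(0.0, q ** 0.5)
    clean.append(s)
    noisy.append(s + rng.gauss(0.0, r ** 0.5))

filtered = kalman_ar1(noisy, a, q, r)
print(mse(noisy, clean), mse(filtered, clean))
```

On this synthetic data the filtered sequence should sit closer to the clean signal than the raw observations do. A higher-order AR model would replace the scalar state with a vector of past samples.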

Abstract:

Speech interfaces for Assistive Technologies are not common and are usually replaced by others. The market they target is not considered attractive, and speech technologies are still not widespread. Industry still thinks they present some performance risks, especially Speech Recognition systems. As speech is the most elemental and natural way of communicating, it has strong potential for enhancing inclusion and quality of life for broader groups of users with special needs, such as people with cerebral palsy and elderly people staying at home. This work is a position paper in which the authors argue for the need to make speech the basic interface in assistive technologies. Among the main arguments, we can state: speech is the easiest way to interact with machines; there is a growing market for embedded speech in assistive technologies, since the number of disabled and elderly people is expanding; speech technology is already mature enough to be used but needs adaptation to people with special needs; there is still a lot of R&D to be done in this area, especially when thinking about the Portuguese market. The main challenges are presented and future directions are proposed.

Abstract:

Dissertation to obtain a Master's degree in Biomedical Engineering

Abstract:

What does a thinking of the figural mean today? What is its importance within the hegemonic framework of the Contemporary Audiovisual and Multimedia Archive? What relation does cinema, insofar as it can be assimilated to the thinking of the figural, have with the Archive, given the recent appropriation, interpretation, reconfiguration and questioning of archival structures and archive materials by filmmakers who use cinema, itself a form of archiving and an archive material, as a privileged tool for interrogating the Archive? What is the potential relevance of cinema, seen from this perspective, given the political need to conceive an outside of the Archive? These are some of the questions at the origin of this thesis, which it seeks to answer by proposing the (re)cutting of an archive of films and theoretical statements and their mutual interrogation. Indeed, the image or idea of the figural presupposes a reconfiguration of the relations between the visible and the sayable that not only serves as the inspiring topos of the thesis's methodology, in an effort to inscribe the spatialization demanded by the thinking of the figural in its own structure, but above all provides the initial theoretical framework for thinking today, in the wake of authors such as Jean-François Lyotard, Michel Foucault and Gilles Deleuze, about the simultaneously epistemological and political relevance of assimilating certain contemporary cinematographic gestures to an image of thought with these contours. Thus cinema, above all in its essayistic form, is identified with the possibility of putting into practice a thinking of the figural interstice, one that counters the dominant identity of seeing and speaking that governs the contemporary paradigm of communication, which rests on their mutual conversion: images are reduced to their meaning or content, and words are converted into legible images.
Through the possibilities offered by film editing, the aim is then to return images to a reading that only they can give, and words to a new kind of listening and understanding, which translates, in terms of cinema's relation to the contemporary Archive, into the suggestion that cinema is a tool for requalifying the knowledge that the Archive presupposes. Our hypothesis is therefore that cinema in general, and certain films in particular, by allowing an archaeological scrutiny of the Archive, introduce delay into our relation to “documents”, and that it is there, in the interval between the record and its retrieval, that the possibility of resisting the diffuse power of the Archive is played out, as that power manifests itself on the internet, on television, and in the networks that today handle the regulation, processing and transmission of information. Because it has an audiovisual dimension that allows it to articulate and disarticulate archives and bodies, and because the relations between the sayable and the visible are not stabilized, cinema makes possible the rewriting of figures, that is, a thinking that has no ready-made form of truth for the meeting of sentences and experiences, but that extracts essential and true relations from the events of our present and our history, precisely by exploring the unstable interval between discourse and figure and by experimenting with the re-pasting of statements and visibilities.

Abstract:

This research aims to advance blinking detection in the context of work activity. Rather than patients having to attend a clinic, blinking videos can be acquired in a work environment and subsequently analyzed automatically. Therefore, this paper presents a methodology to perform the automatic detection of eye blinks using consumer videos acquired with low-cost web cameras. This methodology includes the detection of the face and eyes of the recorded person, and then it analyzes the low-level features of the eye region to create a quantitative vector. Finally, this vector is classified into one of the two categories considered (open and closed eyes) by using machine learning algorithms. The effectiveness of the proposed methodology was demonstrated, since it provides unbiased results with classification errors under 5%.
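
The last stage of the pipeline, classifying a per-frame feature vector as open or closed and then counting blinks, can be sketched as follows. The abstract does not name the classifier or the features, so this example swaps in a nearest-centroid classifier as a stand-in and uses made-up two-dimensional feature vectors; the function names and all numbers are illustrative assumptions.

```python
from math import dist


def fit_centroids(samples):
    """samples: list of (feature_vector, label); returns label -> mean vector."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(vec, centroids):
    """Assign the label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


def count_blinks(labels):
    """Count open -> closed transitions in a per-frame label sequence."""
    return sum(1 for prev, cur in zip(labels, labels[1:])
               if prev == "open" and cur == "closed")


# Hypothetical eye-region feature vectors (values invented for illustration).
train = [([0.80, 0.60], "open"), ([0.75, 0.65], "open"),
         ([0.30, 0.20], "closed"), ([0.35, 0.15], "closed")]
centroids = fit_centroids(train)

frames = [[0.78, 0.62], [0.33, 0.18], [0.31, 0.22], [0.79, 0.60], [0.32, 0.17]]
labels = [classify(f, centroids) for f in frames]
print(labels, count_blinks(labels))
```

Counting transitions rather than closed frames keeps a long eye closure from being reported as several blinks.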

Abstract:

Master's internship report in Communication Sciences (specialization area in Audiovisual and Multimedia)

Abstract:

feature extraction, feature tracking, vector field visualization

Abstract:

The Department of Economics and Business Organization of the Faculty of Economics and Business at the Universitat de Barcelona has always been concerned that students be able to understand and, above all, see how the concepts of the various courses in our area (business organization and management) apply to working life. Given their lack of work experience, students often understand the concepts in a superficial and theoretical way, since these are somewhat removed from their daily lives. In addition, poor attendance, low academic performance and students' lack of motivation are recurring topics among teaching staff. Hence the need to improve student performance, within the framework of the European Higher Education Area, with a teaching methodology that can be generalized to other courses and that motivates students both to study and to attend class. The main aim of the project is to improve students' academic performance through a teaching methodology based on the analysis of audiovisual cases. Specifically, it seeks to achieve the fundamental learning objectives of the Business Organization area and also to have students acquire the skills desirable for carrying out administrative functions. The project lasted one year, from November 2007 to October 2008.

Abstract:

The research topic proposed here centers on social malaise and its representation from a personal and autobiographical point of view, which in these pages I call the self-representation of social malaise. How is malaise self-represented, and which artistic practices are used to do so? What transformations do these artistic practices bring about in the audiovisual field? In this regard, we explore two lines of analysis: on the one hand, we are interested in observing the modifications that power has developed to establish new forms of exploitation; on the other, we examine how these modifications are generating a new social praxis in which artistic practices take on a new and reinforced meaning, as well as a new political capacity, at once individual and collective, charged with a transformative force capable of composing new subject spaces. We analyze everything from the representation of the self in everyday life, gender and interpersonal relations to contemporary transformations of work and changes in the construction of subjectivity. The main element supporting our research is the analysis of contemporary audiovisual productions and their distribution over some contemporary communication networks, in an attempt to show their interaction with, and direct effects on, social reality.

Abstract:

This paper presents general problems and approaches for spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most machine learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in high-dimensional geo-feature spaces, where the dimension of the space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of the models concerns the consideration of real-space constraints such as geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve the modelling of environmental phenomena by taking geo-manifolds into account. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different nonlinear feature selection/feature extraction tools.
To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.

Abstract:

This article presents the results of a study conducted with a sample of adolescents (N = 1211) aged 12 to 16 and another of their parents (N = 462) to explore how different generational responses to the presence of various audiovisual media in their immediate environment affect family interactions between parents and children. The results indicate that parents tend to overestimate both the interest in and the information their own son or daughter has about most of the audiovisual media explored, as well as the satisfaction derived from conversations with adults about any activity involving these media. Parents make different attributions about the use of audiovisual media depending on whether they refer to a son or a daughter. There is a marked difference between adolescents' satisfaction with the conversations they have with their peers and their satisfaction with conversations with adults about any of their activities with the media.