929 resultados para speaker recognition systems


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a method for rational behaviour recognition that combines vision-based pose estimation with knowledge modeling and reasoning. The proposed method consists of two stages. First, RGB-D images are used in the estimation of the body postures. Then, estimated actions are evaluated to verify that they make sense. This method requires rational behaviour to be exhibited. To comply with this requirement, this work proposes a rational RGB-D dataset with two types of sequences, some for training and some for testing. Preliminary results show the addition of knowledge modeling and reasoning leads to a significant increase of recognition accuracy when compared to a system based only on computer vision.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

O objeto principal desta tese é o estudo de algoritmos de processamento e representação automáticos de dados, em particular de informação obtida por sensores montados a bordo de veículos (2D e 3D), com aplicação em contexto de sistemas de apoio à condução. O trabalho foca alguns dos problemas que, quer os sistemas de condução automática (AD), quer os sistemas avançados de apoio à condução (ADAS), enfrentam hoje em dia. O documento é composto por duas partes. A primeira descreve o projeto, construção e desenvolvimento de três protótipos robóticos, incluindo pormenores associados aos sensores montados a bordo dos robôs, algoritmos e arquitecturas de software. Estes robôs foram utilizados como plataformas de ensaios para testar e validar as técnicas propostas. Para além disso, participaram em várias competições de condução autónoma tendo obtido muito bons resultados. A segunda parte deste documento apresenta vários algoritmos empregues na geração de representações intermédias de dados sensoriais. Estes podem ser utilizados para melhorar técnicas já existentes de reconhecimento de padrões, deteção ou navegação, e por este meio contribuir para futuras aplicações no âmbito dos AD ou ADAS. Dado que os veículos autónomos contêm uma grande quantidade de sensores de diferentes naturezas, representações intermédias são particularmente adequadas, pois podem lidar com problemas relacionados com as diversas naturezas dos dados (2D, 3D, fotométrica, etc.), com o carácter assíncrono dos dados (multiplos sensores a enviar dados a diferentes frequências), ou com o alinhamento dos dados (problemas de calibração, diferentes sensores a disponibilizar diferentes medições para um mesmo objeto). Neste âmbito, são propostas novas técnicas para a computação de uma representação multi-câmara multi-modal de transformação de perspectiva inversa, para a execução de correcção de côr entre imagens de forma a obter mosaicos de qualidade, ou para a geração de uma representação de cena baseada em primitivas poligonais, capaz de lidar com grandes quantidades de dados 3D e 2D, tendo inclusivamente a capacidade de refinar a representação à medida que novos dados sensoriais são recebidos.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Models of visual perception are based on image representations in cortical area V1 and higher areas which contain many cell layers for feature extraction. Basic simple, complex and end-stopped cells provide input for line, edge and keypoint detection. In this paper we present an improved method for multi-scale line/edge detection based on simple and complex cells. We illustrate the line/edge representation for object reconstruction, and we present models for multi-scale face (object) segregation and recognition that can be embedded into feedforward dorsal and ventral data streams (the “what” and “where” subsystems) with feedback streams from higher areas for obtaining translation, rotation and scale invariance.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There are roughly two processing systems: (1) very fast gist vision of entire scenes, completely bottom-up and data driven, and (2) Focus-of-Attention (FoA) with sequential screening of specific image regions and objects. The latter system has to be sequential because unnormalised input objects must be matched against normalised templates of canonical object views stored in memory, which involves dynamic routing of features in the visual pathways.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Human-robot interaction is an interdisciplinary research area which aims at integrating human factors, cognitive psychology and robot technology. The ultimate goal is the development of social robots. These robots are expected to work in human environments, and to understand behavior of persons through gestures and body movements. In this paper we present a biological and realtime framework for detecting and tracking hands. This framework is based on keypoints extracted from cortical V1 end-stopped cells. Detected keypoints and the cells’ responses are used to classify the junction type. By combining annotated keypoints in a hierarchical, multi-scale tree structure, moving and deformable hands can be segregated, their movements can be obtained, and they can be tracked over time. By using hand templates with keypoints at only two scales, a hand’s gestures can be recognized.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2015

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Face recognition from images or video footage requires a certain level of recorded image quality. This paper derives acceptable bitrates (relating to levels of compression and consequently quality) of footage with human faces, using an industry implementation of the standard H.264/MPEG-4 AVC and the Closed-Circuit Television (CCTV) recording systems on London buses. The London buses application is utilized as a case study for setting up a methodology and implementing suitable data analysis for face recognition from recorded footage, which has been degraded by compression. The majority of CCTV recorders on buses use a proprietary format based on the H.264/MPEG-4 AVC video coding standard, exploiting both spatial and temporal redundancy. Low bitrates are favored in the CCTV industry for saving storage and transmission bandwidth, but they compromise the image usefulness of the recorded imagery. In this context, usefulness is determined by the presence of enough facial information remaining in the compressed image to allow a specialist to recognize a person. The investigation includes four steps: (1) Development of a video dataset representative of typical CCTV bus scenarios. (2) Selection and grouping of video scenes based on local (facial) and global (entire scene) content properties. (3) Psychophysical investigations to identify the key scenes, which are most affected by compression, using an industry implementation of H.264/MPEG-4 AVC. (4) Testing of CCTV recording systems on buses with the key scenes and further psychophysical investigations. The results showed a dependency upon scene content properties. Very dark scenes and scenes with high levels of spatial–temporal busyness were the most challenging to compress, requiring higher bitrates to maintain useful information.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia de Electrónica e Telecomunicações

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Learning and teaching processes, like all human activities, can be mediated through the use of tools. Information and communication technologies are now widespread within education. Their use in the daily life of teachers and learners affords engagement with educational activities at any place and time and not necessarily linked to an institution or a certificate. In the absence of formal certification, learning under these circumstances is known as informal learning. Despite the lack of certification, learning with technology in this way presents opportunities to gather information about and present new ways of exploiting an individual’s learning. Cloud technologies provide ways to achieve this through new architectures, methodologies, and workflows that facilitate semantic tagging, recognition, and acknowledgment of informal learning activities. The transparency and accessibility of cloud services mean that institutions and learners can exploit existing knowledge to their mutual benefit. The TRAILER project facilitates this aim by providing a technological framework using cloud services, a workflow, and a methodology. The services facilitate the exchange of information and knowledge associated with informal learning activities ranging from the use of social software through widgets, computer gaming, and remote laboratory experiments. Data from these activities are shared among institutions, learners, and workers. The project demonstrates the possibility of gathering information related to informal learning activities independently of the context or tools used to carry them out.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A personalização é um aspeto chave de uma interação homem-computador efetiva. Numa era em que existe uma abundância de informação e tantas pessoas a interagir com ela, de muitas maneiras, a capacidade de se ajustar aos seus utilizadores é crucial para qualquer sistema moderno. A criação de sistemas adaptáveis é um domínio bastante complexo que necessita de métodos muito específicos para ter sucesso. No entanto, nos dias de hoje ainda não existe um modelo ou arquitetura padrão para usar nos sistemas adaptativos modernos. A principal motivação desta tese é a proposta de uma arquitetura para modelação do utilizador que seja capaz de incorporar diferentes módulos necessários para criar um sistema com inteligência escalável com técnicas de modelação. Os módulos cooperam de forma a analisar os utilizadores e caracterizar o seu comportamento, usando essa informação para fornecer uma experiência de sistema customizada que irá aumentar não só a usabilidade do sistema mas também a produtividade e conhecimento do utilizador. A arquitetura proposta é constituída por três componentes: uma unidade de informação do utilizador, uma estrutura matemática capaz de classificar os utilizadores e a técnica a usar quando se adapta o conteúdo. A unidade de informação do utilizador é responsável por conhecer os vários tipos de indivíduos que podem usar o sistema, por capturar cada detalhe de interações relevantes entre si e os seus utilizadores e também contém a base de dados que guarda essa informação. A estrutura matemática é o classificador de utilizadores, e tem como tarefa a sua análise e classificação num de três perfis: iniciado, intermédio ou avançado. Tanto as redes de Bayes como as neuronais são utilizadas, e uma explicação de como as preparar e treinar para lidar com a informação do utilizador é apresentada. Com o perfil do utilizador definido torna-se necessária uma técnica para adaptar o conteúdo do sistema. Nesta proposta, uma abordagem de iniciativa mista é apresentada tendo como base a liberdade de tanto o utilizador como o sistema controlarem a comunicação entre si. A arquitetura proposta foi desenvolvida como parte integrante do projeto ADSyS - um sistema de escalonamento dinâmico - utilizado para resolver problemas de escalonamento sujeitos a eventos dinâmicos. Possui uma complexidade elevada mesmo para utilizadores frequentes, daí a necessidade de adaptar o seu conteúdo de forma a aumentar a sua usabilidade. Com o objetivo de avaliar as contribuições deste trabalho, um estudo computacional acerca do reconhecimento dos utilizadores foi desenvolvido, tendo por base duas sessões de avaliação de usabilidade com grupos de utilizadores distintos. Foi possível concluir acerca dos benefícios na utilização de técnicas de modelação do utilizador com a arquitetura proposta.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Development of organic molecules that exhibit selective interactions with different biomolecules has immense significance in biochemical and medicinal applications. In this context, our main objective has been to design a few novel functionaIized molecules that can selectively bind and recognize nucleotides and DNA in the aqueous medium through non-covalent interactions. Our strategy was to design novel cycIophane receptor systems based on the anthracene chromophore linked through different bridging moieties and spacer groups. It was proposed that such systems would have a rigid structure with well defined cavity, wherein the aromatic chromophore can undergo pi-stacking interactions with the guest molecules. The viologen and imidazolium moieties have been chosen as bridging units, since such groups, can in principle, could enhance the solubility of these derivatives in the aqueous medium as well as stabilize the inclusion complexes through electrostatic interactions.We synthesized a series of water soluble novel functionalized cyclophanes and have investigated their interactions with nucleotides, DNA and oligonucIeotides through photophysical. chiroptical, electrochemical and NMR techniques. Results indicate that these systems have favorable photophysical properties and exhibit selective interactions with ATP, GTP and DNA involving electrostatic. hydrophobic and pi-stacking interactions inside the cavity and hence can have potential use as probes in biology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Identification and Control of Non‐linear dynamical systems are challenging problems to the control engineers.The topic is equally relevant in communication,weather prediction ,bio medical systems and even in social systems,where nonlinearity is an integral part of the system behavior.Most of the real world systems are nonlinear in nature and wide applications are there for nonlinear system identification/modeling.The basic approach in analyzing the nonlinear systems is to build a model from known behavior manifest in the form of system output.The problem of modeling boils down to computing a suitably parameterized model,representing the process.The parameters of the model are adjusted to optimize a performanace function,based on error between the given process output and identified process/model output.While the linear system identification is well established with many classical approaches,most of those methods cannot be directly applied for nonlinear system identification.The problem becomes more complex if the system is completely unknown but only the output time series is available.Blind recognition problem is the direct consequence of such a situation.The thesis concentrates on such problems.Capability of Artificial Neural Networks to approximate many nonlinear input-output maps makes it predominantly suitable for building a function for the identification of nonlinear systems,where only the time series is available.The literature is rich with a variety of algorithms to train the Neural Network model.A comprehensive study of the computation of the model parameters,using the different algorithms and the comparison among them to choose the best technique is still a demanding requirement from practical system designers,which is not available in a concise form in the literature.The thesis is thus an attempt to develop and evaluate some of the well known algorithms and propose some new techniques,in the context of Blind recognition of nonlinear systems.It also attempts to establish the relative merits and demerits of the different approaches.comprehensiveness is achieved in utilizing the benefits of well known evaluation techniques from statistics. The study concludes by providing the results of implementation of the currently available and modified versions and newly introduced techniques for nonlinear blind system modeling followed by a comparison of their performance.It is expected that,such comprehensive study and the comparison process can be of great relevance in many fields including chemical,electrical,biological,financial and weather data analysis.Further the results reported would be of immense help for practical system designers and analysts in selecting the most appropriate method based on the goodness of the model for the particular context.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Design and study of molecular receptors capable of mimicking natural processes has found applications in basic research as well as in the development of potentially useful technologies. Of the various receptors reported, the cyclophanes are known to encapsulate guest molecules in their cavity utilizing various non–covalent interactions resulting in significant changes in their optical properties. This unique property of the cyclophanes has been widely exploited for the development of selective and sensitive probes for a variety of guest molecules including complex biomolecules. Further, the incorporation of metal centres into these systems added new possibilities for designing receptors such as the metallocyclophanes and transition metal complexes, which can target a large variety of Lewis basic functional groups that act as selective synthetic receptors. The ligands that form complexes with the metal ions, and are capable of further binding to Lewis-basic substrates through open coordination sites present in various biomolecules are particularly important as biomolecular receptors. In this context, we synthesized a few anthracene and acridine based metal complexes and novel metallocyclophanes and have investigated their photophysical and biomolecular recognition properties.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Speech is the most natural means of communication among human beings and speech processing and recognition are intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the signals are tuned by removing the noise from it using wavelet denoising method based on Soft Thresholding. Here, the features from the signals are extracted using Discrete Wavelet Transforms (DWT) because they are well suitable for processing non-stationary signals like speech. This is due to their multi- resolutional, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem. So, the feature vector set obtained are classified using three classifiers namely, Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers which are capable of handling multiclasses. During classification stage, the input feature vector data is trained using information relating to known patterns and then they are tested using the test data set. The performances of all these classifiers are evaluated based on recognition accuracy. All the three methods produced good recognition accuracy. DWT and ANN produced a recognition accuracy of 89%, SVM and DWT combination produced an accuracy of 86.6% and Naive Bayes and DWT combination produced an accuracy of 83.5%. ANN is found to be better among the three methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephone, automated banking system, etc. This paper presents speaker independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficient (MFCC) as feature for signal processing and Hidden Markov model (HMM) for recognition. The system is trained with 21 male and female voices in the age group of 20 to 40 years and there was 98.5% word recognition accuracy (94.8% sentence recognition accuracy) on a test set of continuous digit recognition task.