908 results for automatic speech recognition
Abstract:
The paper presents a fast and robust stereo object recognition method. The method cannot currently identify the rotation of objects, which makes it best suited to locating spheres, whose appearance does not depend on rotation. Approximate methods for locating non-spherical objects have been developed. Fundamental to the method is that the correspondence problem is solved using information about the dimensions of the object being located. This contrasts with previous stereo object recognition systems, in which the scene is first reconstructed using point-matching techniques. The method is suitable for real-time application on low-power devices.
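One way to see how known object dimensions can resolve stereo correspondence is sketched below. It is a minimal illustration, not the paper's method. It assumes rectified images, circles already detected in each image (the hypothetical `Circle` records), and a pinhole approximation in which a sphere of radius R at depth Z appears with radius about f·R/Z pixels. A left/right pair is accepted only if its disparity depth f·B/d agrees with the depth implied by its apparent size.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Circle:
    u: float  # column of the circle centre in a rectified image (pixels)
    v: float  # row of the circle centre (pixels)
    r: float  # apparent radius (pixels)


def match_spheres(left, right, focal_px, baseline_m, radius_m,
                  row_tol_px=2.0, depth_tol=0.1):
    """Pair left/right circle detections using the known sphere radius.

    A pair is accepted only if the depth implied by its disparity agrees
    (within depth_tol, relative) with the depth implied by its apparent size
    under a pinhole approximation (r_px ~= f * R / Z).
    """
    matches = []
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        disparity = a.u - b.u
        if disparity <= 0 or abs(a.v - b.v) > row_tol_px:
            continue  # rectified pair: same row, positive disparity
        z_stereo = focal_px * baseline_m / disparity
        z_size = focal_px * radius_m / (0.5 * (a.r + b.r))
        if abs(z_stereo - z_size) <= depth_tol * z_size:
            matches.append((i, j, z_stereo))
    return matches


# Hypothetical example: f = 800 px, baseline 0.1 m, sphere radius 0.05 m.
left = [Circle(400, 240, 20)]
right = [Circle(360, 240, 20), Circle(380, 240, 20)]  # second is a distractor
print(match_spheres(left, right, 800, 0.1, 0.05))
```

The distractor has a plausible disparity but the wrong size for that depth, so it is rejected without any point matching. The tolerances are illustrative assumptions.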
Abstract:
Calibration of movement tracking systems is a difficult problem faced by both animals and robots. The ability to continuously calibrate changing systems is essential for animals as they grow or are injured. It is also highly desirable for robot control or mapping systems, because components may wear, be modified or be damaged, and because the same system may be deployed on varied robotic platforms. In this paper we draw on the animal head direction tracking system to implement a self-calibrating, neurally based robot orientation tracking system. Using real robot data, we demonstrate how the system can remove tracking drift and learn to consistently track rotation over a large range of velocities. The neural tracking system provides the first steps towards a fully neural SLAM system with improved practical applicability through self-tuning and adaptation.
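The idea of self-calibrating rotation tracking can be illustrated without the neural machinery. The sketch below is a plain, non-neural stand-in and not the paper's system. It integrates raw odometry angular velocity scaled by a learned gain, and it assumes that an occasional external heading fix (for example from a visual landmark) is available to correct drift and update that gain. The class name, learning rate and simulated 1.25 under-reporting factor are all hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


class SelfCalibratingHeading:
    """Integrates raw angular velocity with a learned gain (non-neural stand-in)."""

    def __init__(self, gain=1.0, rate=0.5):
        self.heading = 0.0
        self.gain = gain      # scale applied to the raw odometry rate
        self.rate = rate      # learning rate for the gain update
        self._raw_since_fix = 0.0

    def predict(self, omega_raw, dt):
        """Path integration: advance the heading by gain * raw rotation."""
        self.heading = wrap(self.heading + self.gain * omega_raw * dt)
        self._raw_since_fix += omega_raw * dt

    def correct(self, reference_heading):
        """Use an external heading fix to update the gain and remove drift."""
        error = wrap(reference_heading - self.heading)
        if abs(self._raw_since_fix) > 1e-6:
            # The gain that would have produced zero error over this interval
            # is gain + error / raw_rotation; move part of the way towards it.
            self.gain += self.rate * error / self._raw_since_fix
        self.heading = reference_heading
        self._raw_since_fix = 0.0


# Hypothetical simulation: odometry under-reports rotation by a factor of 1.25.
tracker, true_heading, dt = SelfCalibratingHeading(), 0.0, 0.1
for step in range(1, 501):
    true_heading = wrap(true_heading + 0.3 * dt)
    tracker.predict(0.3 / 1.25, dt)
    if step % 10 == 0:
        tracker.correct(true_heading)
print(round(tracker.gain, 4))
```

Fixes must arrive often enough that the heading error between them stays well below half a turn; otherwise the wrapped error is ambiguous.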
Abstract:
Several approaches have been proposed to recognize handwritten Bengali characters using various curve-fitting algorithms and curvature analysis. In this paper, a new algorithm, the Curve-fitting Algorithm, is developed to identify the various strokes of a handwritten character. The algorithm precisely recognizes strokes of different patterns (lines and quadratic curves), which substantially reduces the burden of error elimination. Implementation of this Modified Syntactic Method demonstrates a significant improvement in the recognition of handwritten Bengali characters.
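Distinguishing line strokes from quadratic-curve strokes can be sketched as a model comparison. The sketch assumes a generic least-squares approach, not the paper's algorithm: it fits a line and a quadratic to the stroke's points and prefers the line unless the quadratic cuts the squared error substantially. The threshold `ratio`, the helper names and the y-as-a-function-of-x parametrization are assumptions; near-vertical strokes would need a rotated or parametric fit.

```python
def _solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_sse(xs, ys, degree):
    """Least-squares polynomial fit y ~ sum c_i x^i; returns the sum of squared errors."""
    d = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(d)] for i in range(d)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(d)]
    c = _solve(ata, aty)
    return sum((y - sum(c[i] * x ** i for i in range(d))) ** 2 for x, y in zip(xs, ys))


def classify_stroke(points, ratio=0.5, floor=1e-9):
    """Label a stroke 'line' or 'quadratic' by comparing the two fits."""
    xs = [float(p[0]) for p in points]
    ys = [float(p[1]) for p in points]
    sse_line = fit_sse(xs, ys, 1)
    sse_quad = fit_sse(xs, ys, 2)
    # Prefer the simpler model unless the quadratic reduces the error substantially.
    if sse_line <= floor * len(points) or sse_quad >= ratio * sse_line:
        return "line"
    return "quadratic"


print(classify_stroke([(x, 2 * x + 1) for x in range(10)]))  # a straight stroke
print(classify_stroke([(x, x * x) for x in range(-5, 6)]))   # a curved stroke
```

The `floor` guard keeps an essentially perfect line from being labelled quadratic because of rounding noise in two near-zero errors.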
Abstract:
Gabor representations have been widely used in facial analysis (face recognition, face detection and facial expression detection) because of their biological relevance and computational properties. Two popular Gabor representations in the literature are 1) Log-Gabor filters and 2) Gabor energy filters. Although these representations are similar, they differ in a distinct way: the Log-Gabor filters mimic the simple cells in the visual cortex, while the Gabor energy filters emulate the complex cells, which causes subtle differences in their responses. In this paper, we analyze the difference between these two Gabor representations and quantify it on the task of facial action unit (AU) detection. In experiments on the Cohn-Kanade dataset, we report an average area under the ROC curve (A′) of 92.60% across 17 AUs for the Gabor energy filters, while the Log-Gabor representation achieved an average A′ of 96.11%. This result suggests that the small spatial differences captured by the Log-Gabor filters are more useful for AU detection than the differences in contours and edges that the Gabor energy filters extract.
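The simple-cell versus complex-cell distinction can be illustrated in one dimension. In the sketch below, the even (cosine) Gabor output stands in for a phase-sensitive, simple-cell-like response, and the energy of the quadrature pair, sqrt(even² + odd²), is the phase-invariant, complex-cell-like response. The log-Gabor filter is shown only through its usual radial transfer function, exp(−(ln(f/f0))² / (2·(ln(σ/f0))²)), valid for 0 < σ/f0 < 1. Filter sizes, frequencies and the test grating are illustrative assumptions, not the paper's settings.

```python
import math


def gabor_pair(x, sigma, freq):
    """Even (cosine) and odd (sine) 1-D Gabor kernels evaluated at offset x."""
    g = math.exp(-x * x / (2.0 * sigma * sigma))
    phase = 2.0 * math.pi * freq * x
    return g * math.cos(phase), g * math.sin(phase)


def gabor_responses(signal, sigma, freq):
    """Filter responses at the centre sample of an odd-length 1-D signal.

    Returns (even, energy): `even` is a single phase-sensitive response
    (simple-cell-like); `energy` combines the quadrature pair and is
    largely phase invariant (complex-cell-like).
    """
    half = len(signal) // 2
    even = odd = 0.0
    for i, s in enumerate(signal):
        ge, go = gabor_pair(i - half, sigma, freq)
        even += ge * s
        odd += go * s
    return even, math.hypot(even, odd)


def log_gabor(f, f0, sigma_ratio):
    """Radial log-Gabor transfer function; sigma_ratio = sigma / f0, in (0, 1)."""
    if f <= 0.0:
        return 0.0  # log-Gabor filters have no DC response
    return math.exp(-math.log(f / f0) ** 2 / (2.0 * math.log(sigma_ratio) ** 2))


# A grating at the filter's frequency, shifted in phase by 0 and 90 degrees.
for phi in (0.0, math.pi / 2):
    grating = [math.cos(2 * math.pi * 0.1 * (i - 30) + phi) for i in range(61)]
    even, energy = gabor_responses(grating, sigma=6.0, freq=0.1)
    print(f"phase={phi:.2f}  even={even:+.3f}  energy={energy:.3f}")
```

Shifting the grating by 90 degrees drives the even response to about zero while the energy barely changes, which is the phase invariance attributed to complex cells.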
Abstract:
When classifying a signal, we ideally want the classifier to produce a large response when it encounters a positive example and little to no response for all other examples. Unfortunately, in practice this does not happen: responses fluctuate, often causing false alarms. There are many reasons for this, most notably that the dynamics of the signal are not incorporated into the classification. In facial expression recognition, this has been highlighted as a major research question. In this paper we present a novel technique that incorporates the dynamics of the signal, producing a strong response when the peak expression is found and suppressing all other responses as much as possible. We conducted preliminary experiments on the extended Cohn-Kanade (CK+) database, which show its benefits. The ability to automatically and accurately recognize the facial expressions of drivers is highly relevant to automotive applications. For example, early recognition of “surprise” could indicate that an accident is about to occur, so that various safeguards could immediately be deployed to avoid or minimize injury and damage.
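The abstract does not say how the dynamics are incorporated. Purely as a generic point of comparison, and not the paper's technique, the sketch below smooths hypothetical per-frame classifier scores with a centred moving average and then applies non-maximum suppression, so that one response survives near the expression peak and isolated spikes are zeroed. The window sizes `k` and `radius` and the `threshold` are illustrative assumptions.

```python
def moving_average(scores, k):
    """Centred moving average over 2k+1 frames (window shrinks at the edges)."""
    n = len(scores)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def peak_response(scores, k=2, radius=5, threshold=0.5):
    """Keep only the strongest smoothed response in each neighbourhood; zero the rest."""
    smooth = moving_average(scores, k)
    n = len(smooth)
    out = [0.0] * n
    for i, s in enumerate(smooth):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        if s >= threshold and s == max(smooth[lo:hi]):
            out[i] = s
    return out


# Hypothetical per-frame scores: a spurious spike at frame 1, a real onset-apex sequence later.
scores = [0.1, 0.6, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95, 1.0, 0.9, 0.7, 0.5, 0.3, 0.1]
print([round(v, 2) for v in peak_response(scores)])
```

Smoothing trades a small shift in the detected peak frame for robustness to single-frame false alarms; a method that models expression dynamics explicitly can avoid that trade-off.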
Abstract:
This paper presents an automated system for 3D assembly of tissue engineering (TE) scaffolds made from biocompatible microscopic building blocks with relatively large fabrication error. It focuses on the pin-into-hole force control developed for this demanding microassembly task. A beam-like gripper with integrated force sensing, offering 3 mN resolution over a 500 mN measuring range, is designed and used to implement admittance force-controlled insertion with commercial precision stages. Vision-based alignment followed by insertion is complemented by a haptic exploration strategy that uses force and position information. The system demonstrates fully automated construction of TE scaffolds from 50 microparts whose dimensional error exceeds 5%.
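Admittance control turns a force error into a motion command: the stage moves so that damping × velocity equals the reference force minus the measured force, and it settles where contact produces the reference force. The sketch below is a minimal first-order version under stated assumptions, not the paper's controller. Contact is modelled as a linear spring, the stage accepts velocity commands with a speed limit, and all parameter values and units (mN, µm, s) are hypothetical.

```python
def admittance_insert(f_ref, stiffness, contact_pos, damping, dt, steps, v_max):
    """Simulate first-order admittance control of a 1-D insertion.

    Units (hypothetical): force in mN, position in um, damping in mN*s/um.
    The measured force comes from a linear-spring contact model.
    """
    x = 0.0
    trace = []
    for _ in range(steps):
        force = stiffness * max(0.0, x - contact_pos)  # simulated force sensor
        velocity = (f_ref - force) / damping           # admittance law: B*v = F_ref - F
        velocity = max(-v_max, min(v_max, velocity))   # stage speed limit
        x += velocity * dt
        trace.append((x, force))
    return trace


trace = admittance_insert(f_ref=100.0, stiffness=2.0, contact_pos=100.0,
                          damping=1.0, dt=0.01, steps=2000, v_max=50.0)
final_x, final_force = trace[-1]
print(f"x = {final_x:.2f} um, force = {final_force:.2f} mN")
```

With these values the discrete loop is stable (dt·stiffness/damping = 0.02, well below 2) and the force approaches 100 mN without overshoot. A stiffer contact or a larger time step would call for more damping.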
Abstract:
This paper investigates the automatic attitude and depth control of a torpedo-shaped submarine. Both experimental results and dynamic simulations are used to tune feedback control loops in order to obtain stable control of the yaw, pitch and roll of the craft.
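The abstract does not state the controller structure. As an assumption, the sketch below uses a discrete PID loop per axis, applied to a hypothetical single-axis rotational model (inertia × angular acceleration = torque − drag × angular rate). The axes are treated as decoupled, and the gains and model constants are illustrative, not values from the paper. The derivative acts on the measurement to avoid a kick when the setpoint changes. There is no anti-windup, because the output limit is never reached here.

```python
class PID:
    """Discrete PID controller with derivative on measurement and output limits."""

    def __init__(self, kp, ki, kd, limit):
        self.kp, self.ki, self.kd, self.limit = kp, ki, kd, limit
        self.integral = 0.0
        self.prev_measured = None

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        # Differentiating the measurement (not the error) avoids a kick on setpoint changes.
        rate = 0.0 if self.prev_measured is None else (measured - self.prev_measured) / dt
        self.prev_measured = measured
        u = self.kp * error + self.ki * self.integral - self.kd * rate
        return max(-self.limit, min(self.limit, u))


def simulate_axis(setpoint, steps=4000, dt=0.01, inertia=1.0, drag=0.5):
    """Hypothetical single-axis model: inertia * angle'' = torque - drag * angle'."""
    pid = PID(kp=4.0, ki=1.0, kd=3.0, limit=10.0)
    angle, rate = 0.0, 0.0
    for _ in range(steps):
        torque = pid.update(setpoint, angle, dt)
        rate += (torque - drag * rate) / inertia * dt  # semi-implicit Euler
        angle += rate * dt
    return angle


for axis, target in (("yaw", 0.5), ("pitch", -0.2), ("roll", 0.0)):
    print(axis, round(simulate_axis(target), 4))
```

Tuning against a model like this, then checking against experiment, mirrors the combined simulation-and-experiment approach the abstract describes. The real craft's coupled dynamics would differ.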
Abstract:
What really changed for Australian Aboriginal and Torres Strait Islander people between Paul Keating’s Redfern Park Speech (Keating 1992) and Kevin Rudd’s Apology to the stolen generations (Rudd 2008)? What will change between the Apology and the next speech of an Australian Prime Minister? The two speeches were intricately linked, and they were both personal and political. But do they really signify change at the political level? This paper reflects my attempt to turn the gaze away from Aboriginal and Torres Strait Islander people, and back to where the speeches originated: the Australian Labor Party (ALP). I question whether the changes foreshadowed in the two speeches – including changes by the Australian public and within Australian society – are evident in the internal mechanisms of the ALP. I also seek to understand why non-Indigenous women seem to have given in to the existing ways of the ALP instead of challenging the status quo which keeps Aboriginal and Torres Strait Islander peoples marginalised. I believe that, without a thorough examination and a change in the ALP’s practices, the domination and subjugation of Indigenous peoples will continue – within the Party, through the Australian political process and, therefore, through governments.
Abstract:
This thesis presents an original approach to parametric speech coding at rates below 1 kbits/sec, primarily for speech storage applications. The essential processes considered in this research are: efficient characterization of the evolving vocal-tract configuration so as to follow phonemic features with high fidelity; representation of the speech excitation with minimal parameters and only minor degradation in the naturalness of the synthesized speech; and, finally, quantization of the resulting parameters at the nominated rates. For encoding spectral features, a new method based on Temporal Decomposition (TD) is developed that efficiently compresses spectral information by interpolating between the most steady points of the time trajectories of the spectral parameters, using a new basis function. The compression ratio provided by the method is independent of the update rate of the feature vectors, so it allows high resolution in tracking significant temporal variations of the speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving the interpolation of spectral parameters through phonetically based analysis are proposed and implemented in this research, comprising event-approximated TD, near-optimal shaping of the event-approximating functions, efficient speech parametrization for TD (based on an extensive investigation originally reported in this thesis), and a hierarchical error-minimization algorithm for decomposing feature parameters that significantly reduces the complexity of the interpolation process.
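Temporal Decomposition represents a trajectory of spectral parameter vectors as y(n) ≈ Σ_k a_k φ_k(n), where the a_k are target vectors and the φ_k are event functions centred at event frames n_k. The sketch below is minimal and rests on assumptions. It uses triangular event functions, so reconstruction reduces to linear interpolation between targets, and it takes the event locations as already known. The thesis's own basis function and its event-detection procedure are not reproduced. Only the targets and event positions need to be stored, so their number depends on the number of events rather than on the frame rate.

```python
def event_function(n, k, centres):
    """Triangular event function phi_k(n): 1 at centres[k], falling to 0 at its neighbours."""
    c = centres[k]
    if n <= c:
        if k == 0:
            return 1.0  # hold the first target before the first event
        left = centres[k - 1]
        return max(0.0, (n - left) / (c - left))
    if k == len(centres) - 1:
        return 1.0  # hold the last target after the last event
    right = centres[k + 1]
    return max(0.0, (right - n) / (right - c))


def td_reconstruct(centres, targets, n_frames):
    """Rebuild y(n) = sum_k a_k * phi_k(n) for every frame n."""
    dim = len(targets[0])
    frames = []
    for n in range(n_frames):
        weights = [event_function(n, k, centres) for k in range(len(centres))]
        frames.append([sum(w * a[d] for w, a in zip(weights, targets)) for d in range(dim)])
    return frames


# Hypothetical 2-D parameter track: 21 frames rebuilt from 3 stored targets.
y = td_reconstruct([0, 10, 20], [[0.0, 0.0], [10.0, 5.0], [0.0, 10.0]], 21)
print(y[5], y[10], y[15])
```

Adjacent triangular functions sum to one between event centres, so each reconstructed frame is a convex blend of the two neighbouring targets.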
Speech excitation in this work is characterized using a novel Multi-Band Excitation paradigm that accurately determines the harmonic structure of the LPC (linear predictive coding) residual spectra within individual bands, using the concept of Instantaneous Frequency (IF) estimation in the frequency domain. The model yields an effective two-band approximation to the excitation and also computes pitch and voicing with high accuracy. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses the pitch contour by a ratio of about 1/10 with negligible error. To approximate the gain contour, a set of uniformly distributed Gaussian event-like functions is used, which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also presents a new quantization method for spectral features, based on the statistical properties and spectral sensitivity of the spectral parameters extracted from TD-based analysis. The experimental results show that good-quality speech, comparable to that of conventional coders at rates above 2 kbits/sec, can be achieved at rates of 650-990 bits/sec.
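Approximating a gain contour with uniformly spaced Gaussian event-like functions can be sketched as a least-squares fit. The abstract does not specify how the weights are obtained, so least squares is an assumption here. The sketch fits K weights to an N-frame contour; with N = 60 and K = 10 it keeps about 1/6 of the values, in line with the reduction quoted above. The Gaussian width (equal to the centre spacing), the synthetic contour in dB and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def gaussian_events(n_frames, n_funcs, width):
    """Uniformly spaced Gaussian event-like functions; returns phi[n][k]."""
    spacing = (n_frames - 1) / (n_funcs - 1)
    centres = [k * spacing for k in range(n_funcs)]
    return [[math.exp(-0.5 * ((n - c) / width) ** 2) for c in centres]
            for n in range(n_frames)]


def fit_gain_contour(gain, n_funcs, width):
    """Least-squares weights w minimising sum_n (gain[n] - sum_k w_k phi_k(n))^2."""
    phi = gaussian_events(len(gain), n_funcs, width)
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(n_funcs)]
           for i in range(n_funcs)]
    atg = [sum(row[i] * g for row, g in zip(phi, gain)) for i in range(n_funcs)]
    weights = solve(ata, atg)
    approx = [sum(w * p for w, p in zip(weights, row)) for row in phi]
    return weights, approx


# Hypothetical 60-frame gain contour (dB) summarised by 10 weights.
contour = [20 + 10 * math.sin(2 * math.pi * n / 60) + 5 * math.exp(-((n - 35) / 4) ** 2)
           for n in range(60)]
w, approx = fit_gain_contour(contour, n_funcs=10, width=59 / 9)
rms = math.sqrt(sum((a - g) ** 2 for a, g in zip(approx, contour)) / len(contour))
print(len(w), round(rms, 3))
```

Wider Gaussians give a smoother approximation but make the normal equations worse conditioned. A width close to the centre spacing is a reasonable middle ground for a sketch like this.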