932 resultados para Automatic Editing


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Calibration of movement tracking systems is a difficult problem faced by both animals and robots. The ability to continuously calibrate changing systems is essential for animals as they grow or are injured, and highly desirable for robot control or mapping systems due to the possibility of component wear, modification, damage and their deployment on varied robotic platforms. In this paper we use inspiration from the animal head direction tracking system to implement a self-calibrating, neurally-based robot orientation tracking system. Using real robot data we demonstrate how the system can remove tracking drift and learn to consistently track rotation over a large range of velocities. The neural tracking system provides the first steps towards a fully neural SLAM system with improved practical applicability through selftuning and adaptation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recognition systems, for voice-based control of vehicle functions such as the GPS based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (eg: lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in AVASR field has been ongoing for the past twenty-five years with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is due to most research to date neglecting to address variabilities in the visual domain such as illumination and viewpoint in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is publicly available in-car database and we show that the use of visual speech conjunction with the audio modality is a better approach to improve the robustness and effectiveness of voice-only recognition systems in car cabin environments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an automated system for 3D assembly of tissue engineering (TE) scaffolds made from biocompatible microscopic building blocks with relatively large fabrication error. It focuses on the pin-into-hole force control developed for this demanding microassembly task. A beam-like gripper with integrated force sensing at a 3 mN resolution with a 500 mN measuring range is designed, and is used to implement an admittance force-controlled insertion using commercial precision stages. Visual-based alignment followed by an insertion is complemented by a haptic exploration strategy using force and position information. The system demonstrates fully automated construction of TE scaffolds with 50 microparts whose dimension error is larger than 5%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper investigates the automatic atti- tude and depth control of a torpedo shaped submarine. Both experimental results and dynamic simulations are used to tune feed- back control loops in order to obtain stable control of yaw, pitch and roll of the craft.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Speaker verification is the process of verifying the identity of a person by analysing their speech. There are several important applications for automatic speaker verification (ASV) technology including suspect identification, tracking terrorists and detecting a person’s presence at a remote location in the surveillance domain, as well as person authentication for phone banking and credit card transactions in the private sector. Telephones and telephony networks provide a natural medium for these applications. The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: Modelling mismatch and modelling uncertainty. To directly address the performance degradation incurred through mismatched conditions it was proposed to directly model this mismatch. Feature mapping was evaluated for combating handset mismatch and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases. Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions. Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order of magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis is a work of creative practice-led research comprising two components. The first component is a speculative thriller novel, entitled Diamond Eyes. (Contracted for publication in 2009 by Harper Collins: Voyager as the first in a trilogy, under the name AA Bell.) The second component is an exegesis exploring the notion of re-visioning a novel. Re-visioning, not to be confused with revision, refers to advance editing strategies required when the original vision of a novel changes during development.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Automatic spoken Language Identi¯cation (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identi¯ca- tion systems provide. A prominent application arises in call centers dealing with speakers speaking di®erent languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable fea- tures for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Pho- netic speech information is extracted using existing speech recognition technol- ogy. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by di®erent speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters, and; (c) is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state of the art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Type unions, pointer variables and function pointers are a long standing source of subtle security bugs in C program code. Their use can lead to hard-to-diagnose crashes or exploitable vulnerabilities that allow an attacker to attain privileged access over classified data. This paper describes an automatable framework for detecting such weaknesses in C programs statically, where possible, and for generating assertions that will detect them dynamically, in other cases. Exclusively based on analysis of the source code, it identifies required assertions using a type inference system supported by a custom made symbol table. In our preliminary findings, our type system was able to infer the correct type of unions in different scopes, without manual code annotations or rewriting. Whenever an evaluation is not possible or is difficult to resolve, appropriate runtime assertions are formed and inserted into the source code. The approach is demonstrated via a prototype C analysis tool.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a method for calculating the in-bucket payload volume on a dragline for the purpose of estimating the material’s bulk density in real-time. Knowledge of the bulk density can provide instant feedback to mine planning and scheduling to improve blasting and in turn provide a more uniform bulk density across the excavation site. Furthermore costs and emissions in dragline operation, maintenance and downstream material processing can be reduced. The main challenge is to determine an accurate position and orientation of the bucket with the constraint of real-time performance. The proposed solution uses a range bearing and tilt sensor to locate and scan the bucket between the lift and dump stages of the dragline cycle. Various scanning strategies are investigated for their benefits in this real-time application. The bucket is segmented from the scene using cluster analysis while the pose of the bucket is calculated using the iterative closest point (ICP) algorithm. Payload points are segmented from the bucket by a fixed distance neighbour clustering method to preserve boundary points and exclude low density clusters introduced by overhead chains and the spreader bar. A height grid is then used to represent the payload from which the volume can be calculated by summing over the grid cells. We show volume calculated on a scaled system with an accuracy of greater than 95 per cent.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Automated analysis of the sentiments presented in online consumer feedbacks can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based contextsensitive opinion mining system. Our novel ontology extraction mechanism underpinned by a variant of Kullback-Leibler divergence can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated based on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline.