954 resultados para Digit speech recognition


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel shape recognition algorithm was developed to autonomously classify the Northern Pacific Sea Star (Asterias amurenis) from benthic images that were collected by the Starbug AUV during 6km of transects in the Derwent estuary. Despite the effects of scattering, attenuation, soft focus and motion blur within the underwater images, an optimal joint classification rate of 77.5% and misclassification rate of 13.5% was achieved. The performance of algorithm was largely attributed to its ability to recognise locally deformed sea star shapes that were created during the segmentation of the distorted images.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an approach to mobile robot localization, place recognition and loop closure using a monostatic ultra-wide band (UWB) radar system. The UWB radar is a time-of-flight based range measurement sensor that transmits short pulses and receives reflected waves from objects in the environment. The main idea of the poposed localization method is to treat the received waveform as a signature of place. The resulting echo waveform is very complex and highly depends on the position of the sensor with respect to surrounding objects. On the other hand, the sensor receives similar waveforms from the same positions.Moreover, the directional characteristics of dipole antenna is almost omnidirectional. Therefore, we can localize the sensor position to find similar waveform from waveform database. This paper proposes a place recognitionmethod based on waveform matching, presents a number of experiments that illustrate the high positon estimation accuracy of our UWB radar-based localization system, and shows the resulting loop detection performance in a typical indoor office environment and a forest.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action using a Gaussian mixture using robust low-dimensional action features. Segmentation is achieved by performing classification on overlapping temporal windows, which are then merged to produce the final result. This approach is considerably less complicated than previous methods which use dynamic programming or computationally expensive hidden Markov models (HMMs). Initial experiments on a stitched version of the KTH dataset show that the proposed approach achieves an accuracy of 78.3%, outperforming a recent HMM-based approach which obtained 71.2%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Experimental studies have found that when the state-of-the-art probabilistic linear discriminant analysis (PLDA) speaker verification systems are trained using out-domain data, it significantly affects speaker verification performance due to the mismatch between development data and evaluation data. To overcome this problem we propose a novel unsupervised inter dataset variability (IDV) compensation approach to compensate the dataset mismatch. IDV-compensated PLDA system achieves over 10% relative improvement in EER values over out-domain PLDA system by effectively compensating the mismatch between in-domain and out-domain data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Robustness to variations in environmental conditions and camera viewpoint is essential for long-term place recognition, navigation and SLAM. Existing systems typically solve either of these problems, but invariance to both remains a challenge. This paper presents a training-free approach to lateral viewpoint- and condition-invariant, vision-based place recognition. Our successive frame patch-tracking technique infers average scene depth along traverses and automatically rescales views of the same place at different depths to increase their similarity. We combine our system with the condition-invariant SMART algorithm and demonstrate place recognition between day and night, across entire 4-lane-plus-median-strip roads, where current algorithms fail.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The benefits for university graduates in growing skills and capabilities through volunteering experiences are gaining increased attention. Building leadership self-efficacy supports students develop their capacity for understanding, articulating and evidencing their learning. Reward and recognition is fundamental in the student’s journey to build self-efficacy. Through this research, concepts of reward and recognition have been explored and articulated through the experiences and perceptions of actively engaged student peer leaders. The research methodology has enabled a collaborative, student-centred approach in shaping an innovative Rewards Framework, which supports, recognises and rewards the learning journey from beginning peer leader to competent and confident graduate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Place recognition has long been an incompletely solved problem in that all approaches involve significant compromises. Current methods address many but never all of the critical challenges of place recognition – viewpoint-invariance, condition-invariance and minimizing training requirements. Here we present an approach that adapts state-of-the-art object proposal techniques to identify potential landmarks within an image for place recognition. We use the astonishing power of convolutional neural network features to identify matching landmark proposals between images to perform place recognition over extreme appearance and viewpoint variations. Our system does not require any form of training, all components are generic enough to be used off-the-shelf. We present a range of challenging experiments in varied viewpoint and environmental conditions. We demonstrate superior performance to current state-of-the- art techniques. Furthermore, by building on existing and widely used recognition frameworks, this approach provides a highly compatible place recognition system with the potential for easy integration of other techniques such as object detection and semantic scene interpretation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

2,4,6-trinitrotoluene (TNT) is one of the most commonly used nitro aromatic explosives in landmine, military and mining industry. This article demonstrates rapid and selective identification of TNT by surface-enhanced Raman spectroscopy (SERS) using 6-aminohexanethiol (AHT) as a new recognition molecule. First, Meisenheimer complex formation between AHT and TNT is confirmed by the development of pink colour and appearance of new band around 500 nm in UV-visible spectrum. Solution Raman spectroscopy study also supported the AHT:TNT complex formation by demonstrating changes in the vibrational stretching of AHT molecule between 2800-3000 cm−1. For surface enhanced Raman spectroscopy analysis, a self-assembled monolayer (SAM) of AHT is formed over the gold nanostructure (AuNS) SERS substrate in order to selectively capture TNT onto the surface. Electrochemical desorption and X-ray photoelectron studies are performed over AHT SAM modified surface to examine the presence of free amine groups with appropriate orientation for complex formation. Further, AHT and butanethiol (BT) mixed monolayer system is explored to improve the AHT:TNT complex formation efficiency. Using a 9:1 AHT:BT mixed monolayer, a very low detection limit (LOD) of 100 fM TNT was realized. The new method delivers high selectivity towards TNT over 2,4 DNT and picric acid. Finally, real sample analysis is demonstrated by the extraction and SERS detection of 302 pM of TNT from spiked.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper analyzes the limitations upon the amount of in- domain (NIST SREs) data required for training a probabilistic linear discriminant analysis (PLDA) speaker verification system based on out-domain (Switchboard) total variability subspaces. By limiting the number of speakers, the number of sessions per speaker and the length of active speech per session available in the target domain for PLDA training, we investigated the relative effect of these three parameters on PLDA speaker verification performance in the NIST 2008 and NIST 2010 speaker recognition evaluation datasets. Experimental results indicate that while these parameters depend highly on each other, to beat out-domain PLDA training, more than 10 seconds of active speech should be available for at least 4 sessions/speaker for a minimum of 800 speakers. If further data is available, considerable improvement can be made over solely out-domain PLDA training.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models; speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best performing baseline system and across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

For most people, speech production is relatively effortless and error-free. Yet it has long been recognized that we need some type of control over what we are currently saying and what we plan to say. Precisely how we monitor our internal and external speech has been a topic of research interest for several decades. The predominant approach in psycholinguistics has assumed monitoring of both is accomplished via systems responsible for comprehending others' speech. This special topic aimed to broaden the field, firstly by examining proposals that speech production might also engage more general systems, such as those involved in action monitoring. A second aim was to examine proposals for a production-specific, internal monitor. Both aims require that we also specify the nature of the representations subject to monitoring.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We used event-related fMRI to investigate the neural correlates of encoding strength and word frequency effects in recognition memory. At test, participants made Old/New decisions to intermixed low (LF) and high frequency (HF) words that had been presented once or twice at study and to new, unstudied words. The Old/New effect for all hits vs. correctly rejected unstudied words was associated with differential activity in multiple cortical regions, including the anterior medial temporal lobe (MTL), hippocampus, left lateral parietal cortex and anterior left inferior prefrontal cortex (LIPC). Items repeated at study had superior hit rates (HR) compared to items presented once and were associated with reduced activity in the right anterior MTL. By contrast, other regions that had shown conventional Old/New effects did not demonstrate modulation according to memory strength. A mirror effect for word frequency was demonstrated, with the LF word HR advantage associated with increased activity in the left lateral temporal cortex. However, none of the regions that had demonstrated Old/New item retrieval effects showed modulation according to word frequency. These findings are interpreted as supporting single-process memory models proposing a unitary strength-like memory signal and models attributing the LF word HR advantage to the greater lexico-semantic context-noise associated with HF words due to their being experienced in many pre-experimental contexts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the present study, items pre-exposed in a familiarization series were included in a list discrimination task to manipulate memory strength. At test, participants were required to discriminate strong targets and strong lures from weak targets and new lures. This resulted in a concordant pattern of increased "old" responses to strong targets and lures. Model estimates attributed this pattern to either equivalent increases in memory strength across the two types of items (unequal variance signal detection model) or equivalent increases in both familiarity and recollection (dual process signal detection [DPSD] model). Hippocampal activity associated with strong targets and lures showed equivalent increases compared with missed items. This remained the case when analyses were restricted to high-confidence responses considered by the DPSD model to reflect predominantly recollection. A similar pattern of activity was observed in parahippocampal cortex for high-confidence responses. The present results are incompatible with "noncriterial" or "false" recollection being reflected solely in inflated DPSD familiarity estimates and support a positive correlation between hippocampal activity and memory strength irrespective of the accuracy of list discrimination, consistent with the unequal variance signal detection model account.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This large-scale longitudinal population study provided a rare opportunity to consider the interface between multilingualism and speech-language competence on children’s academic and social-emotional outcomes and to determine whether differences between groups at 4 to 5 years persist, deepen, or disappear with time and schooling. Four distinct groups were identified from the Kindergarten cohort of the Longitudinal Study of Australian Children (LSAC) (1) English-only + typical speech and language (n = 2,012); (2) multilingual + typical speech and language (n = 476); (3) English-only + speech and language concern (n = 643); and (4) multilingual + speech and language concern (n = 109). Two analytic approaches were used to compare these groups. First, a matched case-control design was used to randomly match multilingual children with speech and language concern (group 4, n = 109) to children in groups 1, 2, and 3 on gender, age, and family socio-economic position in a cross-sectional comparison of vocabulary, school readiness, and behavioral adjustment. Next, analyses were applied to the whole sample to determine longitudinal effects of group membership on teachers’ ratings of literacy, numeracy, and behavioral adjustment at ages 6 to 7 and 8 to 9 years. At 4 to 5 years, multilingual children with speech and language concern did equally well or better than English-only children (with or without speech and language concern) on school readiness tests but performed more poorly on measures of English vocabulary and behavior. At ages 6 to 7 and 8 to 9, the early gap between English-only and multilingual children had closed. Multilingualism was not found to contribute to differences in literacy and numeracy outcomes at school; instead, outcomes were more related to concerns about children’s speech and language in early childhood. There were no group differences for socio-emotional outcomes. Early evidence for the combined risks of multilingualism plus speech and language concern was not upheld into the school years.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Purpose Optical blur and ageing are known to affect driving performance but their effects on drivers' eye movements are poorly understood. This study examined the effects of optical blur and age on eye movement patterns and performance on the DriveSafe slide recognition test which is purported to predict fitness to drive. Methods Twenty young (27.1 ± 4.6 years) and 20 older (73.3 ± 5.7 years) visually normal drivers performed the DriveSafe under two visual conditions: best-corrected vision and with +2.00 DS blur. The DriveSafe is a Visual Recognition Slide Test that consists of brief presentations of static, real-world driving scenes containing different road users (pedestrians, bicycles and vehicles). Participants reported the types, relative positions and direction of travel of the road users in each image; the score was the number of correctly reported items (maximum score of 128). Eye movements were recorded while participants performed the DriveSafe test using a Tobii TX300 eye tracking system. Results There was a significant main effect of blur on DriveSafe scores (best-corrected: 114.9 vs blur: 93.2; p < 0.001). There was also a significant age and blur interaction on the DriveSafe scores (p < 0.001) such that the young drivers were more negatively affected by blur than the older drivers (reductions of 22% and 13% respectively; p < 0.001): with best-corrected vision, the young drivers performed better than the older drivers (DriveSafe scores: 118.4 vs 111.5; p = 0.001), while with blur, the young drivers performed worse than the older drivers (88.6 vs 95.9; p = 0.009). For the eye movement patterns, blur significantly reduced the number of fixations on road users (best-corrected: 5.1 vs blur: 4.5; p < 0.001), fixation duration on road users (2.0 s vs 1.8 s; p < 0.001) and saccade amplitudes (7.4° vs 6.7°; p < 0.001). A main effect of age on eye movements was also found where older drivers made smaller saccades than the young drivers (6.7° vs 7.4°; p < 0.001). Conclusions Blur reduced DriveSafe scores for both age groups and this effect was greater for the young drivers. The decrease in number of fixations and fixation duration on road users, as well as the reduction in saccade amplitudes under the blurred condition, highlight the difficulty experienced in performing the task in the presence of optical blur, which suggests that uncorrected refractive errors may have a detrimental impact on aspects of driving performance.