323 results for Digit speech recognition


Relevance: 20.00%

Abstract:

This paper describes a novel framework for facial expression recognition from still images that selects, optimizes and fuses ‘salient’ Gabor feature layers to recognize six universal facial expressions using the K-nearest-neighbor classifier. Recognition comparisons with the all-layer approach on the JAFFE and Cohn-Kanade (CK) databases confirm that using ‘salient’ Gabor feature layers with optimized sizes can achieve better recognition performance and dramatically reduce computation time. Moreover, comparisons with state-of-the-art results demonstrate the effectiveness of our approach.

Relevance: 20.00%

Abstract:

This paper looks at the challenges presented for the Australian Library and Information Association by its role as the professional association responsible for ensuring the quality of Australian library technician graduates. There is a particular focus on the issue of course recognition, where the Association's role is complicated by the need to work alongside the national quality assurance processes that have been established by the relevant technical education authorities. The paper describes the history of course recognition in Australia; examines the relationship between course recognition and other quality measures; and describes the process the Association has undertaken recently to ensure appropriate professional scrutiny in a changing environment of accountability.

Relevance: 20.00%

Abstract:

The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.
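
The selection step described above, keeping the candidate impostor examples that are most often chosen as support vectors, can be sketched in a few lines. This is only an illustration under stated assumptions: the SVMs are assumed to have been trained elsewhere, the function name `refine_background` and the index lists are hypothetical, and the actual ranking or thresholding rule used in the paper may differ.

```python
from collections import Counter


def refine_background(support_vector_ids_per_svm, keep):
    """Rank candidate impostor examples by how often they were selected as
    support vectors, and return the `keep` most frequently selected ones."""
    counts = Counter()
    for sv_ids in support_vector_ids_per_svm:
        # Count each candidate at most once per trained SVM.
        counts.update(set(sv_ids))
    # Sort by descending frequency; break ties by id so the result is deterministic.
    ranked = sorted(counts, key=lambda ex: (-counts[ex], ex))
    return ranked[:keep]


# Hypothetical support-vector indices from three SVMs trained on a development corpus.
svs = [[0, 2, 5], [2, 5, 7], [2, 3]]
refined = refine_background(svs, keep=2)
print(refined)
```

Candidates that are never selected as support vectors simply do not appear in the ranking, which is how the refined set ends up smaller than the full candidate set.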

Relevance: 20.00%

Abstract:

This paper presents Scatter Difference Nuisance Attribute Projection (SD-NAP) as an enhancement to NAP for SVM-based speaker verification. While standard NAP may inadvertently remove desirable speaker variability, SD-NAP explicitly de-emphasises this variability by incorporating a weighted version of the between-class scatter into the NAP optimisation criterion. Experimental evaluation of SD-NAP with a variety of SVM systems on the 2006 and 2008 NIST SRE corpora demonstrates that SD-NAP provides improved verification performance over standard NAP in most cases, particularly at the EER operating point.
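
For orientation, the projection step of standard NAP (not the SD-NAP criterion itself, which the abstract does not spell out) is commonly written as P = I - U U^T, where the columns of U are orthonormal nuisance directions. The sketch below applies that projection to a toy vector; the helper names and the numbers are illustrative assumptions, and how U is estimated is outside its scope.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nap_project(x, nuisance_dirs):
    """Apply P = I - U U^T to x. The vectors in nuisance_dirs are assumed
    to be orthonormal, so their components can be removed independently."""
    out = list(x)
    for u in nuisance_dirs:
        c = dot(u, x)  # component of x along this nuisance direction
        out = [o - c * ui for o, ui in zip(out, u)]
    return out


# Toy example: one nuisance direction in a 3-dimensional feature space.
u = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
x = [3.0, 1.0, 2.0]
y = nap_project(x, [u])
print(y)
```

After projection the vector has no component left along `u`, which is the sense in which the nuisance variability is removed.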

Relevance: 20.00%

Abstract:

This work presents an extended Joint Factor Analysis model including explicit modelling of unwanted within-session variability. The goals of the proposed extended JFA model are to improve verification performance with short utterances by compensating for the effects of limited or imbalanced phonetic coverage, and to produce a flexible JFA model that is effective over a wide range of utterance lengths without adjusting model parameters, for example by retraining session subspaces. Experimental results on the 2006 NIST SRE corpus demonstrate the flexibility of the proposed model by providing competitive results over a wide range of utterance lengths without retraining and also yielding modest improvements in a number of conditions over the current state of the art.

Relevance: 20.00%

Abstract:

The Autistic Behavioural Indicators Instrument (ABII) is an 18-item instrument developed to identify children with Autistic Disorder (AD) based on the presence of unique autistic behavioural indicators. The ABII was administered to 20 children with AD, 20 children with speech and language impairment (SLI) and 20 typically developing (TD) children aged 2-6 years. Results indicated that the ABII discriminated children diagnosed with AD from those diagnosed with SLI and those who were TD, based on the presence of specific social attention, sensory, and behavioural symptoms. A combination of symptomology across these domains correctly classified 100% of children with and without AD. The paper concludes that the ABII shows considerable promise as an instrument for the early identification of AD.

Relevance: 20.00%

Abstract:

A method of improving the security of biometric templates is presented that satisfies desirable properties such as (a) irreversibility of the template, (b) revocability and assignment of a new template to the same biometric input, and (c) matching in the secure transformed domain. It makes use of an iterative procedure based on the bispectrum that serves as an irreversible transformation for biometric features because signal phase is discarded at each iteration. Unlike the usual hash function, this transformation preserves closeness in the transformed domain for similar biometric inputs. A number of such templates can be generated from the same input. These properties are illustrated using synthetic data and applied to images from the FRGC 3D database with Gabor features. Verification can be successfully performed using these secure templates with an EER of 5.85%.
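
Since verification performance here is reported as an equal error rate (EER), a small sketch of how an EER can be approximated from match scores may help. It assumes that higher scores mean "more likely genuine" and sweeps thresholds over the observed scores; the function name and the toy scores are illustrative and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores
    and picking the point where false-accept and false-reject rates are closest."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accept rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false reject rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores: one impostor (0.6) outscores one genuine trial (0.4).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine, impostor)
print(eer)
```

With real score sets the two error curves rarely cross exactly at an observed score, so evaluation tools usually interpolate between neighbouring thresholds rather than take the nearest point as done here.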

Relevance: 20.00%

Abstract:

Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first formant center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting, together with the acoustic analysis, that it is possible to monitor the level of driver distraction directly from their speech.

Relevance: 20.00%

Abstract:

Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory only condition and two visual conditions: normal vision and simulated cataracts. The light scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentences under the auditory only condition. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias. Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated cataracts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.
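
To put the -16 dB figure in perspective: if SNR is defined on RMS levels as 20·log10(speech RMS / noise RMS), then at -16 dB the babble RMS is about 6.3 times the speech RMS (roughly 40 times the power). The sketch below scales a noise signal to hit a target SNR under that definition. The RMS-based definition, the function names and the toy sample values are assumptions for illustration; the study does not describe how its levels were set.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def scale_noise_to_snr(speech, noise, snr_db):
    """Return a scaled copy of noise such that
    20 * log10(rms(speech) / rms(scaled_noise)) equals snr_db."""
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [gain * n for n in noise]


# Toy samples standing in for a spoken sentence and background babble.
speech = [0.1, -0.1, 0.1, -0.1]
babble = [0.5, -0.2, 0.3, -0.4]
scaled = scale_noise_to_snr(speech, babble, snr_db=-16)
achieved = 20 * math.log10(rms(speech) / rms(scaled))
print(round(achieved, 6), round(rms(scaled) / rms(speech), 2))
```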

Relevance: 20.00%

Abstract:

Identifying an individual from surveillance video is a difficult, time-consuming and labour-intensive process. The proposed system aims to streamline this process by filtering out unwanted scenes and enhancing an individual's face through super-resolution. An automatic face recognition system is then used to identify the subject or present the human operator with likely matches from a database. A person tracker is used to speed up the subject detection and super-resolution process by tracking moving subjects and cropping a region of interest around the subject's face to reduce the number and size of the image frames to be super-resolved respectively. In this paper, experiments have been conducted to demonstrate how the optical flow super-resolution method used improves surveillance imagery for visual inspection as well as automatic face recognition on an Eigenface and Elastic Bunch Graph Matching system. The optical flow based method has also been benchmarked against the "hallucination" algorithm, interpolation methods and the original low-resolution images. Results show that both super-resolution algorithms improved recognition rates significantly. Although the hallucination method resulted in slightly higher recognition rates, the optical flow method produced fewer artifacts and more visually correct images suitable for human consumption.

Relevance: 20.00%

Abstract:

Automatic recognition of people is an active field of research with important forensic and security applications. In these applications, it is not always possible for the subject to be in close proximity to the system. Voice represents a human behavioural trait which can be used to recognise people in such situations. Automatic Speaker Verification (ASV) is the process of verifying a person's identity through the analysis of their speech and enables recognition of a subject at a distance over a telephone channel, wired or wireless. A significant amount of research has focussed on the application of Gaussian mixture model (GMM) techniques to speaker verification systems providing state-of-the-art performance. GMMs are a type of generative classifier trained to model the probability distribution of the features used to represent a speaker. Recently introduced to the field of ASV research is the support vector machine (SVM). An SVM is a discriminative classifier requiring examples from both positive and negative classes to train a speaker model. The SVM is based on margin maximisation whereby a hyperplane attempts to separate classes in a high dimensional space. SVMs applied to the task of speaker verification have shown high potential, particularly when used to complement current GMM-based techniques in hybrid systems. This work aims to improve the performance of ASV systems using novel and innovative SVM-based techniques. Research was divided into three main themes: session variability compensation for SVMs; unsupervised model adaptation; and impostor dataset selection. The first theme investigated the differences between the GMM and SVM domains for the modelling of session variability, an aspect crucial for robust speaker verification. Techniques developed to improve the robustness of GMM-based classification were shown to bring about similar benefits to discriminative SVM classification through their integration in the hybrid GMM mean supervector SVM classifier.
Further, the domains for the modelling of session variation were contrasted to find a number of common factors; however, the SVM domain consistently provided marginally better session variation compensation. Minimal complementary information was found between the techniques due to the similarities in how they achieved their objectives. The second theme saw the proposal of a novel model for the purpose of session variation compensation in ASV systems. Continuous progressive model adaptation attempts to improve speaker models by retraining them after exploiting all encountered test utterances during normal use of the system. The introduction of the weight-based factor analysis model provided significant performance improvements of over 60% in an unsupervised scenario. SVM-based classification was then integrated into the progressive system, providing further benefits in performance over the GMM counterpart. Analysis demonstrated that SVMs also hold several beneficial characteristics for the task of unsupervised model adaptation, prompting further research in the area. In pursuing the final theme, an innovative background dataset selection technique was developed. This technique selects the most appropriate subset of examples from a large and diverse set of candidate impostor observations for use as the SVM background by exploiting the SVM training process. This selection was performed on a per-observation basis so as to overcome the shortcoming of the traditional heuristic-based approach to dataset selection. Results demonstrate that the approach provides performance improvements over both the use of the complete candidate dataset and the best heuristically selected dataset whilst being only a fraction of the size. The refined dataset was also shown to generalise well to unseen corpora and to be highly applicable to the selection of impostor cohorts required in alternate techniques for speaker verification.
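
The description of GMMs as generative models of a speaker's feature distribution can be made concrete with a toy scoring example. The sketch assumes scalar features and two-component mixtures, and it scores a test utterance by the average log-likelihood ratio between a speaker model and a background model, which is a common way to use GMMs for verification but is an assumption here rather than something the abstract specifies. Real systems use multivariate cepstral features and far larger mixtures; all parameter values below are made up.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return math.log(total)


def llr_score(features, speaker, background):
    """Average log-likelihood ratio of the speaker model against the background model."""
    return sum(
        gmm_log_likelihood(x, *speaker) - gmm_log_likelihood(x, *background)
        for x in features
    ) / len(features)


# Each model is (weights, means, variances); the values are purely illustrative.
speaker = ([0.5, 0.5], [1.0, 2.0], [0.25, 0.25])
background = ([0.5, 0.5], [-1.0, 4.0], [1.0, 1.0])

score_target = llr_score([1.1, 1.9, 1.5], speaker, background)
score_other = llr_score([-1.2, 4.1, -0.8], speaker, background)
print(score_target > 0, score_other < 0)
```

A positive score means the features are better explained by the speaker model than by the background model; a verification system would compare the score with a tuned threshold rather than with zero.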

Relevance: 20.00%

Abstract:

Principal Topic: Project structures are often created by entrepreneurs and large corporate organizations to develop new products. Since new product development projects (NPDP) are more often situated within a larger organization, intrapreneurship or corporate entrepreneurship plays an important role in bringing these projects to fruition. Since NPDP often involves the development of a new product using immature technology, we describe development of an immature technology. The Joint Strike Fighter (JSF) F-35 aircraft is being developed by the U.S. Department of Defense and eight allied nations. In 2001 Lockheed Martin won a $19 billion contract to develop an affordable, stealthy and supersonic all-weather strike fighter designed to replace a wide range of aging fighter aircraft. In this research we define a complex project as one that demonstrates a number of sources of uncertainty to a degree, or level of severity, that makes it extremely difficult to predict project outcomes or to control or manage the project (Remington & Zolin, Forthcoming). Project complexity has been conceptualized by Remington and Pollock (2007) in terms of four major sources of complexity: temporal, directional, structural and technological complexity (See Figure 1). Temporal complexity exists when projects experience significant environmental change outside the direct influence or control of the project. The Global Economic Crisis of 2008-2009 is a good example of the type of environmental change that can make a project complex, as in the JSF project, where project managers attempt to respond to changes in interest rates, international currency exchange rates, commodity prices, etc. Directional complexity exists in a project where stakeholders' goals are unclear or undefined, where progress is hindered by unknown political agendas, or where stakeholders disagree or misunderstand project goals.
In the JSF project all the services and all nine countries have to agree to the specifications of the three variants of the aircraft: Conventional Take Off and Landing (CTOL), Short Take Off/Vertical Landing (STOVL) and the Carrier Variant (CV). Because the Navy requires a plane that can take off and land on an aircraft carrier, a special variant of the aircraft design was required, adding complexity to the project. Technical complexity occurs in a project using technology that is immature or where design characteristics are unknown or untried. Developing a plane that can take off on a very short runway and land vertically created many highly interdependent technological challenges: to correctly locate, direct and balance the lift fans, modulate the airflow and provide an equivalent amount of thrust from the downward vectored rear exhaust to lift the aircraft, and at the same time control engine temperatures. These technological challenges make costing and scheduling equally challenging. Structural complexity in a project comes from the sheer numbers of elements such as the number of people, teams or organizations involved, ambiguity regarding the elements, and the massive degree of interconnectedness between them. While Lockheed Martin is the prime contractor, it is assisted in major aspects of the JSF development by Northrop Grumman, BAE Systems, Pratt & Whitney and the GE/Rolls-Royce Fighter Engine Team, as well as innumerable subcontractors. In addition to identifying opportunities to achieve project goals, complex projects also need to identify and exploit opportunities to increase agility in response to changing stakeholder demands or to reduce project risks. Complexity Leadership Theory contends that in complex environments adaptive and enabling leadership are needed (Uhl-Bien, Marion and McKelvey, 2007).
Adaptive leadership facilitates creativity, learning and adaptability, while enabling leadership handles the conflicts that inevitably arise between adaptive leadership and traditional administrative leadership (Uhl-Bien and Marion, 2007). Hence, adaptive leadership involves the recognition of opportunities to adapt, while enabling leadership involves the exploitation of these opportunities. Our research questions revolve around the type or source of complexity and its relationship to opportunity recognition and exploitation. For example, is it only external environmental complexity that creates the need for entrepreneurial behaviours, such as opportunity recognition and opportunity exploitation? Do the internal dimensions of project complexity, such as technological and structural complexity, also create the need for opportunity recognition and opportunity exploitation? The Kropp, Zolin and Lindsay model (2009) describes a relationship between entrepreneurial orientation (EO), opportunity recognition (OR), and opportunity exploitation (OX) in complex projects, with environmental and organizational contextual variables as moderators. We extend their model by defining the effects of external complexity and internal complexity on OR and OX.

Methodology/Key Propositions: When the environment is complex, EO is more likely to result in OR because project members will be actively looking for solutions to problems created by environmental change. But in projects that are technologically or structurally complex, project leaders and members may try to make the minimum changes possible to reduce the risk of creating new problems due to delays or schedule changes. In projects with environmental or technological complexity, project leaders who encourage the innovativeness dimension of EO will increase OR in complex projects.
But in projects with technical or structural complexity, innovativeness will not necessarily result in the recognition and exploitation of opportunities, due to the over-riding importance of maintaining stability in the highly intricate and interconnected project structure. We propose that in projects with environmental complexity creating the need for change and innovation, project leaders who are willing to accept and manage risk are more likely to identify opportunities to increase project effectiveness and efficiency. In contrast, in projects with internal complexity a much higher willingness to accept risk will be necessary to trigger opportunity recognition. In structurally complex projects we predict it will be less likely to find a relationship between risk taking and OP. When the environment is complex, and a project has autonomy, they will be motivated to execute opportunities to improve the project's performance. In contrast, when the project has high internal complexity, they will be more cautious in execution. When a project experiences high competitive aggressiveness and its environment is complex, project leaders will be motivated to execute opportunities to improve the project's performance. In contrast, when the project has high internal complexity, they will be more cautious in execution. This paper reports the first stage of a three-year study into the behaviours of managers, leaders and team members of complex projects. We conduct a qualitative study involving a Group Discussion with experienced project leaders. The objective is to determine how leaders of large and potentially complex projects perceive that external and internal complexity will influence the effects of EO on OR.

Results and Implications: These results will help identify and distinguish the impact of external and internal complexity on entrepreneurial behaviours in NPDP.
Project managers will be better able to quickly decide how and when to respond to changes in the environment and internal project events.