48 results for speaker recognition systems
Abstract:
A scalable, large-vocabulary, speaker-independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation, which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1 reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms, which, coupled with a backend search of 5000 words, has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
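The Gaussian calculation named in this abstract can be illustrated with a minimal software sketch. It assumes diagonal-covariance Gaussians, which the abstract does not state, and the function names and toy values are hypothetical rather than taken from the paper.

```python
import math

def diag_gaussian_log_likelihood(x, mean, var):
    """Log-density of feature vector x under a diagonal-covariance Gaussian.

    Assumption: one variance per dimension (diagonal covariance).
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    if any(v <= 0.0 for v in var):
        raise ValueError("variances must be positive")
    # Normalisation term: sum of log(2*pi*var_d) over dimensions.
    log_norm = sum(math.log(2.0 * math.pi * v) for v in var)
    # Variance-scaled squared distance from the mean.
    scaled_dist = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, var))
    return -0.5 * (log_norm + scaled_dist)

def score_frame(x, models):
    """Score one feature frame against every (mean, var) model."""
    return [diag_gaussian_log_likelihood(x, mean, var) for mean, var in models]

# Hypothetical two-model example; a real system would score thousands of models.
models = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
scores = score_frame([0.1, -0.2], models)
```

For scale, the reported 3825 models in 9.03 ms corresponds to roughly 2.4 µs per model.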
Abstract:
This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the video stream, the audio stream, or both, using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
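A minimal sketch of frame-level stream weighting follows. It is one plausible reading of the name "maximum weighted stream posterior", not the algorithm as stated in the paper: for each frame, the audio and video log-likelihoods are combined with a candidate weight, normalised into class posteriors, and the weight giving the largest posterior is kept. The function names, the candidate grid, and the toy scores are all assumptions.

```python
import math

def weighted_posteriors(log_audio, log_video, lam):
    """Class posteriors from log-likelihoods combined as lam*audio + (1-lam)*video."""
    scores = [lam * a + (1.0 - lam) * v for a, v in zip(log_audio, log_video)]
    peak = max(scores)  # subtract the peak for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_stream_weight(log_audio, log_video,
                         candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (weight, posterior) for the candidate weight with the largest
    maximum class posterior on this frame (assumed selection rule)."""
    best_lam, best_post = None, -1.0
    for lam in candidates:
        post = max(weighted_posteriors(log_audio, log_video, lam))
        if post > best_post:
            best_lam, best_post = lam, post
    return best_lam, best_post

# Hypothetical frame: audio is nearly uninformative, video is confident.
lam, post = select_stream_weight([-1.0, -1.1], [-0.1, -5.0])
```

In this toy frame the selection favours the video stream (weight 0.0 on audio), and no noise measurement is needed to arrive at that choice.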
Abstract:
For the first time in this paper we present results showing the effect of speaker head pose angle on automatic lip-reading performance over a wide range of closely spaced angles. We analyse the effect head pose has upon the features themselves and show that by selecting coefficients with minimum variance w.r.t. pose angle, recognition performance can be improved when train-test pose angles differ. Experiments are conducted using the initial phase of a unique multi-view Audio-Visual database designed specifically for research and development of pose-invariant lip-reading systems. We firstly show that it is the higher order horizontal spatial frequency components that become most detrimental as the pose deviates. Secondly, we assess the performance of different feature selection masks across a range of pose angles, including a new mask based on Minimum Cross-Pose Variance coefficients. We report a relative improvement of 50% in Word Error Rate when using our selection mask over a common energy-based selection during profile view lip-reading.
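The idea of selecting coefficients with minimum variance with respect to pose angle can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: each pose is represented by a single mean feature vector, variance is the population variance across poses, and the function name and toy values are hypothetical.

```python
from statistics import pvariance

def min_cross_pose_variance_mask(features_by_pose, keep):
    """Return the sorted indices of the `keep` coefficients that vary least
    across poses.

    features_by_pose: one feature vector per pose angle (assumed to be a
    per-pose mean), all of the same length.
    """
    n_coeffs = len(features_by_pose[0])
    if any(len(vec) != n_coeffs for vec in features_by_pose):
        raise ValueError("all feature vectors must have the same length")
    if not 0 < keep <= n_coeffs:
        raise ValueError("keep must be between 1 and the number of coefficients")
    # Variance of each coefficient across the pose angles.
    variances = [pvariance([vec[i] for vec in features_by_pose])
                 for i in range(n_coeffs)]
    ranked = sorted(range(n_coeffs), key=lambda i: variances[i])
    return sorted(ranked[:keep])

# Hypothetical data: three poses, three coefficients; coefficient 1 is pose-sensitive.
poses = [[1.0, 5.0, 2.0], [1.1, 9.0, 2.0], [0.9, 1.0, 2.0]]
mask = min_cross_pose_variance_mask(poses, keep=2)
```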
Abstract:
This paper considers the separation and recognition of overlapped speech sentences assuming single-channel observation. A system based on a combination of several different techniques is proposed. The system uses a missing-feature approach for improving crosstalk/noise robustness, a Wiener filter for speech enhancement, hidden Markov models for speech reconstruction, and speaker-dependent/-independent modeling for speaker and speech recognition. We develop the system on the Speech Separation Challenge database, involving a task of separating and recognizing two mixed sentences without assuming advance knowledge about the identity of the speakers or about the signal-to-noise ratio. The paper is an extended version of a previous conference paper submitted for the challenge.
Abstract:
Ear recognition, as a biometric, has several advantages. In particular, ears can be measured remotely and are also relatively static in size and structure for each individual. Unfortunately, at present, good recognition rates require controlled conditions. For commercial use, these systems need to be much more robust. In particular, ears have to be recognized from different angles (poses), under different lighting conditions, and with different cameras. It must also be possible to distinguish ears from background clutter and identify them when partly occluded by hair, hats, or other objects. The purpose of this paper is to suggest how progress toward such robustness might be achieved through a technique that improves ear registration. The approach focuses on 2-D images, treating the ear as a planar surface that is registered to a gallery using a homography transform calculated from scale-invariant feature-transform feature matches. The feature matches reduce the gallery size and enable a precise ranking using a simple 2-D distance algorithm. Analysis on a range of data sets demonstrates the technique to be robust to background clutter, viewing angles up to +/- 13 degrees, and up to 18% occlusion. In addition, recognition remains accurate with masked ear images as small as 20 x 35 pixels.
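The registration-and-ranking step can be sketched as follows, assuming a 3x3 homography has already been estimated for each gallery candidate (for example from feature matches; the estimation itself is not shown). The mean point-to-point distance used for ranking is an assumed stand-in for the paper's "simple 2-D distance algorithm", and all names and values are hypothetical.

```python
import math

def apply_homography(h, point):
    """Map a 2-D point through a 3x3 homography given as nested lists."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def mean_registration_distance(h, probe_points, gallery_points):
    """Mean Euclidean distance between warped probe points and gallery points."""
    dists = [math.dist(apply_homography(h, p), g)
             for p, g in zip(probe_points, gallery_points)]
    return sum(dists) / len(dists)

def rank_gallery(candidates):
    """candidates: name -> (homography, probe_points, gallery_points).
    Returns names ordered from best (smallest distance) to worst."""
    scored = [(mean_registration_distance(*entry), name)
              for name, entry in candidates.items()]
    return [name for _, name in sorted(scored)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
ranking = rank_gallery({
    "subject_a": (identity, [(0.0, 0.0)], [(0.0, 1.0)]),  # residual distance 1.0
    "subject_b": (shift, [(1.0, 1.0)], [(3.0, 4.0)]),     # residual distance 0.0
})
```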
Abstract:
Significant recent progress has shown ear recognition to be a viable biometric. Good recognition rates have been demonstrated under controlled conditions, using manual registration or with specialised equipment. This paper describes a new technique which improves the robustness of ear registration and recognition, addressing issues of pose variation, background clutter and occlusion. By treating the ear as a planar surface and creating a homography transform using SIFT feature matches, ears can be registered accurately. The feature matches reduce the gallery size and enable a precise ranking using a simple 2D distance algorithm. When applied to the XM2VTS database it gives results comparable to PCA with manual registration. Further analysis on more challenging datasets demonstrates the technique to be robust to background clutter, viewing angles up to +/- 13 degrees and with over 20% occlusion.
Abstract:
This paper provides an integrated overview of the factors which control gelation in a family of dendritic gelators based on lysine building blocks. In particular, we establish that higher generation systems are more effective gelators, amide linkages in the dendron are better than carbamates, and long alkyl chain surface groups and a carboxylic acid at the focal point enhance gelation. The gels are best formed in relatively low polarity solvents with no hydrogen bond donor ability and limited hydrogen bond acceptor capacity. The dendrons with acid groups at the focal point can form two-component gels with diaminododecane, and in this case, it is the lower generation dendrons which can avoid steric hindrance and form more effective gels. The stereochemistry of lysine is crucial in self-assembly, with opposite enantiomers disrupting each other's molecular recognition pathways. For the two-component system, stoichiometry is key: if too much diamine is present, dendron-stabilised microcrystals of the diamine begin to form. Interestingly, gelation still occurs in this case, and the systems with amides/alkyl chains are more effective gels, as a consequence of enhanced dendron-dendron intermolecular interactions allowing the microcrystals to form an interconnected network.
Abstract:
Background: Rapid Response Systems (RRS) consist of four interrelated and interdependent components: an event detection and trigger mechanism, a response strategy, a governance structure and a process improvement system. These multiple components of the RRS pose problems in evaluation, as the intervention is complex and cannot be evaluated using a traditional systematic review. Complex interventions in healthcare aimed at changing service delivery and related behaviour of health professionals require a different approach to summarising the evidence. Realist synthesis is such an approach to reviewing research evidence on complex interventions to provide an explanatory analysis of how and why an intervention works or doesn’t work in practice. The core principle is to make explicit the underlying assumptions about how an intervention is supposed to work (i.e., programme theory) and then use this theory to guide evaluation. Methods: A realist synthesis process was used to explain those factors that enable or constrain the success of RRS programmes. Results: The findings from the review include the articulation of the RRS programme theories, evaluation of whether these theories are supported or refuted by the research evidence and an evaluation of evidence to explain the underlying reasons why RRS works or doesn’t work in practice. Rival conjectured RRS programme theories were identified to explain the constraining factors regarding implementation of RRS in practice. These programme theories are presented using a logic model to highlight all the components which impact or influence the delivery of RRS programmes in the practice setting. The evidence from the realist synthesis provided the foundation for the development of hypotheses to test and refine the theories in the subsequent stages of the Realist Evaluation PhD study [1].
This information will be useful in providing evidence and direction for strategic and service planning of acute care to improve patient safety in hospital. References: 1. McGaughey J, Blackwood B, O’Halloran P, Trinder TJ & Porter S. (2010) Realistic Evaluation of Early Warning Systems and the Acute Life-threatening Events – Recognition and Treatment training course for early recognition and management of deteriorating ward-based patients: research protocol. Journal of Advanced Nursing 66 (4), 923-932.
Abstract:
Statement of purpose: The purpose of this concurrent session is to present the main findings and recommendations from a five-year study evaluating the implementation of Early Warning Systems (EWS) and the Acute Life-threatening Events: Recognition and Treatment (ALERT) course in Northern Ireland. The presentation will provide delegates with an understanding of those factors that enable and constrain successful implementation of EWS and ALERT in practice in order to provide an impetus for change. Methods: The research design was a multiple case study approach of four wards in two hospitals in Northern Ireland. It followed the principles of realist evaluation research, which allowed empirical data to be gathered to test and refine RRS programme theory [1]. The stages included identifying the programme theories underpinning EWS and ALERT, generating hypotheses, gathering empirical evidence and refining the programme theories. This approach used a variety of mixed methods including individual and focus group interviews, observation and documentary analysis of EWS compliance data and ALERT training records. A within and across case comparison facilitated the development of mid-range theories from the research evidence. Results: The official RRS theories developed from the realist synthesis were critically evaluated and compared with the study findings to develop a mid-range theory to explain what works, for whom, in what circumstances. The findings of what works suggest that clinical experience, established working relationships, flexible implementation of protocols, ongoing experiential learning, empowerment and pre-emptive management are key to the success of EWS and ALERT implementation. Each concept is presented as ‘context, mechanism and outcome configurations’ to provide an understanding of how the context impacts on individual reasoning or behaviour to produce certain outcomes.
Conclusion: These findings highlight the combination of factors that can improve the implementation and sustainability of EWS and ALERT, and in light of this evidence several recommendations are made to provide policymakers with guidance and direction for future policy development. References: 1. Pawson R and Tilley N. (1997) Realistic Evaluation. Sage Publications, London. Type of submission: Concurrent session. Source of funding: Sandra Ryan Fellowship funded by the School of Nursing & Midwifery, Queen’s University of Belfast.
Abstract:
Symposium Chair: Dr Jennifer McGaughey
Title: Early Warning Systems: problems, pragmatics and potential
Early Warning Systems (EWS) provide a mechanism for staff to recognise, refer and manage deteriorating patients on general hospital wards. Implementation of EWS in practice has required considerable change in the delivery of critical care across hospitals. Drawing on their experience of these changes, the authors will demonstrate the problems and potential of using EWS to improve patient outcomes.
The first paper (Dr Jennifer McGaughey: Early Warning Systems: what works?) reviews the research evidence regarding the factors that support or constrain the implementation of Early Warning Systems (EWS) in practice. These findings explain those processes which impact on the successful achievement of patient outcomes. In order to improve detection and standardise practice, National EWS have been implemented in the United Kingdom. The second paper (Catherine Plowright: The implementation of the National EWS in a District General Hospital) focuses on the process of implementing and auditing a National EWS. This process improvement is essential to contribute to future collaborative research and collection of robust datasets to improve patient safety as recommended by the Royal College of Physicians (RCP 2012). Successfully implementing NEWS in practice requires strategic planning and staff education. The practical issues of training staff are discussed in the third paper. This paper (Collette Laws-Chapman: Simulation as a modality to embed the use of Early Warning Systems) focuses on using simulation and structured debrief to enhance learning in the early recognition and management of deteriorating patients. This session emphasises the importance of cognitive and social skills developed alongside practical skills in the simulated setting.
Abstract:
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently require the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach, activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology, it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
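The classification step described in this abstract, assigning a new instance to its closest cluster in each collection, can be sketched as below. Several details are assumptions not given in the abstract: clusters are represented by centroids, each centroid carries one activity label, distance is Euclidean, and the per-collection decisions are combined by majority vote. All names and values are hypothetical.

```python
import math
from collections import Counter

def closest_cluster_label(instance, feature_idx, clusters):
    """Project the instance onto a feature subset and return the label of
    the nearest centroid. clusters is a list of (centroid, label) pairs."""
    projected = [instance[i] for i in feature_idx]
    _, label = min(clusters, key=lambda c: math.dist(projected, c[0]))
    return label

def classify(instance, collections):
    """collections: list of (feature_idx, clusters), one per feature subset.
    Combines the per-collection labels by majority vote (assumed rule)."""
    votes = Counter(closest_cluster_label(instance, idx, clusters)
                    for idx, clusters in collections)
    return votes.most_common(1)[0][0]

# Hypothetical ensemble: three collections over pairs of three sensor features.
two_clusters = [([0.0, 0.0], "sleep"), ([1.0, 1.0], "cook")]
collections = [((0, 1), two_clusters), ((1, 2), two_clusters), ((0, 2), two_clusters)]
label = classify([1.0, 1.0, 0.1], collections)
```

Because each collection sees only a subset of the features, a single unreliable sensor affects only some of the votes.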
Abstract:
Studies have been carried out to recognize individuals from a frontal view using their gait patterns. In previous work, gait sequences were captured using either single or stereo RGB camera systems or the Kinect 1.0 camera system. In this research, we used a new frontal view gait recognition method using a laser based Time of Flight (ToF) camera. In addition to the new gait data set, other contributions include enhancement of the silhouette segmentation, gait cycle estimation and gait image representations. We propose four new gait image representations namely Gait Depth Energy Image (GDE), Partial GDE (PGDE), Discrete Cosine Transform GDE (DGDE) and Partial DGDE (PDGDE). The experimental results show that all the proposed gait image representations produce better accuracy than the previous methods. In addition, we have also developed Fusion GDEs (FGDEs) which achieve better overall accuracy and outperform the previous methods.