852 resultados para Automatic focus
Resumo:
Non-driving related cognitive load and variations of emotional state may impact a driver’s capability to control a vehicle and introduces driving errors. Availability of reliable cognitive load and emotion detection in drivers would benefit the design of active safety systems and other intelligent in-vehicle interfaces. In this study, speech produced by 68 subjects while driving in urban areas is analyzed. A particular focus is on speech production differences in two secondary cognitive tasks, interactions with a co-driver and calls to automated spoken dialog systems (SDS), and two emotional states during the SDS interactions - neutral/negative. A number of speech parameters are found to vary across the cognitive/emotion classes. Suitability of selected cepstral- and production-based features for automatic cognitive task/emotion classification is investigated. A fusion of GMM/SVM classifiers yields an accuracy of 94.3% in cognitive task and 81.3% in emotion classification.
Resumo:
Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recognition systems, for voice-based control of vehicle functions such as the GPS based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (eg: lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in AVASR field has been ongoing for the past twenty-five years with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is due to most research to date neglecting to address variabilities in the visual domain such as illumination and viewpoint in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is publicly available in-car database and we show that the use of visual speech conjunction with the audio modality is a better approach to improve the robustness and effectiveness of voice-only recognition systems in car cabin environments.
Resumo:
This paper presents an automated system for 3D assembly of tissue engineering (TE) scaffolds made from biocompatible microscopic building blocks with relatively large fabrication error. It focuses on the pin-into-hole force control developed for this demanding microassembly task. A beam-like gripper with integrated force sensing at a 3 mN resolution with a 500 mN measuring range is designed, and is used to implement an admittance force-controlled insertion using commercial precision stages. Visual-based alignment followed by an insertion is complemented by a haptic exploration strategy using force and position information. The system demonstrates fully automated construction of TE scaffolds with 50 microparts whose dimension error is larger than 5%.
Resumo:
This paper investigates the automatic atti- tude and depth control of a torpedo shaped submarine. Both experimental results and dynamic simulations are used to tune feed- back control loops in order to obtain stable control of yaw, pitch and roll of the craft.
Resumo:
While close talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied for ad-hoc microphone arrays. A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach which includes clustering the microphones and selecting the clusters based on their proximity to the speaker is proposed. Novel experiments are conducted to demonstrate that the proposed method to automatically select clusters of microphones (ie, a subarray), closely located both to each other and to the desired speech source, may in fact provide a more robust speech enhancement and recognition than the full array could.
Resumo:
Speeding remains a significant contributing factor to road trauma internationally, despite increasingly sophisticated speed management strategies being adopted around the world. Increases in travel speed are associated with increases in crash risk and crash severity. As speed choice is a voluntary behaviour, driver perceptions are important to our understanding of speeding and, importantly, to designing effective behavioural countermeasures. The four studies conducted in this program of research represent a comprehensive approach to examining psychosocial influences on driving speeds in two countries that are at very different levels of road safety development: Australia and China. Akers’ social learning theory (SLT) was selected as the theoretical framework underpinning this research and guided the development of key research hypotheses. This theory was chosen because of its ability to encompass psychological, sociological, and criminological perspectives in understanding behaviour, each of which has relevance to speeding. A mixed-method design was used to explore the personal, social, and legal influences on speeding among car drivers in Queensland (Australia) and Beijing (China). Study 1 was a qualitative exploration, via focus group interviews, of speeding among 67 car drivers recruited from south east Queensland. Participants were assigned to groups based on their age and gender, and additionally, according to whether they self-identified as speeding excessively or rarely. This study aimed to elicit information about how drivers conceptualise speeding as well as the social and legal influences on driving speeds. The findings revealed a wide variety of reasons and circumstances that appear to be used as personal justifications for exceeding speed limits. Driver perceptions of speeding as personally and socially acceptable, as well as safe and necessary were common. Perceptions of an absence of danger associated with faster driving speeds were evident, particularly with respect to driving alone. An important distinction between the speed-based groups related to the attention given to the driving task. Rare speeders expressed strong beliefs about the need to be mindful of safety (self and others) while excessive speeders referred to the driving task as automatic, an absent-minded endeavour, and to speeding as a necessity in order to remain alert and reduce boredom. For many drivers in this study, compliance with speed limits was expressed as discretionary rather than mandatory. Social factors, such as peer and parental influence were widely discussed in Study 1 and perceptions of widespread community acceptance of speeding were noted. In some instances, the perception that ‘everybody speeds’ appeared to act as one rationale for the need to raise speed limits. Self-presentation, or wanting to project a positive image of self was noted, particularly with respect to concealing speeding infringements from others to protect one’s image as a trustworthy and safe driver. The influence of legal factors was also evident. Legal sanctions do not appear to influence all drivers to the same extent. For instance, fear of apprehension appeared to play a role in reducing speeding for many, although previous experiences of detection and legal sanctions seemed to have had limited influence on reducing speeding among some drivers. Disregard for sanctions (e.g., driving while suspended), fraudulent demerit point use, and other strategies to avoid detection and punishment were widely and openly discussed. In Study 2, 833 drivers were recruited from roadside service stations in metropolitan and regional locations in Queensland. A quantitative research strategy assessed the relative contribution of personal, social, and legal factors to recent and future self-reported speeding (i.e., frequency of speeding and intentions to speed in the future). Multivariate analyses examining a range of factors drawn from SLT revealed that factors including self-identity (i.e., identifying as someone who speeds), favourable definitions (attitudes) towards speeding, personal experiences of avoiding detection and punishment for speeding, and perceptions of family and friends as accepting of speeding were all significantly associated with greater self-reported speeding. Study 3 was an exploratory, qualitative investigation of psychosocial factors associated with speeding among 35 Chinese drivers who were recruited from the membership of a motoring organisation and a university in Beijing. Six focus groups were conducted to explore similar issues to those examined in Study 1. The findings of Study 3 revealed many similarities with respect to the themes that arose in Australia. For example, there were similarities regarding personal justifications for speeding, such as the perception that posted limits are unreasonably low, the belief that individual drivers are able to determine safe travel speeds according to personal comfort with driving fast, and the belief that drivers possess adequate skills to control a vehicle at high speed. Strategies to avoid detection and punishment were also noted, though they appeared more widespread in China and also appeared, in some cases, to involve the use of a third party, a topic that was not reported by Australian drivers. Additionally, higher perceived enforcement tolerance thresholds were discussed by Chinese participants. Overall, the findings indicated perceptions of a high degree of community acceptance of speeding and a perceived lack of risk associated with speeds that were well above posted speed limits. Study 4 extended the exploratory research phase in China with a quantitative investigation involving 299 car drivers recruited from car washes in Beijing. Results revealed a relatively inexperienced sample with less than 5 years driving experience, on average. One third of participants perceived that the certainty of penalties when apprehended was low and a similar proportion of Chinese participants reported having previously avoided legal penalties when apprehended for speeding. Approximately half of the sample reported that legal penalties for speeding were ‘minimally to not at all’ severe. Multivariate analyses revealed that past experiences of avoiding detection and punishment for speeding, as well as favourable attitudes towards speeding, and perceptions of strong community acceptance of speeding were most strongly associated with greater self-reported speeding in the Chinese sample. Overall, the results of this research make several important theoretical contributions to the road safety literature. Akers’ social learning theory was found to be robust across cultural contexts with respect to speeding; similar amounts of variance were explained in self-reported speeding in the quantitative studies conducted in Australia and China. Historically, SLT was devised as a theory of deviance and posits that deviance and conformity are learned in the same way, with the balance of influence stemming from the ways in which behaviour is rewarded and punished (Akers, 1998). This perspective suggests that those who speed and those who do not are influenced by the same mechanisms. The inclusion of drivers from both ends of the ‘speeding spectrum’ in Study 1 provided an opportunity to examine the wider utility of SLT across the full range of the behaviour. One may question the use of a theory of deviance to investigate speeding, a behaviour that could, arguably, be described as socially acceptable and prevalent. However, SLT seemed particularly relevant to investigating speeding because of its inclusion of association, imitation, and reinforcement variables which reflect the breadth of factors already found to be potentially influential on driving speeds. In addition, driving is a learned behaviour requiring observation, guidance, and practice. Thus, the reinforcement and imitation concepts are particularly relevant to this behaviour. Finally, current speed management practices are largely enforcement-based and rely on the principles of behavioural reinforcement captured within the reinforcement component of SLT. Thus, the application of SLT to a behaviour such as speeding offers promise in advancing our understanding of the factors that influence speeding, as well as extending our knowledge of the application of SLT. Moreover, SLT could act as a valuable theoretical framework with which to examine other illegal driving behaviours that may not necessarily be seen as deviant by the community (e.g., mobile phone use while driving). This research also made unique contributions to advancing our understanding of the key components and the overall structure of Akers’ social learning theory. The broader SLT literature is lacking in terms of a thorough structural understanding of the component parts of the theory. For instance, debate exists regarding the relevance of, and necessity for including broader social influences in the model as captured by differential association. In the current research, two alternative SLT models were specified and tested in order to better understand the nature and extent of the influence of differential association on behaviour. Importantly, the results indicated that differential association was able to make a unique contribution to explaining self-reported speeding, thereby negating the call to exclude it from the model. The results also demonstrated that imitation was a discrete theoretical concept that should also be retained in the model. The results suggest a need to further explore and specify mechanisms of social influence in the SLT model. In addition, a novel approach was used to operationalise SLT variables by including concepts drawn from contemporary social psychological and deterrence-based research to enhance and extend the way that SLT variables have traditionally been examined. Differential reinforcement was conceptualised according to behavioural reinforcement principles (i.e., positive and negative reinforcement and punishment) and incorporated concepts of affective beliefs, anticipated regret, and deterrence-related concepts. Although implicit in descriptions of SLT, little research has, to date, made use of the broad range of reinforcement principles to understand the factors that encourage or inhibit behaviour. This approach has particular significance to road user behaviours in general because of the deterrence-based nature of many road safety countermeasures. The concept of self-identity was also included in the model and was found to be consistent with the definitions component of SLT. A final theoretical contribution was the specification and testing of a full measurement model prior to model testing using structural equation modelling. This process is recommended in order to reduce measurement error by providing an examination of the psychometric properties of the data prior to full model testing. Despite calls for such work for a number of decades, the current work appears to be the only example of a full measurement model of SLT. There were also a number of important practical implications that emerged from this program of research. Firstly, perceptions regarding speed enforcement tolerance thresholds were highlighted as a salient influence on driving speeds in both countries. The issue of enforcement tolerance levels generated considerable discussion among drivers in both countries, with Australian drivers reporting lower perceived tolerance levels than Chinese drivers. It was clear that many drivers used the concept of an enforcement tolerance in determining their driving speed, primarily with the desire to drive faster than the posted speed limit, yet remaining within a speed range that would preclude apprehension by police. The quantitative results from Studies 2 and 4 added support to these qualitative findings. Together, the findings supported previous research and suggested that a travel speed may not be seen as illegal until that speed reaches a level over the prescribed enforcement tolerance threshold. In other words, the enforcement tolerance appears to act as a ‘de facto’ speed limit, replacing the posted limit in the minds of some drivers. The findings from the two studies conducted in China (Studies 2 and 4) further highlighted the link between perceived enforcement tolerances and a ‘de facto’ speed limit. Drivers openly discussed driving at speeds that were well above posted speed limits and some participants noted their preference for driving at speeds close to ‘50% above’ the posted limit. This preference appeared to be shaped by the perception that the same penalty would be imposed if apprehended, irrespective of what speed they travelling (at least up to 50% above the limit). Further research is required to determine whether the perceptions of Chinese drivers are mainly influenced by the Law of the People’s Republic of China or by operational practices. Together, the findings from both studies in China indicate that there may be scope to refine enforcement tolerance levels, as has happened in other jurisdictions internationally over time, in order to reduce speeding. Any attempts to do so would likely be assisted by the provision of information about the legitimacy and purpose of speed limits as well as risk factors associated with speeding because these issues were raised by Chinese participants in the qualitative research phase. Another important practical implication of this research for speed management in China is the way in which penalties are determined. Chinese drivers described perceptions of unfairness and a lack of transparency in the enforcement system because they were unsure of the penalty that they would receive if apprehended. Steps to enhance the perceived certainty and consistency of the system to promote a more equitable approach to detection and punishment would appear to be welcomed by the general driving public and would be more consistent with the intended theoretical (deterrence) basis that underpins the current speed enforcement approach. The use of mandatory, fixed penalties may assist in this regard. In many countries, speeding attracts penalties that are dependent on the severity of the offence. In China, there may be safety benefits gained from the introduction of a similar graduated scale of speeding penalties and fixed penalties might also help to address the issue of uncertainty about penalties and related perceptions of unfairness. Such advancements would be in keeping with the principles of best practice for speed management as identified by the World Health Organisation. Another practical implication relating to legal penalties, and applicable to both cultural contexts, relates to the issues of detection and punishment avoidance. These two concepts appeared to strongly influence speeding in the current samples. In Australia, detection avoidance strategies reported by participants generally involved activities that are not illegal (e.g., site learning and remaining watchful for police vehicles). The results from China were similar, although a greater range of strategies were reported. The most common strategy reported in both countries for avoiding detection when speeding was site learning, or familiarisation with speed camera locations. However, a range of illegal practices were also described by Chinese drivers (e.g., tampering with or removing vehicle registration plates so as to render the vehicle unidentifiable on camera and use of in-vehicle radar detectors). With regard to avoiding punishment when apprehended, a range of strategies were reported by drivers from both countries, although a greater range of strategies were reported by Chinese drivers. As the results of the current research indicated that detection avoidance was strongly associated with greater self-reported speeding in both samples, efforts to reduce avoidance opportunities are strongly recommended. The practice of randomly scheduling speed camera locations, as is current practice in Queensland, offers one way to minimise site learning. The findings of this research indicated that this practice should continue. However, they also indicated that additional strategies are needed to reduce opportunities to evade detection. The use of point-to-point speed detection (also known as sectio
Resumo:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these productcode vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of lowrate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
Resumo:
Speaker verification is the process of verifying the identity of a person by analysing their speech. There are several important applications for automatic speaker verification (ASV) technology including suspect identification, tracking terrorists and detecting a person’s presence at a remote location in the surveillance domain, as well as person authentication for phone banking and credit card transactions in the private sector. Telephones and telephony networks provide a natural medium for these applications. The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: Modelling mismatch and modelling uncertainty. To directly address the performance degradation incurred through mismatched conditions it was proposed to directly model this mismatch. Feature mapping was evaluated for combating handset mismatch and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases. Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions. Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order of magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.
Resumo:
Automatic spoken Language Identi¯cation (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identi¯ca- tion systems provide. A prominent application arises in call centers dealing with speakers speaking di®erent languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable fea- tures for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Pho- netic speech information is extracted using existing speech recognition technol- ogy. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by di®erent speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
Resumo:
Uncooperative iris identification systems at a distance and on the move often suffer from poor resolution and poor focus of the captured iris images. The lack of pixel resolution and well-focused images significantly degrades the iris recognition performance. This paper proposes a new approach to incorporate the focus score into a reconstruction-based super-resolution process to generate a high resolution iris image from a low resolution and focus inconsistent video sequence of an eye. A reconstruction-based technique, which can incorporate middle and high frequency components from multiple low resolution frames into one desired super-resolved frame without introducing false high frequency components, is used. A new focus assessment approach is proposed for uncooperative iris at a distance and on the move to improve performance for variations in lighting, size and occlusion. A novel fusion scheme is then proposed to incorporate the proposed focus score into the super-resolution process. The experiments conducted on the The Multiple Biometric Grand Challenge portal database shows that our proposed approach achieves an EER of 2.1%, outperforming the existing state-of-the-art averaging signal-level fusion approach by 19.2% and the robust mean super-resolution approach by 8.7%.
Resumo:
It is possible to estimate the depth of focus (DOF) of the eye directly from wavefront measurements using various retinal image quality metrics (IQMs). In such methods, DOF is defined as the range of defocus error that degrades the retinal image quality calculated from IQMs to a certain level of the maximum value. Although different retinal image quality metrics are used, currently there have been two arbitrary threshold levels adopted, 50% and 80%. There has been limited study of the relationship between these threshold levels and the actual measured DOF. We measured the subjective DOF in a group of 17 normal subjects, and used through-focus augmented visual Strehl ratio based on optical transfer function (VSOTF) derived from their wavefront aberrations as the IQM. For each subject, a VSOTF threshold level was derived that would match the subjectively measured DOF. Significant correlation was found between the subject’s estimated threshold level and the HOA RMS (Pearson’s r=0.88, p<0.001). The linear correlation can be used to estimate the threshold level for each individual subject, subsequently leading to a method for estimating individual’s DOF from a single measurement of their wavefront aberrations.
Resumo:
Type unions, pointer variables and function pointers are a long standing source of subtle security bugs in C program code. Their use can lead to hard-to-diagnose crashes or exploitable vulnerabilities that allow an attacker to attain privileged access over classified data. This paper describes an automatable framework for detecting such weaknesses in C programs statically, where possible, and for generating assertions that will detect them dynamically, in other cases. Exclusively based on analysis of the source code, it identifies required assertions using a type inference system supported by a custom made symbol table. In our preliminary findings, our type system was able to infer the correct type of unions in different scopes, without manual code annotations or rewriting. Whenever an evaluation is not possible or is difficult to resolve, appropriate runtime assertions are formed and inserted into the source code. The approach is demonstrated via a prototype C analysis tool.