992 results for Recognition (Psychology)
Abstract:
Visual noise insensitivity is important to audio-visual speech recognition (AVSR). Visual noise can take a number of forms, such as varying frame rate, occlusion, lighting or speaker variability. The use of a high-dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Preliminary results are presented demonstrating performance above the catastrophic fusion boundary for our confidence measure irrespective of the type of visual noise presented to it. Our experiments were restricted to small-vocabulary applications.
Abstract:
The performance of automatic speech recognition systems deteriorates in the presence of noise. One known solution is to incorporate video information with an existing acoustic speech recognition system. We investigate the performance of the individual acoustic and visual sub-systems and then examine different ways in which the integration of the two systems may be performed. The system is to be implemented in real time on a Texas Instruments' TMS320C80 DSP.
Abstract:
A system to segment and recognize Australian 4-digit postcodes from address labels on parcels is described. Images of address labels are preprocessed and adaptively thresholded to reduce noise. Projections are used to segment the line and then the characters comprising the postcode. Individual digits are recognized using bispectral features extracted from their parallel beam projections. These features are insensitive to translation, scaling and rotation, and robust to noise. Results on scanned images are presented. The system is currently being improved and implemented to work on-line.
Abstract:
Characteristics of surveillance video generally include low resolution and poor quality due to environmental, storage and processing limitations. It is extremely difficult for computers and human operators to identify individuals from these videos. To overcome this problem, super-resolution can be used in conjunction with an automated face recognition system to enhance the spatial resolution of video frames containing the subject and narrow down the number of manual verifications performed by the human operator by presenting a list of most likely candidates from the database. As the super-resolution reconstruction process is ill-posed, visual artifacts are often generated as a result. These artifacts can be visually distracting to humans and/or affect machine recognition algorithms. While it is intuitive that higher resolution should lead to improved recognition accuracy, the effects of super-resolution and such artifacts on face recognition performance have not been systematically studied. This paper aims to address this gap while illustrating that super-resolution allows more accurate identification of individuals from low-resolution surveillance footage. The proposed optical flow-based super-resolution method is benchmarked against Baker et al.’s hallucination and Schultz et al.’s super-resolution techniques on images from the Terrascope and XM2VTS databases. Ground truth and interpolated images were also tested to provide a baseline for comparison. Results show that a suitable super-resolution system can improve the discriminability of surveillance video and enhance face recognition accuracy. The experiments also show that Schultz et al.’s method fails when dealing with surveillance footage due to its assumption of rigid objects in the scene. The hallucination and optical flow-based methods performed comparably, with the optical flow-based method producing fewer of the visually distracting artifacts that interfere with human recognition.
Abstract:
This paper argues that teachers’ recognition of children’s cultural practices is an important positive step in helping socio-economically disadvantaged children engage with school literacies. Based on twenty-one longitudinal case studies of children’s literacy development over a three-year period, the authors demonstrate that when children’s knowledges and practices assembled in home and community spheres are treated as valuable material for school learning, children are more likely to invest in the work of acquiring school literacies. However, they also show that whilst some children benefit greatly from being allowed to draw on their knowledge of popular culture, sports and the outdoors, other children’s interests may be ignored or excluded. Some differences in teachers’ valuing of home and community cultures appeared to relate to gender dimensions.
Abstract:
The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction or multi-channel beamforming, is not known. This is an important question to answer, especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
Abstract:
In automatic facial expression recognition, an increasing number of techniques have been proposed in the literature that exploit the temporal nature of facial expressions. As all facial expressions are known to evolve over time, it is crucially important for a classifier to be capable of modelling their dynamics. We establish that the method of sparse representation (SR) classifiers proves to be a suitable candidate for this purpose, and subsequently propose a framework for expression dynamics to be efficiently incorporated into its current formulation. We additionally show that for the SR method to be applied effectively, a certain threshold on image dimensionality must be enforced (unlike in facial recognition problems). Thirdly, we determine that recognition rates may be significantly influenced by the size of the projection matrix \Phi. To demonstrate these, a battery of experiments has been conducted on the CK+ dataset for the recognition of the seven prototypic expressions (anger, contempt, disgust, fear, happiness, sadness and surprise), and comparisons have been made between the proposed temporal-SR framework and both the static-SR framework and a state-of-the-art support vector machine.
Abstract:
Few studies have investigated iatrogenic outcomes from the viewpoint of patient experience. To address this anomaly, the broad aim of this research is to explore the lived experience of patient harm. Patient harm is defined as major harm to the patient, either psychosocial or physical in nature, resulting from any aspect of health care. Utilising the method of Consensual Qualitative Research (CQR), in-depth interviews are conducted with twenty-four volunteer research participants who self-report having been severely harmed by an invasive medical procedure. A standardised measure of emotional distress, the Impact of Event Scale (IES), is additionally employed for purposes of triangulation. Thematic analysis of transcript data indicates numerous findings including: (i) difficulties regarding patients' prior understanding of risks involved with their medical procedure; (ii) the problematic response of the health system post-procedure; (iii) multiple adverse effects upon life functioning; (iv) limited recourse options for patients; and (v) the approach desired in terms of how patient harm should be systemically handled. In addition, IES results indicate a clinically significant level of distress in the sample as a whole. To discuss findings, a cross-disciplinary approach is adopted that draws upon sociology, medicine, medical anthropology, psychology, philosophy, history, ethics, law, and political theory. Furthermore, an overall explanatory framework is proposed in terms of the master themes of power and trauma. In terms of the theme of power, a postmodernist analysis explores the politics of patient harm, particularly the dynamics surrounding the politics of knowledge (e.g., notions of subjective versus objective knowledge, informed consent, and open disclosure). This analysis suggests that patient care is not the prime function of the health system, which appears more focussed upon serving the interests of those in the upper levels of its hierarchy.
In terms of the master theme of trauma, current understandings of posttraumatic stress disorder (PTSD) are critiqued, and based on data from this research as well as the international literature, a new model of trauma is proposed. This model is based upon the principle of homeostasis observed in biology, whereby within every cell or organism a state of equilibrium is sought and maintained. The proposed model identifies several bio-psychosocial markers of trauma across its three main phases. These trauma markers include: (i) a profound sense of loss; (ii) a lack of perceived control; (iii) passive trauma processing responses; (iv) an identity crisis; (v) a quest to fully understand the trauma event; (vi) a need for social validation of the traumatic experience; and (vii) posttraumatic adaption with the possibility of positive change. To further explore the master themes of power and trauma, a natural group interview is carried out at a meeting of a patient support group for arachnoiditis. Observations at this meeting and members' stories in general support the homeostatic model of trauma, particularly the quest to find answers in the face of distressing experience, as well as the need for social recognition of that experience. In addition, the sociopolitical response to arachnoiditis highlights how public domains of knowledge are largely constructed and controlled by vested interests. Implications of the data overall are discussed in terms of a cultural revolution being needed in health care to position core values around a prime focus upon patients as human beings.
Abstract:
Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real-world applications have access to only limited-duration speech data. This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented which provide a comparison of Joint Factor Analysis (JFA) and i-vector based systems, including various compensation techniques: Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA). Speaker verification performance for utterances with as little as 2 sec of data taken from the NIST Speaker Recognition Evaluations is presented to provide a clearer picture of the current performance characteristics of these techniques in short utterance conditions.
Abstract:
Gait recognition approaches continue to struggle with challenges including view-invariance, low-resolution data, robustness to unconstrained environments, and fluctuating gait patterns due to subjects carrying goods or wearing different clothes. Although computationally expensive, model-based techniques offer promise over appearance-based techniques for these challenges as they gather gait features and interpret gait dynamics in skeleton form. In this paper, we propose a fast 3D ellipsoidal-based gait recognition algorithm using a 3D voxel model derived from multi-view silhouette images. This approach directly addresses the limitations of view dependency and self-occlusion in existing ellipse-fitting model-based approaches. Voxel models are segmented into four components (left and right legs, above and below the knee), and ellipsoids are fitted to each region using eigenvalue decomposition. Features derived from the ellipsoid parameters are modeled using a Fourier representation to retain the temporal dynamic pattern for classification. We demonstrate the proposed approach using the CMU MoBo database and show that an improvement of 15-20% can be achieved over a 2D ellipse-fitting baseline.
Abstract:
The gait energy image (GEI) and its variants form the basis of many recent appearance-based gait recognition systems. The GEI combines good recognition performance with a simple implementation, though it suffers from problems inherent to appearance-based approaches, such as being highly view dependent. In this paper, we extend the concept of the GEI to 3D, to create what we call the gait energy volume, or GEV. A basic GEV implementation is tested on the CMU MoBo database, showing improvements over both the GEI baseline and a fused multi-view GEI approach. We also demonstrate the efficacy of this approach on partial volume reconstructions created from frontal depth images, which can be more practically acquired, for example, in biometric portals implemented with stereo cameras or other depth acquisition systems. Experiments on frontal depth images are evaluated on an in-house database captured using the Microsoft Kinect, and demonstrate the validity of the proposed approach.
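As a rough illustration of the idea, a GEI is commonly computed as the per-pixel mean of aligned binary silhouettes over a gait cycle, and the same averaging can be applied per voxel to aligned binary volumes. The sketch below assumes the frames are already aligned, binarised and stored as equally shaped nested lists; the function name `gait_energy` is hypothetical and this is not necessarily the authors' implementation.

```python
def gait_energy(frames):
    """Element-wise mean of equally shaped nested lists.

    With 2D binary silhouettes this yields a GEI-style image; with 3D binary
    voxel grids the same averaging yields a GEV-style volume.
    """
    first = frames[0]
    if not isinstance(first, list):
        # Leaf level: `frames` holds the same pixel/voxel across all frames.
        return sum(frames) / len(frames)
    return [gait_energy([frame[i] for frame in frames]) for i in range(len(first))]


# Two tiny 2x2x2 binary voxel grids standing in for frames of one gait cycle.
volume_a = [[[1, 0], [0, 0]], [[1, 1], [0, 1]]]
volume_b = [[[1, 1], [0, 0]], [[0, 1], [0, 0]]]
energy_volume = gait_energy([volume_a, volume_b])
```

Each entry of the result lies between 0 and 1 and records how often that voxel was occupied during the cycle.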
Abstract:
Compressive Sensing (CS) is a popular signal processing technique that can exactly reconstruct a signal given a small number of random projections of the original signal, provided that the signal is sufficiently sparse. We demonstrate the applicability of CS in the field of gait recognition as a very effective dimensionality reduction technique, using the gait energy image (GEI) as the feature extraction process. We compare the CS-based approach to principal component analysis (PCA) and show that the proposed method outperforms this baseline, particularly in situations where there are appearance changes in the subject. Applying CS to the gait features also avoids the need to train the models, by using a generalised random projection.
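The training-free dimensionality reduction mentioned above can be sketched as a random projection y = Φx applied to a flattened feature vector. The Gaussian entries, the 1/sqrt(m) scaling, the seed handling and the function names below are assumptions chosen for illustration; the abstract does not specify how the projection matrix is generated.

```python
import random


def random_projection_matrix(m, n, seed=0):
    """Build an m x n matrix of Gaussian entries scaled by 1/sqrt(m)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def project(phi, x):
    """Compute y = phi @ x for a matrix and vector stored as plain lists."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Reduce a hypothetical 8-dimensional flattened feature vector to 3 dimensions.
phi = random_projection_matrix(3, 8, seed=42)
feature = [0.0, 0.5, 1.0, 1.0, 0.5, 0.0, 0.25, 0.75]
reduced = project(phi, feature)
```

Because the matrix depends only on the seed and the dimensions, no training data is needed to build it, unlike a PCA basis.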
Abstract:
Objective: In Australia and comparable countries, case management has become the dominant process by which public mental health services provide outpatient clinical services to people with severe mental illness. There is recognition that caseload size impacts on service provision and that management of caseloads is an important dimension of overall service management. There has been little empirical investigation, however, of caseload and its management. The present study was undertaken in the context of an industrial agreement in Victoria, Australia that required services to introduce standardized approaches to caseload management. The aims of the present study were therefore to (i) investigate caseload size and approaches to caseload management in Victoria's mental health services; and (ii) determine whether caseload size and/or approach to caseload management is associated with work-related stress or case manager self-efficacy among community mental health professionals employed in Victoria's mental health services. Method: A total of 188 case managers responded to an online cross-sectional survey with both purpose-developed items investigating methods of case allocation and caseload monitoring, and standard measures of work-related stress and case manager personal efficacy. Results: The mean caseload size was 20 per full-time case manager. Both work-related stress scores and case manager personal efficacy scores were broadly comparable with those reported in previous studies. Higher caseloads were associated with higher levels of work-related stress and lower levels of case manager personal efficacy. Active monitoring of caseload was associated with lower scores for work-related stress and higher scores for case manager personal efficacy, regardless of size of caseload. Although caseloads were most frequently monitored by the case manager, there was evidence that monitoring by a supervisor was more beneficial than self-monitoring. 
Conclusion: Routine monitoring of caseload, especially by a workplace supervisor, may be effective in reducing work-related stress and enhancing case manager personal efficacy. Keywords: case management, caseload, stress
Abstract:
A new approach to pattern recognition using invariant parameters based on higher order spectra is presented. In particular, invariant parameters derived from the bispectrum are used to classify one-dimensional shapes. The bispectrum, which is translation invariant, is integrated along straight lines passing through the origin in bifrequency space. The phase of the integrated bispectrum is shown to be scale and amplification invariant, as well. A minimal set of these invariants is selected as the feature vector for pattern classification, and a minimum distance classifier using a statistical distance measure is used to classify test patterns. The classification technique is shown to distinguish two similar, but different bolts given their one-dimensional profiles. Pattern recognition using higher order spectral invariants is fast, suited for parallel implementation, and has high immunity to additive Gaussian noise. Simulation results show very high classification accuracy, even for low signal-to-noise ratios.
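A minimal discrete sketch of such an invariant follows, assuming the bispectrum of a profile with DFT X is B(k1, k2) = X[k1] X[k2] X*[k1 + k2] and that the integral along a line through the origin is approximated by a sum over integer frequency bins on k2 = a·k1. Restricting the slope `a` to integers and using a direct O(n²) DFT are simplifications made here for brevity; the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a real sequence (O(n^2))."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def bispectral_phase(x, a=1):
    """Phase of the bispectrum summed along the line k2 = a * k1.

    Only bins with k1 + k2 <= n // 2 are used; `a` must be a positive integer.
    """
    spectrum = dft(x)
    half = len(x) // 2
    total = 0j
    k1 = 1
    while k1 * (1 + a) <= half:
        k2 = a * k1
        total += spectrum[k1] * spectrum[k2] * spectrum[k1 + k2].conjugate()
        k1 += 1
    return cmath.phase(total)


# A hypothetical asymmetric one-dimensional profile with two bumps.
profile = [
    math.exp(-(((t - 10) / 3.0) ** 2)) + 0.5 * math.exp(-(((t - 18) / 2.0) ** 2))
    for t in range(32)
]
features = [bispectral_phase(profile, a) for a in (1, 2, 3)]
```

A circular shift multiplies X[k] by a linear phase term that cancels in the triple product, and a positive gain only scales the magnitude of the sum, so the phases are unchanged under both operations. A feature vector of such phases could then feed a minimum distance classifier.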
Abstract:
An increased emphasis on community-based care has not ensured that people recovering from psychiatric disorders return to active and valued roles in their local communities. Although clinical recovery remains a priority for mental health services there is increasing recognition of the need for functional recovery to be attained and demonstrated in roles valued by the wider community. With this need in mind, a method for classifying socially-valued role functioning among people with schizophrenia or schizoaffective disorder was developed and trialed. Participants (n = 104) were recruited via mental health, psychosocial rehabilitation, and other community support services. Socially-valued roles were investigated via participation in five categories: (1) self-care and home duties; (2) caring for others; (3) self-development, voluntary work or rehabilitation; (4) formal education or training; and (5) employment. Activities were classified by primary role type and role status level at baseline, six, and 12 months. Current role status was assessed along with highest and lowest status in the previous year. Preliminary psychometric results were favorable. Research applications are now recommended for monitoring socially-valued role functioning in community settings.