288 results for Person Recognition
Abstract:
Several approaches have been proposed to recognize handwritten Bengali characters using different curve-fitting algorithms and curvature analysis. In this paper, a new curve-fitting algorithm is developed to identify the various strokes of a handwritten character. The algorithm helps recognize strokes of different patterns (line, quadratic curve) precisely, which greatly reduces the burden of error elimination. Implementation of this Modified Syntactic Method demonstrates significant improvement in the recognition of Bengali handwritten characters.
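As an illustration of the idea of labelling a stroke as a line or a quadratic curve, here is a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a stroke is a list of (x, y) points that can be written as y = f(x), fits polynomials of degree 1 and 2 by least squares, and accepts the simplest fit whose RMS error is below a threshold. The function names and the `tolerance` value are hypothetical.

```python
def fit_polynomial(points, degree):
    """Least-squares fit of y = c0 + c1*x + ... + cd*x^d via the normal equations.

    Assumes the x values are not all identical (a vertical stroke would make
    the system singular and needs a different parameterisation).
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x, _ in points) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in points) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def rms_error(points, coeffs):
    """Root-mean-square vertical distance between the points and the fitted curve."""
    squared = sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2 for x, y in points
    )
    return (squared / len(points)) ** 0.5


def classify_stroke(points, tolerance=0.05):
    """Return the simplest pattern that fits the stroke (tolerance is an assumed value)."""
    if rms_error(points, fit_polynomial(points, 1)) <= tolerance:
        return "line"
    if rms_error(points, fit_polynomial(points, 2)) <= tolerance:
        return "quadratic curve"
    return "unknown"
```

For example, `classify_stroke([(x, 2 * x + 1) for x in range(5)])` labels the stroke a line, while points sampled from y = x² are labelled a quadratic curve. Trying the simpler model first keeps a straight stroke from being reported as a degenerate quadratic.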
Abstract:
Acoustically, car cabins are extremely noisy, and as a consequence existing audio-only speech recognition systems for voice-based control of vehicle functions, such as the GPS-based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g. lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Research in the AVSR field has been ongoing for the past twenty-five years, with notable progress being made. However, the practical deployment of AVSR systems in a variety of real-world applications has not yet emerged. The main reason is that most research to date has neglected to address variabilities in the visual domain, such as illumination and viewpoint, in the design of the visual front-end of the AVSR system. In this paper we present an AVSR system in a real-world car environment using the AVICAR database [1], a publicly available in-car database, and we show that using visual speech in conjunction with the audio modality improves the robustness and effectiveness of voice-only recognition systems in car cabin environments.
Abstract:
When classifying a signal, ideally we want our classifier to trigger a large response when it encounters a positive example and have little to no response for all other examples. Unfortunately, in practice this does not occur: responses fluctuate, often causing false alarms. There are many reasons why this is the case, most notably that the dynamics of the signal are not incorporated into the classification. In facial expression recognition, this has been highlighted as one major research question. In this paper we present a novel technique that incorporates the dynamics of the signal, producing a strong response when the peak expression is found and suppressing all other responses as much as possible. The ability to automatically and accurately recognize the facial expressions of drivers is highly relevant to automotive applications. For example, the early recognition of “surprise” could indicate that an accident is about to occur, and various safeguards could immediately be deployed to avoid or minimize injury and damage. We conducted preliminary experiments on the extended Cohn-Kanade (CK+) database, which show the benefits of the technique.
Abstract:
In 2009, Religious Education is a designated key learning area in Catholic schools in the Archdiocese of Brisbane and, indeed, across Australia. Over the years, though, different conceptualisations of the nature and purpose of religious education have led to the construction of different approaches to the classroom teaching of religion. By investigating the development of religious education policy in the Archdiocese of Brisbane from 1984 to 2003, the study seeks to trace the emergence of new discourses on religious education. The study understands religious education to refer to a lifelong process that occurs through a variety of forms (Moran, 1989). In Catholic schools, it refers both to co-curricular activities, such as retreats and school liturgies, and the classroom teaching of religion. It is the policy framework for the classroom teaching of religion that this study explores. The research was undertaken using a policy case study approach to gain a detailed understanding of how new conceptualisations of religious education emerged at a particular site of policy production, in this case, the Archdiocese of Brisbane. The study draws upon Yeatman’s (1998) description of policy as occurring “when social actors think about what they are doing and why in relation to different and alternative possible futures” (p. 19) and views policy as consisting of more than texts themselves. Policy texts result from struggles over meaning (Taylor, 2004) in which specific discourses are mobilised to support particular views. The study has a particular interest in the analysis of Brisbane religious education policy texts, the discursive practices that surrounded them, and the contexts in which they arose. Policy texts are conceptualised in the study as representing “temporary settlements” (Gale, 1999).
Such settlements are asymmetrical, temporary and dependent on context: asymmetrical in that dominant actors are favoured; temporary because dominant actors are always under challenge by other actors in the policy arena; and context-dependent because new situations require new settlements. To investigate the official policy documents, the study used Critical Discourse Analysis (hereafter referred to as CDA) as a research tool that affords the opportunity for researchers to map and chart the emergence of new discourses within the policy arena. As developed by Fairclough (2001), CDA is a three-dimensional application of critical analysis to language. In the Brisbane religious education arena, policy texts formed a genre chain (Fairclough, 2004; Taylor, 2004) which was a focus of the study. There are two features of texts that form genre chains: texts are systematically linked to one another; and, systematic relations of recontextualisation exist between the texts. Fairclough’s (2005) concepts of “imaginary space” and “frameworks for action” (p. 65) within the policy arena were applied to the Brisbane policy arena to investigate the relationship between policy statements and subsequent guidelines documents. Five key findings emerged from the study. First, application of CDA to policy documents revealed that a fundamental reconceptualisation of the nature and purpose of classroom religious education in Catholic schools occurred in the Brisbane policy arena over the last twenty-five years. Second, a disjuncture existed between catechetical discourses that continued to shape religious education policy statements, and educational discourses that increasingly shaped guidelines documents. Third, recontextualisation between policy documents was evident and dependent on the particular context in which religious education occurred.
Fourth, at subsequent links in the chain, actors created their own “imaginary space”, thereby altering orders of discourse within the policy arena, with different actors being either foregrounded or marginalised. Fifth, intertextuality was more evident in the later links in the genre chain (i.e. 1994 policy statement and 1997 guidelines document) than in earlier documents. On the basis of the findings of the study, six recommendations are made. First, the institutional Church should carefully consider the contribution that the Catholic school can make to the overall pastoral mission of the diocese in twenty-first century Australia. Second, policymakers should articulate a nuanced understanding of the relationship between catechesis and education with regard to the religion classroom. Third, there should be greater awareness of the connections among policies relating to Catholic schools – especially the connection between enrolment policy and religious education policy. Fourth, there should be greater consistency between policy documents. Fifth, policy documents should be helpful for those to whom they are directed (i.e. Catholic schools, teachers). Sixth, “imaginary space” (Fairclough, 2005) in policy documents needs to be constructed in a way that allows for multiple “frameworks for action” (Fairclough, 2005) through recontextualisation. The findings of this study are significant in a number of ways. For religious educators, the study highlights the need to develop a shared understanding of the nature and purpose of classroom religious education. It argues that this understanding must take into account the multifaith nature of Australian society and the changing social composition of Catholic schools themselves. Greater recognition should be given to the contribution that religious studies courses such as Study of Religion make to the overall religious development of a person. 
In view of the social composition of Catholic schools, there is also an issue of ecclesiological significance concerning the conceptualisation of the relationship between the institutional Catholic Church and Catholic schools. Finally, the study is of significance because of its application of CDA to religious education policy documents. Use of CDA reveals the foregrounding, marginalising, or excluding of various actors in the policy arena.
Abstract:
This research study investigated the factors that influenced the development of teacher identity in a small cohort of mature-aged graduate pre-service teachers over the course of a one-year Graduate Diploma program (Middle Years). It sought to illuminate the social and relational dynamics of these pre-service teachers’ experiences as they began new ways of being and learning during a newly introduced one-year Graduate Diploma program. A relational-ontological perspective underpinned the relational-cultural framework that was applied in a workshop program as an integral part of this research. A relational-ontological perspective suggests that the development of teacher identity is to be construed more as an ontological process than an epistemological one. Its focus is more on questions surrounding the person and their ‘becoming’ a teacher than on the knowledge they have or will come to have. Hence, drawing on work by researchers such as Alsup (2006), Gilligan (1982), Isaacs (2007), Miller (1976), Noddings (2005), Stout (2001), and Taylor (1989), teacher identity was defined as an individual pre-service teacher’s unique sense of self as a teacher that included his or her beliefs about teaching and learning (Alsup, 2006; Stout, 2001; Walkington, 2005). Case study was the preferred methodology within which this research project was framed, and narrative research was used as a method to document the way teacher identity was shaped and negotiated in discursive environments such as teacher education programs, prior experiences, classroom settings and the practicum. The data collected included student narratives, students’ written email reflections, and focus group dialogue. The narrative approach applied in this research context provided the depth of data needed to understand the nature of the mature-aged pre-service teachers’ emerging teacher identities and experiences in the graduate diploma program.
Findings indicated that most of the mature-aged graduate pre-service teachers came into the one-year graduate diploma program with a strong sense of their personal and professional selves and well-established reasons for choosing to teach Middle Years. Their choice of program involved an expectation of support and welcome to a middle-school community and culture. Two critical issues that emerged from the pre-service teachers’ narratives were the importance they placed on human support and on the affirmation of themselves and their emerging teacher identities. Evidence from this study suggests that the lack of recognition of pre-service teachers’ personal and professional selves during the graduate diploma program inhibited the development of a positive middle-school teacher identity. However, a workshop program developed for the participants in this research, which addressed a range of practical concerns of beginning teachers, offered them a space where they felt a sense of belonging to a community and where their thoughts and beliefs were recognized and valued. Thus, the workshops provided participants with the positive social and relational dynamics necessary to support them in their developing teacher identities. The overall findings of this research study strongly indicate a need for a relational support structure based on a relational-ontological perspective to be built into the overall course structure of Graduate Pre-service Diplomas in Education to support the development of teacher identity. Such a support structure acknowledges that the pre-service teacher’s learning and formation is socially embedded, relational, and a continual, lifelong process.
Abstract:
While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays, in contrast to close-talking microphones, alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms, which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied to ad-hoc microphone arrays.
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach is proposed, which clusters the microphones and selects the clusters based on their proximity to the speaker. Novel experiments are conducted to demonstrate that the proposed method to automatically select clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.
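To make the geometrical formulation and the subarray idea concrete, here is a minimal sketch under strong assumptions. It is not the thesis's method, which works blindly and is not specified in the abstract. It assumes the microphone positions have already been calibrated and the source position is known. Each steering-vector element is the phase term exp(-j2πfτ) for the propagation delay τ relative to the closest microphone, with amplitude attenuation ignored, and a plain nearest-k selection stands in for the clustering step. All function names and the speed-of-sound constant are illustrative choices.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def propagation_delays(mic_positions, source_position):
    """Time of flight (seconds) from the source to each microphone."""
    return [math.dist(p, source_position) / SPEED_OF_SOUND for p in mic_positions]


def steering_vector(mic_positions, source_position, frequency_hz):
    """Delay-only steering vector at one frequency.

    Delays are taken relative to the closest microphone, so that
    microphone's element equals 1. Attenuation with distance is ignored.
    """
    delays = propagation_delays(mic_positions, source_position)
    reference = min(delays)
    return [
        cmath.exp(-2j * math.pi * frequency_hz * (d - reference)) for d in delays
    ]


def nearest_subarray(mic_positions, source_position, size):
    """Indices of the `size` microphones closest to the source.

    A simple stand-in for proximity-based cluster selection.
    """
    by_distance = sorted(
        range(len(mic_positions)),
        key=lambda i: math.dist(mic_positions[i], source_position),
    )
    return sorted(by_distance[:size])
```

A delay-and-sum beamformer would multiply each microphone's frequency-domain signal by the complex conjugate of its steering-vector element and average over the selected subarray. Restricting the average to microphones near the speaker is the intuition behind preferring a subarray to the full array.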
Abstract:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.
Abstract:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology have made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language-specific resources. This motivates the need for techniques which reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource-poor languages. Cross-lingual ASR emerges as a means of addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract and, accordingly, is human-specific rather than language-specific. As a result, a common inventory of sounds exists across languages. This property is exploitable, as sounds from a resource-poor target language can be recognised using models trained on resource-rich source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross-lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross-lingual techniques using two speech environments: clean read speech and conversational telephone speech. Languages used in evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly in the more taxing conversational environment.
Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target-language lexical representation in terms of the source-language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal in this thesis is directed towards improving cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. In Indonesia alone, there are over 200 million speakers of some Malay variant, which provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10,000-word recognition vocabulary for the Indonesian language.