800 results for Face recognition from video
Abstract:
This paper describes the real time global vision system for the robot soccer team the RoboRoos. It has a highly optimised pipeline that includes thresholding, segmenting, colour normalising, object recognition and perspective and lens correction. It has a fast ‘paint’ colour calibration system that can calibrate in any face of the YUV or HSI cube. It also autonomously selects both an appropriate camera gain and colour gains for robot regions across the field to achieve colour uniformity. Camera geometry calibration is performed automatically from selection of keypoints on the field. The system achieves a position accuracy of better than 15mm over a 4m × 5.5m field, and orientation accuracy to within 1°. It processes 614 × 480 pixels at 60Hz on a 2.0GHz Pentium 4 microprocessor.
Abstract:
In 2009, Religious Education is a designated key learning area in Catholic schools in the Archdiocese of Brisbane and, indeed, across Australia. Over the years, though, different conceptualisations of the nature and purpose of religious education have led to the construction of different approaches to the classroom teaching of religion. By investigating the development of religious education policy in the Archdiocese of Brisbane from 1984 to 2003, the study seeks to trace the emergence of new discourses on religious education. The study understands religious education to refer to a lifelong process that occurs through a variety of forms (Moran, 1989). In Catholic schools, it refers both to co-curricular activities, such as retreats and school liturgies, and the classroom teaching of religion. It is the policy framework for the classroom teaching of religion that this study explores. The research was undertaken using a policy case study approach to gain a detailed understanding of how new conceptualisations of religious education emerged at a particular site of policy production, in this case, the Archdiocese of Brisbane. The study draws upon Yeatman’s (1998) description of policy as occurring “when social actors think about what they are doing and why in relation to different and alternative possible futures” (p. 19) and views policy as consisting of more than texts themselves. Policy texts result from struggles over meaning (Taylor, 2004) in which specific discourses are mobilised to support particular views. The study has a particular interest in the analysis of Brisbane religious education policy texts, the discursive practices that surrounded them, and the contexts in which they arose. Policy texts are conceptualised in the study as representing “temporary settlements” (Gale, 1999).
Such settlements are asymmetrical, temporary and dependent on context: asymmetrical in that dominant actors are favoured; temporary because dominant actors are always under challenge by other actors in the policy arena; and context-dependent because new situations require new settlements. To investigate the official policy documents, the study used Critical Discourse Analysis (hereafter referred to as CDA) as a research tool that affords the opportunity for researchers to map and chart the emergence of new discourses within the policy arena. As developed by Fairclough (2001), CDA is a three-dimensional application of critical analysis to language. In the Brisbane religious education arena, policy texts formed a genre chain (Fairclough, 2004; Taylor, 2004) which was a focus of the study. There are two features of texts that form genre chains: texts are systematically linked to one another; and, systematic relations of recontextualisation exist between the texts. Fairclough’s (2005) concepts of “imaginary space” and “frameworks for action” (p. 65) within the policy arena were applied to the Brisbane policy arena to investigate the relationship between policy statements and subsequent guidelines documents. Five key findings emerged from the study. First, application of CDA to policy documents revealed that a fundamental reconceptualisation of the nature and purpose of classroom religious education in Catholic schools occurred in the Brisbane policy arena over the last twenty-five years. Second, a disjuncture existed between catechetical discourses that continued to shape religious education policy statements, and educational discourses that increasingly shaped guidelines documents. Third, recontextualisation between policy documents was evident and dependent on the particular context in which religious education occurred.
Fourth, at subsequent links in the chain, actors created their own “imaginary space”, thereby altering orders of discourse within the policy arena, with different actors being either foregrounded or marginalised. Fifth, intertextuality was more evident in the later links in the genre chain (i.e. 1994 policy statement and 1997 guidelines document) than in earlier documents. On the basis of the findings of the study, six recommendations are made. First, the institutional Church should carefully consider the contribution that the Catholic school can make to the overall pastoral mission of the diocese in twenty-first century Australia. Second, policymakers should articulate a nuanced understanding of the relationship between catechesis and education with regard to the religion classroom. Third, there should be greater awareness of the connections among policies relating to Catholic schools – especially the connection between enrolment policy and religious education policy. Fourth, there should be greater consistency between policy documents. Fifth, policy documents should be helpful for those to whom they are directed (i.e. Catholic schools, teachers). Sixth, “imaginary space” (Fairclough, 2005) in policy documents needs to be constructed in a way that allows for multiple “frameworks for action” (Fairclough, 2005) through recontextualisation. The findings of this study are significant in a number of ways. For religious educators, the study highlights the need to develop a shared understanding of the nature and purpose of classroom religious education. It argues that this understanding must take into account the multifaith nature of Australian society and the changing social composition of Catholic schools themselves. Greater recognition should be given to the contribution that religious studies courses such as Study of Religion make to the overall religious development of a person. 
In view of the social composition of Catholic schools, there is also an issue of ecclesiological significance concerning the conceptualisation of the relationship between the institutional Catholic Church and Catholic schools. Finally, the study is of significance because of its application of CDA to religious education policy documents. Use of CDA reveals the foregrounding, marginalising, or excluding of various actors in the policy arena.
Abstract:
Microphone arrays have been used in various applications to capture conversations, such as in meetings and teleconferences. In many cases, the microphone and likely source locations are known a priori, and calculating beamforming filters is therefore straightforward. In ad-hoc situations, however, when the microphones have not been systematically positioned, this information is not available and beamforming must be achieved blindly. In achieving this, a commonly neglected issue is whether it is optimal to use all of the available microphones, or only an advantageous subset of these. This paper commences by reviewing different approaches to blind beamforming, characterising them by the way they estimate the signal propagation vector and the spatial coherence of noise in the absence of prior knowledge of microphone and speaker locations. Following this, a novel clustered approach to blind beamforming is motivated and developed. Without using any prior geometrical information, microphones are first grouped into localised clusters, which are then ranked according to their relative distance from a speaker. Beamforming is then performed using either the closest microphone cluster, or a weighted combination of clusters. The clustered algorithms are compared to the full set of microphones in experiments on a database recorded on different ad-hoc array geometries. These experiments evaluate the methods in terms of signal enhancement as well as performance on a large vocabulary speech recognition task.
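The group-then-rank idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the pairwise `similarity` matrix (a stand-in for whatever geometry-free closeness measure is used), the per-microphone `proximity_score` (a stand-in for an estimate of nearness to the speaker), the threshold, and the function names. Grouping is done with simple connected components, which may differ from the clustering method the authors use.

```python
def cluster_microphones(similarity, threshold):
    """Group microphones into connected components: two microphones share a
    cluster if a chain of pairwise similarities >= threshold links them."""
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def rank_clusters(clusters, proximity_score):
    """Order clusters from (assumed) nearest to farthest from the speaker,
    using the mean per-microphone score of each cluster."""
    def mean_score(cluster):
        return sum(proximity_score[m] for m in cluster) / len(cluster)

    return sorted(clusters, key=mean_score, reverse=True)


# Hypothetical data for five microphones (all values invented for the example).
similarity = [
    [1.0, 0.9, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.2, 0.1, 0.1],
    [0.1, 0.2, 1.0, 0.8, 0.7],
    [0.2, 0.1, 0.8, 1.0, 0.9],
    [0.1, 0.1, 0.7, 0.9, 1.0],
]
proximity_score = [0.3, 0.4, 1.2, 1.0, 1.1]

clusters = cluster_microphones(similarity, threshold=0.6)
ranked = rank_clusters(clusters, proximity_score)
closest = ranked[0]  # subarray that would be passed to the beamformer
```

In this toy setup the microphones in `closest` would be the subarray used for beamforming; a weighted combination of clusters, as the abstract also mentions, would instead use all of `ranked`.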
Abstract:
While close talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close talking microphones alleviates the feeling of discomfort and distraction to the user. For this reason, microphone arrays are popular and have been used in a wide range of applications such as teleconferencing, hearing aids, speaker tracking, and as the front-end to speech recognition systems. With advances in sensor and sensor network technology, there is considerable potential for applications that employ ad-hoc networks of microphone-equipped devices collaboratively as a virtual microphone array. By allowing such devices to be distributed throughout the users’ environment, the microphone positions are no longer constrained to traditional fixed geometrical arrangements. This flexibility in the means of data acquisition allows different audio scenes to be captured to give a complete picture of the working environment. In such ad-hoc deployment of microphone sensors, however, the lack of information about the location of devices and active speakers poses technical challenges for array signal processing algorithms which must be addressed to allow deployment in real-world applications. While not an ad-hoc sensor network, conditions approaching this have in effect been imposed in recent National Institute of Standards and Technology (NIST) ASR evaluations on distant microphone recordings of meetings. The NIST evaluation data comes from multiple sites, each with different and often loosely specified distant microphone configurations. This research investigates how microphone array methods can be applied for ad-hoc microphone arrays.
A particular focus is on devising methods that are robust to unknown microphone placements in order to improve the overall speech quality and recognition performance provided by the beamforming algorithms. In ad-hoc situations, microphone positions and likely source locations are not known and beamforming must be achieved blindly. There are two general approaches that can be employed to blindly estimate the steering vector for beamforming. The first is direct estimation without regard to the microphone and source locations. An alternative approach is instead to first determine the unknown microphone positions through array calibration methods and then to use the traditional geometrical formulation for the steering vector. Following these two major approaches investigated in this thesis, a novel clustered approach which includes clustering the microphones and selecting the clusters based on their proximity to the speaker is proposed. Novel experiments are conducted to demonstrate that the proposed method to automatically select clusters of microphones (i.e., a subarray), closely located both to each other and to the desired speech source, may in fact provide more robust speech enhancement and recognition than the full array could.
Abstract:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology have made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language specific resources. This motivates the need for techniques which reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource poor languages. Cross Lingual ASR emerges as a means for addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract, and accordingly, is human, and not language specific. As a result, a common inventory of sounds exists across languages; a property which is exploitable, as sounds from a resource poor, target language can be recognised using models trained on resource rich, source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross lingual techniques using two speech environments: clean read speech and conversational telephone speech. Languages used in evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly when in the more taxing conversational environment.
Two separate approaches for addressing this degradation are proposed. The first is based on deriving better target language lexical representation, in terms of the source language model set. The second, and ultimately more successful approach, focuses on improving the classification accuracy of context-dependent (CD) models, by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal in this thesis is directed towards improving cross lingual techniques, the catalyst for investigating its use was based on expressed interest from several organisations for an Indonesian ASR capability. In Indonesia alone, there are over 200 million speakers of some Malay variant, which provides further impetus and commercial justification for speech related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10000 word recognition vocabulary for the Indonesian language.
Abstract:
This paper investigates the elements which support innovative and entrepreneurial activity in New Zealand’s state owned enterprises (SOEs). An inductive case study design, involving interview data, textual analysis, and observation, was applied to three SOEs. Findings reveal that those aspects typically associated with entrepreneurship, such as innovation, risk acceptance, pro-activeness and growth, are often supported by a number of unexpected elements within the public sector. These elements include culture, branding, operational excellence, cost efficiency, and knowledge transfer. The implications are twofold. First, that innovative and entrepreneurial activity in the public sector can go beyond policy-making, with SOEs representing an important policy decision and sector of the New Zealand Government. And second, that the impact of several SOEs on international markets suggests competition on the global stage will increasingly come from both public and private sector organizations.
Abstract:
Research in the early years places increasing importance on participatory methods to engage children. The playback of video-recording to stimulate conversation is a research method that enables children’s accounts to be heard and attends to a participatory view. During video-stimulated sessions, participants watch an extract of video-recording of a specific event in which they were involved, and then account for their participation in that event. Using an interactional perspective, this paper draws distinctions between video-stimulated accounts and a similar research method, popular in education, that of video-stimulated recall. Reporting upon a study of young children’s interactions in a playground, video-stimulated accounts are explicated to show how the participants worked toward the construction of events in the video-stimulated session. This paper discusses how the children account for complex matters within their social worlds, and manage the accounting of others in the video-stimulated session. When viewed from an interactional perspective and used alongside fine grained analytic approaches, video-stimulated accounts are an effective method to provide the standpoint of the children involved and further the competent child paradigm.
Abstract:
Background: Most questionnaires used for physical activity (PA) surveillance have been developed for adults aged ≤65 years. Given the health benefits of PA for older adults and the aging of the population, it is important to include adults aged 65+ years in PA surveillance. However, few studies have examined how well older adults understand PA surveillance questionnaires. This study aimed to document older adults’ understanding of questions from the International PA Questionnaire (IPAQ), which is used worldwide for PA surveillance. Methods: Participants were 41 community-dwelling adults aged 65-89 years. They each completed IPAQ in a face-to-face semi-structured interview, using the “think-aloud” method, in which they expressed their thoughts out loud as they answered IPAQ questions. Interviews were transcribed and coded according to a three-stage model: understanding the intent of the question; performing the primary task (conducting the mental operations required to formulate a response); and response formatting (mapping the response into pre-specified response options). Results: Most difficulties occurred during the understanding and performing the primary task stages. Errors included recalling PA in an “average” week, not in the previous 7 days; including PA lasting ≤10 minutes/session; reporting the same PA twice or thrice; and including the total time of an activity for which only a part of that time was at the intensity specified in the question. Participants were unclear what activities fitted within a question’s scope and used a variety of strategies for determining the frequency and duration of their activities. Participants experienced more difficulties with the moderate-intensity PA and walking questions than with the vigorous-intensity PA questions. The sitting time question, particularly difficult for many participants, required the use of an answer strategy different from that used to answer questions about PA.
Conclusions: These findings indicate a need for caution in administering IPAQ to adults aged ≥65 years. Most errors resulted in over-reporting, although errors resulting in under-reporting were also noted. Given the nature of the errors made by participants, it is possible that similar errors occur when IPAQ is used in younger populations and that the errors identified could be minimized with small modifications to IPAQ.
Abstract:
In this paper we propose a new method for utilising phase information by complementing it with traditional magnitude-only spectral subtraction speech enhancement through Complex Spectrum Subtraction (CSS). The proposed approach has the following advantages over traditional magnitude-only spectral subtraction: (a) it introduces complementary information to the enhancement algorithm; (b) it reduces the total number of algorithmic parameters; and (c) it is designed for improving clean speech magnitude spectra and is therefore suitable for both automatic speech recognition (ASR) and speech perception applications. Oracle-based ASR experiments verify this approach, showing an average of 20% relative word accuracy improvements when accurate estimates of the phase spectrum are available. Based on sinusoidal analysis and assuming stationarity between observations (which is shown to be better approximated as the frame rate is increased), this paper also proposes a novel method for acquiring the phase information called Phase Estimation via Delay Projection (PEDEP). Further oracle ASR experiments validate the potential for the proposed PEDEP technique in ideal conditions. Realistic implementation of CSS with PEDEP shows performance comparable to state of the art spectral subtraction techniques in a range of 15-20 dB signal-to-noise ratio environments. These results clearly demonstrate the potential for using phase spectra in spectral subtractive enhancement applications, and at the same time highlight the need for deriving more accurate phase estimates in a wider range of noise conditions.
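To make the contrast between magnitude-only and complex spectral subtraction concrete, here is a minimal single-frame sketch. It is an illustration under stated assumptions, not the paper's implementation: the toy signals, the function names and the zero spectral floor are all invented, the noise phase is an oracle value taken from the true noise, and PEDEP itself is not implemented. The noise is deliberately placed at the same frequency as the signal so that the phase matters.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_subtraction(noisy, noise_mag, floor=0.0):
    """Magnitude-only subtraction: reduce each bin's magnitude by the noise
    magnitude estimate and reuse the phase of the noisy bin."""
    return [
        cmath.rect(max(abs(y) - d, floor), cmath.phase(y))
        for y, d in zip(noisy, noise_mag)
    ]


def complex_subtraction(noisy, noise_mag, noise_phase):
    """Complex subtraction: rebuild a complex noise estimate from magnitude
    and phase estimates, then subtract it from the noisy spectrum."""
    return [y - cmath.rect(d, p) for y, d, p in zip(noisy, noise_mag, noise_phase)]


def spectral_error(a, b):
    """Euclidean distance between two complex spectra."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


# Toy frame: signal and noise share a frequency bin but differ in phase.
n = 16
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noise = [0.5 * math.cos(2 * math.pi * 2 * t / n + 0.7) for t in range(n)]
noisy = [c + d for c, d in zip(clean, noise)]

S, D, Y = dft(clean), dft(noise), dft(noisy)
noise_mag = [abs(d) for d in D]
noise_phase = [cmath.phase(d) for d in D]  # oracle phase (assumption)

err_mag = spectral_error(magnitude_subtraction(Y, noise_mag), S)
err_css = spectral_error(complex_subtraction(Y, noise_mag, noise_phase), S)
```

With exact noise magnitude and phase, the complex subtraction recovers the clean spectrum up to rounding, while the magnitude-only version leaves a visible error in the shared bin. With estimated rather than oracle phase, the gap would depend on the quality of the estimate, which is the point the abstract makes about needing accurate phase estimates.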
Abstract:
This paper explores the idea of virtual participation through the historical example of the republic of letters in early modern Europe (circa 1500-1800). By reflecting on the construction of virtuality in a historical context, and more specifically in a pre-digital environment, this paper calls attention to accusations of technological determinism in ongoing research concerning the affordances of the Internet and related media of communication. It argues that ‘the virtual’ is not synonymous with ‘the digital’ and suggests that, in order to articulate what is novel about modern technologies, we must first understand the social interactions underpinning the relationships which are facilitated through those technologies. By analysing the construction of virtuality in a pre-digital environment, this paper thus offers a baseline from which scholars might consider what is different about the modes of interaction and communication being engaged in via modern media.
Abstract:
Web applications such as blogs, wikis, video and photo sharing sites, and social networking systems have been termed ‘Web 2.0’ to highlight an arguably more open, collaborative, personalisable, and therefore more participatory internet experience than what had previously been possible. Giving rise to a culture of participation, an increasing number of these social applications are now available on mobile phones where they take advantage of device-specific features such as sensors, location and context awareness. This international volume of book chapters will make a contribution towards exploring and better understanding the opportunities and challenges provided by tools, interfaces, methods and practices of social and mobile technology that enable participation and engagement. It brings together an international group of academics and practitioners from a diverse range of disciplines such as computing and engineering, social sciences, digital media and human-computer interaction to critically examine a range of applications of social and mobile technology, such as social networking, mobile interaction, wikis, twitter, blogging, virtual worlds, shared displays and urban screens, and their impact to foster community activism, civic engagement and cultural citizenship.
Abstract:
In mobile videos, small viewing size and bitrate limitation often cause unpleasant viewing experiences, which is particularly important for fast-moving sports videos. For optimizing the overall user experience of viewing sports videos on mobile phones, this paper explores the benefits of emphasizing Region of Interest (ROI) by 1) zooming in and 2) enhancing the quality. The main goal is to measure the effectiveness of these two approaches and determine which one is more effective. To obtain a more comprehensive understanding of the overall user experience, the study considers user’s interest in video content and user’s acceptance of the perceived video quality, and compares the user experience in sports videos with other content types such as talk shows. The results from a user study with 40 subjects demonstrate that zooming and ROI-enhancement are both effective in improving the overall user experience with talk show and mid-shot soccer videos. However, for the full-shot scenes in soccer videos, only zooming is effective while ROI-enhancement has a negative effect. Moreover, user’s interest in video content directly affects not only the user experience and the acceptance of video quality, but also the effect of content type on the user experience. Finally, the overall user experience is closely related to the degree of the acceptance of video quality and the degree of the interest in video content. This study is valuable in exploiting effective approaches to improve user experience, especially in mobile sports video streaming contexts, whereby the available bandwidth is usually low or limited. It also provides further understanding of the influencing factors of user experience.
Abstract:
Today, participatory or citizen journalism – journalism which enables readers to become writers – exists online and offline in a variety of forms and formats, operates under a number of editorial schemes, and focusses on a wide range of topics from the specialist to the generic, and the micro-local to the global. Key models in this phenomenon include veteran sites Slashdot and Indymedia, as well as news-related Weblogs; more recent additions into the mix have been the South Korean OhmyNews, which in 2003 was “the most influential online news site in that country, attracting an estimated 2 million readers a day” (Gillmor, 2003a, p. 7), with its new Japanese and international offshoots, as well as the Wikipedia with its highly up-to-date news and current events section and its more recent offshoot Wikinews, and even citizen-produced video news as it is found in sites such as YouTube and Current.tv.
Abstract:
The recognition that Web 2.0 applications and social media sites will strengthen and improve interaction between governments and citizens has resulted in a global push into new e-democracy or Government 2.0 spaces. These typically follow government-to-citizen (g2c) or citizen-to-citizen (c2c) models, but both these approaches are problematic: g2c is often concerned more with service delivery to citizens as clients, or exists to make a show of ‘listening to the public’ rather than to genuinely source citizen ideas for government policy, while c2c often takes place without direct government participation and therefore cannot ensure that the outcomes of citizen deliberations are accepted into the government policy-making process. Building on recent examples of Australian Government 2.0 initiatives, we suggest a new approach based on government support for citizen-to-citizen engagement, or g4c2c, as a workable compromise, and suggest that public service broadcasters should play a key role in facilitating this model of citizen engagement.
Abstract:
Aim and objective: The primary aim was to examine the prevalence of poststroke depression in Chinese stroke survivors six months after discharge from a rehabilitation hospital. A second aim was to determine whether six-month poststroke depression was associated with psychological, social and physical outcomes and demographic variables. Background: There has been increasing recognition of the influence of depression on poststroke recovery. While some previous studies report associations between depression and social, psychological, physical and clinical outcomes, few studies had sufficient sample sizes for regression analysis thereby limiting the clinical applicability of their findings. Design: A cross-sectional design was used. Method: Data were collected from 124 male and 86 female stroke survivors (mean age 71.7, SD 10.2 years). The Geriatric Depression Scale was used to measure depression, the State Self-esteem Scale to measure state self-esteem, the London Handicap Scale to measure participation restriction, the Social Support Questionnaire to measure satisfaction with social support and the Modified Barthel Index to measure functional ability. Results: Forty-two survivors (20.5%) reported mild and 33 (16.1%) reported severe depression. The presence of depression was associated with low levels of state self-esteem, social support satisfaction and functional ability. Logistic regression analysis revealed that these variables were statistically significant in predicting the probability of having depression (p < 0.05). Conclusions: Analyses in the present study revealed distinct patterns of correlates of depression, and the results were in agreement with prior studies that depression has a consistent positive association with physical disability, living arrangements and social support and no significant association with the different types of brain lesion. Relevance to clinical practice: There is a need, routinely, to assess stroke survivors for depression and, where necessary, to intervene with the aim of enhancing psychological and social well-being.