818 resultados para speaker diarization
Resumo:
Interacting with technology within a vehicle environment using a voice interface can greatly reduce the effects of driver distraction. Most current approaches to this problem only utilise the audio signal, making them susceptible to acoustic noise. An obvious approach to circumvent this is to use the visual modality in addition. However, capturing, storing and distributing audio-visual data in a vehicle environment is very costly and difficult. One current dataset available for such research is the AVICAR [1] database. Unfortunately this database is largely unusable due to timing mismatch between the two streams and in addition, no protocol is available. We have overcome this problem by re-synchronising the streams on the phone-number portion of the dataset and established a protocol for further research. This paper presents the first audio-visual results on this dataset for speaker-independent speech recognition. We hope this will serve as a catalyst for future research in this area.
Resumo:
“What did you think you were doing?” Was the question posed by the conference organizers to me as the inventor and constructor of the first working Tangible Interfaces over 40 years ago. I think the question was intended to encourage me to talk about the underlying ideas and intentionality rather than describe an endless sequence of electronic bricks and that is what I shall do in this presentation. In the sixties the prevalent idea for a graphics interface was an analogue with sketching which was to somehow be understood by the computer as three dimensional form. I rebelled against this notion for reasons which I will explain in the presentation and instead came up with tangible physical three dimensional intelligent objects. I called these first prototypes “Intelligent Physical Modelling Systems” which is a really dumb name for an obvious concept. I am eternally grateful to Hiroshi Ishii for coining the term “Tangible User Interfaces” - the same idea but with a much smarter name. Another motivator was user involvement in the design process, and that led to the Generator (1979) project with Cedric Price for the world’s first intelligent building capable of organizing itself in response to the appetites of the users. The working model of that project is in MoMA. And the same motivation led to a self builders design kit (1980) for Walter Segal which facilitated self-builders to design their own houses. And indeed as the organizer’s question implied, the motivation and intentionality of these projects developed over the years in step with advancing technology. The speaker will attempt to articulate these changes with medical, psychological and educational examples. Much of this later work indeed stemming from the Media Lab where we are talking. Related topics such as “tangible thinking” and “intelligent teacups” will be introduced and the presentation will end with some speculations for the future. The presentation will be given against a background of images of early prototypes many of which have never been previously published.
Resumo:
This paper describes the development of a substantive theory about police racism as the most significant factor, from among a number of competing societal based explanations, in accounting for Aboriginal over-representation in police arrest rates.
Resumo:
Visual noise insensitivity is important to audio visual speech recognition (AVSR). Visual noise can take on a number of forms such as varying frame rate, occlusion, lighting or speaker variabilities. The use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Preliminary results are presented demonstrating performance above the catastrophic fusion boundary for our confidence measure irrespective of the type of visual noise presented to it. Our experiments were restricted to small vocabulary applications.
Resumo:
The Learning by Design Workshop Program 2010, a part of the Queensland Government Unlimited: Designing for the Asia Pacific Event Program, was a one-day professional development design thinking workshop run on October 9, 2011 at The Edge, State Library of Queensland for self-selected public and private secondary school teachers from the subject areas of Visual Art, Graphics and Industrial Technology and Design. Participants were drawn from a database of Brisbane and regional Queensland schools from the goDesign and Living City Workshop Programs. It aimed to generate leadership within schools for design-led education and creative thinking and give teachers a rare opportunity to work with professional designers to generate future strategies for design-based learning. Teachers were introduced to the concept of design thinking in education by international keynote speakers CJ Lim (Studio 8 Architects) and Jeb Brugmann (The Next Practice), national speaker Oliver Freeman (NevilleFreeman Agency) and three Queensland speakers, Alexander Loterztain, David Williams and Keith Holledge. Inspired by the Unlimited showcase exhibition Make Change: Design Thinking in Action and ‘Idea Starters’/teaching resources provided, teachers worked with a professional designer (from a discipline of architecture, interior design, industrial design, urban design, graphic design or landscape architecture) in ten random teams, to generate optimistic ideas for the Ideal City of tomorrow, each considering a theme – Food, Water, Transport, Ageing, Growth, Employment, Shelter, Health, Education and Energy. They then discussed how this process could be best activated and expanded on to build interest and knowledge in design thinking in the classroom. Assisted by illustrators, the teams prepared a visual presentation of their ideas and process from art materials provided. The workshop culminated in a video-taped interactive design charette to the larger group, which is intended to be utilised as a toolkit and praxis for teachers as part of the State Library of Queensland Design Minds Website Project.
Resumo:
This study examined the everyday practices of families within the context of family mealtime to investigate how members accomplished mealtime interactions. Using an ethnomethodological approach, conversation analysis and membership categorization analysis, the study investigated the interactional resources that family members used to assemble their social orders moment by moment during family mealtimes. While there is interest in mealtimes within educational policy, health research and the media, there remain few studies that provide fine-grained detail about how members produce the social activity of having a family meal. Findings from this study contribute empirical understandings about families and family mealtime. Two families with children aged 2 to 10 years were observed as they accomplished their everyday mealtime activities. Data collection took place in the family homes where family members video recorded their naturally occurring mealtimes. Each family was provided with a video camera for a one-month period and they decided which mealtimes they recorded, a method that afforded participants greater agency in the data collection process and made available to the analyst a window into the unfolding of the everyday lives of the families. A total of 14 mealtimes across the two families were recorded, capturing 347 minutes of mealtime interactions. Selected episodes from the data corpus, which includes centralised breakfast and dinnertime episodes, were transcribed using the Jeffersonian system. Three data chapters examine extended sequences of family talk at mealtimes, to show the interactional resources used by members during mealtime interactions. The first data chapter explores multiparty talk to show how the uniqueness of the occasion of having a meal influences turn design. It investigates the ways in which members accomplish two-party talk within a multiparty setting, showing how one child "tells" a funny story to accomplish the drawing together of his brothers as an audience. As well, this chapter identifies the interactional resources used by the mother to cohort her children to accomplish the choralling of grace. The second data chapter draws on sequential and categorical analysis to show how members are mapped to a locally produced membership category. The chapter shows how the mapping of members into particular categories is consequential for social order; for example, aligning members who belong to the membership category "had haircuts" and keeping out those who "did not have haircuts". Additional interactional resources such as echoing, used here to refer to the use of exactly the same words, similar prosody and physical action, and increasing physical closeness, are identified as important to the unfolding talk particularly as a way of accomplishing alignment between the grandmother and grand-daughter. The third and final data analysis chapter examines topical talk during family mealtimes. It explicates how members introduce topics of talk with an orientation to their co-participant and the way in which the take up of a topic is influenced both by the sequential environment in which it is introduced and the sensitivity of the topic. Together, these three data chapters show aspects of how family members participated in family mealtimes. The study contributes four substantive themes that emerged during the analytic process and, as such, the themes reflect what the members were observed to be doing. The first theme identified how family knowledge was relevant and consequential for initiating and sustaining interaction during mealtime with, for example, members buying into the talk of other members or being requested to help out with knowledge about a shared experience. Knowledge about members and their activities was evident with the design of questions evidencing an orientation to coparticipant’s knowledge. The second theme found how members used topic as a resource for social interaction. The third theme concerned the way in which members utilised membership categories for producing and making sense of social action. The fourth theme, evident across all episodes selected for analysis, showed how children’s competence is an ongoing interactional accomplishment as they manipulated interactional resources to manage their participation in family mealtime. The way in which children initiated interactions challenges previous understandings about children’s restricted rights as conversationalists. As well as making a theoretical contribution, the study offers methodological insight by working with families as research participants. The study shows the procedures involved as the study moved from one where the researcher undertook the decisions about what to videorecord to offering this decision making to the families, who chose when and what to videorecord of their mealtime practices. Evident also are the ways in which participants orient both to the video-camera and to the absent researcher. For the duration of the mealtime the video-camera was positioned by the adults as out of bounds to the children; however, it was offered as a "treat" to view after the mealtime was recorded. While situated within family mealtimes and reporting on the experiences of two families, this study illuminates how mealtimes are not just about food and eating; they are social. The study showed the constant and complex work of establishing and maintaining social orders and the rich array of interactional resources that members draw on during family mealtimes. The family’s interactions involved members contributing to building the social orders of family mealtime. With mealtimes occurring in institutional settings involving young children, such as long day care centres and kindergartens, the findings of this study may help educators working with young children to see the rich interactional opportunities mealtimes afford children, the interactional competence that children demonstrate during mealtimes, and the important role/s that adults may assume as co-participants in interactions with children within institutional settings.
Resumo:
The 31st TTRA conference was held in California’s San Fernando Valley, home of Hollywood and Burbank’s movie and television studios. The twin themes of Hollywood and the new Millennium promised and delivered “something old, yet something new”. The meeting offered a historical summary, not only of the year in review but also of many features of travel research since the first literature in the field appeared in the 1970s. Also, the millennium theme set the scene for some stimulating and forward thinking discussions. The Hollywood location offered an opportunity to ponder on the value of the movie-induced tourism for Los Angeles, at a time when Hollywood Boulevard was in the midst of a much needed redevelopment programme. Hollywood Chamber of Commerce speaker Oscar Arslanian acknowledged that the face of the famous district had become tired, and that its ability to continue to attract visitors in the future lay in redeveloping its past heritage. In line with the Hollywood theme a feature of the conference was a series of six special sessions with “Stars of Travel Research”. These sessions featured: Clare Gunn, Stanley Plog, Charles Gouldner, John Hunt, Brent Ritchie, Geoffrey Crouch, Peter Williams, Douglas Frechtling, Turgut Var, Robert Christie-Mill, and John Crotts. Delegates were indeed privileged to hear from many of the pioneers of tourism research. Clare Gunn, Charles Goeldner, Turgut Var and Stanley Plog, for example, traced the history of different aspects of the tourism literature, and in line with the millennium theme, offered some thought provoking discussion on the future challenges facing tourism. These included; the commodotisation of airlines and destinations, airport and traffic congestion, environment sustainability responsibility and the looming burst of the baby-boomer bubble. Included in the conference proceedings are four papers presented by five of the “Stars”. Brent Ritchie and Geoffrey Crouch discuss the critical success factors for destinations, Clare Gunn shares his concerns about tourism being a smokestack industry, Doug Frechtling provides forecasts of outbound travel from 20 countries, and Charles Gouldner, who has attended all 31 TTRA conferences, reflects on the changes that have taken place in tourism research over 35 years...
Resumo:
My contention is that creativity now is as important in education as literacy, and we should treat it with the same status. (Robinson, 2006) This bold assertion from Sir Ken Robinson, a leading expert and speaker on creativity, is perhaps even truer now than it was six years ago. Literacy (and numeracy) have always been, and should remain, fundamental to education. However, creativity is not a rival to literacy or numeracy education; it is not an addition to these (or any other) areas of the curriculum. Creativity should be a core, integrated element of teaching and learning throughout the curriculum and the school environment. In the new national curriculum, “critical and creative thinking” are highlighted as general capabilities “that can be developed and applied across the curriculum” (ACARA, 2011, p. 15). Moreover, an aim of education noted by the 2008 Melbourne Declaration on Educational Goals for Young Australians is “to support all young Australians to become ... confident and creative individuals” (MCEETYA, 2008, p. 8). These are confirmation that creativity should have high “status” in Australian education.
Resumo:
Fusion techniques have received considerable attention for achieving lower error rates with biometrics. A fused classifier architecture based on sequential integration of multi-instance and multi-sample fusion schemes allows controlled trade-off between false alarms and false rejects. Expressions for each type of error for the fused system have previously been derived for the case of statistically independent classifier decisions. It is shown in this paper that the performance of this architecture can be improved by modelling the correlation between classifier decisions. Correlation modelling also enables better tuning of fusion model parameters, ‘N’, the number of classifiers and ‘M’, the number of attempts/samples, and facilitates the determination of error bounds for false rejects and false accepts for each specific user. Error trade-off performance of the architecture is evaluated using HMM based speaker verification on utterances of individual digits. Results show that performance is improved for the case of favourable correlated decisions. The architecture investigated here is directly applicable to speaker verification from spoken digit strings such as credit card numbers in telephone or voice over internet protocol based applications. It is also applicable to other biometric modalities such as finger prints and handwriting samples.
Resumo:
Fusion techniques have received considerable attention for achieving performance improvement with biometrics. While a multi-sample fusion architecture reduces false rejects, it also increases false accepts. This impact on performance also depends on the nature of subsequent attempts, i.e., random or adaptive. Expressions for error rates are presented and experimentally evaluated in this work by considering the multi-sample fusion architecture for text-dependent speaker verification using HMM based digit dependent speaker models. Analysis incorporating correlation modeling demonstrates that the use of adaptive samples improves overall fusion performance compared to randomly repeated samples. For a text dependent speaker verification system using digit strings, sequential decision fusion of seven instances with three random samples is shown to reduce the overall error of the verification system by 26% which can be further reduced by 6% for adaptive samples. This analysis novel in its treatment of random and adaptive multiple presentations within a sequential fused decision architecture, is also applicable to other biometric modalities such as finger prints and handwriting samples.
Resumo:
Statistical dependence between classifier decisions is often shown to improve performance over statistically independent decisions. Though the solution for favourable dependence between two classifier decisions has been derived, the theoretical analysis for the general case of 'n' client and impostor decision fusion has not been presented before. This paper presents the expressions developed for favourable dependence of multi-instance and multi-sample fusion schemes that employ 'AND' and 'OR' rules. The expressions are experimentally evaluated by considering the proposed architecture for text-dependent speaker verification using HMM based digit dependent speaker models. The improvement in fusion performance is found to be higher when digit combinations with favourable client and impostor decisions are used for speaker verification. The total error rate of 20% for fusion of independent decisions is reduced to 2.1% for fusion of decisions that are favourable for both client and impostors. The expressions developed here are also applicable to other biometric modalities, such as finger prints and handwriting samples, for reliable identity verification.
Resumo:
Audio-visualspeechrecognition, or the combination of visual lip-reading with traditional acoustic speechrecognition, has been previously shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as that present in an automotive cabin. The research presented in this paper will extend upon the established audio-visualspeechrecognition literature to show that further improvements in speechrecognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visualspeechrecognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) are conducted on the four-camera AVICAR automotiveaudio-visualspeech database. We study the relative contribution between the side and central orientated cameras in improving visualspeechrecognition accuracy. Finally combination of the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy when compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
Resumo:
May was a particularly busy month with lots of exciting architectural things happening in Brisbane, including the sell-out 2012 National Architecture Conference. The total number of conference attendees was 1,625, which was the largest number of attendees to any Australian National Architecture Conference to date. This was the first time that the National Architecture Conference had been held in Brisbane in over 20 years, and the enormous turnout of 947 Queenslanders to the conference was testament to the positive decision to include Brisbane as a conference venue. The theme of this year’s conference was ‘experience’. Building on ideas introduced in the recent ‘natural artifice’ conference, creative directors Shane Thompson, Michael Rayner and Peter Skinner focused closely on the real, sensed experience of architecture within its natural and constructed settings and the experience of designing and making architecture. The conference attracted a variety of high profile international speakers, including architect and professor, Wang Shu, the 2012 Pritzker Architecture Prize Laureate and co-founder of the Amateur Architecture Studio in China. Other highlights included presentations from Peter Rich [South Africa], Kathryn Findlay [United Kingdom], Rachel Neeson [Australia], Anuradha Mathur & Dilip da Cunha [United States] and Kjetil Thorsen [Norway]. QUT had a strong presence at the conference. In addition to pleasing attendance rates from QUT School of Design students and staff, our Head-of-School Professor Paul Sanders, was given the honourable task of introducing keynote speaker Peter Rich, and facilitating the Q&A session after his presentation, which received a standing ovation. There were many events organised for students and young architects by QUT’s SONA reps, including a masterclass, opening party, collaborative design and construction of the SONA Pavilion, and finally, organisation of the all important SONA Hangover Breakfast, the morning after the closing party. The 2012 National Architecture Conference was truly memorable and an experience not to have been missed. I encourage anyone with a passion for architecture and a desire to be completely inspired by current and emerging leaders in our exciting profession, to start making plans to attend next year’s conference.
Resumo:
Classifier selection is a problem encountered by multi-biometric systems that aim to improve performance through fusion of decisions. A particular decision fusion architecture that combines multiple instances (n classifiers) and multiple samples (m attempts at each classifier) has been proposed in previous work to achieve controlled trade-off between false alarms and false rejects. Although analysis on text-dependent speaker verification has demonstrated better performance for fusion of decisions with favourable dependence compared to statistically independent decisions, the performance is not always optimal. Given a pool of instances, best performance with this architecture is obtained for certain combination of instances. Heuristic rules and diversity measures have been commonly used for classifier selection but it is shown that optimal performance is achieved for the `best combination performance' rule. As the search complexity for this rule increases exponentially with the addition of classifiers, a measure - the sequential error ratio (SER) - is proposed in this work that is specifically adapted to the characteristics of sequential fusion architecture. The proposed measure can be used to select a classifier that is most likely to produce a correct decision at each stage. Error rates for fusion of text-dependent HMM based speaker models using SER are compared with other classifier selection methodologies. SER is shown to achieve near optimal performance for sequential fusion of multiple instances with or without the use of multiple samples. The methodology applies to multiple speech utterances for telephone or internet based access control and to other systems such as multiple finger print and multiple handwriting sample based identity verification systems.
Resumo:
Language and Mobility is the latest monograph by Alastair Pennycook. It is part of the series, Critical Language and Literacy Studies. Co-edited by Pennycook, along with Brian Morgan and Ryuko Kubota, the series looks at relations of power in diverse worlds of language and literacy. As the title indicates, Pennycook’s own volume explores the idea of language turning up in ‘unexpected’ places, for example, Cornish in Moonta, South Australia, a century or two after it supposedly died with its last speaker. Why is it, Pennycook asks, that we expect to find a (particular form of a) language in a particular place? This question is generated by a critical project that seeks to leverage the educational potential of everyday moments of language use...