173 resultados para speech enhancement


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Road and highway infrastructure provides the backbone for a nation's economic growth. The versatile dispersion of population in Australia, from sparsely settled communities in remote areas to regenerated inner city suburbs with high density living in metropolitans, calls for continuing development and improvement on roads infrastructure under the current federal government policies and state governments' strategic plans. As road infrastructure projects involve large resources and mechanism, achieving sustainability not only in economic scales but also through environmental and social responsibility becomes a crucial issue. Current efforts are often impeded by different interpretation on sustainability agenda by stakeholders involved in these types of projects. As a result, sustainability deliverables at the project level is not often as transparent and measurable, compared to promises in project briefs and designs. This paper reviews the past studies on sustainable infrastructure construction, focusing on roads and highway projects. Through literature study and consultation with the industry, key sustainability indicators specific to road infrastructure projects have been identified. Based on these findings, this paper introduces an on-going research project aimed at identifying and integrating the different perceptions and priority needs of the stakeholders, and issues that impact on the gap between sustainability foci and its actual realization at project end level. The exploration helps generate an integrated decision-making model for sustainable road infrastructure projects. The research will promote to the industry more systematic and integrated approaches to decision-making on the implementation of sustainability strategies to achieve deliverable goals throughout the development and delivery process of road infrastructure projects in Australia.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cardiovascular assist devices are tested in mock circulation loops (MCLs) prior to animal and clinical testing. These MCLs rely on characteristics such as pneumatic parameters to create pressure and flow, and pipe dimensions to replicate the resistance, compliance and fluid inertia of the natural cardiovascular system. A mathematical simulation was developed in SIMULINK to simulate an existing MCL. Model validation was achieved by applying the physical MCL characteristics to the simulation and comparing the resulting pressure traces. These characteristics were subsequently altered to improve and thus predict the performance of a more accurate physical system. The simulation was successful in simulating the physical mock circulation loop, and proved to be a useful tool in the development of improved cardiovascular device test rigs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Accurate road lane information is crucial for advanced vehicle navigation and safety applications. With the increasing of very high resolution (VHR) imagery of astonishing quality provided by digital airborne sources, it will greatly facilitate the data acquisition and also significantly reduce the cost of data collection and updates if the road details can be automatically extracted from the aerial images. In this paper, we proposed an effective approach to detect road lanes from aerial images with employment of the image analysis procedures. This algorithm starts with constructing the (Digital Surface Model) DSM and true orthophotos from the stereo images. Next, a maximum likelihood clustering algorithm is used to separate road from other ground objects. After the detection of road surface, the road traffic and lane lines are further detected using texture enhancement and morphological operations. Finally, the generated road network is evaluated to test the performance of the proposed approach, in which the datasets provided by Queensland department of Main Roads are used. The experiment result proves the effectiveness of our approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Secondary tasks such as cell phone calls or interaction with automated speech dialog systems (SDSs) increase the driver’s cognitive load as well as the probability of driving errors. This study analyzes speech production variations due to cognitive load and emotional state of drivers in real driving conditions. Speech samples were acquired from 24 female and 17 male subjects (approximately 8.5 h of data) while talking to a co-driver and communicating with two automated call centers, with emotional states (neutral, negative) and the number of necessary SDS query repetitions also labeled. A consistent shift in a number of speech production parameters (pitch, first format center frequency, spectral center of gravity, spectral energy spread, and duration of voiced segments) was observed when comparing SDS interaction against co-driver interaction; further increases were observed when considering negative emotion segments and the number of requested SDS query repetitions. A mel frequency cepstral coefficient based Gaussian mixture classifier trained on 10 male and 10 female sessions provided 91% accuracy in the open test set task of distinguishing co-driver interactions from SDS interactions, suggesting—together with the acoustic analysis—that it is possible to monitor the level of driver distraction directly from their speech.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Purpose: The classic study of Sumby and Pollack (1954, JASA, 26(2), 212-215) demonstrated that visual information aided speech intelligibility under noisy auditory conditions. Their work showed that visual information is especially useful under low signal-to-noise conditions where the auditory signal leaves greater margins for improvement. We investigated whether simulated cataracts interfered with the ability of participants to use visual cues to help disambiguate the auditory signal in the presence of auditory noise. Methods: Participants in the study were screened to ensure normal visual acuity (mean of 20/20) and normal hearing (auditory threshold ≤ 20 dB HL). Speech intelligibility was tested under an auditory only condition and two visual conditions: normal vision and simulated cataracts. The light scattering effects of cataracts were imitated using cataract-simulating filters. Participants wore blacked-out glasses in the auditory only condition and lens-free frames in the normal auditory-visual condition. Individual sentences were spoken by a live speaker in the presence of prerecorded four-person background babble set to a speech-to-noise ratio (SNR) of -16 dB. The SNR was determined in a preliminary experiment to support 50% correct identification of sentence under the auditory only conditions. The speaker was trained to match the rate, intensity and inflections of a prerecorded audio track of everyday speech sentences. The speaker was blind to the visual conditions of the participant to control for bias.Participants’ speech intelligibility was measured by comparing the accuracy of their written account of what they believed the speaker to have said to the actual spoken sentence. Results: Relative to the normal vision condition, speech intelligibility was significantly poorer when participants wore simulated catarcts. Conclusions: The results suggest that cataracts may interfere with the acquisition of visual cues to speech perception.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory.----- Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions and the emphasis of both positive and negative outcomes allows the benefits and pitfalls of design decisions to become apparent.----- A brief description of the Court is given, describing the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: Front of house, where the courtroom itself is, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems.----- Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic.----- The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With increasing pressure to provide environmentally responsible infrastructure products and services, stakeholders are putting significant foci on the early identification of financial viability and outcome of infrastructure projects. Traditionally, there has been an imbalance between sustainable measures and project budget. On one hand, the industry tends to employ the first-cost mentality and approach to developing infrastructure projects. On the other, environmental experts and technology innovators often push for the ultimately green products and systems without much of a concern for cost. This situation is being quickly changed as the industry is under pressure to continue to return profit, while better adapting to current and emerging global issues of sustainability. For the infrastructure sector to contribute to sustainable development, it will need to increase value and efficiency. Thus, there is a great need for tools that will enable decision makers evaluate competing initiatives and identify the most sustainable approaches to procuring infrastructure projects. In order to ensure that these objectives are achieved, the concept of life-cycle costing analysis (LCCA) will play significant roles in the economics of an infrastructure project. Recently, a few research initiatives have applied the LCCA models for road infrastructure that focused on the traditional economics of a project. There is little coverage of life-cycle costing as a method to evaluate the criteria and assess the economic implications of pursuing sustainability in road infrastructure projects. To rectify this problem, this paper reviews the theoretical basis of previous LCCA models before discussing their inability to determinate the sustainability indicators in road infrastructure project. It then introduces an on-going research aimed at developing a new model to integrate the various new cost elements based on the sustainability indicators with the traditional and proven LCCA approach. It is expected that the research will generate a working model for sustainability based life-cycle cost analysis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acoustically, car cabins are extremely noisy and as a consequence audio-only, in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem by using audio visual automatic speech recognition (AVASR). However, implementing AVASR requires a system being able to accurately locate and track the drivers face and lip area in real-time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola- Jones approach is a suitable method of locating and tracking the driver’s lips despite the visual variability of illumination and head pose for audio-visual speech recognition system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Non-driving related cognitive load and variations of emotional state may impact a driver’s capability to control a vehicle and introduces driving errors. Availability of reliable cognitive load and emotion detection in drivers would benefit the design of active safety systems and other intelligent in-vehicle interfaces. In this study, speech produced by 68 subjects while driving in urban areas is analyzed. A particular focus is on speech production differences in two secondary cognitive tasks, interactions with a co-driver and calls to automated spoken dialog systems (SDS), and two emotional states during the SDS interactions - neutral/negative. A number of speech parameters are found to vary across the cognitive/emotion classes. Suitability of selected cepstral- and production-based features for automatic cognitive task/emotion classification is investigated. A fusion of GMM/SVM classifiers yields an accuracy of 94.3% in cognitive task and 81.3% in emotion classification.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recognition systems, for voice-based control of vehicle functions such as the GPS based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (eg: lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in AVASR field has been ongoing for the past twenty-five years with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is due to most research to date neglecting to address variabilities in the visual domain such as illumination and viewpoint in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is publicly available in-car database and we show that the use of visual speech conjunction with the audio modality is a better approach to improve the robustness and effectiveness of voice-only recognition systems in car cabin environments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

What really changed for Australian Aboriginal and Torres Strait Islander people between Paul Keating’s Redfern Park Speech (Keating 1992) and Kevin Rudd’s Apology to the stolen generations (Rudd 2008)? What will change between the Apology and the next speech of an Australian Prime Minister? The two speeches were intricately linked, and they were both personal and political. But do they really signify change at the political level? This paper reflects my attempt to turn the gaze away from Aboriginal and Torres Strait Islander people, and back to where the speeches originated: the Australian Labor Party (ALP). I question whether the changes foreshadowed in the two speeches – including changes by the Australian public and within Australian society – are evident in the internal mechanisms of the ALP. I also seek to understand why non-Indigenous women seem to have given in to the existing ways of the ALP instead of challenging the status quo which keeps Aboriginal and Torres Strait Islander peoples marginalised. I believe that, without a thorough examination and a change in the ALP’s practices, the domination and subjugation of Indigenous peoples will continue – within the Party, through the Australian political process and, therefore, through governments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology has made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language specific resources. This motivates the need for techniques which reduce this language-specific, resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource poor languages. Cross Lingual ASR emerges as a means for addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract, and accordingly, is human, and not language specific. As a result, a common inventory of sounds exists across languages; a property which is exploitable, as sounds from a resource poor, target language can be recognised using models trained on resource rich, source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments has gained consumer confidence. Pragmatically, if cross lingual techniques are to considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross lingual techniques using two speech environments; clean read speech and conversational telephone speech. Languages used in evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly when in the more taxing conversational environment. Two separate approaches for addressing this degradation are proposed. The first is based on deriving better target language lexical representation, in terms of the source language model set. The second, and ultimately more successful approach, focuses on improving the classification accuracy of context-dependent (CD) models, by catering for the adverse influence of languages specific phonotactic properties. Whilst the primary research goal in this thesis is directed towards improving cross lingual techniques, the catalyst for investigating its use was based on expressed interest from several organisations for an Indonesian ASR capability. In Indonesia alone, there are over 200 million speakers of some Malay variant, provides further impetus and commercial justification for speech related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10000 word recognition vocabulary for the Indonesian language.