807 results for Freedom of speech.
Abstract:
As virtual communities become more central to the everyday activities of connected individuals, we face increasingly pressing questions about the proper allocation of power, rights and responsibilities. This paper argues that our current legal discourse is ill-equipped to provide answers that will safeguard the legitimate interests of participants and simultaneously refrain from limiting the future innovative development of these spaces. From social networking sites like Facebook to virtual worlds like World of Warcraft and Second Life, participants who are banned from these communities stand to lose their virtual property, their connections to their friends and family, and their personal expression. Because our legal system views the proprietor’s interests as absolute private property rights, however, participants who are arbitrarily, capriciously or maliciously ejected have little recourse under law. This paper argues that, rather than assuming that a private property and freedom of contract model will provide the most desirable outcomes, a more critical approach is warranted. By rejecting the false dichotomy between ‘public’ and ‘private’ spaces, and recognising some of the absolutist and necessitarian trends in the current property debate, we may be able to craft legal rules that respect the social bonds between participants while simultaneously protecting the interests of developers.
Abstract:
Acoustically, car cabins are extremely noisy and, as a consequence, existing audio-only speech recognition systems for voice-based control of vehicle functions, such as the GPS-based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g. lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in the AVSR field has been ongoing for the past twenty-five years, with notable progress being made. However, the practical deployment of AVSR systems in a variety of real-world applications has not yet emerged, mainly because most research to date has neglected to address variabilities in the visual domain, such as illumination and viewpoint, in the design of the visual front-end of the AVSR system. In this paper we present an AVSR system for a real-world car environment using the AVICAR database [1], a publicly available in-car database, and we show that using visual speech in conjunction with the audio modality is a better approach for improving the robustness and effectiveness of voice-only recognition systems in car cabin environments.
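Below is a minimal sketch of feature-level audio-visual fusion of the kind described above, assuming MFCC-style audio features at 100 frames/sec and lip-region visual features at 25 frames/sec; the function name, rates and dimensions are illustrative assumptions rather than the paper's actual front-end.

```python
# Minimal sketch of feature-level audio-visual fusion: visual features are
# upsampled to the audio frame rate and concatenated per frame, forming a
# single observation stream for a synchronous recogniser. Rates, names and
# dimensions are illustrative assumptions.
import numpy as np

def fuse_audio_visual(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """audio_feats: (T_a, D_a), e.g. MFCCs at 100 fps; visual_feats: (T_v, D_v),
    e.g. lip-ROI features at 25 fps. Returns a (T_a, D_a + D_v) fused matrix."""
    t_a = np.linspace(0.0, 1.0, len(audio_feats))
    t_v = np.linspace(0.0, 1.0, len(visual_feats))
    # Linearly interpolate each visual dimension onto the audio time axis.
    visual_up = np.stack(
        [np.interp(t_a, t_v, visual_feats[:, d]) for d in range(visual_feats.shape[1])],
        axis=1,
    )
    return np.concatenate([audio_feats, visual_up], axis=1)

# Example: 3 s of 13-dim audio features (300 frames) and 75 video frames of 20-dim lip features.
fused = fuse_audio_visual(np.random.randn(300, 13), np.random.randn(75, 20))
print(fused.shape)  # (300, 33)
```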
Abstract:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology have made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world's languages, primarily because ASR development is reliant on the availability of large amounts of language-specific resources. This motivates the need for techniques which reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for the rapid creation of ASR capabilities for resource-poor languages. Cross-lingual ASR emerges as a means of addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract, and accordingly is human-specific rather than language-specific. As a result, a common inventory of sounds exists across languages; a property which is exploitable, as sounds from a resource-poor target language can be recognised using models trained on resource-rich source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech; subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross-lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross-lingual techniques in two speech environments: clean read speech and conversational telephone speech. The languages used in the evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results in simpler environments such as read speech, but degrade significantly in the more taxing conversational environment. Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target-language lexical representation in terms of the source-language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal of this thesis is directed towards improving cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. In Indonesia alone there are over 200 million speakers of some Malay variant, which provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10,000-word recognition vocabulary for the Indonesian language.
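As a rough illustration of the cross-lingual idea, the sketch below re-expresses a target-language pronunciation in terms of a source-language model set via a knowledge-based phone mapping; the mapping table and the lexicon entry are hypothetical and not taken from the thesis.

```python
# Minimal sketch of knowledge-based cross-lingual phone mapping: a target-language
# pronunciation is rewritten in terms of phones for which source-language acoustic
# models already exist. The mapping table and example word are hypothetical.
TARGET_TO_SOURCE = {
    # target (e.g. Indonesian) phone -> closest phone in the source model set
    "ny": "n",
    "ng": "N",
    "@": "e",
}

def map_pronunciation(target_phones, mapping):
    # Phones without an explicit mapping are assumed to exist in the source set already.
    return [mapping.get(p, p) for p in target_phones]

# Hypothetical lexicon entry for the Indonesian word "nyanyi" (to sing).
print(map_pronunciation(["ny", "a", "ny", "i"], TARGET_TO_SOURCE))  # ['n', 'a', 'n', 'i']
```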
Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model of the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications, where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model based on the use of compressed speech are put forward.
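The following sketch illustrates, under simplifying assumptions, the coupling of product-code (split) vector quantization with a lossless stage: sub-vectors are quantized by nearest-neighbour search, and the empirical entropy of the resulting index stream indicates how much a lossless coder could still save. The codebooks, dimensions and random stand-in data are illustrative only, not the thesis's trained models.

```python
# Sketch of product-code (split) VQ followed by an entropy estimate of the
# index stream. With real spectral data the index distributions are non-uniform,
# which is what a subsequent lossless coder (e.g. arithmetic coding on these
# statistics) exploits. All data and codebooks here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((5000, 10))          # stand-in for 10-dim spectral vectors
splits = [(0, 3), (3, 6), (6, 10)]                # three sub-vectors (product-code structure)
codebooks = [rng.standard_normal((256, hi - lo)) for lo, hi in splits]  # 8 raw bits per part

def quantise(frame):
    """Return one codebook index per sub-vector (nearest-neighbour search)."""
    idx = []
    for (lo, hi), cb in zip(splits, codebooks):
        d = np.sum((cb - frame[lo:hi]) ** 2, axis=1)
        idx.append(int(np.argmin(d)))
    return idx

indices = np.array([quantise(f) for f in frames])  # (5000, 3) index stream

# Empirical entropy per index stream: the gap below 8 bits is the lossless gain.
for part in range(indices.shape[1]):
    counts = np.bincount(indices[:, part], minlength=256)
    p = counts[counts > 0] / counts.sum()
    print(f"part {part}: {-np.sum(p * np.log2(p)):.2f} bits vs 8 raw bits")
```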
Abstract:
This thesis presents an original approach to parametric speech coding at rates below 1 kbit/sec, primarily for speech storage applications. The essential processes considered in this research encompass efficient characterization of the evolutionary configuration of the vocal tract to follow phonemic features with high fidelity, representation of the speech excitation using minimal parameters with minor degradation in the naturalness of the synthesized speech, and finally, quantization of the resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between the most steady points along the time trajectories of the spectral parameters, using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, and hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving the performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event-approximated TD, near-optimal shaping of event-approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for the decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized by a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept of Instantaneous Frequency (IF) estimation in the frequency domain. The model yields an effective two-band approximation to the excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of the pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses the pitch contour by a ratio of about 1/10 with negligible error. To approximate the gain contour, a set of uniformly-distributed Gaussian event-like functions is used, which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of the statistical properties and spectral sensitivity of the spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates of 650-990 bits/sec.
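A minimal sketch of temporal-decomposition-style reconstruction is given below: a spectral-parameter trajectory is rebuilt as a weighted sum of a few event target vectors and smooth, localised event functions, so that only the targets and event locations need be coded. The raised-cosine event function and the toy data are illustrative assumptions, not the thesis's optimised basis.

```python
# Temporal-decomposition-style reconstruction: y(t) ~= sum_k phi_k(t) * a_k,
# where a_k are event target vectors and phi_k are smooth event functions.
# The event function shape and the toy data below are illustrative only.
import numpy as np

def event_function(t, centre, width):
    """Raised-cosine bump centred on an event, zero outside +/- width frames."""
    x = np.clip((t - centre) / width, -1.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * x))

def reconstruct(t, event_times, event_targets, width):
    """Rebuild the trajectory; event functions are normalised to sum to one per frame."""
    phi = np.stack([event_function(t, c, width) for c in event_times])   # (K, T)
    phi /= phi.sum(axis=0, keepdims=True) + 1e-12
    return phi.T @ event_targets                                          # (T, D)

t = np.arange(0, 100)                      # 100 frames, e.g. 10 ms each
event_times = np.array([10, 40, 70, 95])   # phonetic event centroids (frame indices)
event_targets = np.random.randn(4, 10)     # one 10-dim spectral target per event
traj = reconstruct(t, event_times, event_targets, width=30)
print(traj.shape)  # (100, 10): only 4 targets plus 4 locations need be coded
```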
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers. The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
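As a simplified illustration of dynamic sequence matching for lattice-based keyword spotting, the sketch below scores phone sequences read from a lattice against a keyword's phone sequence with a dynamic-programming edit distance, so that mildly erroneous lattice realisations can still trigger a detection; the costs, threshold and example sequences are hypothetical, not the thesis's algorithm.

```python
# Simplified dynamic sequence matching for lattice-based keyword spotting:
# lattice phone sequences within a small edit distance of the keyword's
# pronunciation are accepted as detections. Costs and examples are hypothetical.
def edit_distance(ref, hyp, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Levenshtein distance between two phone sequences (dynamic programming)."""
    d = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = i * del_cost
    for j in range(1, len(hyp) + 1):
        d[0][j] = j * ins_cost
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            match = d[i - 1][j - 1] + (0.0 if ref[i - 1] == hyp[j - 1] else sub_cost)
            d[i][j] = min(match, d[i - 1][j] + del_cost, d[i][j - 1] + ins_cost)
    return d[-1][-1]

def spot_keyword(keyword_phones, lattice_sequences, threshold=1.0):
    """Return lattice sequences whose distance to the keyword is within the threshold."""
    return [seq for seq in lattice_sequences if edit_distance(keyword_phones, seq) <= threshold]

# Hypothetical keyword "seven" and two phone sequences read from a lattice.
hits = spot_keyword(["s", "eh", "v", "ax", "n"],
                    [["s", "eh", "v", "en", "n"], ["t", "uw"]])
print(hits)  # the first sequence matches within one substitution
```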
Abstract:
Driver simulators provide safe conditions for assessing driver behaviour and offer controlled, repeatable environments for study, making them a promising research tool. There is a wide range of driver simulators, from laptop-based setups to advanced systems controlled by several computers, in which a real car is mounted on a platform with six degrees of freedom of movement. The applicability of simulator-based research to a particular study needs to be considered before the study begins, to determine whether the use of a simulator is actually appropriate for the research. Given the wide range of driver simulators and their uses, it is important to know beforehand how closely the results from a driver simulator match results found in the real world. Comparison between drivers' performance under real road conditions and in a particular simulator is a fundamental part of validation; the important question is whether the results obtained in a simulator mirror real-world results. In this paper, the results of the most recent research into the validity of driver simulators are presented.
Abstract:
The use of visual features, in the form of lip movements, to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction or multi-channel beamforming, is not known. This is an important question to be answered, especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that a further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.
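For reference, a minimal sketch of the spectral-subtraction baseline mentioned above is shown below, assuming the leading frames of the utterance are noise-only; the frame sizes, over-subtraction factor and spectral floor are illustrative assumptions rather than the settings used in the experiments.

```python
# Minimal spectral subtraction sketch: a noise magnitude spectrum is estimated
# from leading noise-only frames and subtracted from each frame's magnitude,
# with a spectral floor to limit musical noise. Parameter values are illustrative.
import numpy as np

def spectral_subtraction(x, frame_len=512, hop=256, noise_frames=10, alpha=2.0, floor=0.01):
    """Enhance a 1-D signal by per-frame magnitude subtraction of a noise estimate."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)                    # noise estimate from leading frames
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)   # over-subtraction with a floor
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    y = np.zeros(len(x))                                           # overlap-add resynthesis
    for i in range(n_frames):
        y[i * hop:i * hop + frame_len] += clean[i]
    return y

# Example on one second of synthetic noisy audio at 16 kHz.
enhanced = spectral_subtraction(np.random.randn(16000))
print(enhanced.shape)  # (16000,)
```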
Abstract:
According to the diagnosis of schizophrenia in the DSM-IV-TR (American Psychiatric Association, 2000), negative symptoms are those personal characteristics that are thought to be reduced from normal functioning, while positive symptoms are aspects of functioning that exist as an excess or distortion of normal functioning. Negative symptoms are generally considered to be a core feature of people diagnosed with schizophrenia. However, negative symptoms are not always present in those diagnosed, and a diagnosis can be made with only negative or only positive symptoms, or with a combination of both. Negative symptoms include an observed loss of emotional expression (affective flattening), loss of motivation or self-directedness (avolition), loss of speech (alogia), and a loss of interests and pleasures (anhedonia). Positive symptoms include the perception of things that others do not perceive (hallucinations) and extraordinary explanations for ordinary events (delusions) (American Psychiatric Association, 2000). Both negative and positive symptoms are derived from observing the patient and thus do not take into account the patient's subjective experience. However, aspects of the negative symptoms, such as observed affective flattening, are highly contested. Within conventional psychiatry, the absence of emotional expression is assumed to coincide with an absence of emotional experience. Contrasting research findings suggest that patients who were observed to score low on displayed emotional expression scored high on self-ratings of emotional experience. Patients were also observed to be significantly lower on emotional expression when compared with others (Aghevli, Blanchard, & Horan, 2003; Selton, van der Bosch, & Sijben, 1998). It appears that there is little correlation between emotional experience and emotional expression in patients, and that observer ratings cannot help us to understand the subjective experience of the negative symptoms. This chapter will focus on research into the subjective experiences of the negative symptoms. A framework for these experiences is drawn from the qualitative research findings of the primary author (Le Lievre, 2010). In that study, the primary author found that subjective experiences of the negative symptoms belonged to one of two phases of the illness experience: "transitioning into emotional shutdown" or "recovering from emotional shutdown". This chapter will use the six themes from the phase of "transitioning into emotional shutdown". This phase described the experience of turning the focus of attention away from the world and onto the self and the past, thus losing contact with the world and others (emotional shutdown). Transitioning into emotional shutdown involved "not being acknowledged", "relational confusion", "not being expressive", "reliving the past", "detachment", and "no sense of direction" (Le Lievre, 2010). Detail will be added to this framework of experience from other qualitative research in this area. We will now review the six themes that constitute a "transition into emotional shutdown" and the corresponding previous research findings.
Abstract:
In Bowenbrae Pty Ltd v Flying Fighters Maintenance and Restoration [2010] QDC 347 Reid DCJ made orders requiring the plaintiffs to make application under the Freedom of Information Act 1982 (Cth) (“the FOI Act”) for documents sought by the defendant.
Abstract:
Over the last decade, ionic liquids (ILs) have been used for the dissolution and derivatization of isolated cellulose. This ability of ILs is now sought for application in the selective dissolution of cellulose from lignocellulosic biomass, for the manufacture of cellulosic ethanol. However, there are significant knowledge gaps in the understanding of the chemistry of the interaction of biomass and ILs. While imidazolium ILs have been used successfully to dissolve both isolated crystalline cellulose and components of lignocellulosic biomass, phosphonium ILs have not been sufficiently explored for use in the dissolution of lignocellulosic biomass. This thesis reports on a study of the chemistry of sugarcane bagasse with phosphonium ILs. Qualitative and quantitative measurements of the biomass components dissolved in the phosphonium ionic liquids (ILs) trihexyltetradecylphosphonium chloride ([P66614]Cl) and tributylmethylphosphonium methylsulphate ([P4441]MeSO4) are obtained using attenuated total reflectance-Fourier transform infrared (FTIR) spectroscopy. Absorption bands related to cellulose, hemicellulose and lignin dissolution, monitored in situ in biomass-IL mixtures, indicate lignin dissolution in both ILs and some holocellulose dissolution in the hydrophilic [P4441]MeSO4. The kinetics of lignin dissolution reported here indicate that while dissolution in the hydrophobic IL [P66614]Cl appears to follow an accepted mechanism of acid-catalysed β-aryl ether cleavage, dissolution in the hydrophilic IL [P4441]MeSO4 does not appear to follow this mechanism and may not be followed by condensation reactions (initiated by reactive ketones). The quantitative measurement of lignin dissolution in phosphonium ILs, based on absorbance at 1510 cm-1, has demonstrated utility and greater precision than the conventional Klason lignin method. The cleavage of lignin β-aryl ether bonds in sugarcane bagasse by the ionic liquid [P66614]Cl, in the presence of catalytic amounts of mineral acid (ca. 0.4 %), is also examined. The delignification process of bagasse is studied over a range of temperatures (120 °C to 150 °C) by monitoring the production of β-ketones (indicative of cleavage of β-aryl ethers) using FTIR spectroscopy and by compositional analysis of the undissolved fractions. Maximum delignification is obtained at 150 °C, with 52 % of the lignin removed from the original lignin content of bagasse. No delignification is observed in the absence of acid, which suggests that the reaction is acid catalysed, with the IL solubilising the lignin fragments. The rate of delignification was significantly higher at 150 °C, suggesting that crossing the glass transition temperature of lignin effects greater freedom of rotation about the propanoid carbon-carbon bonds and leads to increased cleavage of β-aryl ethers. An attempt has been made to propose a probable mechanism for the delignification of bagasse with the phosphonium IL. All polymeric components of bagasse, a lignocellulosic biomass, dissolve in the hydrophilic ionic liquid (IL) tributylmethylphosphonium methylsulfate ([P4441]MeSO4), with and without a catalytic amount of acid (H2SO4, ca. 0.4 %). The presence of acid significantly increases the extent of dissolution of bagasse in [P4441]MeSO4 (by ca. 2.5 times under the conditions used here). The dissolved fractions can be partially recovered by the addition of an antisolvent (water) and are significantly enriched in lignin.
Unlike acid-catalysed dissolution in the hydrophobic IL trihexyltetradecylphosphonium chloride, there is little evidence of cleavage of the β-aryl ether bonds of lignin dissolving in [P4441]MeSO4 (with and without acid), but this mechanism may play some role in the acid-catalysed dissolution. XRD of the undissolved fractions suggests that the IL may selectively dissolve the amorphous cellulose component, leaving behind crystalline material.
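A hedged sketch of the kind of quantitative lignin estimate described above (absorbance of the 1510 cm-1 band against a linear, Beer-Lambert style calibration) follows; the calibration points are invented for illustration and are not the thesis's measured values.

```python
# Illustrative linear calibration of lignin content against the 1510 cm^-1
# FTIR absorbance band. The calibration points below are invented stand-ins,
# not measured data from the thesis.
import numpy as np

# Hypothetical calibration set: (absorbance at 1510 cm^-1, lignin wt %).
calib_absorbance = np.array([0.05, 0.12, 0.20, 0.31, 0.40])
calib_lignin_pct = np.array([4.0, 9.5, 15.8, 24.7, 31.9])

slope, intercept = np.polyfit(calib_absorbance, calib_lignin_pct, 1)

def lignin_content(absorbance_1510):
    """Predict lignin (wt %) of a fraction from its 1510 cm^-1 band absorbance."""
    return slope * absorbance_1510 + intercept

print(f"A = 0.25 -> {lignin_content(0.25):.1f} % lignin (illustrative calibration)")
```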
Abstract:
This article reports on the cleavage of lignin β-aryl ether bonds in sugarcane bagasse by the ionic liquid (IL) trihexyltetradecylphosphonium chloride, [P66614]Cl, in the presence of catalytic amounts of mineral acid (ca. 0.4%). The delignification process of bagasse was studied over a range of temperatures (120°C to 150°C) by monitoring the production of β-ketones (indicative of cleavage of β-aryl ethers) using FTIR spectroscopy and by compositional analysis of the undissolved fractions. Maximum delignification was obtained at 150°C, with 52% of lignin removed from the original lignin content of bagasse. No delignification was observed in the absence of acid, which suggests that the reaction is acid catalyzed, with the IL solubilizing the lignin fragments. The rate of delignification was significantly higher at 150°C, suggesting that crossing the glass transition temperature of lignin effects greater freedom of rotation about the propanoid carbon-carbon bonds and leads to increased cleavage of β-aryl ethers. An attempt has been made to propose a probable mechanism of delignification of bagasse with the phosphonium IL.
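As an illustration of how the temperature dependence reported above could be summarised, the sketch below fits hypothetical pseudo-first-order delignification rate constants to an Arrhenius expression; all numerical values are invented assumptions, not data from the article.

```python
# Illustrative Arrhenius fit, k = A * exp(-Ea / (R * T)), for delignification
# rate constants measured at several temperatures in the 120-150 C range.
# All rate constants below are invented for illustration.
import numpy as np

R = 8.314  # gas constant, J mol^-1 K^-1
temps_K = np.array([120.0, 130.0, 140.0, 150.0]) + 273.15
k = np.array([0.002, 0.004, 0.009, 0.020])        # hypothetical rate constants, min^-1

# Linear fit of ln k against 1/T: the slope is -Ea/R.
slope, intercept = np.polyfit(1.0 / temps_K, np.log(k), 1)
Ea = -slope * R / 1000.0                           # apparent activation energy, kJ/mol
print(f"Apparent activation energy ~ {Ea:.0f} kJ/mol (illustrative values)")
```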
Abstract:
The development of the creative industries has been connected to urban development since the end of the 20th century. However, the question of why creative industries always cluster and develop in certain cities hasn't been adequately answered, especially as to how various resources grow, interact and nurture the creative capacity of the locality. It is therefore vital to observe how the local institutional environment nurtures creative industries, and how creative industries consequently change that environment, in order to better address the connection between creative industries and localities. In Beijing, the relocation of CCTV, BTV and Phoenix to Chaoyang District raises the possibility of a new era for Chinese media, one in which the stodginess of propaganda content will give way to exciting new forms and genres. The mixing of media companies in an open commercial environment (away from the political power district of Xicheng) holds the promise of more freedom of expression and, ultimately, of a 'media capital' (Curtin, 2003). These are the dreams of many media practitioners in Beijing. But just how realistic are their expectations? This study adopts the concept of 'media capital' to demonstrate how participants, including state media organisations, private media companies and international media conglomerates, are seeking out space and networks to survive in Beijing. Drawing on policy analysis, interviews and case studies, this study illustrates how different agents meet, confront and adapt in Beijing. It identifies the factors responsible for media industries clustering in China, and argues that Beijing is very likely to be the next Chinese media capital, after sufficient accumulation and development, although as a lower-tier version compared with other media capitals in the world. This study contributes to Curtin's 'media capital' concept, develops his interpretation of the relationship between media industries and government, and suggests that the influence of media companies and professionals over the government should be acknowledged. Empirically, this study assists media practitioners in understanding how the Chinese government perceives the media industries and, consequently, how media industries operate in China. The study also reveals that, despite the government's aspirations, China's media industries are still greatly constrained by institutional obstacles. Hence Beijing really needs to speed up its pace on the path of media reform, abandon the old mindset and create more room for creativity. Policy-makers in China should keep in mind that the only choice left to them is to further the reform.