97 resultados para Noisy corpora.
em Queensland University of Technology - ePrints Archive
Resumo:
The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of back- ground noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean, speech corpora to be mixed with a wide variety of noise conditions, environmental reverberant responses, and signal-to-noise ratios, this protocol provides a solid basis for the development, evaluation and benchmarking of robust speaker recognition algorithms, and is freely available to download alongside the QUT-NOISE database. In this work, we use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art PLDA i-vector speaker recognition system, demonstrating the importance of designing voice-activity-detection front-ends specifically for speaker recognition, rather than aiming for perfect coherence with the true speech/non-speech boundaries.
Resumo:
Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but such approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks on the other hand, optimise the parameters of speech enhancement algorithms based on state sequences generated by a speech recogniser for utterances of known transcriptions. Previous applications of LIMA frameworks have generated a set of global enhancement parameters for all model states without taking in account the distribution of model occurrence, making optimisation susceptible to favouring frequently occurring models, in particular silence. In this paper, we demonstrate the existence of highly disproportionate phonetic distributions on two corpora with distinct speech tasks, and propose to normalise the influence of each phone based on a priori occurrence probabilities. Likelihood analysis and speech recognition experiments verify this approach for improving ASR performance in noisy environments.
Resumo:
We introduce a genetic programming (GP) approach for evolving genetic networks that demonstrate desired dynamics when simulated as a discrete stochastic process. Our representation of genetic networks is based on a biochemical reaction model including key elements such as transcription, translation and post-translational modifications. The stochastic, reaction-based GP system is similar but not identical with algorithmic chemistries. We evolved genetic networks with noisy oscillatory dynamics. The results show the practicality of evolving particular dynamics in gene regulatory networks when modelled with intrinsic noise.
Resumo:
Recently, vision-based systems have been deployed in professional sports to track the ball and players to enhance analysis of matches. Due to their unobtrusive nature, vision-based approaches are preferred to wearable sensors (e.g. GPS or RFID sensors) as it does not require players or balls to be instrumented prior to matches. Unfortunately, in continuous team sports where players need to be tracked continuously over long-periods of time (e.g. 35 minutes in field-hockey or 45 minutes in soccer), current vision-based tracking approaches are not reliable enough to provide fully automatic solutions. As such, human intervention is required to fix-up missed or false detections. However, in instances where a human can not intervene due to the sheer amount of data being generated - this data can not be used due to the missing/noisy data. In this paper, we investigate two representations based on raw player detections (and not tracking) which are immune to missed and false detections. Specifically, we show that both team occupancy maps and centroids can be used to detect team activities, while the occupancy maps can be used to retrieve specific team activities. An evaluation on over 8 hours of field hockey data captured at a recent international tournament demonstrates the validity of the proposed approach.
Resumo:
Techniques to improve the automated analysis of natural and spontaneous facial expressions have been developed. The outcome of the research has applications in several fields including national security (eg: expression invariant face recognition); education (eg: affect aware interfaces); mental and physical health (eg: depression and pain recognition).
Resumo:
Traffic state estimation in an urban road network remains a challenge for traffic models and the question of how such a network performs remains a difficult one to answer for traffic operators. Lack of detailed traffic information has long restricted research in this area. The introduction of Bluetooth into the automotive world presented an alternative that has now developed to a stage where large-scale test-beds are becoming available, for traffic monitoring and model validation purposes. But how much confidence should we have in such data? This paper aims to give an overview of the usage of Bluetooth, primarily for the city-scale management of urban transport networks, and to encourage researchers and practitioners to take a more cautious look at what is currently understood as a mature technology for monitoring travellers in urban environments. We argue that the full value of this technology is yet to be realised, for the analytical accuracies peculiar to the data have still to be adequately resolved.
Resumo:
Due to their unobtrusive nature, vision-based approaches to tracking sports players have been preferred over wearable sensors as they do not require the players to be instrumented for each match. Unfortunately however, due to the heavy occlusion between players, variation in resolution and pose, in addition to fluctuating illumination conditions, tracking players continuously is still an unsolved vision problem. For tasks like clustering and retrieval, having noisy data (i.e. missing and false player detections) is problematic as it generates discontinuities in the input data stream. One method of circumventing this issue is to use an occupancy map, where the field is discretised into a series of zones and a count of player detections in each zone is obtained. A series of frames can then be concatenated to represent a set-play or example of team behaviour. A problem with this approach though is that the compressibility is low (i.e. the variability in the feature space is incredibly high). In this paper, we propose the use of a bilinear spatiotemporal basis model using a role representation to clean-up the noisy detections which operates in a low-dimensional space. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector and compare it to manually labeled data.
Resumo:
This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: To what degree can corpus-based word association networks (CANs) approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious, time consuming and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms were used as baseline human data. Two corpus based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural networks analysis revealed that CANs do approximate the FANs to an encouraging degree.
Resumo:
Peer to peer systems have been widely used in the internet. However, most of the peer to peer information systems are still missing some of the important features, for example cross-language IR (Information Retrieval) and collection selection / fusion features. Cross-language IR is the state-of-art research area in IR research community. It has not been used in any real world IR systems yet. Cross-language IR has the ability to issue a query in one language and receive documents in other languages. In typical peer to peer environment, users are from multiple countries. Their collections are definitely in multiple languages. Cross-language IR can help users to find documents more easily. E.g. many Chinese researchers will search research papers in both Chinese and English. With Cross-language IR, they can do one query in Chinese and get documents in two languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in crosslanguage information retrieval. In recent years, web mining was shown to be one of the effective approaches to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from the web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in web mining based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score based merging strategies are well-known approaches to solve such problems. However, such approaches only consider the content of the remote database but do not consider the retrieval performance of the remote search engine. This thesis presents research on building a peer to peer IR system with crosslanguage IR and advance collection profiling technique for fusion features. Particularly, this thesis first presents a new Chinese term measurement and new Chinese MLU extraction process that works well on small corpora. An approach to selection of MLUs in a more accurate manner is also presented. After that, this thesis proposes a collection profiling strategy which can discover not only collection content but also retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented in this thesis. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer to peer environments. Here, an uncooperative environment is defined as each peer in the system is autonomous. Peer like to share documents but they do not share collection statistics. This environment is a typical peer to peer IR environment. Finally, all those approaches are grouped together to build up a secure peer to peer multilingual IR system that cooperates through X.509 and email system.
Resumo:
Perceptual aliasing makes topological navigation a difficult task. In this paper we present a general approach for topological SLAM~(simultaneous localisation and mapping) which does not require motion or odometry information but only a sequence of noisy measurements from visited places. We propose a particle filtering technique for topological SLAM which relies on a method for disambiguating places which appear indistinguishable using neighbourhood information extracted from the sequence of observations. The algorithm aims to induce a small topological map which is consistent with the observations and simultaneously estimate the location of the robot. The proposed approach is evaluated using a data set of sonar measurements from an indoor environment which contains several similar places. It is demonstrated that our approach is capable of dealing with severe ambiguities and, and that it infers a small map in terms of vertices which is consistent with the sequence of observations.
Resumo:
Many contemporary currents in applied linguistics have favored discourse studies within assessment; there have been calls for cross-fertilization with other areas within applied linguistics, critiques of the positivist tradition within language testing research, and the growing impact of Conversation Analysis (CA) and sociocultural theory. This chapter focuses on the resulting increase in discourse-based studies of oral proficiency assessment techniques. These studies initially focused on the traditional oral proficiency interview but have since been extended to new test formats, including paired and group interaction. We discuss the research carried out on a number of factors in the assessment setting, including the role of the interlocutor, candidate, and rater, and the impact of tasks, task performance conditions, and rating criteria. Recent research has also concentrated more specifically on the assessment of pragmatic competence and on the applications of technology within the assessment of spoken language, including the comparability of semidirect and direct methods for such assessment and the use of computer corpora.