941 results for Gaussian
Abstract:
Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assigns relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts, including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracy. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain, and the systems developed throughout this work are trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains, including telephone conversations and meeting audio.
Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed to govern the detection and selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single-threshold-based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, were shown to generally outperform their non-Bayesian counterparts.
Abstract:
A quasi-maximum likelihood procedure for estimating the parameters of multi-dimensional diffusions is developed in which the transitional density is a multivariate Gaussian density with first and second moments approximating the true moments of the unknown density. For affine drift and diffusion functions, the moments are exactly those of the true transitional density and for nonlinear drift and diffusion functions the approximation is extremely good and is as effective as alternative methods based on likelihood approximations. The estimation procedure generalises to models with latent factors. A conditioning procedure is developed that allows parameter estimation in the absence of proxies.
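As a rough illustration of the Gaussian quasi-likelihood idea, the sketch below evaluates the log-likelihood of a sampled path when each transition is treated as Gaussian with the model's conditional mean and variance. It is a simplification under stated assumptions, not the procedure of the paper: the process is scalar rather than multi-dimensional, the drift is affine and the diffusion constant (an Ornstein-Uhlenbeck process, for which the Gaussian moments are exact), and the names `ou_quasi_loglik` and `simulate_ou` are hypothetical.

```python
import numpy as np

def ou_quasi_loglik(params, x, dt):
    """Gaussian (quasi-)log-likelihood of a path x sampled every dt.

    Assumed model (not taken from the abstract):
        dX = kappa * (theta - X) dt + sigma dW
    """
    kappa, theta, sigma = params
    decay = np.exp(-kappa * dt)
    # Conditional mean and variance of X[t+dt] given X[t].
    mean = theta + (x[:-1] - theta) * decay
    var = sigma**2 * (1.0 - decay**2) / (2.0 * kappa)
    resid = x[1:] - mean
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + resid**2 / var)))

def simulate_ou(kappa, theta, sigma, x0, dt, n, seed=0):
    """Draw a path of n steps using the exact Gaussian transition."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-kappa * dt)
    sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * kappa))
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = theta + (x[i] - theta) * decay + sd * rng.standard_normal()
    return x
```

Passing the negative of `ou_quasi_loglik` to a numerical optimiser would give parameter estimates; for nonlinear drift and diffusion functions the two moment lines would be replaced by approximations of the true moments.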
Abstract:
This paper develops analytical distributions of temperature indices on which temperature derivatives are written. If the deviations of daily temperatures from their expected values are modelled as an Ornstein-Uhlenbeck process with time-varying variance, then the temperature index on which the derivative is written is distributed as the sum of truncated, correlated Gaussian deviates. The key result of this paper is to provide an analytical approximation to the distribution of this sum, thus allowing the accurate computation of payoffs without the need for any simulation. A data set comprising average daily temperature spanning over a hundred years for four Australian cities is used to demonstrate the efficacy of this approach for estimating the payoffs to temperature derivatives. It is demonstrated that computing expected payoffs directly from historical records is a particularly poor approach to the problem when there are trends in underlying average daily temperature. It is shown that the proposed analytical approach is superior to historical pricing.
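To make the phrase "sum of truncated Gaussian deviates" concrete, the sketch below computes only the expected value of such a sum, which is far less than the full distribution the paper approximates. It assumes a degree-day style index of the form sum over days of max(threshold - T, 0), with each daily temperature T Gaussian; the abstract does not specify the index, so this form, the function names, and the example values are all assumptions. The mean of the sum does not depend on the correlation between days, which is why the days can be treated one at a time here.

```python
import math

def expected_positive_part(threshold, mean, std):
    """E[max(threshold - T, 0)] for T ~ Normal(mean, std**2)."""
    z = (threshold - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (threshold - mean) * cdf + std * pdf

def expected_degree_day_index(threshold, means, stds):
    """Expected index: the per-day expectations simply add up."""
    return sum(
        expected_positive_part(threshold, m, s) for m, s in zip(means, stds)
    )
```

For instance, `expected_degree_day_index(18.0, [12.0, 15.0, 21.0], [2.0, 2.5, 3.0])` would give the expected index over three days with hypothetical daily means and standard deviations.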
Abstract:
A significant amount of speech data is required to develop a robust speaker verification system, but it is difficult to find enough development speech to match all expected conditions. In this paper we introduce a new approach to Gaussian probabilistic linear discriminant analysis (GPLDA) to estimate reliable model parameters as a linearly weighted model taking more input from the large volume of available telephone data and smaller proportional input from limited microphone data. In comparison to a traditional pooled training approach, where the GPLDA model is trained over both telephone and microphone speech, this linear-weighted GPLDA approach is shown to provide better EER and DCF performance in microphone and mixed conditions in both the NIST 2008 and NIST 2010 evaluation corpora. Based upon these results, we believe that linear-weighted GPLDA will provide a better approach than pooled GPLDA, allowing for the further improvement of GPLDA speaker verification in conditions with limited development data.
Abstract:
Automated crowd counting has become an active field of computer vision research in recent years. Existing approaches are scene-specific, as they are designed to operate in the single camera viewpoint that was used to train the system. Real world camera networks often span multiple viewpoints within a facility, including many regions of overlap. This paper proposes a novel scene invariant crowd counting algorithm that is designed to operate across multiple cameras. The approach uses camera calibration to normalise features between viewpoints and to compensate for regions of overlap. This compensation is performed by constructing an 'overlap map' which provides a measure of how much an object at one location is visible within other viewpoints. An investigation into the suitability of various feature types and regression models for scene invariant crowd counting is also conducted. The features investigated include object size, shape, edges and keypoints. The regression models evaluated include neural networks, K-nearest neighbours, linear and Gaussian process regression. Our experiments demonstrate that accurate crowd counting was achieved across seven benchmark datasets, with optimal performance observed when all features were used and when Gaussian process regression was used. The combination of scene invariance and multi camera crowd counting is evaluated by training the system on footage obtained from the QUT camera network and testing it on three cameras from the PETS 2009 database. Highly accurate crowd counting was observed with a mean relative error of less than 10%. Our approach enables a pre-trained system to be deployed on a new environment without any additional training, bringing the field one step closer toward a 'plug and play' system.
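For readers unfamiliar with Gaussian process regression, the following is a minimal generic sketch of how a predictive mean and variance are obtained from training pairs. It is not the paper's system: the squared-exponential kernel, the hyperparameter values, and the function names `rbf_kernel` and `gp_predict` are assumptions chosen for illustration, and in a crowd counting setting the inputs would be normalised image features and the targets crowd counts.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and variance of the latent function at x_test."""
    k = rbf_kernel(x_train, x_train, **kernel_args)
    k = k + noise_var * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_test, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)
    chol = np.linalg.cholesky(k)
    # alpha = K^{-1} y, computed with two triangular solves.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)
    return mean, var
```

The predictive variance is what distinguishes this model from the other regressors listed above: far from the training inputs it returns to the prior variance, signalling that the estimate is unreliable.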
Abstract:
Our results demonstrate that photorefractive residual amplitude modulation (RAM) noise in electro-optic modulators (EOMs) can be reduced by modifying the incident beam intensity distribution. Here we report an order of magnitude reduction in RAM when beams with uniform intensity (flat-top) profiles, generated with an LCOS-SLM, are used instead of the usual fundamental Gaussian mode (TEM00). RAM arises from the photorefractive amplified scatter noise off the defects and impurities within the crystal. A reduction in RAM is observed with increasing intensity uniformity (flatness), which is attributed to a reduction in space charge field on the beam axis. The level of RAM reduction that can be achieved is physically limited by clipping at EOM apertures, with the observed results agreeing well with a simple model. These results are particularly important in applications where the reduction of residual amplitude modulation to 10^-6 is essential.
Abstract:
Discretization of a geographical region is quite common in spatial analysis. There have been few studies into the impact of different geographical scales on the outcome of spatial models for different spatial patterns. This study aims to investigate the impact of spatial scales and spatial smoothing on the outcomes of modelling spatial point-based data. Given a spatial point-based dataset (such as occurrence of a disease), we study the geographical variation of residual disease risk using regular grid cells. The individual disease risk is modelled using a logistic model with the inclusion of spatially unstructured and/or spatially structured random effects. Three spatial smoothness priors for the spatially structured component are employed in modelling, namely an intrinsic Gaussian Markov random field, a second-order random walk on a lattice, and a Gaussian field with Matern correlation function. We investigate how changes in grid cell size affect model outcomes under different spatial structures and different smoothness priors for the spatial component. A realistic example (the Humberside data) is analyzed and a simulation study is described. Bayesian computation is carried out using an integrated nested Laplace approximation. The results suggest that the performance and predictive capacity of the spatial models improve as the grid cell size decreases for certain spatial structures. It also appears that different spatial smoothness priors should be applied for different patterns of point data.
Abstract:
The huge amount of CCTV footage available makes it very burdensome to process these videos manually through human operators. This has made automated processing of video footage through computer vision technologies necessary. During the past several years, there has been a large effort to detect abnormal activities through computer vision techniques. Typically, the problem is formulated as a novelty detection task where the system is trained on normal data and is required to detect events which do not fit the learned ‘normal’ model. There is no precise and exact definition for an abnormal activity; it is dependent on the context of the scene. Hence there is a requirement for different feature sets to detect different kinds of abnormal activities. In this work we evaluate the performance of different state-of-the-art features to detect the presence of abnormal objects in the scene. These include optical flow vectors to detect motion related anomalies, and textures of optical flow and image textures to detect the presence of abnormal objects. These extracted features, in different combinations, are modeled using different state-of-the-art models such as the Gaussian mixture model (GMM) and the semi-2D hidden Markov model (HMM) to analyse the performances. Further, we apply perspective normalization to the extracted features to compensate for perspective distortion due to the distance between the camera and objects of consideration. The proposed approach is evaluated using the publicly available UCSD datasets and we demonstrate improved performance compared to other state-of-the-art methods.
Abstract:
An important aspect of robotic path planning is ensuring that the vehicle is in the best location to collect the data necessary for the problem at hand. Given that features of interest are dynamic and move with oceanic currents, vehicle speed is an important factor in any planning exercise to ensure vehicles are at the right place at the right time. Here, we examine different Gaussian process models to find a suitable predictive kinematic model that enables the speed of an underactuated, autonomous surface vehicle to be accurately predicted given a set of input environmental parameters.
Abstract:
A novel method of matching stiffness and continuous variable damping of an ECAS (electronically controlled air suspension) based on LQG (linear quadratic Gaussian) control was proposed to simultaneously improve the road-friendliness and ride comfort of a two-axle school bus. Taking account of the suspension nonlinearities and target-height-dependent variation in suspension characteristics, a stiffness model of the ECAS mounted on the drive axle of the bus was developed based on thermodynamics and the key parameters were obtained through field tests. By determining the proper range of the target height for the ECAS of the fully-loaded bus based on the design requirements of vehicle body bounce frequency, the control algorithm of the target suspension height (i.e., stiffness) was derived according to driving speed and road roughness. Taking account of the nonlinearities of a continuous variable semi-active damper, the damping force was obtained through the subtraction of the air spring force from the optimum integrated suspension force, which was calculated based on LQG control. Finally, a GA (genetic algorithm)-based matching method between stepped variable damping and stiffness was employed as a benchmark to evaluate the effectiveness of the LQG-based matching method. Simulation results indicate that, compared with the GA-based matching method, both dynamic tire force and vehicle body vertical acceleration responses are markedly reduced around the vehicle body bounce frequency when employing the LQG-based matching method, with peak values of the dynamic tire force PSD (power spectral density) decreased by 73.6%, 60.8% and 71.9% in the three cases; the corresponding reductions are 71.3%, 59.4% and 68.2% for the vehicle body vertical acceleration. A strong robustness to variation of driving speed and road roughness is also observed for the LQG-based matching method.
Abstract:
The study of the relationship between macroscopic traffic parameters, such as flow, speed and travel time, is essential to the understanding of the behaviour of freeway and arterial roads. However, the temporal dynamics of these parameters are difficult to model, especially for arterial roads, where the process of traffic change is driven by a variety of variables. The introduction of the Bluetooth technology into the transportation area has proven exceptionally useful for monitoring vehicular traffic, as it allows reliable estimation of travel times and traffic demands. In this work, we propose an approach based on Bayesian networks for analyzing and predicting the complex dynamics of flow or volume, based on travel time observations from Bluetooth sensors. The spatio-temporal relationship between volume and travel time is captured through a first-order transition model, and a univariate Gaussian sensor model. The two models are trained and tested on travel time and volume data, from an arterial link, collected over a period of six days. To reduce the computational costs of the inference tasks, volume is converted into a discrete variable. The discretization process is carried out through a Self-Organizing Map. Preliminary results show that a simple Bayesian network can effectively estimate and predict the complex temporal dynamics of arterial volumes from the travel time data. Not only is the model well suited to produce posterior distributions over single past, current and future states; but it also allows computing the estimations of joint distributions, over sequences of states. Furthermore, the Bayesian network can achieve excellent prediction, even when the stream of travel time observation is partially incomplete.
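The combination of a first-order transition model over a discretized state and a univariate Gaussian sensor model can be illustrated with a standard forward-filtering recursion. The sketch below is a generic version under assumptions, not the authors' implementation: the state is a small set of discrete volume classes, each class has a hypothetical travel-time mean and standard deviation, and the names `gaussian_pdf` and `forward_filter` are invented for this example. Missing observations are handled by skipping the update step, which is one simple way to cope with a partially incomplete observation stream.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density, evaluated per state."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def forward_filter(observations, prior, transition, means, stds):
    """Filtered state distributions P(state_t | observations up to t).

    transition[i, j] is P(next state = j | current state = i).
    An observation of None or NaN is treated as missing.
    """
    belief = np.asarray(prior, dtype=float)
    history = []
    for obs in observations:
        # Predict: push the belief through the transition model.
        belief = transition.T @ belief
        # Update: weight each state by the sensor likelihood, if observed.
        if obs is not None and not np.isnan(obs):
            belief = belief * gaussian_pdf(obs, means, stds)
        belief = belief / belief.sum()
        history.append(belief)
    return np.array(history)
```

This sketch does not guard against an observation so far from every state mean that all likelihoods underflow to zero; a log-domain version would be more robust.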
Abstract:
Online dating, a new community and communication type of social network, is gaining momentum. With many people joining dating networks, users become overwhelmed by choices for an ideal partner. A solution to this problem is providing users with partner recommendations based on their interests and activities. Traditional recommendation methods ignore the users’ needs and provide recommendations equally to all users. In this paper, we propose a recommendation approach that employs different recommendation strategies for different groups of members. A segmentation method using the Gaussian Mixture Model (GMM) is proposed to group users according to their needs. Then a targeted recommendation strategy is applied to each identified segment. Empirical results show that the proposed approach outperforms several existing recommendation methods.
Abstract:
This paper analyses the probabilistic linear discriminant analysis (PLDA) speaker verification approach with limited development data. It investigates the use of the median as the central tendency of a speaker’s i-vector representation, and the effectiveness of weighted discriminative techniques on the performance of state-of-the-art length-normalised Gaussian PLDA (GPLDA) speaker verification systems. The analysis shows that the median (using a median Fisher discriminator (MFD)) provides a better representation of a speaker when the number of representative i-vectors available during development is reduced, and that the pair-wise weighting approach in weighted LDA and weighted MFD provides further improvement in limited development conditions. Best performance is obtained using a weighted MFD approach, which shows over 10% improvement in EER over the baseline GPLDA system on mismatched and interview-interview conditions.
Abstract:
For a planetary rover to successfully traverse across unstructured terrain autonomously, one of the major challenges is to assess its local traversability such that it can plan a trajectory to achieve its mission goals efficiently while minimising risk to the vehicle itself. This paper aims to provide a comparative study on different approaches for representing the geometry of Martian terrain for the purpose of evaluating terrain traversability. An accurate representation of the geometric properties of the terrain is essential as it can directly affect the determination of traversability for a ground vehicle. We explore current state-of-the-art techniques for terrain estimation, in particular Gaussian Processes (GP) in various forms, and discuss the suitability of each technique in the context of an unstructured Martian terrain. Furthermore, we present the limitations of regression techniques in terms of spatial correlation and continuity assumptions, and the impact on traversability analysis of a planetary rover across unstructured terrain. The analysis was performed on datasets of the Mars Yard at the Powerhouse Museum in Sydney, obtained using the onboard RGB-D camera.
Abstract:
It is well recognized that many scientifically interesting sites on Mars are located in rough terrains. Therefore, to enable safe autonomous operation of a planetary rover during exploration, the ability to accurately estimate terrain traversability is critical. In particular, this estimate needs to account for terrain deformation, which significantly affects the vehicle attitude and configuration. This paper presents an approach to estimate vehicle configuration, as a measure of traversability, in deformable terrain by learning the correlation between exteroceptive and proprioceptive information in experiments. We first perform traversability estimation with rigid terrain assumptions, then correlate the output with experienced vehicle configuration and terrain deformation using a multi-task Gaussian Process (GP) framework. Experimental validation of the proposed approach was performed on a prototype planetary rover and the vehicle attitude and configuration estimate was compared with state-of-the-art techniques. We demonstrate the ability of the approach to accurately estimate traversability with uncertainty in deformable terrain.