985 results for Florence (Italy).
Abstract:
Mode of access: Internet.
Abstract:
This paper develops a general theory of validation gating for non-linear non-Gaussian models. Validation gates are used in target tracking to cull very unlikely measurement-to-track associations, before remaining association ambiguities are handled by a more comprehensive (and expensive) data association scheme. The essential property of a gate is to accept a high percentage of correct associations, thus maximising track accuracy, but provide a sufficiently tight bound to minimise the number of ambiguous associations. For linear Gaussian systems, the ellipsoidal validation gate is standard, and possesses the statistical property whereby a given threshold will accept a certain percentage of true associations. This property does not hold for non-linear non-Gaussian models. As a system departs from linear-Gaussian, the ellipsoid gate tends to reject a higher than expected proportion of correct associations and permit an excess of false ones. In this paper, the concept of the ellipsoidal gate is extended to permit correct statistics for the non-linear non-Gaussian case. The new gate is demonstrated by a bearing-only tracking example.
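As an illustration of the standard linear-Gaussian case the abstract refers to (not the extended gate the paper proposes), the following is a minimal sketch of an ellipsoidal validation gate for a two-dimensional measurement. It assumes the usual chi-square test on the squared Mahalanobis distance of the innovation; for two degrees of freedom the chi-square quantile has the closed form -2 ln(1 - P). The function names and the 0.99 default are hypothetical choices made for this example.

```python
import math

def gate_threshold_2d(accept_prob):
    """Chi-square quantile for 2 degrees of freedom: gamma = -2 ln(1 - P)."""
    if not 0.0 < accept_prob < 1.0:
        raise ValueError("accept_prob must be in (0, 1)")
    return -2.0 * math.log(1.0 - accept_prob)

def mahalanobis_sq_2d(innovation, cov):
    """Squared Mahalanobis distance v^T S^{-1} v for a 2x2 covariance S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance must be positive definite")
    vx, vy = innovation
    # Inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    return (d * vx * vx - (b + c) * vx * vy + a * vy * vy) / det

def in_gate(measurement, predicted, cov, accept_prob=0.99):
    """Accept the measurement if its innovation lies inside the ellipsoid."""
    innovation = (measurement[0] - predicted[0], measurement[1] - predicted[1])
    return mahalanobis_sq_2d(innovation, cov) <= gate_threshold_2d(accept_prob)
```

Under these assumptions, a threshold chosen for P = 0.99 accepts 99% of true associations only when the innovation really is Gaussian, which is the property the abstract says breaks down for non-linear non-Gaussian models.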
Abstract:
In this paper we extend the concept of speaker annotation within a single recording, or speaker diarization, to a collection-wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering the expectedly homogeneous inter-session clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix, and the complete linkage algorithm is employed to cluster the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.
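The clustering step mentioned in the abstract can be sketched as follows. This is a generic complete-linkage routine, not the authors' implementation: it assumes the attribution matrix is available as a symmetric matrix of dissimilarities (smaller means more alike) and that merging stops at a user-chosen threshold. Both the function name and the stopping rule are assumptions made for this example.

```python
def complete_linkage(dist, threshold):
    """Agglomerative clustering; merge until the closest pair exceeds threshold.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break  # even the closest pair is too far apart to merge
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return sorted(clusters)
```

Because complete linkage scores a merge by its worst pair, a cluster is only joined when every member is close to every member of the other cluster, which tends to keep clusters compact.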
Abstract:
Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise.
Abstract:
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to their ability to adequately represent a speaker based on sparse training data, as well as their improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
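For readers unfamiliar with the criterion, one commonly cited form of the CLR between two clusters is given below. The abstract does not state the exact definition used, so the formula and its notation are assumptions: x_i and x_j are the feature sequences of clusters i and j with N_i and N_j frames, M_i and M_j are the corresponding speaker models, and M_UBM is a background model.

```latex
\mathrm{CLR}(i, j) =
  \frac{1}{N_i} \log \frac{p(x_i \mid M_j)}{p(x_i \mid M_{\mathrm{UBM}})}
  + \frac{1}{N_j} \log \frac{p(x_j \mid M_i)}{p(x_j \mid M_{\mathrm{UBM}})}
```

In this form a larger value indicates that each cluster's data is well explained by the other cluster's model relative to the background model, which suggests the two clusters share a speaker.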
Abstract:
The number of Internet users in Australia has been steadily increasing, with over 10.9 million people currently subscribed to an internet provider (ABS, 2011). Over the past year, the most avid users of the Internet were 15 – 24 year olds, with approximately 95% accessing the internet on a regular basis (ABS, Social Trends, 2011). While the internet has been described as fundamental to higher education students, social and leisure internet tools are also increasingly being used by these students to generate and maintain their social and professional networks and interactions (Duffy & Bruns 2006). Rapid technological advancements have enabled greater and faster access to information for learning and education (Hemmi et al, 2009; Glassman and Kang, 2011). As such, we sought to integrate interactive, online social media into the assessment profile of a Public Health undergraduate cohort at the Queensland University of Technology (QUT). The aim of this exercise was to engage students to both develop and showcase their research on a range of complex, contemporary health issues within the online forum of Wikispaces (http://www.wikispaces.com/) for review and critique by their peers. We applied Bandura’s Social Learning Theory (SLT) to analyse the interactive processes from which students developed deeper and more sustained learning, and via which their overall academic writing standards were raised. This paper outlines the assessment task, and the students’ feedback on their learning outcomes in relation to the Attentional, Retentional, Motor Reproduction, and Motivational Processes outlined by Bandura in SLT. We conceptualise the findings in a theoretical model, and discuss the implications for this approach within the broader tertiary environment.
Abstract:
At present, many approaches have been proposed for deformable face alignment with varying degrees of success. However, a drawback common to nearly all of these approaches is inaccurate landmark registration. The registration errors that occur are predominantly heterogeneous (i.e. low error for some frames in a sequence and higher error for others). In this paper we propose an approach for simultaneously aligning an ensemble of deformable face images stemming from the same subject given noisy heterogeneous landmark estimates. We propose that these initial noisy landmark estimates can be used as an “anchor” in conjunction with known state-of-the-art objectives for unsupervised image ensemble alignment. Impressive alignment performance is obtained using well-known deformable face fitting algorithms as “anchors”.
Abstract:
Image representations derived from simplified models of the primary visual cortex (V1), such as HOG and SIFT, elicit good performance in a myriad of visual classification tasks including object recognition/detection, pedestrian detection and facial expression classification. A central question in the vision, learning and neuroscience communities regards why these architectures perform so well. In this paper, we offer a unique perspective to this question by subsuming the role of V1-inspired features directly within a linear support vector machine (SVM). We demonstrate that a specific class of such features in conjunction with a linear SVM can be reinterpreted as inducing a weighted margin on the Kronecker basis expansion of an image. This new viewpoint on the role of V1-inspired features allows us to answer fundamental questions on the uniqueness and redundancies of these features, and offer substantial improvements in terms of computational and storage efficiency.
Abstract:
The selection of optimal camera configurations (camera locations, orientations etc.) for multi-camera networks remains an unsolved problem. Previous approaches largely focus on proposing various objective functions to achieve different tasks. Most of them, however, do not generalize well to large scale networks. To tackle this, we introduce a statistical formulation of the optimal selection of camera configurations as well as propose a Trans-Dimensional Simulated Annealing (TDSA) algorithm to effectively solve the problem. We compare our approach with a state-of-the-art method based on Binary Integer Programming (BIP) and show that our approach offers similar performance on small scale problems. However, we also demonstrate the capability of our approach in dealing with large scale problems and show that our approach produces better results than 2 alternative heuristics designed to deal with the scalability issue of BIP.
Abstract:
This paper considers the problem of reconstructing the motion of a 3D articulated tree from 2D point correspondences subject to some temporal prior. Hitherto, smooth motion has been encouraged using a trajectory basis, yielding a hard combinatorial problem with time complexity growing exponentially in the number of frames. Branch and bound strategies have previously attempted to curb this complexity whilst maintaining global optimality. However, they provide no guarantee of being more efficient than exhaustive search. Inspired by recent work which reconstructs general trajectories using compact high-pass filters, we develop a dynamic programming approach which scales linearly in the number of frames, leveraging the intrinsically local nature of filter interactions. Extension to affine projection enables reconstruction without estimating cameras.
Abstract:
This paper analyses the probabilistic linear discriminant analysis (PLDA) speaker verification approach with limited development data. It investigates the use of the median as the central tendency of a speaker’s i-vector representation, and the effectiveness of weighted discriminative techniques on the performance of state-of-the-art length-normalised Gaussian PLDA (GPLDA) speaker verification systems. The analysis shows that the median (using a median Fisher discriminator (MFD)) provides a better representation of a speaker when the number of representative i-vectors available during development is reduced, and that the pair-wise weighting approach in weighted LDA and weighted MFD provides further improvement in limited development conditions. Best performance is obtained using a weighted MFD approach, which shows over 10% improvement in EER over the baseline GPLDA system on mismatched and interview-interview conditions.
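The intuition for preferring the median when only a few vectors are available can be shown with a small sketch. It assumes an element-wise median over a speaker's vectors, which may differ from how the paper's median Fisher discriminator is actually defined; the function name and the toy data are hypothetical.

```python
from statistics import mean, median

def central_vector(vectors, how="median"):
    """Element-wise mean or median across a speaker's vectors."""
    reducer = {"mean": mean, "median": median}[how]
    # zip(*vectors) groups the values of each dimension together.
    return [reducer(dim) for dim in zip(*vectors)]

# Three consistent vectors plus one outlying session (toy values).
speaker = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [10.0, -5.0]]
print(central_vector(speaker, "mean"))
print(central_vector(speaker, "median"))
```

With so few vectors, a single outlying session pulls the mean far from the bulk of the data, while the element-wise median stays close to the three consistent vectors.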
Abstract:
The construction industry is responsible for a significant part of the solid waste that industrialised nations dispose of each year. One reason for this is the inability to easily separate materials and components from each other and from the building structure. If buildings were designed for disassembly in the first instance, then future material and component recovery would be easier. This paper presents a number of principles for design for disassembly that have been tested and developed through a process of research through creative practice. A number of architectural designs have been used to trial the principles in practice.
Abstract:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.