940 resultados para noisy speaker verification


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper proposes a clustered approach for blind beamfoming from ad-hoc microphone arrays. In such arrangements, microphone placement is arbitrary and the speaker may be close to one, all or a subset of microphones at a given time. Practical issues with such a configuration mean that some microphones might be better discarded due to poor input signal to noise ratio (SNR) or undesirable spatial aliasing effects from large inter-element spacings when beamforming. Large inter-microphone spacings may also lead to inaccuracies in delay estimation during blind beamforming. In such situations, using a cluster of microphones (ie, a sub-array), closely located both to each other and to the desired speech source, may provide more robust enhancement than the full array. This paper proposes a method for blind clustering of microphones based on the magnitude square coherence function, and evaluates the method on a database recorded using various ad-hoc microphone arrangements.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A configurable process model describes a family of similar process models in a given domain. Such a model can be configured to obtain a specific process model that is subsequently used to handle individual cases, for instance, to process customer orders. Process configuration is notoriously difficult as there may be all kinds of interdependencies between configuration decisions.} In fact, an incorrect configuration may lead to behavioral issues such as deadlocks and livelocks. To address this problem, we present a novel verification approach inspired by the ``operating guidelines'' used for partner synthesis. We view the configuration process as an external service, and compute a characterization of all such services which meet particular requirements using the notion of configuration guideline. As a result, we can characterize all feasible configurations (i.\,e., configurations without behavioral problems) at design time, instead of repeatedly checking each individual configuration while configuring a process model.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Acoustically, car cabins are extremely noisy and as a consequence audio-only, in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy in circumventing this problem by using audio visual automatic speech recognition (AVASR). However, implementing AVASR requires a system being able to accurately locate and track the drivers face and lip area in real-time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola- Jones approach is a suitable method of locating and tracking the driver’s lips despite the visual variability of illumination and head pose for audio-visual speech recognition system.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Acoustically, car cabins are extremely noisy and as a consequence, existing audio-only speech recognition systems, for voice-based control of vehicle functions such as the GPS based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (eg: lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Speech Recognition (AVSR). Continuous research in AVASR field has been ongoing for the past twenty-five years with notable progress being made. However, the practical deployment of AVASR systems for use in a variety of real-world applications has not yet emerged. The main reason is due to most research to date neglecting to address variabilities in the visual domain such as illumination and viewpoint in the design of the visual front-end of the AVSR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], which is publicly available in-car database and we show that the use of visual speech conjunction with the audio modality is a better approach to improve the robustness and effectiveness of voice-only recognition systems in car cabin environments.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we present recent results with using range from radio for mobile robot localization. In previous work we have shown how range readings from radio tags placed in the environment can be used to localize a robot. We have extended previous work to consider robustness. Specifically, we are interested in the case where range readings are very noisy and available intermittently. Also, we consider the case where the location of the radio tags is not known at all ahead of time and must be solved for simultaneously along with the position of the moving robot. We present results from a mobile robot that is equipped with GPS for ground truth, operating over several km.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Microphone arrays have been used in various applications to capture conversations, such as in meetings and teleconferences. In many cases, the microphone and likely source locations are known \emph{a priori}, and calculating beamforming filters is therefore straightforward. In ad-hoc situations, however, when the microphones have not been systematically positioned, this information is not available and beamforming must be achieved blindly. In achieving this, a commonly neglected issue is whether it is optimal to use all of the available microphones, or only an advantageous subset of these. This paper commences by reviewing different approaches to blind beamforming, characterising them by the way they estimate the signal propagation vector and the spatial coherence of noise in the absence of prior knowledge of microphone and speaker locations. Following this, a novel clustered approach to blind beamforming is motivated and developed. Without using any prior geometrical information, microphones are first grouped into localised clusters, which are then ranked according to their relative distance from a speaker. Beamforming is then performed using either the closest microphone cluster, or a weighted combination of clusters. The clustered algorithms are compared to the full set of microphones in experiments on a database recorded on different ad-hoc array geometries. These experiments evaluate the methods in terms of signal enhancement as well as performance on a large vocabulary speech recognition task.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Silhouettes are common features used by many applications in computer vision. For many of these algorithms to perform optimally, accurately segmenting the objects of interest from the background to extract the silhouettes is essential. Motion segmentation is a popular technique to segment moving objects from the background, however such algorithms can be prone to poor segmentation, particularly in noisy or low contrast conditions. In this paper, the work of [3] combining motion detection with graph cuts, is extended into two novel implementations that aim to allow greater uncertainty in the output of the motion segmentation, providing a less restricted input to the graph cut algorithm. The proposed algorithms are evaluated on a portion of the ETISEO dataset using hand segmented ground truth data, and an improvement in performance over the motion segmentation alone and the baseline system of [3] is shown.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Robust image hashing seeks to transform a given input image into a shorter hashed version using a key-dependent non-invertible transform. These image hashes can be used for watermarking, image integrity authentication or image indexing for fast retrieval. This paper introduces a new method of generating image hashes based on extracting Higher Order Spectral features from the Radon projection of an input image. The feature extraction process is non-invertible, non-linear and different hashes can be produced from the same image through the use of random permutations of the input. We show that the transform is robust to typical image transformations such as JPEG compression, noise, scaling, rotation, smoothing and cropping. We evaluate our system using a verification-style framework based on calculating false match, false non-match likelihoods using the publicly available Uncompressed Colour Image database (UCID) of 1320 images. We also compare our results to Swaminathan’s Fourier-Mellin based hashing method with at least 1% EER improvement under noise, scaling and sharpening.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but these approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks are an alternative that optimise parameters of enhancement algorithms based on state sequences generated for utterances with known transcriptions. Previous reports of LIMA frameworks have shown significant promise for improving speech recognition accuracies under additive background noise for a range of speech enhancement techniques. In this paper we discuss the drawbacks of the LIMA approach when multiple layers of acoustic mismatch are present – namely background noise and speaker accent. Experimentation using LIMA-based Mel-filterbank noise subtraction on American and Australian English in-car speech databases supports this discussion, demonstrating that inferior speech recognition performance occurs when a second layer of mismatch is seen during evaluation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Traditional speech enhancement methods optimise signal-level criteria such as signal-to-noise ratio, but such approaches are sub-optimal for noise-robust speech recognition. Likelihood-maximising (LIMA) frameworks on the other hand, optimise the parameters of speech enhancement algorithms based on state sequences generated by a speech recogniser for utterances of known transcriptions. Previous applications of LIMA frameworks have generated a set of global enhancement parameters for all model states without taking in account the distribution of model occurrence, making optimisation susceptible to favouring frequently occurring models, in particular silence. In this paper, we demonstrate the existence of highly disproportionate phonetic distributions on two corpora with distinct speech tasks, and propose to normalise the influence of each phone based on a priori occurrence probabilities. Likelihood analysis and speech recognition experiments verify this approach for improving ASR performance in noisy environments.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Shell structures find use in many fields of engineering, notably structural, mechanical, aerospace and nuclear-reactor disciplines. Axisymmetric shell structures are used as dome type of roofs, hyperbolic cooling towers, silos for storage of grain, oil and industrial chemicals and water tanks. Despite their thin walls, strength is derived due to the curvature. The generally high strength-to-weight ratio of the shell form, combined with its inherent stiffness, has formed the basis of this vast application. With the advent in computation technology, the finite element method and optimisation techniques, structural engineers have extremely versatile tools for the optimum design of such structures. Optimisation of shell structures can result not only in improved designs, but also in a large saving of material. The finite element method being a general numerical procedure that could be used to treat any shell problem to any desired degree of accuracy, requires several runs in order to obtain a complete picture of the effect of one parameter on the shell structure. This redesign I re-analysis cycle has been achieved via structural optimisation in the present research, and MSC/NASTRAN (a commercially available finite element code) has been used in this context for volume optimisation of axisymmetric shell structures under axisymmetric and non-axisymmetric loading conditions. The parametric study of different axisymmetric shell structures has revealed that the hyperbolic shape is the most economical solution of shells of revolution. To establish this, axisymmetric loading; self-weight and hydrostatic pressure, and non-axisymmetric loading; wind pressure and earthquake dynamic forces have been modelled on graphical pre and post processor (PATRAN) and analysis has been performed on two finite element codes (ABAQUS and NASTRAN), numerical model verification studies are performed, and optimum material volume required in the walls of cylindrical, conical, parabolic and hyperbolic forms of axisymmetric shell structures are evaluated and reviewed. Free vibration and transient earthquake analysis of hyperbolic shells have been performed once it was established that hyperbolic shape is the most economical under all possible loading conditions. Effect of important parameters of hyperbolic shell structures; shell wall thickness, height and curvature, have been evaluated and empirical relationships have been developed to estimate an approximate value of the lowest (first) natural frequency of vibration. The outcome of this thesis has been the generation of new research information on performance characteristics of axisymmetric shell structures that will facilitate improved designs of shells with better choice of shapes and enhanced levels of economy and performance. Key words; Axisymmetric shell structures, Finite element analysis, Volume Optimisation_ Free vibration_ Transient response.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The material presented in this thesis may be viewed as comprising two key parts, the first part concerns batch cryptography specifically, whilst the second deals with how this form of cryptography may be applied to security related applications such as electronic cash for improving efficiency of the protocols. The objective of batch cryptography is to devise more efficient primitive cryptographic protocols. In general, these primitives make use of some property such as homomorphism to perform a computationally expensive operation on a collective input set. The idea is to amortise an expensive operation, such as modular exponentiation, over the input. Most of the research work in this field has concentrated on its employment as a batch verifier of digital signatures. It is shown that several new attacks may be launched against these published schemes as some weaknesses are exposed. Another common use of batch cryptography is the simultaneous generation of digital signatures. There is significantly less previous work on this area, and the present schemes have some limited use in practical applications. Several new batch signatures schemes are introduced that improve upon the existing techniques and some practical uses are illustrated. Electronic cash is a technology that demands complex protocols in order to furnish several security properties. These typically include anonymity, traceability of a double spender, and off-line payment features. Presently, the most efficient schemes make use of coin divisibility to withdraw one large financial amount that may be progressively spent with one or more merchants. Several new cash schemes are introduced here that make use of batch cryptography for improving the withdrawal, payment, and deposit of electronic coins. The devised schemes apply both to the batch signature and verification techniques introduced, demonstrating improved performance over the contemporary divisible based structures. The solutions also provide an alternative paradigm for the construction of electronic cash systems. Whilst electronic cash is used as the vehicle for demonstrating the relevance of batch cryptography to security related applications, the applicability of the techniques introduced extends well beyond this.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Literally, the word compliance suggests conformity in fulfilling official requirements. The thesis presents the results of the analysis and design of a class of protocols called compliant cryptologic protocols (CCP). The thesis presents a notion for compliance in cryptosystems that is conducive as a cryptologic goal. CCP are employed in security systems used by at least two mutually mistrusting sets of entities. The individuals in the sets of entities only trust the design of the security system and any trusted third party the security system may include. Such a security system can be thought of as a broker between the mistrusting sets of entities. In order to provide confidence in operation for the mistrusting sets of entities, CCP must provide compliance verification mechanisms. These mechanisms are employed either by all the entities or a set of authorised entities in the system to verify the compliance of the behaviour of various participating entities with the rules of the system. It is often stated that confidentiality, integrity and authentication are the primary interests of cryptology. It is evident from the literature that authentication mechanisms employ confidentiality and integrity services to achieve their goal. Therefore, the fundamental services that any cryptographic algorithm may provide are confidentiality and integrity only. Since controlling the behaviour of the entities is not a feasible cryptologic goal,the verification of the confidentiality of any data is a futile cryptologic exercise. For example, there exists no cryptologic mechanism that would prevent an entity from willingly or unwillingly exposing its private key corresponding to a certified public key. The confidentiality of the data can only be assumed. Therefore, any verification in cryptologic protocols must take the form of integrity verification mechanisms. Thus, compliance verification must take the form of integrity verification in cryptologic protocols. A definition of compliance that is conducive as a cryptologic goal is presented as a guarantee on the confidentiality and integrity services. The definitions are employed to provide a classification mechanism for various message formats in a cryptologic protocol. The classification assists in the characterisation of protocols, which assists in providing a focus for the goals of the research. The resulting concrete goal of the research is the study of those protocols that employ message formats to provide restricted confidentiality and universal integrity services to selected data. The thesis proposes an informal technique to understand, analyse and synthesise the integrity goals of a protocol system. The thesis contains a study of key recovery,electronic cash, peer-review, electronic auction, and electronic voting protocols. All these protocols contain message format that provide restricted confidentiality and universal integrity services to selected data. The study of key recovery systems aims to achieve robust key recovery relying only on the certification procedure and without the need for tamper-resistant system modules. The result of this study is a new technique for the design of key recovery systems called hybrid key escrow. The thesis identifies a class of compliant cryptologic protocols called secure selection protocols (SSP). The uniqueness of this class of protocols is the similarity in the goals of the member protocols, namely peer-review, electronic auction and electronic voting. The problem statement describing the goals of these protocols contain a tuple,(I, D), where I usually refers to an identity of a participant and D usually refers to the data selected by the participant. SSP are interested in providing confidentiality service to the tuple for hiding the relationship between I and D, and integrity service to the tuple after its formation to prevent the modification of the tuple. The thesis provides a schema to solve the instances of SSP by employing the electronic cash technology. The thesis makes a distinction between electronic cash technology and electronic payment technology. It will treat electronic cash technology to be a certification mechanism that allows the participants to obtain a certificate on their public key, without revealing the certificate or the public key to the certifier. The thesis abstracts the certificate and the public key as the data structure called anonymous token. It proposes design schemes for the peer-review, e-auction and e-voting protocols by employing the schema with the anonymous token abstraction. The thesis concludes by providing a variety of problem statements for future research that would further enrich the literature.