982 resultados para Dataset


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A data-driven background dataset refinement technique was recently proposed for SVM based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends this technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly-informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. Further, the characteristics of the refined dataset were analysed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more disperse representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mis-matched to the evaluation conditions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This study assesses the recently proposed data-driven background dataset refinement technique for speaker verification using alternate SVM feature sets to the GMM supervector features for which it was originally designed. The performance improvements brought about in each trialled SVM configuration demonstrate the versatility of background dataset refinement. This work also extends on the originally proposed technique to exploit support vector coefficients as an impostor suitability metric in the data-driven selection process. Using support vector coefficients improved the performance of the refined datasets in the evaluation of unseen data. Further, attempts are made to exploit the differences in impostor example suitability measures from varying features spaces to provide added robustness.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Comprehensive Australian Study of Entrepreneurial Emergence (CAUSEE) is the largest study of new firm creation ever undertaken in Australia. The project provides an exciting opportunity to fundamentally improve our understanding of independent entrepreneurship in Australia by studying factors that initiate, hinder and facilitate the process of emergence of new economic activities and organisations. The longitudinal project has followed a large random sample of nascent firms (n=625) and young firms (n=559) over a six year period. NFs are on-going start-up efforts while YFs are already established but less than four years old. The study also includes a comparison group of non-founders and over-samples of over 100 high potential start-ups in each category. The CAUSEE dataset file contains hundreds of variables throughout 5 waves of data collection. Extensive documentation on the dataset is available in the related handbook. The CAUSEE project has received significant external funding from the Australian Research Council (DP0666616 and LP0776845); National Australia Bank; BDO Australia, and the Australian Government Department of Industry, Innovation and Science.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent years, increasing focus has been made on making good business decisions utilizing the product of data analysis. With the advent of the Big Data phenomenon, this is even more apparent than ever before. But the question is how can organizations trust decisions made on the basis of results obtained from analysis of untrusted data? Assurances and trust that data and datasets that inform these decisions have not been tainted by outside agency. This study will propose enabling the authentication of datasets specifically by the extension of the RESTful architectural scheme to include authentication parameters while operating within a larger holistic security framework architecture or model compliant to legislation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Experimental studies have found that when the state-of-the-art probabilistic linear discriminant analysis (PLDA) speaker verification systems are trained using out-domain data, it significantly affects speaker verification performance due to the mismatch between development data and evaluation data. To overcome this problem we propose a novel unsupervised inter dataset variability (IDV) compensation approach to compensate the dataset mismatch. IDV-compensated PLDA system achieves over 10% relative improvement in EER values over out-domain PLDA system by effectively compensating the mismatch between in-domain and out-domain data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We explore the impact of delisting on the performance of the momentum trading strategy in Australia. We employ a new dataset of hand-collected delisting returns for all Australian stocks and provide the first study outside the U.S. to jointly examine the effects of delisting and missing returns on the magnitude of momentum profits. In the sample of all stocks, we find that the profitability of momentum strategies depends crucially on the returns of delisted stocks, especiallyon bankrupt firms. In the sample of large stocks, however, the momentum effect remains strong after controlling for the effect of delisted stocks, in contrast to the U.S. evidence in which delisting returns can explain 40% of momentum profits. As these large stocks are less exposed to liquidity risks, the momentum effect in Australia is even more puzzling than in the U.S.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we introduce a novel domain-invariant covariance normalization (DICN) technique to relocate both in-domain and out-domain i-vectors into a third dataset-invariant space, providing an improvement for out-domain PLDA speaker verification with a very small number of unlabelled in-domain adaptation i-vectors. By capturing the dataset variance from a global mean using both development out-domain i-vectors and limited unlabelled in-domain i-vectors, we could obtain domain- invariant representations of PLDA training data. The DICN- compensated out-domain PLDA system is shown to perform as well as in-domain PLDA training with as few as 500 unlabelled in-domain i-vectors for NIST-2010 SRE and 2000 unlabelled in-domain i-vectors for NIST-2008 SRE, and considerable relative improvement over both out-domain and in-domain PLDA development if more are available.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The deployment of new emerging technologies, such as cooperative systems, allows the traffic community to foresee relevant improvements in terms of traffic safety and efficiency. Autonomous vehicles are able to share information about the local traffic state in real time, which could result in a better reaction to the mechanism of traffic jam formation. An upstream single-hop radio broadcast network can improve the perception of each cooperative driver within a specific radio range and hence the traffic stability. The impact of vehicle to vehicle cooperation on the onset of traffic congestion is investigated analytically and through simulation. A next generation simulation field dataset is used to calibrate the full velocity difference car-following model, and the MOBIL lane-changing model is implemented. The robustness of the calibration as well as the heterogeneity of the drivers is discussed. Assuming that congestion can be triggered either by the heterogeneity of drivers' behaviours or abnormal lane-changing behaviours, the calibrated car-following model is used to assess the impact of a microscopic cooperative law on egoistic lane-changing behaviours. The cooperative law can help reduce and delay traffic congestion and can have a positive effect on safety indicators.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques. Cleaning techniques are needed for converting these documents to structured documents. Text cleaning techniques are one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 newsgroups dataset, and report on experimental results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The MIT Lincoln Laboratory IDS evaluation methodology is a practical solution in terms of evaluating the performance of Intrusion Detection Systems, which has contributed tremendously to the research progress in that field. The DARPA IDS evaluation dataset has been criticized and considered by many as a very outdated dataset, unable to accommodate the latest trend in attacks. Then naturally the question arises as to whether the detection systems have improved beyond detecting these old level of attacks. If not, is it worth thinking of this dataset as obsolete? The paper presented here tries to provide supporting facts for the use of the DARPA IDS evaluation dataset. The two commonly used signature-based IDSs, Snort and Cisco IDS, and two anomaly detectors, the PHAD and the ALAD, are made use of for this evaluation purpose and the results support the usefulness of DARPA dataset for IDS evaluation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the design of practical web page classification systems one often encounters a situation in which the labeled training set is created by choosing some examples from each class; but, the class proportions in this set are not the same as those in the test distribution to which the classifier will be actually applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods to deal with this situation. We empirically show that when the labeled training data is small, TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; also, when this estimate is used, both TSVM and EC/ER give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Himalayan region is one of the most active seismic regions in the world and many researchers have highlighted the possibility of great seismic event in the near future due to seismic gap. Seismic hazard analysis and microzonation of highly populated places in the region are mandatory in a regional scale. Region specific Ground Motion Predictive Equation (GMPE) is an important input in the seismic hazard analysis for macro- and micro-zonation studies. Few GMPEs developed in India are based on the recorded data and are applicable for a particular range of magnitudes and distances. This paper focuses on the development of a new GMPE for the Himalayan region considering both the recorded and simulated earthquakes of moment magnitude 5.3-8.7. The Finite Fault simulation model has been used for the ground motion simulation considering region specific seismotectonic parameters from the past earthquakes and source models. Simulated acceleration time histories and response spectra are compared with available records. In the absence of a large number of recorded data, simulations have been performed at unavailable locations by adopting Apparent Stations concept. Earthquakes recorded up to 2007 have been used for the development of new GMPE and earthquakes records after 2007 are used to validate new GMPE. Proposed GMPE matched very well with recorded data and also with other highly ranked GMPEs developed elsewhere and applicable for the region. Comparison of response spectra also have shown good agreement with recorded earthquake data. Quantitative analysis of residuals for the proposed GMPE and region specific GMPEs to predict Nepal-India 2011 earthquake of Mw of 5.7 records values shows that the proposed GMPE predicts Peak ground acceleration and spectral acceleration for entire distance and period range with lower percent residual when compared to exiting region specific GMPEs. Crown Copyright (C) 2013 Published by Elsevier Ltd. All rights reserved.