364 results for Dataset

at Queensland University of Technology - ePrints Archive


Relevance: 20.00%

Abstract:

The problem of impostor dataset selection for GMM-based speaker verification is addressed through the recently proposed data-driven background dataset refinement technique. The SVM-based refinement technique selects from a candidate impostor dataset those examples that are most frequently selected as support vectors when training a set of SVMs on a development corpus. This study demonstrates the versatility of dataset refinement in the task of selecting suitable impostor datasets for use in GMM-based speaker verification. The use of refined Z- and T-norm datasets provided performance gains of 15% in EER in the NIST 2006 SRE over the use of heuristically selected datasets. The refined datasets were shown to generalise well to the unseen data of the NIST 2008 SRE.
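The selection mechanism described above can be sketched in a few lines: train an SVM for each development speaker against the candidate impostor set, count how often each impostor example is chosen as a support vector, and keep the top-ranked examples as the refined dataset. This is a minimal illustration of the idea, not the paper's implementation; the data, dimensions, and refined-set size are synthetic.

```python
# Hedged sketch of data-driven background dataset refinement: rank
# candidate impostor examples by how often they become support vectors
# across SVMs trained on a development corpus, then keep the top subset.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
impostors = rng.normal(0.0, 1.0, size=(60, 8))     # candidate impostor set
dev_speakers = [rng.normal(m, 0.5, size=(5, 8))    # synthetic dev corpus
                for m in (1.5, -1.5, 2.0)]

counts = np.zeros(len(impostors), dtype=int)
for spk in dev_speakers:
    X = np.vstack([spk, impostors])
    y = np.array([1] * len(spk) + [0] * len(impostors))
    svm = SVC(kernel="linear").fit(X, y)
    # indices of impostor examples selected as support vectors
    sv = svm.support_[svm.support_ >= len(spk)] - len(spk)
    counts[sv] += 1

refined_idx = np.argsort(counts)[::-1][:20]        # keep top-ranked 20
refined = impostors[refined_idx]
print(refined.shape)  # (20, 8)
```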

Relevance: 20.00%

Abstract:

A data-driven background dataset refinement technique was recently proposed for SVM-based speaker verification. This method selects a refined SVM background dataset from a set of candidate impostor examples after individually ranking examples by their relevance. This paper extends the technique to the refinement of the T-norm dataset for SVM-based speaker verification. The independent refinement of the background and T-norm datasets provides a means of investigating the sensitivity of SVM-based speaker verification performance to the selection of each of these datasets. Using refined datasets provided improvements of 13% in min. DCF and 9% in EER over the full set of impostor examples on the 2006 SRE corpus, with the majority of these gains due to refinement of the T-norm dataset. Similar trends were observed for the unseen data of the NIST 2008 SRE.

Relevance: 20.00%

Abstract:

The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. The characteristics of the refined dataset were then analysed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more dispersed representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mismatched to the evaluation conditions.

Relevance: 20.00%

Abstract:

This study assesses the recently proposed data-driven background dataset refinement technique for speaker verification using SVM feature sets other than the GMM supervector features for which it was originally designed. The performance improvements brought about in each trialled SVM configuration demonstrate the versatility of background dataset refinement. This work also extends the originally proposed technique to exploit support vector coefficients as an impostor suitability metric in the data-driven selection process. Using support vector coefficients improved the performance of the refined datasets in the evaluation of unseen data. Further, attempts are made to exploit the differences in impostor suitability measures from varying feature spaces to provide added robustness.
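The coefficient-based variant mentioned above can be sketched as follows: rather than a binary count of support-vector selections, each impostor example accumulates the magnitude of its SVM dual coefficient as a suitability score. All data, dimensions, and the refined-set size below are synthetic and illustrative, not the paper's setup.

```python
# Hedged sketch: score impostor examples by accumulated |dual coefficient|
# (alpha * y) across SVMs trained on a development corpus, then keep the
# highest-scoring subset as the refined dataset.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
impostors = rng.normal(size=(50, 6))
dev_speakers = [rng.normal(m, 0.5, size=(4, 6)) for m in (1.2, -1.2)]

scores = np.zeros(len(impostors))
for spk in dev_speakers:
    X = np.vstack([spk, impostors])
    y = np.array([1] * len(spk) + [0] * len(impostors))
    svm = SVC(kernel="linear").fit(X, y)
    # support_ and the columns of dual_coef_ are aligned
    for sv_idx, coef in zip(svm.support_, np.abs(svm.dual_coef_[0])):
        if sv_idx >= len(spk):                 # impostor support vector
            scores[sv_idx - len(spk)] += coef  # weight, not just a count

refined = impostors[np.argsort(scores)[::-1][:15]]
print(refined.shape)  # (15, 6)
```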

Relevance: 20.00%

Abstract:

The Comprehensive Australian Study of Entrepreneurial Emergence (CAUSEE) is the largest study of new firm creation ever undertaken in Australia. The project provides an exciting opportunity to fundamentally improve our understanding of independent entrepreneurship in Australia by studying factors that initiate, hinder and facilitate the process of emergence of new economic activities and organisations. The longitudinal project has followed a large random sample of nascent firms (NFs, n=625) and young firms (YFs, n=559) over a six-year period. NFs are ongoing start-up efforts, while YFs are established but less than four years old. The study also includes a comparison group of non-founders and oversamples of over 100 high-potential start-ups in each category. The CAUSEE dataset file contains hundreds of variables across five waves of data collection. Extensive documentation on the dataset is available in the related handbook. The CAUSEE project has received significant external funding from the Australian Research Council (DP0666616 and LP0776845); National Australia Bank; BDO Australia; and the Australian Government Department of Industry, Innovation and Science.

Relevance: 20.00%

Abstract:

In recent years, increasing focus has been placed on making good business decisions using the products of data analysis. With the advent of the Big Data phenomenon, this is more apparent than ever before. But how can organizations trust decisions made on the basis of results obtained from analysing untrusted data? They need assurance that the data and datasets informing those decisions have not been tainted by an outside agency. This study proposes enabling the authentication of datasets by extending the RESTful architectural scheme to include authentication parameters, while operating within a larger holistic security framework architecture or model compliant with legislation.
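One way such authentication parameters could work is sketched below, assuming an HMAC-based scheme: the client appends a key identifier, timestamp, and signature to a RESTful dataset request, and the server recomputes the signature over the same fields to verify the request and the dataset digest have not been tampered with. The parameter names, signing-string layout, and endpoint are hypothetical, not taken from the study.

```python
# Minimal sketch of HMAC-signed authentication parameters on a REST
# dataset request. SECRET, parameter names, and paths are illustrative.
import hashlib, hmac, time
from urllib.parse import urlencode

SECRET = b"shared-secret"  # would be distributed out of band in practice

def auth_params(method: str, path: str, dataset_sha256: str,
                key_id: str = "client-1") -> dict:
    """Client side: build the authentication parameters for a request."""
    ts = str(int(time.time()))
    msg = "\n".join([method, path, dataset_sha256, ts]).encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return {"key_id": key_id, "ts": ts, "sig": sig}

def verify(method: str, path: str, dataset_sha256: str, params: dict) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    msg = "\n".join([method, path, dataset_sha256, params["ts"]]).encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["sig"])

digest = hashlib.sha256(b"dataset bytes").hexdigest()
params = auth_params("GET", "/datasets/42", digest)
url = "/datasets/42?" + urlencode(params)
print(verify("GET", "/datasets/42", digest, params))      # True
print(verify("GET", "/datasets/42", "tampered", params))  # False
```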

Relevance: 20.00%

Abstract:

Experimental studies have found that when state-of-the-art probabilistic linear discriminant analysis (PLDA) speaker verification systems are trained using out-domain data, speaker verification performance degrades significantly due to the mismatch between development data and evaluation data. To overcome this problem we propose a novel unsupervised inter-dataset variability (IDV) compensation approach to compensate for the dataset mismatch. The IDV-compensated PLDA system achieves over 10% relative improvement in EER over the out-domain PLDA system by effectively compensating for the mismatch between in-domain and out-domain data.

Relevance: 20.00%

Abstract:

We explore the impact of delisting on the performance of the momentum trading strategy in Australia. We employ a new dataset of hand-collected delisting returns for all Australian stocks and provide the first study outside the U.S. to jointly examine the effects of delisting and missing returns on the magnitude of momentum profits. In the sample of all stocks, we find that the profitability of momentum strategies depends crucially on the returns of delisted stocks, especially on bankrupt firms. In the sample of large stocks, however, the momentum effect remains strong after controlling for the effect of delisted stocks, in contrast to the U.S. evidence in which delisting returns can explain 40% of momentum profits. As these large stocks are less exposed to liquidity risks, the momentum effect in Australia is even more puzzling than in the U.S.

Relevance: 20.00%

Abstract:

In this paper we introduce a novel domain-invariant covariance normalization (DICN) technique that relocates both in-domain and out-domain i-vectors into a third, dataset-invariant space, improving out-domain PLDA speaker verification with a very small number of unlabelled in-domain adaptation i-vectors. By capturing the dataset variance from a global mean using both development out-domain i-vectors and limited unlabelled in-domain i-vectors, we obtain domain-invariant representations of the PLDA training data. The DICN-compensated out-domain PLDA system is shown to perform as well as in-domain PLDA training with as few as 500 unlabelled in-domain i-vectors for the NIST 2010 SRE and 2000 unlabelled in-domain i-vectors for the NIST 2008 SRE, and to provide considerable relative improvement over both out-domain and in-domain PLDA development when more are available.
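A heavily simplified sketch of the normalization idea (the paper's exact formulation will differ): estimate a covariance matrix capturing how the per-dataset means scatter around the global mean, then map all i-vectors through the inverse square root of that matrix so the dataset-shift direction is suppressed. The "i-vectors" below are synthetic, and the identity regularization is an assumption made so the example is well-conditioned.

```python
# Toy illustration of dataset-shift whitening: after the transform, the
# gap between the two dataset means shrinks relative to the raw gap.
import numpy as np

rng = np.random.default_rng(2)
# synthetic "i-vectors": two datasets shifted apart along the first axis
out_dom = rng.normal(size=(500, 4)) + np.array([2.0, 0, 0, 0])
in_dom = rng.normal(size=(50, 4)) + np.array([-2.0, 0, 0, 0])

glob = np.vstack([out_dom, in_dom]).mean(axis=0)
# covariance of the per-dataset means around the global mean
diffs = np.stack([out_dom.mean(axis=0) - glob, in_dom.mean(axis=0) - glob])
C = diffs.T @ diffs + np.eye(4)  # identity keeps non-shift directions intact

vals, vecs = np.linalg.eigh(C)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T  # inverse square root of C

gap_before = np.linalg.norm(out_dom.mean(0) - in_dom.mean(0))
gap_after = np.linalg.norm(((out_dom - glob) @ W).mean(0)
                           - ((in_dom - glob) @ W).mean(0))
print(gap_after < gap_before)  # True: the dataset shift is suppressed
```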

Relevance: 20.00%

Abstract:

The deployment of new emerging technologies, such as cooperative systems, allows the traffic community to foresee relevant improvements in terms of traffic safety and efficiency. Autonomous vehicles are able to share information about the local traffic state in real time, which could result in a better reaction to the mechanism of traffic jam formation. An upstream single-hop radio broadcast network can improve the perception of each cooperative driver within a specific radio range and hence improve traffic stability. The impact of vehicle-to-vehicle cooperation on the onset of traffic congestion is investigated analytically and through simulation. A Next Generation Simulation (NGSIM) field dataset is used to calibrate the full velocity difference car-following model, and the MOBIL lane-changing model is implemented. The robustness of the calibration as well as the heterogeneity of the drivers is discussed. Assuming that congestion can be triggered either by the heterogeneity of drivers' behaviours or by abnormal lane-changing behaviours, the calibrated car-following model is used to assess the impact of a microscopic cooperative law on egoistic lane-changing behaviours. The cooperative law can help reduce and delay traffic congestion and can have a positive effect on safety indicators.
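The full velocity difference car-following model calibrated in the study can be sketched as follows: a follower accelerates toward an optimal velocity that depends on the gap to its leader, plus a term reacting to the speed difference. The optimal-velocity function and all parameter values below are illustrative, not the calibrated values from the NGSIM data.

```python
# Sketch of the full velocity difference (FVD) car-following model:
# a = kappa * (V(gap) - v_follow) + lam * (v_lead - v_follow)
import math

def optimal_velocity(gap, v_max=30.0, d_safe=25.0, form=1.5):
    # illustrative tanh-shaped optimal velocity function (0 .. v_max)
    return v_max * (math.tanh(gap / d_safe - form) + math.tanh(form)) / \
           (1 + math.tanh(form))

def fvd_acceleration(gap, v_follow, v_lead, kappa=0.4, lam=0.5):
    # kappa: relaxation toward V(gap); lam: sensitivity to relative speed
    return kappa * (optimal_velocity(gap) - v_follow) + \
           lam * (v_lead - v_follow)

# one Euler step: follower at 20 m/s, 30 m behind a leader at 15 m/s
dt = 0.1
v = 20.0
a = fvd_acceleration(gap=30.0, v_follow=v, v_lead=15.0)
v_next = v + a * dt
print(a < 0, v_next < v)  # the follower decelerates toward the slower leader
```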

Relevance: 20.00%

Abstract:

The rapid increase in the number of text documents available on the Internet has created pressure for effective cleaning techniques that convert these documents into structured form. Text cleaning is one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 Newsgroups dataset and report on experimental results.
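A minimal sketch of the kind of cleaning typically applied to newsgroup posts (the paper's exact pipeline is not specified here): strip the message headers, drop quoted reply lines and the signature block, then normalize whitespace.

```python
# Toy newsgroup-post cleaner: header block, "> " quotes, and anything
# after the conventional "--" signature delimiter are removed.
import re

def clean_post(raw: str) -> str:
    body = raw.split("\n\n", 1)[-1]           # drop the RFC-822 header block
    lines = []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):     # quoted reply text
            continue
        if line.strip() == "--":              # signature delimiter
            break
        lines.append(line)
    text = " ".join(lines)
    return re.sub(r"\s+", " ", text).strip()  # normalize whitespace

post = ("From: a@b.c\nSubject: test\n\n"
        "> quoted line\nActual   content here.\n--\nsig")
print(clean_post(post))  # Actual content here.
```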

Relevance: 10.00%

Abstract:

Driving under the influence (DUI) is a major road safety problem. Historically, alcohol has been assumed to play the larger role in crashes, and DUI education programs have reflected this assumption, although recent evidence suggests that younger drivers are becoming more likely to drive drugged than to drive drunk. This is a study of 7096 Texas clients under age 21 who were admitted to state-funded treatment programs between 1997 and 2007 with a past-year DUI arrest, DUI probation, or DUI referral. Data were obtained from the State's administrative dataset. Multivariate logistic regression models were used to understand the differences between minors entering treatment as DUI clients compared with non-DUI clients, as well as the risks for completing treatment and for being abstinent in the month prior to follow-up. A major finding was that over time, the primary problem for underage DUI drivers changed from alcohol to marijuana. Being abstinent in the month prior to discharge, having a primary problem with alcohol rather than another drug, and having more family involved were the strongest predictors of treatment completion. Living in a household where the client was exposed to alcohol abuse or drug use, having been in residential treatment, and having more drug, alcohol and family problems were the strongest predictors of not being abstinent at follow-up. As a result, there is a need to direct more attention towards meeting the needs of the young DUI population through programs that address drug as well as alcohol consumption problems.

Relevance: 10.00%

Abstract:

Principal topic: Effectuation theory suggests that entrepreneurs develop their new ventures in an iterative way by selecting possibilities through flexibility and interaction with the market, a focus on the affordability of loss rather than the maximal return on capital invested, and the development of pre-commitments and alliances with stakeholders (Sarasvathy, 2001, 2008; Sarasvathy et al., 2005, 2006). In contrast, causation may be described as a rationalistic reasoning method for creating a company: after a comprehensive market analysis to discover opportunities, the entrepreneur selects the alternative with the highest expected return and implements it through the use of a business plan. However, little is known about the consequences of following either of these two processes. One aspect that remains unclear is the relationship between newness and effectuation. On one hand, it can be argued that a means-centered, interactive (through pre-commitments and alliances with stakeholders from the early phases of venture creation) and open-minded process (through the flexibility to exploit contingencies) should encourage and facilitate the development of innovative solutions. On the other hand, having a close relationship with their "future first customers" and focussing too much on the resources and knowledge already within the firm may be a constraint that is not conducive to innovation, or at least not to radical innovation. While it has been suggested that an effectuation strategy is more likely to be used by innovative entrepreneurs, this hypothesis has not yet been demonstrated (Sarasvathy, 2001).

Method: In our attempt to capture newness in its different aspects, we considered the following four domains where newness may occur: new product/service; new method of promotion and sales; new production methods/sourcing; and market creation. We identified how effectuation may be differently associated with these four domains of newness. To test our four sets of hypotheses, a dataset of 1329 firms (702 nascent and 627 young firms) randomly selected in Australia was examined using ANOVA with the Tukey HSD test.

Results and Implications: Results indicate a curvilinear relationship between effectuation and newness in which low and high levels of newness are associated with a low level of effectuation, while a medium level of newness is associated with a high level of effectuation. Implications for academia, practitioners and policy makers are also discussed.
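The kind of group comparison behind an ANOVA with post-hoc tests can be illustrated on synthetic data (the study's actual analysis used the Tukey HSD test on the CAUSEE sample; the scores below are invented to mimic the curvilinear pattern reported, with the medium-newness group scoring highest on effectuation):

```python
# One-way ANOVA F statistic, computed from scratch: between-group variance
# relative to within-group variance across low / medium / high newness.
def one_way_anova_f(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# synthetic effectuation scores by newness group (illustrative only)
low = [2.1, 2.4, 2.0, 2.3]
medium = [3.8, 4.1, 3.9, 4.0]
high = [2.2, 2.5, 2.1, 2.4]
f = one_way_anova_f([low, medium, high])
print(f > 1.0)  # large F: group means differ far more than within-group noise
```

A post-hoc test such as Tukey HSD would then identify which specific pairs of groups (here, medium versus each of the others) drive the difference.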

Relevance: 10.00%

Abstract:

The research presented in this thesis addresses inherent problems in signature-based intrusion detection systems (IDSs) operating in heterogeneous environments. It proposes a solution to the difficulties associated with multi-step attack scenario specification and detection for such environments. The research has focused on two distinct problems: the representation of events derived from heterogeneous sources, and multi-step attack specification and detection.

The first part of the research investigates the application of an event abstraction model to event logs collected from a heterogeneous environment. The event abstraction model comprises a hierarchy of events derived from different log sources such as system audit data, application logs, captured network traffic, and intrusion detection system alerts. Unlike existing event abstraction models, where low-level information may be discarded during the abstraction process, the model presented in this work preserves all low-level information while also providing high-level information in the form of abstract events. The event abstraction model was designed independently of any particular IDS and thus may be used by any IDS, intrusion forensic tool, or monitoring tool.

The second part of the research investigates the use of unification for multi-step attack scenario specification and detection. Multi-step attack scenarios are hard to specify and detect as they often involve the correlation of events from multiple sources which may be affected by time uncertainty. The unification algorithm provides a simple and straightforward scenario matching mechanism by using variable instantiation, where variables represent events as defined in the event abstraction model.

The third part of the research addresses time uncertainty. Clock synchronisation is crucial for detecting multi-step attack scenarios which involve logs from multiple hosts, yet issues involving time uncertainty have been largely neglected by intrusion detection research. The system presented in this research introduces two techniques for addressing time uncertainty: clock skew compensation and clock drift modelling using linear regression.

An off-line IDS prototype for detecting multi-step attacks has been implemented. The prototype comprises two modules: an implementation of the abstract event system architecture (AESA) and the scenario detection module. The scenario detection module implements our signature language, developed based on the Python programming language syntax, and the unification-based scenario detection engine. The prototype has been evaluated using a publicly available dataset of real attack traffic and event logs and a synthetic dataset. A distinctive feature of the public dataset is that it contains multi-step attacks which involve multiple hosts with clock skew and clock drift, allowing us to demonstrate the application and advantages of the contributions of this research. All instances of multi-step attacks in the dataset were correctly identified despite significant clock skew and drift.

Future work identified by this research is to develop a refined unification algorithm suitable for processing streams of events to enable on-line detection and, in terms of time uncertainty, to develop mechanisms which allow automatic clock skew and clock drift identification and correction. The immediate application of the research presented in this thesis is the framework of an off-line IDS which processes events from heterogeneous sources using abstraction and which can detect multi-step attack scenarios that may involve time uncertainty.
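The unification-based matching idea can be illustrated with a toy example: a multi-step attack signature is a sequence of event patterns containing variables, and matching instantiates each variable consistently across steps (for example, the same source host must appear in consecutive events). The event fields and the signature below are invented for illustration; the thesis's actual signature language and engine are more elaborate.

```python
# Toy unification-based scenario matcher: patterns may contain variables
# ("?x"), and a binding made in one step constrains all later steps.
def unify(pattern, event, bindings):
    """Extend bindings so that pattern matches event, or return None."""
    new = dict(bindings)
    for key, val in pattern.items():
        if isinstance(val, str) and val.startswith("?"):   # a variable
            if val in new and new[val] != event.get(key):
                return None                                # inconsistent
            new[val] = event.get(key)
        elif event.get(key) != val:                        # literal mismatch
            return None
    return new

def match_scenario(signature, events):
    """Return variable bindings if the signature matches the event stream."""
    bindings, i = {}, 0
    for pattern in signature:                  # steps must occur in order
        while i < len(events):
            result = unify(pattern, events[i], bindings)
            i += 1
            if result is not None:
                bindings = result
                break
        else:
            return None                        # ran out of events
    return bindings

signature = [{"type": "scan", "src": "?a"},
             {"type": "login_fail", "src": "?a", "dst": "?b"},
             {"type": "root_shell", "host": "?b"}]
events = [{"type": "scan", "src": "10.0.0.5"},
          {"type": "login_fail", "src": "10.0.0.5", "dst": "srv1"},
          {"type": "root_shell", "host": "srv1"}]
print(match_scenario(signature, events))  # {'?a': '10.0.0.5', '?b': 'srv1'}
```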

Relevance: 10.00%

Abstract:

Principal Topic: The Comprehensive Australian Study of Entrepreneurial Emergence (CAUSEE) represents the first Australian study to employ and extend the longitudinal and large-scale systematic research developed for the Panel Study of Entrepreneurial Dynamics (PSED) in the US (Gartner, Shaver, Carter and Reynolds, 2004; Reynolds, 2007). This research approach addresses several shortcomings of other datasets, including undercoverage, selection bias, memory decay and hindsight bias, and lack of time separation between the assessment of causes and their assumed effects (Johnson et al. 2006; Davidsson 2006). However, a remaining problem is that any random sample of start-ups will be dominated by low-potential, imitative ventures. In recognition of this issue, CAUSEE supplemented PSED-type random samples with theoretically representative samples of 'high-potential' emerging ventures, employing a unique methodology using novel multiple screening criteria. We define new 'high-potential' ventures as innovative entrepreneurial ventures with high aspirations and potential for growth. This distinguishes them from 'lifestyle' imitative businesses that start small and remain intentionally small (Timmons, 1986). CAUSEE provides the opportunity to explore, for the first time, whether the processes and outcomes of high potentials differ from those of traditional lifestyle firms. This allows us to compare process and outcome attributes of the random sample with the high-potential oversample of new firms and young firms. The attributes in which we will examine potential differences include source of funding and internationalisation. This is interesting both in terms of helping to explain why different outcomes occur and in terms of assistance to future policymaking, given that high-growth-potential firms are increasingly becoming the focus of government intervention in economic development policies around the world. The first wave of data of a four-year longitudinal study has been collected using these samples, allowing us to provide some initial analysis on which to continue further research. The aim of this paper therefore is to present selected preliminary results from the first wave of data collection, comparing high-potential with lifestyle firms. Owing to greater resource requirements and higher risk profiles, we expect to see more use of venture capital and angel investment, and more internationalisation activity to assist in recouping investment and to overcome Australia's smaller economic markets.

Methodology/Key Propositions: In order to develop the samples of high potentials in the NF and YF categories, a set of qualification criteria was developed. Specifically, to qualify firms as nascent or young high potentials, we used multiple, partly compensating screening criteria related to the human capital and aspirations of the founders, the novelty of the venture idea, and the venture's use of high technology. A variety of techniques was also employed to develop a multi-level dataset of sources for leads and firm details. A dataset was generated from a variety of websites covering major stakeholders, including Federal and State Governments, the Australian Chamber of Commerce, university commercialisation offices, patent and trademark attorneys, government and industry awards in entrepreneurship and innovation, industry lead associations, the Venture Capital Association, innovation directories including the Australian Technology Showcase, and business and entrepreneurship magazines including BRW and Anthill. In total, over 480 industry, association, government and award sources were generated in this process. Of these, 74 discrete sources generated high potentials that fulfilled the criteria. 1116 firms were contacted as high-potential cases. 331 cases agreed to participate in the screener, with 279 firms (134 nascent and 140 young firms) successfully passing the high-potential criteria. 222 firms (108 nascent and 113 young firms) completed the full interview. For the general sample, CAUSEE conducts screening phone interviews with a very large number of adult members of households randomly selected through random digit dialling, using screening questions which determine whether respondents qualify as 'nascent entrepreneurs'. CAUSEE additionally targets 'young firms': those that commenced trading in 2004 or later. This process yielded 977 nascent firms (3.4%) and 1,011 young firms (3.6%). These were directed to the full-length interview (40-60 minutes), either directly following the screener or later by appointment. The full-length interviews were completed by 594 NF and 514 YF cases. These are the cases we use in the comparative analysis in this report.

Results and Implications: The results for this paper are based on wave one of the survey, which has been completed and the data obtained. It is expected that the findings will begin to develop an understanding of high-potential nascent and young firms in Australia, how they differ from the larger lifestyle-entrepreneur group that makes up the vast majority of new firms created each year, and the elements that may contribute to turning high-potential growth status into high-growth realities. The results have implications for government in designing better conditions for the creation of new businesses, for firms who assist high potentials in developing better advice programs in line with a better understanding of their needs and requirements, for individuals who may be considering becoming entrepreneurs in high-potential arenas, and for helping existing entrepreneurs make better decisions.