Digital Library

918 results for "cross likelihood ratio"

Automatic Labeling of Software Components and their Evolution using Log-Likelihood Ratio of Word Frequencies in Source Code

Relevance: 100.00%

Abstract:

As more and more open-source software components become available on the internet we need automatic ways to label and compare them. For example, a developer who searches for reusable software must be able to quickly gain an understanding of retrieved components. This understanding cannot be gained at the level of source code due to the semantic gap between source code and the domain model. In this paper we present a lexical approach that uses the log-likelihood ratios of word frequencies to automatically provide labels for software components. We present a prototype implementation of our labeling/comparison algorithm and provide examples of its application. In particular, we apply the approach to detect trends in the evolution of a software system.
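The labeling idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' prototype: it assumes the common two-term G² form of the log-likelihood ratio for comparing word frequencies between two corpora, and the names `log_likelihood_ratio` and `label_component` are hypothetical.

```python
import math
from collections import Counter

def log_likelihood_ratio(a, b, total_a, total_b):
    """Signed G^2 score for a word seen a times in corpus A and b times in corpus B.

    Assumption: two-term G^2 statistic; the paper may use a different variant.
    """
    # Expected counts if the word were equally frequent in both corpora.
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    g2 *= 2.0
    # Positive when the word is relatively more frequent in corpus A.
    return g2 if a / total_a >= b / total_b else -g2

def label_component(component_words, reference_words, top_n=3):
    """Return the top_n words that most distinguish the component from the reference."""
    comp, ref = Counter(component_words), Counter(reference_words)
    total_c, total_r = sum(comp.values()), sum(ref.values())
    scores = {w: log_likelihood_ratio(comp[w], ref[w], total_c, total_r) for w in comp}
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

Comparing two versions of the same component in this way (old version as the reference, new version as the component) is one plausible route to the evolution trends the abstract mentions.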

Extended phone log-likelihood ratio features and acoustic-based I-vectors for language recognition

Relevance: 100.00%

Abstract:

This paper presents new techniques that bring relevant improvements to the primary system our group submitted to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. We incorporate an additional phonotactic subsystem based on phone log-likelihood ratio (PLLR) features extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for Fact, the official metric during the evaluation). We show that using these features at the phone state level provides significant improvements when combined with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations to these PLLR features, with additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system that contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.

Extending the task of diarization to speaker attribution

Relevance: 100.00%

Abstract:

In this paper we extend the concept of speaker annotation within a single recording, or speaker diarization, to a collection-wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering the expectedly homogeneous inter-session clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix, and the complete-linkage algorithm is employed to cluster the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.

Speaker linking using complete-linkage clustering

Relevance: 100.00%

Abstract:

Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.
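Complete-linkage clustering, as named in this abstract, can be sketched in a few lines. The sketch assumes that a symmetric matrix of pairwise dissimilarities between per-recording speaker clusters is already available. How the authors derive those scores is not shown here, and the function name and threshold are hypothetical.

```python
def complete_linkage(dist, threshold):
    """Agglomerative clustering with complete linkage.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    threshold: merging stops once the closest pair of clusters is farther apart.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Complete linkage: distance between clusters is the WORST pairwise distance.
        return max(dist[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # no remaining pair is similar enough to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]
```

Because the linkage uses the maximum pairwise distance, two groups are merged only if every member of one is close to every member of the other, which tends to keep distinct speakers from being chained together.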

Robust automatic speaker linking and attribution

Relevance: 100.00%

Abstract:

This research makes a major contribution that enables efficient searching and indexing of large archives of spoken audio based on speaker identity. It introduces a novel technique dubbed “speaker attribution”, which is the task of automatically determining ‘who spoke when?’ in recordings and then automatically linking the unique speaker identities within each recording across multiple recordings. The outcome of the research will also have a significant impact on the performance of automatic speech recognition systems through the extracted speaker identities.

Speaker attribution of Australian broadcast news data

Relevance: 100.00%

Abstract:

Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system, applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this through using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without previous knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.

A speaker rediarization scheme for improving diarization in large two-speaker telephone datasets

Relevance: 100.00%

Abstract:

In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpora for evaluating our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.

Measurement of the Ratio $\sigma_{t\bar{t}}/\sigma_{Z/\gamma^* \to ll}$ and Precise Extraction of the $t\bar{t}$ Cross Section

Relevance: 100.00%

Abstract:

We report a measurement of the ratio of the $t\bar{t}$ to $Z/\gamma^*$ production cross sections in $\sqrt{s} = 1.96$ TeV $p\bar{p}$ collisions using data corresponding to an integrated luminosity of up to 4.6 fb$^{-1}$, collected by the CDF II detector. The $t\bar{t}$ cross section ratio is measured using two complementary methods, a $b$-jet tagging measurement and a topological approach. By multiplying the ratios by the well-known theoretical $Z/\gamma^* \to ll$ cross section predicted by the standard model, the extracted $t\bar{t}$ cross sections are effectively insensitive to the uncertainty on luminosity. A best linear unbiased estimate is used to combine both measurements with the result $\sigma_{t\bar{t}} = 7.70 \pm 0.52$ pb, for a top-quark mass of 172.5 GeV/$c^2$.
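The two steps described in this abstract, converting a measured ratio into a cross section and combining two measurements with a best linear unbiased estimate (BLUE), can be sketched as below. This is an illustrative sketch, not the collaboration's analysis code: the function names are hypothetical and the correlation `rho` between the two measurements is an assumed input.

```python
import math

def cross_section_from_ratio(ratio, sigma_z_theory):
    """Top-pair cross section from the measured ratio and the theoretical Z cross section.

    Both measured cross sections scale as 1/luminosity, so the luminosity
    cancels in the ratio and does not enter the extracted value.
    """
    return ratio * sigma_z_theory

def blue_combine(x1, s1, x2, s2, rho=0.0):
    """BLUE of two measurements x1 +/- s1 and x2 +/- s2 with correlation rho."""
    cov = rho * s1 * s2
    denom = s1 ** 2 + s2 ** 2 - 2.0 * cov
    # Weights minimise the variance of w1*x1 + w2*x2 subject to w1 + w2 = 1.
    w1 = (s2 ** 2 - cov) / denom
    w2 = 1.0 - w1
    value = w1 * x1 + w2 * x2
    variance = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2.0 * w1 * w2 * cov
    return value, math.sqrt(variance), (w1, w2)
```

For two uncorrelated measurements with equal uncertainties the weights reduce to one half each, and the combined uncertainty shrinks by a factor of √2.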

Measurement of the Ratio σtt̅ /σZ/γ*→ll and Precise Extraction of the tt̅ Cross Section

Relevance: 100.00%

Abstract:

We report a measurement of the ratio of the top-antitop to Z/gamma* production cross sections in sqrt(s) = 1.96 TeV proton-antiproton collisions using data corresponding to an integrated luminosity of up to 4.6 fb^-1, collected by the CDF II detector. The top-antitop cross section ratio is measured using two complementary methods, a b-jet tagging measurement and a topological approach. By multiplying the ratios by the well-known theoretical Z/gamma*->ll cross section, the extracted top-antitop cross sections are effectively insensitive to the uncertainty on luminosity. A best linear unbiased estimate is used to combine both measurements with the result sigma_(top-antitop) = 7.70 +/- 0.52 pb, for a top-quark mass of 172.5 GeV/c^2.

Sedentary behaviour and physical activity in bronchiectasis: a cross-sectional study

Relevance: 100.00%

Abstract:

BACKGROUND: The impact of bronchiectasis on sedentary behaviour and physical activity is unknown. It is important to explore this to identify the need for physical activity interventions and how to tailor interventions to this patient population. We aimed to explore the patterns and correlates of sedentary behaviour and physical activity in bronchiectasis.

METHODS: Physical activity was assessed in 63 patients with bronchiectasis using an ActiGraph GT3X+ accelerometer over seven days. Patients completed questionnaires on health-related quality of life and attitudes to physical activity (questions based on an adaptation of the transtheoretical model (TTM) of behaviour change), spirometry, and the modified shuttle test (MST). Multiple linear regression analysis using forward selection based on likelihood ratio statistics explored the correlates of sedentary behaviour and physical activity dimensions. Between-group analyses using independent-sample t-tests explored differences for selected variables.

RESULTS: Fifty-five patients had complete datasets. Mean (standard deviation) daily time spent in sedentary behaviour was 634 (77) min, in light-lifestyle physical activity 207 (63) min, and in moderate-vigorous physical activity (MVPA) 25 (20) min. Only 11% of patients met recommended guidelines. Forced expiratory volume in one second, percentage predicted (FEV1% predicted), and disease severity were not correlates of sedentary behaviour or physical activity. For sedentary behaviour, decisional balance 'pros' score was the only correlate. Performance on the MST was the strongest correlate of physical activity. In addition to the MST, there were other important correlate variables for MVPA accumulated in ≥10-minute bouts (QOL-B Social Functioning) and for activity energy expenditure (Body Mass Index and QOL-B Respiratory Symptoms).

CONCLUSIONS: Patients with bronchiectasis demonstrated a largely inactive lifestyle and few met the recommended physical activity guidelines. Exercise capacity was the strongest correlate of physical activity, and dimensions of the QOL-B were also important. FEV1% predicted and disease severity were not correlates of sedentary behaviour or physical activity. The inclusion of a range of physical activity dimensions could facilitate in-depth exploration of patterns of physical activity. This study demonstrates the need for interventions targeted at reducing sedentary behaviour and increasing physical activity, and provides information to tailor interventions to the bronchiectasis population.

Measurement of the $\Lambda_b$ cross section and the $\bar{\Lambda}_b$ to $\Lambda_b$ ratio with $J/\psi\Lambda$ decays in $pp$ collisions at $\sqrt{s} = 7$ TeV (CMS Collaboration)

Relevance: 100.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Fitting of Mixtures with Unspecified Number of Components Using Cross Validation Distance Estimate

Relevance: 100.00%

Abstract:

Estimation of the number of mixture components (k) is an unsolved problem. Available methods for estimating k include bootstrapping the likelihood ratio test statistic and optimizing a variety of validity functionals such as AIC, BIC/MDL, and ICOMP. We investigate minimizing the distance between the fitted mixture model and the true density as a method for estimating k. The distances considered are Kullback-Leibler (KL) and L2. We estimate these distances using cross validation. A reliable estimate of k is obtained by voting among B estimates of k corresponding to B cross validation estimates of distance. This estimation method with KL distance is very similar to the Monte Carlo cross validated likelihood methods discussed by Smyth (2000). With a focus on univariate normal mixtures, we present simulation studies that compare the cross validated distance method with AIC, BIC/MDL, and ICOMP. We also apply the cross validation estimate of distance, along with AIC, BIC/MDL, and ICOMP, to data from an osteoporosis drug trial in order to find groups that respond differentially to treatment.
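The link between the KL distance and cross-validated likelihood noted in this abstract follows from a standard decomposition, sketched below. The notation is an assumption introduced here, not taken from the paper: $f$ is the true density, $\hat{f}_k$ a fitted $k$-component mixture, $V_b$ the held-out set of the $b$-th split, and $\hat{f}_k^{(-b)}$ the mixture fitted on the remaining data.

```latex
\begin{align*}
\mathrm{KL}\bigl(f \,\|\, \hat{f}_k\bigr)
  &= \int f(x)\,\log f(x)\,dx \;-\; \int f(x)\,\log \hat{f}_k(x)\,dx, \\
\widehat{\mathbb{E}_f\bigl[\log \hat{f}_k\bigr]}_{b}
  &= \frac{1}{|V_b|} \sum_{x \in V_b} \log \hat{f}_k^{(-b)}(x), \\
\hat{k}_b
  &= \arg\max_{k}\; \widehat{\mathbb{E}_f\bigl[\log \hat{f}_k\bigr]}_{b},
  \qquad b = 1, \dots, B.
\end{align*}
```

The first integral does not depend on $k$, so minimizing the KL distance is equivalent to maximizing the expected log-density of the fitted model, which the held-out average estimates. The final estimate of $k$ would then be the value chosen most often among $\hat{k}_1, \dots, \hat{k}_B$, matching the voting step in the abstract.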

Cross-sectoral differences in the drivers of innovation: evidence from the Irish Community Innovation Survey

Relevance: 100.00%

Abstract:

Purpose: The purpose of this paper is to analyse differences in the drivers of firm innovation performance across sectors. The literature often assumes that firms in different sectors differ in their propensity to innovate but not in the drivers of innovation. The authors empirically assess whether this assumption is accurate through a series of econometric estimations and tests.

Design/methodology/approach: The data used are derived from the Irish Community Innovation Survey 2004-2006. A series of multivariate probit models are estimated and the resulting coefficients are tested for parameter stability across sectors using likelihood ratio tests.

Findings: The results indicate that there is a strong degree of heterogeneity in the drivers of innovation across sectors. The determinants of process, organisational, new-to-firm and new-to-market innovation vary across sectors, suggesting that the pooling of sectors in an innovation production function may lead to biased inferences.

Research limitations/implications: The results imply that policies aimed at stimulating innovation need to be tailored to particular industries. One-size-fits-all policies would seem inappropriate given the large degree of heterogeneity observed in the drivers of innovation across sectors.

Originality/value: The value of this paper is that it provides an empirical test of whether it is suitable to pool sectoral data when estimating innovation production functions. Most papers simply include sectoral dummies, implying that only the propensity to innovate differs across sectors and that the slope coefficients are the same across sectors.
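A likelihood ratio test for parameter stability across sectors typically takes the following form. This is a generic sketch, not the authors' exact specification. The notation is assumed: $S$ sectors, $p$ parameters per model, $\ln L_s$ the maximized log-likelihood of the model estimated on sector $s$ alone, and $\ln L_{\text{pooled}}$ that of the model estimated on all sectors together. All $p$ parameters are allowed to differ across sectors.

```latex
\begin{align*}
H_0 &: \beta_1 = \beta_2 = \dots = \beta_S
      \quad \text{(the same drivers of innovation in every sector)}, \\
\mathrm{LR} &= 2\left( \sum_{s=1}^{S} \ln L_s \;-\; \ln L_{\text{pooled}} \right)
      \;\overset{a}{\sim}\; \chi^2_{(S-1)\,p} \quad \text{under } H_0 .
\end{align*}
```

A large value of $\mathrm{LR}$ relative to the $\chi^2$ critical value rejects pooling. That is the outcome the Findings paragraph reports.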

First measurement of the ratio of branching fractions $\mathcal{B}(\Lambda_b^0 \to \Lambda_c^+ \mu^- \bar{\nu}_\mu)/\mathcal{B}(\Lambda_b^0 \to \Lambda_c^+ \pi^-)$

Relevance: 90.00%

Abstract:

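This article presents the first measurement of the ratio of branching fractions $\mathcal{B}(\Lambda_b^0 \to \Lambda_c^+ \mu^- \bar{\nu}_\mu)/\mathcal{B}(\Lambda_b^0 \to \Lambda_c^+ \pi^-)$. Measurements in two control samples using the same technique, $\mathcal{B}(\bar{B}^0 \to D^+ \mu^- \bar{\nu}_\mu)/\mathcal{B}(\bar{B}^0 \to D^+ \pi^-)$ and $\mathcal{B}(\bar{B}^0 \to D^*(2010)^+ \mu^- \bar{\nu}_\mu)/\mathcal{B}(\bar{B}^0 \to D^*(2010)^+ \pi^-)$, are also reported. The analysis uses data from an integrated luminosity of approximately 172 pb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV, collected with the CDF II detector at the Fermilab Tevatron. The relative branching fractions are measured to be $\mathcal{B}(\Lambda_b^0 \to \Lambda_c^+ \mu^- \bar{\nu}_\mu)/\mathcal{B}(\Lambda_b^0 \to \Lambda_c^+ \pi^-) = 16.6 \pm 3.0\,(\mathrm{stat}) \pm 1.0\,(\mathrm{syst})\,^{+2.6}_{-3.4}\,(\mathrm{PDG}) \pm 0.3\,(\mathrm{EBR})$, $\mathcal{B}(\bar{B}^0 \to D^+ \mu^- \bar{\nu}_\mu)/\mathcal{B}(\bar{B}^0 \to D^+ \pi^-) = 9.9 \pm 1.0\,(\mathrm{stat}) \pm 0.6\,(\mathrm{syst}) \pm 0.4\,(\mathrm{PDG}) \pm 0.5\,(\mathrm{EBR})$, and $\mathcal{B}(\bar{B}^0 \to D^*(2010)^+ \mu^- \bar{\nu}_\mu)/\mathcal{B}(\bar{B}^0 \to D^*(2010)^+ \pi^-) = 16.5 \pm 2.3\,(\mathrm{stat}) \pm 0.6\,(\mathrm{syst}) \pm 0.5\,(\mathrm{PDG}) \pm 0.8\,(\mathrm{EBR})$. The uncertainties are from statistics (stat), internal systematics (syst), world averages of measurements published by the Particle Data Group or subsidiary measurements in this analysis (PDG), and unmeasured branching fractions estimated from theory (EBR), respectively. This article also presents measurements of the branching fractions of four new $\Lambda_b^0$ semileptonic decays: $\Lambda_b^0 \to \Lambda_c(2595)^+ \mu^- \bar{\nu}_\mu$, $\Lambda_b^0 \to \Lambda_c(2625)^+ \mu^- \bar{\nu}_\mu$, $\Lambda_b^0 \to \Sigma_c(2455)^0 \pi^+ \mu^- \bar{\nu}_\mu$, and $\Lambda_b^0 \to \Sigma_c(2455)^{++} \pi^- \mu^- \bar{\nu}_\mu$, relative to the branching fraction of the $\Lambda_b^0 \to \Lambda_c^+ \mu^- \bar{\nu}_\mu$ decay. Finally, the transverse-momentum distribution of $\Lambda_b^0$ baryons produced in $p\bar{p}$ collisions is measured and found to be significantly different from that of $\bar{B}^0$ mesons, which results in a modification in the production cross-section ratio $\sigma_{\Lambda_b^0}/\sigma_{\bar{B}^0}$ with respect to the CDF I measurement.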