840 resultados para in-domain data requirement


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper analyzes the limitations upon the amount of in- domain (NIST SREs) data required for training a probabilistic linear discriminant analysis (PLDA) speaker verification system based on out-domain (Switchboard) total variability subspaces. By limiting the number of speakers, the number of sessions per speaker and the length of active speech per session available in the target domain for PLDA training, we investigated the relative effect of these three parameters on PLDA speaker verification performance in the NIST 2008 and NIST 2010 speaker recognition evaluation datasets. Experimental results indicate that while these parameters depend highly on each other, to beat out-domain PLDA training, more than 10 seconds of active speech should be available for at least 4 sessions/speaker for a minimum of 800 speakers. If further data is available, considerable improvement can be made over solely out-domain PLDA training.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The rapid growth in the field of data mining has lead to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features. © 2012 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ferr?, S. and King, R. D. (2004) A dichotomic search algorithm for mining and learning in domain-specific logics. Fundamenta Informaticae. IOS Press. To appear

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Convectively coupled equatorial waves are fundamental components of the interaction between the physics and dynamics of the tropical atmosphere. A new methodology, which isolates individual equatorial wave modes, has been developed and applied to observational data. The methodology assumes that the horizontal structures given by equatorial wave theory can be used to project upper- and lower-tropospheric data onto equatorial wave modes. The dynamical fields are first separated into eastward- and westward-moving components with a specified domain of frequency–zonal wavenumber. Each of the components for each field is then projected onto the different equatorial modes using the y structures of these modes given by the theory. The latitudinal scale yo of the modes is predetermined by data to fit the equatorial trapping in a suitable latitude belt y = ±Y. The extent to which the different dynamical fields are consistent with one another in their depiction of each equatorial wave structure determines the confidence in the reality of that structure. Comparison of the analyzed modes with the eastward- and westward-moving components in the convection field enables the identification of the dynamical structure and nature of convectively coupled equatorial waves. In a case study, the methodology is applied to two independent data sources, ECMWF Reanalysis and satellite-observed window brightness temperature (Tb) data for the summer of 1992. Various convectively coupled equatorial Kelvin, mixed Rossby–gravity, and Rossby waves have been detected. The results indicate a robust consistency between the two independent data sources. Different vertical structures for different wave modes and a significant Doppler shifting effect of the background zonal winds on wave structures are found and discussed. It is found that in addition to low-level convergence, anomalous fluxes induced by strong equatorial zonal winds associated with equatorial waves are important for inducing equatorial convection. There is evidence that equatorial convection associated with Rossby waves leads to a change in structure involving a horizontal structure similar to that of a Kelvin wave moving westward with it. The vertical structure may also be radically changed. The analysis method should make a very powerful diagnostic tool for investigating convectively coupled equatorial waves and the interaction of equatorial dynamics and physics in the real atmosphere. The results from application of the analysis method for a reanalysis dataset should provide a benchmark against which model studies can be compared.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Two wavelet-based control variable transform schemes are described and are used to model some important features of forecast error statistics for use in variational data assimilation. The first is a conventional wavelet scheme and the other is an approximation of it. Their ability to capture the position and scale-dependent aspects of covariance structures is tested in a two-dimensional latitude-height context. This is done by comparing the covariance structures implied by the wavelet schemes with those found from the explicit forecast error covariance matrix, and with a non-wavelet- based covariance scheme used currently in an operational assimilation scheme. Qualitatively, the wavelet-based schemes show potential at modeling forecast error statistics well without giving preference to either position or scale-dependent aspects. The degree of spectral representation can be controlled by changing the number of spectral bands in the schemes, and the least number of bands that achieves adequate results is found for the model domain used. Evidence is found of a trade-off between the localization of features in positional and spectral spaces when the number of bands is changed. By examining implied covariance diagnostics, the wavelet-based schemes are found, on the whole, to give results that are closer to diagnostics found from the explicit matrix than from the nonwavelet scheme. Even though the nature of the covariances has the right qualities in spectral space, variances are found to be too low at some wavenumbers and vertical correlation length scales are found to be too long at most scales. The wavelet schemes are found to be good at resolving variations in position and scale-dependent horizontal length scales, although the length scales reproduced are usually too short. The second of the wavelet-based schemes is often found to be better than the first in some important respects, but, unlike the first, it has no exact inverse transform.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The plastid rRNA (rrn) operon in chloroplasts of tobacco (Nicotiana tabacum), maize, and pea is transcribed by the plastid-encoded plastid RNA polymerase from a ς70-type promoter (P1). In contrast, the rrn operon in spinach (Spinacia oleracea) and mustard chloroplasts is transcribed from the distinct Pc promoter, probably also by the plastid-encoded plastid RNA polymerase. Primer-extension analysis reported here indicates that in Arabidopsis both promoters may be active. To understand promoter selection in the plastid rrn operon in the different species, we have tested transcription from the spinach rrn promoter in transplastomic tobacco and from the tobacco rrn promoter in transplastomic Arabidopsis. Our data suggest that transcription of the rrn operon depends on species-specific factors that facilitate transcription initiation by the general transcription machinery.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

"Actii Synceri Sannazarii Vita Per Paulum Jovium": p. 5-6.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conceptual modeling forms an important part of systems analysis. If this is done incorrectly or incompletely, there can be serious implications for the resultant system, specifically in terms of rework and useability. One approach to improving the conceptual modelling process is to evaluate how well the model represents reality. Emergence of the Bunge-Wand-Weber (BWW) ontological model introduced a platform to classify and compare the grammar of conceptual modelling languages. This work applies the BWW theory to a real world example in the health arena. The general practice computing group data model was developed using the Barker Entity Relationship Modelling technique. We describe an experiment, grounded in ontological theory, which evaluates how well the GPCG data model is understood by domain experts. The results show that with the exception of the use of entities to represent events, the raw model is better understood by domain experts

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present new methodologies to generate rational function approximations of broadband electromagnetic responses of linear and passive networks of high-speed interconnects, and to construct SPICE-compatible, equivalent circuit representations of the generated rational functions. These new methodologies are driven by the desire to improve the computational efficiency of the rational function fitting process, and to ensure enhanced accuracy of the generated rational function interpolation and its equivalent circuit representation. Toward this goal, we propose two new methodologies for rational function approximation of high-speed interconnect network responses. The first one relies on the use of both time-domain and frequency-domain data, obtained either through measurement or numerical simulation, to generate a rational function representation that extrapolates the input, early-time transient response data to late-time response while at the same time providing a means to both interpolate and extrapolate the used frequency-domain data. The aforementioned hybrid methodology can be considered as a generalization of the frequency-domain rational function fitting utilizing frequency-domain response data only, and the time-domain rational function fitting utilizing transient response data only. In this context, a guideline is proposed for estimating the order of the rational function approximation from transient data. The availability of such an estimate expedites the time-domain rational function fitting process. The second approach relies on the extraction of the delay associated with causal electromagnetic responses of interconnect systems to provide for a more stable rational function process utilizing a lower-order rational function interpolation. A distinctive feature of the proposed methodology is its utilization of scattering parameters. For both methodologies, the approach of fitting the electromagnetic network matrix one element at a time is applied. It is shown that, with regard to the computational cost of the rational function fitting process, such an element-by-element rational function fitting is more advantageous than full matrix fitting for systems with a large number of ports. Despite the disadvantage that different sets of poles are used in the rational function of different elements in the network matrix, such an approach provides for improved accuracy in the fitting of network matrices of systems characterized by both strongly coupled and weakly coupled ports. Finally, in order to provide a means for enforcing passivity in the adopted element-by-element rational function fitting approach, the methodology for passivity enforcement via quadratic programming is modified appropriately for this purpose and demonstrated in the context of element-by-element rational function fitting of the admittance matrix of an electromagnetic multiport.