22 results for Data Interpretation, Statistical

in Cambridge University Engineering Department Publications Database


Relevance:

80.00%

Publisher:

Abstract:

We investigate the dependence of electrostatic interaction forces on applied potentials in electrostatic force microscopy (EFM) as well as in related local potentiometry techniques such as Kelvin probe microscopy (KPM). The approximated expression for the electrostatic interaction between two conductors, usually employed in EFM and KPM, may lose its validity when the probe-sample distance is not very small, as often happens when realistic nanostructured systems with complex topography are investigated. In such conditions, the electrostatic interaction does not depend solely on the potential difference between probe and sample; instead, it may depend on the bias applied to each conductor. For instance, the electrostatic force can change from repulsive to attractive for certain ranges of applied potentials and probe-sample distances, a fact that cannot be accounted for by approximated models. We propose a general capacitance model, applicable even to more than two conductors, which uses the potentials applied to each conductor to determine the resulting forces and force gradients, and which can account for the above phenomenon as well as describe interactions at larger distances. Results from numerical simulations and from experiments on metal stripe electrodes and semiconductor nanowires, supporting such a scenario in typical regimes of EFM investigation, are presented, evidencing the importance of more rigorous modeling for EFM data interpretation. Furthermore, the physical meaning of the Kelvin potential as used in KPM applications can also be clarified by means of the reported formalism. © 2009 American Institute of Physics.
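
As an illustration of the general capacitance formalism mentioned above, the sketch below evaluates the constant-voltage force F_z = 1/2 * V^T (dC/dz) V from a user-supplied capacitance matrix. The toy probe/sample/ground capacitances are invented for illustration and are not the model used in the paper.

```python
import numpy as np

def electrostatic_force(capacitance_matrix, potentials, z, dz=1e-10):
    """Force on the probe along z for N conductors held at fixed potentials:
    F_z = 1/2 * V^T (dC/dz) V, with dC/dz taken by central differences.
    `capacitance_matrix(z)` is an assumed user-supplied model or solver output."""
    dC_dz = (capacitance_matrix(z + dz) - capacitance_matrix(z - dz)) / (2.0 * dz)
    V = np.asarray(potentials, dtype=float)
    return 0.5 * V @ dC_dz @ V

def toy_capacitance(z, area=1e-12, eps0=8.854e-12):
    """Invented toy geometry: a probe-sample mutual capacitance and a z-dependent
    probe-to-ground stray capacitance (Maxwell capacitance-matrix convention:
    diagonal = total capacitance of each conductor, off-diagonal = -mutual)."""
    c_ps = eps0 * area / z          # probe-sample term
    c_pg = 0.5 * eps0 * area / z    # probe-to-grounded-surroundings term
    c_sg = 1e-16                    # sample-to-ground term, z-independent
    return np.array([[c_ps + c_pg, -c_ps],
                     [-c_ps,       c_ps + c_sg]])

# The force depends on the individual biases, not only on their difference:
print(electrostatic_force(toy_capacitance, [1.0, 0.0], z=50e-9))
print(electrostatic_force(toy_capacitance, [2.0, 1.0], z=50e-9))
```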

Relevance:

40.00%

Publisher:

Abstract:

Condition-based maintenance is concerned with the collection and interpretation of data to support maintenance decisions. The non-intrusive nature of vibration data enables the monitoring of enclosed systems such as gearboxes. It remains a significant challenge to analyze vibration data that are generated under fluctuating operating conditions. This is especially true for situations where relatively little prior knowledge regarding the specific gearbox is available. It is therefore investigated how an adaptive time series model, which is based on Bayesian model selection, may be used to remove the non-fault related components in the structural response of a gear assembly to obtain a residual signal which is robust to fluctuating operating conditions. A statistical framework is subsequently proposed which may be used to interpret the structure of the residual signal in order to facilitate an intuitive understanding of the condition of the gear system. The proposed methodology is investigated on both simulated and experimental data from a single stage gearbox. © 2011 Elsevier Ltd. All rights reserved.
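
A minimal sketch of the residual-signal idea, using an ordinary least-squares AR fit with BIC order selection in place of the adaptive Bayesian model-selection scheme described above; the signal, sampling rate and function names are illustrative assumptions.

```python
import numpy as np

def fit_ar_residual(x, max_order=20):
    """Select an autoregressive model order by BIC and return the one-step-ahead
    residual, which is intended to retain fault-related components while the AR
    model absorbs the dominant (non-fault) structural response."""
    x = np.asarray(x, dtype=float)
    best = None
    for p in range(1, max_order + 1):
        # Lag matrix: predict x[t] from x[t-1], ..., x[t-p].
        X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
        y = x[p:]
        coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coeffs
        n = len(y)
        bic = n * np.log(np.mean(resid ** 2)) + p * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, p, resid)
    return best[1], best[2]   # chosen order and residual signal

# Usage on a synthetic "healthy" gear-mesh tone plus noise; a local gear fault
# would show up as impulsive structure left behind in the residual.
t = np.arange(0, 1, 1.0 / 10_000)
signal = np.sin(2 * np.pi * 350 * t) + 0.1 * np.random.randn(t.size)
order, residual = fit_ar_residual(signal)
```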

Relevance:

40.00%

Publisher:

Abstract:

Spatial normalisation is a key element of statistical parametric mapping and related techniques for analysing cohort statistics on voxel arrays and surfaces. The normalisation process involves aligning each individual specimen to a template using some sort of registration algorithm. Any misregistration will result in data being mapped onto the template at the wrong location. At best, this will introduce spatial imprecision into the subsequent statistical analysis. At worst, when the misregistration varies systematically with a covariate of interest, it may lead to false statistical inference. Since misregistration generally depends on the specimen's shape, we investigate here the effect of allowing for shape as a confound in the statistical analysis, with shape represented by the dominant modes of variation observed in the cohort. In a series of experiments on synthetic surface data, we demonstrate how allowing for shape can reveal true effects that were previously masked by systematic misregistration, and also guard against misinterpreting systematic misregistration as a true effect. We introduce some heuristics for disentangling misregistration effects from true effects, and demonstrate the approach's practical utility in a case study of the cortical bone distribution in 268 human femurs.
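
A minimal sketch of treating shape as a confound: the cohort's dominant shape modes (obtained here by PCA via an SVD) are entered as nuisance regressors in a vertex-wise linear model alongside the covariate of interest. Array shapes, names and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_glm_with_shape_confound(Y, covariate, shape_vectors, n_modes=5):
    """Vertex-wise GLM with shape modes as confounds.
    Y: (n_subjects, n_vertices) data mapped onto the template.
    covariate: (n_subjects,) effect of interest.
    shape_vectors: (n_subjects, n_shape_dims) per-subject shape description."""
    S = shape_vectors - shape_vectors.mean(axis=0)
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    modes = S @ Vt[:n_modes].T                      # scores on dominant shape modes
    X = np.column_stack([covariate, modes, np.ones(len(covariate))])
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    covXX = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * covXX[0, 0])              # std. error of the covariate effect
    return beta[0] / se                             # vertex-wise t-statistic map

# Illustrative use: 40 synthetic subjects, 1000 surface vertices, 6 shape dims.
rng = np.random.default_rng(0)
t_map = fit_glm_with_shape_confound(rng.normal(size=(40, 1000)),
                                    rng.normal(size=40),
                                    rng.normal(size=(40, 6)))
```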

Relevance:

30.00%

Publisher:

Abstract:

Gene microarray technology is highly effective in screening for differential gene expression and has hence become a popular tool in the molecular investigation of cancer. When applied to tumours, molecular characteristics may be correlated with clinical features such as response to chemotherapy. Exploitation of the huge amount of data generated by microarrays is difficult, however, and constitutes a major challenge in the advancement of this methodology. Independent component analysis (ICA), a modern statistical method, allows us to better understand data in such complex and noisy measurement environments. The technique has the potential to significantly increase the quality of the resulting data and improve the biological validity of subsequent analysis. We performed microarray experiments on 31 postmenopausal endometrial biopsies, comprising 11 benign and 20 malignant samples. We compared ICA to the established methods of principal component analysis (PCA), Cyber-T, and SAM. We show that ICA generated patterns that clearly characterized the malignant samples studied, in contrast to PCA. Moreover, ICA improved the biological validity of the genes identified as differentially expressed in endometrial carcinoma, compared to those found by Cyber-T and SAM. In particular, several genes involved in lipid metabolism that are differentially expressed in endometrial carcinoma were only found using this method. This report highlights the potential of ICA in the analysis of microarray data.
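
A minimal sketch of the ICA-versus-PCA comparison using scikit-learn's FastICA; the random placeholder expression matrix and the gene-ranking heuristic are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

# Illustrative shapes only: rows = biopsies, columns = genes (log-expression).
rng = np.random.default_rng(0)
X = rng.normal(size=(31, 500))                 # placeholder for the 31-sample study
labels = np.array([0] * 11 + [1] * 20)         # 11 benign, 20 malignant

ica = FastICA(n_components=10, max_iter=1000, random_state=0)
activations = ica.fit_transform(X)             # (31, 10) per-sample component activations

# Pick the component whose activation best separates benign from malignant...
separation = [abs(activations[labels == 1, k].mean() -
                  activations[labels == 0, k].mean())
              for k in range(activations.shape[1])]
k_best = int(np.argmax(separation))

# ...and rank genes by the magnitude of their loading on that component.
gene_loadings = ica.components_[k_best]        # (500,) loading of each gene
candidate_genes = np.argsort(np.abs(gene_loadings))[::-1][:20]

# PCA baseline for comparison, as in the abstract.
pca_scores = PCA(n_components=10).fit_transform(X)
```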

Relevance:

30.00%

Publisher:

Relevance:

30.00%

Publisher:

Abstract:

This paper investigates several approaches to bootstrapping a new spoken language understanding (SLU) component in a target language given a large dataset of semantically-annotated utterances in some other source language. The aim is to reduce the cost associated with porting a spoken dialogue system from one language to another by minimising the amount of data required in the target language. Since word-level semantic annotations are costly, Semantic Tuple Classifiers (STCs) are used in conjunction with statistical machine translation models, both of which are trained from unaligned data to further reduce development time. The paper presents experiments in which a French SLU component in the tourist information domain is bootstrapped from English data. Results show that training STCs on automatically translated data produced the best performance for predicting the utterance's dialogue act type; however, individual slot/value pairs are best predicted by training STCs on the source language and using them to decode translated utterances. © 2010 ISCA.
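
A minimal sketch of the two bootstrapping strategies, with a bag-of-n-grams linear SVM standing in for a Semantic Tuple Classifier and an identity placeholder standing in for the statistical machine translation system; all data and function names are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_act_classifier(utterances, act_labels):
    """Train a dialogue-act-type classifier on surface utterances
    (a simple stand-in for an STC, not the actual STC model)."""
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
    return clf.fit(utterances, act_labels)

def translate(utterance):
    """Placeholder for a statistical MT system trained on unaligned data;
    a real decoder would go here. Identity stand-in so the sketch runs."""
    return utterance

en_utts = ["i want a cheap restaurant", "what is the phone number", "goodbye"]
en_acts = ["inform", "request", "bye"]

# (a) Train on automatically translated data (best for act-type prediction
#     in the reported experiments).
fr_clf = train_act_classifier([translate(u) for u in en_utts], en_acts)

# (b) Train on the source language and decode translated test utterances
#     (best for slot/value prediction in the reported experiments).
en_clf = train_act_classifier(en_utts, en_acts)
print(en_clf.predict([translate("au revoir")]))
```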

Relevance:

30.00%

Publisher:

Abstract:

Recent work in the area of probabilistic user simulation for training statistical dialogue managers has investigated a new agenda-based user model and presented preliminary experiments with a handcrafted model parameter set. Training the model on dialogue data is an important next step, but non-trivial since the user agenda states are not observable in data and the space of possible states and state transitions is intractably large. This paper presents a summary-space mapping which greatly reduces the number of state transitions and introduces a tree-based method for representing the space of possible agenda state sequences. Treating the user agenda as a hidden variable, the forward/backward algorithm can then be successfully applied to iteratively estimate the model parameters on dialogue data. © 2007 Association for Computational Linguistics.
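
The sketch below shows a generic forward/backward (E-step) pass for a discrete hidden-state model, the core recursion referred to above; the two summary states, the user-act alphabet and all probabilities are toy values, not the trained agenda model.

```python
import numpy as np

def forward_backward(obs, trans, emit, init):
    """Forward/backward pass for a discrete hidden-state sequence.
    trans[i, j]: transition probability between (summary) states i -> j.
    emit[s, o]:  probability of observed user act o in state s.
    init:        initial state distribution.
    Returns gamma[t, s], the posterior occupancy of state s at turn t."""
    T, S = len(obs), len(init)
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = init * emit[:, obs[0]]
    for t in range(1, T):                       # forward recursion
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward recursion
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Toy example with 2 summary states and 3 observable user-act symbols.
trans = np.array([[0.7, 0.3], [0.2, 0.8]])
emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
gamma = forward_backward([0, 1, 2, 2], trans, emit, init=np.array([0.6, 0.4]))
```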

Relevance:

30.00%

Publisher:

Abstract:

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language. © 2012 IEEE.
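
A minimal sketch of the two kinds of transform described above: a language transform built by interpolating cluster means, and a speaker transform applied as a constrained-MLLR-style linear feature transform. Dimensions and values are illustrative assumptions, not the trained system.

```python
import numpy as np

def language_adapted_mean(cluster_means, interp_weights):
    """Language transform in the cluster-mean-interpolation style:
    mu = sum_c lambda_c * mu_c over the language clusters."""
    return np.einsum('c,cd->d', interp_weights, cluster_means)

def apply_cmllr(features, A, b):
    """Speaker transform: constrained MLLR applied in the feature domain,
    o' = A o + b (a single global transform in this sketch)."""
    return features @ A.T + b

# Toy dimensions: 3 language clusters, 5-dimensional acoustic features.
cluster_means = np.random.randn(3, 5)
lam = np.array([0.6, 0.3, 0.1])            # language interpolation weights
mu = language_adapted_mean(cluster_means, lam)

A, b = np.eye(5) * 0.9, np.zeros(5)        # illustrative speaker transform
obs = apply_cmllr(np.random.randn(100, 5), A, b)
```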

Relevance:

30.00%

Publisher:

Abstract:

Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data. © 2010 Association for Computational Linguistics.
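
A minimal sketch of certainty-based active learning (uncertainty sampling), with a logistic-regression classifier standing in for BAGEL's dynamic Bayesian network generator; the data and loop sizes are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def least_certain_indices(model, pool_X, k=5):
    """Rank unlabelled examples by the confidence of the current model's best
    prediction and return the k least certain, to be sent for annotation."""
    confidence = model.predict_proba(pool_X).max(axis=1)
    return np.argsort(confidence)[:k]

# Toy pool of 200 examples with binary labels (stand-in for annotated inputs).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)
labelled = [int(np.where(y == 0)[0][0]), int(np.where(y == 1)[0][0])]
pool = [i for i in range(200) if i not in labelled]

for _ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
    picked = [pool[i] for i in least_certain_indices(model, X[pool])]
    labelled += picked                      # simulate annotating the picks
    pool = [i for i in pool if i not in picked]
```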