48 results for Noisy corpora.
Abstract:
This article proposes a theoretical and methodological framework for systematic contrastive discourse analysis across languages and discourse communities through keywords, a lexical approach to discourse analysis that is considered particularly fruitful for comparative analysis. We use a corpus-assisted methodology, presuming meaning to be constituted, revealed and constrained by the collocational environment. We compare the use of the keywords intégration and Integration in French and German public discourses about migration on the basis of newspaper corpora built from two French and German newspapers from 1998 to 2011. We look at the frequency of these keywords over the given time span, group collocates into thematic categories, and discuss indicators of discursive salience by comparing the development of collocation profiles over time in both corpora, as well as the occurrence of neologisms and compounds based on intégration/Integration.
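The collocation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `collocates`, the fixed symmetric window, and the toy English sentence are all assumptions made for illustration, and a real study would add tokenization, lemmatization, and an association measure on top of raw counts.

```python
from collections import Counter


def collocates(tokens, keyword, window=3):
    """Count words that occur within `window` tokens of each keyword hit.

    Hypothetical helper for illustration; matching is case-insensitive
    and the keyword occurrence itself is not counted.
    """
    counts = Counter()
    key = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() != key:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j].lower()] += 1
    return counts


# Toy input (invented, not taken from the newspaper corpora).
tokens = "the integration of migrants and the integration policy".split()
profile = collocates(tokens, "integration", window=1)
```

Running the same function on corpus slices from different years would give one collocation profile per period, which is the kind of object the abstract compares over time.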
Abstract:
We present a novel algorithm for concurrent model state and parameter estimation in nonlinear dynamical systems. The new scheme uses ideas from three-dimensional variational data assimilation (3D-Var) and the extended Kalman filter (EKF), together with the technique of state augmentation, to estimate uncertain model parameters alongside the model state variables in a sequential filtering system. The method is relatively simple to implement and computationally inexpensive to run for large systems with relatively few parameters. We demonstrate the efficacy of the method via a series of identical twin experiments with three simple dynamical system models. The scheme is able to recover the parameter values to a good level of accuracy, even when observational data are noisy. We expect this new technique to be easily transferable to much larger models.
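State augmentation, as named in this abstract, can be sketched with a plain extended Kalman filter on a toy model. This is not the hybrid 3D-Var/EKF scheme of the paper, whose details the abstract does not give. The model x_{k+1} = a x_k + 1, the function name `estimate_parameter`, and all numerical settings are assumptions chosen for illustration. The unknown parameter a is appended to the state, so the filter updates the pair (x, a) from observations of x alone, in an identical twin setup where the observations come from the same model run with the true parameter.

```python
import random


def estimate_parameter(n_steps=100, a_true=0.9, noise_std=0.05, seed=0):
    """EKF on the augmented state z = (x, a) for x_{k+1} = a * x_k + 1.

    Illustrative sketch only; all settings are assumed values.
    Returns the final estimates (x, a).
    """
    rng = random.Random(seed)
    x_true = 1.0                      # "truth" run of the twin experiment
    x, a = 0.0, 0.5                   # deliberately wrong initial guesses
    pxx, pxa, paa = 1.0, 0.0, 0.25    # entries of the 2x2 covariance P
    q_x, q_a = 1e-4, 1e-6             # small model-error variances
    r = max(noise_std ** 2, 1e-6)     # observation-error variance

    for _ in range(n_steps):
        # Generate a noisy observation of x from the truth run.
        x_true = a_true * x_true + 1.0
        y = x_true + rng.gauss(0.0, noise_std)

        # Forecast. Jacobian of (a*x + 1, a) is F = [[a, x], [0, 1]],
        # evaluated at the current estimate before it is advanced.
        f11, f12 = a, x
        x = a * x + 1.0
        new_pxx = f11 * f11 * pxx + 2.0 * f11 * f12 * pxa + f12 * f12 * paa + q_x
        new_pxa = f11 * pxa + f12 * paa
        new_paa = paa + q_a
        pxx, pxa, paa = new_pxx, new_pxa, new_paa

        # Analysis with observation operator H = [1, 0].
        s = pxx + r
        kx, ka = pxx / s, pxa / s
        innovation = y - x
        x += kx * innovation
        a += ka * innovation          # parameter corrected via cross-covariance
        pxx, pxa, paa = (1.0 - kx) * pxx, (1.0 - kx) * pxa, paa - ka * pxa

    return x, a


x_est, a_est = estimate_parameter()
x_clean, a_clean = estimate_parameter(noise_std=0.0)
```

The parameter is never observed directly. It is corrected only through the cross-covariance term `pxa`, which the Jacobian entry `f12 = x` builds up during the forecast step.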
Abstract:
Subspace clustering groups a set of samples drawn from a union of several linear subspaces into clusters, so that the samples in the same cluster are drawn from the same linear subspace. In the majority of existing work on subspace clustering, clusters are built from feature information, while sample correlations in the original spatial structure are simply ignored. In addition, the original high-dimensional feature vectors contain noisy or redundant information, and the time complexity grows exponentially with the number of dimensions. To address these issues, we propose a tensor low-rank representation (TLRR) and sparse coding-based (TLRRSC) subspace clustering method that simultaneously considers feature information and spatial structures. TLRR seeks the lowest-rank representation over the original spatial structures along all spatial directions. Sparse coding learns a dictionary along the feature spaces, so that each sample can be represented by a few atoms of the learned dictionary. The affinity matrix used for spectral clustering is built from the joint similarities in both spatial and feature spaces. TLRRSC captures the global structure and inherent feature information of the data well, and provides a robust subspace segmentation from corrupted data. Experimental results on both synthetic and real-world data sets show that TLRRSC outperforms several established state-of-the-art methods.
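For orientation, the standard matrix form of low-rank representation, which tensor variants generalize, can be written as below. This is background only and not the TLRRSC objective, which the abstract does not state. The symbols are assumptions for illustration: X is the data matrix with samples as columns, Z the representation matrix, E the corruption term, and λ a trade-off weight.

```latex
\min_{Z,\,E}\; \|Z\|_{*} + \lambda \,\|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E,
\qquad
W = |Z| + |Z|^{\top}
```

Here \|Z\|_{*} is the nuclear norm (a convex surrogate for rank), \|E\|_{2,1} sums the column norms of E so that corruption is penalized sample by sample, and W is one common choice of symmetric affinity matrix passed to spectral clustering.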