4 results for Data-driven modelling

in Chinese Academy of Sciences Institutional Repositories Grid Portal


Relevance:

80.00%

Publisher:

Abstract:

Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing the latent variables' priors for the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
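
For orientation, the two smoothing schemes named in the abstract can be sketched in their textbook form on a plain word-count distribution. This is only an illustration: it does not reproduce the paper's mechanism of allocating probability mass through LDA inference. The function names, the toy counts, and the parameters `alpha` and `lam` are all assumptions made for this example.

```python
from collections import Counter


def laplace_smoothing(counts, vocab, alpha=1.0):
    """Add-alpha (Laplacian) estimate of P(word) over a fixed vocabulary."""
    total = sum(counts.get(w, 0) for w in vocab)
    denom = total + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}


def jelinek_mercer_smoothing(counts, background, lam=0.5):
    """Interpolate the maximum-likelihood estimate with a background model."""
    total = sum(counts.get(w, 0) for w in background)
    smoothed = {}
    for w, p_bg in background.items():
        p_ml = counts.get(w, 0) / total if total else 0.0
        smoothed[w] = (1.0 - lam) * p_ml + lam * p_bg
    return smoothed


# Hypothetical toy data: word counts assigned to one topic, and counts
# from a larger "smoothing" corpus used as the background model.
topic_counts = Counter({"topic": 3, "word": 1})
corpus_counts = Counter({"topic": 4, "word": 3, "prior": 3})
vocab = sorted(corpus_counts)
corpus_total = sum(corpus_counts.values())
background = {w: corpus_counts[w] / corpus_total for w in vocab}

p_laplace = laplace_smoothing(topic_counts, vocab)
p_jm = jelinek_mercer_smoothing(topic_counts, background, lam=0.5)
```

In both cases the unseen word "prior" receives non-zero probability. Laplacian smoothing spreads mass uniformly, while Jelinek-Mercer borrows it from the background distribution.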

Relevance:

80.00%

Publisher:

Abstract:

In practical seismic profiles, multiple reflections tend to impede even an experienced interpreter in deducing information from the reflection data. Surface multiples are usually much stronger, more broadband, and more of a problem than internal multiples, because the reflection coefficient at the water surface is much larger than the reflection coefficients found in the subsurface. For this reason, most attempts to remove multiples from marine data focus on surface multiples, as will I. A surface-related multiple attenuation method can be formulated as an iterative procedure. In this essay, a fully data-driven approach called MPI (multiple prediction through inversion; Wang, 2003) is applied to a real marine seismic data example. It is a promising scheme for predicting a relatively accurate multiple model by updating the model iteratively, as is usually done in a linearized inverse problem. The prominent characteristic of the MPI method is that it eliminates the need for an explicit surface operator, which means it can model the multiple wavefield without any knowledge of the surface and subsurface structures, or even of the source signature. Another key feature of this scheme is that it can predict multiples not only in time but also in phase and amplitude. The real data experiments show that this scheme for multiple prediction can be made very efficient if a good initial estimate of the multiple-free data set is provided in the first iteration. In the other core step, multiple subtraction, we use an expanded multichannel matching (EMCM) filter. Compared with a normal multichannel matching filter, where an original seismic trace is matched by a group of multiple-model traces, in the EMCM filter a seismic trace is matched not only by a group of the ordinary multiple-model traces but also by their adjoints, generated mathematically.
The adjoints of a multiple-model trace include its first derivative, its Hilbert transform, and the derivative of the Hilbert transform. The third chapter of the thesis applies the methods described above to real data, which clearly demonstrates their effectiveness and practical value. For this specific case, I carried out three groups of experiments: to test the effectiveness of the MPI method, to compare subtraction results obtained with a fixed filter length but different window lengths, and to investigate the influence of the initial subtraction result on the MPI method. In the real data application, we do find that the initial demultiple estimate has a great deal of influence on the MPI method. Two approaches are then introduced to refine the initial demultiple estimate: first arrival and masking filter. In the last part, some conclusions are drawn from the preceding results.
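
The idea of matching a trace with a multiple-model trace plus its adjoints can be illustrated with a heavily simplified sketch. It is not the thesis's EMCM implementation. It assumes a single model trace, a single window, and one scalar coefficient per basis trace instead of multichannel filters of finite length. All function names and the synthetic traces are hypothetical. The Hilbert transform is computed with a naive DFT, and the derivative uses periodic central differences, so that only the standard library is needed.

```python
import cmath
import math


def hilbert_transform(x):
    """Hilbert transform of a real sequence via a naive O(N^2) DFT."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        # Keep DC (and Nyquist for even n), double positive frequencies,
        # zero negative frequencies: this yields the analytic signal.
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0
        elif k < (n + 1) // 2:
            weight = 2.0
        else:
            weight = 0.0
        spectrum[k] *= weight
    analytic = [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                    for k in range(n)) / n for t in range(n)]
    return [z.imag for z in analytic]


def derivative(x):
    """Periodic central-difference first derivative (unit sample spacing)."""
    n = len(x)
    return [(x[(t + 1) % n] - x[t - 1]) / 2.0 for t in range(n)]


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def expanded_match_subtract(data, model):
    """Least-squares match of `data` by `model` and its three adjoint traces."""
    h = hilbert_transform(model)
    basis = [model, derivative(model), h, derivative(h)]
    # Normal equations (B^T B) c = B^T d for the four scalar coefficients.
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * q for p, q in zip(bi, data)) for bi in basis]
    coeffs = solve_linear(gram, rhs)
    predicted = [sum(c * b[t] for c, b in zip(coeffs, basis))
                 for t in range(len(data))]
    residual = [d - p for d, p in zip(data, predicted)]
    return residual, coeffs


# Synthetic example: the "true" multiple is a scaled, phase-rotated
# version of the model trace, added to a single-spike primary.
n = 64
model = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
         for t in range(n)]
h_model = hilbert_transform(model)
primary = [1.0 if t == 10 else 0.0 for t in range(n)]
true_multiple = [0.8 * m - 0.5 * h for m, h in zip(model, h_model)]
data = [p + q for p, q in zip(primary, true_multiple)]
demultiple, coeffs = expanded_match_subtract(data, model)
```

Including the Hilbert transform in the basis lets the fit absorb a phase rotation between the predicted and actual multiples, which a scaled copy of the model trace alone cannot represent. The derivative terms can accommodate small time shifts in a similar way.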

Relevance:

40.00%

Publisher:

Abstract:

Chinese Academy of Sciences ; National Science Foundation of China [41071059]; National Key Technology R&D Program of China [2008BAK50B06-02]; National Basic Research Program of China [2010CB950900, 2010CB950704]; Natural Sciences and Engineering Research Council of Canada