140 results for incorporate probabilistic techniques


Relevance: 20.00%

Abstract:

Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. Scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, but a mere 0.187 for Telugu-Kannada. Comparable corpora do exist between many Indian languages and other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g. MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of titles suggested.
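For readers unfamiliar with the metric, the following is a minimal sketch of how mean reciprocal rank is commonly computed for a translation-induction task. It is not the authors' evaluation code: the function name, the data layout, and the toy words ("w1", "t1", and so on) are hypothetical. The last lines only check the relative improvement quoted in the abstract (0.187 to 0.419).

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    """Average of 1/rank of the correct translation; 0 if it is not listed.

    ranked_candidates: dict mapping a source word to its ranked candidate list.
    gold: dict mapping a source word to its correct translation.
    (Both layouts are assumptions made for this sketch.)
    """
    total = 0.0
    for source, candidates in ranked_candidates.items():
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == gold[source]:
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)


# Hypothetical toy data: correct answer at rank 1, at rank 2, and missing.
ranked = {
    "w1": ["t1", "x", "y"],
    "w2": ["x", "t2", "y"],
    "w3": ["x", "y", "z"],
}
gold = {"w1": "t1", "w2": "t2", "w3": "t3"}
mrr = mean_reciprocal_rank(ranked, gold)

# Relative improvement reported in the abstract for Telugu-Kannada.
relative_gain = (0.419 - 0.187) / 0.187
```

An MRR of 0.187 roughly means the correct translation sits, on average, around the fifth position of the ranked list, which is why the reported jump to 0.419 is substantial.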

Relevance: 20.00%

Abstract:

Northeast India and its adjoining areas are characterized by very high seismic activity. According to the Indian seismic code, the region falls under seismic zone V, which represents the highest seismic-hazard level in the country. This region has experienced a number of great earthquakes, such as the Assam (1950) and Shillong (1897) earthquakes, that caused huge devastation in the entire northeast and adjacent areas through flooding, landslides, liquefaction, and damage to roads and buildings. In this study, an attempt has been made to find the probability of occurrence of a major earthquake (Mw > 6) in this region using an updated earthquake catalog collected from different sources. After dividing the catalog into six seismic regions based on tectonic features and seismogenic factors, the probability of occurrence was estimated using three models: the lognormal, Weibull, and gamma distributions. We calculated the logarithm of the likelihood function (ln L) for all six regions and the entire northeast for all three stochastic models. A higher value of ln L indicates a better-fitting model, and a lower value a worse one. The results show that different models suit different seismic zones, but the majority follow the lognormal model, which is better for forecasting magnitude size. According to the results, the Weibull model shows the highest conditional probabilities among the three models for small as well as large elapsed time T and time intervals t, whereas the lognormal model shows the lowest and the gamma model shows intermediate probabilities. Only for elapsed time T = 0 does the lognormal model show the highest conditional probabilities among the three models at smaller time intervals (t = 3-15 yrs). The opposite result is observed at larger time intervals (t = 15-25 yrs), where the Weibull model shows the highest probabilities. Based on this study, the Indo-Burma Range and Eastern Himalaya show a high probability of occurrence (>90%) in the 5 yr period 2012-2017.
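The two quantities the abstract relies on can be sketched as follows, assuming the standard renewal-model definitions: ln L is the sum of log-densities over observed inter-event times, and the conditional probability of an event within t years, given T years elapsed, is (F(T + t) - F(T)) / (1 - F(T)). This is an illustration only, not the study's code. The inter-event times and distribution parameters are made up, and the gamma model is omitted because the Python standard library has no incomplete gamma function.

```python
import math


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def weibull_logpdf(x, shape, scale):
    return (math.log(shape / scale) + (shape - 1) * math.log(x / scale)
            - (x / scale) ** shape)


def lognormal_cdf(x, mu, sigma):
    if x <= 0:
        return 0.0  # allows elapsed time T = 0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))


def lognormal_logpdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z


def log_likelihood(logpdf, intervals):
    """ln L: sum of log-densities over the observed inter-event times."""
    return sum(logpdf(x) for x in intervals)


def conditional_probability(cdf, elapsed, window):
    """P(event within `window` years | no event for `elapsed` years)."""
    survival = 1.0 - cdf(elapsed)
    return (cdf(elapsed + window) - cdf(elapsed)) / survival


# Hypothetical inter-event times (years) and hypothetical, unfitted parameters.
intervals = [4.0, 9.0, 12.0, 7.0, 15.0]
models = {
    "weibull": (lambda x: weibull_cdf(x, 1.8, 10.5),
                lambda x: weibull_logpdf(x, 1.8, 10.5)),
    "lognormal": (lambda x: lognormal_cdf(x, 2.1, 0.5),
                  lambda x: lognormal_logpdf(x, 2.1, 0.5)),
}
summary = {
    name: (log_likelihood(logpdf, intervals),
           conditional_probability(cdf, elapsed=0.0, window=5.0))
    for name, (cdf, logpdf) in models.items()
}
```

Comparing the first entry of each `summary` tuple mirrors the model-selection step (higher ln L is better), while varying `elapsed` and `window` reproduces the kind of T and t comparison described above.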

Relevance: 20.00%

Abstract:

Image and video analysis requires rich features that can characterize various aspects of visual information. These features are typically extracted from the pixel values of the images and videos, which requires a huge amount of computation and is seldom useful for real-time analysis. In contrast, compressed domain analysis offers relevant information about the visual content in the form of transform coefficients, motion vectors, quantization steps, and coded block patterns, with minimal computational burden. Far less work has been done in the compressed domain than in the pixel domain. This paper surveys video analysis efforts published during the last decade across the spectrum of video compression standards. The survey covers only the analysis part, excluding the processing aspect of the compressed domain. The analysis spans various computer vision applications such as moving object segmentation, human action recognition, indexing, retrieval, face detection, video classification, and object tracking in compressed videos.

Relevance: 20.00%

Abstract:

Computer Assisted Assessment (CAA) has existed for several years. While some forms of CAA do not require sophisticated text understanding (e.g., multiple choice questions), there are also student answers that consist of free text and require analysis of the text in the answer. Research on the latter has to date concentrated on two main sub-tasks: (i) grading of essays, which is done mainly by checking the style, grammatical correctness, and coherence of the essay, and (ii) assessment of short free-text answers. In this paper, we present a structured view of relevant research in automated assessment techniques for short free-text answers. We review papers spanning the last 15 years of research, with emphasis on recent papers. Our main objectives are twofold. First, we present the survey in a structured way by segregating information on datasets, problem formulation, techniques, and evaluation measures. Second, we discuss some potential future directions in this domain, which we hope will be helpful for researchers.

Relevance: 20.00%

Abstract:

Signals recorded from the brain often show rhythmic patterns at different frequencies, which are tightly coupled to external stimuli as well as the internal state of the subject. In addition, these signals have very transient structures related to spiking or the sudden onset of a stimulus, with durations not exceeding tens of milliseconds. Further, brain signals are highly nonstationary because both behavioral state and external stimuli can change on a short time scale. It is therefore essential to study brain signals using techniques that can represent both rhythmic and transient components of the signal, something not always possible using standard signal processing techniques such as the short-time Fourier transform, multitaper method, wavelet transform, or Hilbert transform. In this review, we describe a multiscale decomposition technique based on an over-complete dictionary, called matching pursuit (MP), and show that it is able to capture both a sharp stimulus-onset transient and a sustained gamma rhythm in the local field potential recorded from the primary visual cortex. We compare the performance of MP with other techniques and discuss its advantages and limitations. Data and code for generating all time-frequency power spectra are provided.
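The greedy idea behind matching pursuit can be sketched in a few lines. This is a toy illustration, not the review's released code: the dictionary here (unit impulses plus unit-norm cosines) is an assumption chosen for brevity, whereas practical MP implementations for brain signals typically use a much larger dictionary such as Gabor atoms. The synthetic signal mixes one sharp transient with one sustained rhythm.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_dictionary(n, freqs):
    """Over-complete dictionary: n unit impulses plus unit-norm cosines."""
    atoms = []
    for i in range(n):
        atoms.append((("impulse", i), [1.0 if j == i else 0.0 for j in range(n)]))
    for k in freqs:
        wave = [math.cos(2 * math.pi * k * j / n) for j in range(n)]
        norm = math.sqrt(dot(wave, wave))
        atoms.append((("cosine", k), [w / norm for w in wave]))
    return atoms


def matching_pursuit(signal, atoms, max_atoms=10, tol=1e-9):
    """Repeatedly pick the atom most correlated with the residual."""
    residual = list(signal)
    picks = []
    for _ in range(max_atoms):
        if dot(residual, residual) <= tol:
            break  # residual energy is negligible
        label, atom = max(atoms, key=lambda a: abs(dot(residual, a[1])))
        coeff = dot(residual, atom)
        # Remove the selected atom's contribution from the residual.
        residual = [r - coeff * g for r, g in zip(residual, atom)]
        picks.append((label, coeff))
    return picks, residual


n = 32
atoms = build_dictionary(n, freqs=range(1, 9))
rhythm = dict(atoms)[("cosine", 4)]
# Synthetic signal: sustained rhythm (amplitude 2) plus a transient at sample 10.
signal = [2.0 * rhythm[j] + (5.0 if j == 10 else 0.0) for j in range(n)]
picks, residual = matching_pursuit(signal, atoms)
```

In this toy case the impulse atom captures the transient and the cosine atom captures the rhythm, which a single fixed-resolution basis could not represent this compactly.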