964 resultados para Markup Language for Manuscript Images


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a improved language modeling technique for Lempel-Ziv-Welch (LZW) based LID scheme. The previous approach to LID using LZW algorithm prepares the language pattern table using LZW algorithm. Because of the sequential nature of the LZW algorithm, several language specific patterns of the language were missing in the pattern table. To overcome this, we build a universal pattern table, which contains all patterns of different length. For each language it's corresponding language specific pattern table is constructed by retaining the patterns of the universal table whose frequency of appearance in the training data is above the threshold.This approach reduces the classification score (Compression Ratio [LZW-CR] or the weighted discriminant score[LZW-WDS]) for non native languages and increases the LID performance considerably.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fully structured and matured open source spatial and temporal analysis technology seems to be the official carrier of the future for planning of the natural resources especially in the developing nations. This technology has gained enormous momentum because of technical superiority, affordability and ability to join expertise from all sections of the society. Sustainable development of a region depends on the integrated planning approaches adopted in decision making which requires timely and accurate spatial data. With the increased developmental programmes, the need for appropriate decision support system has increased in order to analyse and visualise the decisions associated with spatial and temporal aspects of natural resources. In this regard Geographic Information System (GIS) along with remote sensing data support the applications that involve spatial and temporal analysis on digital thematic maps and the remotely sensed images. Open source GIS would help in wide scale applications involving decisions at various hierarchical levels (for example from village panchayat to planning commission) on economic viability, social acceptance apart from technical feasibility. GRASS (Geographic Resources Analysis Support System, http://wgbis.ces.iisc.ernet.in/grass) is an open source GIS that works on Linux platform (freeware), but most of the applications are in command line argument, necessitating a user friendly and cost effective graphical user interface (GUI). Keeping these aspects in mind, Geographic Resources Decision Support System (GRDSS) has been developed with functionality such as raster, topological vector, image processing, statistical analysis, geographical analysis, graphics production, etc. This operates through a GUI developed in Tcltk (Tool command language / Tool kit) under Linux as well as with a shell in X-Windows. GRDSS include options such as Import /Export of different data formats, Display, Digital Image processing, Map editing, Raster Analysis, Vector Analysis, Point Analysis, Spatial Query, which are required for regional planning such as watershed Analysis, Landscape Analysis etc. This is customised to Indian context with an option to extract individual band from the IRS (Indian Remote Sensing Satellites) data, which is in BIL (Band Interleaved by Lines) format. The integration of PostgreSQL (a freeware) in GRDSS aids as an efficient database management system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because of the efficiency of LZW algorithm to obtain variable length symbol strings in the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm namely: (i) Compression ratio score (LZW-CR) and (ii) weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a 6 language LID task of OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Image filtering techniques have numerous potential applications in biomedical imaging and image processing. The design of filters largely depends on the a-priori knowledge about the type of noise corrupting the image and image features. This makes the standard filters to be application and image specific. The most popular filters such as average, Gaussian and Wiener reduce noisy artifacts by smoothing. However, this operation normally results in smoothing of the edges as well. On the other hand, sharpening filters enhance the high frequency details making the image non-smooth. An integrated general approach to design filters based on discrete cosine transform (DCT) is proposed in this study for optimal medical image filtering. This algorithm exploits the better energy compaction property of DCT and re-arrange these coefficients in a wavelet manner to get the better energy clustering at desired spatial locations. This algorithm performs optimal smoothing of the noisy image by preserving high and low frequency features. Evaluation results show that the proposed filter is robust under various noise distributions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fusion of multi-sensor imaging data enables a synergetic interpretation of complementary information obtained by sensors of different spectral ranges. Multi-sensor data of diverse spectral, spatial and temporal resolutions require advanced numerical techniques for analysis and interpretation. This paper reviews ten advanced pixel based image fusion techniques – Component substitution (COS), Local mean and variance matching, Modified IHS (Intensity Hue Saturation), Fast Fourier Transformed-enhanced IHS, Laplacian Pyramid, Local regression, Smoothing filter (SF), Sparkle, SVHC and Synthetic Variable Ratio. The above techniques were tested on IKONOS data (Panchromatic band at 1 m spatial resolution and Multispectral 4 bands at 4 m spatial resolution). Evaluation of the fused results through various accuracy measures, revealed that SF and COS methods produce images closest to corresponding multi-sensor would observe at the highest resolution level (1 m).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Current scientific research is characterized by increasing specialization, accumulating knowledge at a high speed due to parallel advances in a multitude of sub-disciplines. Recent estimates suggest that human knowledge doubles every two to three years – and with the advances in information and communication technologies, this wide body of scientific knowledge is available to anyone, anywhere, anytime. This may also be referred to as ambient intelligence – an environment characterized by plentiful and available knowledge. The bottleneck in utilizing this knowledge for specific applications is not accessing but assimilating the information and transforming it to suit the needs for a specific application. The increasingly specialized areas of scientific research often have the common goal of converting data into insight allowing the identification of solutions to scientific problems. Due to this common goal, there are strong parallels between different areas of applications that can be exploited and used to cross-fertilize different disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science applications, in visual processing and in biomedicine. Each sub-discipline has found its own specialized methodologies making these statistical methods successful to the given application. The unification of specialized areas is possible because many different problems can share strong analogies, making the theories developed for one problem applicable to other areas of research. It is the goal of this paper to demonstrate the utility of merging two disparate areas of applications to advance scientific research. The merging process requires cross-disciplinary collaboration to allow maximal exploitation of advances in one sub-discipline for that of another. We will demonstrate this general concept with the specific example of merging language technologies and computational biology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel sub-word recognition (PSWR) is a new model that has been proposed for language identification (LID) which does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model, for each language to be identified. Considering various combinations of the statistical evaluation scores, it is found that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, thus making it an efficient alternative to the PPR system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents hierarchical clustering algorithms for land cover mapping problem using multi-spectral satellite images. In unsupervised techniques, the automatic generation of number of clusters and its centers for a huge database is not exploited to their full potential. Hence, a hierarchical clustering algorithm that uses splitting and merging techniques is proposed. Initially, the splitting method is used to search for the best possible number of clusters and its centers using Mean Shift Clustering (MSC), Niche Particle Swarm Optimization (NPSO) and Glowworm Swarm Optimization (GSO). Using these clusters and its centers, the merging method is used to group the data points based on a parametric method (k-means algorithm). A performance comparison of the proposed hierarchical clustering algorithms (MSC, NPSO and GSO) is presented using two typical multi-spectral satellite images - Landsat 7 thematic mapper and QuickBird. From the results obtained, we conclude that the proposed GSO based hierarchical clustering algorithm is more accurate and robust.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an improved hierarchical clustering algorithm for land cover mapping problem using quasi-random distribution. Initially, Niche Particle Swarm Optimization (NPSO) with pseudo/quasi-random distribution is used for splitting the data into number of cluster centers by satisfying Bayesian Information Criteria (BIC). Themain objective is to search and locate the best possible number of cluster and its centers. NPSO which highly depends on the initial distribution of particles in search space is not been exploited to its full potential. In this study, we have compared more uniformly distributed quasi-random with pseudo-random distribution with NPSO for splitting data set. Here to generate quasi-random distribution, Faure method has been used. Performance of previously proposed methods namely K-means, Mean Shift Clustering (MSC) and NPSO with pseudo-random is compared with the proposed approach - NPSO with quasi distribution(Faure). These algorithms are used on synthetic data set and multi-spectral satellite image (Landsat 7 thematic mapper). From the result obtained we conclude that use of quasi-random sequence with NPSO for hierarchical clustering algorithm results in a more accurate data classification.