983 results for Old Norse language.


Abstract:

Current scientific research is characterized by increasing specialization, accumulating knowledge at a high speed due to parallel advances in a multitude of sub-disciplines. Recent estimates suggest that human knowledge doubles every two to three years – and with the advances in information and communication technologies, this wide body of scientific knowledge is available to anyone, anywhere, anytime. This may also be referred to as ambient intelligence – an environment characterized by plentiful and available knowledge. The bottleneck in utilizing this knowledge for specific applications is not accessing but assimilating the information and transforming it to suit the needs for a specific application. The increasingly specialized areas of scientific research often have the common goal of converting data into insight allowing the identification of solutions to scientific problems. Due to this common goal, there are strong parallels between different areas of applications that can be exploited and used to cross-fertilize different disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science applications, in visual processing and in biomedicine. Each sub-discipline has found its own specialized methodologies making these statistical methods successful to the given application. The unification of specialized areas is possible because many different problems can share strong analogies, making the theories developed for one problem applicable to other areas of research. It is the goal of this paper to demonstrate the utility of merging two disparate areas of applications to advance scientific research. The merging process requires cross-disciplinary collaboration to allow maximal exploitation of advances in one sub-discipline for that of another. We will demonstrate this general concept with the specific example of merging language technologies and computational biology.

Abstract:

Parallel sub-word recognition (PSWR) is a new model proposed for language identification (LID) that does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units, which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR-based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model for each language to be identified. Considering various combinations of the statistical evaluation scores, we find that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, making it an efficient alternative to the PPR system.

Abstract:

In this paper we approach the problem of computing the characteristic polynomial of a matrix from the combinatorial viewpoint. We present several combinatorial characterizations of the coefficients of the characteristic polynomial, in terms of walks and closed walks of different kinds in the underlying graph. We develop algorithms based on these characterizations, and show that they tally with well-known algorithms arrived at independently from considerations in linear algebra.
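
One simple way to see the link between closed walks and the characteristic polynomial is through traces: trace(A^k) is the total weight of closed walks of length k in the graph of A, and Newton's identities turn those sums into the coefficients. The sketch below is only an illustration of that link, not the algorithms developed in the paper; the function names are hypothetical, and exact integer or rational entries are assumed.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def closed_walk_sums(a):
    """Return [trace(A^1), ..., trace(A^n)].

    trace(A^k) is the total weight of closed walks of length k,
    summed over every possible starting vertex.
    """
    n = len(a)
    power = [row[:] for row in a]
    sums = []
    for _ in range(n):
        sums.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, a)
    return sums


def char_poly(a):
    """Coefficients [1, c1, ..., cn] of det(xI - A), highest degree first.

    Uses Newton's identities: c_k = -(p_k + sum_{i<k} c_i * p_{k-i}) / k,
    where p_k = trace(A^k).
    """
    n = len(a)
    p = closed_walk_sums(a)
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        acc = Fraction(p[k - 1])
        for i in range(1, k):
            acc += coeffs[i] * p[k - i - 1]
        coeffs.append(-acc / k)
    return coeffs
```

For the triangle graph, the closed-walk sums are 0, 6 and 6, which gives the polynomial x^3 - 3x - 2. The division by k means this route needs exact rational arithmetic (or a field of characteristic zero).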

Abstract:

This paper presents the first stable isotope (δ¹⁸O and δ¹³C) data from a ~400-year (1590-2006 AD) annual- to decadal-resolution speleothem record collected from the Indian Lesser Himalaya. The data vary from -2.7 to -5.9‰ in δ¹⁸O and from -5.3 to -8.8‰ in δ¹³C. The isotopic analyses indicate that the climate during this period can be divided into two stages: a wet phase during the Little Ice Age (LIA) (1590-1850 AD) and a comparatively dry phase during the post-LIA period after 1850 AD. However, the record also documents minor dry events during the LIA and a wet episode after the LIA. Within the age uncertainty, the dry spells during the LIA are linked with historical drought events in the Indian subcontinent and at similar latitudes. The isotopic record is consistent with a number of previous studies in areas influenced by the Westerlies but appears to conflict with records from regions dominated by the Indian Summer Monsoon (ISM). This may be due to possible changes in the strength of the Westerlies in the study area, aided by a negative anomaly of the North Atlantic Oscillation (NAO) during the LIA. © 2012 Elsevier Ltd and INQUA. All rights reserved.

Abstract:

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
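
As a rough illustration of windowed n-gram perplexity, the sketch below counts n-grams over a whole sequence and then scores sliding windows against that model. It is not the toolkit's implementation: it uses a plain `Counter` instead of a suffix array, add-one smoothing over a four-letter alphabet is an assumption, and all function names and default window sizes are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every length-n substring of seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def context_counts(counts):
    """Sum n-gram counts by their (n-1)-symbol prefix."""
    contexts = Counter()
    for gram, count in counts.items():
        contexts[gram[:-1]] += count
    return contexts


def window_perplexity(window, n, counts, contexts, alphabet_size=4):
    """Perplexity of one window under an add-one smoothed n-gram model."""
    log_prob = 0.0
    positions = len(window) - n + 1
    if positions <= 0:
        return float("nan")
    for i in range(positions):
        gram = window[i:i + n]
        prob = (counts[gram] + 1) / (contexts[gram[:-1]] + alphabet_size)
        log_prob += math.log2(prob)
    return 2 ** (-log_prob / positions)


def sliding_perplexity(seq, n=3, window=40, step=20):
    """Return (start, perplexity) for each window along seq."""
    counts = ngram_counts(seq, n)
    contexts = context_counts(counts)
    return [(start, window_perplexity(seq[start:start + window], n,
                                      counts, contexts))
            for start in range(0, len(seq) - window + 1, step)]
```

Under this setup, a window lying inside a long tandem repeat scores a perplexity close to 1, while a window of varied sequence scores noticeably higher, which is the kind of contrast a windowed scan can surface.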

Abstract:

N-gram language models and lexicon-based word recognition are popular methods in the literature for improving recognition accuracy on online and offline handwritten data. However, very few works apply these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies. Lexicon methods offer much greater improvements (30%) in word recognition, but they depend heavily on choosing the right lexicon. For comparison with lexicon and language-model-based methods, we have also explored re-evaluation techniques, which use expert classifiers to improve symbol and word recognition accuracies.

Abstract:

Polyhedral techniques for program transformation are now used in several proprietary and open source compilers. However, most of the research on polyhedral compilation has focused on imperative languages such as C, where the computation is specified in terms of statements with zero or more nested loops and other control structures around them. Graphical dataflow languages, where there is no notion of statements or a schedule specifying their relative execution order, have so far not been studied using a powerful transformation or optimization approach. The execution semantics and referential transparency of dataflow languages impose a different set of challenges. In this paper, we attempt to bridge this gap by presenting techniques that can be used to extract polyhedral representation from dataflow programs and to synthesize them from their equivalent polyhedral representation. We then describe PolyGLoT, a framework for automatic transformation of dataflow programs which we built using our techniques and other popular research tools such as Clan and Pluto. For the purpose of experimental evaluation, we used our tools to compile LabVIEW, one of the most widely used dataflow programming languages. Results show that dataflow programs transformed using our framework are able to outperform those compiled otherwise by up to a factor of seventeen, with a mean speed-up of 2.30x while running on an 8-core Intel system.

Abstract:

Carbon isotope compositions of carbonate rocks from the ~2.7-Ga-old Neoarchean Vanivilas Formation of the Dharwar Supergroup, presented earlier by us, are re-evaluated in this study, along with oxygen isotope compositions of a few silica-dolomite pairs. This revisit is significant in view of recent field evidence suggesting a glaciomarine origin for the matrix-supported conglomerate member, the Talya conglomerate, which underlies the carbonate rocks of the Vanivilas Formation. An in-depth analysis of the carbon isotope data reveals that their pristine character is preserved even though the rocks have been metamorphosed to different degrees (from lower greenschist to lower amphibolite facies). The dolomitic member of the Vanivilas Formation in the Marikanive area is characterized by highly depleted δ¹³C values (down to -5‰ VPDB) and qualifies as the Indian example of a ca. 2.7-Ga-old cap carbonate. This inference is further supported by the low estimated equilibration temperatures documented by a few silica-dolomite pairs from the Vanivilas Formation collected near the Kalche area. These pairs show evidence of oxygen isotopic equilibrium at low temperatures (~0-20 °C) with depleted water (δ¹⁸O = -21‰ to -15‰ VSMOW) of glacial origin. We propose that the mineral pairs were deposited during the deglaciation period, when the ocean temperature was gradually being restored. The dolomite of the Marikanive area is the first record of cap carbonates of Neoarchean age from the Indian subcontinent.

Abstract:

Many bacterial transcription factors do not behave as per the textbook operon model. We draw on whole genome work, as well as reported diversity across different bacteria, to argue that transcription factors may have evolved from nucleoid-associated proteins. This view would explain a large amount of recent data gleaned from high-throughput sequencing and bioinformatic analyses.

Abstract:

Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. Scarcity of high-quality corpora, especially in Indian languages, makes this problem hard, e.g. state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, and a mere 0.187 for Telugu-Kannada. There exist comparable corpora in many Indian languages with other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g. MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of titles suggested.
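
For readers unfamiliar with the metric, mean reciprocal rank averages 1/rank of the first correct translation over all source words, counting 0 when no correct candidate is returned. The sketch below is a generic illustration rather than the paper's evaluation code; the function name, the dictionary layout, and the placeholder words in the usage note are all hypothetical.

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    """Average of 1/rank of the first correct candidate per source word.

    ranked_candidates: dict mapping a source word to its ranked list of
        proposed translations (best first).
    gold: dict mapping a source word to the set of acceptable translations.
    A source word with no correct candidate contributes 0.
    """
    total = 0.0
    for source, candidates in ranked_candidates.items():
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold.get(source, set()):
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)


def relative_improvement(before, after):
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100.0
```

With three source words whose first correct translations appear at ranks 1 and 2 and not at all, the score is (1 + 0.5 + 0) / 3 = 0.5. The Telugu-Kannada figures quoted above are consistent with each other: going from 0.187 to 0.419 is a relative improvement of about 124%.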

Abstract:

Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy, even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time-consuming, and error-prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.

Abstract:

We have constructed plasmids to be used for in vitro signature-tagged mutagenesis (STM) of Campylobacter jejuni and used these to generate STM libraries in three different strains. Statistical analysis of the transposon insertion sites in the C. jejuni NCTC 11168 chromosome and the plasmids of strain 81-176 indicated that their distribution was not uniform. Visual inspection of the distribution suggested that deviation from uniformity was not due to preferential integration of the transposon into a limited number of hot spots but rather that there was a bias towards insertions around the origin. We screened pools of mutants from the STM libraries for their ability to colonize the ceca of 2-week-old chickens harboring a standardized gut flora. We observed high-frequency random loss of colonization proficient mutants. When cohoused birds were individually inoculated with different tagged mutants, random loss of colonization-proficient mutants was similarly observed, as was extensive bird-to-bird transmission of mutants. This indicates that the nature of campylobacter colonization in chickens is complex and dynamic, and we hypothesize that bottlenecks in the colonization process and between-bird transmission account for these observations.

Abstract:

This paper investigates unsupervised test-time adaptation of language models (LM) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in the recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR-adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute. © 2007 IEEE.
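
The perplexity-based baseline mentioned in the abstract is commonly implemented with an EM update over the interpolation weights. The sketch below illustrates only that baseline, not the MBR or EBW scheme the paper proposes; it assumes each component model has already assigned a strictly positive probability to every token of the supervision data, and the function names and input layout are hypothetical.

```python
import math


def em_interpolation_weights(component_probs, iterations=50):
    """Estimate mixture weights that minimize perplexity on held-out tokens.

    component_probs[t][k] is the probability component model k assigns to
    token t (assumed strictly positive). Weights start uniform.
    """
    num_components = len(component_probs[0])
    weights = [1.0 / num_components] * num_components
    for _ in range(iterations):
        totals = [0.0] * num_components
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            # E-step: posterior responsibility of each component for token t.
            for k in range(num_components):
                totals[k] += weights[k] * probs[k] / mix
        # M-step: new weight is the average responsibility.
        weights = [total / len(component_probs) for total in totals]
    return weights


def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the same tokens."""
    log_sum = sum(math.log(sum(w * p for w, p in zip(weights, probs)))
                  for probs in component_probs)
    return math.exp(-log_sum / len(component_probs))
```

Each EM iteration cannot increase the perplexity on the data it is fitted to, which is why this is a convenient baseline; whether lower perplexity also means lower CER or TER is exactly the question the abstract raises.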