871 results for Language Acquisition
Abstract:
Parallel sub-word recognition (PSWR) is a new model proposed for language identification (LID) that does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units, which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR-based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model for each language to be identified. Considering various combinations of the statistical evaluation scores, it is found that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, thus making it an efficient alternative to the PPR system.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
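As a rough illustration of windowed n-gram perplexity, the sketch below counts n-grams with a plain dictionary (not the suffix array the toolkit uses) and scores sliding windows under an add-one smoothed model. The function names, the toy sequence, the window and step sizes, and the smoothing choice are all assumptions made for this example and are not taken from the toolkit.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq (dictionary-based, not a suffix array)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexities(seq, n=3, window=12, step=6, alphabet="ACGT"):
    """Return (start, perplexity) for each sliding window of seq.

    Assumes n >= 2, window >= n, and that seq only uses symbols in alphabet.
    Probabilities use add-one smoothing, an assumption made for this sketch.
    """
    grams = ngram_counts(seq, n)
    contexts = ngram_counts(seq, n - 1)
    vocab_size = len(alphabet)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        log_sum = 0.0
        num_grams = 0
        for i in range(len(chunk) - n + 1):
            gram = chunk[i:i + n]
            # P(last symbol | preceding n-1 symbols), add-one smoothed
            prob = (grams[gram] + 1) / (contexts[gram[:-1]] + vocab_size)
            log_sum += math.log2(prob)
            num_grams += 1
        results.append((start, 2 ** (-log_sum / num_grams)))
    return results


# Hypothetical toy sequence: a tandem repeat followed by a homopolymer run.
toy_sequence = "ACGT" * 6 + "A" * 12
scores = window_perplexities(toy_sequence)
for start, ppl in scores:
    print(start, round(ppl, 3))
```

Windows whose perplexity departs clearly from the rest of the sequence would be the candidates for closer inspection; on a real genome the counting step would need a more memory-efficient index, such as the suffix array described in the abstract.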
Abstract:
Assembly is an important part of the product development process. To avoid potential issues during assembly in specialized domains such as aircraft assembly, expert knowledge that can predict such issues is helpful. Knowledge-based systems can act as virtual experts to provide assistance. Knowledge acquisition for such systems, however, is a challenge, and this paper describes one part of ongoing research on acquiring knowledge through a dialog between an expert and a knowledge acquisition system. In particular, this paper discusses the use of a situation model for assemblies to present experts with a virtual assembly and help them locate the specific context of the knowledge they provide to the system.
Abstract:
Distributed compressed sensing exploits information redundancy, inbuilt in multi-signal ensembles with inter- as well as intra-signal correlations, to reconstruct undersampled signals. In this paper we revisit this problem, albeit from a different perspective: taking streaming data from several correlated sources as input to a real-time system which, without any a priori information, incrementally learns and admits each source into the system.
Abstract:
N-gram language models and lexicon-based word recognition are popular methods in the literature for improving recognition accuracies on online and offline handwritten data. However, very few works deal with the application of these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word (8%) recognition accuracies. Lexicon methods offer much greater improvements (30%) in word recognition, but depend heavily on choosing the right lexicon. For comparison with lexicon and language-model based methods, we have also explored re-evaluation techniques, which use expert classifiers to improve symbol and word recognition accuracies.
Abstract:
In animal populations, the constraints of energy and time can cause intraspecific variation in foraging behaviour. The proximate developmental mediators of such variation are often the mechanisms underlying perception and associative learning. Here, experience-dependent changes in foraging behaviour and their consequences were investigated in an urban population of free-ranging dogs, Canis familiaris, by continually challenging them with the task of food extraction from specially crafted packets. Typically, males and pregnant/lactating (PL) females extracted food using the sophisticated 'gap widening' technique, whereas non-pregnant/non-lactating (NPNL) females used the relatively underdeveloped 'rip opening' technique. In contrast to most males and PL females (and a few NPNL females), which repeatedly used the gap widening technique and improved their performance in food extraction with experience, most NPNL females (and a few males and PL females) used the two extraction techniques without preference and did not improve over successive trials. Furthermore, the ability of dogs to extract food in a sophisticated manner was positively related to their ability to improve their performance with experience. Collectively, these findings demonstrate that factors such as sex and physiological state can cause differences among individuals in the likelihood of learning new information and hence in the rate of resource acquisition and monopolization.
Abstract:
The design and development of a Bottom Pressure Recorder for a Tsunami Early Warning System are described here. The special requirements it must satisfy for deployment on the ocean bed and pressure monitoring of the water column above, typically high-resolution data digitization and low circuit power consumption, are dealt with. The implementation details of the data sensing and acquisition part that meet these requirements are also brought out. The data processing part typically encompasses a tsunami detection algorithm that should detect an event of significance against a background of a variety of periodic and aperiodic noise signals. Such an algorithm and its simulation are presented. Further, the results of sea trials carried out on the system off the Chennai coast are presented. The high quality and fidelity of the data show that the system design is robust despite its low cost and, with suitable augmentations, is ready for full-fledged deployment on the ocean bed. (C) 2013 Elsevier Ltd. All rights reserved.
Abstract:
The widely conserved omega subunit encoded by rpoZ is the smallest subunit of Escherichia coli RNA polymerase (RNAP) but is dispensable for bacterial growth. The function of omega is known to be substituted by GroEL in the omega-null strain, which thus does not exhibit a discernible phenotype. In this work, we report the isolation of omega variants whose expression in vivo leads to a dominant lethal phenotype. Studies show that, in contrast to omega, which is largely unstructured, omega mutants display substantial acquisition of secondary structure. Through a detailed study of one of the mutants, omega(6), bearing an N60D substitution, the mechanism of lethality has been deciphered. Biochemical analysis reveals that omega(6) binds to the beta' subunit in vitro with greater affinity than omega does. The RNAP holoenzyme reconstituted in the presence of omega(6) in vitro is defective in transcription initiation. Formation of a faulty RNAP in the presence of mutant omega results in death of the cell. Furthermore, the lethality of omega(6) is relieved in cells expressing the rpoC2112 allele encoding beta'(2112), a variant beta' bearing a Y457S substitution immediately adjacent to the beta' catalytic center. Our results suggest that the enhanced omega(6)-beta' interaction may perturb the plasticity of the RNAP active center, implicating a role for omega and its flexible state.
Abstract:
Polyhedral techniques for program transformation are now used in several proprietary and open source compilers. However, most of the research on polyhedral compilation has focused on imperative languages such as C, where the computation is specified in terms of statements with zero or more nested loops and other control structures around them. Graphical dataflow languages, where there is no notion of statements or a schedule specifying their relative execution order, have so far not been studied using a powerful transformation or optimization approach. The execution semantics and referential transparency of dataflow languages impose a different set of challenges. In this paper, we attempt to bridge this gap by presenting techniques that can be used to extract polyhedral representation from dataflow programs and to synthesize them from their equivalent polyhedral representation. We then describe PolyGLoT, a framework for automatic transformation of dataflow programs which we built using our techniques and other popular research tools such as Clan and Pluto. For the purpose of experimental evaluation, we used our tools to compile LabVIEW, one of the most widely used dataflow programming languages. Results show that dataflow programs transformed using our framework are able to outperform those compiled otherwise by up to a factor of seventeen, with a mean speed-up of 2.30x while running on an 8-core Intel system.
Abstract:
Following rising demands in positioning with GPS, low-cost receivers are becoming widely available, but their energy demands are still too high. For energy-efficient GPS sensing in delay-tolerant applications, the possibility of offloading a few milliseconds of raw signal samples and leveraging the greater processing power of the cloud to obtain a position fix is being actively investigated. In an attempt to reduce the energy cost of this data offloading operation, we propose Sparse-GPS: a new computing framework for GPS acquisition via sparse approximation. Within the framework, GPS signals can be efficiently compressed by random ensembles. The sparse acquisition information pertaining to the visible satellites, which is embedded within these limited measurements, can subsequently be recovered by our proposed representation dictionary. Through extensive empirical evaluations, we demonstrate the acquisition quality and energy gains of Sparse-GPS. We show that it is twice as energy efficient as offloading uncompressed data and has 5-10 times lower energy costs than standalone GPS, with a median positioning accuracy of 40 m.
Abstract:
In this study we showed that a freshwater fish, the climbing perch (Anabas testudineus), is incapable of using chemical communication but employs visual cues to acquire familiarity and distinguish a familiar group of conspecifics from an unfamiliar one. Moreover, the isolation of olfactory signals from visual cues did not affect the recognition of and preference for a familiar shoal in this species.
Abstract:
The NMR-based approach to metabolomics typically involves the collection of two-dimensional (2D) heteronuclear correlation spectra for identification and assignment of metabolites. In case of spectral overlap, a 3D spectrum becomes necessary, but its use is hampered by the slow data acquisition needed to achieve sufficient resolution. We describe here a method to simultaneously acquire three spectra (one 3D and two 2D) in a single data set, which is based on a combination of different fast data acquisition techniques: G-matrix Fourier transform (GFT) NMR spectroscopy, parallel data acquisition and non-uniform sampling. The following spectra are acquired simultaneously: (1) C-13 multiplicity edited GFT (3,2)D HSQC-TOCSY, (2) 2D [H-1-H-1] TOCSY and (3) 2D [C-13-H-1] HETCOR. The spectra are obtained at high resolution and provide high-dimensional spectral information for resolving ambiguities. While the GFT spectrum has been shown previously to provide good resolution, the editing of spin systems based on their CH multiplicities further resolves the ambiguities in resonance assignments. The experiment is demonstrated on a mixture of 21 metabolites commonly observed in metabolomics. The spectra were acquired at natural abundance of C-13. This is the first application of a combination of three fast NMR methods to small molecules and opens up new avenues for high-throughput approaches in NMR-based metabolomics.
Abstract:
Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. Scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, and a mere 0.187 for Telugu-Kannada. Comparable corpora exist between many Indian languages and other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g. MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of titles suggested.
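For readers unfamiliar with the metric, the following minimal sketch shows how a mean reciprocal rank can be computed and checks the arithmetic behind the reported 124% relative improvement. The function name and the toy rankings are hypothetical and purely illustrative; only the figures 0.187, 124% and 0.419 come from the abstract.

```python
def mean_reciprocal_rank(ranked_candidates, gold_answers):
    """Average of 1/rank of the correct answer over all queries.

    A query whose correct answer is absent from its ranking contributes 0.
    """
    total = 0.0
    for candidates, answer in zip(ranked_candidates, gold_answers):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(gold_answers)


# Hypothetical toy data: three source words, each with ranked translation candidates.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
gold = ["a", "y", "missing"]
mrr = mean_reciprocal_rank(rankings, gold)  # (1 + 1/2 + 0) / 3
print(mrr)

# Arithmetic check of the reported Telugu-Kannada figures.
baseline = 0.187
improved = baseline * (1 + 1.24)  # a 124% relative improvement
print(round(improved, 3))
```

An MRR of 0.419 means that, on average, the reciprocal of the rank at which the correct translation appears is about 0.42, which is higher than what a correct answer always placed at rank three would give (1/3).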
Abstract:
Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy, even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of heterogeneous systems consisting of multicore CPUs and GPUs is challenging, time consuming, and error prone. To address these issues, we propose a domain-specific language (DSL), Falcon, for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the usage of our DSL to implement local computation algorithms (that do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic SSSP on GPU and multicore CPUs. Using a set of benchmark graphs, we illustrate that the generated code performs close to the state-of-the-art hand-tuned implementations.