907 resultados para native language (L1)


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The origin of Borneo's elephants is controversial. Two competing hypotheses argue that they are either indigenous, tracing back to the Pleistocene, or were introduced, descending from elephants imported in the 16th-18th centuries. Taxonomically, they have either been classified as a unique subspecies or placed under the Indian or Sumatran subspecies. If shown to be a unique indigenous population, this would extend the natural species range of the Asian elephant by 1300 km, and therefore Borneo elephants would have much greater conservation importance than if they were a feral population. We compared DNA of Borneo elephants to that of elephants from across the range of the Asian elephant, using a fragment of mitochondrial DNA, including part of the hypervariable d-loop, and five autosomal microsatellite loci. We find that Borneo's elephants are genetically distinct, with molecular divergence indicative of a Pleistocene colonisation of Borneo and subsequent isolation. We reject the hypothesis that Borneo's elephants were introduced. The genetic divergence of Borneo elephants warrants their recognition as a separate evolutionary significant unit. Thus, interbreeding Borneo elephants with those from other populations would be contraindicated in ex situ conservation, and their genetic distinctiveness makes them one of the highest priority populations for Asian elephant conservation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a new approach to spoken language modeling for language identification (LID) using the Lempel-Ziv-Welch (LZW) algorithm. The LZW technique is applicable to any kind of tokenization of the speech signal. Because of the efficiency of LZW algorithm to obtain variable length symbol strings in the training data, the LZW codebook captures the essentials of a language effectively. We develop two new deterministic measures for LID based on the LZW algorithm namely: (i) Compression ratio score (LZW-CR) and (ii) weighted discriminant score (LZW-WDS). To assess these measures, we consider error-free tokenization of speech as well as artificially induced noise in the tokenization. It is shown that for a 6 language LID task of OGI-TS database with clean tokenization, the new model (LZW-WDS) performs slightly better than the conventional bigram model. For noisy tokenization, which is the more realistic case, LZW-WDS significantly outperforms the bigram technique

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The effect of an applied electric field on the magnetic properties of L1(0)-ordered CoPd thin films is investigated by first-principle calculations. Both the magnetic moment and the magnetocrystalline anisotropy of the surface atoms are changed by the electric field, but the net effect depends on the surface termination. The magnetocrystalline anisotropy switches from in-plane to perpendicular in the presence of external electric field. Typical magnetic-moment changes are 0.1 mu(B) per eV/angstrom The main mechanism is the shift of the Fermi level, but the anisotropy change also reflects a crystal-field change due to incomplete screening.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Current scientific research is characterized by increasing specialization, accumulating knowledge at a high speed due to parallel advances in a multitude of sub-disciplines. Recent estimates suggest that human knowledge doubles every two to three years – and with the advances in information and communication technologies, this wide body of scientific knowledge is available to anyone, anywhere, anytime. This may also be referred to as ambient intelligence – an environment characterized by plentiful and available knowledge. The bottleneck in utilizing this knowledge for specific applications is not accessing but assimilating the information and transforming it to suit the needs for a specific application. The increasingly specialized areas of scientific research often have the common goal of converting data into insight allowing the identification of solutions to scientific problems. Due to this common goal, there are strong parallels between different areas of applications that can be exploited and used to cross-fertilize different disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science applications, in visual processing and in biomedicine. Each sub-discipline has found its own specialized methodologies making these statistical methods successful to the given application. The unification of specialized areas is possible because many different problems can share strong analogies, making the theories developed for one problem applicable to other areas of research. It is the goal of this paper to demonstrate the utility of merging two disparate areas of applications to advance scientific research. The merging process requires cross-disciplinary collaboration to allow maximal exploitation of advances in one sub-discipline for that of another. We will demonstrate this general concept with the specific example of merging language technologies and computational biology.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Parallel sub-word recognition (PSWR) is a new model that has been proposed for language identification (LID) which does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model, for each language to be identified. Considering various combinations of the statistical evaluation scores, it is found that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, thus making it an efficient alternative to the PPR system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Native species' response to the presence of invasive species is context specific. This response cannot be studied in isolation from the prevailing environmental stresses in invaded habitats such as seasonal drought. We investigated the combined effects of an invasive shrub Lantana camara L. (lantana), seasonal rainfall and species' microsite preferences on the growth and survival of 1,105 naturally established seedlings of native trees and shrubs in a seasonally dry tropical forest. Individuals were followed from April 2008 to February 2010, and growth and survival measured in relation to lantana density, seasonality of rainfall and species characteristics in a 50-ha permanent forest plot located in Mudumalai, southern India. We used a mixed effects modelling approach to examine seedling growth and generalized linear models to examine seedling survival. The overall relative height growth rate of established seedlings was found to be very low irrespective of the presence or absence of dense lantana. 22-month growth rate of dry forest species was lower under dense lantana while moist forest species were not affected by the presence of lantana thickets. 4-month growth rates of all species increased with increasing inter-census rainfall. Community results may be influenced by responses of the most abundant species, Catunaregam spinosa, whose growth rates were always lower under dense lantana. Overall seedling survival was high, increased with increasing rainfall and was higher for species with dry forest preference than for species with moist forest preference. The high survival rates of naturally established seedlings combined with their basal sprouting ability in this forest could enable the persistence of woody species in the face of invasive species.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

MATLAB is an array language, initially popular for rapid prototyping, but is now being increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput oriented accelerators such as graphics processing units(GPUs). Thus, an approach that maps the control flow dominated regions to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. In order to ensure required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus our compiler automatically handles composition of kernels, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfer. The proposed compiler was implemented and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data parallel benchmarks over native execution of MATLAB.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Close packing of hydrophobic residues in the protein interior is an important determinant of protein stability. Cavities introduced by large to small substitutions are known to destabilize proteins. Conversely, native states of proteins and protein fragments can be stabilized by filling in existing cavities. Molten globules (MGs) were initially used to describe a state of protein which has well-defined secondary structure but little or no tertiary packing. Subsequent studies have shown that MGs do have some degree of native-like topology and specific packing. Wet molten globules (WMGs) with hydrated cores and considerably decreased packing relative to the native state have been studied extensively. Recently there has been renewed interest in identification and characterization of dry molten globules (DMGs). These are slightly expanded forms of the native state which show increased conformational flexibility, native-like main-chain hydrogen bonding and dry interiors. The generality of occurrence of DMGs during protein unfolding and the extent and nature of packing in DMGs remain to be elucidated. Packing interactions in native proteins and MGs can be probed through mutations. Next generation sequencing technologies make it possible to determine relative populations of mutants in a large pool. When this is coupled to phenotypic screens or cell-surface display, it becomes possible to rapidly examine large panels of single-site or multi-site mutants. From such studies, residue specific contributions to protein stability and function can be estimated in a highly parallelized fashion. This complements conventional biophysical methods for characterization of packing in native states and molten globules.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

N-gram language models and lexicon-based word-recognition are popular methods in the literature to improve recognition accuracies of online and offline handwritten data. However, there are very few works that deal with application of these techniques on online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies and while lexicon methods offer much greater improvements (30%) in terms of word recognition, there is a large dependency on choosing the right lexicon. For comparison to lexicon and language model based methods, we have also explored re-evaluation techniques which involve the use of expert classifiers to improve symbol and word recognition accuracies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Protein structure space is believed to consist of a finite set of discrete folds, unlike the protein sequence space which is astronomically large, indicating that proteins from the available sequence space are likely to adopt one of the many folds already observed. In spite of extensive sequence-structure correlation data, protein structure prediction still remains an open question with researchers having tried different approaches (experimental as well as computational). One of the challenges of protein structure prediction is to identify the native protein structures from a milieu of decoys/models. In this work, a rigorous investigation of Protein Structure Networks (PSNs) has been performed to detect native structures from decoys/ models. Ninety four parameters obtained from network studies have been optimally combined with Support Vector Machines (SVM) to derive a general metric to distinguish decoys/models from the native protein structures with an accuracy of 94.11%. Recently, for the first time in the literature we had shown that PSN has the capability to distinguish native proteins from decoys. A major difference between the present work and the previous study is to explore the transition profiles at different strengths of non-covalent interactions and SVM has indeed identified this as an important parameter. Additionally, the SVM trained algorithm is also applied to the recent CASP10 predicted models. The novelty of the network approach is that it is based on general network properties of native protein structures and that a given model can be assessed independent of any reference structure. Thus, the approach presented in this paper can be valuable in validating the predicted structures. A web-server has been developed for this purpose and is freely available at http://vishgraph.mbu.iisc.ernet.in/GraProStr/PSN-QA.html.