28 results for Northern Bullom language
Abstract:
Current scientific research is characterized by increasing specialization, accumulating knowledge at high speed due to parallel advances in a multitude of sub-disciplines. Recent estimates suggest that human knowledge doubles every two to three years, and with advances in information and communication technologies, this wide body of scientific knowledge is available to anyone, anywhere, anytime. This may also be referred to as ambient intelligence: an environment characterized by plentiful and available knowledge. The bottleneck in utilizing this knowledge for specific applications is not accessing it but assimilating the information and transforming it to suit the needs of a specific application. The increasingly specialized areas of scientific research often share the common goal of converting data into insight that allows the identification of solutions to scientific problems. Because of this common goal, there are strong parallels between different application areas that can be exploited to cross-fertilize disciplines. For example, the same fundamental statistical methods are used extensively in speech and language processing, in materials science, in visual processing and in biomedicine. Each sub-discipline has developed its own specialized methodologies that make these statistical methods successful for the given application. The unification of specialized areas is possible because many different problems share strong analogies, making the theories developed for one problem applicable to other areas of research. The goal of this paper is to demonstrate the utility of merging two disparate application areas to advance scientific research. The merging process requires cross-disciplinary collaboration so that advances in one sub-discipline can be fully exploited in another. We demonstrate this general concept with the specific example of merging language technologies and computational biology.
Abstract:
Parallel sub-word recognition (PSWR) is a new model proposed for language identification (LID) that does not require elaborate phonetic labeling of speech data in a foreign language. The approach performs front-end tokenization in terms of sub-word units, which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR-based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature, which comprises a front-end tokenizer and a back-end language model for each language to be identified. Considering various combinations of the statistical evaluation scores, we find that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, making it an efficient alternative to the PPR system.
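The back-end stage of a PPR-style system can be illustrated with a minimal sketch. It assumes a single front-end tokenizer whose output is a sequence of sub-word unit labels (the labels `u1`, `u2`, `u3` and the language names are hypothetical stand-ins for automatically derived units), one add-one smoothed bigram model per language, and a decision by maximum log-likelihood. The abstract does not state the model order or how the evaluation scores are combined, so this is not the authors' scoring scheme.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count token bigrams for one language; each sequence is padded with a <s> start symbol."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_likelihood(model, seq, vocab_size):
    """Add-one smoothed bigram log-probability of a token sequence."""
    bigrams, contexts, _ = model
    padded = ["<s>"] + list(seq)
    return sum(
        math.log((bigrams[(p, c)] + 1) / (contexts[p] + vocab_size))
        for p, c in zip(padded, padded[1:])
    )


def identify(models, seq):
    """Return the language whose back-end model scores the tokenized utterance highest."""
    vocab = set(seq)
    for _, _, v in models.values():
        vocab |= v
    scores = {lang: log_likelihood(m, seq, len(vocab)) for lang, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical tokenizer output used to train the per-language back-end models.
train = {
    "lang_A": [["u1", "u2", "u1", "u2"], ["u1", "u2", "u3"]],
    "lang_B": [["u3", "u3", "u1"], ["u2", "u3", "u3"]],
}
models = {lang: train_bigram(seqs) for lang, seqs in train.items()}
best, scores = identify(models, ["u1", "u2", "u1"])
```

In a full PPR or PSWR system, several tokenizers would run in parallel and their per-language scores would be fused; the sketch keeps one tokenizer to show only the language-model scoring.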
Abstract:
During summer, the northern Indian Ocean exhibits significant atmospheric intraseasonal variability associated with active and break phases of the monsoon in the 30-90 day band. In this paper, we investigate the mechanisms behind the Sea Surface Temperature (SST) signature of this atmospheric variability, using a combination of observational datasets and Ocean General Circulation Model sensitivity experiments. In addition to the previously reported intraseasonal SST signature in the Bay of Bengal, observations show clear SST signals in the Arabian Sea related to the active/break cycle of the monsoon. As the atmospheric intraseasonal oscillation moves northward, SST variations appear first at the southern tip of India (day 0), then in the Somali upwelling region (day 10), the northern Bay of Bengal (day 19) and finally the Oman upwelling region (day 23). The Bay of Bengal and Oman signals are most clearly associated with the monsoon active/break index, whereas the relationship with the signals near the Somali upwelling and the southern tip of India is weaker. In agreement with previous studies, we find that heat flux variations drive most of the intraseasonal SST variability in the Bay of Bengal, both in our model (regression coefficient of 0.9, against ~0.25 for wind stress) and in observations (regression coefficient of 0.8); ~60% of the heat flux variation is due to shortwave radiation and ~40% to latent heat flux. On the other hand, both observations and model results indicate a prominent role for dynamical oceanic processes in the Arabian Sea. Wind-stress variations force about 70-100% of the intraseasonal SST variations in the Arabian Sea, through modulation of oceanic processes (entrainment, mixing, Ekman pumping, lateral advection). Our ~100 km resolution model suggests that internal oceanic variability (i.e. eddies) contributes substantially to small-scale intraseasonal variability in the Somali upwelling region, but does not contribute to large-scale intraseasonal SST variability because of its small spatial scale and random phase relation to the active-break monsoon cycle. The effect of oceanic eddies, however, remains to be explored at higher spatial resolution.
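The attribution figures above are quoted as regression coefficients. One common way to obtain such a number, assumed here because the abstract does not give the formula, is to regress the SST anomaly from a sensitivity run forced by a single term (for example heat flux only) onto the SST anomaly from the fully forced run, b = cov(x, y) / var(x); a value near 1 means that forcing alone reproduces the variability. The series below are hypothetical and serve only to exercise the formula.

```python
def regression_coefficient(control, experiment):
    """Least-squares slope of `experiment` regressed on `control`: cov(x, y) / var(x).

    Both inputs are assumed to be intraseasonal SST anomaly series
    (e.g. already band-pass filtered to 30-90 days) sampled at the same times.
    """
    n = len(control)
    if n != len(experiment) or n < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mx = sum(control) / n
    my = sum(experiment) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(control, experiment))
    var = sum((x - mx) ** 2 for x in control)
    return cov / var


# Hypothetical anomalies (degrees C), for illustration only.
full_run = [0.3, -0.2, 0.5, -0.4, 0.1, -0.3]
heat_flux_only = [0.9 * v for v in full_run]  # constructed to scale exactly by 0.9
b = regression_coefficient(full_run, heat_flux_only)
```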
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence in order to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
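The two quantities named above, n-gram frequencies and windowed n-gram perplexity, can be sketched as follows. This is a minimal illustration, not the toolkit: it counts n-grams with a hash-based `Counter` instead of a suffix array, assumes an unambiguous ACGT alphabet and add-one smoothing, and trains the model on the same sequence it scans. Windows with unusually low perplexity are candidates for repetitive regions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumes an unambiguous DNA alphabet (no N runs)


def ngram_counts(seq, n):
    """Count all overlapping n-grams (hash-based here, not suffix-array-based)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexities(seq, n, window, step):
    """Perplexity of each window under an add-one smoothed order-n model,
    P(base | previous n-1 bases), trained on the whole sequence."""
    if window < n:
        raise ValueError("window must be at least n bases long")
    full = ngram_counts(seq, n)
    contexts = Counter()
    for gram, count in full.items():
        contexts[gram[:-1]] += count  # context counts consistent with n-gram counts
    results = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        log_sum, terms = 0.0, 0
        for i in range(n - 1, len(w)):
            gram = w[i - n + 1:i + 1]
            p = (full[gram] + 1) / (contexts[gram[:-1]] + len(ALPHABET))
            log_sum += math.log(p)
            terms += 1
        results.append((start, math.exp(-log_sum / terms)))
    return results


# Hypothetical toy "genome": a tandem repeat, which gives low perplexity everywhere.
genome = "ACGT" * 25
counts = ngram_counts(genome, 2)
windows = window_perplexities(genome, n=3, window=20, step=20)
```

A suffix array makes the same counts available for any n without re-scanning the sequence, which is why it suits whole-genome work; the dictionary version is only easier to read.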
Abstract:
Evaluating the hazard potential of the Makran subduction zone requires understanding the record of past large earthquakes and tsunamis. We address this problem by searching for earthquake and tectonic proxies along the Makran Coast and linking those observations with the available constraints on historical seismicity and the tell-tale characteristics of sea-floor morphology. The Mw 8.1 earthquake of 1945 and the resulting tsunami, which originated on the eastern part of the Makran, are the only historically known hazardous events in this region. The seismic status of the western part of the subduction zone, outside the rupture area of the 1945 earthquake, remains an enigma. The near-shore shallow stratigraphy of the central part of Makran near Chabahar shows evidence of seismically induced liquefaction that we attribute to the distant effects of the 1945 earthquake. The coastal sites farther west around Jask are remarkable for the absence of liquefaction features, at least at shallow levels. Although this is negative evidence, it possibly implies that the western part of the Makran Coast region may not have been affected by near-field large earthquakes in the recent past, a conclusion also supported by the analysis of historical data. On the other hand, the elevated marine terraces on the western Makran and their uplift rates indicate a comparable degree of long-term tectonic activity, at least around Chabahar. The offshore data suggest recently active submarine slumps on the eastern part of the Makran, reflecting shaking caused by the great 1945 earthquake. The ocean-floor morphologic features on the western segment, by contrast, are much subdued, and the prograding delta lobes on the shelf edge remain intact. In general, the coast of western Makran shows indications of progradation and uplift.
These lines of evidence thus suggest that although the western segment is potentially seismogenic, large earthquakes have not occurred there in the recent past, at least not during the last 600 years. The recurrence period of earthquakes may be 1,000 years or more, an estimate based on the age of the youngest dated coastal ridge. The long elapsed time suggests that the western segment may have accumulated sufficient slip to produce a major earthquake.
Abstract:
N-gram language models and lexicon-based word recognition are popular methods in the literature for improving the recognition accuracy of online and offline handwritten data. However, very few works apply these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and apply them to improve symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word (8%) recognition accuracies; lexicon methods offer much greater improvements (30%) in word recognition, but depend heavily on choosing the right lexicon. For comparison with the lexicon- and language-model-based methods, we have also explored re-evaluation techniques that use expert classifiers to improve symbol and word recognition accuracies.
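One standard way to apply a symbol bigram model to recognizer output is Viterbi decoding over the classifier's candidate lists, maximizing the sum of classifier and weighted language-model log-probabilities. The abstract does not say that this is how the bigram models were combined, so treat the sketch as an assumption. The symbols are hypothetical ASCII stand-ins for Tamil symbols, and `lm_weight` and `floor` are illustrative parameters.

```python
import math


def viterbi_rescore(candidates, bigram, lm_weight=1.0, floor=1e-6):
    """Choose the symbol sequence maximizing
    sum(log P_classifier) + lm_weight * sum(log P_bigram).

    candidates: one dict per segmented symbol position, {symbol: classifier probability}
    bigram:     {(previous, current): probability}; "<s>" marks the word start
    floor:      probability assumed for unseen bigrams (a crude stand-in for smoothing)
    """
    best = {"<s>": (0.0, [])}  # best (score, path) ending in each symbol so far
    for options in candidates:
        new_best = {}
        for sym, p_cls in options.items():
            new_best[sym] = max(
                (
                    (
                        score + math.log(p_cls)
                        + lm_weight * math.log(bigram.get((prev, sym), floor)),
                        path + [sym],
                    )
                    for prev, (score, path) in best.items()
                ),
                key=lambda item: item[0],
            )
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical classifier outputs for a three-symbol word.
candidates = [{"ta": 0.6, "na": 0.4}, {"ma": 0.55, "mi": 0.45}, {"zh": 0.7, "zha": 0.3}]
bigram = {
    ("<s>", "ta"): 0.3, ("<s>", "na"): 0.2,
    ("ta", "mi"): 0.5, ("ta", "ma"): 0.05, ("na", "ma"): 0.2, ("na", "mi"): 0.1,
    ("mi", "zh"): 0.6, ("ma", "zh"): 0.1, ("mi", "zha"): 0.1, ("ma", "zha"): 0.1,
}
with_lm = viterbi_rescore(candidates, bigram)
classifier_only = viterbi_rescore(candidates, bigram, lm_weight=0.0)
```

In this toy input the classifier alone prefers `ma` at the second position, while the bigram model overturns it in favor of `mi`; that kind of correction is the mechanism behind the symbol-accuracy gains reported above.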
Abstract:
In Northern Vietnam, organic fertilization programmes are being tested to restore soil fertility and reduce soil erosion. However, amending soil with organic matter is also associated with the development of the invasive earthworm species Dichogaster bolaui. The objective of this study was to investigate the influence of organic amendment quality (compost vs. vermicompost) on D. bolaui. Our study confirmed D. bolaui development in organic patches in the field. However, we also observed that the flat-backed millipede Asiomorpha coarctata proliferated in these organic patches. Native to Asia, this millipede species is also considered invasive in America. Both D. bolaui and A. coarctata colonized compost patches more rapidly than vermicompost patches. A laboratory experiment confirmed this trend and showed the limited development of D. bolaui in vermicompost, probably because this substrate is less palatable to soil fauna. In conclusion, any restoration practice that aims to increase organic stocks in soils degraded by erosion should consider the quality of the organic amendment. In Northern Vietnam, vermicompost may be the preferred substrate for restoring soils while limiting the spread of D. bolaui. (C) 2014 Elsevier Masson SAS. All rights reserved.
Abstract:
Polyhedral techniques for program transformation are now used in several proprietary and open-source compilers. However, most research on polyhedral compilation has focused on imperative languages such as C, where the computation is specified in terms of statements with zero or more nested loops and other control structures around them. Graphical dataflow languages, which have no notion of statements or of a schedule specifying their relative execution order, have so far not been studied with a transformation or optimization approach as powerful as the polyhedral model. The execution semantics and referential transparency of dataflow languages pose a different set of challenges. In this paper, we attempt to bridge this gap by presenting techniques to extract a polyhedral representation from dataflow programs and to synthesize dataflow programs from their equivalent polyhedral representation. We then describe PolyGLoT, a framework for the automatic transformation of dataflow programs, which we built using our techniques and other popular research tools such as Clan and Pluto. For experimental evaluation, we used our tools to compile LabVIEW, one of the most widely used dataflow programming languages. Results show that dataflow programs transformed using our framework outperform those compiled otherwise by up to a factor of seventeen, with a mean speed-up of 2.30x on an 8-core Intel system.
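To make "polyhedral representation" concrete, the sketch below encodes one hypothetical loop-nest statement by its iteration domain and affine read/write accesses, derives its flow dependences, and checks whether candidate affine schedules preserve them. Tools such as Pluto reason about these sets symbolically (with integer linear programming) for arbitrary parameter values; this sketch simply enumerates a small, fixed-size domain, and it does not model how PolyGLoT extracts such a representation from LabVIEW diagrams.

```python
from itertools import product

# Hypothetical statement S, with parameters fixed to small values so the
# iteration domain can be enumerated:
#   for i in 1..N-1: for j in 0..M-2: A[i][j] = A[i-1][j+1] + 1
N, M = 6, 6


def domain():
    """Integer points of {(i, j) : 1 <= i <= N-1, 0 <= j <= M-2}."""
    return list(product(range(1, N), range(0, M - 1)))


def write_access(i, j):
    return (i, j)  # S writes A[i][j]


def read_access(i, j):
    return (i - 1, j + 1)  # S reads A[i-1][j+1]


def dependences():
    """Flow dependences (source, target): the source iteration writes the cell the target reads."""
    writer = {write_access(*p): p for p in domain()}
    return [(writer[read_access(*t)], t) for t in domain() if read_access(*t) in writer]


def is_legal(schedule):
    """A schedule maps each iteration to a time vector; it is legal if every
    dependence source runs lexicographically before its target (tuple comparison)."""
    return all(schedule(*src) < schedule(*dst) for src, dst in dependences())


def identity(i, j):
    return (i, j)


def interchange(i, j):
    return (j, i)


def skew_then_interchange(i, j):
    return (i + j, i)  # skew j by i, then make the skewed loop outermost
```

The dependence distance here is (1, -1), so a plain loop interchange would run a reader before its writer and is rejected, while skewing first restores legality. Finding such a legal, profitable schedule automatically is the kind of search a polyhedral optimizer performs.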
Abstract:
Terrestrial water storage (TWS) plays a key role in the global water cycle and is strongly influenced by climate variability and human activities. In this study, monthly TWS, rainfall and Ganga-Brahmaputra river discharge (GBRD) over India are analysed for the period 2003-12 using remote sensing satellite data. The spatial pattern of mean TWS shows a decrease over a large and populous region of Northern India comprising the foothills of the Himalayas, the Indo-Gangetic Plains and North East India. Over this region, the mean monthly TWS exhibits a pronounced seasonal cycle and large interannual variability, and is highly correlated with rainfall and GBRD variations (r > 0.8), with lag times of 2 months and 1 month, respectively. The time series of monthly TWS shows a consistent and statistically significant decrease of about 1 cm/year over Northern India, which is not associated with changes in rainfall or GBRD. This recent change in TWS suggests a possible impact of rapid industrialization, urbanization and population growth on land water resources. Our analysis highlights the potential of Earth-observation satellite data for hydrological applications.
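The two statistics reported above, a lagged correlation and a linear trend, can be computed as in the sketch below. The series are synthetic (TWS built to follow rainfall with a 2-month delay and to decline by 1 cm per year), and removing the monthly climatology before fitting the trend is an assumed, commonly used step; the abstract does not describe the authors' exact procedure or significance test.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag]; lag is in months (>= 0)."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])


def best_lag(driver, response, max_lag):
    """Lag (0..max_lag) at which the response correlates most strongly with the driver."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(driver, response, k))


def remove_climatology(series, period=12):
    """Subtract each calendar month's mean (assumes the series starts in January)."""
    means = [sum(series[c::period]) / len(series[c::period]) for c in range(period)]
    return [v - means[i % period] for i, v in enumerate(series)]


def linear_trend(series, steps_per_year=12):
    """Ordinary least-squares slope, converted from per-month to per-year units."""
    n = len(series)
    mt, ms = (n - 1) / 2, sum(series) / n
    sxy = sum((t - mt) * (s - ms) for t, s in enumerate(series))
    sxx = sum((t - mt) ** 2 for t in range(n))
    return sxy / sxx * steps_per_year


# Synthetic 10-year monthly series (hypothetical, in cm).
months = range(120)
rain = [math.sin(2 * math.pi * m / 12) for m in months]
tws = [10 * math.sin(2 * math.pi * (m - 2) / 12) - m / 12 for m in months]

lag = best_lag(rain, tws, max_lag=4)
trend = linear_trend(remove_climatology(tws))  # close to -1 cm/year
```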
Abstract:
Compost, vermicompost and biochar amendments are thought to improve soil quality and plant yield. However, little is known about their long-term impact on crop yield and the environment in tropical agro-ecosystems. In this study, we investigated the effect of organic amendments (buffalo manure, compost and vermicompost) and biochar (applied alone or with vermicompost) on plant yield, soil fertility, soil erosion and water dynamics in a degraded Acrisol in Vietnam. Maize growth and yield, as well as weed growth, were examined for three years in terrestrial mesocosms under natural rainfall. Maize yield and growth showed high inter-annual variability depending on the organic amendment. Vermicompost improved maize growth and yield, but its effect was rather small and was significant only when water availability was limited (year 2). This suggests that vermicompost could be a promising substrate for improving the resistance of agrosystems to water stress. When the vermicompost-biochar mixture was applied, further growth and yield improvements were recorded in some cases. When applied alone, biochar had a positive influence on maize yield and growth, confirming its value for improving long-term soil productivity. All organic amendments reduced water runoff, soil detachment, and NH₄⁺ and NO₃⁻ transfer to water. These effects were more significant with vermicompost than with buffalo manure and compost, highlighting that the beneficial influence of vermicompost is not limited to plant yield. In addition, this study showed for the first time that the combination of vermicompost and biochar may not only improve plant productivity but also reduce the negative impact of agriculture on water quality. (C) 2015 Elsevier B.V. All rights reserved.
Abstract:
Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. The scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, but a mere 0.187 for Telugu-Kannada. Many Indian languages have comparable corpora with other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary-language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g. MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of suggested titles.
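Mean reciprocal rank, the metric quoted above, averages 1/rank of the first correct translation over all source words, counting 0 when no correct candidate is returned. A small sketch follows; the query words and candidate lists are hypothetical placeholders. It also checks that the reported Telugu-Kannada change from 0.187 to 0.419 corresponds to the stated 124% relative improvement.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """MRR over queries: mean of 1/rank of the first correct candidate
    (a query contributes 0 when none of its candidates is correct)."""
    total = 0.0
    for query, candidates in ranked_lists.items():
        rr = 0.0
        for rank, cand in enumerate(candidates, start=1):
            if cand in gold[query]:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_lists)


# Hypothetical ranked translation candidates and gold translations.
ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"], "q3": ["g", "h"]}
gold = {"q1": {"a"}, "q2": {"f"}, "q3": {"z"}}
mrr = mean_reciprocal_rank(ranked, gold)  # (1 + 1/3 + 0) / 3

# Relative improvement implied by the Telugu-Kannada figures in the abstract.
relative_gain = (0.419 - 0.187) / 0.187
```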
Abstract:
Graph algorithms have been shown to possess enough parallelism to keep several computing resources busy, even hundreds of cores on a GPU. Unfortunately, tuning their implementation for efficient execution on a particular hardware configuration of a heterogeneous system consisting of multicore CPUs and GPUs is challenging, time-consuming, and error-prone. To address these issues, we propose Falcon, a domain-specific language (DSL) for implementing graph algorithms that (i) abstracts the hardware, (ii) provides constructs to write explicitly parallel programs at a higher level, and (iii) can work with general algorithms that may change the graph structure (morph algorithms). We illustrate the use of our DSL to implement local computation algorithms (which do not change the graph structure) and morph algorithms such as Delaunay mesh refinement, survey propagation, and dynamic single-source shortest paths (SSSP) on GPUs and multicore CPUs. Using a set of benchmark graphs, we show that the generated code performs close to state-of-the-art hand-tuned implementations.
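The kind of computation such a DSL expresses can be sketched with SSSP as repeated edge relaxation to a fixpoint. This is plain sequential Python, not Falcon syntax. On a GPU, each edge relaxation in one pass would typically run as its own thread using an atomic minimum. The sketch also shows a simple dynamic case under a stated assumption: if edges with non-negative weights are only inserted, the previous distances remain valid upper bounds, so relaxation can restart from them; edge deletions would need more work.

```python
INF = float("inf")


def sssp_relax(num_vertices, edges, source, dist=None):
    """Relax every edge until no distance changes (Bellman-Ford style).

    edges: list of (u, v, w) with non-negative weights.
    dist:  optional previous distances, reusable when edges were only inserted.
    """
    if dist is None:
        dist = [INF] * num_vertices
        dist[source] = 0
    else:
        dist = list(dist)  # do not mutate the caller's list
    changed = True
    while changed:
        changed = False
        # A parallel version would process these edges concurrently with atomic min updates.
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist


# Hypothetical graph with 4 vertices.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
initial = sssp_relax(4, edges, source=0)

# Insert an edge and update incrementally from the previous distances.
updated = sssp_relax(4, edges + [(2, 3, 1)], source=0, dist=initial)
```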