45 results for Language Resources
Abstract:
The assignment of tasks to multiple resources becomes an interesting game-theoretic problem when both the task owner and the resources are strategic. In the classical, nonstrategic setting, where the states of the tasks and resources are observable by the controller, this problem is that of finding an optimal policy for a Markov decision process (MDP). When the states are held by strategic agents, efficient task allocation extends beyond solving an MDP and becomes a problem of mechanism design. Motivated by this, we propose a general mechanism that decides on an allocation rule for the tasks and resources and a payment rule that incentivizes agents' participation and truthful reports. In contrast to related dynamic strategic control problems studied in recent literature, the problem studied here has interdependent values: the benefit of an allocation to the task owner depends not only on the characteristics of the task itself and the allocation, but also on the state of the resources. We introduce a dynamic extension of Mezzetti's two-phase mechanism for interdependent valuations. In this setting, the proposed dynamic mechanism is efficient, within-period ex-post incentive compatible, and within-period ex-post individually rational.
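For the classical, nonstrategic case, "finding an optimal policy for an MDP" can be illustrated with standard value iteration. The sketch below is only that baseline: it has no payments, no private information, and is not the proposed mechanism. The MDP encoding, the state and action names, and the numbers in the toy allocation problem are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: mdp[state][action] -> list of (probability, next_state, reward)
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(mdp: MDP, values: Dict[str, float], state: str, action: str, gamma: float) -> float:
    """Expected discounted return of taking `action` in `state`, then following `values`."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in mdp[state][action])


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-10):
    """Compute optimal state values and a greedy (optimal) policy for a finite MDP."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            best = max(q_value(mdp, values, s, a, gamma) for a in mdp[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    policy = {s: max(mdp[s], key=lambda a: q_value(mdp, values, s, a, gamma)) for s in mdp}
    return values, policy


# Toy allocation problem (illustrative numbers only): a pending task can be sent
# to a fast or a slow resource; each failed attempt costs 1 and the task stays pending.
toy: MDP = {
    "pending": {
        "assign_fast": [(0.8, "done", 10.0), (0.2, "pending", -1.0)],
        "assign_slow": [(0.5, "done", 10.0), (0.5, "pending", -1.0)],
    },
    "done": {"stay": [(1.0, "done", 0.0)]},
}

values, policy = value_iteration(toy)
print(policy["pending"], round(values["pending"], 4))  # assign_fast 9.5122
```

In the strategic version studied in the abstract, the controller cannot read the states in `toy` directly; agents report them, which is why a payment rule is needed on top of the allocation rule.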
Abstract:
N-gram language models and lexicon-based word recognition are popular methods in the literature for improving the recognition accuracy of online and offline handwritten data. However, very few works apply these techniques to online Tamil handwritten data. In this paper, we explore methods for building symbol-level language models and a lexicon from a large Tamil text corpus, and apply them to improve symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word (8%) recognition accuracies. Lexicon methods offer much greater improvements (30%) in word recognition, but their benefit depends heavily on choosing the right lexicon. For comparison with lexicon- and language-model-based methods, we have also explored re-evaluation techniques that use expert classifiers to improve symbol and word recognition accuracies.
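To make the bigram idea concrete, here is a minimal sketch, assuming add-one (Laplace) smoothing and a simple log-linear combination of recognizer score and language-model score; the abstract does not state which smoothing or combination rule was used. Latin letters stand in for Tamil symbol units, and the class name, `rescore`, and `weight` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

BOS, EOS = "<s>", "</s>"


class SymbolBigramLM:
    """Symbol-level bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: Iterable[Sequence[str]]):
        self.history = Counter()  # c(a): how often symbol a is followed by something
        self.bigrams = Counter()  # c(a, b)
        successors = set()        # every symbol that can follow a history (incl. EOS)
        for word in corpus:
            symbols = [BOS, *word, EOS]
            successors.update(symbols[1:])
            self.history.update(symbols[:-1])
            self.bigrams.update(zip(symbols, symbols[1:]))
        self.vocab_size = len(successors)

    def log_prob(self, word: Sequence[str]) -> float:
        symbols = [BOS, *word, EOS]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.vocab_size))
            for a, b in zip(symbols, symbols[1:])
        )


def rescore(candidates: List[Tuple[List[str], float]], lm: SymbolBigramLM,
            weight: float = 1.0) -> List[str]:
    """Return the candidate maximizing recognizer log-score + weight * LM log-probability."""
    return max(candidates, key=lambda c: c[1] + weight * lm.log_prob(c[0]))[0]


# Toy corpus: Latin letters stand in for Tamil symbol units.
lm = SymbolBigramLM([["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]])
hypotheses = [(["a", "b", "c"], -1.0), (["a", "d", "c"], -0.9)]
print(rescore(hypotheses, lm, weight=0.0))  # recognizer alone prefers ['a', 'd', 'c']
print(rescore(hypotheses, lm, weight=1.0))  # with the bigram LM: ['a', 'b', 'c']
```

The recognizer slightly prefers the wrong hypothesis, but the unseen bigrams `a→d` and `d→c` make it much less likely under the language model, so rescoring flips the decision. This is the mechanism by which a bigram model can raise word accuracy without retraining the recognizer.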
Abstract:
With the introduction of Earth-observing satellites, remote sensing has become an important tool for analyzing the Earth's surface characteristics, and hence for supplying valuable information needed for hydrologic analysis. Because they capture spatial variations in hydro-meteorological variables at a temporal resolution fine enough to represent the dynamics of hydrologic processes, remote sensing techniques have significantly changed water resources assessment and management methodologies. Remote sensing techniques have been widely used to delineate surface water bodies, to estimate meteorological variables such as temperature and precipitation, to estimate hydrological state variables such as soil moisture and land surface characteristics, and to estimate fluxes such as evapotranspiration. Today, near-real-time monitoring of floods and droughts, as well as irrigation management, is possible with the help of high-resolution satellite data. This paper gives a brief overview of the potential applications of remote sensing in water resources.
Abstract:
Water is the most important medium through which climate change influences human life. Rising temperatures, together with regional changes in precipitation patterns, are among the impacts of climate change that have implications for water availability, the frequency and intensity of floods and droughts, soil moisture, water quality, water supply, and water demands for irrigation and hydropower generation. In this article, we provide an introduction to the emerging field of hydrologic impacts of climate change, with a focus on water availability, water quality, and irrigation demands. Climate change estimates at regional or local spatial scales carry a considerable amount of uncertainty, stemming from sources such as the climate models, downscaling methods, and hydrological models used in impact assessments, as well as uncertainty in the downscaling relationships. The present article summarizes recent advances in uncertainty modeling and in the regional impacts of climate change for the Mahanadi and Tunga-Bhadra Rivers in India.
Abstract:
Rapid advancements in multi-core processor architectures, coupled with low-cost, low-latency, high-bandwidth interconnects, have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they have yet to be widely adopted for performance-critical code, mainly because of their relatively immature software frameworks and the effort involved in rewriting existing code in a new language. In this paper, we motivate and describe our initial study exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates the execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, together with the growing popularity, support, and stability of the CUDA software stack, makes CUDA a good candidate for a cluster programming language. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime, and a lightweight software distributed shared memory to manage parallel execution. Initial results on several standard CUDA benchmark programs show speedups of up to 7.5x on a cluster with 8 nodes, opening up an interesting direction for further research.
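The reason CUDA kernels distribute well is that thread blocks are independent, so a grid can be split across nodes. The sketch below emulates that idea in plain Python, assuming a static contiguous partition of block indices and a write-back of only the elements each node's blocks own, as a stand-in for a software DSM flush. It is a hypothetical illustration, not CFC's actual runtime or compiler output; `partition_blocks`, `vector_add_block`, and `run_on_cluster` are invented names, and nodes run sequentially here.

```python
import math
from typing import List, Sequence


def partition_blocks(num_blocks: int, num_nodes: int) -> List[range]:
    """Split CUDA block indices 0..num_blocks-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_blocks, num_nodes)
    ranges, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def vector_add_block(block_idx: int, block_dim: int, a: Sequence[float],
                     b: Sequence[float], out: List[float]) -> None:
    """Emulates one thread block of a vector-add kernel, one 'thread' per iteration."""
    for thread_idx in range(block_dim):
        i = block_idx * block_dim + thread_idx  # like blockIdx.x * blockDim.x + threadIdx.x
        if i < len(a):                           # the usual bounds guard
            out[i] = a[i] + b[i]


def run_on_cluster(num_nodes: int, block_dim: int, a: Sequence[float],
                   b: Sequence[float]) -> List[float]:
    grid_dim = math.ceil(len(a) / block_dim)
    shared = [0.0] * len(a)  # stands in for the distributed shared memory
    for blocks in partition_blocks(grid_dim, num_nodes):
        local = list(shared)  # each node works on its own copy...
        for blk in blocks:
            vector_add_block(blk, block_dim, a, b, local)
        # ...and writes back only the elements its blocks own
        for blk in blocks:
            lo, hi = blk * block_dim, min((blk + 1) * block_dim, len(a))
            shared[lo:hi] = local[lo:hi]
    return shared


a = [float(i) for i in range(10)]
print(run_on_cluster(num_nodes=3, block_dim=4, a=a, b=a))
```

A static contiguous split is the simplest choice; a real runtime may prefer dynamic scheduling when blocks have uneven cost.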
Abstract:
Drastic groundwater depletion due to excessive extraction for irrigation is a major concern in many parts of India. In this study, an attempt was made to simulate the groundwater scenario of a catchment using ArcSWAT. Because of the restriction on the maximum initial storage, the deep aquifer component in ArcSWAT was found to be insufficient to represent excessive groundwater depletion. Hence, a separate water balance model was used to simulate the deep aquifer water table. This approach is demonstrated through a case study of the Malaprabha catchment in India. Multi-site rainfall data were used to represent the spatial variation in the catchment climatology. Model parameters were calibrated using observed monthly streamflow data, and the groundwater table simulation was validated using qualitative information available from the field. Streamflow was found to be well simulated by the model, and the simulated groundwater table fluctuation also matches the field observations reasonably well. The model simulations show that deep aquifer water table fluctuation is very severe in the semi-arid lower parts of the catchment, with some areas showing around 60 m of depletion over a period of eight years. Copyright (c) 2012 John Wiley & Sons, Ltd.
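The abstract does not give the equations of the separate water balance model. As a rough illustration of how such a model can track a falling water table, here is a minimal lumped sketch, assuming that the storage change each step is recharge minus extraction and that the water table moves by that change divided by the specific yield. All parameter values below are made up for illustration and are not from the study.

```python
from typing import List, Sequence


def simulate_water_table(initial_depth_m: float, recharge_mm: Sequence[float],
                         extraction_mm: Sequence[float], specific_yield: float) -> List[float]:
    """Lumped deep-aquifer water balance (hypothetical; not the study's exact model).

    Each step, net storage change dS = recharge - extraction (as an equivalent
    water depth). The water table moves by dS / Sy, so a deficit lowers the
    table, i.e. increases its depth below ground.
    """
    depth = initial_depth_m
    depths = []
    for recharge, extraction in zip(recharge_mm, extraction_mm):
        storage_change_m = (recharge - extraction) / 1000.0  # mm -> m of water
        depth -= storage_change_m / specific_yield
        depths.append(depth)
    return depths


# Illustrative values only: 40 mm/yr recharge, 130 mm/yr pumping, Sy = 0.03, 8 years.
depths = simulate_water_table(10.0, [40.0] * 8, [130.0] * 8, 0.03)
print(round(depths[-1], 2))  # 10 m + 8 * (0.09 / 0.03) m = 34.0
```

Unlike a storage-capped aquifer component, this balance has no floor on storage, so a sustained extraction deficit keeps deepening the water table year after year.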
Abstract:
Elasticity in cloud systems provides the flexibility to acquire and relinquish computing resources on demand. However, in current virtualized systems, resource allocation is mostly static: resources are allocated during VM instantiation, and any workload change that requires a significant increase or decrease in resources is handled by VM migration. Hence, cloud users tend to characterize their workloads at a coarse-grained level, which can lead to under-utilized VM resources or underperforming applications. A more flexible and adaptive resource allocation mechanism would benefit variable workloads, such as those of web servers. In this paper, we present an elastic resources framework for the IaaS cloud layer that addresses this need. The framework provides an application workload forecasting engine that predicts the expected demand at run time; this prediction is input to the resource manager, which modulates resource allocation accordingly. Because of prediction errors, resources can be over-allocated or under-allocated relative to the application's actual demand. Over-allocation leads to unused resources, and under-allocation can cause underperformance. To strike a good trade-off between over-allocation and underperformance, we derive an excess cost model, in which excess allocated resources are captured as an over-allocation cost and under-allocation is captured as a penalty cost for violating the application's service level agreement (SLA). The confidence interval of the predicted workload is used to minimize this excess cost with minimal effect on SLA violations. An example case study of an academic institute's web server workload is presented. Using the confidence interval to minimize excess cost, we achieve a significant reduction in resource allocation requirements while keeping application SLA violations below 2-3%.
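The trade-off between over-allocation cost and SLA penalty can be sketched numerically. The version below assumes a Gaussian forecast, allocation at the one-sided upper confidence bound, and linear per-unit costs; the abstract does not specify the forecast distribution or cost functions, so `allocate`, `excess_cost`, the cost weights, and the demand numbers are all hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def allocate(pred_mean: Sequence[float], pred_std: Sequence[float], confidence: float) -> List[float]:
    """Allocate the one-sided upper confidence bound of a Gaussian forecast."""
    z = NormalDist().inv_cdf(confidence)
    return [m + z * s for m, s in zip(pred_mean, pred_std)]


def excess_cost(allocated: Sequence[float], actual: Sequence[float],
                over_cost: float, penalty_cost: float) -> float:
    """Over-allocation cost on idle capacity plus an SLA penalty on unmet demand."""
    over = sum(max(0.0, a - d) for a, d in zip(allocated, actual))
    under = sum(max(0.0, d - a) for a, d in zip(allocated, actual))
    return over_cost * over + penalty_cost * under


def sla_violation_rate(allocated: Sequence[float], actual: Sequence[float]) -> float:
    """Fraction of intervals in which demand exceeded the allocation."""
    return sum(d > a for a, d in zip(allocated, actual)) / len(actual)


# Illustrative forecast (mean 10, std 2 units per interval) and observed demand.
mean, std, demand = [10.0] * 4, [2.0] * 4, [9.0, 11.0, 12.0, 13.0]
for conf in (0.5, 0.9):
    alloc = allocate(mean, std, conf)
    print(conf, round(excess_cost(alloc, demand, 1.0, 5.0), 2), sla_violation_rate(alloc, demand))
```

Allocating at the mean (confidence 0.5) under-provisions often and pays heavy penalties; raising the confidence level buys headroom at a modest over-allocation cost. Sweeping `confidence` and picking the minimum-cost level is one simple way to choose the bound.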
Abstract:
The low-level, denuded laterite landscape of coastal Uttara Kannada has a rich diversity of monsoon herbs, including threatened and newly discovered species. Our study reveals that honey bees congregate on the ephemeral herb community of Utricularias, Eriocaulons and Impatiens during their gregarious monsoon flowering period. Apis dorsata made the most visits to Utricularias, Impatiens and Flacourtia indica, whereas Trigona preferred Eriocaulons. The laterite herb flora merits conservation efforts as a keystone food resource for the insect community, especially for honey bees.
Abstract:
Global climate change and its consequent large impacts on regional hydrologic systems have, in recent years, motivated significant research efforts in water resources modeling under climate change. In an integrated future hydrologic scenario, water availability and demands are likely to change significantly due to modifications in hydro-climatic variables such as rainfall, reservoir inflows, temperature, net radiation, wind speed and humidity. An integrated regional water resources management model should capture the likely impacts of climate change on water demands and water availability, along with the uncertainties associated with climate change impacts and with management goals and objectives under non-stationary conditions. Uncertainties that accumulate in such a model from various stages of decision making include climate model and scenario uncertainty in the hydro-climatic impact assessment, uncertainty due to the conflicting interests of water users, and uncertainty due to the inherent variability of reservoir inflows. This paper presents an integrated regional water resources management modeling approach that considers uncertainties at various stages of decision making by integrating a hydro-climatic variable projection model, a water demand quantification model, a water quantity management model and a water quality control model. The approach uses canonical correlation analysis, stochastic dynamic programming and fuzzy optimization as modeling tools within an integrated framework. The proposed modeling approach is demonstrated with a case study of the Bhadra Reservoir system in Karnataka, India.
Abstract:
Polyhedral techniques for program transformation are now used in several proprietary and open source compilers. However, most research on polyhedral compilation has focused on imperative languages such as C, where the computation is specified in terms of statements with zero or more nested loops and other control structures around them. Graphical dataflow languages, which have no notion of statements or of a schedule specifying their relative execution order, have so far not been studied using such a powerful transformation and optimization approach. The execution semantics and referential transparency of dataflow languages impose a different set of challenges. In this paper, we attempt to bridge this gap by presenting techniques for extracting a polyhedral representation from dataflow programs and for synthesizing dataflow programs from their equivalent polyhedral representation. We then describe PolyGLoT, a framework for the automatic transformation of dataflow programs, which we built using our techniques and other popular research tools such as Clan and Pluto. For experimental evaluation, we used our tools to compile LabVIEW, one of the most widely used dataflow programming languages. Results show that dataflow programs transformed using our framework outperform those compiled otherwise by up to a factor of seventeen, with a mean speedup of 2.30x on an 8-core Intel system.
Abstract:
User authentication is essential for accessing computing resources, network resources, email accounts, online portals, etc. To authenticate users, a system stores their credentials (user ID and password pairs). Discovering user passwords from a system, and conversely protecting them against such attacks, has long been a field of active interest. In this work, we show that passwords are still vulnerable to hash-chain-based attacks and efficient dictionary attacks. Human-generated passwords follow identifiable patterns. We analyzed a sample of 19 million passwords of different lengths, available online, and studied the distribution of symbols in the password strings. We show that the distribution of symbols in user passwords is affected by the native language of the user. From these symbol distributions we can build smart, efficient dictionaries that are smaller in size yet cover a large share of the plausible passwords in the key space. These smart dictionaries make dictionary-based attacks practical.
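One simple way to turn a symbol distribution into a "smart" dictionary is sketched below. It assumes that symbols are independent and keeps only the most frequent symbols. This is a deliberate simplification, and the abstract does not say which model the authors used. The function names and the three-password sample are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List


def symbol_distribution(passwords: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each symbol across a password sample."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()}


def smart_dictionary(dist: Dict[str, float], length: int, top_symbols: int, size: int) -> List[str]:
    """Most probable candidates of a fixed length under an independent-symbol model.

    Only the `top_symbols` most frequent symbols are used, which shrinks the
    search space from len(alphabet)**length to top_symbols**length.
    """
    symbols = sorted(dist, key=dist.get, reverse=True)[:top_symbols]
    scored = sorted(product(symbols, repeat=length),
                    key=lambda cand: -sum(math.log(dist[ch]) for ch in cand))
    return ["".join(cand) for cand in scored[:size]]


def coverage(dictionary: List[str], targets: Iterable[str]) -> float:
    """Fraction of target passwords that appear in the dictionary."""
    words = set(dictionary)
    targets = list(targets)
    return sum(t in words for t in targets) / len(targets)


dist = symbol_distribution(["aaab", "abab", "aab1"])  # toy sample, not real data
d = smart_dictionary(dist, length=3, top_symbols=2, size=4)
print(d, coverage(d, ["aab", "zzz"]))
```

Because the distribution is estimated from a sample, training it on passwords from users with a particular native language would shift which symbols rank highest. This is the kind of effect the abstract reports.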
Abstract:
As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, which increases the importance of good access to these data, ranging from the supplemental material of individual articles all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to their databases for a focused topic or family of proteins, using specialized approaches that are not feasible in the major reference databases. Many are labors of love, run by a single lab with little or no dedicated funding, and there are many challenges to building and maintaining them. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on August 11 and 12, 2014. During this meeting, some common key challenges involved in creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues affect our resources and to illustrate ways in which working together could enhance their accuracy, currency, and overall value. Proteins 2015; 83:1005-1013. (c) 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
Abstract:
On 11-12 August 2014, a Protein Bioinformatics and Community Resources Retreat was held at the Wellcome Trust Genome Campus in Hinxton, UK. This meeting brought together the principal investigators of several specialized protein resources (such as CAZy, TCDB and MEROPS) as well as those of protein databases at the large bioinformatics centres (including UniProt and RefSeq). The retreat was divided into five sessions: (1) key challenges, (2) the databases represented, (3) best practices for maintenance and curation, (4) information flow to and from large data centres and (5) communication and funding. An important outcome of this meeting was the creation of a Specialist Protein Resource Network that we believe will improve coordination of the activities of its member resources. We invite further protein database resources to join the network and continue the dialogue.
Abstract:
Identifying translations from comparable corpora is a well-known problem with several applications, e.g., dictionary creation in resource-scarce languages. The scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: for example, state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian but a mere 0.187 for Telugu-Kannada. Comparable corpora exist that pair many Indian languages with other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g., MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of titles suggested.
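Mean reciprocal rank, the metric quoted above, averages 1/rank of the correct translation over all query words. A miss is scored as 0 here, which is a common convention but an assumption on our part. The sketch below computes it on hypothetical ranked lists and also checks that going from 0.187 to 0.419 is indeed a relative improvement of about 124%.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]], gold: Sequence[Hashable]) -> float:
    """Average of 1/rank of the correct translation; a miss contributes 0."""
    total = 0.0
    for ranking, answer in zip(rankings, gold):
        for rank, candidate in enumerate(ranking, start=1):
            if candidate == answer:
                total += 1.0 / rank
                break
    return total / len(gold)


# Hypothetical ranked candidate lists for three source words.
rankings = [["x", "y"], ["p", "q", "r"], ["m"]]
gold = ["x", "r", "z"]
print(mean_reciprocal_rank(rankings, gold))  # (1 + 1/3 + 0) / 3

# Relative improvement reported for Telugu-Kannada: 0.187 -> 0.419.
print(f"{(0.419 - 0.187) / 0.187:.0%}")  # 124%
```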