944 results for computer language
Abstract:
HFST, the Helsinki Finite-State Technology (hfst.sf.net), is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.
Abstract:
FinnWordNet is a wordnet for Finnish that complies with the format of the Princeton WordNet (PWN) (Fellbaum, 1998). It was built by having human translators translate the Princeton WordNet 3.0 synsets into Finnish. It is open source and contains 117,000 synsets. The Finnish translations were inserted into the PWN structure, resulting in a bilingual lexical database. In natural language processing (NLP), wordnets have been used for infusing computers with semantic knowledge, on the assumption that humans already have a sufficient amount of this knowledge. In this paper we present a case study of using wordnets as an electronic dictionary. We tested whether native Finnish speakers benefit from using a wordnet while completing English sentence completion tasks. We found that using either an English wordnet or a bilingual English-Finnish wordnet significantly improves performance in the task. This should be taken into account when setting standards and comparing human and computer performance on these tasks.
Abstract:
This paper introduces the META-NORD project, which develops the Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, its overall approach and its specific focus lines on wordnets, terminology resources and treebanks are described. Moreover, results achieved in the first five months of the project, i.e. language whitepapers, metadata specification and IPR, are presented.
Abstract:
In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology tools, and demonstrate rapid building of Northern Sámi and English spell checkers from tools and resources available from the Internet.
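The idea of checking words against a lexicon and ranking corrections with corpus statistics can be illustrated with a minimal Python sketch. This is not the finite-state implementation the abstract describes: it swaps in a plain word set and an edit-distance-1 candidate generator, and the toy corpus, the `ALPHABET` constant and the function names `edits1` and `suggest` are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: a lowercase ASCII alphabet; a real speller would use the
# alphabet of the target language (e.g. Northern Sámi letters).
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return all strings at edit distance 1 from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, freq, limit=3):
    """Accept known words; otherwise rank in-lexicon candidates by frequency."""
    if word in freq:
        return [word]
    candidates = [w for w in edits1(word) if w in freq]
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda w: (-freq[w], w))[:limit]


# Toy corpus standing in for word counts harvested from, say, Wikipedia.
corpus = "the cat sat on the mat the cat ate".split()
freq = Counter(corpus)

print(suggest("cta", freq))
print(suggest("bat", freq))
```

In a finite-state setting the lexicon lookup and the error model would each be a transducer, and the frequency ranking would typically be encoded as weights; the sketch only mirrors the overall flow.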
Abstract:
Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
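To make the notion of a paradigm guesser concrete, here is a minimal Python sketch that ranks paradigms for an unknown word by its longest known suffix. It is a plain suffix-count heuristic, not the transducer-based method of the abstract; the paradigm labels (`V_REG`, `N_REG`, `N_ES`), the toy lexicon and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_suffix_table(lexicon, max_len=4):
    """Count, for every word-final suffix up to max_len, how often each paradigm occurs."""
    table = defaultdict(Counter)
    for word, paradigm in lexicon:
        for n in range(1, min(max_len, len(word)) + 1):
            table[word[-n:]][paradigm] += 1
    return table


def guess(word, table, max_len=4, limit=3):
    """Rank paradigms using the longest suffix of `word` seen in the lexicon."""
    for n in range(min(max_len, len(word)), 0, -1):
        counts = table.get(word[-n:])
        if counts:
            # Most frequent paradigm first; ties broken by label for determinism.
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            return [paradigm for paradigm, _ in ranked[:limit]]
    return []


# Hypothetical lexicon entries: (word form, paradigm label).
lexicon = [
    ("walk", "V_REG"),
    ("talk", "V_REG"),
    ("chalk", "N_REG"),
    ("box", "N_ES"),
    ("fox", "N_ES"),
]
table = build_suffix_table(lexicon)

print(guess("balk", table))
print(guess("pox", table))
```

Like the method in the abstract, this sketch needs no external corpus: everything is derived from the existing lexicon, which is why such a guesser can serve as a baseline.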
Abstract:
Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.
Abstract:
Electronic, magnetic, and structural properties of graphene flakes depend sensitively upon the type of edge atoms. We present a simple software tool for determining the type of edge atoms in a honeycomb lattice. The algorithm is based on nearest neighbor counting. Whether an edge atom is of armchair or zigzag type is decided by the unique pattern of its nearest neighbors. Particular attention is paid to the practical aspects of using the tool, as additional features such as extracting the edges from the lattice could help in analyzing images from transmission microscopy or other experimental probes. Ultimately, the tool in combination with density-functional theory or the tight-binding method can also be helpful in correlating the properties of graphene flakes with the different armchair-to-zigzag ratios.
Program summary
Program title: edgecount
Catalogue identifier: AEIA_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIA_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 66685
No. of bytes in distributed program, including test data, etc.: 485 381
Distribution format: tar.gz
Programming language: FORTRAN 90/95
Computer: Most UNIX-based platforms
Operating system: Linux, Mac OS
Classification: 16.1, 7.8
Nature of problem: Detection and classification of edge atoms in a finite patch of honeycomb lattice.
Solution method: Build nearest neighbor (NN) list; assign types to edge atoms on the basis of their NN pattern.
Running time: Typically on the order of seconds for all examples.
(C) 2010 Elsevier B.V. All rights reserved.
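The solution method named in the graphene abstract (build a nearest-neighbor list, then type the edge atoms from their neighbor pattern) can be sketched in Python as follows. This is not the edgecount program, which is written in FORTRAN 90/95. The classification rule is an assumption chosen for illustration: an atom with two neighbors is labeled zigzag if both neighbors are fully coordinated, armchair if exactly one neighbor is itself under-coordinated, and `other` in any remaining case. The cutoff value and function names are likewise hypothetical.

```python
import math


def neighbor_list(coords, cutoff):
    """Map each atom index to the indices of all atoms within `cutoff`."""
    nbrs = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def classify_edges(nbrs):
    """Label atoms from neighbor counts (assumed rule, see lead-in)."""
    labels = {}
    for atom, nn in nbrs.items():
        if len(nn) >= 3:
            labels[atom] = "bulk"          # fully coordinated honeycomb site
        elif len(nn) == 2:
            # Count how many of the two neighbors are themselves under-coordinated.
            edge_nn = sum(1 for j in nn if len(nbrs[j]) < 3)
            labels[atom] = {0: "zigzag", 1: "armchair"}.get(edge_nn, "other")
        else:
            labels[atom] = "dangling"      # zero or one neighbor
    return labels


# A single hexagon with unit bond length: every atom has exactly two neighbors.
hexagon = [
    (math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
    for k in range(6)
]
nbrs = neighbor_list(hexagon, cutoff=1.2)
print({atom: len(nn) for atom, nn in nbrs.items()})
```

The pairwise loop is quadratic in the number of atoms, which is adequate for small flakes; a cell list or k-d tree would be the usual choice for larger lattices.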
Abstract:
In the past two decades RNase A has been the focus of diverse investigations aimed at understanding the nature of substrate binding and the mechanism of enzyme action. Although this system is reasonably well characterized with respect to some of the binding sites, the interactions at the second base binding (B2) site are not known in sufficient detail. Further, the nature of ligand-protein interaction is generally elucidated by studies on RNase A-substrate analog complexes (mainly with the help of X-ray crystallography). Hence, the details of the atomic-level interactions arising from substrates are inferred only indirectly. In the present paper, the dinucleotide substrate UpA is fitted into the active site of RNase A. Several possible substrate conformations are investigated and the binding modes are selected based on contact criteria. The RNase A-UpA complexes thus identified are energy minimized in coordinate space and are analysed in terms of conformations, energetics and interactions. The best possible ligand conformations for binding to RNase A are identified by experimentally known interactions and by the energetics. The changes in the protein backbone and side chains upon binding of UpA to RNase A are described, both in general and at the binding sites in particular. Further, the detailed interactions between UpA and RNase A are characterized in terms of hydrogen bonds and energetics. This extensive study has helped in interpreting the diverse results obtained from a number of experiments and also in evaluating the extent of the changes the protein and the substrate undergo in order to maximize their interactions.
Abstract:
A theoretical analysis of the three currently popular microscopic theories of solvation dynamics, namely, the dynamic mean spherical approximation (DMSA), the molecular hydrodynamic theory (MHT), and the memory function theory (MFT) is carried out. It is shown that in the underdamped limit of momentum relaxation, all three theories lead to nearly identical results when the translational motions of both the solute ion and the solvent molecules are neglected. In this limit, the theoretical prediction is in almost perfect agreement with the computer simulation results of solvation dynamics in the model Stockmayer liquid. However, the situation changes significantly in the presence of the translational motion of the solvent molecules. In this case, DMSA breaks down but the other two theories correctly predict the acceleration of solvation in agreement with the simulation results. We find that the translational motion of a light solute ion can play an important role in its own solvation. None of the existing theories describe this aspect. A generalization of the extended hydrodynamic theory is presented which, for the first time, includes the contribution of solute motion towards its own solvation dynamics. The extended theory gives excellent agreement with the simulations where solute motion is allowed. It is further shown that in the absence of translation, the memory function theory of Fried and Mukamel can be recovered from the hydrodynamic equations if the wave vector dependent dissipative kernel in the hydrodynamic description is replaced by its long wavelength value. We suggest a convenient memory kernel which is superior to the limiting forms used in earlier descriptions. We also present an alternate, quite general, statistical mechanical expression for the time dependent solvation energy of an ion. This expression has remarkable similarity with that for the translational dielectric friction on a moving ion.
Abstract:
We describe a compiler for the Flat Concurrent Prolog language on a message passing multiprocessor architecture. This compiler permits symbolic and declarative programming in the syntax of Guarded Horn Rules. The implementation has been verified and tested on the 64-node PARAM parallel computer developed by C-DAC (Centre for the Development of Advanced Computing, India). Flat Concurrent Prolog (FCP) is a logic programming language designed for concurrent programming and parallel execution. It is a process-oriented language, which embodies dataflow synchronization and guarded-command as its basic control mechanisms. An identical algorithm is executed on every processor in the network. We assume regular network topologies like mesh, ring, etc. Each node has a local memory. The algorithm comprises two important parts: reduction and communication. The most difficult task is to integrate the solutions of problems that arise in the implementation in a coherent and efficient manner. We have tested the efficacy of the compiler on various benchmark problems of the ICOT project that have been reported in the recent book by Evan Tick. These problems include Quicksort, 8-queens, and Prime Number Generation. The results of the preliminary tests are favourable. We are currently examining issues like indexing and load balancing to further optimize our compiler.
Abstract:
The modes of binding of adenosine 2'-monophosphate (2'-AMP) to the enzyme ribonuclease (RNase) T1 were determined by computer modelling studies. The phosphate moiety of 2'-AMP binds at the primary phosphate binding site. However, adenine can occupy two distinct sites: (1) the primary base binding site, where the guanine of 2'-GMP binds, and (2) the subsite close to the N1 subsite for the base on the 3'-side of guanine in a guanyl dinucleotide. The minimum energy conformers corresponding to the two modes of binding of 2'-AMP to RNase T1 were found to be of nearly the same energy, implying that in solution 2'-AMP binds to the enzyme in both modes. The conformation of the inhibitor and the predicted hydrogen bonding scheme for the RNase T1-2'-AMP complex in the second binding mode (S) agree well with the reported X-ray crystallographic study. The existence of the first mode of binding explains the experimental observation that RNase T1 catalyses the hydrolysis of phosphodiester bonds adjacent to adenosine at high enzyme concentrations. A comparison of the interactions of 2'-AMP and 2'-GMP with RNase T1 reveals that Glu58 and Asn98 at the phosphate binding site and Glu46 at the base binding site preferentially stabilise the enzyme-2'-GMP complex.
Abstract:
This work describes an online handwritten character recognition system working in combination with an offline recognition system. The online input data is also converted into an offline image and is recognized in parallel by both online and offline strategies. Features are proposed for offline recognition, and a disambiguation step is employed in the offline system for the samples for which the confidence level of the classifier is low. The outputs are then combined probabilistically, resulting in a classifier that outperforms both individual systems. Experiments are performed for Kannada, a South Indian language, over a database of 295 classes. The accuracy of the online recognizer improves by 11% when the combination with the offline system is used.
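The abstract does not say how the online and offline outputs are combined probabilistically. One common choice, shown here purely as an assumed illustration, is the product rule: multiply the per-class posteriors of the two recognizers and renormalize, which treats the two classifiers as conditionally independent. The class labels, the smoothing constant `eps` and the function names are hypothetical.

```python
def combine(p_online, p_offline, eps=1e-9):
    """Fuse two posterior distributions with the product rule and renormalize."""
    classes = set(p_online) | set(p_offline)
    # eps stands in for classes that one recognizer did not score at all.
    scores = {c: p_online.get(c, eps) * p_offline.get(c, eps) for c in classes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def predict(p_online, p_offline):
    """Return the class with the highest fused probability."""
    fused = combine(p_online, p_offline)
    return max(fused, key=fused.get)


# Hypothetical posteriors over three character classes.
online = {"ka": 0.5, "ga": 0.4, "ta": 0.1}
offline = {"ka": 0.2, "ga": 0.7, "ta": 0.1}

print(predict(online, offline))
```

In this toy case the online recognizer alone would pick `ka`, while the fused score favors `ga` because the offline recognizer is much more confident about it; a weighted sum of posteriors would be an equally plausible alternative rule.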
Abstract:
Bacteriorhodopsin has been the subject of intense study in order to understand its photochemical function. The recent atomic model proposed by Henderson and coworkers based on electron cryo-microscopic studies has helped in understanding many of the structural and functional aspects of bacteriorhodopsin. However, the accuracy of the positions of the side chains is not very high since the model is based on low-resolution data. In this study, we have minimized the energy of this structure of bacteriorhodopsin and analyzed various types of interactions, such as intrahelical and interhelical hydrogen bonds and the retinal environment. In order to understand the photochemical action, it is necessary to obtain information on the structures adopted at the intermediate states. To this end, we have generated some intermediate structures by computer modeling studies, taking into account certain experimental data. Various isomers of retinal with 13-cis and/or 15-cis conformations and all possible staggered orientations of the Lys-216 side chain were generated. The resultant structures were examined for the distance between the Lys-216 Schiff base nitrogen and the carboxylate oxygen atoms of Asp-96, a residue which is known to reprotonate the Schiff base at later stages of the photocycle. Some of the structures were selected on the basis of suitable retinal orientation, and the stability of these structures was tested by energy minimization studies. Further, the minimized structures are analyzed for the hydrogen bond interactions and retinal environment, and the results are compared with those of the minimized rest state structure. The importance of functional groups in stabilizing the structure of bacteriorhodopsin and in participating dynamically during the photocycle is discussed.