2 results for Translation and rotation
in DRUM (Digital Repository at the University of Maryland)
Abstract:
Natural language processing has achieved great success in a wide range of applications, producing both commercial language services and open-source language tools. However, most methods take a static or batch approach, assuming that the model has all the information it needs and makes a one-time prediction. In this dissertation, we study dynamic problems where the input arrives as a sequence instead of all at once, and the output must be produced while the input is arriving. In these problems, predictions are often made based only on partial information. We see this dynamic setting in many real-time, interactive applications. These problems usually involve a trade-off between the amount of input received (cost) and the quality of the output prediction (accuracy). Therefore, the evaluation considers both objectives (e.g., plotting a Pareto curve). Our goal is to develop a formal understanding of sequential prediction and decision-making problems in natural language processing and to propose efficient solutions. Toward this end, we present meta-algorithms that take an existing batch model and produce a dynamic model to handle sequential inputs and outputs. We build our framework upon the theory of Markov Decision Processes (MDPs), which allows learning to trade off competing objectives in a principled way. The main machine learning techniques we use are from imitation learning and reinforcement learning, and we advance current techniques to tackle problems arising in our settings. We evaluate our algorithms on a variety of applications, including dependency parsing, machine translation, and question answering. We show that our approach achieves a better cost-accuracy trade-off than the batch approach and heuristic-based decision-making approaches. We first propose a general framework for cost-sensitive prediction, where different parts of the input come at different costs.
We formulate a decision-making process that selects pieces of the input sequentially, and the selection is adaptive to each instance. Our approach is evaluated on both standard classification tasks and a structured prediction task (dependency parsing). We show that it achieves prediction quality similar to methods that use all of the input, while incurring a much lower cost. Next, we extend the framework to problems where the input is revealed incrementally in a fixed order. We study two applications: simultaneous machine translation and quiz bowl (incremental text classification). We discuss challenges in this setting and show that adding domain knowledge eases the decision-making problem. A central theme throughout the chapters is an MDP formulation of a challenging problem with sequential input/output and trade-off decisions, accompanied by a learning algorithm that solves the MDP.
Abstract:
Turnip crinkle virus (TCV) and Pea enation mosaic virus (PEMV) are two positive (+)-strand RNA viruses that are used to investigate the regulation of translation and replication due to their small size and simple genomes. Both viruses contain cap-independent translation elements (CITEs) within their 3′ untranslated regions (UTRs) that fold into tRNA-shaped structures (TSS) according to nuclear magnetic resonance (NMR) and small-angle X-ray scattering (SAXS) analysis (TCV) and computational prediction (PEMV). Specifically, the TCV TSS can directly associate with ribosomes and participates in RNA-dependent RNA polymerase (RdRp) binding. The PEMV kissing-loop TSS (kl-TSS) can simultaneously bind to ribosomes and associate with the 5′ UTR of the viral genome. Mutational analysis and chemical structure probing methods provide great insight into the function and secondary structure of the two 3′ CITEs. However, the lack of 3-D structural information has limited our understanding of their functional dynamics. Here, I report the folding dynamics of the TCV TSS using optical tweezers (OT), a single-molecule technique. My study of the unfolding/folding pathways of the TCV TSS has revealed an unexpected unfolding pathway, confirmed the presence of Ψ3 and hairpin elements, and suggested an interconnection between the hairpins and pseudoknots. In addition, this study has demonstrated the importance of the adjacent upstream adenylate-rich sequence for the formation of H4a/Ψ3, along with the contribution of magnesium to the stability of the TCV TSS. In my second project, I report on the structural analysis of the PEMV kl-TSS using NMR and SAXS. This study has re-confirmed the base-pair pattern for the PEMV kl-TSS and the proposed interaction of the PEMV kl-TSS with its interacting partner, hairpin 5H2.
The molecular envelope of the kl-TSS built from SAXS analysis suggests the kl-TSS has two functional conformations, one of which has a different shape from the previously predicted tRNA-shaped form. Along with applying biophysical methods to study the structural folding dynamics of RNAs, I have also developed a technique that improves the production of large quantities of recombinant RNAs in vivo for NMR study. In this project, I report using wild-type and mutant E. coli strains to produce cost-effective, site-specifically labeled, recombinant RNAs. This technique was validated with four representative RNAs of different sizes and complexity, each produced in milligram amounts. The benefit of using site-specifically labeled RNAs made from E. coli was demonstrated with several NMR techniques.