12 results for Statistical Language Model
in Aston University Research Archive
Abstract:
The Resource Space Model is a data model that can effectively and flexibly manage digital resources in cyber-physical systems from multidimensional and hierarchical perspectives. This paper focuses on constructing a resource space automatically. We propose a framework that organizes a set of digital resources along different semantic dimensions by combining human background knowledge from WordNet and Wikipedia. The construction process comprises four steps: extracting candidate keywords, building semantic graphs, detecting semantic communities and generating the resource space. An unsupervised statistical topic model (Latent Dirichlet Allocation, LDA) is applied to extract candidate keywords for the facets. To better interpret the meanings of the facets found by LDA, we map the keywords to Wikipedia concepts, calculate word relatedness using WordNet's noun synsets and construct the corresponding semantic graphs. Semantic communities are then identified with the Girvan-Newman (GN) algorithm. After extracting candidate axes based on the Wikipedia concept hierarchy, the final axes of the resource space are ranked and selected through three different ranking strategies. The experimental results demonstrate that the proposed framework can organize resources automatically and effectively. © 2013 Published by Elsevier Ltd. All rights reserved.
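The LDA keyword-extraction step described above can be illustrated with a toy collapsed Gibbs sampler. This is a minimal stdlib-only sketch, not the authors' implementation; the example documents, hyperparameters and topic count are all invented for illustration.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns top words per topic."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    z = [[rng.randrange(K) for _ in d] for d in docs]   # token-topic assignments
    ndk = [[0] * K for _ in docs]                       # doc-topic counts
    nkw = [defaultdict(int) for _ in range(K)]          # topic-word counts
    nk = [0] * K                                        # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]; ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # conditional P(topic t | everything else), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                r, acc = rng.uniform(0, sum(weights)), 0.0
                for t, wt in enumerate(weights):
                    acc += wt
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # top words of each topic serve as candidate keywords for a facet
    return [sorted(nkw[t], key=nkw[t].get, reverse=True)[:3] for t in range(K)]

docs = [["graph", "node", "edge", "graph"], ["node", "edge", "graph"],
        ["tax", "income", "tax", "policy"], ["income", "policy", "tax"]]
keywords = lda_gibbs(docs, K=2)
print(keywords)
```

In the paper's pipeline these per-topic keyword lists would then be mapped to Wikipedia concepts and linked via WordNet relatedness.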
Abstract:
In a linguistic context where it seems the entire world is interested only in learning English, it is worth asking whether French still has a place in Mexico. In spite of the predominance of English, there is nevertheless a sense that French remains alive in Mexico, and indeed in certain areas has retained its strength and appeal. This hypothesis was put to the test by exploring the current linguistic environment in the state of Veracruz. An investigation based on questionnaires and interviews with those connected to the teaching of French (including students, teachers, and employees and directors of language schools) shows that the Mexican government's desire to promote English for everyone is not necessarily consistent with the desires and expectations of the general populace. This in turn suggests the need to adopt a policy that takes into consideration not only what people seem to be telling us about the learning of foreign languages but also what they are not telling us. If the teaching and learning of French as a foreign language remains strong in Veracruz, this is explained much more by the long and friendly relationship that people in the state have had with the French people (and their culture) than by any instrumental need to learn their language. This is seen in the fact that students here consistently describe their motivation for learning French from an emotional or affective standpoint rather than a professional one. It seems that the ties between the Mexican and French people remain solid. Another interesting characteristic of students of French in Veracruz is the positive attitude they seem to have towards languages in general, which in turn enables them to take further advantage of the benefits made available by globalization. In reality, no rivalry exists between French and English, and it is therefore unnecessary to adopt measures that would address such a struggle.
It is, however, a matter of great urgency that authorities in the arenas of politics and academia take a closer look at the policies they design regarding the study of foreign languages in general, and that they consider, specifically, a genuine alternative to the one-language model of foreign language teaching and learning (in this case English), a model that for all intents and purposes has failed. In the midst of a globalized world, and during this current period of increased linguistic activity, the assertions above serve not only to support my initial hypothesis but also to help shake off the dust of some outdated belief systems and lay down the framework for a new, better informed and well thought-out policy of foreign language planning.
Abstract:
All aspects of the concept of collocation – the phenomenon whereby words naturally tend to occur in the company of a restricted set of other words – are covered in this book. It deals in detail with the history of the word collocation, the concepts associated with it and its use in a linguistic context. The authors show the practical means by which the collocational behaviour of words can be explored using illustrative computer programs, and examine applications in teaching, lexicography and natural language processing that use collocation information. The book investigates the place that collocation occupies in theories of language and provides a thoroughly comprehensive and up-to-date survey of the current position of collocation in language studies and applied linguistics. This text presents a comprehensive description of collocation, covering both the theoretical and practical background and the implications and applications of the concept as language model and analytical tool. It provides a definitive survey of currently available techniques and a detailed description of their implementation.
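The "illustrative computer programs" mentioned above can be hinted at with one standard technique for finding collocations: scoring word pairs within a small window by pointwise mutual information (PMI). This is a generic sketch, not the book's own programs, and the toy text is invented.

```python
import math
from collections import Counter

def collocations(tokens, window=2, min_count=2):
    """Rank word pairs co-occurring within `window` tokens by PMI."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            pairs[(w, tokens[j])] += 1
    N = len(tokens)
    # PMI = log2( P(a,b) / (P(a) * P(b)) ), here with raw counts
    scores = {(a, b): math.log2(c * N / (unigrams[a] * unigrams[b]))
              for (a, b), c in pairs.items() if c >= min_count}
    return sorted(scores, key=scores.get, reverse=True)

text = "strong tea and strong tea with heavy rain and heavy rain".split()
top = collocations(text)[:2]
print(top)  # [('strong', 'tea'), ('heavy', 'rain')]
```

High-PMI pairs such as "strong tea" are exactly the restricted-company effect the book describes; frequency-only ranking would instead surface pairs of common function words.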
Abstract:
Natural language understanding aims to specify a computational model that maps sentences to their semantic meaning representations. In this paper, we propose a novel framework to train statistical models without using expensive fully annotated data. In particular, the input to our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic-tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied to two statistical models: conditional random fields (CRFs) and hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA Communicator data show that both CRFs and HM-SVMs outperform the baseline approach, the previously proposed hidden vector state (HVS) model, which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance to two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, achieving relative error reductions in F-measure of about 25% and 15%, respectively.
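The core task above, mapping a sentence to a semantic tag sequence, can be illustrated with the simplest sequence model of this family: a first-order HMM decoded with Viterbi. The paper itself uses CRFs and HM-SVMs trained from unaligned abstract annotations; this sketch only shows the decoding side, and all probabilities, tags and vocabulary are invented.

```python
def viterbi(words, tags, emit, trans, start):
    """Return the most likely semantic tag sequence for `words`.
    Unseen (tag, word) or (tag, tag) events get a tiny floor probability."""
    V = [{t: start.get(t, 1e-9) * emit.get((t, words[0]), 1e-9) for t in tags}]
    back = []
    for w in words[1:]:
        row, ptr = {}, {}
        for t in tags:
            # best predecessor for tag t at this position
            best = max(tags, key=lambda p: V[-1][p] * trans.get((p, t), 1e-9))
            row[t] = V[-1][best] * trans.get((best, t), 1e-9) * emit.get((t, w), 1e-9)
            ptr[t] = best
        V.append(row)
        back.append(ptr)
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for ptr in reversed(back):      # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))

tags = ["O", "CITY"]
emit = {("CITY", "boston"): 0.9, ("O", "flights"): 0.9, ("O", "to"): 0.9}
trans = {("O", "O"): 0.6, ("O", "CITY"): 0.4, ("CITY", "O"): 0.6, ("CITY", "CITY"): 0.4}
start = {"O": 0.9, "CITY": 0.1}
path = viterbi(["flights", "to", "boston"], tags, emit, trans, start)
print(path)  # ['O', 'O', 'CITY']
```

A discriminative model such as a CRF replaces the per-position products of probabilities with globally normalized feature weights, but the dynamic-programming decode has the same shape.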
Abstract:
Information systems have developed to the stage where plenty of data is available in most organisations, but there are still major problems in turning that data into information for management decision making. This thesis argues that the link between decision support information and transaction processing data should be through a common object model which reflects the real world of the organisation and encompasses the artefacts of the information system. The CORD (Collections, Objects, Roles and Domains) model is developed, which is richer in appropriate modelling abstractions than current Object Models. A flexible Object Prototyping tool based on a Semantic Data Storage Manager has been developed which enables a variety of models to be stored and experimented with. A statistical summary table model, COST (Collections of Objects Statistical Table), has been developed within CORD and is shown to be adequate to meet the modelling needs of Decision Support and Executive Information Systems. The COST model is supported by a statistical table creator and editor, COSTed, which is also built on top of the Object Prototyper and uses the CORD model to manage its metadata.
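The idea of a statistical summary table over a collection of objects can be loosely sketched as a group-by aggregation: cells are keyed by the chosen dimensions and hold aggregates of a measure. Everything here (class name, field names, data) is invented for illustration and is not the thesis's CORD/COST implementation.

```python
from collections import defaultdict

class SummaryTable:
    """Toy COST-style statistical table: objects are grouped along
    `dimensions` and a numeric `measure` is aggregated per cell."""
    def __init__(self, dimensions, measure):
        self.dimensions = dimensions
        self.measure = measure
        self.cells = defaultdict(list)
    def load(self, objects):
        for obj in objects:
            key = tuple(obj[d] for d in self.dimensions)
            self.cells[key].append(obj[self.measure])
    def totals(self):
        return {key: sum(vals) for key, vals in self.cells.items()}

# Hypothetical transaction records standing in for operational data.
sales = [{"region": "N", "year": 2023, "amount": 10},
         {"region": "N", "year": 2023, "amount": 5},
         {"region": "S", "year": 2023, "amount": 7}]
table = SummaryTable(["region", "year"], "amount")
table.load(sales)
print(table.totals())  # {('N', 2023): 15, ('S', 2023): 7}
```

The thesis's point is that such a table should be defined against the same object model as the transaction data, so the summary stays linked to the objects it summarises rather than being an extract.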
Abstract:
This paper proposes a novel framework for incorporating protein-protein interaction (PPI) ontology knowledge into PPI extraction from biomedical literature, in order to address the emerging challenges of deep natural language understanding. It is built upon existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way while still capturing the underlying named-entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from the PPI ontology through inference can easily be incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm for information extraction.
Abstract:
The recent explosive growth of voice over IP (VoIP) solutions calls for accurate modelling of VoIP traffic. This study presents measurements of ON and OFF periods of VoIP activity from a large database of VoIP call recordings featuring native speakers of some of the world's most widely spoken languages. The impact of the languages and the varying dynamics of caller interaction on the ON and OFF period statistics is assessed. It is observed that speaker interactions dominate over language dependence, which makes monologue-based data unreliable for traffic modelling. The authors derive a semi-Markov model which accurately reproduces the statistics of composite dialogue measurements. © The Institution of Engineering and Technology 2013.
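A semi-Markov ON/OFF source of the kind derived above can be sketched by alternating two states whose holding times are drawn from arbitrary distributions (the semi-Markov property: dwell times need not be exponential). This is a generic illustration, not the authors' fitted model; the distributions and parameters below are assumed purely for the example.

```python
import random

def simulate_on_off(duration, on_sampler, off_sampler, seed=0):
    """Alternate ON (talk-spurt) and OFF (silence) periods until `duration`
    seconds are covered; each state's holding time comes from its sampler."""
    rng = random.Random(seed)
    t, state, periods = 0.0, "ON", []
    while t < duration:
        hold = on_sampler(rng) if state == "ON" else off_sampler(rng)
        periods.append((state, hold))
        t += hold
        state = "OFF" if state == "ON" else "ON"
    return periods

# Assumed illustrative distributions: lognormal talk-spurts, exponential silences.
periods = simulate_on_off(
    60.0,
    on_sampler=lambda r: r.lognormvariate(0.0, 0.5),
    off_sampler=lambda r: r.expovariate(1.0),
)
on_total = sum(d for s, d in periods if s == "ON")
off_total = sum(d for s, d in periods if s == "OFF")
print(round(on_total / (on_total + off_total), 2))  # voice activity fraction
```

For traffic modelling, the ON/OFF holding-time distributions would be fitted to the measured dialogue statistics rather than assumed as here.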
Abstract:
This thesis provides a set of tools for managing uncertainty in Web-based models and workflows. To support the use of these tools, it first provides a framework for exposing models through Web services. An introduction to uncertainty management, Web service interfaces, and workflow standards and technologies is given, with a particular focus on the geospatial domain. An existing specification for exposing geospatial models and processes, the Web Processing Service (WPS), is critically reviewed. A processing service framework is presented as a solution to usability issues with the WPS standard. The framework implements support for the Simple Object Access Protocol (SOAP), the Web Service Description Language (WSDL) and JavaScript Object Notation (JSON), allowing models to be consumed by a variety of tools and software. Strategies for communicating with models from Web service interfaces are discussed, demonstrating the difficulty of exposing existing models on the Web. The thesis then reviews existing mechanisms for uncertainty management, with an emphasis on emulator methods for building efficient statistical surrogate models. A tool is developed to solve accessibility issues with such methods by providing a Web-based user interface and backend to ease the process of building and integrating emulators. These tools, together with the processing service framework, are applied to a real case study as part of the UncertWeb project. The usability of the framework is demonstrated through the implementation of a Web-based workflow for predicting future crop yields in the UK, which also demonstrates the abilities of the tools for emulator building and integration. Future directions for the development of the tools are discussed.
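The emulator idea above, replacing an expensive simulator with a cheap statistical surrogate, is commonly realised with a Gaussian process. The following is a toy stdlib-only sketch fitted to three runs of a stand-in "expensive" function; it is not the UncertWeb tooling, and the kernel, length-scale and training points are assumed for illustration.

```python
import math

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_emulator(xs, ys, length=1.0, noise=1e-6):
    """Fit a zero-mean GP with an RBF kernel; return the posterior-mean
    predictor, which acts as the cheap surrogate model."""
    k = lambda a, b: math.exp(-((a - b) ** 2) / (2 * length ** 2))
    K = [[k(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)  # weights: K @ alpha = y
    return lambda x: sum(w * k(x, a) for w, a in zip(alpha, xs))

# Three runs of a toy stand-in for an expensive simulator, f(x) = sin(x).
xs = [0.0, 1.5, 3.0]
emu = gp_emulator(xs, [math.sin(x) for x in xs])
```

Once fitted, `emu` costs a handful of arithmetic operations per prediction, which is what makes emulators attractive inside Web workflows where the original model is slow or remote.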
Abstract:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT
Abstract:
This paper presents a causal explanation of formative variables that unpacks and clarifies the generally accepted idea that formative indicators are 'causes' of the focal formative variable. In doing this, we explore the recent paper by Diamantopoulos and Temme (AMS Review, 3(3), 160-171, 2013) and show that the latter misunderstands the stance of Lee, Cadogan, and Chamberlain (AMS Review, 3(1), 3-17, 2013; see also Cadogan, Lee, and Chamberlain, AMS Review, 3(1), 38-49, 2013). By drawing on the multiple ways that one can interpret the idea of causality within the MIMIC model, we then demonstrate how the continued defense of the MIMIC model as a tool to validate formative indicators and to identify formative variables in structural models is misguided. We also present unambiguous recommendations on how formative variables can be modelled in lieu of the formative MIMIC model.
Abstract:
In this letter, we propose an analytical approach to modelling uplink intercell interference (ICI) in hexagonal-grid-based orthogonal frequency division multiple access (OFDMA) cellular networks. The key idea is that the uplink ICI from each individual cell is approximated by a lognormal distribution whose statistical parameters are determined analytically. Accordingly, the aggregate uplink ICI is approximated by another lognormal distribution whose statistical parameters can be determined from those of the individual cells using the Fenton-Wilkinson method. Analytic expressions for the uplink ICI are derived for two traditional frequency reuse schemes, namely integer frequency reuse with factor 1 (IFR-1) and with factor 3 (IFR-3). Uplink fractional power control and lognormal shadowing are modeled. System performance in terms of signal-to-interference-plus-noise ratio (SINR) and spectral efficiency is also derived. The proposed model has been validated by simulations. © 2013 IEEE.
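The Fenton-Wilkinson step used above approximates a sum of independent lognormals by a single lognormal whose first two moments match those of the sum. A minimal sketch follows; the per-cell parameters in the example are assumed, not taken from the letter.

```python
import math

def fenton_wilkinson(params):
    """Approximate the sum of independent lognormals LN(mu_i, sigma_i^2)
    by one lognormal LN(mu, sigma^2) matching the sum's mean and variance.
    `params` is a list of (mu_i, sigma_i^2) pairs."""
    m = sum(math.exp(mu + s2 / 2) for mu, s2 in params)                       # E[S]
    v = sum(math.exp(2 * mu + s2) * (math.exp(s2) - 1) for mu, s2 in params)  # Var[S]
    sigma2 = math.log(1 + v / m ** 2)
    mu = math.log(m) - sigma2 / 2
    return mu, sigma2

# e.g. aggregate ICI from six interfering cells, each LN(0, 1) (assumed values)
mu_s, s2_s = fenton_wilkinson([(0.0, 1.0)] * 6)
```

By construction the approximating lognormal reproduces the exact mean and variance of the aggregate; the approximation error lies in the higher moments and the tails.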