971 results for Language Model


Relevance: 30.00%

Abstract:

Current models of word production assume that words are stored as linear sequences of phonemes which are structured into syllables only at the moment of production. This is because syllable structure is always recoverable from the sequence of phonemes. In contrast, we present theoretical and empirical evidence that syllable structure is lexically represented. Storing syllable structure would have the advantage of making representations more stable and resistant to damage. On the other hand, re-syllabifications affect only a minimal part of phonological representations and occur only in some languages and in some speech registers. Evidence for these claims comes from analyses of aphasic errors which not only respect phonotactic constraints, but also avoid transformations which move the syllabic structure of the word further away from the original structure, even when equating for segmental complexity. This is true across tasks, types of errors, and, crucially, types of patients. The same syllabic effects are shown by apraxic patients and by phonological patients who have more central difficulties in retrieving phonological representations. If syllable structure were computed only after phoneme retrieval, it would have no way to influence the errors of phonological patients. Our results have implications for psycholinguistic and computational models of language as well as for clinical and educational practices.

Relevance: 30.00%

Abstract:

This thesis provides a set of tools for managing uncertainty in Web-based models and workflows. To support the use of these tools, this thesis firstly provides a framework for exposing models through Web services. An introduction to uncertainty management, Web service interfaces, and workflow standards and technologies is given, with a particular focus on the geospatial domain. An existing specification for exposing geospatial models and processes, the Web Processing Service (WPS), is critically reviewed. A processing service framework is presented as a solution to usability issues with the WPS standard. The framework implements support for Simple Object Access Protocol (SOAP), Web Service Description Language (WSDL) and JavaScript Object Notation (JSON), allowing models to be consumed by a variety of tools and software. Strategies for communicating with models from Web service interfaces are discussed, demonstrating the difficulty of exposing existing models on the Web. This thesis then reviews existing mechanisms for uncertainty management, with an emphasis on emulator methods for building efficient statistical surrogate models. A tool is developed to solve accessibility issues with such methods, by providing a Web-based user interface and backend to ease the process of building and integrating emulators. These tools, plus the processing service framework, are applied to a real case study as part of the UncertWeb project. The usability of the framework is demonstrated with the implementation of a Web-based workflow for predicting future crop yields in the UK, which also demonstrates the abilities of the tools for emulator building and integration. Future directions for the development of the tools are discussed.

Relevance: 30.00%

Abstract:

Social streams have proven to be the most up-to-date and inclusive information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM does not require any labeled corpora for training; instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words, based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicate words whose usage is more topically diverse and therefore less informative. Our proposed VDM has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.
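The entropy intuition in this abstract can be illustrated with a minimal sketch. It is not the authors' relative entropy measure, whose exact definition the abstract does not give: it computes plain Shannon entropy of each word's distribution over some partition of the text. The function name `word_entropies` and the `group` labels are hypothetical, and the groups stand in for whatever partition is available (for example, seed clusters), not for training labels.

```python
import math
from collections import Counter, defaultdict


def word_entropies(docs):
    """Return {word: entropy in bits of p(group | word)}.

    `docs` is a list of (group, tokens) pairs. The grouping is an
    assumption made for illustration; the abstract does not say how
    the authors partition their data.
    """
    # counts[word][group] = number of times `word` occurs in `group`
    counts = defaultdict(Counter)
    for group, tokens in docs:
        for token in tokens:
            counts[token][group] += 1

    entropies = {}
    for word, per_group in counts.items():
        total = sum(per_group.values())
        entropy = 0.0
        for count in per_group.values():
            p = count / total
            entropy -= p * math.log2(p)
        entropies[word] = entropy
    return entropies


if __name__ == "__main__":
    sample = [
        ("a", ["attack", "the"]),
        ("b", ["the", "weather"]),
    ]
    for word, h in sorted(word_entropies(sample).items()):
        print(f"{word}: {h:.2f} bits")
```

In this toy input, a word confined to one group gets zero entropy, while a word spread evenly over both groups gets one bit, which matches the abstract's reading of low entropy as more informative.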

Relevance: 30.00%

Abstract:

In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework. This framework is implemented using the Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation, and intensity, using different techniques and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we have also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data since it achieved a better accuracy for the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than our CART model. In addition, we also observed that the expressiveness of FDT is much better than that of CART. That is because the representation in FDT is not restricted to a set of piece-wise or discrete constant approximation. We, therefore, conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications. © 2006 Elsevier Ltd. All rights reserved.
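The two quantitative measures named in this abstract, RMSE and correlation, can be sketched as follows. This assumes the standard definitions (root mean square error and Pearson's correlation coefficient) applied to predicted and observed syllable durations; the abstract does not state which correlation coefficient was used, and the function names are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared_error / n)


def pearson(predicted, observed):
    """Pearson correlation coefficient (assumed here; not confirmed by the abstract)."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


if __name__ == "__main__":
    # Made-up durations in milliseconds, for illustration only.
    predicted_ms = [110.0, 95.0, 130.0, 120.0]
    observed_ms = [100.0, 90.0, 140.0, 125.0]
    print("RMSE:", rmse(predicted_ms, observed_ms))
    print("Corr:", pearson(predicted_ms, observed_ms))
```

A lower RMSE and a correlation closer to 1 on held-out data would correspond to the better test-set accuracy the abstract attributes to the FDT model.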

Relevance: 30.00%

Abstract:

Social media data are produced continuously by a large and uncontrolled number of users. The dynamic nature of such data requires the sentiment and topic analysis model to be dynamically updated as well, capturing the most recent language use of sentiments and topics in text. We propose a dynamic Joint Sentiment-Topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. We study three different ways of accounting for such dependency information: (1) a sliding window, where the current sentiment-topic word distributions depend on the sentiment-topic-specific word distributions in the last S epochs; (2) a skip model, where historical sentiment-topic word distributions are considered by skipping some epochs in between; and (3) a multiscale model, where previous long- and short-timescale distributions are taken into consideration. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011. © 2013 ACM 2157-6904/2013/12-ART5 $ 15.00.

Relevance: 30.00%

Abstract:

Natural language understanding aims to specify a computational model that maps sentences to their semantic meaning representations. In this paper, we propose a novel framework to train statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied to two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA Communicator data show that both CRFs and HM-SVMs outperform the baseline approach, the previously proposed hidden vector state (HVS) model, which is also trained on abstract semantic annotations. In addition, the proposed framework outperforms two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with relative error reductions of about 25% and 15%, respectively, in F-measure.

Relevance: 30.00%

Abstract:

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the result gives an RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1-10, were obtained for intelligibility and naturalness respectively.

Relevance: 30.00%

Abstract:

This paper presents the load-balancing process in the Triad.Net simulation system and the architecture of its load-balancing subsystem. The main features of static and dynamic load balancing are discussed, and a new approach, controlled dynamic load balancing, needed for regular mapping of a simulation model onto a network of computers, is proposed. The paper considers the linguistic constructions of the Triad language for describing different load-balancing algorithms.

Relevance: 30.00%

Abstract:

The paper has been presented at the International Conference Pioneers of Bulgarian Mathematics, Dedicated to Nikola Obreshkoff and Lubomir Tschakaloff, Sofia, July 2006.

Relevance: 30.00%

Abstract:

A model of the cognitive process of natural language processing has been developed using the formalism of generalized nets. Following this stage-simulating model, the treatment of information inevitably includes phases which require joint operations in two knowledge spaces: language and semantics. In order to examine and formalize the relations between the language and the semantic levels of treatment, the language is presented as an information system, conceived on the basis of human cognitive resources, semantic primitives, semantic operators, and language rules and data. This approach is applied to modeling a specific grammatical rule, the secondary predication in Russian. Grammatical rules of the language space are expressed as operators in the semantic space. Examples from the linguistics domain are treated and several conclusions about the semantics of the modeled rule are drawn. The results of applying the information system approach to the language turn out to be consistent with the stages of treatment modeled with the generalized net.

Relevance: 30.00%

Abstract:

It is proposed to use one common computer model for teaching different parts of the informatics course, covering both hardware and software subjects. The rationale for this approach is presented, and the course topics where it is most practical are enumerated. The author's own development (including software support), the educational virtual computer model "E97" and a Pascal compiler for it, is described. It is emphasized that the ideas discussed are helpful for any other similar model.

Relevance: 30.00%

Abstract:

Clinical decision support systems (CDSSs) often base their knowledge and advice on human expertise. Knowledge representation needs to be in a format that can be easily understood by human users as well as supporting ongoing knowledge engineering, including evolution and consistency of knowledge. This paper reports on the development of an ontology specification for managing knowledge engineering in a CDSS for assessing and managing risks associated with mental-health problems. The Galatean Risk and Safety Tool, GRiST, represents mental-health expertise in the form of a psychological model of classification. The hierarchical structure was directly represented in the machine using an XML document. Functionality of the model and knowledge management were controlled using attributes in the XML nodes, with an accompanying paper manual for specifying how end-user tools should behave when interfacing with the XML. This paper explains the advantages of using the web-ontology language, OWL, as the specification, details some of the issues and problems encountered in translating the psychological model to OWL, and shows how OWL benefits knowledge engineering. The conclusions are that OWL can have an important role in managing complex knowledge domains for systems based on human expertise without impeding the end-users' understanding of the knowledge base. The generic classification model underpinning GRiST makes it applicable to many decision domains and the accompanying OWL specification facilitates its implementation.

Relevance: 30.00%

Abstract:

The paper aims to present a bilingual online dictionary as a useful tool for helping to preserve natural languages. The author focuses on the approach that was taken to develop a compatible bilingual lexical database for the Bulgarian-Polish online dictionary. A formal model for the dictionary encoding is developed in accordance with the complex structures of the dictionary entries. These structures vary depending on the grammatical characteristics of Bulgarian headwords. The Web application for presenting the bilingual dictionary is also described.

Relevance: 30.00%

Abstract:

2000 Mathematics Subject Classification: 91E45.

Relevance: 30.00%

Abstract:

Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2015