985 results for Textual complexity for Romanian language
Abstract:
Speaker: Erica Burman / Presented by: Jordi Bonet
Abstract:
This is a Named Entity based Question Answering System for the Malayalam language. Although a vast amount of information is available today in digital form, no effective mechanism exists to give people convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms currently available for information access. Information Retrieval systems typically return a long list of documents in response to a user’s query, which the user must skim to determine whether they contain an answer. A Question Answering System, by contrast, allows the user to state an information need as a natural language question and receive the most appropriate answer as a word, a sentence, or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is then used to find the answer to the question. Question Classification extracts useful information from the question to determine its type and the way in which it is to be answered. Various Machine Learning methods are used to tag the documents. A Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich, agglutinative language with relatively free word order. Malayalam also has a productive morphology that allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter were developed as part of this research work. No such tools were previously available for the Malayalam language.
Finite State Transducers, High Order Conditional Random Fields, Artificial Immunity System Principles, and Support Vector Machines are the techniques used in the design of these document preprocessing tools. This research work describes how Named Entities are used to represent the documents. Single-sentence questions are used to test the system. The overall Precision and Recall obtained are 88.5% and 85.9%, respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased, and the system can be extended to open-domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.
Abstract:
Malayalam is one of the 22 scheduled languages of India, with more than 130 million speakers. This paper reports on the development of a speaker-independent continuous transcription system for Malayalam. The system employs Hidden Markov Models (HMMs) for acoustic modeling and Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction. It is trained with 21 male and female speakers aged 20 to 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84% when tested with a set of continuous speech data.
Abstract:
Connected digit speech recognition is important in many applications, such as automated banking systems, catalogue dialing, and automatic data entry. This paper presents an optimum speaker-independent connected digit recognizer for the Malayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficients for speech parameterization and continuous density Hidden Markov Models (HMMs) in the recognition process. The Viterbi algorithm is used for decoding. The training database contains utterances from 21 speakers aged 20 to 40 years, recorded in a normal office environment, where each speaker was asked to read 20 sets of continuous digits. The system obtained an accuracy of 99.5% on unseen data.
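As an illustration of the decoding step named in this abstract, the following is a minimal sketch of Viterbi decoding. It is not the authors' recognizer: the abstract describes continuous density HMMs over PLP features, whereas this toy uses a discrete-emission HMM, and the state names, observation symbols, and probabilities are all made-up assumptions chosen only to keep the example small.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path and its probability.

    Works in the log domain to avoid numerical underflow on long inputs.
    All model parameters are plain dictionaries of probabilities.
    """
    # Initialise with the start distribution and the first observation.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}]
    back = [{}]

    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for landing in s at time t.
            best_prev = max(
                states,
                key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]),
            )
            score[t][s] = (score[t - 1][best_prev]
                           + math.log(trans_p[best_prev][s])
                           + math.log(emit_p[s][observations[t]]))
            back[t][s] = best_prev

    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, math.exp(score[-1][last])


# Hypothetical two-state model with three discrete observation symbols.
STATES = ["s0", "s1"]
START_P = {"s0": 0.6, "s1": 0.4}
TRANS_P = {"s0": {"s0": 0.7, "s1": 0.3},
           "s1": {"s0": 0.4, "s1": 0.6}}
EMIT_P = {"s0": {"x": 0.5, "y": 0.4, "z": 0.1},
          "s1": {"x": 0.1, "y": 0.3, "z": 0.6}}

best_path, best_prob = viterbi(["x", "y", "z"], STATES, START_P, TRANS_P, EMIT_P)
print(best_path, best_prob)
```

In a connected digit recognizer the states would belong to concatenated digit models and the emission term would come from continuous densities rather than a lookup table, but the recursion and back-tracing would have the same shape.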
Abstract:
The scope of writer identification extends to broad domains such as digital rights administration, forensic expert decision-making systems, and document analysis systems. Because the success rate of a writer identification scheme depends heavily on the features extracted from the documents, the feature extraction and selection phase is highly significant for such schemes. In this paper, writer identification for the Malayalam language is addressed using a feature extraction technique, the Scale Invariant Feature Transform (SIFT). The schemes are tested on a test bed of 280 writers and their performance is evaluated.
Abstract:
This paper discusses the implementation details of a child-friendly, good-quality English text-to-speech (TTS) system that is phoneme-based, concatenative, easy to set up, and usable with little memory. Direct waveform concatenation and linear prediction coding (LPC) are used. Most existing TTS systems are unit-selection based and use standard speech databases available in neutral adult voices. Here, reduced memory is achieved by concatenating phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis, rather than signal processing techniques, was used to reduce the algorithmic complexity. A sufficient degree of customization and generalization catering to the needs of the child user was included through the provision for vocabulary and voice selection. Prosody was also incorporated. This inexpensive TTS system was implemented in MATLAB, with the synthesis presented by means of a graphical user interface (GUI), thus making it child-friendly. It can be used not only as an interesting language learning aid for the normal child but also as a speech aid for the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
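To make the idea of storing LPC coefficients in place of waveforms concrete, here is a minimal sketch of one standard way to compute them: the autocorrelation method with the Levinson-Durbin recursion. This is an assumption about the general technique, not the authors' MATLAB code; the abstract does not say which LPC estimation method, frame length, or model order was used, and the function names below are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite sample sequence."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(signal, order):
    """Estimate predictor coefficients a[1..order] with Levinson-Durbin.

    The model is x[n] ~ a[1]*x[n-1] + ... + a[order]*x[n-order].
    Returns (coefficients, residual_error_energy).
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # index 0 is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for extending the predictor to order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error


# Toy input: a decaying exponential, which a first-order predictor fits well.
toy_signal = [0.5 ** n for n in range(50)]
coeffs, residual = lpc_coefficients(toy_signal, order=2)
print(coeffs, residual)
```

For the toy signal the first coefficient comes out close to 0.5 and the second close to 0, reflecting that each sample is half of the previous one. A handful of such coefficients per frame is far smaller than the raw samples, which is the memory saving the abstract refers to.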
Abstract:
Restarting automata are a restricted model of computation that was introduced by Jancar et al. to model the so-called analysis by reduction. A computation of a restarting automaton consists of a sequence of cycles such that in each cycle the automaton performs exactly one rewrite step, which replaces a small part of the tape content by another, even shorter word. Thus, each language accepted by a restarting automaton belongs to the complexity class $\mathsf{CSL} \cap \mathsf{NP}$. Here we consider a natural generalization of this model, called the shrinking restarting automaton, where we no longer insist on the requirement that each rewrite step decreases the length of the tape content. Instead we require that there exists a weight function such that each rewrite step decreases the weight of the tape content with respect to that function. The language accepted by such an automaton still belongs to the complexity class $\mathsf{CSL} \cap \mathsf{NP}$. While it is still unknown whether the two most general types of one-way restarting automata, the RWW-automaton and the RRWW-automaton, differ in their expressive power, we will see that the classes of languages accepted by the shrinking RWW-automaton and the shrinking RRWW-automaton coincide. As a consequence of our proof, it turns out that there exists a reduction by morphisms from the language class $\mathcal{L}(\mathsf{RRWW})$ to the class $\mathcal{L}(\mathsf{RWW})$. Further, we will see that the shrinking restarting automaton is a rather robust model of computation. Finally, we will relate shrinking RRWW-automata to finite-change automata. This will lead to some new insights into the relationships between the classes of languages characterized by (shrinking) restarting automata and some well-known time and space complexity classes.
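The difference between the length-reducing and the weight-reducing condition can be sketched in a few lines. The example below assumes the weight function assigns a positive integer to each tape symbol and is extended to words by summation, which is the usual formulation but is not spelled out in the abstract; the rewrite rules and weights are invented for illustration.

```python
def word_weight(word, weights):
    """Weight of a word: the sum of the weights of its symbols."""
    return sum(weights[symbol] for symbol in word)


def is_length_reducing(rewrite_rules):
    """True if every rule (lhs, rhs) strictly shortens the tape content."""
    return all(len(lhs) > len(rhs) for lhs, rhs in rewrite_rules)


def is_weight_reducing(rewrite_rules, weights):
    """True if every rule (lhs, rhs) strictly decreases the weight."""
    return all(word_weight(lhs, weights) > word_weight(rhs, weights)
               for lhs, rhs in rewrite_rules)


# Hypothetical rewrite steps: "ab" -> "cc" keeps the length unchanged,
# so it is not allowed in the classical model, yet it lowers the weight.
RULES = [("ab", "cc"), ("cc", "d")]
WEIGHTS = {"a": 3, "b": 3, "c": 2, "d": 1}

print(is_length_reducing(RULES))           # False
print(is_weight_reducing(RULES, WEIGHTS))  # True
```

Because weights are positive integers and every step lowers the total, the number of rewrite steps on an input is bounded by its initial weight, which is linear in its length. This is the intuition for why computations stay bounded in the shrinking model much as they do in the length-reducing one.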
Abstract:
Analysis by reduction is a method used in linguistics for checking the correctness of sentences of natural languages. This method is modelled by restarting automata. Here we study a new type of restarting automaton, the so-called t-sRL-automaton, which is an RL-automaton that is rather restricted in that it has a window of size 1 only, and that it works under a minimal acceptance condition. On the other hand, it is allowed to perform up to t rewrite (that is, delete) steps per cycle. We focus on the descriptional complexity of these automata, establishing two complexity measures that are both based on the description of t-sRL-automata in terms of so-called meta-instructions. We present some hierarchy results as well as a non-recursive trade-off between deterministic 2-sRL-automata and finite-state acceptors.
Abstract:
This paper contributes to the study of Freely Rewriting Restarting Automata (FRR-automata) and Parallel Communicating Grammar Systems (PCGSs), both of which are useful models in computational linguistics. For PCGSs we study two complexity measures called 'generation complexity' and 'distribution complexity', and we prove that a PCGS Π for which the generation complexity and the distribution complexity are both bounded by constants can be transformed into a freely rewriting restarting automaton of a very restricted form. From this characterization it follows that the language L(Π) generated by Π is semi-linear, that its characteristic analysis is of polynomial size, and that this analysis can be computed in polynomial time.
Abstract:
Software Defined Radio (SDR) hardware platforms use parallel architectures. Current approaches to developing applications (such as WLAN) for these platforms are complex, because developers must describe an application together with the hardware specifics relevant to parallelism, such as mapping and scheduling. To reduce this complexity, we have developed a new programming approach for SDR applications, called the Virtual Radio Engine (VRE). VRE defines a language for describing applications and a tool chain, consisting of a compiler kernel and other tools (such as a code generator), for generating executables. This thesis presents the concept and describes the language and the compiler kernel developed by the author. The language is hardware-independent, i.e., developers describe tasks and the dependencies between them. The compiler kernel performs automatic parallelization, i.e., it transforms a hardware-independent program into a hardware-specific program by resolving hardware specifics, in particular mapping, scheduling, and synchronization. Thus, VRE simplifies programming, as developers do not have to resolve hardware specifics manually.
Abstract:
Cooperative behaviour of agents within highly dynamic and nondeterministic domains is an active field of research. In particular, establishing highly responsive teamwork, where agents are able to react to dynamic changes in the environment while facing unreliable communication and sensory noise, is an open problem. Moreover, modelling such responsive, cooperative behaviour is difficult. In this work, we specify a novel model for cooperative behaviour geared towards highly dynamic domains. In our approach, agents estimate each other’s decisions and correct these estimates once they receive contradictory information. We aim at a comprehensive approach for agent teamwork featuring intuitive modelling capabilities for multi-agent activities, abstractions over activities and agents, and clear operational semantics for the new model. This work encompasses a complete specification of the new language, ALICA.
Abstract:
This paper introduces a framework that helps users implement enterprise modelling within collaborative companies. These enterprise models are the basis for a holistic interoperability measurement and management methodology, which is presented in the second part of the paper. The discipline of enterprise modelling aims at capturing all dimensions of an enterprise in a simplified model. Enterprise models are therefore an appropriate basis for managing collaborative enterprises, as they reduce the complexity of interoperability problems. A first objective of this paper is thus to present an approach that enables companies to get the most out of enterprise modelling in a collaborative environment, based on the maturity of their organisation with respect to modelling. In this first step, the user receives recommendations, e.g. for the correct modelling language and the right level of detail.
Abstract:
Recent research on the evolution of language and verbal displays (e.g., Miller, 1999, 2000a, 2000b, 2002) indicates that language is not only the result of natural selection but also serves as a sexually selected fitness indicator, an adaptation that shows an individual’s suitability as a reproductive mate. Thus, language could be placed within the framework of concepts such as the handicap principle (Zahavi, 1975). There are several reasons for this position: many linguistic traits are highly heritable (Stromswold, 2001, 2005), while naturally selected traits are only marginally heritable (Miller, 2000a); men are more prone to verbal displays than women, who in turn judge the displays (Dunbar, 1996; Locke & Bogin, 2006; Lange, in press; Miller, 2000a; Rosenberg & Tunney, 2008); verbal proficiency universally raises status, especially male status (Brown, 1991); many linguistic features are handicaps in the Zahavian sense (Miller, 2000a); and most literature is produced by men at reproduction-relevant age (Miller, 1999). However, neither an experimental study investigating the causal relation between verbal proficiency and attractiveness nor a study showing a correlation between markers of literary and mating success existed. The current studies aimed to fill these gaps. In the first, I conducted a laboratory experiment. Videos in which an actor and an actress performed verbal self-presentations were the stimuli for opposite-sex participants. The content was always the same, but the videos differed across three levels of verbal proficiency. Predictions were, among others, that (1) verbal proficiency increases mate value, but that (2) this applies more to male than to female mate value because of assumed past sex-different selection pressures that caused women to be very demanding in mate choice (Trivers, 1972).
A two-factorial analysis of variance with sex and verbal proficiency as factors supported the first hypothesis with a high effect size. For the second hypothesis, there was only a trend in the predicted direction. Furthermore, it became evident that verbal proficiency affects long-term mate value more than short-term mate value. In the second study, verbal proficiency was investigated as a menstrual cycle-dependent mate choice criterion. Essentially the same materials as in the first study were used, with only marginal changes to the questionnaire. The hypothesis was that fertile women rate high verbal proficiency in men higher than non-fertile women do, because verbal proficiency is a potential indicator of “good genes”. However, no significant result in support of this hypothesis was obtained. In the third study, the hypotheses were: (1) Most literature is produced by men at reproduction-relevant age. (2) The more works of high literary quality a male writer produces, the more mates and children he has. (3) Lyricists have higher mating success than non-lyric writers because poetic language is a larger handicap than other forms of language. (4) Writing literature increases a man’s status insofar as his offspring show a significantly higher male-to-female sex ratio than the general population, as the Trivers-Willard hypothesis (Trivers & Willard, 1973) applied to literature predicts. To test these hypotheses, two famous literary canons were chosen. Extensive biographical research was conducted on the writers’ mating successes. The first hypothesis was confirmed; the second, controlling for life age, was confirmed only for number of mates but not entirely for number of children. The latter finding was discussed with respect to, among other things, the availability of effective contraception, especially in the 20th century. The third hypothesis was not satisfactorily supported.
The fourth hypothesis was partially supported. For the 20th century part of the German list, the secondary sex ratio differed with high statistical significance from the ratio assumed to be valid for a general population.