1000 resultados para Sayula language
Resumo:
This work is aimed at building an adaptable frame-based system for processing Dravidian languages. There are about 17 languages in this family and they are spoken by the people of South India.Karaka relations are one of the most important features of Indian languages. They are the semabtuco-syntactic relations between verbs and other related constituents in a sentence. The karaka relations and surface case endings are analyzed for meaning extraction. This approach is comparable with the borad class of case based grammars.The efficiency of this approach is put into test in two applications. One is machine translation and the other is a natural language interface (NLI) for information retrieval from databases. The system mainly consists of a morphological analyzer, local word grouper, a parser for the source language and a sentence generator for the target language. This work make contributios like, it gives an elegant account of the relation between vibhakthi and karaka roles in Dravidian languages. This mapping is elegant and compact. The same basic thing also explains simple and complex sentence in these languages. This suggests that the solution is not just ad hoc but has a deeper underlying unity. This methodology could be extended to other free word order languages. Since the frame designed for meaning representation is general, they are adaptable to other languages coming in this group and to other applications.
Resumo:
This is a Named Entity Based Question Answering System for Malayalam Language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information systems typically return a long list of documents in response to a user’s query which are to be skimmed by the user to determine whether they contain an answer. But a Question Answering System allows the user to state his/her information need as a natural language question and receives most appropriate answer in a word or a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents which will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language and relatively of free word order. Also Malayalam has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter are developed as a part of this research work. No such tools were available for Malayalam language. Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single sentence questions are used to test the system. Overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased and also it can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as the future enhancements
Resumo:
Malayalam is one of the 22 scheduled languages in India with more than 130 million speakers. This paper presents a report on the development of a speaker independent, continuous transcription system for Malayalam. The system employs Hidden Markov Model (HMM) for acoustic modeling and Mel Frequency Cepstral Coefficient (MFCC) for feature extraction. It is trained with 21 male and female speakers in the age group ranging from 20 to 40 years. The system obtained a word recognition accuracy of 87.4% and a sentence recognition accuracy of 84%, when tested with a set of continuous speech data.
Resumo:
A connected digit speech recognition is important in many applications such as automated banking system, catalogue-dialing, automatic data entry, automated banking system, etc. This paper presents an optimum speaker-independent connected digit recognizer forMalayalam language. The system employs Perceptual Linear Predictive (PLP) cepstral coefficient for speech parameterization and continuous density Hidden Markov Model (HMM) in the recognition process. Viterbi algorithm is used for decoding. The training data base has the utterance of 21 speakers from the age group of 20 to 40 years and the sound is recorded in the normal office environment where each speaker is asked to read 20 set of continuous digits. The system obtained an accuracy of 99.5 % with the unseen data.
Resumo:
The span of writer identification extends to broad domes like digital rights administration, forensic expert decisionmaking systems, and document analysis systems and so on. As the success rate of a writer identification scheme is highly dependent on the features extracted from the documents, the phase of feature extraction and therefore selection is highly significant for writer identification schemes. In this paper, the writer identification in Malayalam language is sought for by utilizing feature extraction technique such as Scale Invariant Features Transform (SIFT).The schemes are tested on a test bed of 280 writers and performance evaluated
Resumo:
Cooperative behaviour of agents within highly dynamic and nondeterministic domains is an active field of research. In particular establishing highly responsive teamwork, where agents are able to react on dynamic changes in the environment while facing unreliable communication and sensory noise, is an open problem. Moreover, modelling such responsive, cooperative behaviour is difficult. In this work, we specify a novel model for cooperative behaviour geared towards highly dynamic domains. In our approach, agents estimate each other’s decision and correct these estimations once they receive contradictory information. We aim at a comprehensive approach for agent teamwork featuring intuitive modelling capabilities for multi-agent activities, abstractions over activities and agents, and a clear operational semantic for the new model. This work encompasses a complete specification of the new language, ALICA.
Resumo:
Summary: Recent research on the evolution of language and verbal displays (e.g., Miller, 1999, 2000a, 2000b, 2002) indicated that language is not only the result of natural selection but serves as a sexually-selected fitness indicator that is an adaptation showing an individual’s suitability as a reproductive mate. Thus, language could be placed within the framework of concepts such as the handicap principle (Zahavi, 1975). There are several reasons for this position: Many linguistic traits are highly heritable (Stromswold, 2001, 2005), while naturally-selected traits are only marginally heritable (Miller, 2000a); men are more prone to verbal displays than women, who in turn judge the displays (Dunbar, 1996; Locke & Bogin, 2006; Lange, in press; Miller, 2000a; Rosenberg & Tunney, 2008); verbal proficiency universally raises especially male status (Brown, 1991); many linguistic features are handicaps (Miller, 2000a) in the Zahavian sense; most literature is produced by men at reproduction-relevant age (Miller, 1999). However, neither an experimental study investigating the causal relation between verbal proficiency and attractiveness, nor a study showing a correlation between markers of literary and mating success existed. In the current studies, it was aimed to fill these gaps. In the first one, I conducted a laboratory experiment. Videos in which an actor and an actress performed verbal self-presentations were the stimuli for counter-sex participants. Content was always alike, but the videos differed on three levels of verbal proficiency. Predictions were, among others, that (1) verbal proficiency increases mate value, but that (2) this applies more to male than to female mate value due to assumed past sex-different selection pressures causing women to be very demanding in mate choice (Trivers, 1972). After running a two-factorial analysis of variance with the variables sex and verbal proficiency as factors, the first hypothesis was supported with high effect size. For the second hypothesis, there was only a trend going in the predicted direction. Furthermore, it became evident that verbal proficiency affects long-term more than short-term mate value. In the second study, verbal proficiency as a menstrual cycle-dependent mate choice criterion was investigated. Basically the same materials as in the former study were used with only marginal changes in the used questionnaire. The hypothesis was that fertile women rate high verbal proficiency in men higher than non-fertile women because of verbal proficiency being a potential indicator of “good genes”. However, no significant result could be obtained in support of the hypothesis in the current study. In the third study, the hypotheses were: (1) most literature is produced by men at reproduction-relevant age. (2) The more works of high literary quality a male writer produces, the more mates and children he has. (3) Lyricists have higher mating success than non-lyric writers because of poetic language being a larger handicap than other forms of language. (4) Writing literature increases a man’s status insofar that his offspring shows a significantly higher male-to-female sex ratio than in the general population, as the Trivers-Willard hypothesis (Trivers & Willard, 1973) applied to literature predicts. In order to test these hypotheses, two famous literary canons were chosen. Extensive biographical research was conducted on the writers’ mating successes. The first hypothesis was confirmed; the second one, controlling for life age, only for number of mates but not entirely regarding number of children. The latter finding was discussed with respect to, among others, the availability of effective contraception especially in the 20th century. The third hypothesis was not satisfactorily supported. The fourth hypothesis was partially supported. For the 20th century part of the German list, the secondary sex ratio differed with high statistical significance from the ratio assumed to be valid for a general population.
Resumo:
In der psycholinguistischen Forschung ist die Annahme weitverbreitet, dass die Bewertung von Informationen hinsichtlich ihres Wahrheitsgehaltes oder ihrer Plausibilität (epistemische Validierung; Richter, Schroeder & Wöhrmann, 2009) ein strategischer, optionaler und dem Verstehen nachgeschalteter Prozess ist (z.B. Gilbert, 1991; Gilbert, Krull & Malone, 1990; Gilbert, Tafarodi & Malone, 1993; Herbert & Kübler, 2011). Eine zunehmende Anzahl an Studien stellt dieses Zwei-Stufen-Modell von Verstehen und Validieren jedoch direkt oder indirekt in Frage. Insbesondere Befunde zu Stroop-artigen Stimulus-Antwort-Kompatibilitätseffekten, die auftreten, wenn positive und negative Antworten orthogonal zum aufgaben-irrelevanten Wahrheitsgehalt von Sätzen abgegeben werden müssen (z.B. eine positive Antwort nach dem Lesen eines falschen Satzes oder eine negative Antwort nach dem Lesen eines wahren Satzes; epistemischer Stroop-Effekt, Richter et al., 2009), sprechen dafür, dass Leser/innen schon beim Verstehen eine nicht-strategische Überprüfung der Validität von Informationen vornehmen. Ausgehend von diesen Befunden war das Ziel dieser Dissertation eine weiterführende Überprüfung der Annahme, dass Verstehen einen nicht-strategischen, routinisierten, wissensbasierten Validierungsprozesses (epistemisches Monitoring; Richter et al., 2009) beinhaltet. Zu diesem Zweck wurden drei empirische Studien mit unterschiedlichen Schwerpunkten durchgeführt. Studie 1 diente der Untersuchung der Fragestellung, ob sich Belege für epistemisches Monitoring auch bei Informationen finden lassen, die nicht eindeutig wahr oder falsch, sondern lediglich mehr oder weniger plausibel sind. Mithilfe des epistemischen Stroop-Paradigmas von Richter et al. (2009) konnte ein Kompatibilitätseffekt von aufgaben-irrelevanter Plausibilität auf die Latenzen positiver und negativer Antworten in zwei unterschiedlichen experimentellen Aufgaben nachgewiesen werden, welcher dafür spricht, dass epistemisches Monitoring auch graduelle Unterschiede in der Übereinstimmung von Informationen mit dem Weltwissen berücksichtigt. Darüber hinaus belegen die Ergebnisse, dass der epistemische Stroop-Effekt tatsächlich auf Plausibilität und nicht etwa auf der unterschiedlichen Vorhersagbarkeit von plausiblen und unplausiblen Informationen beruht. Das Ziel von Studie 2 war die Prüfung der Hypothese, dass epistemisches Monitoring keinen evaluativen Mindset erfordert. Im Gegensatz zu den Befunden anderer Autoren (Wiswede, Koranyi, Müller, Langner, & Rothermund, 2013) zeigte sich in dieser Studie ein Kompatibilitätseffekt des aufgaben-irrelevanten Wahrheitsgehaltes auf die Antwortlatenzen in einer vollständig nicht-evaluativen Aufgabe. Die Ergebnisse legen nahe, dass epistemisches Monitoring nicht von einem evaluativen Mindset, möglicherweise aber von der Tiefe der Verarbeitung abhängig ist. Studie 3 beleuchtete das Verhältnis von Verstehen und Validieren anhand einer Untersuchung der Online-Effekte von Plausibilität und Vorhersagbarkeit auf Augenbewegungen beim Lesen kurzer Texte. Zusätzlich wurde die potentielle Modulierung dieser Effeke durch epistemische Marker, die die Sicherheit von Informationen anzeigen (z.B. sicherlich oder vielleicht), untersucht. Entsprechend der Annahme eines schnellen und nicht-strategischen epistemischen Monitoring-Prozesses zeigten sich interaktive Effekte von Plausibilität und dem Vorhandensein epistemischer Marker auf Indikatoren früher Verstehensprozesse. Dies spricht dafür, dass die kommunizierte Sicherheit von Informationen durch den Monitoring-Prozess berücksichtigt wird. Insgesamt sprechen die Befunde gegen eine Konzeptualisierung von Verstehen und Validieren als nicht-überlappenden Stufen der Informationsverarbeitung. Vielmehr scheint eine Bewertung des Wahrheitsgehalts oder der Plausibilität basierend auf dem Weltwissen – zumindest in gewissem Ausmaß – eine obligatorische und nicht-strategische Komponente des Sprachverstehens zu sein. Die Bedeutung der Befunde für aktuelle Modelle des Sprachverstehens und Empfehlungen für die weiterführende Forschung zum Vehältnis von Verstehen und Validieren werden aufgezeigt.
Resumo:
Computational models are arising is which programs are constructed by specifying large networks of very simple computational devices. Although such models can potentially make use of a massive amount of concurrency, their usefulness as a programming model for the design of complex systems will ultimately be decided by the ease in which such networks can be programmed (constructed). This thesis outlines a language for specifying computational networks. The language (AFL-1) consists of a set of primitives, ad a mechanism to group these elements into higher level structures. An implementation of this language runs on the Thinking Machines Corporation, Connection machine. Two significant examples were programmed in the language, an expert system (CIS), and a planning system (AFPLAN). These systems are explained and analyzed in terms of how they compare with similar systems written in conventional languages.
Resumo:
Free-word order languages have long posed significant problems for standard parsing algorithms. This thesis presents an implemented parser, based on Government-Binding (GB) theory, for a particular free-word order language, Warlpiri, an aboriginal language of central Australia. The words in a sentence of a free-word order language may swap about relatively freely with little effect on meaning: the permutations of a sentence mean essentially the same thing. It is assumed that this similarity in meaning is directly reflected in the syntax. The parser presented here properly processes free word order because it assigns the same syntactic structure to the permutations of a single sentence. The parser also handles fixed word order, as well as other phenomena. On the view presented here, there is no such thing as a "configurational" or "non-configurational" language. Rather, there is a spectrum of languages that are more or less ordered. The operation of this parsing system is quite different in character from that of more traditional rule-based parsing systems, e.g., context-free parsers. In this system, parsing is carried out via the construction of two different structures, one encoding precedence information and one encoding hierarchical information. This bipartite representation is the key to handling both free- and fixed-order phenomena. This thesis first presents an overview of the portion of Warlpiri that can be parsed. Following this is a description of the linguistic theory on which the parser is based. The chapter after that describes the representations and algorithms of the parser. In conclusion, the parser is compared to related work. The appendix contains a substantial list of test cases ??th grammatical and ungrammatical ??at the parser has actually processed.
Resumo:
Fine-grained parallel machines have the potential for very high speed computation. To program massively-concurrent MIMD machines, programmers need tools for managing complexity. These tools should not restrict program concurrency. Concurrent Aggregates (CA) provides multiple-access data abstraction tools, Aggregates, which can be used to implement abstractions with virtually unlimited potential for concurrency. Such tools allow programmers to modularize programs without reducing concurrency. I describe the design, motivation, implementation and evaluation of Concurrent Aggregates. CA has been used to construct a number of application programs. Multi-access data abstractions are found to be useful in constructing highly concurrent programs.
Resumo:
The central thesis of this report is that human language is NP-complete. That is, the process of comprehending and producing utterances is bounded above by the class NP, and below by NP-hardness. This constructive complexity thesis has two empirical consequences. The first is to predict that a linguistic theory outside NP is unnaturally powerful. The second is to predict that a linguistic theory easier than NP-hard is descriptively inadequate. To prove the lower bound, I show that the following three subproblems of language comprehension are all NP-hard: decide whether a given sound is possible sound of a given language; disambiguate a sequence of words; and compute the antecedents of pronouns. The proofs are based directly on the empirical facts of the language user's knowledge, under an appropriate idealization. Therefore, they are invariant across linguistic theories. (For this reason, no knowledge of linguistic theory is needed to understand the proofs, only knowledge of English.) To illustrate the usefulness of the upper bound, I show that two widely-accepted analyses of the language user's knowledge (of syntactic ellipsis and phonological dependencies) lead to complexity outside of NP (PSPACE-hard and Undecidable, respectively). Next, guided by the complexity proofs, I construct alternate linguisitic analyses that are strictly superior on descriptive grounds, as well as being less complex computationally (in NP). The report also presents a new framework for linguistic theorizing, that resolves important puzzles in generative linguistics, and guides the mathematical investigation of human language.