877 resultados para natural language
Resumo:
Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.
Resumo:
In many real world prediction problems the output is a structured object like a sequence or a tree or a graph. Such problems range from natural language processing to compu- tational biology or computer vision and have been tackled using algorithms, referred to as structured output learning algorithms. We consider the problem of structured classifi- cation. In the last few years, large margin classifiers like sup-port vector machines (SVMs) have shown much promise for structured output learning. The related optimization prob -lem is a convex quadratic program (QP) with a large num-ber of constraints, which makes the problem intractable for large data sets. This paper proposes a fast sequential dual method (SDM) for structural SVMs. The method makes re-peated passes over the training set and optimizes the dual variables associated with one example at a time. The use of additional heuristics makes the proposed method more efficient. We present an extensive empirical evaluation of the proposed method on several sequence learning problems.Our experiments on large data sets demonstrate that the proposed method is an order of magnitude faster than state of the art methods like cutting-plane method and stochastic gradient descent method (SGD). Further, SDM reaches steady state generalization performance faster than the SGD method. The proposed SDM is thus a useful alternative for large scale structured output learning.
Resumo:
In structured output learning, obtaining labeled data for real-world applications is usually costly, while unlabeled examples are available in abundance. Semisupervised structured classification deals with a small number of labeled examples and a large number of unlabeled structured data. In this work, we consider semisupervised structural support vector machines with domain constraints. The optimization problem, which in general is not convex, contains the loss terms associated with the labeled and unlabeled examples, along with the domain constraints. We propose a simple optimization approach that alternates between solving a supervised learning problem and a constraint matching problem. Solving the constraint matching problem is difficult for structured prediction, and we propose an efficient and effective label switching method to solve it. The alternating optimization is carried out within a deterministic annealing framework, which helps in effective constraint matching and avoiding poor local minima, which are not very useful. The algorithm is simple and easy to implement. Further, it is suitable for any structured output learning problem where exact inference is available. Experiments on benchmark sequence labeling data sets and a natural language parsing data set show that the proposed approach, though simple, achieves comparable generalization performance.
Resumo:
[EN]This paper deals with the so-called Person Case Constraint (Bonet, 1991), a universal constraint blocking accusative clitics and object agreement morphemes other than third person when a dative is inserted in the same clitic/agreement cluster. The aim of this paper is twofold. First, we argue that the scope of the PCC is considerably broader than assumed in previous work, and that neither its formulation in terms of person (1st/2nd vs. 3rd)-case (accusative vs. dative) restrictions nor its morphological nature are part of the right descriptive generalization.We present evidence (i) that the PCC is triggered by the presence of an animacy feature in the object’s agreement set; (ii) that it is not case dependent, also showing up in languages that lack dative case; and (iii) that it is not morphologically bound. Second, we argue that the PCC, even if it is modified accordingly, still puts together two different properties of the agreement system that should be set apart: (i) a cross linguistic sensitivity of object agreement to animacy and (ii) a similarly widespread restriction on multiple object agreement observed crosslinguistically. These properties lead us to propose a new generalization, the Object Agreement Constraint (OAC): if the verbal complex encodes object agreement, no other argument can be licensed through verbal agreement.
Resumo:
Máster y Doctorado en Sistemas Informáticos Avanzados, Informatika Fakultatea - Facultad de Informática
Resumo:
[EN]Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs or even documents) is a very important research area in Natural Language Processing (NLP). In fact, it has many practical applications in other NLP tasks. For instance, Word Sense Disambiguation, Textual Entailment, Paraphrase detection, Machine Translation, Summarization and other related tasks such as Information Retrieval or Question Answering. In this masther thesis we study di erent approaches to compute the semantic similarity between textual items. In the framework of the european PATHS project1, we also evaluate a knowledge-base method on a dataset of cultural item descriptions. Additionaly, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work has involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.
Resumo:
[ES] En este trabajo se define el cambio sintáctico, se analizan los factores que lo causan o facilitan y se estudian sus tipos principales en griego antiguo.
Resumo:
The main contribution of this work is to analyze and describe the state of the art performance as regards answer scoring systems from the SemEval- 2013 task, as well as to continue with the development of an answer scoring system (EHU-ALM) developed in the University of the Basque Country. On the overall this master thesis focuses on finding any possible configuration that lets improve the results in the SemEval dataset by using attribute engineering techniques in order to find optimal feature subsets, along with trying different hierarchical configurations in order to analyze its performance against the traditional one versus all approach. Altogether, throughout the work we propose two alternative strategies: on the one hand, to improve the EHU-ALM system without changing the architecture, and, on the other hand, to improve the system adapting it to an hierarchical con- figuration. To build such new models we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.