4 resultados para Linguistic Knowledge Base
em DRUM (Digital Repository at the University of Maryland)
Resumo:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Resumo:
Humans use their grammatical knowledge in more than one way. On one hand, they use it to understand what others say. On the other hand, they use it to say what they want to convey to others (or to themselves). In either case, they need to assemble the structure of sentences in a systematic fashion, in accordance with the grammar of their language. Despite the fact that the structures that comprehenders and speakers assemble are systematic in an identical fashion (i.e., obey the same grammatical constraints), the two ‘modes’ of assembling sentence structures might or might not be performed by the same cognitive mechanisms. Currently, the field of psycholinguistics implicitly adopts the position that they are supported by different cognitive mechanisms, as evident from the fact that most psycholinguistic models seek to explain either comprehension or production phenomena. The potential existence of two independent cognitive systems underlying linguistic performance doubles the problem of linking the theory of linguistic knowledge and the theory of linguistic performance, making the integration of linguistics and psycholinguistic harder. This thesis thus aims to unify the structure building system in comprehension, i.e., parser, and the structure building system in production, i.e., generator, into one, so that the linking theory between knowledge and performance can also be unified into one. I will discuss and unify both existing and new data pertaining to how structures are assembled in understanding and speaking, and attempt to show that the unification between parsing and generation is at least a plausible research enterprise. In Chapter 1, I will discuss the previous and current views on how parsing and generation are related to each other. I will outline the challenges for the current view that the parser and the generator are the same cognitive mechanism. This single system view is discussed and evaluated in the rest of the chapters. In Chapter 2, I will present new experimental evidence suggesting that the grain size of the pre-compiled structural units (henceforth simply structural units) is rather small, contrary to some models of sentence production. In particular, I will show that the internal structure of the verb phrase in a ditransitive sentence (e.g., The chef is donating the book to the monk) is not specified at the onset of speech, but is specified before the first internal argument (the book) needs to be uttered. I will also show that this timing of structural processes with respect to the verb phrase structure is earlier than the lexical processes of verb internal arguments. These two results in concert show that the size of structure building units in sentence production is rather small, contrary to some models of sentence production, yet structural processes still precede lexical processes. I argue that this view of generation resembles the widely accepted model of parsing that utilizes both top-down and bottom-up structure building procedures. In Chapter 3, I will present new experimental evidence suggesting that the structural representation strongly constrains the subsequent lexical processes. In particular, I will show that conceptually similar lexical items interfere with each other only when they share the same syntactic category in sentence production. The mechanism that I call syntactic gating, will be proposed, and this mechanism characterizes how the structural and lexical processes interact in generation. I will present two Event Related Potential (ERP) experiments that show that the lexical retrieval in (predictive) comprehension is also constrained by syntactic categories. I will argue that the syntactic gating mechanism is operative both in parsing and generation, and that the interaction between structural and lexical processes in both parsing and generation can be characterized in the same fashion. In Chapter 4, I will present a series of experiments examining the timing at which verbs’ lexical representations are planned in sentence production. It will be shown that verbs are planned before the articulation of their internal arguments, regardless of the target language (Japanese or English) and regardless of the sentence type (active object-initial sentence in Japanese, passive sentences in English, and unaccusative sentences in English). I will discuss how this result sheds light on the notion of incrementality in generation. In Chapter 5, I will synthesize the experimental findings presented in this thesis and in previous research to address the challenges to the single system view I outlined in Chapter 1. I will then conclude by presenting a preliminary single system model that can potentially capture both the key sentence comprehension and sentence production data without assuming distinct mechanisms for each.
Resumo:
Spelling is an important literacy skill, and learning to spell is an important component of learning to write. Learners with strong spelling skills also exhibit greater reading, vocabulary, and orthographic knowledge than those with poor spelling skills (Ehri & Rosenthal, 2007; Ehri & Wilce, 1987; Rankin, Bruning, Timme, & Katkanant, 1993). English, being a deep orthography, has inconsistent sound-to-letter correspondences (Seymour, 2005; Ziegler & Goswami, 2005). This poses a great challenge for learners in gaining spelling fluency and accuracy. The purpose of the present study is to examine cross-linguistic transfer of English vowel spellings in Spanish-speaking adult ESL learners. The research participants were 129 Spanish-speaking adult ESL learners and 104 native English-speaking GED students enrolled in a community college located in the South Atlantic region of the United States. The adult ESL participants were in classes at three different levels of English proficiency: advanced, intermediate, and beginning. An experimental English spelling test was administered to both the native English-speaking and ESL participants. In addition, the adult ESL participants took the standardized spelling tests to rank their spelling skills in both English and Spanish. The data were analyzed using robust regression and Poisson regression procedures, Mann-Whitney test, and descriptive statistics. The study found that both Spanish spelling skills and English proficiency are strong predictors of English spelling skills. Spanish spelling is also a strong predictor of level of L1-influenced transfer. More proficient Spanish spellers made significantly fewer L1-influenced spelling errors than less proficient Spanish spellers. L1-influenced transfer of spelling knowledge from Spanish to English likely occurred in three vowel targets (/ɑɪ/ spelled as ae, ai, or ay, /ɑʊ/ spelled as au, and /eɪ/ spelled as e). The ESL participants and the native English-speaking participants produced highly similar error patterns of English vowel spellings when the errors did not indicate L1-influenced transfer, which implies that the two groups might follow similar trajectories of developing English spelling skills. The findings may help guide future researchers or practitioners to modify and develop instructional spelling intervention to meet the needs of adult ESL learners and help them gain English spelling competence.
Resumo:
The main purpose of the current study was to examine the role of vocabulary knowledge (VK) and syntactic knowledge (SK) in L2 listening comprehension, as well as their relative significance. Unlike previous studies, the current project employed assessment tasks to measure aural and proceduralized VK and SK. In terms of VK, to avoid under-representing the construct, measures of both breadth (VB) and depth (VD) were included. Additionally, the current study examined the role of VK and SK by accounting for individual differences in two important cognitive factors in L2 listening: metacognitive knowledge (MK) and working memory (WM). Also, to explore the role of VK and SK more fully, the current study accounted for the negative impact of anxiety on WM and L2 listening. The study was carried out in an English as a Foreign Language (EFL) context, and participants were 263 Iranian learners at a wide range of English proficiency from lower-intermediate to advanced. Participants took a battery of ten linguistic, cognitive and affective measures. Then, the collected data were subjected to several preliminary analyses, but structural equation modeling (SEM) was then used as the primary analysis method to answer the study research questions. Results of the preliminary analyses revealed that MK and WM were significant predictors of L2 listening ability; thus, they were kept in the main SEM analyses. The significant role of WM was only observed when the negative effect of anxiety on WM was accounted for. Preliminary analyses also showed that VB and VD were not distinct measures of VK. However, the results also showed that if VB and VD were considered separate, VD was a better predictor of L2 listening success. The main analyses of the current study revealed a significant role for both VK and SK in explaining success in L2 listening comprehension, which differs from findings from previous empirical studies. However, SEM analysis did not reveal a statistically significant difference in terms of the predictive power of the two linguistic factors. Descriptive results of the SEM analysis, along with results from regression analysis, indicated to a more significant role for VK.