958 results for "lexical"
Abstract:
This thesis, born of our interest in determining children's levels of linguistic knowledge at the beginning and end of the first year of schooling, aimed to establish how that knowledge influences the learning of reading and writing. To that end, we administered parts of two tests designed to assess the linguistic competences of children whose mother tongue is Portuguese, the Teste de Identificação de Competências Linguísticas (TICL), by Fernanda Leopoldina Viana (2004), and the Teste de Avaliação da Linguagem Oral (TALO), by Inês Sim-Sim (2003), to a class beginning and then completing the first year of schooling. Through the data analysis we aim to show that pupils who enter the first cycle of basic education with a more varied lexical repertoire achieve better results in learning to read and write.
Abstract:
Peer-to-peer systems are widely used on the internet. However, most peer-to-peer information systems still lack some important features, such as cross-language IR (Information Retrieval) and collection selection/fusion. Cross-language IR is a state-of-the-art research area in the IR community and has not yet been deployed in any real-world IR system. It allows a user to issue a query in one language and receive documents in other languages. In a typical peer-to-peer environment, users come from multiple countries and their collections are in multiple languages, so cross-language IR can help users find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English; with cross-language IR, they can issue a single query in Chinese and retrieve documents in both languages. The Out-Of-Vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be an effective approach to this problem. However, how to extract Multiword Lexical Units (MLUs) from web content, and how to select the correct translations from the extracted candidate MLUs, remain two difficult problems in web-mining-based automated translation. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval. In uncooperative environments, query-based sampling and normalized-score-based merging are well-known strategies for these problems. However, such approaches consider only the content of the remote database and not the retrieval performance of the remote search engine. This thesis presents research on building a peer-to-peer IR system with cross-language IR and advanced collection profiling techniques for result fusion.
In particular, this thesis first presents a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora, together with an approach for selecting MLUs more accurately. It then proposes a collection profiling strategy that can discover not only the content of a collection but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented. Our experiments show that the proposed strategies are effective for merging results in uncooperative peer-to-peer environments. Here, an uncooperative environment is defined as one in which each peer is autonomous: peers are willing to share documents but do not share collection statistics, which is typical of peer-to-peer IR. Finally, all of these approaches are combined to build a secure peer-to-peer multilingual IR system whose peers cooperate through X.509 certificates and email.
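The normalized-score merging that the abstract contrasts its fusion approaches with can be illustrated with a minimal sketch (not the thesis's actual algorithm): each peer's raw scores, which come from incomparable ranking functions, are min-max normalized before being pooled into a single ranked list.

```python
def normalize(results):
    """Min-max normalize one peer's raw scores into [0, 1]."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # guard against a constant score list
    return [(doc, (s - lo) / span) for doc, s in results]

def merge(result_lists, k=3):
    """Pool per-peer result lists and rank by normalized score."""
    pooled = [pair for results in result_lists for pair in normalize(results)]
    return sorted(pooled, key=lambda p: p[1], reverse=True)[:k]
```

This baseline uses only the returned scores; the thesis's point is that it ignores how well each remote engine actually retrieves, which is what collection profiling adds.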
Abstract:
Introduction Many bilinguals will have had the experience of unintentionally reading something in a language other than the intended one (e.g. MUG to mean mosquito in Dutch rather than a receptacle for a hot drink, as one of the possible intended English meanings), of finding themselves blocked on a word for which many alternatives suggest themselves (but, somewhat annoyingly, not in the right language), of their accent changing when stressed or tired and, occasionally, of starting to speak in a language that is not understood by those around them. These instances where lexical access appears compromised and control over language behavior is reduced hint at the intricate structure of the bilingual lexical architecture and the complexity of the processes by which knowledge is accessed and retrieved. While bilinguals might tend to blame word finding and other language problems on their bilinguality, these difficulties per se are not unique to the bilingual population. However, what is unique, and yet far more common than is appreciated by monolinguals, is the cognitive architecture that subserves bilingual language processing. With bilingualism (and multilingualism) the rule rather than the exception (Grosjean, 1982), this architecture may well be the default structure of the language processing system. As such, it is critical that we understand more fully not only how the processing of more than one language is subserved by the brain, but also how this understanding furthers our knowledge of the cognitive architecture that encapsulates the bilingual mental lexicon. The neurolinguistic approach to bilingualism focuses on determining the manner in which the two (or more) languages are stored in the brain and how they are differentially (or similarly) processed. 
The underlying assumption is that the acquisition of more than one language requires at the very least a change to or expansion of the existing lexicon, if not the formation of language-specific components, and this is likely to manifest in some way at the physiological level. There are many sources of information, ranging from data on bilingual aphasic patients (Paradis, 1977, 1985, 1997) to lateralization (Vaid, 1983; see Hull & Vaid, 2006, for a review), recordings of event-related potentials (ERPs) (e.g. Ardal et al., 1990; Phillips et al., 2006), and positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) studies of neurologically intact bilinguals (see Indefrey, 2006; Vaid & Hull, 2002, for reviews). Following the consideration of methodological issues and interpretative limitations that characterize these approaches, the chapter focuses on how the application of these approaches has furthered our understanding of (1) selectivity of bilingual lexical access, (2) distinctions between word types in the bilingual lexicon and (3) control processes that enable language selection.
Abstract:
Thai is one of the written languages that do not mark word boundaries. To discover the meaning of a document, the text must first be separated into syllables, words, sentences, and paragraphs. This paper develops a novel method to segment Thai text by combining a non-dictionary-based technique with a dictionary-based technique. The method first applies Thai grammar rules to the text to identify syllables. A hidden Markov model is then used to merge possible syllables into words. The identified words are verified against a lexical dictionary, and a decision tree is employed to discover the words the lexical dictionary does not identify. Documents used in the litigation process of Thai court proceedings were used in the experiments. The segmented words obtained by the proposed method outperform those obtained by other existing methods.
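The dictionary-verification step can be illustrated with a greedy longest-match sketch. The paper's grammar rules, HMM, and decision tree are beyond a toy example, and an English letter string stands in here for unsegmented Thai text, so this shows only the general shape of dictionary-based segmentation.

```python
def longest_match_segment(text, lexicon):
    """Greedy longest-match segmentation against a lexicon.
    Characters not covered by any lexicon entry are emitted as
    single-character tokens; in the paper's pipeline such unknowns
    would be handed to the decision-tree fallback instead."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character
            i += 1
    return tokens
```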
Abstract:
The educational landscape around middle schooling reform is a contemporary focus of the Australian school education agenda. The University of Queensland Middle Years of Schooling pre-service teacher education program develops specialist teachers for this crucial phase of schooling and has become a national leader in middle school teacher education. This paper reports on aspects of a longitudinal study that began with the first cohort of students in the program in 2003; to date, 234 students have participated. The findings demonstrate that students: can articulate what is meant by the term "middle years" and identify with the need for a philosophy of middle schooling; are aware that they are part of a reform movement that has swept the nation and has implications for teaching in schools in the twenty-first century; are confident the program is producing highly skilled professional teachers willing to take on the challenges of teaching in the middle years; can say how their training has helped them understand and account for the educational experiences of students in a time of transition; and hold quiet yet firm beliefs about teaching in the middle years. Furthermore, using a measure of lexical density to analyze the verbs used by respondents, it appears that this quiet confidence grew over the period from 2003 to 2006.
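Lexical density, the measure applied to respondents' verbs above, is conventionally the proportion of content (lexical) words among all tokens. A minimal sketch, using a small illustrative stoplist in place of the part-of-speech analysis a real study would use:

```python
# Tiny illustrative stoplist; a real analysis would use a POS tagger
# or a full function-word inventory.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
                  "and", "or", "but", "is", "are", "was", "were", "about"}

def lexical_density(tokens):
    """Proportion of content words: tokens not on the function-word list."""
    content = [t for t in tokens if t.lower() not in FUNCTION_WORDS]
    return len(content) / len(tokens)
```

A denser response ("hold firm beliefs") scores higher than a padded one ("they are sort of in favour of it"), which is what makes the measure a proxy for confident, committed language.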
Abstract:
To date, studies have focused on the acquisition of alphabetic second languages (L2s) in alphabetic first language (L1) users, demonstrating significant transfer effects. The present study examined the process from a reverse perspective, comparing logographic (Mandarin-Chinese) and alphabetic (English) L1 users in the acquisition of an artificial logographic script, in order to determine whether similar language-specific advantageous transfer effects occurred. English monolinguals, English-French bilinguals and Chinese-English bilinguals learned a small set of symbols in an artificial logographic script and were subsequently tested on their ability to process this script in regard to three main perspectives: L2 reading, L2 working memory (WM), and inner processing strategies. In terms of L2 reading, a lexical decision task on the artificial symbols revealed markedly faster response times in the Chinese-English bilinguals, indicating a logographic transfer effect suggestive of a visual processing advantage. A syntactic decision task evaluated the degree to which the new language was mastered beyond the single word level. No L1-specific transfer effects were found for artificial language strings. In order to investigate visual processing of the artificial logographs further, a series of WM experiments were conducted. Artificial logographs were recalled under concurrent auditory and visuo-spatial suppression conditions to disrupt phonological and visual processing, respectively. No L1-specific transfer effects were found, indicating no visual processing advantage of the Chinese-English bilinguals. However, a bilingual processing advantage was found indicative of a superior ability to control executive functions. In terms of L1 WM, the Chinese-English bilinguals outperformed the alphabetic L1 users when processing L1 words, indicating a language experience-specific advantage. 
Questionnaire data on the cognitive strategies deployed during the acquisition and processing of the artificial logographic script revealed that the Chinese-English bilinguals rated their use of inner speech lower than did the alphabetic L1 users, suggesting that they were transferring their phonological processing skill set to the acquisition and use of the artificial script. Overall, evidence indicated that language learners transfer specific L1 orthographic processing skills to L2 logographic processing. Additionally, evidence indicated that a bilingual history enhances cognitive performance in an L2.
Abstract:
In recent times, the improved accuracy of Automatic Speech Recognition (ASR) technology has made it viable for use in a number of commercial products. Unfortunately, such applications are limited to only a few of the world's languages, primarily because ASR development relies on the availability of large amounts of language-specific resources. This motivates the need for techniques that reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, providing scope for the rapid creation of ASR capabilities for resource-poor languages. Cross-lingual ASR emerges as a means of addressing this need. Underpinning this approach is the observation that sound production is largely governed by the physiological construction of the vocal tract, and is accordingly human-specific rather than language-specific. As a result, a common inventory of sounds exists across languages, a property that can be exploited: sounds from a resource-poor target language can be recognised using models trained on resource-rich source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross-lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross-lingual techniques in two speech environments: clean read speech and conversational telephone speech. The languages used in the evaluations are German, Mandarin, Japanese and Spanish. The results highlight that previously proposed approaches give respectable results in simpler environments such as read speech, but degrade significantly in the more taxing conversational environment.
Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target-language lexical representation in terms of the source-language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. While the primary research goal of this thesis is to improve cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. The fact that, in Indonesia alone, there are over 200 million speakers of some Malay variant provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated to obtaining an ASR system with a 10,000-word recognition vocabulary for the Indonesian language.
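The core cross-lingual idea, recognising target-language sounds with source-language models, can be sketched as a knowledge-based phone mapping. The mapping below is purely illustrative (invented for this sketch, not the thesis's actual inventory or method):

```python
# Hypothetical mapping from target-language phones (IPA) to the closest
# phones covered by the source-language acoustic models.
PHONE_MAP = {
    "ɲ": "n",  # palatal nasal approximated by the alveolar nasal
    "ʈ": "t",  # retroflex stop approximated by the alveolar stop
}

def map_pronunciation(phones, phone_map):
    """Rewrite a pronunciation so every phone has a source-language model.
    Phones absent from the map are assumed to be shared across languages."""
    return [phone_map.get(p, p) for p in phones]
```

Data-driven alternatives learn such mappings from acoustic similarity rather than from linguistic knowledge; either way, the mapped lexicon lets source-language models decode target-language speech.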
Abstract:
My research investigates why nouns are learned disproportionately more frequently than other kinds of words during early language acquisition (Gentner, 1982; Gleitman et al., 2004). This question must be considered in the context of cognitive development in general. Infants have two major streams of environmental information to make meaningful: perceptual and linguistic. Perceptual information flows in from the senses and is processed into symbolic representations by the primitive language of thought (Fodor, 1975). These symbolic representations are then linked to linguistic input to enable language comprehension and, ultimately, production. Yet how exactly does perceptual information become conceptualized? Although this question is difficult, there has been progress. One way that children might have an easier job is if they have structures that simplify the data. Thus, if particular sorts of perceptual information could be separated from the mass of input, it would be easier for children to refer to those specific things when learning words (Spelke, 1990; Pylyshyn, 2003). It would be easier still if linguistic input were segmented in predictable ways (Gentner, 1982; Gleitman et al., 2004). Unfortunately, the frequency of patterns in lexical or grammatical input cannot explain the cross-cultural and cross-linguistic tendency to favor nouns over verbs and predicates. There are three examples of this failure: (1) a wide variety of nouns are uttered less frequently than a smaller number of verbs and yet are learnt far more easily (Gentner, 1982); (2) word order and morphological transparency offer no insight when the sentence structures and word inflections of different languages are contrasted (Slobin, 1973); and (3) particular language-teaching behaviors (e.g. pointing at objects and repeating names for them) have little impact on children's tendency to prefer concrete nouns in their first fifty words (Newport et al., 1977).
Although the linguistic solution appears problematic, there has been increasing evidence that the early visual system does indeed segment perceptual information in specific ways before the conscious mind begins to intervene (Pylyshyn, 2003). I argue that nouns are easier to learn because their referents connect directly with innate features of the perceptual faculty. This hypothesis stems from work on visual indexes by Zenon Pylyshyn (2001, 2003). Pylyshyn argues that the early visual system (the architecture of the "vision module") segments perceptual data into pre-conceptual proto-objects called FINSTs. FINSTs typically correspond to physical things such as Spelke objects (Spelke, 1990). Hence, before conceptualization, visual objects are picked out by the perceptual system demonstratively, like a pointing finger indicating 'this' or 'that'. I suggest that this primitive system of demonstration elaborates on Gareth Evans's (1982) theory of nonconceptual content. Nouns are learnt first because their referents attract demonstrative visual indexes. This theory also explains why infants less often name stationary objects such as 'plate' or 'table', but do name things that attract the focal attention of the early visual system, i.e., small objects that move, such as 'dog' or 'ball'. This view leaves open the questions of how blind children learn words for visible objects and why children learn category nouns (e.g. 'dog') rather than proper nouns (e.g. 'Fido') or higher taxonomic distinctions (e.g. 'animal').
Abstract:
It is recognised that individuals do not always respond honestly when completing psychological tests. One of the foremost issues for research in this area is the inability to detect individuals attempting to fake. While a number of strategies have been identified in faking, a commonality of these strategies is the latent role of long-term memory. Seven studies were conducted to examine whether it is possible to detect the activation of faking-related cognitions using a lexical decision task. Study 1 found that engagement with experiential processing styles predicted the ability to fake successfully, confirming the role of associative processing styles in faking. After identifying appropriate stimuli for the lexical decision task (Studies 2A and 2B), Studies 3 to 5 examined whether a cognitive state of faking could be primed and subsequently identified using a lexical decision task. Throughout these studies, the experimental methodology was increasingly refined in an attempt to identify the relevant priming mechanisms. The results were consistent and robust across the three priming studies: faking good on a personality test primed positive faking-related words in the lexical decision tasks. Faking bad, however, did not result in reliable priming of negative faking-related cognitions. To address potential issues with the stimuli and the possible role of affective priming more completely, two additional studies were conducted. Studies 6A and 6B revealed that negative faking-related words were more arousing than positive faking-related words, and that positive faking-related words were more abstract than negative faking-related words and neutral words. Study 7 examined whether the priming effects evident in the lexical decision tasks occurred as a result of an unintentional mood induction while faking the psychological tests. Results were equivocal in this regard.
This program of research aligned the fields of psychological assessment and cognition to inform the preliminary development and validation of a new tool to detect faking. Consequently, an implicit technique to identify attempts to fake good on a psychological test has been identified, using long established and robust cognitive theories in a novel and innovative way. This approach represents a new paradigm for the detection of individuals responding strategically to psychological testing. With continuing development and validation, this technique may have immense utility in the field of psychological assessment.
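The basic quantity behind these lexical decision results, a priming effect, is conventionally the mean reaction-time difference between responses to unprimed and primed words. A minimal sketch with invented RTs (not data from the studies):

```python
from statistics import mean

def priming_effect(primed_rts, unprimed_rts):
    """Priming effect in milliseconds: positive values mean primed
    (faking-related) words were recognised faster than unprimed ones."""
    return mean(unprimed_rts) - mean(primed_rts)
```

In the paradigm described above, a reliably positive effect after a respondent "fakes good" is what signals that faking-related cognitions were activated.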
Abstract:
Gray's (2000) revised Reinforcement Sensitivity Theory (r-RST) was used to investigate personality effects on information-processing biases towards gain-framed and loss-framed anti-speeding messages, and on the persuasiveness of these messages. The r-RST postulates that behaviour is regulated by two major motivational systems: a reward system and a punishment system. It was hypothesised that both message processing and persuasiveness would depend upon an individual's sensitivity to reward or punishment. Student drivers (N = 133) were randomly assigned to view one of four anti-speeding messages or no message (control group). Individual processing differences were then measured using a lexical decision task, prior to participants completing a personality and persuasion questionnaire. Results indicated that participants who were more sensitive to reward showed a marginally significant (p = .050) tendency to report higher intentions to comply with the social gain-framed message, and to demonstrate a cognitive processing bias towards this message, compared with participants lower in reward sensitivity.
Abstract:
Two decades after its inception, Latent Semantic Analysis (LSA) has become part and parcel of every modern introduction to Information Retrieval. For any tool that matures so quickly, it is important to check its lore and limitations, or else stagnation will set in. We focus here on the three main aspects of LSA that are well accepted, the gist of which can be summarized as follows: (1) LSA recovers latent semantic factors underlying the document space; (2) this can be accomplished through lossy compression of the document space that eliminates lexical noise; and (3) the compression is best achieved by Singular Value Decomposition. For each aspect we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA are much more limited than commonly believed. Even a simple example shows that LSA does not recover the optimal semantic factors intended in the pedagogical example used in many LSA publications. Additionally, and remarkably deviating from LSA lore, LSA does not scale up well: the larger the document space, the less likely it is that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms that replace LSA (and more recent alternatives such as pLSA, LDA, and kernel methods) by trading its l2 space for an l1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.
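The lossy compression in claim (2), achieved via the Singular Value Decomposition of claim (3), can be sketched in a few lines. By the Eckart-Young theorem the truncation is optimal in the l2 (Frobenius) sense, which is precisely the property the abstract contrasts with its l1 alternative.

```python
import numpy as np

def lsa_approximation(term_doc, k):
    """Rank-k LSA: truncate the SVD of the term-document matrix,
    discarding the smallest singular values as 'lexical noise'."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```

Queries and documents are then compared in the k-dimensional latent space rather than in the raw term space.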
Abstract:
Using Gray and McNaughton's (2000) revised Reinforcement Sensitivity Theory (r-RST), we examined the influence of personality on the processing of words presented in gain-framed and loss-framed anti-speeding messages, and how the processing biases associated with personality influenced message acceptance. The r-RST predicts that the nervous system regulates personality and that behaviour depends upon the activation of the Behavioural Activation System (BAS), activated by reward cues, and the Fight-Flight-Freeze System (FFFS), activated by punishment cues. According to r-RST, individuals differ in the sensitivities of their BAS and FFFS (i.e., from weak to strong), which in turn leads to stable patterns of behaviour in the presence of rewards and punishments, respectively. It was hypothesised that individual differences in personality (i.e., the strength of the BAS and the FFFS) would influence the degree of both message processing (as measured by reaction time to previously viewed message words) and message acceptance (measured three ways: perceived message effectiveness, behavioural intentions, and attitudes). Specifically, it was anticipated that individuals with a stronger BAS would process the words presented in the gain-framed messages faster than those with a weaker BAS, and that individuals with a stronger FFFS would process the words presented in the loss-framed messages faster than those with a weaker FFFS. Further, it was expected that greater processing (faster reaction times) would be associated with greater acceptance of that message. Students holding driver licences (N = 108) were recruited to view one of four anti-speeding messages (i.e., social gain-frame, social loss-frame, physical gain-frame, and physical loss-frame). A computerised lexical decision task assessed participants' subsequent reaction times to message words, as an indicator of the extent of processing of the previously viewed message.
Self-report instruments assessed personality and the three message acceptance outcomes. As predicted, the degree of initial processing of the content of the social gain-framed message mediated the relationship between the reward sensitive trait and message effectiveness. Initial processing of the physical loss-framed message partially mediated the relationship between the punishment sensitive trait and both message effectiveness and behavioural intention ratings. These results show that reward sensitivity and punishment sensitivity traits influence cognitive processing of gain-framed and loss-framed message content, respectively, and subsequently, message effectiveness and behavioural intention ratings. Specifically, a range of road safety messages (i.e., gain-framed and loss-framed messages) could be designed to align with the processing biases associated with personality, targeting both those individuals who are sensitive to rewards and those who are sensitive to punishments.
Abstract:
The intersection of the Social Web and the Semantic Web has put folksonomy in the spotlight for its potential to overcome the knowledge acquisition bottleneck and to provide insight into the "wisdom of the crowd". Folksonomy, which emerges from collaborative tagging activities, provides insight into users' understanding of Web resources, which can be useful for searching and organizing purposes. However, a collaborative tagging vocabulary poses challenges, since tags are freely chosen by users and may exhibit synonymy and polysemy problems. To overcome these challenges and realise the potential of folksonomy as emergent semantics, we propose to consolidate the diverse vocabulary into unified entities and concepts. We propose to extract a tag ontology through an ontology learning process to represent the semantics of a tagging community. This paper presents a novel approach to learning the ontology based on the widely used lexical database WordNet. We present personalization strategies that disambiguate the semantics of tags by combining the judgments of WordNet lexicographers with users' tagging behavior. We provide empirical evaluations by using the semantic information contained in the ontology in a tag recommendation experiment. The results show that using the semantic relationships in the ontology improves the accuracy of the tag recommender.
Abstract:
Owing to the explosive growth of the Web, the domain of Web personalization has gained great momentum in both research and commercial areas. Among the most popular Web personalization systems are recommender systems, for which choosing the user information used to profile users is crucial. In Web 2.0, one facility that helps users organize Web resources of interest is the user tagging system. Exploring user tagging behavior provides a promising way to understand users' information needs, since tags are given directly by users. However, the free and relatively uncontrolled vocabulary leaves user-defined tags unstandardized and semantically ambiguous. The rich relationships among tags also need to be explored, since they provide valuable information for better understanding users. In this paper, we propose a novel approach to learning a tag ontology, based on the widely used lexical database WordNet, that captures the semantics and the structural relationships of tags. We present personalization strategies that disambiguate the semantics of tags by combining the judgments of WordNet lexicographers with users' tagging behavior. To personalize further, users are clustered to generate a more accurate ontology for a particular group of users. To evaluate the usefulness of the tag ontology, we use it in a pilot tag recommendation experiment, exploiting its semantic information to improve recommendation performance. Initial results show that the personalized information improves the accuracy of the tag recommendation.
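The disambiguation step both of these papers describe, choosing a sense for an ambiguous tag from the user's co-occurring tags, can be sketched with a simplified Lesk-style overlap. A toy sense inventory stands in for WordNet here, and the actual approach additionally weighs WordNet lexicographer judgments and tagging behaviour, so this is only a sketch of the idea.

```python
def disambiguate(tag, context_tags, sense_inventory):
    """Pick the candidate sense whose gloss shares the most words with the
    tags that co-occur with `tag` (a simplified Lesk heuristic)."""
    context = set(context_tags)

    def overlap(sense):
        return len(set(sense["gloss"].split()) & context)

    return max(sense_inventory[tag], key=overlap)["sense"]
```

With the tag "apple" posted alongside "technology" and "computers", the company sense wins; alongside "recipe" and "fruit", the fruit sense would.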
Abstract:
This paper proposes a framework for analysing performance on multiple-choice questions, with a focus on linguistic factors. Item Response Theory (IRT) is deployed to estimate ability and question difficulty levels. A logistic regression model is used to detect questions exhibiting Differential Item Functioning. Probit models test relationships between performance and linguistic factors, controlling for the effects of question construction and students' backgrounds. The empirical results have important implications: the lexical density of question stems affects performance; the use of non-Economics specialised vocabulary has differing impacts on the performance of students from different language backgrounds; and the IRT-based ability and difficulty estimates help explain variations in performance.
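The IRT machinery such a framework deploys can be illustrated with the one-parameter (Rasch) model, in which the probability of answering an item correctly depends only on the gap between student ability and item difficulty on a shared latent scale (the paper may well use a more elaborate IRT variant):

```python
import math

def rasch_probability(ability, difficulty):
    """1PL (Rasch) IRT: P(correct) as a logistic function of
    ability minus item difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

When ability equals difficulty the probability is exactly 0.5; harder items (higher difficulty) push it down, stronger students (higher ability) push it up, which is what lets estimated difficulty and ability jointly explain performance variation.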