55 results for lexical semantics


Relevance:

20.00%

Abstract:

Crowdsourcing linguistic phenomena with smartphone applications is relatively new. In linguistics, apps have predominantly been developed to create pronunciation dictionaries, to train acoustic models, and to archive endangered languages. This paper presents the first account of how apps can be used to collect data suitable for documenting language change: we created an app, Dialäkt Äpp (DÄ), which predicts users’ dialects. For 16 linguistic variables, users select a dialectal variant from a drop-down menu. DÄ then geographically locates the user’s dialect by suggesting a list of communes where dialect variants most similar to their choices are used. Underlying this prediction are 16 maps from the historical Linguistic Atlas of German-speaking Switzerland, which documents the linguistic situation around 1950. Where users disagree with the prediction, they can indicate what they consider to be their dialect’s location. With this information, the 16 variables can be assessed for language change. Thanks to the playfulness of its functionality, DÄ has reached many users; our linguistic analyses are based on data from nearly 60,000 speakers. Results reveal a relative stability for phonetic variables, while lexical and morphological variables seem more prone to change. Crowdsourcing large amounts of dialect data with smartphone apps has the potential to complement existing data collection techniques and to provide evidence that traditional methods cannot, with normal resources, hope to gather. Nonetheless, it is important to emphasize a range of methodological caveats, including sparse knowledge of users’ linguistic backgrounds (users indicate only their age and sex) and users’ self-declaration of their dialect. These are discussed and evaluated in detail here. Findings remain intriguing nevertheless: as a means of quality control, we report that traditional dialectological methods have revealed trends similar to those found by the app. This underlines the validity of the crowdsourcing method. We are presently extending DÄ architecture to other languages.
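The prediction step described in this abstract (matching a user's chosen variants against atlas maps and suggesting the most similar localities) can be illustrated with a simple matching count. This is only a rough sketch: the abstract does not specify the app's scoring rule, so counting exact matches is an assumption, and the locality names, variable names and variants in `ATLAS` are invented placeholders rather than atlas data.

```python
from typing import Dict, List, Tuple

# Hypothetical atlas: locality -> variant recorded for each variable.
# All names and values here are placeholders for illustration.
ATLAS: Dict[str, Dict[str, str]] = {
    "Locality A": {"var1": "x", "var2": "p", "var3": "m"},
    "Locality B": {"var1": "x", "var2": "q", "var3": "n"},
    "Locality C": {"var1": "y", "var2": "q", "var3": "n"},
}


def rank_localities(
    choices: Dict[str, str],
    atlas: Dict[str, Dict[str, str]] = ATLAS,
    top_n: int = 2,
) -> List[Tuple[str, int]]:
    """Rank localities by how many of the user's variants they share."""
    scores = []
    for locality, variants in atlas.items():
        # Assumed similarity measure: number of exactly matching variants.
        matches = sum(1 for var, v in choices.items() if variants.get(var) == v)
        scores.append((locality, matches))
    # Highest match count first; ties broken alphabetically.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]


user_choices = {"var1": "x", "var2": "q", "var3": "n"}
print(rank_localities(user_choices))
# [('Locality B', 3), ('Locality C', 2)]
```

A user who rejects the top suggestion and names another locality provides the kind of mismatch signal that the abstract uses to assess each variable for change.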

Relevance:

20.00%

Abstract:

Crowdsourcing linguistic phenomena with smartphone applications is relatively new. Apps have been used to train acoustic models for automatic speech recognition (de Vries et al. 2014) and to archive endangered languages (Iwaidja Inyaman Team 2012). Leemann and Kolly (2013) developed a free app for iOS, Dialäkt Äpp (DÄ) (>78k downloads), to document language change in Swiss German. Here, we present results of sound change based on DÄ data. DÄ predicts the users’ dialects: for 16 variables, users select their dialectal variant. DÄ then tells users which dialect they speak. Underlying this prediction are maps from the Linguistic Atlas of German-speaking Switzerland (SDS, 1962-2003), which documents the linguistic situation around 1950. If the prediction is wrong, users indicate their actual dialect. With this information, the 16 variables can be assessed for language change. Results revealed robustness of phonetic variables; lexical and morphological variables were more prone to change. Phonetic variables like 'to lift' (variants: /lupfə, lʏpfə, lipfə/) revealed SDS agreement scores of nearly 85%, i.e., little sound change. Not all phonetic variables are equally robust: 'ladle' (variants: /xælə, xællə, xæuə, xæɫə, xæɫɫə/) exhibited significant sound change. We will illustrate the results using maps that show details of the sound changes at hand.

Relevance:

20.00%

Abstract:

The lexical items 'like' and 'well' can serve as discourse markers (DMs), but can also play numerous other roles, such as verb or adverb. Identifying the occurrences that function as DMs is an important step for language understanding by computers. In this study, automatic classifiers using lexical, prosodic/positional and sociolinguistic features are trained over transcribed dialogues, manually annotated with DM information. The resulting classifiers improve on state-of-the-art performance in DM identification, at about 90% recall and 79% precision for 'like' (84.5% accuracy, κ = 0.69), and 99% recall and 98% precision for 'well' (97.5% accuracy, κ = 0.88). Automatic feature analysis shows that lexical collocations are the most reliable indicators, followed by prosodic/positional features, while sociolinguistic features are marginally useful for the identification of DM 'like' and not useful for 'well'. The differentiated processing of each type of DM improves classification accuracy, suggesting that these types should be treated individually.
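For readers less familiar with the four scores reported here, the sketch below shows how recall, precision, accuracy and Cohen's κ relate to a binary confusion matrix (DM vs. non-DM). The counts in the example are invented round numbers chosen for easy arithmetic, not the study's data, and the function assumes a non-degenerate matrix (no zero denominators).

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, accuracy and Cohen's kappa.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives for the positive class (here: DM).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the marginal proportions.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_expected = p_yes + p_no
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts, for illustration only.
scores = binary_metrics(tp=90, fp=10, fn=10, tn=90)
print({name: round(value, 3) for name, value in scores.items()})
# {'precision': 0.9, 'recall': 0.9, 'accuracy': 0.9, 'kappa': 0.8}
```

Because κ corrects for chance agreement, it can be markedly lower than accuracy when one class dominates, which is why both figures are worth reporting.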

Relevance:

20.00%

Abstract:

Discourse connectives are lexical items indicating coherence relations between discourse segments. Even though many languages possess a whole range of connectives, important divergences exist cross-linguistically in the number of connectives that are used to express a given relation. For this reason, connectives are not easily paired with a univocal translation equivalent across languages. This paper is a first attempt to design a reliable method to annotate the meaning of discourse connectives cross-linguistically using corpus data. We present the methodological choices made to reach this aim and report three annotation experiments using the framework of the Penn Discourse Treebank.

Relevance:

20.00%

Abstract:

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.
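One common way to build a χ²-based lexical similarity measure is to compare the frequencies of the most frequent words in two sub-corpora against the frequencies expected if both were drawn from the same population. The abstract does not state which formulation was used, so the sketch below is an assumption, as are the function name and the `top_n` cut-off; it also assumes both token lists are non-empty.

```python
from collections import Counter
from typing import List


def chi_square_distance(
    tokens_a: List[str], tokens_b: List[str], top_n: int = 500
) -> float:
    """Chi-square statistic over the top_n most frequent words of both texts.

    Lower values mean the two word-frequency profiles are more alike.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    joint = freq_a + freq_b
    chi2 = 0.0
    for word, total in joint.most_common(top_n):
        # Expected counts if the word were spread in proportion to corpus size.
        expected_a = total * size_a / (size_a + size_b)
        expected_b = total * size_b / (size_a + size_b)
        chi2 += (freq_a[word] - expected_a) ** 2 / expected_a
        chi2 += (freq_b[word] - expected_b) ** 2 / expected_b
    return chi2


same = chi_square_distance(["a", "a", "b", "b"], ["a", "a", "b", "b"])
different = chi_square_distance(["x"] * 4, ["y"] * 4)
print(same, different)
# 0.0 8.0
```

A connective-based comparison, by contrast, would count only items from a fixed list of discourse connectives, which is what the abstract argues is the more sensitive measure.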

Relevance:

20.00%

Abstract:

Research has mainly focussed on the perceptual nature of synaesthesia. However, synaesthetic experiences are also semantically represented. It was our aim to develop a task to investigate the semantic representation of the concurrent and its relation to the inducer in grapheme-colour synaesthesia. Non-synaesthetes were tested with either a lexical-decision (i.e., word / non-word) or a semantic-classification (i.e., edibility decision) task. Targets consisted of words which were strongly associated with a specific colour (e.g., banana - yellow) and words which were neutral and not associated with a specific colour (e.g., aunt). Target words were primed with colours: the prime-target relationship was either intramodal (i.e., word - word) or crossmodal (colour patch - word). Each of the four task versions consisted of three conditions: congruent (same colour for prime and target), incongruent (different colour), and unrelated (neutral target). For both tasks (i.e., lexical and semantic) and both versions of the task (i.e., intramodal and crossmodal), we expected faster reaction times (RTs) in the congruent condition than in the neutral condition and slower RTs in the incongruent condition than in the neutral condition. Stronger effects were expected in the intramodal condition due to the overlap in the prime-target modality. The results partly confirmed the hypotheses. We conclude that the tasks and hypotheses can be readily adopted to investigate the nature of the representation of the synaesthetic experiences.

Relevance:

20.00%

Abstract:

The goal of the present thesis was to investigate the production of code-switched utterances in bilinguals’ speech production. This study investigates the availability of grammatical-category information during bilingual language processing. The specific aim is to examine the processes involved in the production of Persian-English bilingual compound verbs (BCVs). A bilingual compound verb is formed when the nominal constituent of a compound verb is replaced by an item from the other language. In the present cases of BCVs the nominal constituents are replaced by a verb from the other language. The main question addressed is how a lexical element corresponding to a verb node can be placed in a slot that corresponds to a noun lemma. This study also investigates how the production of BCVs might be captured within a model of BCVs and how such a model may be integrated within incremental network models of speech production. In the present study, both naturalistic and experimental data were used to investigate the processes involved in the production of BCVs. In the first part of the present study, I collected 2298 minutes of a popular Iranian TV program and found 962 code-switched utterances. In 83 (8%) of the switched cases, insertions occurred within the Persian compound verb structure, hence, resulting in BCVs. As to the second part of my work, a picture-word interference experiment was conducted. This study addressed whether in the case of the production of Persian-English BCVs, English verbs compete with the corresponding Persian compound verbs as a whole, or whether English verbs compete with the nominal constituents of Persian compound verbs only. Persian-English bilinguals named pictures depicting actions in 4 conditions in Persian (L1). In condition 1, participants named pictures of action using the whole Persian compound verb in the context of its English equivalent distractor verb. 
In condition 2, only the nominal constituent was produced in the presence of the light verb of the target Persian compound verb and in the context of a semantically closely related English distractor verb. In condition 3, the whole Persian compound verb was produced in the context of a semantically unrelated English distractor verb. In condition 4, only the nominal constituent was produced in the presence of the light verb of the target Persian compound verb and in the context of a semantically unrelated English distractor verb. The main effect of linguistic unit was significant by participants and items. Naming latencies were longer in the nominal linguistic unit compared to the compound verb (CV) linguistic unit. That is, participants were slower to produce the nominal constituent of compound verbs in the context of a semantically closely related English distractor verb compared to producing the whole compound verbs in the context of a semantically closely related English distractor verb. The three-way interaction between version of the experiment (CV and nominal versions), linguistic unit (nominal and CV linguistic units), and relation (semantically related and unrelated distractor words) was significant by participants. In both versions, naming latencies were longer in the semantically related nominal linguistic unit compared to the response latencies in the semantically related CV linguistic unit. In both versions, naming latencies were longer in the semantically related nominal linguistic unit compared to response latencies in the semantically unrelated nominal linguistic unit. 
Both the analysis of the naturalistic data and the results of the experiment revealed that in the case of the production of the nominal constituent of BCVs, a verb from the other language may compete with a noun from the base language, suggesting that grammatical category does not necessarily provide a constraint on lexical access during the production of the nominal constituent of BCVs. There was a minimal context in condition 2 (the nominal linguistic unit) in which the nominal constituent was produced in the presence of its corresponding light verb. The results suggest that generating words within a context may not guarantee that the effect of grammatical class becomes available. A model is proposed in order to characterize the processes involved in the production of BCVs. Implications for models of bilingual language production are discussed.

Relevance:

20.00%

Abstract:

Imprecise manipulation of source code (semi-parsing) is useful for tasks such as robust parsing, error recovery, lexical analysis, and rapid development of parsers for data extraction. An island grammar precisely defines only a subset of a language syntax (islands), while the rest of the syntax (water) is defined imprecisely. Usually water is defined as the negation of islands. Albeit simple, such a definition of water is naive and impedes composition of islands. When developing an island grammar, sooner or later a language engineer has to create water tailored to each individual island. Such an approach is fragile, because water can change with any change of a grammar. It is time-consuming, because water is defined manually by an engineer and not automatically. Finally, an island surrounded by water cannot be reused because water has to be defined for every grammar individually. In this paper we propose a new technique of island parsing: bounded seas. Bounded seas are composable, robust, reusable and easy to use because island-specific water is created automatically. Our work focuses on applications of island parsing to data extraction from source code. We have integrated bounded seas into a parser combinator framework as a demonstration of their composability and reusability.
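The difference between naive water and a bounded sea can be illustrated with a toy parser-combinator sketch. This is a loose illustration and not the authors' implementation: the combinator names (`regex`, `sea`, `many`) are hypothetical, and the boundary is passed in by hand here, whereas the abstract says island-specific water is created automatically.

```python
import re
from typing import Callable, List, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Result = Optional[Tuple[object, int]]
Parser = Callable[[str, int], Result]


def regex(pattern: str) -> Parser:
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Result:
        match = compiled.match(text, pos)
        return (match.group(0), match.end()) if match else None

    return parse


def sea(island: Parser, boundary: Optional[Parser] = None) -> Parser:
    """Skip water until the island matches.

    If a boundary parser is given and matches first, the search stops
    and the sea fails instead of consuming input past the boundary.
    """

    def parse(text: str, pos: int) -> Result:
        i = pos
        while i <= len(text):
            hit = island(text, i)
            if hit is not None:
                return hit
            if boundary is not None and boundary(text, i) is not None:
                return None
            i += 1
        return None

    return parse


def many(parser: Parser) -> Parser:
    """Apply a parser repeatedly, collecting values until it fails."""

    def parse(text: str, pos: int) -> Result:
        values: List[object] = []
        while True:
            result = parser(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            values.append(result[0])
            pos = result[1]

    return parse


island = regex(r"def \w+")
boundary = regex(r"class ")
text = "junk def a junk def b class X def c"

print(many(sea(island))(text, 0)[0])            # unbounded water
# ['def a', 'def b', 'def c']
print(many(sea(island, boundary))(text, 0)[0])  # bounded sea
# ['def a', 'def b']
```

With unbounded water the search leaks past `class X` and picks up `def c`; with the boundary it stops, which is the property that makes such islands safe to compose inside a larger grammar.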

Relevance:

20.00%

Abstract:

Reading strategies vary across languages according to orthographic depth - the complexity of the grapheme in relation to phoneme conversion rules - notably at the level of eye movement patterns. We recently demonstrated that a group of early bilinguals, who learned both languages equally under the age of seven, presented a first fixation location (FFL) closer to the beginning of words when reading in German as compared with French. Since German is known to be orthographically more transparent than French, this suggested that different strategies were being engaged depending on the orthographic depth of the language used. Opaque languages induce a global reading strategy, and transparent languages force a local/serial strategy. Thus, pseudo-words were processed using a local strategy in both languages, suggesting that the link between word forms and their lexical representation may also play a role in selecting a specific strategy. In order to test whether corresponding effects appear in late bilinguals with low proficiency in their second language (L2), we present a new study in which we recorded eye movements while two groups of late German-French and French-German bilinguals read aloud isolated French and German words and pseudo-words. Since a transparent reading strategy is local and serial, with a high number of fixations per stimulus, and the level of the bilingual participants' L2 is low, the impact of language opacity should be observed in L1. We therefore predicted a global reading strategy if the bilinguals' L1 was French (FFL close to the middle of the stimuli with fewer fixations per stimulus) and a local and serial reading strategy if it was German. Thus, the L2 of each group, as well as pseudo-words, should also require a local and serial reading strategy. Our results confirmed these hypotheses, suggesting that global word processing is only achieved by bilinguals with an opaque L1 when reading in an opaque language; the low level in the L2 gives way to a local and serial reading strategy. These findings stress the fact that reading behavior is influenced not only by the linguistic mode but also by top-down factors, such as readers' proficiency.