945 results for Language Resources


Relevance:

100.00%

Publisher:

Abstract:

Researchers and developers in academia and industry would benefit from a facility that enables them to easily locate, license and use the kind of empirical data they need for testing and refining their hypotheses, and to deposit and disseminate their own data, e.g. to support replication and validation of reported scientific experiments. To answer these needs, initially in Finland, the University of Helsinki and its collaborators are running a project to create a user-friendly web service for researchers and developers in Finland and other countries. In our talk, we describe ongoing work to create a palette of extensive but easily available Finnish language resources and technologies for the research community, including lexical resources, wordnets, morphologically tagged corpora, dependency syntactic treebanks and parsebanks, open-source finite-state toolkits and libraries, and language models to support text analysis and processing at the customer site. The first publicly available results are also presented.

Relevance:

100.00%

Publisher:

Abstract:

This paper introduces the META-NORD project, which develops the Nordic and Baltic part of the European open language resource infrastructure. META-NORD works on assembling, linking across languages, and making widely available the basic language resources used by developers, professionals and researchers to build specific products and applications. The goals of the project, its overall approach and its specific focus lines on wordnets, terminology resources and treebanks are described. Moreover, the results achieved in the first five months of the project, i.e. language whitepapers, metadata specification and IPR, are presented.

Relevance:

100.00%

Publisher:

Abstract:

Language resources, such as multilingual lexica and multilingual electronic dictionaries, contain collections of lexical entries in several languages. Access to the explicit or implicit translation relations between such entries is of great interest for many NLP-based applications. With Semantic Web-based techniques, translations can be made available on the Web and consumed directly by other (semantically enabled) resources, without relying on application-specific formats. To that end, in this paper we propose a model for representing translations as linked data, as an extension of the lemon model. Our translation module represents some core information associated with term translations and does not commit to specific views or translation theories. As a proof of concept, we have extracted the translations of the terms contained in Terminesp, a multilingual terminological database, and represented them as linked data. We have made them accessible on the Web both for humans (via a Web interface) and for software agents (via a SPARQL endpoint).

Relevance:

100.00%

Publisher:

Abstract:

Recently, experts and practitioners in language resources have started to recognize the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that form the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. To contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) to LD on the Web and for carrying out some useful NLP tasks with them (e.g., word sense disambiguation). This material was the basis of a tutorial given at the EKAW’14 conference, which is also reported in the paper.

Relevance:

100.00%

Publisher:

Abstract:

Invited talk, given on 31 May 2014 at the Workshop on Language Technology Service Platforms: Synergies, Standards, Sharing at LREC2014.

Relevance:

100.00%

Publisher:

Abstract:

This article briefly reviews multilingual language resources for Bulgarian developed within several international projects: the first-ever annotated Bulgarian MTE digital lexical resources, a Bulgarian-Polish corpus, a Bulgarian-Slovak parallel and aligned corpus, and a Bulgarian-Polish-Lithuanian corpus. These resources are valuable multilingual datasets for language engineering research and development for the Bulgarian language. Multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures.

Relevance:

80.00%

Publisher:

Abstract:

We present a methodology for legacy language resource adaptation that generates domain-specific sentiment lexicons organized around domain entities described with lexical information and sentiment words described in the context of these entities. We explain the steps of the methodology and we give a working example of our initial results. The resulting lexicons are modelled as Linked Data resources by use of established formats for Linguistic Linked Data (lemon, NIF) and for linked sentiment expressions (Marl), thereby contributing and linking to existing Language Resources in the Linguistic Linked Open Data cloud.

Relevance:

70.00%

Publisher:

Abstract:

This paper examines the adaptations of the writing system in Internet language in mainland China from a sociolinguistic perspective. A comparison is also made of the adaptations in mainland China with those that Su (2003) found in Taiwan. In Computer-Mediated Communication (CMC), writing systems are often adapted to compensate for their inherent inadequacies (such as difficulty in input). Su (2003) investigates the creative uses of the writing system on the electronic bulletin boards (BBS) of two college student organizations in Taipei, Taiwan, and identifies four popular and creative uses of the Chinese writing system: stylized English, stylized Taiwanese-accented Mandarin, stylized Taiwanese, and the recycling of a transliteration alphabet used in elementary education. According to Coupland (2001; cited in Su 2003), stylization is “the knowing deployment of culturally familiar styles and identities that are marked as deviating from those predictably associated with the current speaking context”. Within this framework and drawing on the data in previous publications on Internet language and online sources, this study identifies five types of adaptations in mainland China’s Internet language: stylized Mandarin (e.g., 漂漂 piāopiāo for 漂亮 ‘beautiful’), stylized dialect-accented Mandarin (e.g., 灰常 huīcháng for 非常 ‘very much’), stylized English (e.g., 伊妹儿 yīmèier for ‘email’), stylized initials (e.g., bt 变态 biàntài for ‘abnormal’; pk, short form for ‘player kill’), and stylized numbers (e.g., 9494 jiùshi jiùshi 就是就是 ‘that is it’). The Internet community is composed of highly mobile individuals and thus forms a weak-tie social network. According to Milroy and Milroy (1992), a social network with weak ties is often where language innovation takes place. Adaptations of the Chinese writing system in Internet language provide interesting evidence for the innovations within a weak-tie social network. 
Our comparison of adaptations in mainland China and Taiwan shows that, in maximizing the effectiveness and functionality of their communication, participants in Internet communication are confronted with different language resources and situations, including differences in Romanization systems, English proficiency level, and attitudes towards English usage. As argued by Milroy and Milroy (1992), a weak-tie social network model can bridge the social class and social network. In the Internet community, the degree of diversity of the stylized linguistic varieties indexes the virtual and/or social status of its participants: the more diversified one’s Internet language is, the higher one’s virtual and/or social status.

Relevance:

70.00%

Publisher:

Abstract:

Lexica and terminology databases play a vital role in many NLP applications, but currently most such resources are published in application-specific formats or with custom access interfaces, so that much of this data sits in “data silos” and is difficult to access. The Semantic Web, and in particular the Linked Data initiative, provides effective solutions to this problem, as well as possibilities for data reuse through inter-lexicon linking and the incorporation of data categories via dereferenceable URIs. The Semantic Web focuses on the use of ontologies to describe semantics on the Web, but currently there is no standard for providing complex lexical information for such ontologies or for describing the relationship between the lexicon and the ontology. We present our model, lemon, which aims to address these gaps.

Relevance:

70.00%

Publisher:

Abstract:

In this paper we present an automatic system for the extraction of syntactic-semantic patterns, applied to the development of multilingual processing tools. To achieve optimal methods for the automatic treatment of more than one language, we propose the use of syntactic-semantic patterns. These patterns are formed by a verbal head and its main arguments, and they are aligned across languages. The system extracts and aligns syntactic-semantic patterns from two manually annotated corpora, and we evaluate the main linguistic problems that must be dealt with in the alignment process.

Relevance:

70.00%

Publisher:

Abstract:

This is a multiple case study of the leadership language of three senior women working in a large corporation in Bahrain. The study’s main aim is to explore the linguistic practices the women leaders use with their colleagues and subordinates in corporate meetings. Adopting a Foucauldian (1972) notion of ‘discourses’ as social practices and a view of gender as socially constructed and discursively performed (Butler 1990), this research aims to unveil the competing discourses which may shape the leadership language of senior women in their communities of practice. The research is situated within the broader field of Sociolinguistics and the specific field of Language and Gender. To address the research aim, a case study approach incorporating multiple methods of qualitative data collection (observation, interviews, and shadowing) was used to gather information about the three women leaders and produce a rich description of their use of language in and out of meeting contexts. For analysis, principles of Qualitative Data Analysis (QDA) were used to organise and sort the large amount of data. In addition, Feminist Post-Structuralist Discourse Analysis (FPDA) was adopted to produce a multi-faceted analysis of the subjects, their language leadership, power relations, and competing discourses in the context. It was found that the three senior women enact leadership differently, making variable use of a repertoire of conventionally masculine and feminine linguistic practices. However, they all appear to have limited language resources and even more limiting subject positions, and they all have to exercise considerable linguistic expertise to police and modify their language in order to avoid the ‘double bind’. Yet the extent of these limitations and constraints depends on the community of practice with its prevailing discourses, which appear to have their roots in Islamic and cultural practices as well as some Western influences acquired throughout the company’s history.
It is concluded that it may be particularly challenging for Middle Eastern women to achieve any degree of equality with men in the workplace because discourses of gender difference lie at the core of Islamic teaching and ideology.

Relevance:

70.00%

Publisher:

Abstract:

Procedural knowledge is the knowledge required to perform certain tasks, and forms an important part of expertise. A major source of procedural knowledge is natural language instructions. While these readable instructions have been useful learning resources for humans, they are not interpretable by machines. Automatically acquiring procedural knowledge in machine-interpretable formats from instructions has become an increasingly popular research topic due to its potential applications in process automation. However, it has been insufficiently addressed. This paper presents an approach and an implemented system to help users automatically acquire procedural knowledge in structured forms from instructions. We introduce a generic semantic representation of procedures for analysing instructions, with which natural language techniques are applied to automatically extract structured procedures from instructions. The method is evaluated in three domains to demonstrate the generality of the proposed semantic representation as well as the effectiveness of the implemented automatic system.