76 results for Wikipedia, crowdsourcing, traduzione collaborativa

in Queensland University of Technology - ePrints Archive


Relevance:

20.00%

Abstract:

In this paper, we propose an unsupervised segmentation approach, named "n-gram mutual information", or NGMI, which is used to segment Chinese documents into n-character words or phrases, using language statistics drawn from the Chinese Wikipedia corpus. The approach alleviates the tremendous effort required to prepare and maintain manually segmented Chinese text for training purposes, and to manually maintain ever-expanding lexicons. Previously, mutual information was used to achieve automated segmentation into 2-character words. NGMI extends that method to handle longer n-character words. Experiments with heterogeneous documents from the Chinese Wikipedia collection show good results.
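
The mutual-information idea behind this kind of segmentation can be illustrated with a minimal sketch. This is not the NGMI method from the paper: it only covers the simpler adjacent-character case, and the function names, the zero threshold, and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter

def train_counts(corpus):
    """Count single characters and adjacent character pairs in raw text."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(line[i:i + 2] for i in range(len(line) - 1))
    return unigrams, bigrams

def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information of the adjacent character pair (a, b)."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    joint = bigrams[a + b]
    if joint == 0 or n_bi == 0:
        return float("-inf")  # pair never observed: treat as a certain boundary
    p_ab = joint / n_bi
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log(p_ab / (p_a * p_b))

def segment(text, unigrams, bigrams, threshold=0.0):
    """Split text wherever adjacent characters have PMI at or below threshold.

    The threshold value is an assumption; a real system would tune it.
    """
    if not text:
        return []
    words, current = [], text[0]
    for prev, nxt in zip(text, text[1:]):
        if pmi(prev, nxt, unigrams, bigrams) > threshold:
            current += nxt          # strongly associated: same word
        else:
            words.append(current)   # weakly associated: word boundary
            current = nxt
    words.append(current)
    return words

# Toy corpus (hypothetical); Latin letters stand in for Chinese characters.
unigrams, bigrams = train_counts(["ab", "ab", "cd", "cd"])
print(segment("abcd", unigrams, bigrams))
```

Because no labelled training text is needed, only raw character statistics, a sketch like this shows why the approach is described as unsupervised.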

Relevance:

20.00%

Abstract:

Since its debut in 2001, Wikipedia has attracted the attention of many researchers in different fields. In recent years, researchers in the area of ontology learning have realised the huge potential of Wikipedia as a source of semi-structured knowledge, and several systems have used it as their main source of knowledge. However, the techniques used to extract semantic information vary greatly, as do the resulting ontologies. This paper introduces a framework to compare ontology learning systems that use Wikipedia as their main source of knowledge. Six prominent systems are compared and contrasted using the framework.

Relevance:

20.00%

Abstract:

Crowdsourcing harnesses the potential of large and open networks of people. It is a relatively new phenomenon and has attracted substantial interest in practice. Related research, however, lacks a theoretical foundation. We propose a system-theoretical perspective on crowdsourcing systems to address this gap and illustrate its applicability by using it to classify crowdsourcing systems. By deriving two principal dimensions from theory, we identify four fundamental types of crowdsourcing systems that help to distinguish important features of such systems. We analyse their respective characteristics and discuss implications and requirements for various aspects related to the design of such systems. Our results demonstrate that systems theory can inform the study of crowdsourcing systems. The identified system types and the implications for their design may prove useful for researchers to frame future studies and for practitioners to identify the right crowdsourcing systems for a particular purpose.

Relevance:

20.00%

Abstract:

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-lingual document name triangulation performs very well. The evaluation shows encouraging results for our system.
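
As a rough illustration of mining link structure for anchor probabilities and then mapping targets across languages by title, consider the sketch below. It is not the authors' system: the definition of anchor probability used here (linked occurrences divided by all occurrences), the `min_prob` cutoff, the function names, and all data are assumptions.

```python
from collections import Counter, defaultdict

def anchor_stats(links):
    """Map each anchor text to a Counter of the page titles it links to."""
    targets = defaultdict(Counter)
    for anchor, target in links:
        targets[anchor][target] += 1
    return targets

def anchor_probability(anchor, targets, occurrences):
    """Fraction of the anchor text's occurrences that appear as a hyperlink."""
    total = occurrences.get(anchor, 0)
    if total == 0:
        return 0.0
    linked = sum(targets.get(anchor, Counter()).values())
    return linked / total

def recommend(anchor, targets, occurrences, langlinks, min_prob=0.1):
    """Suggest a cross-lingual target for an anchor, or None.

    Picks the most frequent same-language target, then looks it up in a
    title mapping (langlinks) to reach the other language edition.
    """
    if anchor not in targets:
        return None
    if anchor_probability(anchor, targets, occurrences) < min_prob:
        return None
    best_target = targets[anchor].most_common(1)[0][0]
    return langlinks.get(best_target)

# Hypothetical toy data: (anchor text, linked title) pairs, raw phrase counts,
# and a made-up English-to-Chinese title mapping.
links = [("jaguar", "Jaguar"), ("jaguar", "Jaguar"), ("jaguar", "Jaguar Cars")]
occurrences = {"jaguar": 6}
langlinks = {"Jaguar": "zh/Jaguar"}

targets = anchor_stats(links)
print(anchor_probability("jaguar", targets, occurrences))
print(recommend("jaguar", targets, occurrences, langlinks))
```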

Relevance:

20.00%

Abstract:

Analysis of Wikipedia's inter-language links provides insight into a new mechanism of knowledge sharing and linking worldwide.

Relevance:

20.00%

Abstract:

Web 2.0 technologies have mobilised collaborative peer production and participatory cultures for online content creation. However, not all online communities engaging in these activities are independently facilitated; many operate under the auspices of the cultural institutions that develop and resource them. Borrowing from the principles of Wikipedia, which supports collaborative online content creation and online community, ABC Pool (abc.net.au/pool) is one such institutional online community operating with the support of the Australian Public Service Broadcaster (PSB), the Australian Broadcasting Corporation (ABC). This paper explores the collaborative, creative, and governance activities of an institutional online community and how the community manager acts as an intermediary within these arrangements.

Relevance:

20.00%

Abstract:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could support the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.

Relevance:

20.00%

Abstract:

In this paper we examine automated Chinese to English link discovery in Wikipedia and the effects of Chinese segmentation and Chinese to English translation on the hyperlink recommendation. Our experimental results show that the implemented link discovery framework can effectively recommend Chinese-to-English cross-lingual links. The techniques described here can assist bi-lingual users where a particular topic is not covered in Chinese, is not equally covered in both languages, or is biased in one language; as well as for language learning.

Relevance:

20.00%

Abstract:

Crowdsourcing has become a popular approach for capitalizing on the potential of large and open crowds of people external to the organization. While crowdsourcing as a phenomenon is studied in a variety of fields, research mostly focuses on isolated aspects and little is known about the integrated design of crowdsourcing efforts. We introduce a socio-technical systems perspective on crowdsourcing, which provides a deeper understanding of the components and relationships in crowdsourcing systems. By considering the function of crowdsourcing systems within their organizational context, we develop a typology of four distinct system archetypes. We analyze the characteristics of each type and derive a number of design requirements for the respective system components. The paper lays a foundation for IS-based crowdsourcing research, channels related academic work, and helps guide the study and design of crowdsourcing information systems.

Relevance:

20.00%

Abstract:

Building and maintaining software are not easy tasks. However, thanks to advances in web technologies, a new paradigm is emerging in software development. The Service Oriented Architecture (SOA) is a relatively new approach that helps bridge the gap between business and IT and also helps systems remain flexible. However, there are still several challenges with SOA. As the number of available services grows, developers are faced with the problem of discovering the services they need. Public service repositories such as Programmable Web provide only limited search capabilities. Several mechanisms have been proposed to improve web service discovery by using semantics. However, most of these require manually tagging the services with concepts in an ontology. Adding semantic annotations is a non-trivial process that requires a certain skill-set from the annotator and also the availability of domain ontologies that include the concepts related to the topics of the service. These issues have prevented these mechanisms from becoming widespread. This thesis focuses on two main problems. First, to avoid the overhead of manually adding semantics to web services, several automatic methods to include semantics in the discovery process are explored. Although experimentation with some of these strategies has been conducted in the past, the results reported in the literature are mixed. Second, Wikipedia is explored as a general-purpose ontology. The benefit of using it as an ontology is assessed by comparing these semantics-based methods to classic term-based information retrieval approaches. The contribution of this research is significant because, to the best of our knowledge, a comprehensive analysis of the impact of using Wikipedia as a source of semantics in web service discovery does not exist. The main output of this research is a web service discovery engine that implements these methods and a comprehensive analysis of the benefits and trade-offs of these semantics-based discovery approaches.

Relevance:

20.00%

Abstract:

This paper presents research that investigated the role of conflict in the editorial process of the online encyclopedia, Wikipedia. The study used a grounded approach to analyzing 147 conversations about quality from the archived history of the Wikipedia article 'Australia'. It found that conflict in Wikipedia is a generative friction, regulated by references to policy as part of a coordinated effort within the community to improve the quality of articles.

Relevance:

20.00%

Abstract:

For TREC Crowdsourcing 2011 (Stage 2) we propose a network-based approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold standard agreements are modelled as a network. For the purpose of worker trustworthiness assignment, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine evidence that suggests worker trustworthiness, i.e., agreement with other trustworthy co-workers and agreement with the gold standard. A single parameter controls the importance of co-worker agreement versus gold standard agreement. The TurkRank score calculated for each worker is incorporated into a worker-weighted mean label aggregation.
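
The general shape of such a scheme can be sketched as a PageRank-style iteration followed by a weighted mean. This is not the published TurkRank algorithm: the update rule, the use of gold agreement as the restart distribution, the parameter name `alpha`, and the toy data are all assumptions made for illustration.

```python
def trust_scores(coworker, gold, alpha=0.5, iterations=50):
    """Iteratively score workers from co-worker and gold-standard agreement.

    coworker[j][i]: agreement weight between workers j and i (0 on the diagonal).
    gold[i]: worker i's agreement rate with the gold standard.
    alpha: single parameter trading co-worker evidence against gold evidence.
    """
    n = len(gold)
    gold_total = sum(gold) or 1.0
    prior = [g / gold_total for g in gold]   # normalised gold agreement
    scores = prior[:]
    # Row sums: each worker passes on its score in proportion to agreement.
    out = [sum(coworker[j]) for j in range(n)]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            received = sum(
                scores[j] * coworker[j][i] / out[j]
                for j in range(n) if out[j] > 0
            )
            new_scores.append(alpha * received + (1 - alpha) * prior[i])
        scores = new_scores
    return scores

def weighted_mean_label(labels, scores):
    """Aggregate numeric labels, weighting each worker by its trust score.

    labels: dict mapping worker index to that worker's numeric label.
    """
    total = sum(scores[w] for w in labels)
    if total == 0:
        return None
    return sum(scores[w] * label for w, label in labels.items()) / total

# Hypothetical data: workers 0 and 1 agree with each other and with the gold
# standard; worker 2 agrees with neither.
coworker = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
gold = [1.0, 1.0, 0.0]
scores = trust_scores(coworker, gold)
print(weighted_mean_label({0: 1, 1: 1, 2: 0}, scores))
```

In this sketch, setting `alpha` to 0 makes the scores depend only on gold-standard agreement, while values near 1 let co-worker agreement dominate.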