230 results for Wikipedia


Relevance:

20.00%

Publisher:

Abstract:

In this paper, we propose an unsupervised segmentation approach, named "n-gram mutual information" (NGMI), which segments Chinese documents into n-character words or phrases using language statistics drawn from the Chinese Wikipedia corpus. The approach avoids the tremendous effort required to prepare and maintain manually segmented Chinese text for training, and to manually maintain ever-expanding lexicons. Previously, mutual information was used to achieve automated segmentation into 2-character words; NGMI extends this approach to handle longer n-character words. Experiments with heterogeneous documents from the Chinese Wikipedia collection show good results.
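The core idea of mutual-information segmentation can be illustrated with a toy sketch (this is not the authors' implementation; the function name, threshold, and counting scheme are illustrative assumptions): adjacent characters with high pointwise mutual information are kept in the same word, and a boundary is placed where the association is weak.

```python
from collections import Counter
from math import log2

def mi_segment(text, boundary_threshold=1.0):
    """Segment non-empty text by pointwise mutual information (PMI):
    adjacent characters whose PMI is at or above the threshold stay in
    the same word; a boundary is placed where the association is weak.
    Statistics here come from the input itself; in practice they would
    be drawn from a large corpus such as the Chinese Wikipedia."""
    unigrams = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(a, b):
        p_ab = bigrams[a + b] / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return log2(p_ab / (p_a * p_b)) if p_ab > 0 else float("-inf")

    words, current = [], text[0]
    for i in range(len(text) - 1):
        if pmi(text[i], text[i + 1]) >= boundary_threshold:
            current += text[i + 1]   # strong association: extend the word
        else:
            words.append(current)    # weak association: word boundary
            current = text[i + 1]
    words.append(current)
    return words
```

Extending from 2-character words to longer n-grams, as NGMI does, additionally requires scoring candidate n-character units rather than only adjacent character pairs.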

Relevance:

20.00%

Publisher:

Abstract:

Since its debut in 2001, Wikipedia has attracted the attention of many researchers in different fields. In recent years, researchers in the area of ontology learning have realised the huge potential of Wikipedia as a source of semi-structured knowledge, and several systems have used it as their main source of knowledge. However, the techniques used to extract semantic information vary greatly, as do the resulting ontologies. This paper introduces a framework for comparing ontology learning systems that use Wikipedia as their main source of knowledge. Six prominent systems are compared and contrasted using the framework.

Relevance:

20.00%

Publisher:

Abstract:

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities, and relies on "translation" via cross-lingual document name triangulation, performs very well. The evaluation shows encouraging results for our system.
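Anchor-probability mining of the kind described can be sketched as follows. The function names and the 0.5 confidence cutoff are illustrative assumptions, not details from the paper: the model simply estimates how often a given anchor text links to each target page.

```python
from collections import Counter, defaultdict

def build_anchor_model(link_instances):
    """Estimate P(target | anchor) by relative frequency from observed
    (anchor_text, target_page) pairs, e.g. mined from a Wikipedia dump."""
    counts = defaultdict(Counter)
    for anchor, target in link_instances:
        counts[anchor][target] += 1
    return {
        anchor: {t: c / sum(targets.values()) for t, c in targets.items()}
        for anchor, targets in counts.items()
    }

def recommend(model, anchor, min_prob=0.5):
    """Return the most probable target page for an anchor, or None if
    the anchor is unseen or no candidate is confident enough."""
    candidates = model.get(anchor, {})
    if not candidates:
        return None
    target, prob = max(candidates.items(), key=lambda kv: kv[1])
    return target if prob >= min_prob else None
```

In the cross-lingual setting, the recommended English target would then be mapped to its CJK counterpart via the document-name triangulation the abstract mentions.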

Relevance:

20.00%

Publisher:

Abstract:

Analysis of Wikipedia's inter-language links provides insight into a new mechanism of knowledge sharing and linking worldwide.

Relevance:

20.00%

Publisher:

Abstract:

Web 2.0 technologies have mobilised collaborative peer production and participatory cultures for online content creation. However, not all online communities engaging in these activities are independently facilitated; many operate under the auspices of the cultural institutions that develop and resource them. Borrowing from the principles of Wikipedia that support collaborative online content creation and online community, ABC Pool (abc.net.au/pool) is one such institutional online community, operating with the support of Australia's public service broadcaster, the Australian Broadcasting Corporation (ABC). This paper explores the collaborative, creative, and governance activities of an institutional online community, and the role of the community manager as an intermediary within these arrangements.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could serve the information retrieval research community and support knowledge sharing in Wikipedia in many ways; for example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.

Relevance:

20.00%

Publisher:

Abstract:

In this paper we examine automated Chinese-to-English link discovery in Wikipedia and the effects of Chinese segmentation and Chinese-to-English translation on hyperlink recommendation. Our experimental results show that the implemented link discovery framework can effectively recommend Chinese-to-English cross-lingual links. The techniques described here can assist bilingual users where a particular topic is not covered in Chinese, is not equally covered in both languages, or is biased in one language, and can also aid language learning.

Relevance:

20.00%

Publisher:

Abstract:

Building and maintaining software are not easy tasks. However, thanks to advances in web technologies, a new paradigm is emerging in software development. The Service Oriented Architecture (SOA) is a relatively new approach that helps bridge the gap between business and IT and also helps systems remain flexible. However, there are still several challenges with SOA. As the number of available services grows, developers are faced with the problem of discovering the services they need. Public service repositories such as ProgrammableWeb provide only limited search capabilities. Several mechanisms have been proposed to improve web service discovery by using semantics. However, most of these require manually tagging the services with concepts in an ontology. Adding semantic annotations is a non-trivial process that requires a certain skill set from the annotator, as well as the availability of domain ontologies that include the concepts related to the topics of the service. These issues have prevented these mechanisms from becoming widespread. This thesis focuses on two main problems. First, to avoid the overhead of manually adding semantics to web services, several automatic methods to include semantics in the discovery process are explored. Although some of these strategies have been tested in the past, the results reported in the literature are mixed. Second, Wikipedia is explored as a general-purpose ontology. The benefit of using it as an ontology is assessed by comparing these semantics-based methods to classic term-based information retrieval approaches. The contribution of this research is significant because, to the best of our knowledge, a comprehensive analysis of the impact of using Wikipedia as a source of semantics in web service discovery does not exist. The main output of this research is a web service discovery engine that implements these methods, together with a comprehensive analysis of the benefits and trade-offs of these semantics-based discovery approaches.
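As a point of reference for the "classic term-based" baseline the thesis compares against, here is a minimal TF-IDF ranking sketch. It is illustrative only; the function name is an assumption, and a real discovery engine would add proper tokenisation, stemming, and an inverted index.

```python
from collections import Counter
from math import log, sqrt

def tfidf_rank(query, service_descriptions):
    """Rank service descriptions by cosine similarity between TF-IDF
    vectors of the query and each description. Returns document
    indices, best match first."""
    docs = [d.lower().split() for d in service_descriptions]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))                       # document frequency per term
    idf = {t: log(n / df[t]) for t in df}

    def vec(tokens):
        tf = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    scores = [(cosine(q, vec(d)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scores, reverse=True)]
```

Semantics-based methods of the kind the thesis studies would instead (or additionally) match query and description through Wikipedia concepts, so that, for example, "email" could match a service described only as "message delivery".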

Relevance:

20.00%

Publisher:

Abstract:

This paper presents research that investigated the role of conflict in the editorial process of the online encyclopedia Wikipedia. The study used a grounded approach to analyzing 147 conversations about quality from the archived history of the Wikipedia article 'Australia'. It found that conflict in Wikipedia is a generative friction, regulated by references to policy as part of a coordinated effort within the community to improve the quality of articles.

Relevance:

20.00%

Publisher:

Abstract:

Wikipedia is often held up as an example of the potential of the internet to foster open, free and non-commercial collaboration. However, such discourses often conflate these values without recognising how they play out in reality in a peer-production community. As Wikipedia evolves, it is an ideal time to examine these discourses and the tensions between its initial ideals and the reality of commercial activity in the encyclopaedia. Through an analysis of three failed proposals to ban paid advocacy editing in the English-language Wikipedia, this paper highlights the shift in values from the early editorial community, which forked encyclopaedic content over the threat of commercialisation, to one that today values the freedom that allows anyone to edit the encyclopaedia.

Relevance:

20.00%

Publisher:

Abstract:

Clustering is an important technique for organising and categorising web-scale document collections. The main challenges in clustering the billions of documents available on the web are the processing power required and the sheer size of the datasets. More importantly, it is nearly impossible to generate ground-truth labels for a general web collection containing billions of documents and a vast taxonomy of topics, yet document clusters are most commonly evaluated by comparison to such a label set. This paper presents a clustering and labeling solution in which Wikipedia is clustered and hundreds of millions of web documents in ClueWeb12 are mapped onto those clusters. The solution rests on the assumption that Wikipedia covers such a wide range of topics that it represents a small-scale web. We found that the web-scale document clustering and labeling process could be performed on one desktop computer in under two days for a Wikipedia clustering solution containing about 1,000 clusters; solutions with finer-grained clusterings, such as 10,000 or 50,000 clusters, take longer to execute. These results were evaluated using a set of external data.
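The mapping step described above, assigning each web document the label of its nearest Wikipedia cluster, can be sketched as nearest-centroid assignment under cosine similarity. This is an assumption for illustration; the paper's actual vector representation and distance measure may differ.

```python
from math import sqrt

def label_documents(doc_vectors, centroids, labels):
    """Assign each document vector the label of the Wikipedia cluster
    centroid it is most similar to (cosine similarity). Building the
    vectors and clustering Wikipedia are out of scope here."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sqrt(sum(x * x for x in a))
        nb = sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    out = []
    for doc in doc_vectors:
        best = max(range(len(centroids)), key=lambda i: cosine(doc, centroids[i]))
        out.append(labels[best])
    return out
```

Because each document only needs to be compared against roughly 1,000 centroids rather than clustered jointly with all other documents, this mapping is embarrassingly parallel, which is consistent with the single-desktop running times reported above.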