76 results for Wikipedia, crowdsourcing, collaborative translation
Abstract:
The Link the Wiki track at INEX 2008 offered two tasks: file-to-file link discovery and anchor-to-BEP link discovery. In the former, 6,600 topics were used; in the latter, 50. Manual assessment of the anchor-to-BEP runs was performed using a tool developed for the purpose. Runs were evaluated using standard precision and recall measures, such as MAP and precision/recall graphs. 10 groups participated, and the approaches they took are discussed. Final evaluation results for all runs are presented.
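The MAP measure used to evaluate these runs can be sketched as follows. This is an illustrative Python sketch of the standard definition, not the track's official evaluation toolkit; the topic and run structures are assumptions.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked run against a set of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at each relevant hit
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked list, relevant set) pairs, one pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, a run `["a", "b", "c", "d"]` against relevant set `{"a", "c"}` scores (1/1 + 2/3) / 2 ≈ 0.833.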
Abstract:
This paper presents an overview of the NTCIR-9 Cross-lingual Link Discovery (Crosslink) task. The overview includes: the motivation of cross-lingual link discovery; the Crosslink task definition; the run submission specification; the assessment and evaluation framework; the evaluation metrics; and the evaluation results of submitted runs. Cross-lingual link discovery (CLLD) is a way of automatically finding potential links between documents in different languages. The goal of this task is to create a reusable resource for evaluating automated CLLD approaches. The results of this research can be used in building and refining systems for automated link discovery. The task is focused on linking between English source documents and Chinese, Korean, and Japanese target documents.
Abstract:
This paper describes the evaluation in benchmarking the effectiveness of cross-lingual link discovery (CLLD). Cross-lingual link discovery is a way of automatically finding prospective links between documents in different languages, which is particularly helpful for knowledge discovery across different language domains. A CLLD evaluation framework is proposed for benchmarking system performance. The framework includes standard document collections, evaluation metrics, and link assessment and evaluation tools. The evaluation methods described in this paper were used to quantify system performance at the NTCIR-9 Crosslink task. It is shown that using manual assessment to generate the gold standard delivers a more reliable evaluation result.
Abstract:
Purpose: The purpose of this paper is to clarify how end-users’ tacit knowledge can be captured and integrated in an overall business process management (BPM) approach. Current approaches to supporting stakeholders’ collaboration in the modelling of business processes envision an egalitarian environment where stakeholders interact in the same context, using the same languages and sharing the same perspectives on the business process. As a result, such stakeholders have to collaborate in the context of process modelling using a language that some of them do not master, and have to integrate their various perspectives. Design/methodology/approach: The paper applies the SECI knowledge management process to analyse the problems of traditional top-down BPM approaches and BPM collaborative modelling tools. The SECI model is also applied to Wikipedia, a successful Web 2.0-based knowledge management environment, to identify how tacit knowledge is captured in a bottom-up approach. Findings: The paper identifies a set of requirements for a hybrid BPM approach, both top-down and bottom-up, and describes a new BPM method based on a stepwise discovery of knowledge. Originality/value: This new approach, Processpedia, enhances collaborative modelling among stakeholders without enforcing egalitarianism. In Processpedia, tacit knowledge is captured and standardised into the organisation’s business processes by fostering an ecological participation of all the stakeholders and capitalising on stakeholders’ distinctive characteristics.
Abstract:
Language-use has proven to be the most complex and complicating of all Internet features, yet people and institutions invest enormously in language and cross-language features because they are fundamental to the success of the Internet’s past, present and future. The thesis takes into focus the developments of the latter – features that facilitate and signify linking between or across languages – in both their historical and current contexts. In the theoretical analysis, the conceptual platform of inter-language linking is developed both to accommodate efforts towards a new social complexity model for the co-evolution of languages and language content, and to create an open analytical space for language and cross-language related features of the Internet and beyond. The practiced uses of inter-language linking have changed over the last decades. Before and during the first years of the WWW, mechanisms of inter-language linking were at best elements used to create new institutional or content arrangements, but on a large scale they were insignificant. This changed with the emergence of the WWW and its development into a web in which content in different languages co-evolves. The thesis traces the inter-language linking mechanisms that facilitated these dynamic changes by analysing what these linking mechanisms are, how their historical as well as current contexts can be understood, and what kinds of cultural-economic innovation they enable and impede. The study discusses this alongside four empirical cases of bilingual or multilingual media use, ranging from television and web services for languages of smaller populations, to large-scale web ventures involving multiple languages by the British Broadcasting Corporation, the Special Broadcasting Service Australia, Wikipedia and Google.
To sum up, the thesis introduces the concepts of ‘inter-language linking’ and the ‘lateral web’ to model the social complexity and co-evolution of languages online. The resulting model reconsiders existing social complexity models in that it is the first that can explain the emergence of large-scale, networked co-evolution of languages and language content facilitated by the Internet and the WWW. Finally, the thesis argues that the Internet enables an open space for language and cross-language related features and investigates how far this process is facilitated by (1) amateurs and (2) human-algorithmic interaction cultures.
Abstract:
The Time magazine ‘Person of the Year’ award is a venerable institution. Established by Time’s founder Henry Luce in 1927 as ‘Man of the Year’, it is an annual award given to ‘a person, couple, group, idea, place, or machine that ‘for better or for worse ... has done the most to influence the events of the year’ (Time 2002, p. 1). In 2010, the award was given to Mark Zuckerberg, the founder and CEO of the social networking site Facebook. There was, however, a strong campaign for the ‘People’s Choice’ award to be given to Julian Assange, the founder and editor-in-chief of Wikileaks, the online whistleblowing site. Earlier in the year Wikileaks had released more than 250 000 US government diplomatic cables through the internet, and the subsequent controversies around the actions of Wikileaks and Assange came to be known worldwide as ‘Cablegate’. The focus of this chapter is not on the implications of ‘Cablegate’ for international diplomacy, which continue to have great significance, but rather upon what the emergence of Wikileaks has meant for journalism, and whether it provides insights into the future of journalism. Both Facebook and Wikileaks, as well as social media platforms such as Twitter and YouTube, and independent media practices such as blogging, citizen journalism and crowdsourcing, are manifestations of the rise of social media, or what has also been termed web 2.0. The term ‘web 2.0’ was coined by Tim O’Reilly, and captures the rise of online social media platforms and services that better realise the collaborative potential of digitally networked media. They do this by moving from the relatively static, top-down notions of interactivity that informed early internet development, towards more open and evolutionary models that better harness collective intelligence by enabling users to become the creators and collaborators in the development of online media content (Musser and O’Reilly 2007; Bruns 2008).
Abstract:
Collaborative user-led content creation by online communities, or produsage (Bruns 2008), has generated a variety of useful and important resources and other valuable outcomes, from open source software through the Wikipedia to a variety of smaller-scale, specialist projects. These are often seen as standing in an inherent opposition to commercial interests, and attempts to develop collaborations between community content creators and commercial partners have had mixed success rates to date. However, such tension between community and commerce is not inevitable, and there is substantial potential for more fruitful exchanges and collaboration. This article contributes to the development of this understanding by outlining the key underlying principles of such participatory community processes and exploring the potential tensions which could arise between these communities and their potential external partners. It also sketches out potential approaches to resolving them.
Abstract:
This article provides an overview of some of the key aspects that relate to the co-evolution of languages and their associated content in the Internet environment. A focus on such a co-evolution is pertinent, as the evolution of languages in the Internet environment can be better understood if the development of their existing and emerging content, that is, the content in the respective language, is taken into consideration. In doing so, this article examines two related aspects. The first is the governance of languages at critical sites of the Internet environment, including ICANN, Wikipedia and Google Translate. Following on from this examination, the second part outlines how the co-evolution of languages and associated content in the Internet environment extends policy-making related to linguistic pluralism. It is argued that policies which centre on language availability in the Internet environment must shift their focus to the dynamics of available content instead. The notion of language pairs as a new regime of intersection for both languages and content is discussed to introduce an extended understanding of the uses of linguistic pluralism in the Internet environment. The ultimate extrapolation of such an enhanced approach, it is argued, centres less on 6,000 languages than on 36 million language pairs. This article describes how such a powerful resource evolves in the Internet environment.
Abstract:
Citizen Science projects are initiatives in which members of the general public participate in scientific research projects and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, as the gathered information comes from the crowd, data quality is always hard to manage. There are many ways to manage data quality, and reputation management is one of the common approaches. In recent years, many research teams have deployed audio or image sensors in natural environments in order to monitor the status of animals or plants. The collected data are analysed by ecologists. However, as the amount of collected data is extremely large and the number of ecologists is very limited, it is impossible for scientists to manually analyse all these data. The functions of existing automated tools to process the data are still very limited and the results are still not very accurate. Therefore, researchers have turned to recruiting general citizens who are interested in helping scientific research to do pre-processing tasks such as species tagging. Although research teams can save time and money by recruiting general citizens to volunteer their time and skills to help with data analysis, the reliability of contributed data varies a lot. Therefore, this research aims to investigate techniques to enhance the reliability of data contributed by general citizens in scientific research projects, especially acoustic sensing projects. In particular, we aim to investigate how to use reputation management to enhance data reliability. Reputation systems have been used to resolve uncertainty and improve data quality in many marketing and E-Commerce domains. Commercial organizations that have embraced reputation management and implemented the technology have gained many benefits.
Data quality issues are significant to the domain of Citizen Science due to the quantity and diversity of people and devices involved. However, research on reputation management in this area is relatively new. We therefore start our investigation by examining existing reputation systems in different domains. We then design novel reputation management approaches for Citizen Science projects to categorise participants and data. We have investigated some critical elements which may influence data reliability in Citizen Science projects. These elements include personal information, such as location and education, and performance information, such as the ability to recognise certain bird calls. The designed reputation framework is evaluated by a series of experiments involving many participants collecting and interpreting data, in particular environmental acoustic data. Our research in exploring the advantages of reputation management in Citizen Science (or crowdsourcing in general) will help increase awareness among organizations that are unacquainted with its potential benefits.
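The abstract does not specify the thesis's exact reputation model. A common choice in reputation systems, sketched here purely as an illustrative assumption, is the beta reputation score, which rates a participant by their history of confirmed versus rejected contributions:

```python
class BetaReputation:
    """Illustrative beta reputation score for a Citizen Science participant.

    score = (correct + 1) / (correct + wrong + 2), the expected value of
    Beta(correct + 1, wrong + 1). This is a common model in reputation
    systems, assumed here for illustration -- not necessarily the model
    used in the thesis.
    """

    def __init__(self):
        self.correct = 0  # contributions later confirmed (e.g. by experts)
        self.wrong = 0    # contributions later rejected

    def record(self, was_correct):
        if was_correct:
            self.correct += 1
        else:
            self.wrong += 1

    @property
    def score(self):
        return (self.correct + 1) / (self.correct + self.wrong + 2)
```

A new participant starts at a neutral 0.5; after 8 confirmed and 2 rejected species tags, the score rises to 9/12 = 0.75, and data from high-score participants can be weighted more heavily.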
Abstract:
Despite the rapidly urbanising population, public transport usage in metropolitan areas is not growing at a level that corresponds to the trend. Many people are reluctant to travel by public transport, as it is commonly associated with unpleasant experiences such as limited services, long wait times, and crowded spaces. This study aims to explore the use of mobile spatial interactions and services, and investigate their potential to increase the enjoyment of our everyday commuting experience. The main goal is to develop and evaluate mobile-mediated design interventions to foster interactions for and among passengers, as well as between passengers and public transport infrastructures, with the aim of positively influencing the commuting experience. Ultimately, this study hopes to generate findings and knowledge towards creating a more enjoyable public transport experience, as well as to explore innovative uses of mobile technologies and context-aware services for the urban lifestyle.
Abstract:
Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, pages in different languages are rarely cross-linked, except for direct equivalent pages on the same subject in different languages. This can pose serious difficulties to users seeking information or knowledge from sources in different languages, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task—cross-lingual link discovery (CLLD)—is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery across different language domains. This study focuses specifically on Chinese / English link discovery (C/ELD), a special case of the cross-lingual link discovery task that involves natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation.
With the evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated that achieves high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments for better, automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. This framework is important in CLLD evaluation because it helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, the first information retrieval track of this kind.
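The link mining idea in contribution 3) can be sketched as estimating, from the existing link graph, how often a candidate anchor string is linked when it appears in a page (sometimes called "keyphraseness"). The sketch below is illustrative and assumes a simplified page representation; the thesis's exact estimator may differ, and mention detection here is naive substring matching:

```python
from collections import Counter

def anchor_probabilities(pages):
    """Estimate P(linked | mentioned) for each anchor string.

    pages: list of (page_text, set of anchor strings linked on that page).
    Returns {anchor: pages linking it / pages mentioning it}. High-probability
    anchors are good candidates for new (cross-lingual) links.
    """
    # Collect every string that is used as a link anchor anywhere.
    anchors = set()
    for _, page_anchors in pages:
        anchors |= set(page_anchors)

    link_counts, mention_counts = Counter(), Counter()
    for text, page_anchors in pages:
        for a in set(page_anchors):
            link_counts[a] += 1          # page links this anchor
        for a in anchors:
            if a in text:                # naive substring mention check
                mention_counts[a] += 1   # page merely mentions it

    return {a: link_counts[a] / mention_counts[a]
            for a in anchors if mention_counts[a]}
```

An anchor linked on most pages where it appears gets a probability near 1 and is a strong candidate; a string that appears often but is rarely linked scores near 0.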
Abstract:
By compiling, sorting and updating information, Wikipedia performs a form of news curation. What is special about this, however, is not only that the content is not produced by journalists, but that a collective of "produsers" stands behind it: the user becomes the producer.
Abstract:
Being able to innovate has become a critical capability for many contemporary organizations in an effort to sustain their operations in the long run. However, existing innovation models that attempt to guide organizations emphasize different aspects of innovation (e.g., products, services or business models), different stages of innovation (e.g., ideation, implementation or operation) or different skills (e.g., development or crowdsourcing) that are necessary to innovate, in turn creating isolated pockets of understanding about different aspects of innovation. In order to yield more predictable innovation outcomes, organizations need to understand what exactly they need to focus on, what capabilities they need to have, and what is necessary in order to take an idea to market. This paper aims to construct a framework for innovation that contributes to this understanding. We focus on a number of different stages in the innovation process and highlight different types and levels of organizational, technological, individual and process capabilities required to manage the organizational innovation process. Our work offers a comprehensive conceptualization of innovation as a multi-level process model, and provides a range of implications for further empirical and theoretical examination.
Abstract:
Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful for knowledge discovery when a multi-lingual knowledge base is sparse in one language or another, or when the topical coverage in each language is different; such is the case with Wikipedia. Techniques for identifying new and topically relevant cross-lingual links are a current topic of interest at NTCIR, where the CrossLink task has been running since NTCIR-9 in 2011. This paper presents the evaluation framework for benchmarking cross-lingual link discovery algorithms in the context of NTCIR-9. This framework includes topics, document collections, assessments, metrics, and a toolkit for pooling, assessment, and evaluation. The assessments are further divided into two separate sets: manual assessments performed by human assessors; and automatic assessments based on links extracted from Wikipedia itself. Using this framework we show that manual assessment is more robust than automatic assessment in the context of cross-lingual link discovery.
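The automatic-assessment side of this framework amounts to scoring a run's discovered links against a gold set extracted from Wikipedia. A minimal sketch of such set-based scoring is below; the link encoding as (source, target) pairs is an illustrative assumption, not the official NTCIR run format:

```python
def link_prf(predicted, gold):
    """Set-based precision, recall and F1 of discovered links.

    predicted, gold: sets of (source_article, target_article) pairs.
    With automatic assessment, `gold` would be links extracted from
    Wikipedia itself; with manual assessment, assessor-approved links.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because Wikipedia's own cross-language links are incomplete, an automatic gold set penalises valid links it happens to miss, which is one reason manual assessment proves more robust here.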
Abstract:
At NTCIR-10 we participated in the cross-lingual link discovery (CrossLink-2) task. In this paper we describe our systems for discovering cross-lingual links between the Chinese, Japanese, and Korean (CJK) Wikipedia and the English Wikipedia. The evaluation results show that our implementation of the cross-lingual linking method achieved promising results.