17 results for Domain-specific languages

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance: 100.00%

Abstract:

The use of domain-specific languages (DSLs) has been proposed as an approach to cost-effectively develop families of software systems in a restricted application domain. Domain-specific languages, in combination with the accumulated knowledge and experience of previous implementations, can in turn be used to generate new applications with unique sets of requirements. For this reason, DSLs are considered to be an important approach to software reuse. However, the toolset supporting a particular domain-specific language is also domain-specific and is by definition not reusable. Therefore, creating and maintaining a DSL requires additional resources that could be even larger than the savings associated with using it. As a solution, different tool frameworks have been proposed to simplify and reduce the cost of developing DSLs. Developers of tool support for DSLs need to instantiate, customize, or configure the framework for a particular DSL. There are different approaches to this. One approach is to use an application programming interface (API) and to extend the basic framework using an imperative programming language; an example of a tool based on this approach is Eclipse GEF. Another approach is to configure the framework using declarative languages that are independent of the underlying framework implementation. We believe this second approach can bring important benefits, as it shifts the focus to specifying what the tool should be like instead of writing a program specifying how the tool achieves this functionality. In this thesis we explore this second approach. We use graph transformation as the basic approach to customize a domain-specific modeling (DSM) tool framework. The contributions of this thesis include a comparison of different approaches for defining, representing, and interchanging software modeling languages and models, and a tool architecture for an open domain-specific modeling framework that efficiently integrates several model transformation components and visual editors. We also present several specific algorithms and tool components for the DSM framework. These include an approach to graph queries based on region operators and the star operator, and an approach for reconciling models and diagrams after executing model transformation programs. We exemplify our approach with two case studies, MICAS and EFCO, in which we show how our experimental modeling tool framework has been used to define tool environments for domain-specific languages.
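
The star operator mentioned above is, in essence, a reflexive-transitive closure over a model graph. The thesis defines it over its own model representation; purely as an illustration of the idea, here is a minimal Python sketch in which the graph encoding, the operator name, and the example model are all invented:

```python
# Minimal sketch of a star-operator graph query: starting from a node,
# repeatedly follow edges of a given type and collect every node reachable
# through zero or more hops (the reflexive-transitive closure).
# The graph representation and names are illustrative, not the thesis's API.

def star(graph, start, edge_type):
    """Return all nodes reachable from `start` via `edge_type` edges."""
    reached = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for etype, target in graph.get(node, []):
            if etype == edge_type and target not in reached:
                reached.add(target)
                frontier.append(target)
    return reached

# Example: a tiny model graph with "contains" edges.
model = {
    "System":     [("contains", "SubsystemA"), ("contains", "SubsystemB")],
    "SubsystemA": [("contains", "ComponentX")],
}
print(star(model, "System", "contains"))
# -> {'System', 'SubsystemA', 'SubsystemB', 'ComponentX'}
```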

Relevance: 100.00%

Abstract:

Software plays an important role in our society and economy. Software development is an intricate process comprising many different tasks: gathering requirements, designing new solutions that fulfill these requirements, and implementing these designs using a programming language into a working system. As a consequence, the development of high-quality software is a core problem in software engineering. This thesis focuses on the validation of software designs. The analysis of designs is of great importance, since errors originating in designs may appear in the final system, and it is considered economical to rectify problems as early in the software development process as possible. Practitioners often create and visualize designs using modeling languages, one of the more popular being the Unified Modeling Language (UML). The analysis of designs can be done manually, but in the case of large systems the need arises for mechanisms that analyze the designs automatically. In this thesis, we propose an automatic approach to analyzing UML-based designs using logic reasoners. The approach first translates the UML-based designs into a language understandable by reasoners, in the form of logic facts, and then shows how to use logic reasoners to infer the logical consequences of these facts. We have implemented the proposed translations in the form of a tool that can be used with any standard-compliant UML modeling tool. Moreover, we validate the proposed approach by automatically checking hundreds of UML-based designs, consisting of thousands of model elements, available in an online model repository. The proposed approach is limited in scope, but it is fully automatic and does not require any expertise in logic languages from the user. We exemplify the approach with two applications: the validation of domain-specific languages and the validation of web service interfaces.
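
As a rough illustration of the translation idea (the thesis targets actual logic reasoners, and its real translation scheme and fact vocabulary differ), the sketch below renders a small class diagram as logic-style facts and infers one consequence, namely whether the generalization hierarchy is acyclic:

```python
# Illustrative sketch (not the thesis's actual translation scheme): encode a
# small UML class diagram as logic-style facts, then infer one consequence --
# whether the generalization (inheritance) hierarchy contains a cycle,
# which would make the design invalid.

facts = {
    ("class", "Vehicle"), ("class", "Car"), ("class", "Engine"),
    ("generalization", "Car", "Vehicle"),   # Car specializes Vehicle
    ("association", "Car", "Engine"),
}

def superclasses(cls):
    """Transitive closure of the generalization relation."""
    found, frontier = set(), [cls]
    while frontier:
        c = frontier.pop()
        for f in facts:
            if f[0] == "generalization" and f[1] == c and f[2] not in found:
                found.add(f[2])
                frontier.append(f[2])
    return found

# Consequence checked: a class must never be its own transitive superclass.
for kind, name in sorted(f for f in facts if f[0] == "class"):
    assert name not in superclasses(name), f"inheritance cycle at {name}"
print("generalization hierarchy is acyclic")
```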

Relevance: 100.00%

Abstract:

Software development tools use information derived from the source code produced by the developer. This information is exploited in different phases of a software project and for different purposes. In modern software projects the amount of information used can grow very large. Software tools have their own information models and access mechanisms, and the amount of information, together with the separate tool-specific information models, makes it very difficult to build a flexible tool environment, particularly for a domain-specific software development process. In this work, basic information metamodels of the Unified Modeling Language, the Python programming language, and the C++ programming language are analyzed. The level of meta-information is restricted to the structural level; executable constructs are left out. The ModelBase metamodel is combined from the existing analyzed metamodels. This metamodel can be used in the future for the development of software tools.
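
Purely as an illustration of what a structural, language-independent metamodel can look like (the actual ModelBase metamodel is defined in the thesis itself; all names below are invented), here is a minimal Python sketch:

```python
# Rough illustration of a unified structural metamodel in which classes from
# UML, Python, and C++ can all be represented, limited to structural
# features; executable constructs (method bodies) are deliberately absent.
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    type_name: str           # language-specific types map onto this field

@dataclass
class Operation:
    name: str
    parameters: list = field(default_factory=list)   # names only, no bodies

@dataclass
class Classifier:
    """A class/struct from any source language, structure only."""
    name: str
    language: str             # "UML" | "Python" | "C++"
    attributes: list = field(default_factory=list)
    operations: list = field(default_factory=list)
    supertypes: list = field(default_factory=list)

# The same shape fits a C++ class and a Python class alike:
cpp_point = Classifier("Point", "C++",
                       attributes=[Attribute("x", "double"),
                                   Attribute("y", "double")],
                       operations=[Operation("norm")])
print(cpp_point)
```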

Relevance: 90.00%

Abstract:

The Department of French Studies of the University of Turku (Finland) organized an International Bilingual Conference on Crosscultural and Crosslinguistic Perspectives on Academic Discourse from 20 to 22 May 2005. The event hosted specialists on Academic Discourse from Belgium, Finland, France, Germany, Italy, Norway, Spain, and the USA. This book is the first volume in our series of publications on Academic Discourse (AD hereafter). The following pages are composed of selected papers from the conference and focus on different aspects and analytical frameworks of Academic Discourse. One of the motivations behind organizing the conference was to examine and expand research on AD in different languages. Another was to ask to what extent academic genres are culture-bound and language-specific, or primarily field- or domain-specific. For a long time now, research on AD has been mainly concerned with the use of English in different academic settings, mainly written contexts, at the expense of other languages; alternatively, the academic genre conventions of English and the English-speaking world have served as a basis for comparison with other languages and cultures. We consider this first volume a strong contribution to the spread of research on AD based on languages other than English: in this book, Finnish, French, Italian, Norwegian, and Romanian. All the following articles have a strong link with the French language: either French is constitutive of the AD corpora under examination, or the article was written in French. The structure of the book suggests, and provides evidence, that the concept of AD is understood and tackled to varying degrees by different scholars. Our first volume opens up the discussion on what AD is and supports the dissemination, overlap, and expansion of current research questions and methodologies.

The book is divided into three parts and contains four articles in English and six articles in French. The papers in parts one and two cover what we call the prototypical genre of written AD, i.e. the research article. Part one follows up on issues linked to the Research Article (RA hereafter). Kjersti Fløttum asks whether a typical RA exists and concentrates on authors' voices in RAs (self and other dimensions), whereas Didriksen and Gjesdal's article focuses on individual variation in the author's voice in RAs. The last article in this section, by Nadine Rentel, deals with evaluation in the writing of RAs. Part two concentrates on the teaching and learning of AD within foreign language learning, another more or less canonical genre of AD. Two aspects of writing are covered in the first two articles: foreign students' representations of rhetorical traditions (Hidden) and a contrastive assessment of written exercises in French and Finnish in higher education (Suzanne). The last contribution in this section moves away from traditional written forms and looks at how argumentation is constructed in students' oral presentations (Dervin and Fauveau). The last part of the book continues the extension by featuring four articles, written in French, exploring institutional and scientific discourses. The institutional discourses under scrutiny include the European Bologna Process (Galatanu) and Romanian reform texts (Moilanen). As for scientific discourses, the next paper in this section deconstructs an ideological discourse on the didactics of French as a foreign language (Pescheux). Finally, the last paper in part three reflects on varied forms of AD at university (Defays). We hope that this book will add some fuel to the continuing discussion of diverse forms of, and approaches to, AD in different languages and voices! Needless to say, with the current upsurge in academic mobility, reflection on crosscultural and crosslinguistic AD has only just begun.

Relevance: 80.00%

Abstract:

In the software industry, long and difficult development cycles can be eased by making use of software frameworks. A framework is a collection of classes that provide general solutions to the needs of a particular problem domain, freeing software developers to concentrate on application-specific requirements. Using well-designed frameworks increases the reusability of design solutions and source code more than any other design approach. Knowledge of a given domain can be stored in frameworks, from which finished software products can then be specialized. This master's thesis describes the design and implementation of a software framework based on software agents. The emphasis is on describing the design, corresponding to the requirements specification, and the implementation of a framework from which software capable of various kinds of data collection in an Internet environment can be specialized. The experimental part of the work also presents an example application based on the framework developed in the thesis.

Relevance: 80.00%

Abstract:

The main objective of this thesis was to analyze the usability of the registers and indexes of electronic marketplaces. The work focuses on UDDI-based electronic marketplaces, which are standardized by OASIS. UDDI registries can be used in intranets, extranets, and on the Internet, and their features can be used by both humans and machines. Using UDDI registries, Web services can be searched for in many ways, including alphabetical and domain-specific searches. The thesis covers the design principles, architectures, and specifications of UDDI registries. In addition, it includes the design and specifications of an electronic marketplace developed to support electronic logistics services.
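
For illustration, the sketch below issues a find_business call of the UDDI version 2 Inquiry API, which is carried as a SOAP request over HTTP; the registry URL is a placeholder, and error handling and authentication are omitted:

```python
# Hedged sketch of a UDDI v2 inquiry: find_business searches the registry
# for businesses by name. The endpoint below is hypothetical.
import urllib.request

INQUIRY_URL = "http://uddi.example.com/inquire"   # placeholder registry

body = """<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <find_business generic="2.0" xmlns="urn:uddi-org:api_v2">
      <name>Logistics</name>
    </find_business>
  </Body>
</Envelope>"""

req = urllib.request.Request(
    INQUIRY_URL, data=body.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": '""'})
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))   # businessList of matching entries
```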

Relevance: 80.00%

Abstract:

Reusability has become an increasingly important factor in modern software engineering, mainly because object-orientation has brought methods that make reuse easier. Today, more and more application developers think about how they can reuse existing applications in their work. A developer who wants to use existing components outside the current project can use design patterns, class libraries, or frameworks. These provide solutions to specific or general problems that have already been encountered. Application frameworks are collections of classes that provide a base for the developer. They are mostly implementation-phase tools, but can also be used in application design. The main purpose of a framework is to separate domain-specific functionality from application-specific functionality. Frameworks are usually divided into two categories, black box and white box; the difference between the categories is the way reuse is done. Application frameworks have properties that can be examined and compared between different frameworks: extensibility, reusability, modularity, and scalability. These describe how a framework handles different platforms, changes in the framework, increasing demand for resources, and so on. In general, application frameworks exhibit these properties to a good degree. When comparing a general-purpose framework with a more specific-purpose framework, the main difference lies in reusability, mainly because a framework designed for a specific domain may face constraints from external systems and resources; with a general-purpose framework these are set by the application developed on top of the framework.
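
To make the black-box/white-box distinction concrete, here is a minimal Python sketch (not taken from the thesis): white-box reuse extends the framework by inheritance, while black-box reuse plugs components into it by composition.

```python
# White-box: the framework supplies a skeleton; the developer subclasses it
# and overrides hook methods, so reuse requires knowing the internals.
class ReportFramework:
    def run(self):
        data = self.fetch()          # hook method, must be overridden
        print("report:", data)

    def fetch(self):
        raise NotImplementedError

class SalesReport(ReportFramework):  # white-box reuse via inheritance
    def fetch(self):
        return [("2024-Q1", 1200)]

# Black-box: the framework accepts a ready-made component through an
# interface, so reuse only requires knowing the interface.
class Pipeline:
    def __init__(self, source):      # black-box reuse via composition
        self.source = source

    def run(self):
        print("report:", self.source())

SalesReport().run()
Pipeline(lambda: [("2024-Q1", 1200)]).run()
```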

Relevance: 80.00%

Abstract:

Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web, and hence web users who rely on search engines alone are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters provided by a user via web search forms (or search interfaces) are not indexed by search engines and cannot be found in search results. Such search interfaces give web users online access to myriads of databases on the Web. In order to obtain some information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is the huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterization of the deep Web, finding and classifying deep web resources, and querying web databases.

Characterizing the deep Web: Though the term deep Web was coined in 2000, which is sufficiently long ago for any web-related concept or technology, we still do not know many important characteristics of the deep Web. Another matter of concern is that the surveys of the deep Web carried out so far are predominantly based on the study of deep web sites in English. One can then expect that findings from these surveys may be biased, especially owing to the steady increase in non-English web content. In this way, surveying national segments of the deep Web is of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from this national segment of the Web.

Finding deep web resources: The deep Web has been growing at a very fast pace. It has been estimated that there are hundreds of thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most of these approaches assume that the search interfaces to the web databases of interest have already been discovered and are known to the query systems. However, such assumptions do not hold, mostly because of the large scale of the deep Web: for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other approaches to the deep Web so far, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms.

Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user, all the more so as the interfaces of conventional search engines are also web forms. At present, a user needs to manually provide input values to search interfaces and then extract the required data from the result pages. Filling out forms manually is cumbersome and infeasible for complex queries, yet such queries are essential for many web searches, especially in the area of e-commerce. Automating the querying and retrieval of data behind search interfaces is therefore desirable and essential for tasks such as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts, and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. Finally, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
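
As a hedged sketch of the basic task being automated here, the following Python fragment discovers the fields of a web search form, fills them in, and fetches the result page. It uses the third-party requests and beautifulsoup4 packages; the URL and the "query" field name are hypothetical, and the thesis's own data model and query language are considerably richer.

```python
# Fill out and submit a web search form programmatically (simplified sketch).
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

page = requests.get("http://example.com/search")        # page with a form
form = BeautifulSoup(page.text, "html.parser").find("form")

# Collect the form's fields and any preset (hidden/default) values.
fields = {inp.get("name"): inp.get("value", "")
          for inp in form.find_all("input") if inp.get("name")}
fields["query"] = "domain-specific languages"           # user's query term

action = urljoin(page.url, form.get("action", ""))
if form.get("method", "get").lower() == "post":
    result = requests.post(action, data=fields)
else:
    result = requests.get(action, params=fields)

print(result.status_code, len(result.text))  # result page embeds the data
```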

Relevance: 80.00%

Abstract:

Software faults are expensive and cause serious damage, particularly if discovered late or not at all; some faults also tend to stay hidden. One goal of the thesis is to establish the status quo in the field of software fault elimination, since there are no recent surveys of the whole area. A structural framework is proposed as a basis for this unstructured field, paying attention to compatibility between studies and to how studies can be found. Means of bug elimination are surveyed, including bug know-how, defect prevention and prediction, analysis, testing, and fault tolerance. The most common research issues in each area are identified and discussed, along with issues that do not receive enough attention, and recommendations are presented for software developers, researchers, and teachers. Only the main lines of research are covered, and the main emphasis is on technical aspects. The survey was done by performing searches in the IEEE, ACM, Elsevier, and Inspec databases. In addition, a systematic search was done over recent time intervals of a few well-known related journals, and some other journals, conference proceedings, books, reports, and Internet articles were investigated as well. The following problems were found and solutions for them discussed. A common misunderstanding is that quality assurance consists of testing only, and many checks are carried out, and some methods applied, only in the late testing phase. Many types of static review are almost forgotten, even though they reveal faults that are hard to detect by other means. Other neglected areas are knowledge of bugs, awareness of continuously repeated bugs, and lightweight means of increasing reliability. Compatibility between studies is not always good, which also makes documents harder to understand. Some means, methods, and problems are considered method- or domain-specific when they are not. The field lacks cross-field research.

Relevance: 80.00%

Abstract:

The capabilities, and thus the design complexity, of VLSI-based embedded systems have increased tremendously in recent years, riding the wave of Moore's law. Time-to-market requirements are also shrinking, imposing challenges on designers, who in turn seek to adopt new design methods to increase their productivity. In answer to these new pressures, modern-day systems have moved towards on-chip multiprocessing technologies, and new on-chip multiprocessing architectures have emerged to exploit the tremendous advances in fabrication technology. Platform-based design is a possible solution to these challenges. The principle behind the approach is to separate the functionality of an application from the organization and communication architecture of the hardware platform at several levels of abstraction. Existing design methodologies for platform-based design do not provide full automation at every level of the design process, and the co-design of platform-based systems sometimes leads to sub-optimal systems. In addition, the design productivity gap in multiprocessor systems remains a key challenge under existing design methodologies. This thesis addresses these challenges and discusses the creation of a development framework for platform-based system design in the context of the SegBus platform, a distributed communication architecture. The research aims to provide automated procedures for platform design and application mapping. Structural verification support is also featured, thus ensuring correct-by-design platforms. The solution is based on a model-based process in which both the platform and the application are modeled using the Unified Modeling Language. The thesis develops a Domain Specific Language to support platform modeling, based on a corresponding UML profile, and Object Constraint Language (OCL) constraints are used to enforce structurally correct platform construction. An emulator is introduced to allow performance estimation of the solution that is as accurate as possible at high abstraction levels. VHDL code is automatically generated in the form of "snippets" to be employed in the arbiter modules of the platform, as required by the application. The resulting framework is applied in building an actual design solution for an MP3 stereo audio decoder application.
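
As a rough illustration of the kind of structural, correct-by-design rule that such OCL constraints enforce (the real constraints are written in OCL against the SegBus UML profile; the model classes and the device limit below are invented), consider this Python sketch:

```python
# Reject structurally invalid platform models before any code generation,
# in the spirit of OCL well-formedness constraints. Names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Segment:
    name: str
    devices: list = field(default_factory=list)

@dataclass
class Platform:
    segments: list = field(default_factory=list)
    MAX_DEVICES_PER_SEGMENT = 4     # assumed limit, for illustration only

    def check(self):
        """Structural well-formedness check over the whole platform model."""
        for seg in self.segments:
            assert seg.devices, f"{seg.name}: a segment must hold a device"
            assert len(seg.devices) <= self.MAX_DEVICES_PER_SEGMENT, \
                f"{seg.name}: too many devices on one segment"

p = Platform([Segment("seg0", ["decoder", "dma"]), Segment("seg1", ["dac"])])
p.check()
print("platform model passes structural checks")
```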

Relevance: 80.00%

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance: 80.00%

Abstract:

In this thesis, two negatively valenced emotions that reflect children's self-consciousness are approached, namely guilt and shame. Despite the notable role of emotions in psychological research, empirical findings on the links between guilt, shame, and children's social behavior, particularly aggression, have been modest, inconsistent, and sometimes contradictory. This thesis contains four studies on the associations of guilt, shame, emotion regulation, and social cognitions with children's social behavior. The longitudinal material was collected as a survey among a relatively large number of Finnish preadolescents. In Study I, the distinctiveness of guilt and shame in children's social behavior was investigated. The more specific links between emotions and aggressive behavior were explored in Study II, in which emotion regulation and negative emotionality were treated as moderators between guilt, shame, and children's aggressive behavior. The role of emotion management was further evaluated in Study III, in which effortful control and anger were treated as moderators between domain-specific aggressive cognitions and children's aggressive behavior. In the light of the results of Studies II and III, it seems that for children with poor emotion management the effects of emotions and social cognitions on aggressive behavior are straightforward, whereas effective emotion management allows for reframing the situation. Finally, in Study IV, context effects on children's anticipated emotions were evaluated: children were presented with a series of hypothetical vignettes in which the child acted as the aggressor, and the identity of the witnesses and the victim's reactions were systematically manipulated. Children anticipated the most shame in situations in which the whole class witnessed the aggressive act, whereas both guilt and shame were anticipated the most in situations in which the victim reacted with sadness. Girls and low-aggressive children were more sensitive to contextual cues than boys and high-aggressive children. Overall, the results of this thesis suggest that the influence of guilt, shame, and social cognition on preadolescents' aggressive behavior depends significantly on the nature of individual emotion regulation, as well as on situational contexts. Both the theoretical and practical implications of this study highlight the need to acknowledge that effective emotion management can also enable the justification of one's own immoral behavior.

Relevance: 30.00%

Abstract:

As the development of integrated circuit technology continues to follow Moore's law, the complexity of circuits increases exponentially. Traditional hardware description languages such as VHDL and Verilog are no longer powerful enough to cope with this level of complexity and do not provide facilities for hardware/software codesign. Languages such as SystemC are intended to solve these problems by combining the expressive power of high-level programming languages with the hardware-oriented facilities of hardware description languages. To fully replace the older languages in the design flow of digital systems, SystemC should also be synthesizable. The devices required by modern high-speed networks often share the same tight constraints on size, power consumption, and price with embedded systems, but also have very demanding real-time and quality-of-service requirements that are difficult to satisfy with general-purpose processors. Dedicated hardware blocks of an application-specific instruction set processor are one way to combine fast processing speed, energy efficiency, flexibility, and relatively low time-to-market. Common features can be identified in the network processing domain, making it possible to develop specialized but configurable processor architectures. One such architecture is TACO, which is based on the transport triggered architecture (TTA). The architecture offers a high degree of parallelism and modularity, and greatly simplified instruction decoding. For this M.Sc. (Tech.) thesis, a simulation environment for the TACO architecture was developed with SystemC 2.2, using an old version written with SystemC 1.0 as a starting point. The environment enables rapid design space exploration by providing facilities for hardware/software codesign and simulation, and an extendable library of automatically configured reusable hardware blocks. Other topics covered are the differences between SystemC 1.0 and 2.2 from the viewpoint of hardware modeling, and the compilation of a SystemC model into synthesizable VHDL with the Celoxica Agility SystemC Compiler. A simulation model of a processor for TCP/IP packet validation was designed and tested as a test case for the environment.
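
Purely as an illustration of the transport triggered idea underlying TACO (this is not the thesis's SystemC simulator), the Python sketch below shows how a TTA program consists only of data transports between function-unit ports, with computation triggered as a side effect of writing to a trigger port. This is what makes instruction decoding so simple: every instruction is just a move.

```python
class Adder:
    """A function unit: writing to the trigger port starts the operation."""
    def __init__(self):
        self.operand = 0
        self.result = 0

    def trigger(self, value):
        self.result = self.operand + value   # computation is a side effect

adder = Adder()
registers = {"r1": 40, "r2": 2, "r3": 0}

# A TTA "instruction" only moves data between ports and registers.
program = [
    ("r1", "adder.operand"),   # move r1 -> operand port
    ("r2", "adder.trigger"),   # move r2 -> trigger port (starts the add)
    ("adder.result", "r3"),    # move result port -> r3
]

for src, dst in program:
    value = registers[src] if src in registers else adder.result
    if dst == "adder.operand":
        adder.operand = value
    elif dst == "adder.trigger":
        adder.trigger(value)
    else:
        registers[dst] = value

print(registers["r3"])   # 42
```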

Relevance: 30.00%

Abstract:

Presentation by Jussi-Pekka Hakkarainen, held at the Emtacl15 conference on 20 April 2015 in Trondheim, Norway.

Relevance: 30.00%

Abstract:

Emerging technologies have recently challenged libraries to reconsider their role as a mere mediator between collections, researchers, and wider audiences (Sula, 2013), and libraries, especially nationwide institutions like national libraries, have not always managed to meet the challenge (Nygren et al., 2014). In the Digitization Project of Kindred Languages, the National Library of Finland has become a node that connects the partners, enabling them to interact and work towards shared goals and objectives. In this paper, I draw a picture of the crowdsourcing methods that have been established during the project to support both linguistic research and linguistic diversity.

The National Library of Finland has been executing the Digitization Project of Kindred Languages since 2012. The project seeks to digitize and publish approximately 1,200 monograph titles and more than 100 newspaper titles in various, in some cases endangered, Uralic languages. Once the digitization has been completed in 2015, the Fenno-Ugrica online collection will consist of 110,000 monograph pages and around 90,000 newspaper pages, to which all users will have open access regardless of their place of residence. The majority of the digitized literature was originally published in the 1920s and 1930s in the Soviet Union, the genesis and consolidation period of these literary languages. This was the era when many Uralic languages were converted into media of popular education, enlightenment, and dissemination of information pertinent to the developing political agenda of the Soviet state. The 'deluge' of popular literature in the 1920s and 1930s suddenly challenged the lexical and orthographic norms of the limited ecclesiastical publications from the 1880s onward. Newspapers were now written in orthographies and in word forms that the locals would understand. Textbooks were written to address the separate needs of both adults and children. New concepts were introduced in the languages. This was the beginning of a renaissance and period of enlightenment (Rueter, 2013). The linguistically oriented population can also find writings to their delight, especially lexical items specific to a given publication and orthographically documented specifics of phonetics.

The project is financially supported by the Kone Foundation in Helsinki and is part of the Foundation's Language Programme. One of the key objectives of the Kone Foundation Language Programme is to support a culture of openness and interaction in linguistic research, but also to promote citizen science as a tool for the participation of the language community in research. In addition to sharing this aspiration, our objective within the Language Programme is to make sure that old and new corpora in Uralic languages are made available for the open and interactive use of the academic community as well as the language communities. Wordlists are available in 17 languages, but without tokenization, lemmatization, and so on. This approach was verified with the scholars, and we consider the wordlists raw data for linguists. Our data is used for creating morphological analyzers and online dictionaries at the Universities of Helsinki and Tromsø, for instance. In order to reach these targets, we will produce not only the digitized materials but also development tools to support linguistic research and citizen science. The Digitization Project of Kindred Languages is thus linked with language technology research. The mission is to improve the usage and usability of digitized content.

During the project, we have advanced methods that will refine the raw data for further use, especially in linguistic research. How does the library meet objectives which appear to be beyond its traditional playground? The written materials from this period are a gold mine, so how could we retrieve these hidden treasures of languages out of a stack that contains more than 200,000 pages of literature in various Uralic languages? The problem is that the machine-encoded text (OCR output) often contains too many mistakes to be used as such in research, so the mistakes in the OCRed texts must be corrected. To enhance the OCRed texts, the National Library of Finland developed an open-source OCR editor that enables the editing of machine-encoded text for the benefit of linguistic research. It was necessary to implement this tool, since these rare and peripheral prints often include characters that have since fallen out of use and are neglected by modern OCR software developers, but that belong to the historical context of the kindred languages and are thus an essential part of the linguistic heritage (van Hemel, 2014). Our crowdsourcing tool is essentially an editor of the ALTO XML format. It consists of a back-end for managing users, permissions, and files, communicating through a REST API with a front-end interface, that is, the actual editor for correcting the OCRed text. The enhanced XML files can be retrieved from the Fenno-Ugrica collection for further purposes.

Could the crowd do this work to support academic research? The challenge of crowdsourcing lies in its nature. In traditional crowdsourcing, the targets have often been split into several microtasks that do not require any special skills from the anonymous people, a faceless crowd. This way of crowdsourcing may produce quantitative results, but from the research point of view there is a danger that the needs of linguists are not met. Another notable downside is the lack of a shared goal or social affinity; there is no reward in the traditional methods of crowdsourcing (de Boer et al., 2012). There has also been criticism that digital humanities makes the humanities too data-driven and oriented towards quantitative methods, losing the values of critical qualitative methods (Fish, 2012). On top of that, the downsides of traditional crowdsourcing become more apparent once one leaves the Anglophone world. Our potential crowd is geographically scattered across Russia and linguistically heterogeneous, speaking 17 different languages. In many cases the languages are close to extinction or in need of revitalization, and the native speakers do not always have Internet access, so an open call for crowdsourcing would not have produced satisfactory results for linguists. Thus, one has to identify carefully the potential niches in which the needed tasks can be completed.

When using the help of a crowd in a project that aims to support both linguistic research and the survival of endangered languages, the approach has to be different. In nichesourcing, the tasks are distributed amongst a small crowd of citizen scientists (communities). Although communities provide smaller pools from which to draw resources, their specific richness in skill suits the complex tasks and high-quality product expectations found in nichesourcing. Communities have a purpose and identity, and their regular interaction engenders social trust and reputation. These communities can correspond to research needs more precisely (de Boer et al., 2012). Instead of assigning repetitive and rather trivial tasks, we are trying to utilize the knowledge and skills of citizen scientists to obtain qualitative results. In nichesourcing, we hand out assignments that precisely fill the gaps in linguistic research. A typical task would be editing and collecting words in those fields of vocabulary where the researchers require more information. For instance, there is a lack of Hill Mari words and terminology in anatomy; we have digitized books in medicine, and we could try to track down the words related to human organs by assigning citizen scientists to edit and collect words with the OCR editor. From the nichesourcing perspective, it is essential that altruism play a central role when the language communities are involved. Our goal is to reach a certain level of interplay in which the language communities benefit from the results. For instance, the corrected words in Ingrian will be added to an online dictionary that is made freely available to the public, so that society can benefit, too. This objective of interplay can be understood as an aspiration to support endangered languages and the maintenance of linguistic diversity, but also as serving 'two masters': research and society.
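
As a hedged illustration of the kind of correction the OCR editor supports (the actual editor is a web application with a REST back-end; the fragment below is simplified and omits the namespaces and layout data of real ALTO files), each recognized word in ALTO is a String element whose CONTENT attribute holds the OCRed text:

```python
# Correct one OCR error in a simplified ALTO XML fragment.
import xml.etree.ElementTree as ET

ALTO_SAMPLE = """<alto>
  <TextLine>
    <String CONTENT="Hi11" HPOS="120" VPOS="80"/>
    <String CONTENT="Mari" HPOS="160" VPOS="80"/>
  </TextLine>
</alto>"""

root = ET.fromstring(ALTO_SAMPLE)
for string_el in root.iter("String"):
    if string_el.get("CONTENT") == "Hi11":   # OCR error: 'll' read as '11'
        string_el.set("CONTENT", "Hill")     # the human correction

ET.dump(root)   # corrected ALTO fragment, ready to be stored
```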