984 results for Lexical inference
Abstract:
The present dissertation examined reading development during elementary school years by means of eye movement tracking. Three different but related issues in this field were assessed. First of all, the development of parafoveal processing skills in reading was investigated. Second, it was assessed whether and to what extent sublexical units such as syllables and morphemes are used in processing Finnish words and whether the use of these sublexical units changes as a function of reading proficiency. Finally, the developmental trend in the speed of visual information extraction during reading was examined. With regard to parafoveal processing skills, it was shown that 2nd graders extract letter identity information approx. 5 characters to the right of fixation, 4th graders approx. 7 characters to the right of fixation, and 6th graders and adults approx. 9 characters to the right of fixation. Furthermore, it was shown that all age groups extract more parafoveal information within compound words than across adjective-noun pairs of similar length. In compounds, parafoveal word information can be extracted in parallel with foveal word information if the compound in question is of high frequency. With regard to the use of sublexical units in Finnish word processing, it was shown that less proficient 2nd graders use both syllables and morphemes in the course of lexical access. More proficient 2nd graders as well as older readers seem to process words more holistically. Finally, it was shown that 60 ms is enough for 4th graders and adults to extract visual information from both 4-letter and 8-letter words, whereas 2nd graders clearly needed more than 60 ms to extract all information from 8-letter words for processing to proceed smoothly. The present dissertation demonstrates that Finnish 2nd graders develop their reading skills rapidly and are already at an adult level in some aspects of reading.
This is not to say that there are no differences between less proficient (e.g., 2nd graders) and more proficient readers (e.g., adults), but in some respects it seems that the visual system used in extracting information from the text has matured by the 2nd grade. Furthermore, the present dissertation demonstrates that the allocation of attention in reading depends heavily on textual properties such as word frequency and whether words are spatially unified (as in compounds) or not. This flexibility of the attentional system needs to be captured in word processing models. Finally, individual differences within age groups are quite substantial, but it seems that by the end of the 2nd grade practically all Finnish children have reached a reasonable level of reading proficiency.
Abstract:
Systematic designs stand out for their compactness and breadth and for allowing a larger number of possible spacings to be tested. However, they are not used because of the systematic (non-randomized) arrangement of the plants and their high sensitivity to missing values. The objective of this work was to describe the geostatistical model and associated inference methods in the context of analyzing non-randomized experiments, reporting applied results for identifying spatial dependence in a particular fan-type systematic design experiment with Eucalyptus dunnii. Different alternatives for handling missing data that might arise from gaps and/or plant mortality were also proposed, analyzed, and compared. The data were analyzed under three models that differed, through covariates, in how they handled missing data. For each of these, a semivariogram was constructed and three correlation function models were fitted, with the parameters estimated by maximum likelihood and the models selected by the Akaike criterion. These models, with and without the spatial component, were compared by the likelihood ratio test. According to the results: (1) the covariates interacted positively with the response variable, preventing collected data from being wasted; (2) the comparison of the models with and without the spatial component did not confirm the existence of dependence; (3) incorporating the spatial dependence structure into the observational models restored the ability to make valid inferences in the absence of randomization, making it possible to work around operational problems and thus ensuring that the data can be subjected to a classical analysis.
Abstract:
In this work, the Curtis model was considered for the hypsometric relationship in clones of Eucalyptus sp., with the parameters subject to constraints. To make inferences about the parameters of the constrained model, a Bayesian approach was used with an empirically constructed prior density. The Bayesian estimates are computed with the Markov chain Monte Carlo (MCMC) simulation technique. The proposed method was applied to different real data sets, of which five were selected to illustrate the results. These were compared with the results obtained by the least squares method, highlighting the superiority of the proposed Bayesian approach.
Abstract:
The objective of this research was to evaluate the housing environment, estimating the conditions favorable to the best performance of pregnant sows. The experiment was carried out from 4 January to 11 March 2005 on an industrial swine production farm located in the municipality of Elias Fausto, SP. The research was conducted in the gestation sector with 24 primiparous sows: 12 females housed in individual stalls (T1) and 12 in group pens (T2). The work was divided into two stages according to how the data were evaluated: bioclimatic and air quality analysis, and estimation of environmental thermal comfort standards. The bioclimatic variables T (ºC), RH (%), and Tgn (ºC) and the physiological variables, respiratory rate (breaths min-1) and rectal temperature (ºC), point to the group-pen confinement system as the one that provided better natural thermal conditioning for the pregnant sows. The use of fuzzy set theory made it possible to draw inferences between the data resulting from the experimental work and those established in the literature, by means of a rule base, to determine the environmental comfort applied to sows in the gestation phase.
Abstract:
This work presents a study of the influence of different roofing materials on the thermal comfort of facilities for raising broiler chickens. The research was carried out at the UNESP Experimental Campus in Dracena, SP. Four full-scale prototypes were built, each with an area of 28 m², roofed with recycled tiles made from long-life carton packaging, ceramic tiles, ceramic tiles painted white, and fiber-cement tiles. Data were collected during the winter of 2007, totaling 90 days. From these data, the thermal comfort index Radiant Thermal Load (CTR) and the environmental variable (Ta) were calculated. Inferential and descriptive statistical analyses were performed on the values of the thermal comfort index and the environmental variable. Based on the results, it can be stated that the recycled tile showed thermal comfort indices similar to those found for the ceramic tiles. The prototype roofed with fiber-cement tiles showed the highest indices, and the one roofed with white ceramic tiles the lowest. However, for the winter period and the times of day evaluated, all facilities showed thermal comfort indices outside the thermoneutral zone of broiler chickens.
Abstract:
Given the high degree of mechanization to which agricultural activities are being subjected, this research aimed to develop a fuzzy model capable of evaluating and classifying the level of unhealthiness in various work environments. The input variables of the model are the wet bulb globe temperature index (IBUTG, °C), the noise level (dBA), the metabolic rate (W m-2), and the rest time (%); the output variable is the human well-being index (IBEH). The inference method used was Mamdani's, and the center of gravity method was used for defuzzification. The rule system was developed from the combinations of the input variables. A total of 400 rules were defined, each with a weight of 1, and an expert in the field was consulted in drawing them up. Field data were used to test the system, and the results showed that the proposed modeling is a promising tool for determining the IBEH, with the ideal rest time ranging from 64.2% (chainsaw, close to the operator's ear) to 25% (derriçadora, a portable harvester, 20 m from the operator). Given a predefined scenario of the thermal and acoustic environment, it was possible to determine the degree of human well-being and the ideal rest time for each piece of equipment evaluated.
Abstract:
A fuzzy inference system was developed from literature data to predict the feed intake, weight gain, and feed conversion of broiler chickens aged 1 to 21 days subjected to different thermal conditions. The fuzzy system was structured around three input variables: bird age (weeks), ambient temperature (°C), and ambient relative humidity (%); the output variables considered were weight gain, feed intake, and feed conversion. Inference was performed with the Mamdani method, which involved drawing up 45 rules, and defuzzification with the center of gravity method. Comparing the literature data with those obtained by the proposed fuzzy system showed satisfactory performance in predicting the response variables, with R² on the order of 0.995, 0.998, and 0.976, respectively. The weight gain predicted by fuzzy logic was validated with experimental field data, yielding R² = 0.975, which shows great potential for use in automated climate control systems.
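The Mamdani inference and center-of-gravity defuzzification named in the fuzzy-system abstracts above can be illustrated with a minimal sketch. It is not the authors' system: the two inputs, the triangular membership functions, the three rules, and the 0-100 output score are all hypothetical values chosen only to show the mechanics (minimum for AND, clipping of each consequent, maximum for aggregation, discretized centroid).

```python
# Minimal Mamdani sketch. Every membership parameter and rule is hypothetical.

def tri(x, a, b, c):
    """Triangular membership function: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

TEMP = {"mild": (15, 24, 33), "hot": (27, 36, 45)}          # degrees C
HUMID = {"moderate": (30, 55, 80), "high": (60, 85, 110)}   # percent
SCORE = {"low": (0, 25, 50), "high": (50, 75, 100)}         # output, 0-100

# Each rule reads: IF temperature is T AND humidity is H THEN score is S.
RULES = [
    ("mild", "moderate", "high"),
    ("hot", "moderate", "low"),
    ("hot", "high", "low"),
]

def mamdani(temp, humid, steps=1000):
    """Return the defuzzified score, or None when no rule fires."""
    # Firing strength of each rule: AND is taken as the minimum.
    fired = [
        (min(tri(temp, *TEMP[t]), tri(humid, *HUMID[h])), out)
        for t, h, out in RULES
    ]
    num = den = 0.0
    for i in range(steps + 1):
        y = 100 * i / steps
        # Clip each consequent at its firing strength, aggregate with max.
        mu = max(min(w, tri(y, *SCORE[out])) for w, out in fired)
        num += y * mu
        den += mu
    # Center of gravity of the aggregated output set.
    return num / den if den > 0 else None

print(mamdani(24, 55))
print(mamdani(30, 55))
```

With these assumed sets, an input at the peak of "mild" and "moderate" fires only the first rule, so the result sits at the center of the "high" output set; an input halfway between "mild" and "hot" fires two rules equally and the result falls midway between the two output sets.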
Abstract:
OBJECTIVE: to evaluate risk factors for vulvovaginal candidiasis identified at gynecological examination and history-taking, in a convenience sample. METHOD: cross-sectional study with a convenience sample involving all female workers (135) of a garment factory in Criciúma (SC), symptomatic and asymptomatic, from July to September 2002. Data were collected through a scripted interview investigating possible risk factors. The gynecological examination detected the presence or absence of clinical signs of vulvovaginitis. Vaginal secretion was cultured on Sabouraud agar to isolate Candida sp. The data were processed and analyzed with Epi-Info, version 6.0. The measure of strength of association used was the prevalence ratio. The confidence interval adopted for statistical inference was 95%. Multivariate analysis of the data was performed with SPSS version 10.0, using a logistic regression model. RESULTS: the prevalence of vulvovaginal candidiasis was 19.3%. The frequency of vulvovaginitis diagnosed by clinical examination was 17%, with a sensitivity of 38% and a specificity of 88%. The significant risk factor for Candida vulvovaginitis in this population was the presence of regular menstrual cycles, and for clinical vulvovaginitis it was hormone use and age between 25 and 34 years. CONCLUSION: the prevalence of vulvovaginal candidiasis is high among women considered healthy, and the risk factor found with statistical significance was the presence of regular menstrual cycles, reinforcing the importance of a possible relationship between the hormonal cycle and this infection. Given the limitations of the present study, this possible association, along with others, should be studied in a future cohort design with a sample of appropriate size and measurements of hormone levels throughout the menstrual cycle.
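The prevalence, sensitivity, and specificity figures in the abstract above follow from a 2×2 table of clinical diagnosis against culture result. The abstract reports only percentages, so the cell counts in this sketch are an assumption: they are one set of whole numbers, back-calculated here for 135 women, that happens to reproduce the reported values. They are not data from the study.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Summarize a 2x2 table of clinical diagnosis versus culture result.

    tp: clinically positive and culture positive
    fn: clinically negative but culture positive
    fp: clinically positive but culture negative
    tn: clinically negative and culture negative
    """
    total = tp + fn + fp + tn
    return {
        "prevalence": (tp + fn) / total,          # culture-positive share
        "clinical_frequency": (tp + fp) / total,  # clinically positive share
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts (an assumption, not reported in the abstract).
summary = diagnostic_summary(tp=10, fn=16, fp=13, tn=96)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, 26 of 135 women are culture positive and 23 are clinically positive, which is consistent with the reported prevalence of 19.3% and clinical frequency of 17%.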
Abstract:
This study examines the structure of the Russian Reflexive Marker (-ся/-сь) and offers a usage-based model building on Construction Grammar and a probabilistic view of linguistic structure. Traditionally, reflexive verbs are accounted for relative to non-reflexive verbs. These accounts assume that linguistic structures emerge as pairs. Furthermore, these accounts assume directionality where the semantics and structure of a reflexive verb can be derived from the non-reflexive verb. However, this directionality does not necessarily hold diachronically. Additionally, the semantics and the patterns associated with a particular reflexive verb are not always shared with the non-reflexive verb. Thus, a model is proposed that can accommodate the traditional pairs as well as the possible deviations without postulating different systems. A random sample of 2000 instances marked with the Reflexive Marker was extracted from the Russian National Corpus and the sample used in this study contains 819 unique reflexive verbs. This study moves away from the traditional pair account and introduces the concept of Neighbor Verb. A neighbor verb exists for a reflexive verb if they share the same phonological form excluding the Reflexive Marker. It is claimed here that the Reflexive Marker constitutes a system in Russian and the relation between the reflexive and neighbor verbs constitutes a cross-paradigmatic relation. Furthermore, the relation between the reflexive and the neighbor verb is argued to be of symbolic connectivity rather than directionality. Effectively, the relation holding between particular instantiations can vary. The theoretical basis of the present study builds on this assumption. Several new variables are examined in order to systematically model variability of this symbolic connectivity, specifically the degree and strength of connectivity between items. In usage-based models, the lexicon does not constitute an unstructured list of items.
Instead, items are assumed to be interconnected in a network. This interconnectedness is defined as Neighborhood in this study. Additionally, each verb carves its own niche within the Neighborhood and this interconnectedness is modeled through rhyme verbs constituting the degree of connectivity of a particular verb in the lexicon. The second component of the degree of connectivity concerns the status of a particular verb relative to its rhyme verbs. The connectivity within the neighborhood of a particular verb varies and this variability is quantified by using the Levenshtein distance. The second property of the lexical network is the strength of connectivity between items. Frequency of use has been one of the primary variables in functional linguistics used to probe this. In addition, a new variable called Constructional Entropy is introduced in this study building on information theory. It is a quantification of the amount of information carried by a particular reflexive verb in one or more argument constructions. The results of the lexical connectivity indicate that the reflexive verbs have statistically greater neighborhood distances than the neighbor verbs. This distributional property can be used to motivate the traditional observation that the reflexive verbs tend to have idiosyncratic properties. A set of argument constructions, generalizations over usage patterns, are proposed for the reflexive verbs in this study. In addition to the variables associated with the lexical connectivity, a number of variables proposed in the literature are explored and used as predictors in the model. The second part of this study introduces the use of a machine learning algorithm called Random Forests. The performance of the model indicates that it is capable, up to a degree, of disambiguating the proposed argument construction types of the Russian Reflexive Marker. Additionally, a global ranking of the predictors used in the model is offered. 
Finally, most construction grammars assume that argument constructions form a network structure. A new method is proposed that establishes generalization over the argument constructions referred to as Linking Construction. In sum, this study explores the structural properties of the Russian Reflexive Marker and a new model is set forth that can accommodate both the traditional pairs and potential deviations from them in a principled manner.
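Two of the quantities named in the abstract above can be sketched briefly: the Levenshtein distance used to quantify connectivity within a verb's neighborhood, and an entropy over argument constructions. The abstract does not give the formula for Constructional Entropy, so the second function is an assumption: it uses plain Shannon entropy over how often a verb occurs in each construction. The verb pair and the counts are illustrative only.

```python
import math

def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def shannon_entropy(counts):
    """Entropy in bits of a frequency distribution (assumed stand-in for
    Constructional Entropy; the source does not state its definition)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# A reflexive verb and its neighbor verb differ only by the marker.
print(levenshtein("мыть", "мыться"))

# Hypothetical counts of one verb across three argument constructions.
print(shannon_entropy([6, 3, 1]))
```

A verb attested in a single construction has zero entropy under this reading, while a verb spread evenly over several constructions has the maximum.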
Abstract:
Weed population dynamics can be represented by a system of equations relating the densities of seeds produced and of seedlings in cropping areas. The values of the model parameters can be inferred directly from experimentation and statistical analysis or taken from the literature. The objective of the present work was to estimate the parameters of the weed population density model from an experiment conducted in the experimental area of Embrapa Milho e Sorgo, Sete Lagoas, MG, using classical and Bayesian inference procedures.
Abstract:
Aquatic plants play a fundamental role in the balance of ecosystems, but their unbalanced growth can obstruct channels, dams, and reservoirs and affect multiple uses of water. For submerged aquatic plants, applying control measures becomes more complex because of the difficulty of mapping and volumetrically quantifying the colonized areas. In these situations, hydroacoustic data are considered to enable the mapping and measurement of these areas, supporting the development of sustainable management proposals for this type of aquatic vegetation. Accordingly, the present work used acoustic data and the kriging technique to perform spatial inference of the biovolume of submerged aquatic plants. The data were obtained in three echo-sounding surveys carried out in a study area on the Paraná River, characterized by conditions favorable to the proliferation of submerged aquatic vegetation and by difficult navigation. To delimit the areas characterized by the presence of submerged aquatic plants, a WorldView-2 multispectral image of high spatial resolution was used. The biovolume of submerged aquatic plants in the areas where the phenomenon occurs was mapped by inferring biovolume through kriging and slicing the inferred values into 15% intervals. From the resulting map, it was possible to identify the sites with the highest concentration of submerged macrophytes, with biovolume values predominantly in the 15-30% and 30-45% ranges, confirming the feasibility of using kriging for spatial inference of biovolume from georeferenced echo-sounding measurements with the support of a high-spatial-resolution image.
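The slicing step mentioned in the abstract above, where inferred biovolume values are grouped into 15% intervals, can be sketched as follows. The kriging itself is not implemented here. The input values are hypothetical, and the half-open boundary convention (a value of exactly 30 falls in 30-45%) is an assumption, since the abstract does not say how interval edges were treated.

```python
from collections import Counter

def slice_biovolume(value, width=15):
    """Return the label of the interval containing a biovolume percentage."""
    if not 0 <= value <= 100:
        raise ValueError("biovolume must be between 0 and 100 percent")
    lower = int(value // width) * width
    upper = min(lower + width, 100)  # the last class is cut off at 100%
    return f"{lower}-{upper}%"

# Hypothetical kriged biovolume estimates at a few grid nodes.
estimates = [12.0, 18.5, 22.5, 30.0, 41.2, 44.9, 63.0]
classes = Counter(slice_biovolume(v) for v in estimates)
for label, count in sorted(classes.items(), key=lambda kv: int(kv[0].split("-")[0])):
    print(label, count)
```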
Abstract:
Can crowdsourcing solutions serve many masters? Can they be beneficial for both, for the layman or native speakers of minority languages on the one hand and serious linguistic research on the other? How did an infrastructure that was designed to support linguistics turn out to be a solution for raising awareness of native languages? Since 2012 the National Library of Finland has been developing the Digitisation Project for Kindred Languages, in which the key objective is to support a culture of openness and interaction in linguistic research, but also to promote crowdsourcing as a tool for participation of the language community in research. In the course of the project, over 1,200 monographs and nearly 111,000 pages of newspapers in Finno-Ugric languages will be digitised and made available in the Fenno-Ugrica digital collection. This material was published in the Soviet Union in the 1920s and 1930s, and users have had only sporadic access to the material. The publication of open-access and searchable materials from this period is a goldmine for researchers. Historians, social scientists and laymen with an interest in specific local publications can now find text materials pertinent to their studies. The linguistically-oriented population can also find writings to delight them: (1) lexical items specific to a given publication, and (2) orthographically-documented specifics of phonetics. In addition to the open access collection, we developed an open source code OCR editor that enables the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary since these rare and peripheral prints often include already archaic characters, which are neglected by modern OCR software developers but belong to the historical context of kindred languages, and are thus an essential part of the linguistic heritage. 
When modelling the OCR editor, it was essential to consider both the needs of researchers and the capabilities of lay citizens, and to have them participate in the planning and execution of the project from the very beginning. By implementing the feedback from both groups iteratively, it was possible to turn the requested changes into research tools that not only supported the work of linguists but also encouraged the citizen scientists to face the challenge and work with the crowdsourcing tools for the benefit of research. This presentation will not only deal with the technical aspects, developments and achievements of the infrastructure but will also highlight the way in which user groups, researchers and lay citizens were engaged in the process as an active and communicative group of users and how their contributions were made to mutual benefit.
Abstract:
Novel word learning has been rarely studied in people with aphasia (PWA), although it can provide a relatively pure measure of their learning potential, and thereby contribute to the development of effective aphasia treatment methods. The main aim of the present thesis was to explore the capacity of PWA for associative learning of word–referent pairings and cognitive-linguistic factors related to it. More specifically, the thesis examined learning and long-term maintenance of the learned pairings, the role of lexical-semantic abilities in learning as well as acquisition of phonological versus semantic information in associative novel word learning. Furthermore, the effect of modality on associative novel word learning and the neural underpinnings of successful learning were explored. The learning experiments utilized the Ancient Farming Equipment (AFE) paradigm that employs drawings of unfamiliar referents and their unfamiliar names. Case studies of Finnish- and English-speaking people with chronic aphasia (n = 6) were conducted in the investigation. The learning results of PWA were compared to those of healthy control participants, and active production of the novel words and their semantic definitions were used as learning outcome measures. PWA learned novel word–novel referent pairings, but the variation between individuals was very wide, from more modest outcomes (Studies I–II) up to levels on a par with healthy individuals (Studies III–IV). In incidental learning of semantic definitions, none of the PWA reached the performance level of the healthy control participants. Some PWA maintained part of the learning outcomes up to months post-training, and one individual showed full maintenance of the novel words at six months post-training (Study IV). Intact lexical-semantic processing skills promoted learning in PWA (Studies I–II) but poor phonological short-term memory capacities did not rule out novel word learning.
In two PWA with successful learning and long-term maintenance of novel word–novel referent pairings, learning relied on orthographic input while auditory input led to significantly inferior learning outcomes (Studies III–IV). In one of these individuals, this previously undetected modality-specific learning ability was successfully translated into training with familiar but inaccessible everyday words (Study IV). Functional magnetic resonance imaging revealed that this individual had a disconnected dorsal speech processing pathway in the left hemisphere, but a right-hemispheric neural network mediated successful novel word learning via reading. Finally, the results of Study III suggested that the cognitive-linguistic profile may not always predict the optimal learning channel for an individual with aphasia. Small-scale learning probes therefore seem useful in revealing functional learning channels in post-stroke aphasia.
Abstract:
The topic of the present doctoral dissertation is the analysis of the phonological and tonal structures of a previously largely undescribed language, namely Samue. It is a Gur language of the Niger-Congo phylum and is spoken in Burkina Faso. The data were collected during a fieldwork period in a Sama village; they include 1800 lexical items, thousands of elicited sentences and 30 oral texts. The data were first transcribed phonetically and then the phonological and tonal analyses were conducted. The results show that the phonological system of Samue, with its phoneme inventory and phonological processes, has the same characteristics as other related Gur languages, although some particularities were found, such as the voicing and lenition of stop consonants in medial positions. Tonal analysis revealed three level tones, which have both lexical and grammatical functions. A particularity of the tonal system is the regressive Mid tone spreading in the verb phrase. The theoretical framework used in the study is Optimality Theory. Optimality Theory is rarely used in the analysis of an entire language system, and thus an objective was to see whether the theory was applicable to this type of work. Within the tonal analysis especially, some language-specific constraints had to be created, although the basic Optimality Theory principle is the universal nature of the constraints. These constraints define the well-formedness of the language structures and they are ranked differently in different languages. This study gives new insights into typological phenomena in Gur languages. It is also a fundamental starting point for the Samue language in relation to the establishment of an orthography. From the theoretical point of view, the study shows that Optimality Theory is largely applicable in the analysis of an entire sound system.
Abstract:
The emerging technologies have recently challenged the libraries to reconsider their role as a mere mediator between the collections, researchers, and wider audiences (Sula, 2013), and libraries, especially the nationwide institutions like national libraries, haven’t always managed to face the challenge (Nygren et al., 2014). In the Digitization Project of Kindred Languages, the National Library of Finland has become a node that connects the partners to interplay and work for shared goals and objectives. In this paper, I will be drawing a picture of the crowdsourcing methods that have been established during the project to support both linguistic research and lingual diversity. The National Library of Finland has been executing the Digitization Project of Kindred Languages since 2012. The project seeks to digitize and publish approximately 1,200 monograph titles and more than 100 newspaper titles in various, and in some cases endangered, Uralic languages. Once the digitization has been completed in 2015, the Fenno-Ugrica online collection will consist of 110,000 monograph pages and around 90,000 newspaper pages to which all users will have open access regardless of their place of residence. The majority of the digitized literature was originally published in the 1920s and 1930s in the Soviet Union, and it was the genesis and consolidation period of literary languages. This was the era when many Uralic languages were converted into media of popular education, enlightenment, and dissemination of information pertinent to the developing political agenda of the Soviet state. The ‘deluge’ of popular literature in the 1920s to 1930s suddenly challenged the lexical orthographic norms of the limited ecclesiastical publications from the 1880s onward. Newspapers were now written in orthographies and in word forms that the locals would understand. Textbooks were written to address the separate needs of both adults and children. New concepts were introduced in the language.
This was the beginning of a renaissance and period of enlightenment (Rueter, 2013). The linguistically oriented population can also find writings to their delight, especially lexical items specific to a given publication, and orthographically documented specifics of phonetics. The project is financially supported by the Kone Foundation in Helsinki and is part of the Foundation’s Language Programme. One of the key objectives of the Kone Foundation Language Programme is to support a culture of openness and interaction in linguistic research, but also to promote citizen science as a tool for the participation of the language community in research. In addition to sharing this aspiration, our objective within the Language Programme is to make sure that old and new corpora in Uralic languages are made available for the open and interactive use of the academic community as well as the language societies. Wordlists are available in 17 languages, but without tokenization, lemmatization, and so on. This approach was verified with the scholars, and we consider the wordlists as raw data for linguists. Our data is used for creating the morphological analyzers and online dictionaries at the Helsinki and Tromsø Universities, for instance. In order to reach the targets, we will produce not only the digitized materials but also development tools for supporting linguistic research and citizen science. The Digitization Project of Kindred Languages is thus linked with the research of language technology. The mission is to improve the usage and usability of digitized content. During the project, we have advanced methods that will refine the raw data for further use, especially in linguistic research. How does the library meet the objectives, which appear to be beyond its traditional playground?
The written materials from this period are a gold mine, so how could we retrieve these hidden treasures of languages out of the stack that contains more than 200,000 pages of literature in various Uralic languages? The problem is that the machine-encoded text (OCR) often contains too many mistakes to be used as such in research. The mistakes in OCRed texts must be corrected. For enhancing the OCRed texts, the National Library of Finland developed an open-source OCR editor that enabled the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary to implement, since these rare and peripheral prints often included characters that have since fallen out of use, which are sadly neglected by the modern OCR software developers, but belong to the historical context of kindred languages and thus are an essential part of the linguistic heritage (van Hemel, 2014). Our crowdsourcing tool application is essentially an editor of Alto XML format. It consists of a back-end for managing users, permissions, and files, communicating through a REST API with a front-end interface, that is, the actual editor for correcting the OCRed text. The enhanced XML files can be retrieved from the Fenno-Ugrica collection for further purposes. Could the crowd do this work to support the academic research? The challenge in crowdsourcing lies in its nature. The targets in the traditional crowdsourcing have often been split into several microtasks that do not require any special skills from the anonymous people, a faceless crowd. This way of crowdsourcing may produce quantitative results, but from the research’s point of view, there is a danger that the needs of linguists are not necessarily met. Also, the remarkable downside is the lack of shared goal or the social affinity. There is no reward in the traditional methods of crowdsourcing (de Boer et al., 2012).
Also, there has been criticism that digital humanities makes the humanities too data-driven and oriented towards quantitative methods, losing the values of critical qualitative methods (Fish, 2012). And on top of that, the downsides of the traditional crowdsourcing become more imminent when you leave the Anglophone world. Our potential crowd is geographically scattered in Russia. This crowd is linguistically heterogeneous, speaking 17 different languages. In many cases languages are close to extinction or longing for language revitalization, and the native speakers do not always have Internet access, so an open call for crowdsourcing would not have produced satisfactory results for linguists. Thus, one has to identify carefully the potential niches to complete the needed tasks. When using the help of a crowd in a project that is aiming to support both linguistic research and survival of endangered languages, the approach has to be a different one. In nichesourcing, the tasks are distributed amongst a small crowd of citizen scientists (communities). Although communities provide smaller pools to draw resources, their specific richness in skill is suited for complex tasks with high-quality product expectations found in nichesourcing. Communities have a purpose and identity, and their regular interaction engenders social trust and reputation. These communities can correspond to research more precisely (de Boer et al., 2012). Instead of repetitive and rather trivial tasks, we are trying to utilize the knowledge and skills of citizen scientists to provide qualitative results. In nichesourcing, we hand in such assignments that would precisely fill the gaps in linguistic research. A typical task would be editing and collecting the words in such fields of vocabularies where the researchers do require more information. For instance, there is a lack of Hill Mari words and terminology in anatomy.
We have digitized the books in medicine, and we could try to track the words related to human organs by assigning the citizen scientists to edit and collect words with the OCR editor. From the nichesourcing’s perspective, it is essential that altruism play a central role when the language communities are involved. In nichesourcing, our goal is to reach a certain level of interplay, where the language communities would benefit from the results. For instance, the corrected words in Ingrian will be added to an online dictionary, which is made freely available for the public, so the society can benefit, too. This objective of interplay can be understood as an aspiration to support the endangered languages and the maintenance of lingual diversity, but also as a servant of ‘two masters’: research and society.
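The abstract above describes the crowdsourcing tool as an editor of ALTO XML, in which each recognized word is stored as the CONTENT attribute of a String element. As a rough illustration of what one correction amounts to at the file level, here is a minimal sketch using Python's standard library. It is not the National Library of Finland's editor or its REST API: the sample document, the IDs, the misread word, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment; "sanakiria" stands in for an OCR misreading.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="B1"><TextLine ID="L1">
    <String ID="S1" CONTENT="kieli"/>
    <SP/>
    <String ID="S2" CONTENT="sanakiria"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

def local_name(tag):
    """Strip the XML namespace so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]

def iter_strings(root):
    """Return every String element (one per recognized word) in order."""
    return [el for el in root.iter() if local_name(el.tag) == "String"]

def correct_word(root, string_id, new_content):
    """Replace the text of one word; return False if the ID is not found."""
    for el in iter_strings(root):
        if el.get("ID") == string_id:
            el.set("CONTENT", new_content)
            return True
    return False

root = ET.fromstring(SAMPLE)
correct_word(root, "S2", "sanakirja")
words = [el.get("CONTENT") for el in iter_strings(root)]
print(words)
```

Only the CONTENT attribute changes here, so any layout attributes on the element would stay attached to the corrected word.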