28 resultados para Human-Machine Interaction
Resumo:
Image filtering is a highly demanded approach of image enhancement in digital imaging systems design. It is widely used in television and camera design technologies to improve the quality of an output image to avoid various problems such as image blurring problem thatgains importance in design of displays of large sizes and design of digital cameras. This thesis proposes a new image filtering method basedon visual characteristics of human eye such as MTF. In contrast to the traditional filtering methods based on human visual characteristics this thesis takes into account the anisotropy of the human eye vision. The proposed method is based on laboratory measurements of the human eye MTF and takes into account degradation of the image by the latter. This method improves an image in the way it will be degraded by human eye MTF to give perception of the original image quality. This thesis gives a basic understanding of an image filtering approach and the concept of MTF and describes an algorithm to perform an image enhancement based on MTF of human eye. Performed experiments have shown quite good results according to human evaluation. Suggestions to improve the algorithm are also given for the future improvements.
Resumo:
Mottling is one of the key defects in offset-printing. Mottling can be defined as unwanted unevenness of print. In this work, diameter of a mottle spot is defined between 0.5-10.0 mm. There are several types of mottling, but the reason behind the problem is still not fully understood. Several commercial machine vision products for the evaluation of print unevenness have been presented. Two of these methods used in these products have been implemented in this thesis. The one is the cluster method and the other is the band-pass method. The properties of human vision system have been taken into account in the implementation of these two methods. An index produced by the cluster method is a weighted sum of the number of found spots, and an index produced by band-pass method is a weighted sum of coefficients of variations of gray-levels for each spatial band. Both methods produce larger indices for visually poor samples, so they can discern good samples from the poor ones. The difference between the indices for good and poor samples is slightly larger produced by the cluster method. 11 However, without the samples evaluated by human experts, the goodness of these results is still questionable. This comparison will be left to the next phase of the project.
Resumo:
In the paper machine, it is not a desired feature for the boundary layer flows in the fabric and the roll surfaces to travel into the closing nips, creating overpressure. In this thesis, the aerodynamic behavior of the grooved roll and smooth rolls is compared in order to understand the nip flow phenomena, which is the main reason why vacuum and grooved roll constructions are designed. A common method to remove the boundary layer flow from the closing nip is to use the vacuum roll construction. The downside of the use of vacuum rolls is high operational costs due to pressure losses in the vacuum roll shell. The deep grooved roll has the same goal, to create a pressure difference over the paper web and keep the paper attached to the roll or fabric surface in the drying pocket of the paper machine. A literature review revealed that the aerodynamic functionality of the grooved roll is not very well known. In this thesis, the aerodynamic functionality of the grooved roll in interaction with a permeable or impermeable wall is studied by varying the groove properties. Computational fluid dynamics simulations are utilized as the research tool. The simulations have been performed with commercial fluid dynamics software, ANSYS Fluent. Simulation results made with 3- and 2-dimensional fluid dynamics models are compared to laboratory scale measurements. The measurements have been made with a grooved roll simulator designed for the research. The variables in the comparison are the paper or fabric wrap angle, surface velocities, groove geometry and wall permeability. Present-day computational and modeling resources limit grooved roll fluid dynamics simulations in the paper machine scale. Based on the analysis of the aerodynamic functionality of the grooved roll, a grooved roll simulation tool is proposed. The smooth roll simulations show that the closing nip pressure does not depend on the length of boundary layer development. The surface velocity increase affects the pressure distribution in the closing and opening nips. The 3D grooved roll model reveals the aerodynamic functionality of the grooved roll. With the optimal groove size it is possible to avoid closing nip overpressure and keep the web attached to the fabric surface in the area of the wrap angle. The groove flow friction and minor losses play a different role when the wrap angle is changed. The proposed 2D grooved roll simulation tool is able to replicate the grooved aerodynamic behavior with reasonable accuracy. A small wrap angle predicts the pressure distribution correctly with the chosen approach for calculating the groove friction losses. With a large wrap angle, the groove friction loss shows too large pressure gradients, and the way of calculating the air flow friction losses in the groove has to be reconsidered. The aerodynamic functionality of the grooved roll is based on minor and viscous losses in the closing and opening nips as well as in the grooves. The proposed 2D grooved roll model is a simplification in order to reduce computational and modeling efforts. The simulation tool makes it possible to simulate complex paper machine constructions in the paper machine scale. In order to use the grooved roll as a replacement for the vacuum roll, the grooved roll properties have to be considered on the basis of the web handling application.
Resumo:
Previous studies of the local involvement of multinational corporation (MNC) subsidiaries focus on host-country firms and local business partners such as suppliers and customers. The role of host-country universities in the same context of innovation networks is neglected. Furthermore, there are many organizational culture- and knowledge-related differences between universities and companies, and this is likely to pose additional challenges for successful collaboration. Early university-industry (U-I) studies have primarily been limited within a national boundary, being concerned with a single level of culture (i.e., at an organizational level) and one-way knowledge transfer from university to industry. Research on more dynamic knowledge interaction in multinational settings is lacking. This is particularly true in the business context of China. In today’s globalizing and rapidly changing organizations, addressing cultural differences and clashes is an everyday reality, and inter-cultural U-I collaboration is becoming a key asset for gaining global competitiveness. This study deals with Finnish MNC subsidiaries’ research collaboration with Chinese universities. It aims to explore the essence of such U-I collaboration and knowledge interaction, uncovering the deep functioning mechanisms of culture underlying effective collaborative knowledge creation and innovation. The study reviews critically different bodies of literature including knowledge management theories and studies, U-I collaboration and knowledge interaction, and cross-cultural research in terms of organizational knowledge generation and utilization. It adopts a case study strategy with qualitative research methods, and data is collected through in-depth interviews and participant observation. The study presents the following major findings: 1. In the light of a comprehensive analysis of U-I collaboration, an effective matching strategy is proposed, in the assumption that good alignment of knowledge interaction strategies and approaches with their corresponding knowledge type, capability development and research task may greatly enhance the effectiveness of cross-cultural U-I collaboration and knowledge interaction. 2. It is proposed that in the Chinese MNC context more dynamic types of knowledge interaction like knowledge co-creation should be of key concern particularly when dealing simultaneously with multi-disciplinary applied research of human factors and technologies. U-I knowledge interaction, otherwise, pays attention only to the study of one-way technology and knowledge transfer. 3. It is posited that the influence of culture on collaborative knowledge interaction can be studied in a valuable way when knowledge-related variables are simultaneously taken into account. A systematic analysis of the role of knowledge in cross-cultural knowledge interaction could best be approached from multi-aspects of knowledge including not only nature, characteristics and types of knowledge but also the process of knowledge (e.g., intensifications of knowledge interaction). 4. The study demonstrates the significant role of aspects of the host-country culture (e.g., Chinese guanxi) in U-I collaboration and knowledge interaction. This is evident, for instance, in issues related to interpersonal relationships and trust, true interest and the relatedness of the research, mutual commitment and learning, communication intensity and interaction, and awareness of cultural and knowledge-related differences between collaboration partners. Theoretical and practical implications of the findings are suggested and discussed.
Resumo:
Print quality and the printability of paper are very important attributes when modern printing applications are considered. In prints containing images, high print quality is a basic requirement. Tone unevenness and non uniform glossiness of printed products are the most disturbing factors influencing overall print quality. These defects are caused by non ideal interactions of paper, ink and printing devices in high speed printing processes. Since print quality is a perceptive characteristic, the measurement of unevenness according to human vision is a significant problem. In this thesis, the mottling phenomenon is studied. Mottling is a printing defect characterized by a spotty, non uniform appearance in solid printed areas. Print mottle is usually the result of uneven ink lay down or non uniform ink absorption across the paper surface, especially visible in mid tone imagery or areas of uniform color, such as solids and continuous tone screen builds. By using existing knowledge on visual perception and known methods to quantify print tone variation, a new method for print unevenness evaluation is introduced. The method is compared to previous results in the field and is supported by psychometric experiments. Pilot studies are made to estimate the effect of optical paper characteristics prior to printing, on the unevenness of the printed area after printing. Instrumental methods for print unevenness evaluation have been compared and the results of the comparison indicate that the proposed method produces better results in terms of visual evaluation correspondence. The method has been successfully implemented as ail industrial application and is proved to be a reliable substitute to visual expertise.
Resumo:
Inhimilliseen turvallisuuteen kriisinhallinnan kautta – oppimisen mahdollisuuksia ja haasteita Kylmän sodan jälkeen aseelliset konfliktit ovat yleensä alkaneet niin sanotuissa hauraissa valtioissa ja köyhissä maissa, ne ovat olleet valtioiden sisäisiä ja niihin on osallistunut ei-valtiollisia aseellisia ryhmittymiä. Usein ne johtavat konfliktikierteeseen, jossa sota ja vakaammat olot vaihtelevat. Koska kuolleisuus konflikteissa voi jäädä alle kansainvälisen määritelmän (1000 kuollutta vuodessa), kutsun tällaisia konflikteja ”uusiksi konflikteiksi”. Kansainvälinen yhteisö on pyrkinyt kehittämään kriisinhallinnan ja rauhanrakentamisen malleja, jotta pysyvä rauhantila saataisiin aikaiseksi. Inhimillinen turvallisuus perustuu näkemykseen, jossa kunnioitetaan jokaisen yksilön ihmisoikeuksia ja jolla on vaikutusta myös kriisinhallinnan ja rauhanrakentamisen toteuttamiseen. Tutkimukseen kuuluu kaksi empiiristä osaa: Delfoi tulevaisuuspaneeliprosessin sekä kriisinhallintahenkilöstön haastattelut. Viisitoista eri alojen kriisinhallinta-asiantuntijaa osallistui paneeliin, joka toteutettiin vuonna 2008. Paneelin tulosten mukaan tulevat konfliktit usein ovat uusien konfliktien kaltaisia. Lisäksi kriisinhallintahenkilöstöltä edellytetään vuorovaikutus- ja kommunikaatiokykyä ja luonnollisesti myös varsinaisia ammatillisia valmiuksia. Tulevaisuuspaneeli korosti vuorovaikutus- ja kommunikaatiotaitoja erityisesti siviilikriisinhallintahenkilöstön kompetensseissa, mutta samat taidot painottuivat sotilaallisen kriisinhallinnan henkilöstön kompetensseissakin. Kriisinhallinnassa tarvitaan myös selvää työnjakoa eri toimijoiden kesken. Kosovossa työskennelleen henkilöstön haastatteluaineisto koostui yhteensä 27 teemahaastattelusta. Haastateltavista 9 oli ammattiupseeria, 10 reservistä rekrytoitua rauhanturvaajaa ja 8 siviilikriisinhallinnassa työskennellyttä henkilöä. Haastattelut toteutettiin helmi- ja kesäkuun välisenä aikana vuonna 2008. Haastattelutuloksissa korostui vuorovaikutus- ja kommunikaatiotaitojen merkitys, sillä monissa käytännön tilanteissa haastateltavat olivat ratkoneet ongelmia yhteistyössä muun kriisinhallintahenkilöstön tai paikallisten asukkaiden kanssa. Kriisinhallinnassa toteutui oppimisprosesseja, jotka usein olivat luonteeltaan myönteisiä ja informaalisia. Tällaisten onnistumisten vaikutus yksilön minäkuvaan oli myönteinen. Tällaisia prosesseja voidaan kuvata ”itseä koskeviksi oivalluksiksi”. Kriisinhallintatehtävissä oppimisella on erityinen merkitys, jos halutaan kehittää toimintoja inhimillisen turvallisuuden edistämiseksi. Siksi on tärkeää, että kriisinhallintakoulutusta ja kriisinhallintatyössä oppimista kehitetään ottamaan huomioon oppimisen eri tasot ja ulottuvuudet sekä niiden merkitys. Informaaliset oppimisen muodot olisi otettava paremmin huomioon kriisinhallintakoulutusta ja kriisinhallintatehtävissä oppimista kehitettäessä. Palautejärjestelmää olisi kehitettävä eri tavoin. Koko kriisinhallintaoperaation on saatava tarvittaessa myös kriittistä palautetta onnistumisista ja epäonnistumisista. Monet kriisinhallinnassa työskennelleet kaipaavat kunnollista palautetta työrupeamastaan. Liian rutiininomaiseksi koettu palaute ei edistä yksilön oppimista. Spontaanisti monet haastatellut pitivät tärkeänä, että kriisinhallinnassa työskennelleillä olisi mahdollisuus debriefing- tyyppiseen kotiinpaluukeskusteluun. Pelkkä tällainen mahdollisuus ilmeisesti voisi olla monelle myönteinen uutinen, vaikka tilaisuutta ei hyödynnettäisikään. Paluu kriisinhallintatehtävistä Suomeen on monelle haasteellisempaa kuin näissä tehtävissä työskentelyn aloittaminen ulkomailla. Tutkimuksen tulokset kannustavat tutkimaan kriisinhallintaa oppimisen näkökulmasta. On myös olennaista, että kriisinhallinnan palautejärjestelmiä kehitetään mahdollisimman hyvin edistämään sekä yksilöllistä että organisatorista oppimista kriisinhallinnassa. Kriisinhallintaoperaatio on oppimisympäristö. Kriisinhallintahenkilöstön kommunikaatio- ja vuorovaikutustaitojen kehittäminen on olennaista tavoiteltaessa kestävää rauhanprosessia, jossa konfliktialueen asukkaatkin ovat mukana.
Resumo:
The papermaking industry has been continuously developing intelligent solutions to characterize the raw materials it uses, to control the manufacturing process in a robust way, and to guarantee the desired quality of the end product. Based on the much improved imaging techniques and image-based analysis methods, it has become possible to look inside the manufacturing pipeline and propose more effective alternatives to human expertise. This study is focused on the development of image analyses methods for the pulping process of papermaking. Pulping starts with wood disintegration and forming the fiber suspension that is subsequently bleached, mixed with additives and chemicals, and finally dried and shipped to the papermaking mills. At each stage of the process it is important to analyze the properties of the raw material to guarantee the product quality. In order to evaluate properties of fibers, the main component of the pulp suspension, a framework for fiber characterization based on microscopic images is proposed in this thesis as the first contribution. The framework allows computation of fiber length and curl index correlating well with the ground truth values. The bubble detection method, the second contribution, was developed in order to estimate the gas volume at the delignification stage of the pulping process based on high-resolution in-line imaging. The gas volume was estimated accurately and the solution enabled just-in-time process termination whereas the accurate estimation of bubble size categories still remained challenging. As the third contribution of the study, optical flow computation was studied and the methods were successfully applied to pulp flow velocity estimation based on double-exposed images. Finally, a framework for classifying dirt particles in dried pulp sheets, including the semisynthetic ground truth generation, feature selection, and performance comparison of the state-of-the-art classification techniques, was proposed as the fourth contribution. The framework was successfully tested on the semisynthetic and real-world pulp sheet images. These four contributions assist in developing an integrated factory-level vision-based process control.
Resumo:
Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. We show that this event extraction system has good performance, reaching the first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, as well as shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing how the developed approach not only shows good performance, but is generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that leads to development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
Resumo:
Reactive arthritis (ReA) is an inflammatory joint disease, which belongs to the group of Spondyloarthritis (SpA). It may occur after infections with certain gram-negative bacteria such as Salmonella and Yersinia. SpAs are strongly associated with the human leucocyte antigen (HLA)-B27. Despite active research, the mechanism by which HLA-B27 causes disease susceptibility is still unknown. However, HLA-B27 has a tendency to misfold during assembly. It is possible that the misfolding of HLA-B27 could alter signaling pathways and/or molecules involved in inflammatory response in cells. We have earlier discovered that in HLA-B27-positive cells the interaction between the host and causative bacteria is disturbed. Our recent studies indicate that the expression of HLA-B27 may alter certain signaling molecules by disturbing their activation. The aim of this study was to investigate whether the expression of HLA-B27 disturbs the signaling molecules, especially the phosphorylation of transcription factor STAT1. STAT1 is an important mediator of inflammatory responses. Our results show that the phosphorylation of the STAT1 is significantly altered in HLA-B27-expressing U937 monocytic cells compared with control cells. STAT1 tyrosine 701 is more strongly phosphorylated in HLAB27- expressing cells; whereas the phosphorylation of STAT1 serine 727 is prolonged. Phosphorylation of STAT1 was discovered to be dependent on protein kinase PKR. Furthermore, we found out that the expression of posttranscriptional gene regulator HuR was altered in HLA-B27-expressing cells. We also detected that HLA-B27-positive cells secrete more interleukin 6, which is an important mediator of inflammation. These results help to understand how HLA-B27 may confer susceptibility to SpAs.
Resumo:
Human activity recognition in everyday environments is a critical, but challenging task in Ambient Intelligence applications to achieve proper Ambient Assisted Living, and key challenges still remain to be dealt with to realize robust methods. One of the major limitations of the Ambient Intelligence systems today is the lack of semantic models of those activities on the environment, so that the system can recognize the speci c activity being performed by the user(s) and act accordingly. In this context, this thesis addresses the general problem of knowledge representation in Smart Spaces. The main objective is to develop knowledge-based models, equipped with semantics to learn, infer and monitor human behaviours in Smart Spaces. Moreover, it is easy to recognize that some aspects of this problem have a high degree of uncertainty, and therefore, the developed models must be equipped with mechanisms to manage this type of information. A fuzzy ontology and a semantic hybrid system are presented to allow modelling and recognition of a set of complex real-life scenarios where vagueness and uncertainty are inherent to the human nature of the users that perform it. The handling of uncertain, incomplete and vague data (i.e., missing sensor readings and activity execution variations, since human behaviour is non-deterministic) is approached for the rst time through a fuzzy ontology validated on real-time settings within a hybrid data-driven and knowledgebased architecture. The semantics of activities, sub-activities and real-time object interaction are taken into consideration. The proposed framework consists of two main modules: the low-level sub-activity recognizer and the high-level activity recognizer. The rst module detects sub-activities (i.e., actions or basic activities) that take input data directly from a depth sensor (Kinect). The main contribution of this thesis tackles the second component of the hybrid system, which lays on top of the previous one, in a superior level of abstraction, and acquires the input data from the rst module's output, and executes ontological inference to provide users, activities and their in uence in the environment, with semantics. This component is thus knowledge-based, and a fuzzy ontology was designed to model the high-level activities. Since activity recognition requires context-awareness and the ability to discriminate among activities in di erent environments, the semantic framework allows for modelling common-sense knowledge in the form of a rule-based system that supports expressions close to natural language in the form of fuzzy linguistic labels. The framework advantages have been evaluated with a challenging and new public dataset, CAD-120, achieving an accuracy of 90.1% and 91.1% respectively for low and high-level activities. This entails an improvement over both, entirely data-driven approaches, and merely ontology-based approaches. As an added value, for the system to be su ciently simple and exible to be managed by non-expert users, and thus, facilitate the transfer of research to industry, a development framework composed by a programming toolbox, a hybrid crisp and fuzzy architecture, and graphical models to represent and con gure human behaviour in Smart Spaces, were developed in order to provide the framework with more usability in the nal application. As a result, human behaviour recognition can help assisting people with special needs such as in healthcare, independent elderly living, in remote rehabilitation monitoring, industrial process guideline control, and many other cases. This thesis shows use cases in these areas.
Resumo:
The emerging technologies have recently challenged the libraries to reconsider their role as a mere mediator between the collections, researchers, and wider audiences (Sula, 2013), and libraries, especially the nationwide institutions like national libraries, haven’t always managed to face the challenge (Nygren et al., 2014). In the Digitization Project of Kindred Languages, the National Library of Finland has become a node that connects the partners to interplay and work for shared goals and objectives. In this paper, I will be drawing a picture of the crowdsourcing methods that have been established during the project to support both linguistic research and lingual diversity. The National Library of Finland has been executing the Digitization Project of Kindred Languages since 2012. The project seeks to digitize and publish approximately 1,200 monograph titles and more than 100 newspapers titles in various, and in some cases endangered Uralic languages. Once the digitization has been completed in 2015, the Fenno-Ugrica online collection will consist of 110,000 monograph pages and around 90,000 newspaper pages to which all users will have open access regardless of their place of residence. The majority of the digitized literature was originally published in the 1920s and 1930s in the Soviet Union, and it was the genesis and consolidation period of literary languages. This was the era when many Uralic languages were converted into media of popular education, enlightenment, and dissemination of information pertinent to the developing political agenda of the Soviet state. The ‘deluge’ of popular literature in the 1920s to 1930s suddenly challenged the lexical orthographic norms of the limited ecclesiastical publications from the 1880s onward. Newspapers were now written in orthographies and in word forms that the locals would understand. Textbooks were written to address the separate needs of both adults and children. New concepts were introduced in the language. This was the beginning of a renaissance and period of enlightenment (Rueter, 2013). The linguistically oriented population can also find writings to their delight, especially lexical items specific to a given publication, and orthographically documented specifics of phonetics. The project is financially supported by the Kone Foundation in Helsinki and is part of the Foundation’s Language Programme. One of the key objectives of the Kone Foundation Language Programme is to support a culture of openness and interaction in linguistic research, but also to promote citizen science as a tool for the participation of the language community in research. In addition to sharing this aspiration, our objective within the Language Programme is to make sure that old and new corpora in Uralic languages are made available for the open and interactive use of the academic community as well as the language societies. Wordlists are available in 17 languages, but without tokenization, lemmatization, and so on. This approach was verified with the scholars, and we consider the wordlists as raw data for linguists. Our data is used for creating the morphological analyzers and online dictionaries at the Helsinki and Tromsø Universities, for instance. In order to reach the targets, we will produce not only the digitized materials but also their development tools for supporting linguistic research and citizen science. The Digitization Project of Kindred Languages is thus linked with the research of language technology. The mission is to improve the usage and usability of digitized content. During the project, we have advanced methods that will refine the raw data for further use, especially in the linguistic research. How does the library meet the objectives, which appears to be beyond its traditional playground? The written materials from this period are a gold mine, so how could we retrieve these hidden treasures of languages out of the stack that contains more than 200,000 pages of literature in various Uralic languages? The problem is that the machined-encoded text (OCR) contains often too many mistakes to be used as such in research. The mistakes in OCRed texts must be corrected. For enhancing the OCRed texts, the National Library of Finland developed an open-source code OCR editor that enabled the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary to implement, since these rare and peripheral prints did often include already perished characters, which are sadly neglected by the modern OCR software developers, but belong to the historical context of kindred languages and thus are an essential part of the linguistic heritage (van Hemel, 2014). Our crowdsourcing tool application is essentially an editor of Alto XML format. It consists of a back-end for managing users, permissions, and files, communicating through a REST API with a front-end interface—that is, the actual editor for correcting the OCRed text. The enhanced XML files can be retrieved from the Fenno-Ugrica collection for further purposes. Could the crowd do this work to support the academic research? The challenge in crowdsourcing lies in its nature. The targets in the traditional crowdsourcing have often been split into several microtasks that do not require any special skills from the anonymous people, a faceless crowd. This way of crowdsourcing may produce quantitative results, but from the research’s point of view, there is a danger that the needs of linguists are not necessarily met. Also, the remarkable downside is the lack of shared goal or the social affinity. There is no reward in the traditional methods of crowdsourcing (de Boer et al., 2012). Also, there has been criticism that digital humanities makes the humanities too data-driven and oriented towards quantitative methods, losing the values of critical qualitative methods (Fish, 2012). And on top of that, the downsides of the traditional crowdsourcing become more imminent when you leave the Anglophone world. Our potential crowd is geographically scattered in Russia. This crowd is linguistically heterogeneous, speaking 17 different languages. In many cases languages are close to extinction or longing for language revitalization, and the native speakers do not always have Internet access, so an open call for crowdsourcing would not have produced appeasing results for linguists. Thus, one has to identify carefully the potential niches to complete the needed tasks. When using the help of a crowd in a project that is aiming to support both linguistic research and survival of endangered languages, the approach has to be a different one. In nichesourcing, the tasks are distributed amongst a small crowd of citizen scientists (communities). Although communities provide smaller pools to draw resources, their specific richness in skill is suited for complex tasks with high-quality product expectations found in nichesourcing. Communities have a purpose and identity, and their regular interaction engenders social trust and reputation. These communities can correspond to research more precisely (de Boer et al., 2012). Instead of repetitive and rather trivial tasks, we are trying to utilize the knowledge and skills of citizen scientists to provide qualitative results. In nichesourcing, we hand in such assignments that would precisely fill the gaps in linguistic research. A typical task would be editing and collecting the words in such fields of vocabularies where the researchers do require more information. For instance, there is lack of Hill Mari words and terminology in anatomy. We have digitized the books in medicine, and we could try to track the words related to human organs by assigning the citizen scientists to edit and collect words with the OCR editor. From the nichesourcing’s perspective, it is essential that altruism play a central role when the language communities are involved. In nichesourcing, our goal is to reach a certain level of interplay, where the language communities would benefit from the results. For instance, the corrected words in Ingrian will be added to an online dictionary, which is made freely available for the public, so the society can benefit, too. This objective of interplay can be understood as an aspiration to support the endangered languages and the maintenance of lingual diversity, but also as a servant of ‘two masters’: research and society.
Resumo:
The subject of the thesis is automatic sentence compression with machine learning, so that the compressed sentences remain both grammatical and retain their essential meaning. There are multiple possible uses for the compression of natural language sentences. In this thesis the focus is generation of television program subtitles, which often are compressed version of the original script of the program. The main part of the thesis consists of machine learning experiments for automatic sentence compression using different approaches to the problem. The machine learning methods used for this work are linear-chain conditional random fields and support vector machines. Also we take a look which automatic text analysis methods provide useful features for the task. The data used for machine learning is supplied by Lingsoft Inc. and consists of subtitles in both compressed an uncompressed form. The models are compared to a baseline system and comparisons are made both automatically and also using human evaluation, because of the potentially subjective nature of the output. The best result is achieved using a CRF - sequence classification using a rich feature set. All text analysis methods help classification and most useful method is morphological analysis. Tutkielman aihe on suomenkielisten lauseiden automaattinen tiivistäminen koneellisesti, niin että lyhennetyt lauseet säilyttävät olennaisen informaationsa ja pysyvät kieliopillisina. Luonnollisen kielen lauseiden tiivistämiselle on monta käyttötarkoitusta, mutta tässä tutkielmassa aihetta lähestytään television ohjelmien tekstittämisen kautta, johon käytännössä kuuluu alkuperäisen tekstin lyhentäminen televisioruudulle paremmin sopivaksi. Tutkielmassa kokeillaan erilaisia koneoppimismenetelmiä tekstin automaatiseen lyhentämiseen ja tarkastellaan miten hyvin erilaiset luonnollisen kielen analyysimenetelmät tuottavat informaatiota, joka auttaa näitä menetelmiä lyhentämään lauseita. Lisäksi tarkastellaan minkälainen lähestymistapa tuottaa parhaan lopputuloksen. Käytetyt koneoppimismenetelmät ovat tukivektorikone ja lineaarisen sekvenssin mallinen CRF. Koneoppimisen tukena käytetään tekstityksiä niiden eri käsittelyvaiheissa, jotka on saatu Lingsoft OY:ltä. Luotuja malleja vertaillaan Lopulta mallien lopputuloksia evaluoidaan automaattisesti ja koska teksti lopputuksena on jossain määrin subjektiivinen myös ihmisarviointiin perustuen. Vertailukohtana toimii kirjallisuudesta poimittu menetelmä. Tutkielman tuloksena paras lopputulos saadaan aikaan käyttäen CRF sekvenssi-luokittelijaa laajalla piirrejoukolla. Kaikki kokeillut teksin analyysimenetelmät auttavat luokittelussa, joista tärkeimmän panoksen antaa morfologinen analyysi.
Resumo:
Human-Centered Design (HCD) is a well-recognized approach to the design of interactive computing systems that supports everyday and professional lives of people. To that end, the HCD approach put central emphasis on the explicit understanding of users and context of use by involving users throughout the entire design and development process. With mobile computing, the diversity of users as well as the variety in the spatial, temporal, and social settings of the context of use has notably expanded, which affect the effort of interaction designers to understand users and context of use. The emergence of the mobile apps era in 2008 as a result of structural changes in the mobile industry and the profound enhanced capabilities of mobile devices, further intensify the embeddedness of technology in the daily life of people and the challenges that interaction designers face to cost-efficiently understand users and context of use. Supporting interaction designers in this challenge requires understanding of their existing practice, rationality, and work environment. The main objective of this dissertation is to contribute to interaction design theories by generating understanding on the HCD practice of mobile systems in the mobile apps era, as well as to explain the rationality of interaction designers in attending to users and context of use. To achieve that, a literature study is carried out, followed by a mixed-methods research that combines multiple qualitative interview studies and a quantitative questionnaire study. The dissertation contributes new insights regarding the evolving HCD practice at an important time of transition from stationary computing to mobile computing. Firstly, a gap is identified between interaction design as practiced in research and in the industry regarding the involvement of users in context; whereas the utilization of field evaluations, i.e. in real-life environments, has become more common in academic projects, interaction designers in the industry still rely, by large, on lab evaluations. Secondly, the findings indicate on new aspects that can explain this gap and the rationality of interaction designers in the industry in attending to users and context; essentially, the professional-client relationship was found to inhibit the involvement of users, while the mental distance between practitioners and users as well as the perceived innovativeness of the designed system are suggested in explaining the inclination to study users in situ. Thirdly, the research contributes the first explanatory model on the relation between the organizational context and HCD; essentially, innovation-focused organizational strategies greatly affect the cost-effective usage of data on users and context of use. Last, the findings suggest a change in the nature of HCD in the mobile apps era, at least with universal consumer systems; evidently, the central attention on the explicit understanding of users and context of use shifts from an early requirements phase and continual activities during design and development to follow-up activities. That is, the main effort to understand users is by collecting data on their actual usage of the system, either before or after the system is deployed. The findings inform both researchers and practitioners in interaction design. In particular, the dissertation suggest on action research as a useful approach to support interaction designers and further inform theories on interaction design. With regard to the interaction design practice, the dissertation highlights strategies that encourage a more cost-effective user- and context-informed interaction design process. With the continual embeddedness of computing into people’s life, e.g. with wearable devices and connected car systems, the dissertation provides a timely and valuable view on the evolving humancentered design.