20 results for Domain-specific language
Abstract:
Recent advances in machine learning increasingly enable the automatic construction of computer-assisted methods that have been difficult or laborious for human experts to program. The need for such tools arises in many areas, here especially in bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for learning algorithms of the regularized least-squares type. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used in algorithms efficiently.
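The abstract does not describe the fast cross-validation algorithm itself. As a rough illustration of why cross-validation can be cheap for regularized least-squares, the sketch below uses the standard closed-form leave-one-out identity for the dual (kernel) form without a bias term: with G = (K + λI)^-1 and a = Gy, the leave-one-out prediction for example i is y_i - a_i / G_ii. This is a textbook shortcut and not necessarily the thesis's method. All function names, the linear kernel, and the toy data are assumptions made for the example.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity, then reduce the left half to the identity.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def linear_kernel(xs):
    """Gram matrix of plain dot products (hypothetical kernel choice)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in xs] for u in xs]


def rls_dual(K, y, lam):
    """Return (G, a) with G = (K + lam*I)^-1 and dual coefficients a = G y."""
    n = len(y)
    G = invert([[K[i][j] + (lam if i == j else 0.0) for j in range(n)]
                for i in range(n)])
    a = [sum(G[i][j] * y[j] for j in range(n)) for i in range(n)]
    return G, a


def loo_fast(K, y, lam):
    """Leave-one-out predictions from a single fit: y_i - a_i / G_ii."""
    G, a = rls_dual(K, y, lam)
    return [y[i] - a[i] / G[i][i] for i in range(len(y))]


def loo_naive(K, y, lam):
    """Leave-one-out predictions by retraining once per held-out example."""
    n = len(y)
    preds = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        K_sub = [[K[r][c] for c in keep] for r in keep]
        _, a = rls_dual(K_sub, [y[j] for j in keep], lam)
        preds.append(sum(K[i][j] * a_j for j, a_j in zip(keep, a)))
    return preds


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [0.5, -1.0]]
    y = [1.0, -1.0, 0.5, 2.0, 0.0]
    K = linear_kernel(xs)
    print(loo_fast(K, y, 0.5))
    print(loo_naive(K, y, 0.5))
```

The two functions should agree up to rounding. The shortcut needs one matrix inversion, whereas the naive loop needs one per training example.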
Abstract:
Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed an evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
Abstract:
The research assesses the skills of upper comprehensive school pupils in history. The focus is on locating personal motives, assessing wider reasons hidden in historical sources and evaluating source reliability. The research also questions how a wide use of multiple sources affects pupils’ holistic understanding of historical phenomena. The participants were a multicultural group of pupils. The origins of their cultures can be traced to the Balkans, the Middle East, Asia and Europe. The number of native Finnish speakers and pupils speaking Finnish as their second language was almost equal. The multicultural composition provides opportunities to assess how culturally responsive learning history from sources is. The intercultural approach to learning in a multicultural setting emphasizes equality as a precondition for learning. To ensure that the assignments were at least to some extent equally matched to all participants, only answers produced by pupils who had studied history for a similar period of time in the Finnish comprehensive school system were taken into account. Due to the small number of participants (41), the study avoids wide generalizations. Nevertheless, possible cultural blueprints in pupils’ way of thinking are noted. The first test examined the skills of pupils to find motives for emigration. The results showed that for 7th graders finding reasons is not a problematic task. However, the number of reasons noticed and the justifications given varied. In addition, the way the pupils explained their choices was a distinguishing factor. Some pupils interpreted source material making use of previous knowledge on the issue, while other pupils based their analysis solely on the text handed to them and did not try to add their own knowledge. Answers were divided into three categories: historical, explanatory and stating.
Historical answers smoothly combined previously learned historical knowledge with the pupil’s own source analysis; explanatory answers often ignored a wider frame, although they were effective when explaining e.g. historical concepts. The stating answers only noticed motives from the sources and made no attempts to explain them historically. Was the first test culturally responsive? All pupils representing different cultures tackled the first source exam successfully, but there were some signs that historical concepts are understood in a slightly different way if the pupil’s personal history has no linkage to the concepts under scrutiny. The second test focused on the history of Native Americans. The test first required pupils to recognize whether short source extracts (5) were written by Indians or Caucasians. Based on what they had already learned from North American history, the pupils did not find it hard to distinguish between the sources. The analysis of multiphase causes and consequences of the disputes between Native Americans and white Americans caused dispersion among pupils. Using two historical sources and combining historical knowledge from both of them simultaneously was cumbersome for many. The explanations of consequences can be divided into two groups: those emphasizing short term consequences and those placing emphasis on long term consequences. The short term approach was mainly followed by boys in every group. The girls mainly paid attention to long term consequences. The result suggests that historical knowledge in sources is at least to some extent read through role and gender lenses. The third test required pupils to explain in their own words how the three sources given differed in their account of living conditions in Nazi Germany, which turned out to be demanding for many pupils. The pupils’ strength was rather the assessment of source reliability and their accounts of why the sources approached the same events differently.
All participants wrote critical and justified comments on reliability and aspects that might have affected the content of the sources. The pupils felt that the main reasons that affected source reliability were the authors’ ethnic background, nationality and profession. The assessment showed that pupils were well aware that position in a historical situation has an impact on historical accounts, but in certain cases the victim’s account was seen as a historical truth. The account of events by a historian was chosen most often as the most reliable source, but the choice was often justified loosely by reference to professionalism rather than with clear ideas of how historians construct accounts based on sources. In brief, the last source test demonstrates that pupils have a strong idea that ethnicity or nationalism determines how people explain events of the past. It also implies that pupils understand that historical knowledge is interpretative. The results also imply that history can be analyzed from a neutral perspective. One’s own membership in an ethnic or religious group does not automatically mean that a person’s cultural identity excludes historical explanations if something in them contradicts his or her identity. The second method of extracting knowledge of pupils’ historical thinking was an essay analysis. The analysis shows that an analytical account of complicated political issues, which often include a great number of complicated political concepts, is more likely to lead to an inconsistent structure in the written work of pupils. The material also demonstrates that pupils have a strong tendency to take a critical stance when assessing history. Historical empathy in particular is shown if history somehow has a linkage to young people, children or minorities.
Some topics can also awaken strong feelings, especially among pupils with emigrant background, if there is a linkage between one’s own personal history and that of the school; and occasionally a student’s historical experience or thoughts replaced school history. Using sources during history lessons at school seems to have many advantages. It enhances the reasoning skills of pupils and their skills to assess the nature of historical knowledge. Thus one of the main aims and a great benefit of source work is to encourage pupils to express their own ideas and opinions. To conclude, when assessing the skills of adolescents in history - their work with sources, comments on history, historical knowledge and finally their historical thinking - one should be cautious and avoid cut-off score evaluations. One purpose of pursuing history with sources is to encourage pupils to think independently, which is a useful tool for further identity construction. The idea that pupils have the right to form their own interpretations of history can be partially understood as part of a wider learning process; in this sense, the justification for studying history comes from extrinsic reasons. The intrinsic reason is history itself; in order to understand history one should have a basic understanding of history as a specific domain of knowledge. Using sources does not mean that knowing history is of secondary importance. Only a balance between knowing the contextual history, understanding basic key concepts and working with sources is a solid base to improve pupils’ historical understanding.
Abstract:
International e-commerce is still a rather new concept and therefore lacks comprehensive research. The different nature of markets and companies has challenged traditional theories and redefined traditional operations. Prior research has mainly concentrated on specific topics such as barriers and the choice of international strategy. For this reason, there is a lack of research that comprehensively analyzes the operations of international e-commerce companies. The aim of this study was to increase knowledge of the operations of Finnish e-commerce companies in Russia. To obtain comprehensive knowledge of the operations, the research analyzed the internationalization process, the effects of market-specific factors on e-commerce and the implementation of various value chain activities of e-commerce. The research focused on examining how companies have perceived the peculiarities of the Russian market and how they respond to them. The empirical part of the study was conducted as qualitative research by interviewing five company representatives and three specialists in international e-commerce and Russian business. The results revealed that conducting e-commerce in Russia is challenging and requires long-term, strategy-based work. E-commerce is assumed to be an inherently global business model, but in the case of Russia, numerous e-commerce activities require localization. The most crucial element to localize is the content and its language. Even though the e-commerce market in Russia has many peculiarities, operating via the marketspace decreases the level of bureaucracy and market risk. Despite the challenges, the developing e-commerce market in Russia offers huge potential for companies whose international strategy needs Russian operations to achieve company goals.
Abstract:
Traditionally metacognition has been theorised, methodologically studied and empirically tested from the standpoint mainly of individuals and their learning contexts. In this dissertation the emergence of metacognition is analysed more broadly. The aim of the dissertation was to explore socially shared metacognitive regulation (SSMR) as part of collaborative learning processes taking place in student dyads and small learning groups. The specific aims were to extend the concept of individual metacognition to SSMR, to develop methods to capture and analyse SSMR and to validate the usefulness of the concept of SSMR in two different learning contexts; in face-to-face student dyads solving mathematical word problems and also in small groups taking part in inquiry-based science learning in an asynchronous computer-supported collaborative learning (CSCL) environment. This dissertation is comprised of four studies. In Study I, the main aim was to explore if and how metacognition emerges during problem solving in student dyads and then to develop a method for analysing the social level of awareness, monitoring, and regulatory processes emerging during the problem solving. Two dyads comprised of 10-year-old students who were high-achieving especially in mathematical word problem solving and reading comprehension were involved in the study. An in-depth case analysis was conducted. Data consisted of over 16 (30–45 minutes) videotaped and transcribed face-to-face sessions. The dyads solved altogether 151 mathematical word problems of different difficulty levels in a game-format learning environment. The interaction flowchart was used in the analysis to uncover socially shared metacognition. Interviews (also stimulated recall interviews) were conducted in order to obtain further information about socially shared metacognition. The findings showed the emergence of metacognition in a collaborative learning context in a way that cannot solely be explained by individual conception. 
The concept of socially-shared metacognition (SSMR) was proposed. The results highlighted the emergence of socially shared metacognition specifically in problems where dyads encountered challenges. Small verbal and nonverbal signals between students also triggered the emergence of socially shared metacognition. Additionally, one dyad implemented a system whereby they shared metacognitive regulation based on their strengths in learning. Overall, the findings suggested that in order to discover patterns of socially shared metacognition, it is important to investigate metacognition over time. However, it was concluded that more research on socially shared metacognition, from larger data sets, is needed. These findings formed the basis of the second study. In Study II, the specific aim was to investigate whether socially shared metacognition can be reliably identified from a large dataset of collaborative face-to-face mathematical word problem solving sessions by student dyads. We specifically examined different difficulty levels of tasks as well as the function and focus of socially shared metacognition. Furthermore, the presence of observable metacognitive experiences at the beginning of socially shared metacognition was explored. Four dyads participated in the study. Each dyad was comprised of high-achieving 10-year-old students, ranked in the top 11% of their fourth grade peers (n=393). Dyads were from the same data set as in Study I. The dyads worked face-to-face in a computer-supported, game-format learning environment. Problem-solving processes for 251 tasks at three difficulty levels taking place during 56 (30–45 minutes) lessons were video-taped and analysed. Baseline data for this study were 14 675 turns of transcribed verbal and nonverbal behaviours observed in four study dyads. The micro-level analysis illustrated how participants moved between different channels of communication (individual and interpersonal). 
The unit of analysis was a set of turns, referred to as an ‘episode’. The results indicated that socially shared metacognition and its function and focus, as well as the appearance of metacognitive experiences, can be defined in a reliable way from a larger data set by independent coders. A comparison of the different difficulty levels of the problems suggested that in order to trigger socially shared metacognition in small groups, the problems should be more difficult, as opposed to moderately difficult or easy. Although socially shared metacognition was found in collaborative face-to-face problem solving among high-achieving student dyads, more research is needed in different contexts. This consideration created the basis of the research on socially shared metacognition in Studies III and IV. In Study III, the aim was to expand the research on SSMR from face-to-face mathematical problem solving in student dyads to inquiry-based science learning among small groups in an asynchronous computer-supported collaborative learning (CSCL) environment. The specific aims were to investigate SSMR’s evolution and functions in a CSCL environment and to explore how SSMR emerges at different phases of the inquiry process. Finally, individual student participation in SSMR during the process was studied. An in-depth explanatory case study of one small group of four girls aged 12 years was carried out. The girls attended a class that has an entrance examination and follows a language-enriched curriculum. The small group solved complex science problems in an asynchronous CSCL environment, participating in research-like processes of inquiry during 22 lessons (45 minutes each). Students’ network discussions were recorded as written notes (N=640), which were used as study data. A set of notes, referred to here as a ‘thread’, was used as the unit of analysis. The inter-coder agreement was regarded as substantial.
The results indicated that SSMR emerges in a small group’s asynchronous CSCL inquiry process in the science domain. Hence, the results of Study III were in line with those of Studies I and II and revealed that metacognition cannot be reduced to the individual level alone. The findings also confirm that SSMR should be examined as a process, since SSMR can evolve during different phases and different SSMR threads overlapped and intertwined. Although the classification of SSMR’s functions was applicable in the context of CSCL in a small group, the dominant function in the small group’s asynchronous CSCL science inquiry differed from that in mathematical word problem solving among student dyads (Study II). Further, the use of different analytical methods provided complementary findings about students’ participation in SSMR. The findings suggest that it is not enough to code just a single written note or simply to examine who has the largest number of notes in the SSMR thread; the connections between the notes must also be examined. As the findings of Study III are based on an in-depth analysis of a single small group, further cases were examined in Study IV, which also looked at SSMR’s focus, which had also been studied in a face-to-face context. In Study IV, the general aim was to investigate the emergence of SSMR with a larger data set from an asynchronous CSCL inquiry process in small student groups carrying out science activities. The specific aims were to study the emergence of SSMR in the different phases of the process, students’ participation in SSMR, and the relation of SSMR’s focus to the quality of outcomes, which had not been explored in previous studies. The participants were 12-year-old students from the same class as in Study III. Five small groups consisting of four students and one of five students (N=25) were involved in the study.
The small groups solved ill-defined science problems in an asynchronous CSCL environment, participating in research-like processes of inquiry over a total period of 22 hours. Written notes (N=4088) detailed the network discussions of the small groups and these constituted the study data. With these notes, SSMR threads were explored. As in Study III, the thread was used as the unit of analysis. In total, 332 notes were classified as forming 41 SSMR threads. Inter-coder agreement was assessed by three coders in the different phases of the analysis and found to be reliable. Multiple methods of analysis were used. Results showed that SSMR emerged in all the asynchronous CSCL inquiry processes in the small groups. However, the findings did not reveal any significantly changing trend in the emergence of SSMR during the process. As a main trend, the number of notes included in SSMR threads differed significantly in different phases of the process and small groups differed from each other. Although student participation was seen as highly dispersed between the students, there were differences between students and small groups. Furthermore, the findings indicated that the amount of SSMR during the process or participation structure did not explain the differences in the quality of outcomes for the groups. Rather, when SSMRs were focused on understanding and procedural matters, it was associated with achieving high quality learning outcomes. In turn, when SSMRs were focused on incidental and procedural matters, it was associated with low level learning outcomes. Hence, the findings imply that the focus of any emerging SSMR is crucial to the quality of the learning outcomes. Moreover, the findings encourage the use of multiple research methods for studying SSMR. In total, the four studies convincingly indicate that a phenomenon of socially shared metacognitive regulation also exists. 
This means that it was possible to define the concept of SSMR theoretically, to investigate it methodologically and to validate it empirically in two different learning contexts across dyads and small groups. In-depth micro-level case analysis in Studies I and III showed that SSMR can be captured and analysed in detail during the collaborative process, while in Studies II and IV, the analysis validated the emergence of SSMR in larger data sets. Hence, validation was tested both between two environments and within the same environments with further cases. As a part of this dissertation, SSMR’s detailed functions and foci were revealed. Moreover, the findings showed the important role of observable metacognitive experiences as the starting point of SSMRs. It was apparent that problems dealt with by the groups should be rather difficult if SSMR is to be made clearly visible. Further, individual students’ participation was found to differ between students and groups. The multiple research methods employed revealed supplementary findings regarding SSMR. Finally, when SSMR was focused on understanding and procedural matters, this was seen to lead to higher quality learning outcomes. Socially shared metacognitive regulation should therefore be taken into consideration in students’ collaborative learning at school similarly to how an individual’s metacognition is taken into account in individual learning.
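The abstract reports inter-coder agreement for the coding in Studies II to IV but does not say which statistic was used. As one common possibility, the sketch below computes Cohen's kappa for two coders who each assign one category per unit (for example, per thread or episode). The choice of Cohen's kappa, the function name, and the toy labels are all assumptions made for illustration. An analysis with three coders, as in Study IV, would need pairwise kappas or a multi-rater statistic instead.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders over the same units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of units")
    n = len(labels_a)
    # Observed agreement: share of units given the same category by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each coder assigned categories independently
    # according to their own marginal category frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical codes for four units; not data from the dissertation.
    coder_1 = ["SSMR", "SSMR", "other", "other"]
    coder_2 = ["SSMR", "other", "other", "other"]
    print(cohen_kappa(coder_1, coder_2))
```

Raw percentage agreement overstates reliability when one category dominates, which is why a chance-corrected measure is usually preferred for coding schemes like this.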