29 resultados para computer vision,machine learning,centernet,volleyball,sports
Resumo:
Local features are used in many computer vision tasks including visual object categorization, content-based image retrieval and object recognition to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of extracted local features the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis the performance of local feature detectors and descriptors is evaluated for object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched using random image pairs of a same class. The goal of this thesis is to find out what are the best detector and descriptor methods for such task in terms of detector repeatability and descriptor matching rate.
Resumo:
Machine learning provides tools for automated construction of predictive models in data intensive areas of engineering and science. The family of regularized kernel methods have in the recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is from a set of past observations to learn a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings, examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.
Resumo:
In this thesis, the suitability of different trackers for finger tracking in high-speed videos was studied. Tracked finger trajectories from the videos were post-processed and analysed using various filtering and smoothing methods. Position derivatives of the trajectories, speed and acceleration were extracted for the purposes of hand motion analysis. Overall, two methods, Kernelized Correlation Filters and Spatio-Temporal Context Learning tracking, performed better than the others in the tests. Both achieved high accuracy for the selected high-speed videos and also allowed real-time processing, being able to process over 500 frames per second. In addition, the results showed that different filtering methods can be applied to produce more appropriate velocity and acceleration curves calculated from the tracking data. Local Regression filtering and Unscented Kalman Smoother gave the best results in the tests. Furthermore, the results show that tracking and filtering methods are suitable for high-speed hand-tracking and trajectory-data post-processing.
Resumo:
Object detection is a fundamental task of computer vision that is utilized as a core part in a number of industrial and scientific applications, for example, in robotics, where objects need to be correctly detected and localized prior to being grasped and manipulated. Existing object detectors vary in (i) the amount of supervision they need for training, (ii) the type of a learning method adopted (generative or discriminative) and (iii) the amount of spatial information used in the object model (model-free, using no spatial information in the object model, or model-based, with the explicit spatial model of an object). Although some existing methods report good performance in the detection of certain objects, the results tend to be application specific and no universal method has been found that clearly outperforms all others in all areas. This work proposes a novel generative part-based object detector. The generative learning procedure of the developed method allows learning from positive examples only. The detector is based on finding semantically meaningful parts of the object (i.e. a part detector) that can provide additional information to object location, for example, pose. The object class model, i.e. the appearance of the object parts and their spatial variance, constellation, is explicitly modelled in a fully probabilistic manner. The appearance is based on bio-inspired complex-valued Gabor features that are transformed to part probabilities by an unsupervised Gaussian Mixture Model (GMM). The proposed novel randomized GMM enables learning from only a few training examples. The probabilistic spatial model of the part configurations is constructed with a mixture of 2D Gaussians. The appearance of the parts of the object is learned in an object canonical space that removes geometric variations from the part appearance model. Robustness to pose variations is achieved by object pose quantization, which is more efficient than previously used scale and orientation shifts in the Gabor feature space. Performance of the resulting generative object detector is characterized by high recall with low precision, i.e. the generative detector produces large number of false positive detections. Thus a discriminative classifier is used to prune false positive candidate detections produced by the generative detector improving its precision while keeping high recall. Using only a small number of positive examples, the developed object detector performs comparably to state-of-the-art discriminative methods.
Resumo:
Traditionally metacognition has been theorised, methodologically studied and empirically tested from the standpoint mainly of individuals and their learning contexts. In this dissertation the emergence of metacognition is analysed more broadly. The aim of the dissertation was to explore socially shared metacognitive regulation (SSMR) as part of collaborative learning processes taking place in student dyads and small learning groups. The specific aims were to extend the concept of individual metacognition to SSMR, to develop methods to capture and analyse SSMR and to validate the usefulness of the concept of SSMR in two different learning contexts; in face-to-face student dyads solving mathematical word problems and also in small groups taking part in inquiry-based science learning in an asynchronous computer-supported collaborative learning (CSCL) environment. This dissertation is comprised of four studies. In Study I, the main aim was to explore if and how metacognition emerges during problem solving in student dyads and then to develop a method for analysing the social level of awareness, monitoring, and regulatory processes emerging during the problem solving. Two dyads comprised of 10-year-old students who were high-achieving especially in mathematical word problem solving and reading comprehension were involved in the study. An in-depth case analysis was conducted. Data consisted of over 16 (30–45 minutes) videotaped and transcribed face-to-face sessions. The dyads solved altogether 151 mathematical word problems of different difficulty levels in a game-format learning environment. The interaction flowchart was used in the analysis to uncover socially shared metacognition. Interviews (also stimulated recall interviews) were conducted in order to obtain further information about socially shared metacognition. The findings showed the emergence of metacognition in a collaborative learning context in a way that cannot solely be explained by individual conception. The concept of socially-shared metacognition (SSMR) was proposed. The results highlighted the emergence of socially shared metacognition specifically in problems where dyads encountered challenges. Small verbal and nonverbal signals between students also triggered the emergence of socially shared metacognition. Additionally, one dyad implemented a system whereby they shared metacognitive regulation based on their strengths in learning. Overall, the findings suggested that in order to discover patterns of socially shared metacognition, it is important to investigate metacognition over time. However, it was concluded that more research on socially shared metacognition, from larger data sets, is needed. These findings formed the basis of the second study. In Study II, the specific aim was to investigate whether socially shared metacognition can be reliably identified from a large dataset of collaborative face-to-face mathematical word problem solving sessions by student dyads. We specifically examined different difficulty levels of tasks as well as the function and focus of socially shared metacognition. Furthermore, the presence of observable metacognitive experiences at the beginning of socially shared metacognition was explored. Four dyads participated in the study. Each dyad was comprised of high-achieving 10-year-old students, ranked in the top 11% of their fourth grade peers (n=393). Dyads were from the same data set as in Study I. The dyads worked face-to-face in a computer-supported, game-format learning environment. Problem-solving processes for 251 tasks at three difficulty levels taking place during 56 (30–45 minutes) lessons were video-taped and analysed. Baseline data for this study were 14 675 turns of transcribed verbal and nonverbal behaviours observed in four study dyads. The micro-level analysis illustrated how participants moved between different channels of communication (individual and interpersonal). The unit of analysis was a set of turns, referred to as an ‘episode’. The results indicated that socially shared metacognition and its function and focus, as well as the appearance of metacognitive experiences can be defined in a reliable way from a larger data set by independent coders. A comparison of the different difficulty levels of the problems suggested that in order to trigger socially shared metacognition in small groups, the problems should be more difficult, as opposed to moderately difficult or easy. Although socially shared metacognition was found in collaborative face-to-face problem solving among high-achieving student dyads, more research is needed in different contexts. This consideration created the basis of the research on socially shared metacognition in Studies III and IV. In Study III, the aim was to expand the research on SSMR from face-to-face mathematical problem solving in student dyads to inquiry-based science learning among small groups in an asynchronous computer-supported collaborative learning (CSCL) environment. The specific aims were to investigate SSMR’s evolvement and functions in a CSCL environment and to explore how SSMR emerges at different phases of the inquiry process. Finally, individual student participation in SSMR during the process was studied. An in-depth explanatory case study of one small group of four girls aged 12 years was carried out. The girls attended a class that has an entrance examination and conducts a language-enriched curriculum. The small group solved complex science problems in an asynchronous CSCL environment, participating in research-like processes of inquiry during 22 lessons (á 45–minute). Students’ network discussion were recorded in written notes (N=640) which were used as study data. A set of notes, referred to here as a ‘thread’, was used as the unit of analysis. The inter-coder agreement was regarded as substantial. The results indicated that SSMR emerges in a small group’s asynchronous CSCL inquiry process in the science domain. Hence, the results of Study III were in line with the previous Study I and Study II and revealed that metacognition cannot be reduced to the individual level alone. The findings also confirm that SSMR should be examined as a process, since SSMR can evolve during different phases and that different SSMR threads overlapped and intertwined. Although the classification of SSMR’s functions was applicable in the context of CSCL in a small group, the dominant function was different in the asynchronous CSCL inquiry in the small group in a science activity than in mathematical word problem solving among student dyads (Study II). Further, the use of different analytical methods provided complementary findings about students’ participation in SSMR. The findings suggest that it is not enough to code just a single written note or simply to examine who has the largest number of notes in the SSMR thread but also to examine the connections between the notes. As the findings of the present study are based on an in-depth analysis of a single small group, further cases were examined in Study IV, as well as looking at the SSMR’s focus, which was also studied in a face-to-face context. In Study IV, the general aim was to investigate the emergence of SSMR with a larger data set from an asynchronous CSCL inquiry process in small student groups carrying out science activities. The specific aims were to study the emergence of SSMR in the different phases of the process, students’ participation in SSMR, and the relation of SSMR’s focus to the quality of outcomes, which was not explored in previous studies. The participants were 12-year-old students from the same class as in Study III. Five small groups consisting of four students and one of five students (N=25) were involved in the study. The small groups solved ill-defined science problems in an asynchronous CSCL environment, participating in research-like processes of inquiry over a total period of 22 hours. Written notes (N=4088) detailed the network discussions of the small groups and these constituted the study data. With these notes, SSMR threads were explored. As in Study III, the thread was used as the unit of analysis. In total, 332 notes were classified as forming 41 SSMR threads. Inter-coder agreement was assessed by three coders in the different phases of the analysis and found to be reliable. Multiple methods of analysis were used. Results showed that SSMR emerged in all the asynchronous CSCL inquiry processes in the small groups. However, the findings did not reveal any significantly changing trend in the emergence of SSMR during the process. As a main trend, the number of notes included in SSMR threads differed significantly in different phases of the process and small groups differed from each other. Although student participation was seen as highly dispersed between the students, there were differences between students and small groups. Furthermore, the findings indicated that the amount of SSMR during the process or participation structure did not explain the differences in the quality of outcomes for the groups. Rather, when SSMRs were focused on understanding and procedural matters, it was associated with achieving high quality learning outcomes. In turn, when SSMRs were focused on incidental and procedural matters, it was associated with low level learning outcomes. Hence, the findings imply that the focus of any emerging SSMR is crucial to the quality of the learning outcomes. Moreover, the findings encourage the use of multiple research methods for studying SSMR. In total, the four studies convincingly indicate that a phenomenon of socially shared metacognitive regulation also exists. This means that it was possible to define the concept of SSMR theoretically, to investigate it methodologically and to validate it empirically in two different learning contexts across dyads and small groups. In-depth micro-level case analysis in Studies I and III showed the possibility to capture and analyse in detail SSMR during the collaborative process, while in Studies II and IV, the analysis validated the emergence of SSMR in larger data sets. Hence, validation was tested both between two environments and within the same environments with further cases. As a part of this dissertation, SSMR’s detailed functions and foci were revealed. Moreover, the findings showed the important role of observable metacognitive experiences as the starting point of SSMRs. It was apparent that problems dealt with by the groups should be rather difficult if SSMR is to be made clearly visible. Further, individual students’ participation was found to differ between students and groups. The multiple research methods employed revealed supplementary findings regarding SSMR. Finally, when SSMR was focused on understanding and procedural matters, this was seen to lead to higher quality learning outcomes. Socially shared metacognition regulation should therefore be taken into consideration in students’ collaborative learning at school similarly to how an individual’s metacognition is taken into account in individual learning.
Resumo:
Computer Supported Collaborative Learning (CSCL) is a teaching and learning approach which is widely adopted. However there are still some problems can be found when CSCL takes place. Studies show that using game-like mechanics can increase motivation, engagement, as well as modelling behaviors of players. Gamification is a rapid growing trend by applying the same mechanics. It refers to use game design elements in non-game contexts. This thesis is about combining gamification concept and computer supported collaborative learning together in software engineering education field. And finally a gamified prototype system is designed.
Resumo:
Päätetyöhön epäillään liittyvän monenlaisia ongelmia. Eniten epäiltyjä ja käsiteltyjä ovat silmien rasitus- ja ärsytysoireet sekä päätetyön kuormittavuus ja näköergonomiset ongelmat. Näkemiseen ja silmiin liittyvät ongelmat näyttöpäätetyöskentelyssä ovat hyvin tavallisia. Niitä kutsutaan termillä Computer Vision Syndrome (CVS). Opinnäytetyömme tarkoituksena oli tutkia kuinka eri katsekulmat vaikuttavat näönrasitusoireisiin sekä olemassa oleviin näköjärjestelmän vikoihin. Kokeessa näyttöpääte sijoitettiin kolmeen eri katsekulmaan. Nämä kulmat olivat 15 astetta horisontaalilinjan yläpuolelle, horisontaalilinja sekä 15 astetta horisontaalilinjan alapuolelle. Tutkimus oli vertaileva ikäryhmien 20-39 ja 40-60-vuotiaat välillä. Opinnäytetyö on kvantitatiivinen. Tutkimusjoukko koostui 80 henkilöstä. VSQ- ja SSQ-kyselylomakkeilla ja mittauksilla saatu aineisto analysoitiin SPSS-ohjelmassa Wilcoxonin merkkitestillä ja Mann-Whitneyn U-testillä. Koko tutkimusjoukon SSQ-oireiden keskiarvoja tarkastellessa voitiin oireiden todeta voimistuneen tehtävän aikana tilastollisesti merkitsevästi. + 15 asteen katsekulmassa havaittiin oireiden voimistumista eniten. SSQ-oireiden jakaminen eri ryhmiin toi esiin tilastollisesti merkitseviä eroja varsinkin silmänrasitusoireiden kohdalla. - 15 asteen katsekulma aiheutti vähiten oireiden arvojen kasvua tehtävän aikana silmänrasitus- ja disorientaatio-oireiden ryhmissä. Tarkasteltaessa koko joukon silmänrasitus- ja disorientaatio-oireita voidaan päätellä näyttöpäätetyön aiheuttavan rasitusoireiden lisääntymistä, koska merkitsevyystaso näissä oli tilastollisesti erittäin merkitsevä. Sekä kokonaisuudessaan että oireryhmittäin oli huomionarvoista, että 20-40-vuotiaat kokivat näyttöpäätetyön rasittavan enemmän. Mittaustulosten perusteella voidaan sanoa, että akkommodaatiolaajuus ja konvergenssikyky olivat merkitsevästi heikompia tehtävän jälkeen. Kyynelfilmin repeämisajan keskiarvo kokeen jälkeen koko tutkimusjoukolla oli normaaliarvoa alhaisempi. Yhteistyökumppanimme voi hyödyntää työmme tuloksia laajemmassa tutkimuksessa. Opinnäytetyömme tukee ammattiosaamistamme toimiessamme näönhuollon asiantuntijoina.
Resumo:
Multispectral images contain information from several spectral wavelengths and currently multispectral images are widely used in remote sensing and they are becoming more common in the field of computer vision and in industrial applications. Typically, one multispectral image in remote sensing may occupy hundreds of megabytes of disk space and several this kind of images may be received from a single measurement. This study considers the compression of multispectral images. The lossy compression is based on the wavelet transform and we compare the suitability of different waveletfilters for the compression. A method for selecting a wavelet filter for the compression and reconstruction of multispectral images is developed. The performance of the multidimensional wavelet transform based compression is compared to other compression methods like PCA, ICA, SPIHT, and DCT/JPEG. The quality of the compression and reconstruction is measured by quantitative measures like signal-to-noise ratio. In addition, we have developed a qualitative measure, which combines the information from the spatial and spectral dimensions of a multispectral image and which also accounts for the visual quality of the bands from the multispectral images.
Resumo:
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present innatural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. Thedifferences between local and global features are studied with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both Hough transform and Gabor filtering. A modified Hough transform technique is also presented where the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform employing parallel processing and local information are introduced.
Resumo:
Luokittelujärjestelmää suunniteltaessa tarkoituksena on rakentaa systeemi, joka pystyy ratkaisemaan mahdollisimman tarkasti tutkittavan ongelma-alueen. Hahmontunnistuksessa tunnistusjärjestelmän ydin on luokitin. Luokittelun sovellusaluekenttä on varsin laaja. Luokitinta tarvitaan mm. hahmontunnistusjärjestelmissä, joista kuvankäsittely toimii hyvänä esimerkkinä. Myös lääketieteen parissa tarkkaa luokittelua tarvitaan paljon. Esimerkiksi potilaan oireiden diagnosointiin tarvitaan luokitin, joka pystyy mittaustuloksista päättelemään mahdollisimman tarkasti, onko potilaalla kyseinen oire vai ei. Väitöskirjassa on tehty similaarisuusmittoihin perustuva luokitin ja sen toimintaa on tarkasteltu mm. lääketieteen paristatulevilla data-aineistoilla, joissa luokittelutehtävänä on tunnistaa potilaan oireen laatu. Väitöskirjassa esitetyn luokittimen etuna on sen yksinkertainen rakenne, josta johtuen se on helppo tehdä sekä ymmärtää. Toinen etu on luokittimentarkkuus. Luokitin saadaan luokittelemaan useita eri ongelmia hyvin tarkasti. Tämä on tärkeää varsinkin lääketieteen parissa, missä jo pieni tarkkuuden parannus luokittelutuloksessa on erittäin tärkeää. Väitöskirjassa ontutkittu useita eri mittoja, joilla voidaan mitata samankaltaisuutta. Mitoille löytyy myös useita parametreja, joille voidaan etsiä juuri kyseiseen luokitteluongelmaan sopivat arvot. Tämä parametrien optimointi ongelma-alueeseen sopivaksi voidaan suorittaa mm. evoluutionääri- algoritmeja käyttäen. Kyseisessä työssä tähän on käytetty geneettistä algoritmia ja differentiaali-evoluutioalgoritmia. Luokittimen etuna on sen joustavuus. Ongelma-alueelle on helppo vaihtaa similaarisuusmitta, jos kyseinen mitta ei ole sopiva tutkittavaan ongelma-alueeseen. Myös eri mittojen parametrien optimointi voi parantaa tuloksia huomattavasti. Kun käytetään eri esikäsittelymenetelmiä ennen luokittelua, tuloksia pystytään parantamaan.
Resumo:
In this thesis author approaches the problem of automated text classification, which is one of basic tasks for building Intelligent Internet Search Agent. The work discusses various approaches to solving sub-problems of automated text classification, such as feature extraction and machine learning on text sources. Author also describes her own multiword approach to feature extraction and pres-ents the results of testing this approach using linear discriminant analysis based classifier, and classifier combining unsupervised learning for etalon extraction with supervised learning using common backpropagation algorithm for multilevel perceptron.
Resumo:
Vuosi vuodelta kasvava tietokoneiden prosessointikyky on mahdollistanut harmaataso- ja RGB-värikuvia tarkempien spektrikuvien käsittelyn järjellisessä ajassa ilman suuria kustannuksia. Ongelmana on kuitenkin, ettei talletus- ja tiedonsiirtomedia ole kehittynyt prosessointikyvyn vauhdissa. Ratkaisu tähän ongelmaan on spektrikuvien tiivistäminen talletuksen ja tiedonsiirron ajaksi. Tässä työssä esitellään menetelmä, jossa spektrikuva tiivistetään kahdessa vaiheessa: ensin ryhmittelemällä itseorganisoituvan kartan (SOM) avulla ja toisessa vaiheessa jatketaan tiivistämistä perinteisin menetelmin. Saadut tiivistyssuhteet ovat merkittäviä vääristymän pysyessä siedettävänä. Työ on tehty Lappeenrannan teknillisen korkeakoulun Tietotekniikan osaston Tietojenkäsittelytekniikan tutkimuslaboratoriossa osana laajempaa kuvantiivistyksen tutkimushanketta.
Resumo:
Kolmiulotteisten kappaleiden rekonstruktio on yksi konenäön haastavimmista ongelmista, koska kappaleiden kolmiulotteisia etäisyyksiä ei voida selvittää yhdestä kaksiulotteisesta kuvasta. Ongelma voidaan ratkaista stereonäön avulla, jossa näkymän kolmiulotteinen rakenne päätellään usean kuvan perusteella. Tämä lähestymistapa mahdollistaa kuitenkin vain rekonstruktion niille kappaleiden osille, jotka näkyvät vähintään kahdessa kuvassa. Piilossa olevien osien rekonstruktio ei ole mahdollista pelkästään stereonäön avulla. Tässä työssä on kehitetty uusi menetelmä osittain piilossa olevien kolmiulotteisten tasomaisten kappaleiden rekonstruktioon. Menetelmän avulla voidaan selvittää hyvällä tarkkuudella tasomaisista pinnoista koostuvan kappaleen muoto ja paikka käyttäen kahta kuvaa kappaleesta. Menetelmä perustuu epipolaarigeometriaan, jonka avulla selvitetään molemmissa kuvissa näkyvät kappaleiden osat. Osittain piilossa olevien piirteiden rekonstruointi suoritetaan käyttämäen stereonäköä sekä tietoa kappaleen rakenteesta. Esitettyä ratkaisua voitaisiin käyttää esimerkiksi kolmiulotteisten kappaleiden visualisointiin, robotin navigointiin tai esineentunnistukseen.
Resumo:
Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing—syntactic analysis of the entire structure of sentences—and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of theoriginal by 10% while also roughly halving parsing time. To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of idiverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships. Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.