131 results for graphical methods


Relevance:

20.00% 20.00%

Abstract:

The skill of programming is a key asset for every computer science student. Many studies have shown that this is a hard skill to learn and the outcomes of programming courses have often been substandard. Thus, a range of methods and tools have been developed to assist students’ learning processes. One of the biggest fields in computer science education is the use of visualizations as a learning aid, and many visualization-based tools have been developed to aid the learning process during the last few decades. The studies conducted in this thesis focus on two different visualization-based tools, TRAKLA2 and ViLLE. This thesis includes results from multiple empirical studies about what kind of effects the introduction and usage of these tools have on students’ opinions and performance, and what kind of implications there are from a teacher’s point of view. The results from studies in this thesis show that students preferred to do web-based exercises, and felt that those exercises contributed to their learning. The usage of the tool motivated students to work harder during their course, which was shown in overall course performance and drop-out statistics. We have also shown that visualization-based tools can be used to enhance the learning process, and one of the key factors is the higher and active level of engagement (see the Engagement Taxonomy by Naps et al., 2002). Automatic grading accompanied by immediate feedback helps students to overcome obstacles during the learning process, and to grasp the key element in the learning task. These kinds of tools can help us to cope with the fact that many programming courses are overcrowded with limited teaching resources. These tools allow us to tackle this problem by utilizing automatic assessment in exercises that are most suitable to be done on the web (like tracing and simulation), since it supports students’ independent learning regardless of time and place.
In summary, we can use our course’s resources more efficiently to increase the quality of the learning experience of the students and the teaching experience of the teacher, and even increase the performance of the students. There are also methodological results from this thesis which contribute to developing insight into the conduct of empirical evaluations of new tools or techniques. When we evaluate a new tool, especially one accompanied by visualization, we need to give a proper introduction to it and to the graphical notation used by the tool. The standard procedure should also include capturing the screen with audio to confirm that the participants of the experiment are doing what they are supposed to do. By taking such measures in the study of the learning impact of visualization support for learning, we can avoid drawing false conclusions from our experiments. As computer science educators, we face two important challenges. Firstly, we need to start to deliver the message in our own institutions and all over the world about the new – scientifically proven – innovations in teaching like TRAKLA2 and ViLLE. Secondly, we have the relevant experience of conducting teaching-related experiments, and thus we can support our colleagues in learning the essential know-how of research-based improvement of their teaching. This change can transform academic teaching into publications, and by utilizing this approach we can significantly increase the adoption of the new tools and techniques, and overall increase the knowledge of best practices. In the future, we need to combine our forces and tackle these universal and common problems together by creating multi-national and multi-institutional research projects. We need to create a community and a platform in which we can share these best practices and at the same time conduct multi-national research projects easily.

Relevance:

20.00% 20.00%

Abstract:

The results shown in this thesis are based on selected publications from the 2000s. The work was carried out in several national and EC-funded public research projects and in close cooperation with industrial partners. The main objective of the thesis was to study and quantify the most important phenomena of circulating fluidized bed (CFB) combustors by developing and applying proper experimental and modelling methods using laboratory-scale equipment. An understanding of the phenomena plays an essential role in the development of combustion and emission performance, and the availability and controls of CFB boilers. Experimental procedures to study fuel combustion behaviour under CFB conditions are presented in the thesis. Steady-state and dynamic measurements under well-controlled conditions were carried out to produce the data needed for the development of high-efficiency, utility-scale CFB technology. The importance of combustion control and furnace dynamics is emphasized when CFB boilers are scaled up with a once-through steam cycle. Qualitative information on fuel combustion characteristics was obtained directly by comparing flue gas oxygen responses during the impulse change experiments with fuel feed. A one-dimensional, time-dependent model was developed to analyse the measurement data. Emission formation was studied in combination with fuel combustion behaviour. Correlations were developed for NO, N2O, CO and char loading, as a function of temperature and oxygen concentration in the bed area. An online method to characterize char loading under CFB conditions was developed and validated with the pilot-scale CFB tests. Finally, a new method to control air and fuel feeds in CFB combustion was introduced. The method is based on models and an analysis of the fluctuation of the flue gas oxygen concentration.
The effect of high oxygen concentrations on fuel combustion behaviour was also studied to evaluate the potential of CFB boilers to apply oxygen-firing technology to CCS. In future studies, it will be necessary to go through the whole scale-up chain from laboratory phenomena devices through pilot-scale test rigs to large-scale, commercial boilers in order to validate the applicability and scalability of the results. This thesis shows the chain between the laboratory-scale phenomena test rig (bench scale) and the CFB process test rig (pilot). CFB technology has been scaled up successfully from an industrial scale to a utility scale during the last decade. The work shown in the thesis, for its part, has supported the development by producing new detailed information on combustion under CFB conditions.

Relevance:

20.00% 20.00%

Abstract:

Credit risk assessment is an integral part of banking. Credit risk means that the return will not materialise if the customer fails to fulfil its obligations. Thus a key component of banking is setting acceptance criteria for granting loans. The theoretical part of the study focuses on the key components, as described in the literature, of the credit assessment methods banks use when extending credit to large corporations. The main component is the Basel II Accord, which sets regulatory requirements for banks' credit risk assessment methods. The empirical part comprises, as its primary source, an analysis of major Nordic banks’ annual reports and risk management reports. As a secondary source, complementary interviews were carried out with senior credit risk assessment personnel. The findings indicate that all major Nordic banks use a combination of quantitative and qualitative information in their credit risk assessment models when extending credit to large corporations. The relative input of qualitative information depends on the selected approach to the credit rating, i.e. point-in-time or through-the-cycle.

Relevance:

20.00% 20.00%

Abstract:

Forest inventories are used to estimate forest characteristics and the condition of forest for many different applications: operational tree logging for forest industry, forest health state estimation, carbon balance estimation, land-cover and land use analysis in order to avoid forest degradation etc. Recent inventory methods are strongly based on remote sensing data combined with field sample measurements, which are used to define estimates covering the whole area of interest. Remote sensing data from satellites, aerial photographs or aerial laser scannings are used, depending on the scale of inventory. To be applicable in operational use, forest inventory methods need to be easily adjusted to local conditions of the study area at hand. All the data handling and parameter tuning should be objective and automated as much as possible. The methods also need to be robust when applied to different forest types. Since there generally are no extensive direct physical models connecting the remote sensing data from different sources to the forest parameters that are estimated, mathematical estimation models are of "black-box" type, connecting the independent auxiliary data to dependent response data with linear or nonlinear arbitrary models. To avoid redundant complexity and over-fitting of the model, which is based on up to hundreds of possibly collinear variables extracted from the auxiliary data, variable selection is needed. To connect the auxiliary data to the inventory parameters that are estimated, field work must be performed. In larger study areas with dense forests, field work is expensive, and should therefore be minimized. To get cost-efficient inventories, field work could partly be replaced with information from formerly measured sites, databases. The work in this thesis is devoted to the development of automated, adaptive computation methods for aerial forest inventory. 
The mathematical model parameter definition steps are automated, and the cost-efficiency is improved by setting up a procedure that utilizes databases in the estimation of new area characteristics.

Relevance:

20.00% 20.00%

Abstract:

The development of correct programs is a core problem in computer science. Although formal verification methods for establishing correctness with mathematical rigor are available, programmers often find these difficult to put into practice. One hurdle is deriving the loop invariants and proving that the code maintains them. So called correct-by-construction methods aim to alleviate this issue by integrating verification into the programming workflow. Invariant-based programming is a practical correct-by-construction method in which the programmer first establishes the invariant structure, and then incrementally extends the program in steps of adding code and proving after each addition that the code is consistent with the invariants. In this way, the program is kept internally consistent throughout its development, and the construction of the correctness arguments (proofs) becomes an integral part of the programming workflow. A characteristic of the approach is that programs are described as invariant diagrams, a graphical notation similar to the state charts familiar to programmers. Invariant-based programming is a new method that has not been evaluated in large scale studies yet. The most important prerequisite for feasibility on a larger scale is a high degree of automation. The goal of the Socos project has been to build tools to assist the construction and verification of programs using the method. This thesis describes the implementation and evaluation of a prototype tool in the context of the Socos project. The tool supports the drawing of the diagrams, automatic derivation and discharging of verification conditions, and interactive proofs. It is used to develop programs that are correct by construction. The tool consists of a diagrammatic environment connected to a verification condition generator and an existing state-of-the-art theorem prover. 
Its core is a semantics for translating diagrams into verification conditions, which are sent to the underlying theorem prover. We describe a concrete method for 1) deriving sufficient conditions for total correctness of an invariant diagram; 2) sending the conditions to the theorem prover for simplification; and 3) reporting the results of the simplification to the programmer in a way that is consistent with the invariant-based programming workflow and that allows errors in the program specification to be efficiently detected. The tool uses an efficient automatic proof strategy to prove as many conditions as possible automatically and lets the remaining conditions be proved interactively. The tool is based on the verification system PVS and uses the SMT (Satisfiability Modulo Theories) solver Yices as a catch-all decision procedure. Conditions that were not discharged automatically may be proved interactively using the PVS proof assistant. The programming workflow is very similar to the process by which a mathematical theory is developed inside a computer-supported theorem prover environment such as PVS. With the aid of the tool, the programmer reduces a large verification problem into a set of smaller problems (lemmas), and can substantially improve the degree of proof automation by developing specialized background theories and proof strategies to support the specification and verification of a specific class of programs. We demonstrate this workflow by describing in detail the construction of a verified sorting algorithm. Tool-supported verification often has little to no presence in computer science (CS) curricula. Furthermore, program verification is frequently introduced as an advanced and purely theoretical topic that is not connected to the workflow taught in the early and practically oriented programming courses.
Our hypothesis is that verification could be introduced early in the CS education, and that verification tools could be used in the classroom to support the teaching of formal methods. A prototype of Socos has been used in a course at Åbo Akademi University targeted at first and second year undergraduate students. We evaluate the use of Socos in the course as part of a case study carried out in 2007.
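
To give a flavour of what "keeping the code consistent with the invariants" means, the following is a minimal Python sketch of an insertion sort whose loop invariant is checked at run time. It is only an illustration: Socos works on invariant diagrams and proves the conditions statically with PVS and Yices, whereas the `assert` statements here merely test the invariant on the inputs actually supplied. The function name `insertion_sort` is an assumption made for this example.

```python
def insertion_sort(items):
    """Return a sorted copy of items, checking the loop invariant at run time."""
    a = list(items)
    for i in range(1, len(a)):
        # Invariant (checked here): the prefix a[0:i] is sorted.
        assert all(a[k] <= a[k + 1] for k in range(i - 1))
        key = a[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one step to the right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    # Postcondition (checked here): the whole list is sorted.
    assert all(a[k] <= a[k + 1] for k in range(len(a) - 1))
    return a
```

A run-time check of this kind can only reveal a violated invariant on a particular input; a proof, as produced in the workflow described above, establishes it for all inputs.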

Relevance:

20.00% 20.00%

Abstract:

New luminometric particle-based methods were developed to quantify protein and to count cells. The developed methods rely on the interaction of the sample with nano- or microparticles and different principles of detection. In fluorescence quenching, time-resolved luminescence resonance energy transfer (TR-LRET), and two-photon excitation fluorescence (TPX) methods, the sample prevents the adsorption of labeled protein to the particles. Depending on the system, the addition of the analyte increases or decreases the luminescence. In the dissociation method, the adsorbed protein protects the Eu(III) chelate on the surface of the particles from dissociation at a low pH. The experimental setups are user-friendly and rapid and do not require hazardous test compounds or elevated temperatures. The sensitivity of the quantification of protein (from 40 to 500 pg bovine serum albumin in a sample) was 20-500-fold better than in the most sensitive commercial methods. The quenching method exhibited low protein-to-protein variability, and the dissociation method was insensitive to the assay contaminants commonly found in biological samples. Fewer than ten eukaryotic cells were detected and quantified with all the developed methods under optimized assay conditions. Furthermore, two applications, a method for detecting the aggregation of protein and a cell viability test, were developed by utilizing the TR-LRET method. The aggregation of protein could be detected at a concentration of 30 μg/L, more than 10,000 times lower than with the known methods of UV240 absorbance and dynamic light scattering. The TR-LRET method was combined with a nucleic acid assay with cell-impermeable dye to measure the percentage of dead cells in a single-tube test with cell counts below 1000 cells/tube.

Relevance:

20.00% 20.00%

Abstract:

The objective of this dissertation is to improve the dynamic simulation of fluid power circuits. A fluid power circuit is a typical way to implement power transmission in mobile working machines, e.g. cranes and excavators. Dynamic simulation is an essential tool in developing controllability and energy-efficient solutions for mobile machines. Efficient dynamic simulation is the basic requirement for real-time simulation. In the real-time simulation of fluid power circuits, numerical problems arise from the software and methods used for modelling and integration. A simulation model of a fluid power circuit is typically created using differential and algebraic equations. Efficient numerical methods are required since differential equations must be solved in real time. Unfortunately, simulation software packages offer only a limited selection of numerical solvers. Numerical problems cause noise in the results, which in many cases causes the simulation run to fail. Mathematically, the fluid power circuit models are stiff systems of ordinary differential equations. The numerical solution of stiff systems can be improved by two alternative approaches. The first is to develop numerical solvers suitable for solving stiff systems. The second is to decrease the model stiffness itself by introducing models and algorithms that either decrease the highest eigenvalues or neglect them by introducing steady-state solutions of the stiff parts of the models. The thesis proposes novel methods using the latter approach. The study aims to develop practical methods usable in the dynamic simulation of fluid power circuits using explicit fixed-step integration algorithms. In this thesis, two mechanisms which make the system stiff are studied. These are the pressure drop approaching zero in the turbulent orifice model and the volume approaching zero in the equation of pressure build-up.
These are the critical areas for which alternative methods for modelling and numerical simulation are proposed. Generally, in hydraulic power transmission systems the orifice flow is clearly in the turbulent area. The flow becomes laminar as the pressure drop over the orifice approaches zero only in rare situations, e.g. when a valve is closed, when an actuator is driven against an end stopper, or when an external force makes an actuator switch its direction during operation. This means that in terms of accuracy, the description of laminar flow is not necessary. Unfortunately, when a purely turbulent description of the orifice is used, numerical problems occur when the pressure drop comes close to zero, since the first derivative of flow with respect to the pressure drop approaches infinity when the pressure drop approaches zero. Furthermore, the second derivative becomes discontinuous, which causes numerical noise and an infinitely small integration step when a variable-step integrator is used. A numerically efficient model for the orifice flow is proposed using a cubic spline function to describe the flow in the laminar and transition areas. Parameters for the cubic spline function are selected such that its first derivative is equal to the first derivative of the pure turbulent orifice flow model at the boundary. In the dynamic simulation of fluid power circuits, a trade-off exists between accuracy and calculation speed. This trade-off is investigated for the two-regime orifice flow model. Very small volumes exist inside many types of valves, as well as between them. The integration of pressures in small fluid volumes causes numerical problems in fluid power circuit simulation. Particularly in real-time simulation, these numerical problems are a great weakness. The system stiffness approaches infinity as the fluid volume approaches zero.
If fixed-step explicit algorithms for solving ordinary differential equations (ODE) are used, the system stability would easily be lost when integrating pressures in small volumes. To solve the problem caused by small fluid volumes, a pseudo-dynamic solver is proposed. Instead of integration of the pressure in a small volume, the pressure is solved as a steady-state pressure created in a separate cascade loop by numerical integration. The hydraulic capacitance V/Be of the parts of the circuit whose pressures are solved by the pseudo-dynamic method should be orders of magnitude smaller than that of those parts whose pressures are integrated. The key advantage of this novel method is that the numerical problems caused by the small volumes are completely avoided. Also, the method is freely applicable regardless of the integration routine applied. An advantage of both of the above-mentioned methods is that they are suited for use together with the semi-empirical modelling method, which does not necessarily require any geometrical data of the valves and actuators to be modelled. In this modelling method, most of the needed component information can be taken from the manufacturer’s nominal graphs. This thesis introduces the methods and shows several numerical examples to demonstrate how the proposed methods improve the dynamic simulation of various hydraulic circuits.
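
The two-regime orifice idea can be sketched in a few lines of Python. This is a hedged illustration, not the model from the thesis: it assumes a purely turbulent law Q = k·sqrt(Δp) above a transition pressure, and below it an odd cubic polynomial Q = a·Δp + b·Δp³ whose value and first derivative match the turbulent law at the transition point. The names `orifice_flow`, `k` and `dp_tr`, the default values, and the specific polynomial form are all assumptions made for this example.

```python
import math


def orifice_flow(dp, k=1.0, dp_tr=1.0e5):
    """Flow through an orifice for a pressure drop dp (sign gives direction).

    |dp| >= dp_tr: turbulent law  Q = sign(dp) * k * sqrt(|dp|)
    |dp| <  dp_tr: odd cubic      Q = a*dp + b*dp**3
    The coefficients a and b are chosen so that Q and dQ/d(dp) are
    continuous at |dp| = dp_tr (an assumed form, not the thesis spline).
    """
    if abs(dp) >= dp_tr:
        return math.copysign(k * math.sqrt(abs(dp)), dp)
    # Matching value and slope at dp_tr gives:
    #   a = 5k / (4 sqrt(dp_tr)),  b = -k / (4 dp_tr**2.5)
    a = 5.0 * k / (4.0 * math.sqrt(dp_tr))
    b = -k / (4.0 * dp_tr ** 2.5)
    return a * dp + b * dp ** 3
```

With this form the slope at zero pressure drop is the finite value `a` rather than infinity, which is the property that makes a fixed-step explicit integrator easier to use near Δp = 0.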

Relevance:

20.00% 20.00%

Abstract:

An integrated computation chain that simulates weapon effects from launch to target consists of models of interior, exterior and terminal ballistics. Exterior ballistics covers computational models in the areas of trajectory calculation, meteorological correction and projectile aerodynamics. There is a growing need, for technical and training purposes, for a computation system that has a graphical user interface, is based on physically accurate modelling and covers the whole chain. In particular, if modelling of the blast effect is added to the computation chain, the effect of weapon systems at the target can be simulated in a way that users value. Knowledge-intensive computational models of ballistics are essential tools for covering technical design competence and creating competitive advantage in a networked business environment. Exploiting computational methods produced by university research in companies' design systems deepens technical expertise, which also has a motivating effect on personnel in markets that are becoming technically more demanding. The work assesses the industry by analysing the different usage needs for the same database-backed computational models. Technical foundations, operating environments and markets are examined in order to identify business opportunities. As a result, the work deepens the view of core competences and envisions how the business idea differentiates itself from competitors, its markets and its development.

Relevance:

20.00% 20.00%

Abstract:

Systems biology is a new, emerging and rapidly developing, multidisciplinary research field that aims to study biochemical and biological systems from a holistic perspective, with the goal of providing a comprehensive, system-level understanding of cellular behaviour. In this way, it addresses one of the greatest challenges faced by contemporary biology, which is to comprehend the function of complex biological systems. Systems biology combines various methods that originate from scientific disciplines such as molecular biology, chemistry, engineering sciences, mathematics, computer science and systems theory. Systems biology, unlike “traditional” biology, focuses on high-level concepts such as: network, component, robustness, efficiency, control, regulation, hierarchical design, synchronization, concurrency, and many others. The very terminology of systems biology is “foreign” to “traditional” biology, marks its drastic shift in the research paradigm and indicates the close linkage of systems biology to computer science. One of the basic tools utilized in systems biology is the mathematical modelling of life processes tightly linked to experimental practice. The studies contained in this thesis revolve around a number of challenges commonly encountered in computational modelling in systems biology. The research comprises the development and application of a broad range of methods originating in the fields of computer science and mathematics for the construction and analysis of computational models in systems biology. In particular, the performed research is set up in the context of two biological phenomena chosen as modelling case studies: 1) the eukaryotic heat shock response and 2) the in vitro self-assembly of intermediate filaments, one of the main constituents of the cytoskeleton.
The range of presented approaches spans from heuristic, through numerical and statistical, to analytical methods applied in the effort to formally describe and analyse the two biological processes. We note, however, that although applied to certain case studies, the presented methods are not limited to them and can be utilized in the analysis of other biological mechanisms as well as complex systems in general. The full range of developed and applied modelling techniques as well as model analysis methodologies constitutes a rich modelling framework. Moreover, the presentation of the developed methods, their application to the two case studies and the discussions concerning their potentials and limitations point to the difficulties and challenges one encounters in computational modelling of biological systems. The problems of model identifiability, model comparison, model refinement, model integration and extension, choice of the proper modelling framework and level of abstraction, or the choice of the proper scope of the model run through this thesis.

Relevance:

20.00% 20.00%

Abstract:

The drug discovery process is facing new challenges in the evaluation process of the lead compounds as the number of new compounds synthesized is increasing. The potential of test compounds is most frequently assayed through the binding of the test compound to the target molecule or receptor, or by measuring functional secondary effects caused by the test compound in the target model cells, tissues or organism. Modern homogeneous high-throughput-screening (HTS) assays for purified estrogen receptors (ER) utilize various luminescence-based detection methods. Fluorescence polarization (FP) is a standard method for the ER ligand binding assay. It was used to demonstrate the performance of two-photon excitation of fluorescence (TPFE) vs. the conventional one-photon excitation method. As a result, the TPFE method showed improved dynamics and was found to be comparable with the conventional method. It also held potential for efficient miniaturization. Other luminescence-based ER assays utilize energy transfer from a long-lifetime luminescent label, e.g. lanthanide chelates (Eu, Tb), to a prompt luminescent label, the signal being read in a time-resolved mode. As an alternative to this method, a new single-label (Eu) time-resolved detection method was developed, based on the quenching of the label by a soluble quencher molecule when displaced from the receptor to the solution phase by an unlabeled competing ligand. The new method was compared with the standard FP method. It was shown to yield comparable results with the FP method and found to have a significantly higher signal-to-background ratio than FP. Cell-based functional assays for determining the extent of cell surface adhesion molecule (CAM) expression combined with microscopy analysis of the target molecules would provide improved information content, compared to an expression level assay alone.
In this work, immune response was simulated by exposing endothelial cells to cytokine stimulation, and the resulting increase in the level of adhesion molecule expression was analyzed on fixed cells by means of immunocytochemistry utilizing specific long-lifetime luminophore-labeled antibodies against chosen adhesion molecules. Results showed that the method could be used in a multi-parametric assay for protein expression levels of several CAMs simultaneously, combined with analysis of the cellular localization of the chosen adhesion molecules through time-resolved luminescence microscopy inspection.

Relevance:

20.00% 20.00%

Abstract:

Mathematical models often contain parameters that need to be calibrated from measured data. The emergence of efficient Markov Chain Monte Carlo (MCMC) methods has made the Bayesian approach a standard tool in quantifying the uncertainty in the parameters. With MCMC, the parameter estimation problem can be solved in a fully statistical manner, and the whole distribution of the parameters can be explored, instead of obtaining point estimates and using, e.g., Gaussian approximations. In this thesis, MCMC methods are applied to parameter estimation problems in chemical reaction engineering, population ecology, and climate modeling. Motivated by the climate model experiments, the methods are developed further to make them more suitable for problems where the model is computationally intensive. After the parameters are estimated, one can start to use the model for various tasks. Two such tasks are studied in this thesis: optimal design of experiments, where the task is to design the next measurements so that the parameter uncertainty is minimized, and model-based optimization, where a model-based quantity, such as the product yield in a chemical reaction model, is optimized. In this thesis, novel ways to perform these tasks are developed, based on the output of MCMC parameter estimation. A separate topic is dynamical state estimation, where the task is to estimate the dynamically changing model state, instead of static parameters. For example, in numerical weather prediction, an estimate of the state of the atmosphere must constantly be updated based on the recently obtained measurements. In this thesis, a novel hybrid state estimation method is developed, which combines elements from deterministic and random sampling methods.
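
As a minimal illustration of MCMC parameter estimation, the sketch below runs a random-walk Metropolis sampler for a single parameter. It is not the adaptive or hybrid machinery developed in the thesis: the model (observations scattered around an unknown constant `theta` with known Gaussian noise), the flat prior, and the names `log_posterior` and `metropolis` are assumptions chosen to keep the example small.

```python
import math
import random


def log_posterior(theta, data, sigma=1.0):
    """Unnormalised log posterior: flat prior, Gaussian likelihood."""
    return -sum((y - theta) ** 2 for y in data) / (2.0 * sigma ** 2)


def metropolis(data, n_steps=20000, step=0.5, theta0=0.0, seed=0):
    """Random-walk Metropolis sampler; returns the full chain of theta values."""
    rng = random.Random(seed)
    theta = theta0
    logp = log_posterior(theta, data)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, data)
        # Accept with probability min(1, exp(logp_new - logp)).
        if logp_new >= logp or rng.random() < math.exp(logp_new - logp):
            theta, logp = proposal, logp_new
        chain.append(theta)
    return chain
```

After discarding an initial burn-in, the retained samples approximate the whole posterior distribution of `theta`, so quantities such as the mean, the spread or credible intervals can be read off the chain rather than from a point estimate with a Gaussian approximation.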

Relevance:

20.00% 20.00%

Abstract:

The objective of this research project is to develop a novel force control scheme for the teleoperation of a hydraulically driven manipulator, and to implement an ideal transparent mapping between human and machine interaction, and machine and task environment interaction. This master's thesis provides a preparatory study for the present research project. The research is limited to a single-degree-of-freedom hydraulic slider with a 6-DOF Phantom haptic device. The key contribution of the thesis is to set up the experimental rig, including an electromechanical haptic device, a hydraulic servo and a 6-DOF force sensor. The slider is first tested as a position servo by using a previously developed intelligent switching control algorithm. Subsequently, the teleoperated system is set up and the preliminary experiments are carried out. In addition to the development of the single-DOF experimental setup, methods such as passivity control in teleoperation are reviewed. The thesis also contains a review of the modeling of the servo slider, with particular reference to the servo valve. The Markov Chain Monte Carlo method is utilized in developing the robustness of the model in the presence of noise.

Relevance:

20.00% 20.00%

Abstract:

Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.
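
The link between pairwise ranking and AUC mentioned above can be made concrete with a short Python sketch. It computes AUC directly as the fraction of (positive, negative) pairs that a scoring function orders correctly, counting ties as half. This is a plain quadratic-time illustration of the definition, not RankRLS or any of the efficient algorithms from the thesis; the function name `pairwise_auc` and the 0/1 label encoding are assumptions made for this example.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    scores: predicted scores, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # pair ordered correctly
            elif p == n:
                correct += 0.5   # tie counts as half
    return correct / (len(pos) * len(neg))
```

For example, `pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` evaluates to 0.75: of the four positive-negative pairs, only the pair (0.3, 0.8) is ordered incorrectly. One minus this value is the empirical probability of ranking a randomly drawn positive-negative pair incorrectly.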