13 resultados para Evaluation metrics

em Universidad Politécnica de Madrid


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Ontologies and taxonomies are widely used to organize concepts providing the basis for activities such as indexing, and as background knowledge for NLP tasks. As such, translation of these resources would prove useful to adapt these systems to new languages. However, we show that the nature of these resources is significantly different from the "free-text" paradigm used to train most statistical machine translation systems. In particular, we see significant differences in the linguistic nature of these resources and such resources have rich additional semantics. We demonstrate that as a result of these linguistic differences, standard SMT methods, in particular evaluation metrics, can produce poor performance. We then look to the task of leveraging these semantics for translation, which we approach in three ways: by adapting the translation system to the domain of the resource; by examining if semantics can help to predict the syntactic structure used in translation; and by evaluating if we can use existing translated taxonomies to disambiguate translations. We present some early results from these experiments, which shed light on the degree of success we may have with each approach

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Automatic grading of programming assignments is an important topic in academic research. It aims at improving the level of feedback given to students and optimizing the professor time. Several researches have reported the development of software tools to support this process. Then, it is helpfulto get a quickly and good sight about their key features. This paper reviews an ample set of tools forautomatic grading of programming assignments. They are divided in those most important mature tools, which have remarkable features; and those built recently, with new features. The review includes the definition and description of key features e.g. supported languages, used technology, infrastructure, etc. The two kinds of tools allow making a temporal comparative analysis. This analysis infrastructure, etc. The two kinds of tools allow making a temporal comparative analysis. This analysis shows good improvements in this research field, these include security, more language support, plagiarism detection, etc. On the other hand, the lack of a grading model for assignments is identified as an important gap in the reviewed tools. Thus, a characterization of evaluation metrics to grade programming assignments is provided as first step to get a model. Finally new paths in this research field are proposed.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The multi-dimensional classification problem is a generalisation of the recently-popularised task of multi-label classification, where each data instance is associated with multiple class variables. There has been relatively little research carried out specific to multi-dimensional classification and, although one of the core goals is similar (modelling dependencies among classes), there are important differences; namely a higher number of possible classifications. In this paper we present method for multi-dimensional classification, drawing from the most relevant multi-label research, and combining it with important novel developments. Using a fast method to model the conditional dependence between class variables, we form super-class partitions and use them to build multi-dimensional learners, learning each super-class as an ordinary class, and thus explicitly modelling class dependencies. Additionally, we present a mechanism to deal with the many class values inherent to super-classes, and thus make learning efficient. To investigate the effectiveness of this approach we carry out an empirical evaluation on a range of multi-dimensional datasets, under different evaluation metrics, and in comparison with high-performing existing multi-dimensional approaches from the literature. Analysis of results shows that our approach offers important performance gains over competing methods, while also exhibiting tractable running time.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

El objetivo principal de esta tesis doctoral es profundizar en el análisis y diseño de un sistema inteligente para la predicción y control del acabado superficial en un proceso de fresado a alta velocidad, basado fundamentalmente en clasificadores Bayesianos, con el prop´osito de desarrollar una metodolog´ıa que facilite el diseño de este tipo de sistemas. El sistema, cuyo propósito es posibilitar la predicción y control de la rugosidad superficial, se compone de un modelo aprendido a partir de datos experimentales con redes Bayesianas, que ayudar´a a comprender los procesos dinámicos involucrados en el mecanizado y las interacciones entre las variables relevantes. Dado que las redes neuronales artificiales son modelos ampliamente utilizados en procesos de corte de materiales, también se incluye un modelo para fresado usándolas, donde se introdujo la geometría y la dureza del material como variables novedosas hasta ahora no estudiadas en este contexto. Por lo tanto, una importante contribución en esta tesis son estos dos modelos para la predicción de la rugosidad superficial, que se comparan con respecto a diferentes aspectos: la influencia de las nuevas variables, los indicadores de evaluación del desempeño, interpretabilidad. Uno de los principales problemas en la modelización con clasificadores Bayesianos es la comprensión de las enormes tablas de probabilidad a posteriori producidas. Introducimos un m´etodo de explicación que genera un conjunto de reglas obtenidas de árboles de decisión. Estos árboles son inducidos a partir de un conjunto de datos simulados generados de las probabilidades a posteriori de la variable clase, calculadas con la red Bayesiana aprendida a partir de un conjunto de datos de entrenamiento. Por último, contribuimos en el campo multiobjetivo en el caso de que algunos de los objetivos no se puedan cuantificar en números reales, sino como funciones en intervalo de valores. Esto ocurre a menudo en aplicaciones de aprendizaje automático, especialmente las basadas en clasificación supervisada. En concreto, se extienden las ideas de dominancia y frontera de Pareto a esta situación. Su aplicación a los estudios de predicción de la rugosidad superficial en el caso de maximizar al mismo tiempo la sensibilidad y la especificidad del clasificador inducido de la red Bayesiana, y no solo maximizar la tasa de clasificación correcta. Los intervalos de estos dos objetivos provienen de un m´etodo de estimación honesta de ambos objetivos, como e.g. validación cruzada en k rodajas o bootstrap.---ABSTRACT---The main objective of this PhD Thesis is to go more deeply into the analysis and design of an intelligent system for surface roughness prediction and control in the end-milling machining process, based fundamentally on Bayesian network classifiers, with the aim of developing a methodology that makes easier the design of this type of systems. The system, whose purpose is to make possible the surface roughness prediction and control, consists of a model learnt from experimental data with the aid of Bayesian networks, that will help to understand the dynamic processes involved in the machining and the interactions among the relevant variables. Since artificial neural networks are models widely used in material cutting proceses, we include also an end-milling model using them, where the geometry and hardness of the piecework are introduced as novel variables not studied so far within this context. Thus, an important contribution in this thesis is these two models for surface roughness prediction, that are then compared with respecto to different aspects: influence of the new variables, performance evaluation metrics, interpretability. One of the main problems with Bayesian classifier-based modelling is the understanding of the enormous posterior probabilitiy tables produced. We introduce an explanation method that generates a set of rules obtained from decision trees. Such trees are induced from a simulated data set generated from the posterior probabilities of the class variable, calculated with the Bayesian network learned from a training data set. Finally, we contribute in the multi-objective field in the case that some of the objectives cannot be quantified as real numbers but as interval-valued functions. This often occurs in machine learning applications, especially those based on supervised classification. Specifically, the dominance and Pareto front ideas are extended to this setting. Its application to the surface roughness prediction studies the case of maximizing simultaneously the sensitivity and specificity of the induced Bayesian network classifier, rather than only maximizing the correct classification rate. Intervals in these two objectives come from a honest estimation method of both objectives, like e.g. k-fold cross-validation or bootstrap.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Three-dimensional kinematic analysis provides quantitative assessment of upper limb motion and is used as an outcome measure to evaluate movement disorders. The aim of the present study is to present a set of kinematic metrics for quantifying characteristics of movement performance and the functional status of the subject during the execution of the activity of daily living (ADL) of drinking from a glass. Then, the objective is to apply these metrics in healthy people and a population with cervical spinal cord injury (SCI), and to analyze the metrics ability to discriminate between healthy and pathologic people. 19 people participated in the study: 7 subjects with metameric level C6 tetraplegia, 4 subjects with metameric level C7 tetraplegia and 8 healthy subjects. The movement was recorded with a photogrammetry system. The ADL of drinking was divided into a series of clearly identifiable phases to facilitate analysis. Metrics describing the time of the reaching phase, the range of motion of the joints analyzed, and characteristics of movement performance such as the efficiency, accuracy and smoothness of the distal segment and inter-joint coordination were obtained. The performance of the drinking task was more variable in people with SCI compared to the control group in relation to the metrics measured. Reaching time was longer in SCI groups. The proposed metrics showed capability to discriminate between healthy and pathologic people. Relative deficits in efficiency were larger in SCI people than in controls. These metrics can provide useful information in a clinical setting about the quality of the movement performed by healthy and SCI people during functional activities.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There are many studies related with airport surface routing algorithms, based on different approaches and with different evaluation methods and metrics. So, the need of performing a balanced analysis and comparison using a common framework is evident. This paper presents an implementation of an evaluation tool for airport surface routing algorithms. The routing evaluation tool presented here is based in three basic pillars composed by the airport model, the model and generation of traffic and a comprehensive figure of merit function. The paper includes some example evaluations performed over Barajas Airport with representative traffic samples using several simple routing methods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present an evaluation of a spoken language dialogue system with a module for the management of userrelated information, stored as user preferences and privileges. The flexibility of our dialogue management approach, based on Bayesian Networks (BN), together with a contextual information module, which performs different strategies for handling such information, allows us to include user information as a new level into the Context Manager hierarchy. We propose a set of objective and subjective metrics to measure the relevance of the different contextual information sources. The analysis of our evaluation scenarios shows that the relevance of the short-term information (i.e. the system status) remains pretty stable throughout the dialogue, whereas the dialogue history and the user profile (i.e. the middle-term and the long-term information, respectively) play a complementary role, evolving their usefulness as the dialogue evolves.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The magnetoencephalogram (MEG) is contaminated with undesired signals, which are called artifacts. Some of the most important ones are the cardiac and the ocular artifacts (CA and OA, respectively), and the power line noise (PLN). Blind source separation (BSS) has been used to reduce the influence of the artifacts in the data. There is a plethora of BSS-based artifact removal approaches, but few comparative analyses. In this study, MEG background activity from 26 subjects was processed with five widespread BSS (AMUSE, SOBI, JADE, extended Infomax, and FastICA) and one constrained BSS (cBSS) techniques. Then, the ability of several combinations of BSS algorithm, epoch length, and artifact detection metric to automatically reduce the CA, OA, and PLN were quantified with objective criteria. The results pinpointed to cBSS as a very suitable approach to remove the CA. Additionally, a combination of AMUSE or SOBI and artifact detection metrics based on entropy or power criteria decreased the OA. Finally, the PLN was reduced by means of a spectral metric. These findings confirm the utility of BSS to help in the artifact removal for MEG background activity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

INTRODUCTION: Motion metrics have become an important source of information when addressing the assessment of surgical expertise. However, their direct relationship with the different surgical skills has not been fully explored. The purpose of this study is to investigate the relevance of motion-related metrics in the evaluation processes of basic psychomotor laparoscopic skills, as well as their correlation with the different abilities sought to measure. METHODS: A framework for task definition and metric analysis is proposed. An explorative survey was first conducted with a board of experts to identify metrics to assess basic psychomotor skills. Based on the output of that survey, three novel tasks for surgical assessment were designed. Face and construct validation study was performed, with focus on motion-related metrics. Tasks were performed by 42 participants (16 novices, 22 residents and 4 experts). Movements of the laparoscopic instruments were registered with the TrEndo tracking system and analyzed. RESULTS: Time, path length and depth showed construct validity for all three tasks. Motion smoothness and idle time also showed validity for tasks involving bi-manual coordination and tasks requiring a more tactical approach respectively. Additionally, motion smoothness and average speed showed a high internal consistency, proving them to be the most task-independent of all the metrics analyzed. CONCLUSION: Motion metrics are complementary and valid for assessing basic psychomotor skills, and their relevance depends on the skill being evaluated. A larger clinical implementation, combined with quality performance information, will give more insight on the relevance of the results shown in this study.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Systematic evaluation of Learning Objects is essential to make high quality Web-based education possible. For this reason, several educational repositories and e-Learning systems have developed their own evaluation models and tools. However, the differences of the context in which Learning Objects are produced and consumed suggest that no single evaluation model is sufficient for all scenarios. Besides, no much effort has been put in developing open tools to facilitate Learning Object evaluation and use the quality information for the benefit of end users. This paper presents LOEP, an open source web platform that aims to facilitate Learning Object evaluation in different scenarios and educational settings by supporting and integrating several evaluation models and quality metrics. The work exposed in this paper shows that LOEP is capable of providing Learning Object evaluation to e-Learning systems in an open, low cost, reliable and effective way. Possible scenarios where LOEP could be used to implement quality control policies and to enhance search engines are also described. Finally, we report the results of a survey conducted among reviewers that used LOEP, showing that they perceived LOEP as a powerful and easy to use tool for evaluating Learning Objects.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Evaluating and measuring the pedagogical quality of Learning Objects is essential for achieving a successful web-based education. On one hand, teachers need some assurance of quality of the teaching resources before making them part of the curriculum. On the other hand, Learning Object Repositories need to include quality information into the ranking metrics used by the search engines in order to save users time when searching. For these reasons, several models such as LORI (Learning Object Review Instrument) have been proposed to evaluate Learning Object quality from a pedagogical perspective. However, no much effort has been put in defining and evaluating quality metrics based on those models. This paper proposes and evaluates a set of pedagogical quality metrics based on LORI. The work exposed in this paper shows that these metrics can be effectively and reliably used to provide quality-based sorting of search results. Besides, it strongly evidences that the evaluation of Learning Objects from a pedagogical perspective can notably enhance Learning Object search if suitable evaluations models and quality metrics are used. An evaluation of the LORI model is also described. Finally, all the presented metrics are compared and a discussion on their weaknesses and strengths is provided.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Context: Measurement is crucial and important to empirical software engineering. Although reliability and validity are two important properties warranting consideration in measurement processes, they may be influenced by random or systematic error (bias) depending on which metric is used. Aim: Check whether, the simple subjective metrics used in empirical software engineering studies are prone to bias. Method: Comparison of the reliability of a family of empirical studies on requirements elicitation that explore the same phenomenon using different design types and objective and subjective metrics. Results: The objectively measured variables (experience and knowledge) tend to achieve more reliable results, whereas subjective metrics using Likert scales (expertise and familiarity) tend to be influenced by systematic error or bias. Conclusions: Studies that predominantly use variables measured subjectively, like opinion polls or expert opinion acquisition.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One important steps in a successful project-based-learning methodology (PBL) is the process of providing the students with a convenient feedback that allows them to keep on developing their projects or to improve them. However, this task is more difficult in massive courses, especially when the project deadline is close. Besides, the continuous evaluation methodology makes necessary to find ways to objectively and continuously measure students' performance without increasing excessively instructors' work load. In order to alleviate these problems, we have developed a web service that allows students to request personal tutoring assistance during the laboratory sessions by specifying the kind of problem they have and the person who could help them to solve it. This service provides tools for the staff to manage the laboratory, for performing continuous evaluation for all students and for the student collaborators, and to prioritize tutoring according to the progress of the student's project. Additionally, the application provides objective metrics which can be used at the end of the subject during the evaluation process in order to support some students' final scores. Different usability statistics and the results of a subjective evaluation with more than 330 students confirm the success of the proposed application.