9 results for score validity
at Universidad Politécnica de Madrid
Abstract:
Quality assessment is one of the activities performed as part of systematic literature reviews. It is commonly accepted that a good quality experiment is bias free. Bias is considered to be related to internal validity (e.g., how adequately the experiment is planned, executed and analysed). Quality assessment is usually conducted using checklists and quality scales. It has not yet been proven, however, that quality is related to experimental bias. Aim: Identify whether there is a relationship between internal validity and bias in software engineering experiments. Method: We built a quality scale to determine the quality of the studies, which we applied to 28 experiments included in two systematic literature reviews. We proposed an objective indicator of experimental bias, which we applied to the same 28 experiments. Finally, we analysed the correlations between the quality scores and the proposed measure of bias. Results: We failed to find a relationship between the global quality score (resulting from the quality scale) and bias; however, we did identify interesting correlations between bias and some particular aspects of internal validity measured by the instrument. Conclusions: There is an empirically provable relationship between internal validity and bias. It is feasible to apply quality assessment in systematic literature reviews, subject to limits on the internal validity aspects taken into consideration.
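The correlation step described in this abstract can be sketched with a rank correlation, which suits ordinal quality scores. Below is a minimal pure-Python Spearman's rho; the quality and bias values are illustrative, not the paper's 28 experiments:

```python
def _ranks(values):
    # Average ranks, with ties sharing the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank vectors.
    return _pearson(_ranks(x), _ranks(y))

# Illustrative data: 5 hypothetical studies, not the paper's data.
quality = [3, 5, 2, 4, 1]             # quality-scale scores
bias = [0.4, 0.1, 0.5, 0.2, 0.6]      # bias indicator (higher = more bias)
rho = spearman(quality, bias)         # perfectly anti-monotone here
```

A rho near zero would match the paper's null result for the global score; strong values on individual scale items would match the per-aspect correlations it reports.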
Abstract:
Background The aim of this study is to present the face, content, and construct validity of the endoscopic orthogonal video system (EndoViS) training system and to determine its efficiency as a training and objective assessment tool for surgeons' psychomotor skills. Methods Thirty-five surgeons and medical students participated in this study: 11 medical students, 19 residents, and 5 experts. All participants performed four basic skill tasks using conventional laparoscopic instruments and the EndoViS training system. Subsequently, participants filled out a questionnaire regarding the design, realism, overall functionality, and the system's capability to train hand–eye coordination and depth perception, rated on a 5-point Likert scale. Motion data of the instruments were obtained by means of two webcams built into a laparoscopic physical trainer. To identify the surgical instruments in the images, colored markers were placed on each instrument. Thirteen motion-related metrics were used to assess laparoscopic performance of the participants. Statistical analysis of performance was performed between novice, intermediate, and expert groups. Internal consistency of all metrics was analyzed with Cronbach's α test. Results Overall scores for the features of the EndoViS system were positive. Participants agreed with the usefulness of the tasks and the training capabilities of the EndoViS system (score >4). Results showed significant differences in the execution of three skill tasks performed by participants. Seven metrics showed construct validity for assessment of performance with high consistency levels. Conclusions The EndoViS training system has been successfully validated. Results showed that EndoViS was able to differentiate between participants of varying laparoscopic experience. This simulator is a useful and effective tool to objectively assess surgeons' laparoscopic psychomotor skills.
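The internal-consistency check via Cronbach's α used for the thirteen metrics can be sketched as follows; a minimal pure-Python version, with made-up metric values for illustration:

```python
def cronbach_alpha(items):
    # items: one list of scores per metric, all of equal length
    # (one score per participant).
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[p] for item in items) for p in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Two perfectly parallel (hypothetical) metrics give alpha = 1.0.
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
```

Values of α above roughly 0.7–0.8 are conventionally read as acceptable internal consistency, which is the kind of threshold the construct-validity claim relies on.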
Abstract:
The constellation of adverse cardiovascular disease (CVD) and metabolic risk factors, including elevated abdominal obesity, blood pressure (BP), glucose, and triglycerides (TG) and lowered high-density lipoprotein-cholesterol (HDL-C), has been termed the metabolic syndrome (MetSyn) [1]. A number of different definitions have been developed by the World Health Organization (WHO) [2], the National Cholesterol Education Program Adult Treatment Panel III (ATP III) [3], the European Group for the Study of Insulin Resistance (EGIR) [4] and, most recently, the International Diabetes Federation (IDF) [5]. Since there is no universal definition of the metabolic syndrome, several authors have derived different risk scores to represent the clustering of its components [6-11].
Abstract:
This article focuses on the evaluation of a biometric technique based on the performance of an identifying gesture while holding a telephone with an embedded accelerometer in the hand. The acceleration signals obtained when users perform gestures are analyzed following a mathematical method based on global sequence alignment. In this article, eight different scores are proposed and evaluated in order to quantify the differences between gestures, obtaining an optimal EER result of 3.42% when analyzing a random set of 40 users from a database made up of 80 users with real forgery attempts. Moreover, a temporal study of the technique is presented, leading to the need to update the template to adapt to the way users modify their identifying gesture over time. Six updating schemes have been assessed on a database of 22 users repeating their identifying gesture in 20 sessions over 4 months, concluding that the more often the template is updated, the better and more stable the technique's performance.
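The EER reported above is the operating point where false acceptances and false rejections balance. A minimal sketch of computing it from genuine and impostor score distributions, assuming distance-like alignment scores (lower = better match) and made-up values rather than the article's data:

```python
def equal_error_rate(genuine, impostor):
    # Sweep a threshold over all observed scores.
    # FAR = fraction of impostor scores accepted (<= threshold);
    # FRR = fraction of genuine scores rejected (> threshold).
    # The EER is read off where FAR and FRR are closest.
    best_gap, best_eer = None, None
    for t in sorted(set(genuine + impostor)):
        far = sum(s <= t for s in impostor) / len(impostor)
        frr = sum(s > t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Hypothetical alignment scores, not the article's measurements.
genuine = [0.10, 0.20, 0.30, 0.40]    # same-user attempts
impostor = [0.35, 0.50, 0.60, 0.70]   # forgery attempts
eer = equal_error_rate(genuine, impostor)
```

With the overlapping toy distributions above the EER is 0.25; the article's 3.42% corresponds to far better-separated genuine and impostor score distributions.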
Abstract:
Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of studies based on biased datasets may be suspect. This issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor of these datasets that has not been examined carefully: size. Biased datasets sample only some of the data that could be sampled, and do so in a biased fashion; but biased samples can be smaller or larger. Smaller datasets in general provide a less reliable basis for estimating models, and thus could lead to inferior model performance. In this setting, we ask the question: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset which is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUC and F-score. This indicates that at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further issues to be explored in the future.
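The simulation design — drawing biased samples of controllable size from a clean dataset — can be sketched with a weighted sampler. This is a hypothetical helper, not the authors' code; `buggy_weight` is an invented bias knob that skews sampling toward (or away from) buggy entities independently of sample size:

```python
import random

def biased_sample(data, size, buggy_weight, seed=0):
    # data: (entity, is_buggy) pairs from a relatively bias-free dataset.
    # buggy_weight in (0, 1]: > 0.5 over-samples buggy entities,
    # < 0.5 under-samples them. `size` varies the sample-size axis of
    # the meta-analysis independently of the bias axis.
    rng = random.Random(seed)
    weights = [buggy_weight if buggy else 1 - buggy_weight
               for _, buggy in data]
    return rng.choices(data, weights=weights, k=size)

# Hypothetical clean dataset: 100 files, half marked buggy.
clean = [(f"file{i}", i % 2 == 0) for i in range(100)]
skewed = biased_sample(clean, size=30, buggy_weight=0.9)
```

Sweeping `size` and `buggy_weight` over a grid, then training and scoring a prediction model on each sample, yields the size-versus-bias comparison the abstract describes.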
Abstract:
Enhanced learning environments are achieving great success in the field of cognitive skills training in minimally invasive surgery (MIS) because they provide multiple benefits, avoiding time, spatial and cost constraints. TELMA [1,2] is a new technology-enhanced learning platform that promotes collaborative and ubiquitous training of surgeons. This platform is based on four main modules: an authoring tool, a learning content and knowledge management system, an evaluation module and a professional network. TELMA has been designed and developed with a focus on the user; therefore it is necessary to carry out a user validation as the final stage of development. For this purpose, e-MIS validity [3] has been defined. This validation covers usability, contents and functionality validities for both the development and production stages of any e-Learning web platform. Using e-MIS validity, the e-Learning platform is fully validated, since it includes subjective and objective metrics. The purpose of this study is to specify and apply a set of objective and subjective metrics using e-MIS validity to test the usability, contents and functionality of the TELMA environment within the development stage.
Abstract:
Validity and reliability of the AMPET Greek version: a first examination of learning motivation in Greek PE settings
Abstract:
The aim of the present study was to assess the effects of game timeouts on basketball teams' offensive and defensive performances according to momentary differences in score and game period. The sample consisted of 144 timeouts registered during 18 basketball games randomly selected from the 2007 European Basketball Championship (Spain). For each timeout, five ball possessions were registered before (n=493) and after the timeout (n=475). The offensive and defensive efficiencies were registered across the first 35 min and last 5 min of games. A k-means cluster analysis classified the timeouts according to momentary score status as follows: losing (−10 to −3 points), balanced (−2 to 3 points), and winning (4 to 10 points). Repeated-measures analysis of variance identified statistically significant main effects between pre- and post-timeout offensive and defensive values. Chi-square analysis of game period identified a higher percentage of timeouts called during the last 5 min of a game compared with the first 35 min (64.9±9.1% vs. 35.1±10.3%; χ²=5.4, P<0.05). Results showed higher post-timeout offensive and defensive performances. No other effect or interaction was found for defensive performances. Offensive performances were better in the last 5 min of games, with the smallest differences in balanced situations and the greatest differences in winning situations. Results also showed an interaction between timeouts and momentary differences in score, with increased values in losing and balanced situations but decreased values in winning situations. Overall, the results suggest that coaches should examine offensive and defensive performances according to game period and differences in score when considering whether to call a timeout.
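The k-means step that groups timeouts by momentary score differential is one-dimensional, so it can be sketched compactly. The score differentials and seed centres below are illustrative, not the study's data:

```python
def assign(values, centers):
    # Put each value in the group of its nearest centre.
    groups = [[] for _ in centers]
    for v in values:
        i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        groups[i].append(v)
    return groups

def kmeans_1d(values, centers, iters=20):
    # Lloyd's algorithm in 1-D: assign to nearest centre, then move
    # each centre to the mean of its members; repeat until stable.
    for _ in range(iters):
        groups = assign(values, centers)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, assign(values, centers)

# Hypothetical score differentials at the moment of each timeout;
# three seed centres for losing / balanced / winning clusters.
diffs = [-9, -8, -4, -3, 0, 1, 2, 5, 6, 9]
centers, groups = kmeans_1d(diffs, [-10.0, 0.0, 10.0])
```

With k=3 the clusters fall out ordered by differential, which is how the losing / balanced / winning score-status labels in the study would be assigned.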