7 results for Rater
at Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
Background: The COSMIN checklist is a tool for evaluating the methodological quality of studies on measurement properties of health-related patient-reported outcomes. The aim of this study is to determine the inter-rater agreement and reliability of each item score of the COSMIN checklist (n = 114). Methods: 75 articles evaluating measurement properties were randomly selected from the bibliographic database compiled by the Patient-Reported Outcome Measurement Group, Oxford, UK. Raters were asked to assess the methodological quality of three articles, using the COSMIN checklist. In a one-way design, percentage agreement and intraclass kappa coefficients or quadratic-weighted kappa coefficients were calculated for each item. Results: 88 raters participated. Of the 75 selected articles, 26 were rated by four to six participants, and 49 by two or three participants. Overall, percentage agreement was appropriate (68% of the items had agreement above 80%), whereas the kappa coefficients for the COSMIN items were low (61% were below 0.40 and 6% were above 0.75). Reasons for low inter-rater agreement were the need for subjective judgement and raters being accustomed to different standards, terminology, and definitions. Conclusions: The results indicate that raters often choose the same response option, but that at the item level it is difficult to distinguish between articles. When using the COSMIN checklist in a systematic review, we recommend that raters obtain some training and experience, that the checklist be completed by two independent raters, and that consensus be reached on one final rating. The instructions for using the checklist have been improved.
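The inter-rater statistics reported above are percentage agreement and (quadratic-weighted) kappa. As a minimal illustration of how these two statistics behave on ordinal item scores, and not a reproduction of the study's own analysis (which used intraclass kappa coefficients in a one-way design with up to six raters), the following Python sketch computes both for two hypothetical raters; the ratings and the four-point scale are assumptions.

```python
import numpy as np

def percent_agreement(r1, r2):
    """Proportion of items on which the two raters give the identical score."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    return float(np.mean(r1 == r2))

def quadratic_weighted_kappa(r1, r2, n_categories):
    """Cohen's kappa with quadratic disagreement weights for ordinal ratings
    coded 0 .. n_categories-1."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    k = n_categories
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):              # observed contingency table
        obs[a, b] += 1
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))        # expected under independence
    idx = np.arange(k)
    w = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2   # quadratic weights
    return 1.0 - (w * obs).sum() / (w * exp).sum()

# hypothetical scores from two raters on a four-point item (0 = poor .. 3 = excellent)
rater_a = [3, 2, 2, 1, 3, 0, 2, 3]
rater_b = [3, 2, 1, 1, 3, 1, 2, 2]
print(percent_agreement(rater_a, rater_b))               # 0.625
print(quadratic_weighted_kappa(rater_a, rater_b, 4))
```

This kind of toy example also shows why percentage agreement can look good while kappa stays low: when most articles receive the same response option, chance-corrected agreement is penalised even though the raw match rate is high.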
Abstract:
Background: Choosing an adequate measurement instrument depends on the proposed use of the instrument, the concept to be measured, the measurement properties (e.g. internal consistency, reproducibility, content and construct validity, responsiveness, and interpretability), the requirements, the burden for subjects, and the costs of the available instruments. As far as measurement properties are concerned, there are no sufficiently specific standards for evaluating the measurement properties of instruments that measure health status, and no explicit criteria for what constitutes good measurement properties. In this paper we describe the protocol for the COSMIN study, the objective of which is to develop a checklist that contains COnsensus-based Standards for the selection of health Measurement INstruments, including explicit criteria for satisfying these standards. We will focus on evaluative health-related patient-reported outcomes (HR-PROs), i.e. patient-reported health measurement instruments used in a longitudinal design as an outcome measure, excluding health-care-related PROs, such as satisfaction with care or adherence. The COSMIN standards will be made available in the form of an easily applicable checklist. Method: An international Delphi study will be performed to reach consensus on which measurement properties should be assessed and how, and on criteria for good measurement properties. Two sources of input will be used for the Delphi study: (1) a systematic review of properties, standards, and criteria of measurement properties found in systematic reviews of measurement instruments, and (2) an additional literature search for methodological articles presenting a comprehensive checklist of standards and criteria. The Delphi study will consist of four (written) Delphi rounds, with approximately 30 expert panel members with different backgrounds in clinical medicine, biostatistics, psychology, and epidemiology. The final checklist will subsequently be field-tested by assessing its inter-rater reproducibility. Discussion: Since the study will mainly be anonymous, problems that are commonly encountered in face-to-face group meetings, such as the dominance of certain persons in the communication process, will be avoided. By performing a Delphi study and involving many experts, the likelihood that the checklist will have sufficient credibility to be accepted and implemented will increase.
Abstract:
The educational system in Spain is undergoing a reorganization. At present, high-school graduates who want to enroll at a public university must take a set of examinations, the Pruebas de Aptitud para el Acceso a la Universidad (PAAU). A "new formula" (components, weights, type of exam, etc.) for university admission is being discussed. The present paper summarizes part of the research carried out by the author in her PhD thesis, whose context is the evaluation of large-scale and complex assessment systems. The main objectives were to achieve a deep knowledge of the entire university admissions process in Spain, to discover its main sources of uncertainty, and to promote empirical research for the continual improvement of the entire process. Focusing on suitable statistical models and strategies that highlight the imperfections of the system and help to reduce them, the paper develops, among other approaches, some applications of multilevel modeling.
Abstract:
The context in which the university admissions exams are taken is presented, and the main concerns about these exams are outlined and discussed from a statistical point of view. The paper offers an illustration of the use of random-coefficient models in the study of educational data. The association between two individual scores (one internal and the other external to the school) and the effect of the school on the external exam is analyzed by a regression model with a random intercept and a fixed slope. A variance component model for the analysis of the grading process is also presented. The paper ends with an outline of the main findings and some specific proposals to improve and control the equity of the system. Some pedagogic reflections are also included.
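As a rough illustration of the random-intercept, fixed-slope model described above, the sketch below fits such a model with statsmodels on simulated data; the variable names, the numbers of schools and students, and the effect sizes are assumptions, not the paper's data. The final line shows the variance-component idea: the share of variance in the external score attributable to schools.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# simulated data in the spirit of the abstract: an internal (school) grade and an
# external exam score for students nested within schools; all numbers are assumptions
n_schools, n_students = 40, 25
school = np.repeat(np.arange(n_schools), n_students)
school_effect = rng.normal(0, 0.5, n_schools)[school]        # random intercepts
internal = rng.normal(6.5, 1.2, n_schools * n_students)      # internal school grade
external = 1.0 + 0.8 * internal + school_effect + rng.normal(0, 0.8, internal.size)

df = pd.DataFrame({"school": school, "internal": internal, "external": external})

# random-intercept, fixed-slope model: external ~ internal, intercept varying by school
fit = smf.mixedlm("external ~ internal", df, groups=df["school"]).fit()
print(fit.summary())

# variance components: between-school variance vs residual (student-level) variance
between, within = fit.cov_re.iloc[0, 0], fit.scale
print("share of variance attributable to schools:", between / (between + within))
```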
Abstract:
The examinations taken by high-school graduates in Spain and the role of the examination in the university admissions process are described. The following issues arising in the assessment of the process are discussed: reliability of grading, comparability of the grades and scores (equating), and maintenance of standards. Compilation and use of grading-process data, and their integration in the operational grading, are proposed. Various schemes for score adjustment are reviewed and the feasibility of their implementation is discussed. The advantages of pretesting items and of empirical checks of experts' judgements are pointed out. The paper concludes with an outline of a planned reorganisation of higher education in Spain, and with a call for a comprehensive programme of empirical research concurrent with the operation of the examination and scoring system.
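The abstract only says that various score-adjustment schemes are reviewed; one of the simplest in this family is mean-sigma linear equating, sketched below for illustration. The particular choice of scheme and all numbers used here are assumptions, not content of the paper.

```python
import numpy as np

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Mean-sigma linear equating: map a score x from form (or grader) X onto the
    scale of form Y so that equated scores share Y's mean and standard deviation."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# hypothetical scores on a 0-10 scale, rescaled to a reference mean and SD
scores_x = np.array([5.0, 6.5, 7.0, 8.5, 9.0])
adjusted = linear_equate(scores_x,
                         mean_x=scores_x.mean(), sd_x=scores_x.std(ddof=1),
                         mean_y=6.8, sd_y=1.1)
print(adjusted)
```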
Abstract:
Background: In an agreement assay, it is of interest to evaluate the degree of agreement between the different methods (devices, instruments, or observers) used to measure the same characteristic. In this study we propose a technical simplification for inference about the total deviation index (TDI) estimate to assess agreement between two devices yielding normally distributed measurements, and describe its utility for evaluating inter- and intra-rater agreement when more than one reading per subject is available for each device. Methods: We propose to estimate the TDI by constructing a probability interval of the difference in paired measurements between devices, and thereafter we derive a tolerance interval (TI) procedure as a natural way to make inferences about probability limit estimates. We also describe how the proposed method can be used to compute bounds of the coverage probability. Results: The approach is illustrated with a real case example in which the agreement between two instruments, a hand-held mercury sphygmomanometer and an OMRON 711 automatic device, is assessed in a sample of 384 subjects whose systolic blood pressure was measured twice with each device. A simulation study is implemented to evaluate the accuracy of the approach and to compare it with two already established methods, showing that the TI approximation produces accurate empirical confidence levels that are reasonably close to the nominal confidence level. Conclusions: The proposed method is straightforward since the TDI estimate is derived directly from a probability interval of a normally distributed variable on its original scale, without further transformations. Thereafter, a natural way of making inferences about this estimate is to derive the appropriate TI. Constructions of TIs based on normal populations are implemented in most standard statistical packages, making it simple for any practitioner to apply our proposal to assess agreement.
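For orientation only, the sketch below computes a point estimate of TDI(p) for normally distributed paired differences using Lin's (2000) approximation; it is not the tolerance-interval inference proposed in the abstract, and the blood-pressure data are simulated rather than the study's 384-subject sample.

```python
import numpy as np
from scipy import stats

def tdi_estimate(diff, p=0.90):
    """Point estimate of the total deviation index TDI(p) for paired differences
    assumed normal, via Lin's (2000) approximation: a boundary within which a
    proportion p of the absolute differences is expected to fall."""
    d = np.asarray(diff, dtype=float)
    mu, sd = d.mean(), d.std(ddof=1)
    return stats.norm.ppf(1 - (1 - p) / 2) * np.sqrt(mu**2 + sd**2)

# simulated paired systolic blood pressure readings (mmHg) from two devices
rng = np.random.default_rng(1)
device_a = rng.normal(130, 15, 200)
device_b = device_a + rng.normal(1.5, 4.0, 200)   # small bias plus device noise
print("TDI(0.90) estimate:", tdi_estimate(device_b - device_a, p=0.90))
```

A confidence or tolerance bound would then be attached to this estimate; the abstract's point is that normal-based tolerance intervals are already available in standard statistical packages, so that step needs no custom machinery.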
Abstract:
Objectives: The objectives of this study were to review the Institute of Medicine (IOM) set of criteria for priority-setting in research, adding new criteria where necessary, and to develop and evaluate the reliability and validity of the final priority score. Methods: Based on the evaluation of 199 research topics, forty-five experts identified additional criteria for priority-setting, rated their relevance, and ranked and weighted them in a three-round modified Delphi technique. A final priority score was developed and evaluated. Internal consistency, test–retest and inter-rater reliability were assessed. Correlation with the experts’ overall qualitative topic ratings was assessed as an approximation to validity. Results: All seven original IOM criteria were considered relevant and two new criteria were added (“potential for translation into practice” and “need for knowledge”). Final ranks and relative weights differed from those of the original IOM criteria: “research impact on health outcomes” was considered the most important criterion (4.23), as opposed to “burden of disease” (3.92). Cronbach’s alpha (0.75) and test–retest stability (intraclass correlation coefficient = 0.66) for the final set of criteria were acceptable. The area under the receiver operating characteristic curve for the overall assessment of priority was 0.66. Conclusions: A reliable instrument for prioritizing topics in clinical and health services research has been developed. Further evaluation of its validity and impact on selecting research topics is required.
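To make the scoring mechanics concrete, the sketch below computes Cronbach's alpha for a topics-by-criteria rating matrix and a weight-averaged priority score for one topic. The rating matrix and most of the weights are illustrative assumptions; only the 4.23 and 3.92 weights are quoted from the abstract.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_topics x n_criteria) matrix of ratings:
    internal consistency of the set of prioritisation criteria."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

def priority_score(ratings, weights):
    """Weight-averaged priority score for a single topic across the criteria."""
    r, w = np.asarray(ratings, float), np.asarray(weights, float)
    return (r * w).sum() / w.sum()

# hypothetical 1-5 ratings of five topics on nine criteria; only the 4.23 and 3.92
# weights come from the abstract, the remaining weights and all ratings are invented
weights = np.array([4.23, 3.92, 3.6, 3.4, 3.2, 3.0, 2.9, 2.7, 2.5])
rng = np.random.default_rng(2)
merit = rng.normal(3, 1, size=(5, 1))                             # latent topic merit
ratings = np.clip(np.rint(merit + rng.normal(0, 0.7, size=(5, 9))), 1, 5)

print("Cronbach's alpha:", cronbach_alpha(ratings))
print("priority score, topic 0:", priority_score(ratings[0], weights))
```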