66 results for rater reliability

in BORIS: Bern Open Repository and Information System - Bern - Switzerland


Relevance:

100.00%

Abstract:

The Pulmonary Embolism Severity Index (PESI) is a validated clinical prognostic model for patients with acute pulmonary embolism (PE). Our goal was to assess the PESI's inter-rater reliability in patients diagnosed with PE. We prospectively identified consecutive patients diagnosed with PE in the emergency department of a Swiss teaching hospital. For all patients, resident and attending physician raters independently collected the 11 PESI variables. The raters then calculated the PESI total point score and classified patients into one of five PESI risk classes (I-V) and as low (risk classes I/II) versus higher-risk (risk classes III-V). We examined the inter-rater reliability for each of the 11 PESI variables, the PESI total point score, assignment to each of the five PESI risk classes, and classification of patients as low versus higher-risk using kappa (κ) and intra-class correlation coefficients (ICC). Among 48 consecutive patients with an objective diagnosis of PE, reliability coefficients between resident and attending physician raters were > 0.60 for 10 of the 11 variables comprising the PESI. The inter-rater reliability for the PESI total point score (ICC: 0.89, 95% CI: 0.81-0.94), PESI risk class assignment (κ: 0.81, 95% CI: 0.66-0.94), and the classification of patients as low versus higher-risk (κ: 0.92, 95% CI: 0.72-0.98) was near perfect. Our results demonstrate the high reproducibility of the PESI, supporting the use of the PESI for risk stratification of patients with PE.
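To illustrate how an intra-class correlation coefficient between two raters can be computed, the following is a minimal sketch of the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The abstract does not state which ICC form was used, so that choice, the function name `icc_2_1`, and the scores below are assumptions made for illustration only.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    The source abstract does not specify the ICC form.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical total point scores from two raters for five patients
example_scores = [[85, 88], [102, 100], [66, 70], [130, 125], [94, 94]]
print(round(icc_2_1(example_scores), 3))
```

Confidence intervals such as those quoted in the abstract need an additional step (for example an F-distribution based interval or a bootstrap), which this sketch omits.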

Relevance:

100.00%

Abstract:

BACKGROUND The abstraction of data from medical records is a widespread practice in epidemiological research. However, studies using this means of data collection rarely report reliability. Within the Transition after Childhood Cancer Study (TaCC), which is based on medical record abstraction, we conducted a second independent abstraction of data with the aim of assessing a) intra-rater reliability of one rater at two time points; b) the possible learning effects between these two time points compared to a gold standard; and c) inter-rater reliability. METHOD Within the TaCC study we conducted a systematic medical record abstraction in the 9 Swiss clinics with pediatric oncology wards. In a second phase we selected a subsample of medical records in 3 clinics to conduct a second independent abstraction. We then assessed intra-rater reliability at two time points, the learning effect over time (comparing each rater at two time points with a gold standard) and the inter-rater reliability of a selected number of variables. We calculated percentage agreement and Cohen's kappa. FINDINGS For the assessment of the intra-rater reliability we included 154 records (80 for rater 1; 74 for rater 2). For the inter-rater reliability we could include 70 records. Intra-rater reliability was substantial to excellent (Cohen's kappa 0.6-0.8) with an observed percentage agreement of 75%-95%. In all variables learning effects were observed. Inter-rater reliability was substantial to excellent (Cohen's kappa 0.70-0.83) with high agreement ranging from 86% to 100%. CONCLUSIONS Our study showed that data abstracted from medical records are reliable. Investigating intra-rater and inter-rater reliability can give confidence to draw conclusions from the abstracted data and increase data quality by minimizing systematic errors.
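The two statistics named in this abstract, percentage agreement and Cohen's kappa, can be sketched as follows. This is an illustrative sketch only: the function names and the example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which both raters chose the same category."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1, i.e. when
    both raters use a single identical category for every item.
    """
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from the product of each rater's marginal proportions
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical abstraction of one yes/no variable from eight records
first_pass = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
second_pass = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
print(percent_agreement(first_pass, second_pass))
print(cohens_kappa(first_pass, second_pass))
```

Reporting both values, as the abstract does, is useful because percentage agreement alone can look high when one category dominates, whereas kappa discounts the agreement expected by chance.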

Relevance:

70.00%

Abstract:

The aim of this study was to evaluate the reliability of the cardiothoracic ratio (CTR) in postmortem computed tomography (PMCT) and to assess a CTR threshold for the diagnosis of cardiomegaly based on the weight of the heart at autopsy. PMCT data of 170 deceased human adults were retrospectively evaluated by two blinded radiologists. The CTR was measured on axial computed tomography images and the actual heart weight was measured at autopsy. Inter-rater reliability, sensitivity, and specificity were calculated. Receiver operating characteristic curves were calculated to assess enlarged heart weight by CTR. The autopsy definition of cardiomegaly was based on normal values of the Zeek method (within a range of either one or two SD) and the Smith method (within the given range). Intra-class correlation coefficients demonstrated excellent agreement (0.983) regarding CTR measurements. In 105/170 (62%) cases the CTR in PMCT was >0.5, indicating enlarged heart weight, according to clinical references. The mean heart weight measured at autopsy was 405 ± 105 g. As a result, 114/170 (67%) cases were interpreted as having enlarged heart weights according to the normal values of Zeek within one SD, while 97/170 (57%) were within two SD. 100/170 (59%) were assessed as enlarged according to Smith's normal values. The sensitivity/specificity of the 0.5 cut-off of the CTR for the diagnosis of enlarged heart weight was 78/71% (Zeek one SD), 74/55% (Zeek two SD), and 76/59% (Smith), respectively. The discriminative power between normal heart weight and cardiomegaly was 79, 73, and 74% for the Zeek (1SD/2SD) and Smith methods, respectively. Changing the CTR threshold to 0.57 resulted in a minimum specificity of 95% for all three definitions of cardiomegaly. With a CTR threshold of 0.57, cardiomegaly can be identified with a very high specificity. This may be useful if PMCT is used by forensic pathologists as a screening tool for medico-legal autopsies.
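The effect of moving a CTR cut-off on sensitivity and specificity can be sketched as below. The function name, the CTR values, and the autopsy labels are hypothetical; they only show the mechanics and do not reproduce the study's data.

```python
def sensitivity_specificity(ctr_values, enlarged_at_autopsy, threshold):
    """Sensitivity and specificity of the rule 'CTR > threshold means enlarged'.

    enlarged_at_autopsy holds the reference standard as booleans.
    """
    pairs = list(zip(ctr_values, enlarged_at_autopsy))
    tp = sum(1 for ctr, ref in pairs if ctr > threshold and ref)
    fn = sum(1 for ctr, ref in pairs if ctr <= threshold and ref)
    tn = sum(1 for ctr, ref in pairs if ctr <= threshold and not ref)
    fp = sum(1 for ctr, ref in pairs if ctr > threshold and not ref)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical CTR measurements with autopsy-based reference labels
ctr = [0.45, 0.52, 0.60, 0.48, 0.55, 0.58, 0.49, 0.62]
enlarged = [False, True, True, True, False, True, False, True]

for cutoff in (0.50, 0.57):
    sens, spec = sensitivity_specificity(ctr, enlarged, cutoff)
    print(cutoff, round(sens, 2), round(spec, 2))
```

Raising the cut-off trades sensitivity for specificity, which is the trade-off the abstract describes when moving from 0.5 to 0.57.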

Relevance:

70.00%

Abstract:

OBJECTIVES To test the inter-rater reliability of the RoB tool applied to Physical Therapy (PT) trials by comparing ratings from Cochrane review authors with those of blinded external reviewers. METHODS Randomized controlled trials (RCTs) in PT were identified by searching the Cochrane Database of Systematic Reviews for meta-analyses of PT interventions. RoB assessments were conducted independently by 2 reviewers blinded to the RoB ratings reported in the Cochrane reviews. Data on RoB assessments from Cochrane reviews and other characteristics of reviews and trials were extracted. Consensus assessments between the two reviewers were then compared with the RoB ratings from the Cochrane reviews. Agreement between Cochrane and blinded external reviewers was assessed using weighted kappa (κ). RESULTS In total, 109 trials included in 17 Cochrane reviews were assessed. Inter-rater reliability on the overall RoB assessment between Cochrane review authors and blinded external reviewers was poor (κ = 0.02, 95% CI: [-0.06, 0.06]). Inter-rater reliability on individual domains of the RoB tool was poor (median κ = 0.19), ranging from κ = -0.04 ("Other bias") to κ = 0.62 ("Sequence generation"). There was also no agreement (κ = -0.29, 95% CI: [-0.81, 0.35]) in the overall RoB assessment at the meta-analysis level. CONCLUSIONS Risk of bias assessments of RCTs using the RoB tool are not consistent across different research groups. Poor agreement was not only demonstrated at the trial level but also at the meta-analysis level. Results have implications for decision making since different recommendations can be reached depending on the group analyzing the evidence. Improved guidelines to consistently apply the RoB tool and revisions to the tool for different health areas are needed.
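A weighted kappa for ordinal ratings can be sketched as below. The abstract does not say which weighting scheme was used, so linear weights are an assumption here, as are the function name, the 0/1/2 coding of low/unclear/high risk, and the example ratings.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted kappa with linear disagreement weights |i - j|.

    Ratings must be integers in range(n_categories). Linear weighting is an
    assumption; quadratic weights are another common choice.
    """
    n = len(rater_a)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(n_categories)]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]

    cells = [(i, j) for i in range(n_categories) for j in range(n_categories)]
    # Weighted observed disagreement and weighted chance disagreement
    weighted_observed = sum(abs(i - j) * observed[i][j] for i, j in cells)
    weighted_expected = sum(abs(i - j) * row_totals[i] * col_totals[j] / n
                            for i, j in cells)
    return 1 - weighted_observed / weighted_expected


# Hypothetical risk-of-bias ratings: 0 = low, 1 = unclear, 2 = high
review_authors = [0, 1, 2, 1, 0, 2, 1, 1]
external_reviewers = [1, 1, 0, 2, 0, 1, 2, 0]
print(round(linear_weighted_kappa(review_authors, external_reviewers, 3), 3))
```

Values near zero or below zero, like those reported in the abstract, indicate agreement no better than, or worse than, what the raters' marginal distributions would produce by chance.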

Relevance:

70.00%

Abstract:

Background: Virtual patients (VPs) are increasingly used to train clinical reasoning. So far, no validated evaluation instruments for VP design are available. Aims: We examined the validity of an instrument for assessing the perception of VP design by learners. Methods: Three sources of validity evidence were examined: (i) Content was examined based on theory of clinical reasoning and an international VP expert team. (ii) The response process was explored in think-aloud pilot studies with medical students and in content analyses of free text questions accompanying each item of the instrument. (iii) Internal structure was assessed by exploratory factor analysis (EFA) and inter-rater reliability by generalizability analysis. Results: Content analysis was reasonably supported by the theoretical foundation and the VP expert team. The think-aloud studies and analysis of free text comments supported the validity of the instrument. In the EFA, using 2547 student evaluations of a total of 78 VPs, a three-factor model showed a reasonable fit with the data. At least 200 student responses are needed to obtain a reliable evaluation of a VP on all three factors. Conclusion: The instrument has the potential to provide valid information about VP design, provided that many responses per VP are available.

Relevance:

60.00%

Abstract:

Background Abstractor training is a key element in creating valid and reliable data collection procedures. The choice between in-person vs. remote or simultaneous vs. sequential abstractor training has considerable consequences for time and resource utilization. We conducted a web-based (webinar) abstractor training session to standardize training across six individual Cancer Research Network (CRN) sites for a study of breast cancer treatment effects in older women (BOWII). The goals of this manuscript are to describe the training session, its participants and participants' evaluation of webinar technology for abstraction training. Findings A webinar was held for all six sites with the primary purpose of simultaneously training staff and ensuring consistent abstraction across sites. The training session involved sequential review of over 600 data elements outlined in the coding manual in conjunction with the display of data entry fields in the study's electronic data collection system. Post-training evaluation was conducted via Survey Monkey©. Inter-rater reliability measures for abstractors within each site were conducted three months after the commencement of data collection. Ten of the 16 people who participated in the training completed the online survey. Almost all (90%) of the 10 trainees had previous medical record abstraction experience and nearly two-thirds reported over 10 years of experience. Half of the respondents had previously participated in a webinar, among which three had participated in a webinar for training purposes. All rated the knowledge and information delivered through the webinar as useful and reported it adequately prepared them for data collection. Moreover, all participants would recommend this platform for multi-site abstraction training. 
Consistent with participant-reported training effectiveness, results of data collection inter-rater agreement within sites ranged from 89 to 98%, with a weighted average of 95% agreement across sites. Conclusions Conducting training via web-based technology was an acceptable and effective approach to standardizing medical record review across multiple sites for this group of experienced abstractors. Given the substantial time and cost savings achieved with the webinar, coupled with participants' positive evaluation of the training session, researchers should consider this instructional method as part of training efforts to ensure high quality data collection in multi-site studies.

Relevance:

60.00%

Abstract:

OBJECTIVE: To evaluate the agreement of blood pressure measurements and hypertension scores obtained by use of 3 indirect arterial blood pressure measurement devices in hospitalized dogs. DESIGN: Diagnostic test evaluation. ANIMALS: 29 client-owned dogs. PROCEDURES: 5 to 7 consecutive blood pressure readings were obtained from each dog on each of 3 occasions with a Doppler ultrasonic flow detector, a standard oscillometric device (STO), and a high-definition oscillometric device (HDO). RESULTS: When the individual sets of 5 to 7 readings were evaluated, the coefficient of variation for systolic arterial blood pressure (SAP) exceeded 20% for 0% (Doppler), 11% (STO), and 28% (HDO) of the sets of readings. After readings that exceeded a 20% coefficient of variation were discarded, repeatability was within 25 (Doppler), 37 (STO), and 39 (HDO) mm Hg for SAP. Correlation of mean values among the devices was between 0.47 and 0.63. Compared with Doppler readings, STO underestimated and HDO overestimated SAP. Limits of agreement between mean readings of any 2 devices were wide. With the hypertension scale used to score SAP, the intraclass correlation of scores was 0.48. Linear-weighted inter-rater reliability between scores was 0.40 (Doppler vs STO), 0.38 (Doppler vs HDO), and 0.29 (STO vs HDO). CONCLUSIONS AND CLINICAL RELEVANCE: Results of this study suggested that no meaningful clinical comparison can be made between blood pressure readings obtained from the same dog with different indirect blood pressure measurement devices.

Relevance:

60.00%

Abstract:

INTRODUCTION Blood pressure and flow measurements used to guide hemodynamic management in intensive care patients often do not sufficiently reveal common hemodynamic problems. Trans-esophageal echocardiography (TEE) allows for direct measurement of cardiac volumes and function. A new miniaturized probe for TEE (mTEE) potentially provides a rapid and simplified approach to monitor cardiac function. The aim of the study was to assess the feasibility of hemodynamic monitoring using mTEE in critically ill patients after a brief operator training period. METHODS In the context of the introduction of mTEE in a large ICU, 14 ICU staff specialists with no previous TEE experience received six hours of training as mTEE operators. The feasibility of mTEE and the quality of the obtained hemodynamic information were assessed. Three standard views were acquired in hemodynamically unstable patients: 1) for assessment of left ventricular (LV) function, fractional area change (FAC) was obtained from a trans-gastric mid-esophageal short axis view, 2) right ventricular (RV) size was obtained from mid-esophageal four chamber view, and 3) superior vena cava collapsibility for detection of hypovolemia was assessed from mid-esophageal ascending aortic short axis view. Off-line blinded assessment by an expert cardiologist was considered as a reference. Inter-rater agreement was assessed using Chi-square tests or correlation analysis as appropriate. RESULTS In 55 patients, 148 mTEE examinations were performed. Acquisition of loops in sufficient quality was possible in 110 examinations for trans-gastric mid-esophageal short axis, 118 examinations for mid-esophageal four chamber and 125 examinations for mid-esophageal ascending aortic short axis view. Inter-rater agreement (Kappa) between ICU mTEE operators and the reference was 0.62 for estimates of LV function, 0.65 for RV dilatation, 0.76 for hypovolemia and 0.77 for occurrence of pericardial effusion (all P < 0.0001).
There was a significant correlation between the FAC measured by ICU operators and the reference (r = 0.794, P (one-tailed) < 0.0001). CONCLUSIONS Echocardiographic examinations using mTEE after brief bed-side training were feasible and of sufficient quality in a majority of examined ICU patients with good inter-rater reliability between mTEE operators and an expert cardiologist. Further studies are required to assess the impact of hemodynamic monitoring by mTEE on relevant patient outcomes.

Relevance:

60.00%

Abstract:

The "Ardouin Scale of Behavior in Parkinson's Disease" is a new instrument specifically designed for assessing mood and behavior with a view to quantifying changes related to Parkinson's disease, to dopaminergic medication, and to non-motor fluctuations. This study was aimed at analyzing the psychometric attributes of this scale in patients with Parkinson's disease without dementia. In addition to this scale, the following measures were applied: the Unified Parkinson's Disease Rating Scale, the Montgomery and Asberg Depression Rating Scale, the Lille Apathy Rating Scale, the Bech and Rafaelsen Mania Scale, the Positive and Negative Syndrome Scale, the MacElroy Criteria, the Patrick Carnes criteria, the Hospital Anxiety and Depression Scale, and the Mini-International Neuropsychiatric Interview. Patients (n = 260) were recruited at 13 centers across four countries (France, Spain, United Kingdom, and United States). Cronbach's alpha coefficient for domains ranged from 0.69 to 0.78. Regarding test-retest reliability, the kappa coefficient for items was higher than 0.4. For inter-rater reliability, the kappa values were 0.29 to 0.81. Furthermore, most of the items from the Ardouin Scale of Behavior in Parkinson's Disease correlated with the corresponding items of the other scales, depressed mood with the Montgomery and Asberg Depression Rating Scale (ρ = 0.82); anxiety with the Hospital Anxiety and Depression Scale-anxiety (ρ = 0.56); apathy with the Lille Apathy Rating Scale (ρ = 0.60). The Ardouin Scale of Behavior in Parkinson's disease is an acceptable, reproducible, valid, and precise assessment for evaluating changes in behavior in patients with Parkinson's disease without dementia. © 2015 International Parkinson and Movement Disorder Society.

Relevance:

60.00%

Abstract:

MRSI grids frequently show spectra with poor quality, mainly because of the high sensitivity of MRS to field inhomogeneities. These poor quality spectra are prone to quantification and/or interpretation errors that can have a significant impact on the clinical use of spectroscopic data. Therefore, quality control of the spectra should always precede their clinical use. When performed manually, quality assessment of MRSI spectra is not only a tedious and time-consuming task, but is also affected by human subjectivity. Consequently, automatic, fast and reliable methods for spectral quality assessment are of utmost interest. In this article, we present a new random forest-based method for automatic quality assessment of ¹H MRSI brain spectra, which uses a new set of MRS signal features. The random forest classifier was trained on spectra from 40 MRSI grids that were classified as acceptable or non-acceptable by two expert spectroscopists. To account for the effects of intra-rater reliability, each spectrum was rated for quality three times by each rater. The automatic method classified these spectra with an area under the curve (AUC) of 0.976. Furthermore, in the subset of spectra containing only the cases that were classified every time in the same way by the spectroscopists, an AUC of 0.998 was obtained. Feature importance for the classification was also evaluated. Frequency domain skewness and kurtosis, as well as time domain signal-to-noise ratios (SNRs) in the ranges 50-75 ms and 75-100 ms, were the most important features. Given that the method is able to assess a whole MRSI grid faster than a spectroscopist (approximately 3 s versus approximately 3 min), and without loss of accuracy (agreement between classifier trained with just one session and any of the other labelling sessions, 89.88%; agreement between any two labelling sessions, 89.03%), the authors suggest its implementation in the clinical routine.
The method presented in this article was implemented in jMRUI's SpectrIm plugin. Copyright © 2016 John Wiley & Sons, Ltd.

Relevance:

20.00%

Abstract:

BACKGROUND: Only a few standardized apraxia scales are available and they do not cover all domains and semantic features of gesture production. Therefore, the objective of the present study was to evaluate the reliability and validity of a newly developed test of upper limb apraxia (TULIA), which is comprehensive and still short to administer. METHODS: The TULIA consists of 48 items including imitation and pantomime domains of non-symbolic (meaningless), intransitive (communicative) and transitive (tool related) gestures corresponding to 6 subtests. A 6-point scoring method (0-5) was used (score range 0-240). Performance was assessed by blinded raters based on videos in 133 stroke patients, 84 with left hemisphere damage (LHD) and 49 with right hemisphere damage (RHD), as well as 50 healthy subjects (HS). RESULTS: The clinimetric findings demonstrated mostly good to excellent internal consistency, inter- and intra-rater (test-retest) reliability, both at the level of the six subtests and at individual item level. Criterion validity was evaluated by confirming hypotheses based on the literature. Construct validity was demonstrated by a high correlation (r = 0.82) with the De Renzi-test. CONCLUSION: These results show that the TULIA is both a reliable and valid test to systematically assess gesture production. The test can be easily applied and is therefore useful for both research purposes and clinical practice.

Relevance:

20.00%

Abstract:

Recent studies have shown that the nociceptive withdrawal reflex threshold (NWR-T) and the electrical pain threshold (EP-T) are reliable measures in pain-free populations. However, it is necessary to investigate the reliability of these measures in patients with chronic pain in order to translate these techniques from laboratory to clinic. The aims of this study were to determine the test-retest reliability of the NWR-T and EP-T after single and repeated (temporal summation) electrical stimulation in a group of patients with chronic low back pain, and to investigate the association between the NWR-T and the EP-T. To this end, 25 patients with chronic pain participated in three identical sessions, separated by 1 week on average, in which the NWR-T and the EP-T to single and repeated stimulation were measured. Test-retest reliability was assessed using intra-class correlation coefficient (ICC), coefficient of variation (CV), and Bland-Altman analysis. The association between the thresholds was assessed using the coefficient of determination (r²). The results showed good-to-excellent reliability for both NWR-T and EP-T in all cases, with average ICC values ranging from 0.76 to 0.90 and average CV values ranging from 12.0% to 17.7%. The association between thresholds was better after repeated stimulation than after single stimulation, with average r² values of 0.83 and 0.56, respectively. In conclusion, the NWR-T and the EP-T are reliable assessment tools for assessing the sensitivity of spinal nociceptive pathways in patients with chronic pain.
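Two of the test-retest measures named in this abstract, the coefficient of variation and the Bland-Altman limits of agreement, can be sketched as follows. The function names and threshold values are hypothetical, and the abstract does not say exactly how the CV was averaged across patients, so the per-patient CV shown here is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


def bland_altman_limits(session_1, session_2):
    """Bias and 95% limits of agreement between two paired sessions."""
    differences = [a - b for a, b in zip(session_1, session_2)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


# Hypothetical thresholds (mA) for five patients in two sessions
session_1 = [10.2, 14.8, 8.5, 12.0, 16.3]
session_2 = [11.0, 13.9, 9.1, 12.4, 15.2]

# Per-patient CV across the two sessions (assumed definition)
per_patient_cv = [coefficient_of_variation([a, b])
                  for a, b in zip(session_1, session_2)]
print(round(mean(per_patient_cv), 1))
print(tuple(round(v, 2) for v in bland_altman_limits(session_1, session_2)))
```

The limits of agreement express reliability in the measurement's own units, which complements a unitless coefficient such as the ICC.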

Relevance:

20.00%

Abstract:

End-stage ankle arthritis should have an appropriate classification to assist surgeons in the management of end-stage ankle arthritis. Outcomes research also requires a classification system to stratify patients appropriately.

Relevance:

20.00%

Abstract:

Background To assess the criterion and construct validity of the KIDSCREEN-10 well-being and health-related quality of life (HRQoL) score, a short version of the KIDSCREEN-52 and KIDSCREEN-27 instruments. Methods The child self-report and parent report versions of the KIDSCREEN-10 were tested in a sample of 22,830 European children and adolescents aged 8–18 and their parents (n = 16,237). Correlation with the KIDSCREEN-52 and associations with other generic HRQoL measures, physical and mental health, and socioeconomic status were examined. Score differences by age, gender, and country were investigated. Results Correlations between the 10-item KIDSCREEN score and KIDSCREEN-52 scales ranged from r = 0.24 to 0.72 (r = 0.27–0.72) for the self-report version (proxy-report version). Coefficients below r = 0.5 were observed for the KIDSCREEN-52 dimensions Financial Resources and Being Bullied only. Cronbach alpha was 0.82 (0.78), test–retest reliability was ICC = 0.70 (0.67) for the self- (proxy-)report version. Correlations between other children self-completed HRQoL questionnaires and KIDSCREEN-10 ranged from r = 0.43 to r = 0.63 for the KIDSCREEN children self-report and r = 0.22–0.40 for the KIDSCREEN parent proxy report. Known group differences in HRQoL between physically/mentally healthy and ill children were observed in the KIDSCREEN-10 self and proxy scores. Associations with self-reported psychosomatic complaints were r = −0.52 (−0.36) for the KIDSCREEN-10 self-report (proxy-report). Statistically significant differences in KIDSCREEN-10 self and proxy scores were found by socioeconomic status, age, and gender. Conclusions Our results indicate that the KIDSCREEN-10 provides a valid measure of a general HRQoL factor in children and adolescents, but the instrument does not represent well most of the single dimensions of the original KIDSCREEN-52. Test–retest reliability was slightly below a priori defined thresholds.
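Cronbach's alpha, the internal-consistency statistic reported in this abstract, can be sketched as below. The function name and the item responses are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per respondent
    and one column per questionnaire item."""
    k = len(responses[0])  # number of items
    item_variances = [variance([row[j] for row in responses])
                      for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of six respondents to four items on a 1-5 scale
answers = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(answers), 3))
```

Alpha rises when items vary together across respondents, so a high value supports summing the items into a single score, as the 10-item instrument does.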

Relevance:

20.00%

Abstract:

This in situ study evaluated the discriminatory power and reliability of methods of dental plaque quantification and the relationship between visual indices (VI) and fluorescence camera (FC) to detect plaque.