73 results for rater reliability
at Université de Lausanne, Switzerland
Abstract:
The Pulmonary Embolism Severity Index (PESI) is a validated clinical prognostic model for patients with acute pulmonary embolism (PE). Our goal was to assess the PESI's inter-rater reliability in patients diagnosed with PE. We prospectively identified consecutive patients diagnosed with PE in the emergency department of a Swiss teaching hospital. For all patients, resident and attending physician raters independently collected the 11 PESI variables. The raters then calculated the PESI total point score and classified patients into one of five PESI risk classes (I-V) and as low (risk classes I/II) versus higher-risk (risk classes III-V). We examined the inter-rater reliability for each of the 11 PESI variables, the PESI total point score, assignment to each of the five PESI risk classes, and classification of patients as low versus higher-risk using kappa (κ) and intra-class correlation coefficients (ICC). Among 48 consecutive patients with an objective diagnosis of PE, reliability coefficients between resident and attending physician raters were > 0.60 for 10 of the 11 variables comprising the PESI. The inter-rater reliability for the PESI total point score (ICC: 0.89, 95% CI: 0.81-0.94), PESI risk class assignment (κ: 0.81, 95% CI: 0.66-0.94), and the classification of patients as low versus higher-risk (κ: 0.92, 95% CI: 0.72-0.98) was near perfect. Our results demonstrate the high reproducibility of the PESI, supporting the use of the PESI for risk stratification of patients with PE.
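As an illustration of the kappa statistic used above for categorical agreement (for example, low versus higher-risk classification), here is a minimal sketch of unweighted Cohen's kappa for two raters. The function name and the sample labels are hypothetical and are not taken from the study, which may have used a different variant or software.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four patients (not study data).
resident = ["low", "low", "high", "high"]
attending = ["low", "high", "high", "high"]
kappa = cohen_kappa(resident, attending)
```

With these made-up labels the raters agree on 3 of 4 cases, chance agreement is 0.5, and kappa comes out at 0.5.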
Abstract:
Intraclass correlation (ICC) is an established tool to assess inter-rater reliability. In a seminal paper published in 1979, Shrout and Fleiss considered three statistical models for inter-rater reliability data with a balanced design. In their first two models, an infinite population of raters was considered, whereas in their third model, the raters in the sample were considered to be the whole population of raters. In the present paper, we show that the two distinct estimates of ICC developed for the first two models can both be applied to the third model and we discuss their different interpretations in this context.
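To make the distinction between the models concrete, the following sketch computes the single-rating ICC estimates usually labelled ICC(1,1), ICC(2,1) and ICC(3,1) from the mean squares of a balanced targets-by-raters table. The labels, the function name and the small ratings matrix are assumptions chosen for illustration; the abstract itself gives no formulas or data.

```python
def icc_single(ratings):
    """Single-rating ICC estimates from a balanced n-targets x k-raters table."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a balanced table with at least 2 targets and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares of the two-way layout.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_raters
    # Mean squares: between targets, within targets, between raters, residual.
    bms = ss_targets / (n - 1)
    wms = (ss_total - ss_targets) / (n * (k - 1))
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }


# Illustrative table: 6 targets (rows) scored by 4 raters (columns).
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_single(table)
```

On this table the three estimates are roughly 0.17, 0.29 and 0.71. The gap arises because the raters differ strongly in their mean level, which ICC(3,1) ignores and the other two penalise.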
Abstract:
The evaluation of children's statements of sexual abuse in forensic cases is critically important and must be reliable. Criteria-based content analysis (CBCA) is the main component of the statement validity assessment (SVA), which is the most frequently used approach in this setting. This study investigated the inter-rater reliability (IRR) of CBCA in a forensic context. Three independent raters evaluated the transcripts of 95 statements of sexual abuse. IRR was calculated for each criterion, total score, and overall evaluation. The IRR was variable for the criteria, with several being unsatisfactory. Nevertheless, high IRR was found for the total CBCA scores (Kendall's W = 0.84) and for overall evaluation (Kendall's W = 0.65). Despite some shortcomings, SVA remains a robust method to be used in the comprehensive evaluation of children's statements of sexual abuse in the forensic setting. However, the low IRR of some CBCA criteria could justify some technical improvements.
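For readers unfamiliar with Kendall's W, the coefficient of concordance reported above, here is a minimal sketch of its basic form for m raters who each rank the same n items without ties. The function name and the toy rankings are hypothetical. The study most likely needed a tie-corrected version, which this sketch does not implement.

```python
def kendalls_w(rankings):
    """Kendall's W for m complete rankings of n items (no tie correction)."""
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2:
        raise ValueError("need at least 2 raters and 2 items")
    for ranks in rankings:
        # Each rater must supply a permutation of the ranks 1..n.
        if sorted(ranks) != list(range(1, n + 1)):
            raise ValueError("each ranking must be a permutation of 1..n")
    # Sum of ranks received by each item across all raters.
    rank_sums = [sum(ranks[i] for ranks in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    # Squared deviations of the rank sums from their common mean.
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m**2 * (n**3 - n))


# Toy example: three raters ranking four statements identically.
w_perfect = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
# Two raters with exactly opposite rankings.
w_opposed = kendalls_w([[1, 2, 3], [3, 2, 1]])
```

Identical rankings give W = 1, while two exactly opposed rankings give W = 0, which brackets the values of 0.84 and 0.65 reported in the abstract.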
Abstract:
The National Institute of Mental Health developed the semi-structured Diagnostic Interview for Genetic Studies (DIGS) for the assessment of major mood and psychotic disorders and their spectrum conditions. The DIGS was translated into French in a collaborative effort of investigators from sites in France and Switzerland. Inter-rater and test-retest reliability of the French version have been established in a clinical sample in Lausanne. Excellent inter-rater reliability was found for schizophrenia, bipolar disorder, major depression, and unipolar schizoaffective disorder while fair inter-rater reliability was demonstrated for bipolar schizoaffective disorder. Using a six-week test-retest interval, reliability for all diagnoses was found to be fair to good with the exception of bipolar schizoaffective disorder. The lower test-retest reliability was the result of a relatively long test-retest interval that favored incomplete symptom recall. In order to increase reliability for lifetime diagnoses in persons not currently affected, best-estimate procedures using additional sources of diagnostic information such as medical records and reports from relatives should supplement DIGS information in family-genetic studies. Within such a procedure, the DIGS appears to be a useful part of data collection for genetic studies on major mood disorders and schizophrenia in French-speaking populations.
Abstract:
The semi-structured diagnostic interview for genetic studies (DIGS) was developed to assess major mood and psychotic disorders and their spectrum manifestations in genetic studies. Our research group developed a French version of the DIGS and tested its inter-rater and test-retest reliability in psychiatric patients. In this article, we present estimates of the reliability of substance use and antisocial personality disorders. High kappa coefficients for inter-rater reliability were found for drug and alcohol as well as antisocial personality diagnoses and slightly lower kappas for test-retest reliability. Combined with evidence of the reliability of major mood and psychotic disorders, these findings support the suitability of the DIGS for studies of familial aggregation and comorbidity of psychiatric disorders including substance use and antisocial personality disorders.
Abstract:
This study examined the validity and reliability of the French version of two observer-rated measures developed to assess cognitive errors (cognitive errors rating system [CERS]) [6] and coping action patterns (coping action patterns rating system [CAPRS]) [22,24]. The CE measures 14 cognitive errors, broken down according to their valence, positive or negative (see the definitions by A.T. Beck), and the CAP measures 12 coping categories, based on a comprehensive literature review, each broken down into three levels of action (affective, behavioural, cognitive). Thirty (N = 30) subjects recruited in a community sample participated in the study. They were interviewed according to a standardized clinical protocol; these interviews were transcribed and analysed with both observer-rated systems. Results showed that the inter-rater reliability of the two measures is good and that their internal validity is satisfactory, due to a non-significant canonical correlation between CAP and CE. With regard to discriminant validity, we found a non-significant canonical correlation between CAPRS and CISS, one of the most widely used self-report questionnaires measuring coping. The same can be said for the correlation with a self-report questionnaire measuring symptoms (SCL-90-R). These results confirm the absence of confounds in the assessment of cognitive errors and of coping as assessed by these observer-rated scales and add an argument in favour of the French validation of the CE-CAP rating scales. (C) 2010 Elsevier Masson SAS. All rights reserved.
Abstract:
Purpose: Many countries use the PGMI (P=perfect, G=good, M=moderate, I=inadequate) classification system for assessing the quality of mammograms. Limits inherent to the subjectivity of this classification have been shown. Prior to introducing this system in Switzerland, we wanted to better understand the origin of this subjectivity in order to minimize it. Our study aimed at identifying the main determinants of the variability of the PGMI system and the criteria most prone to subjectivity. Methods and Materials: A focus group composed of 2 experienced radiographers and 2 radiologists specified each PGMI criterion. Ten raters (6 radiographers and 4 radiologists) twice evaluated a panel of 40 randomly selected mammograms (20 analog and 20 digital) according to these specified PGMI criteria. The PGMI classification was assessed and the intra- and inter-rater reliability was tested for each professional group (radiographer vs radiologist), image technology (analog vs digital) and PGMI criterion. Results: Some 3,200 images were assessed. The intra-rater reliability appears to be weak, particularly with respect to inter-rater variability. Subjectivity appears to be largely independent of the professional group and image technology. Aspects of the PGMI classification criteria most subject to variability were identified. Conclusion: Post-test discussions made it possible to specify some criteria more precisely. This should reduce subjectivity when applying the PGMI classification system. A concomitant, important effort in training radiographers is also necessary.
Abstract:
INTRODUCTION: Quantitative sensory testing (QST) is widely used in human research to investigate the integrity of the sensory function in patients with pain of neuropathic origin, or other causes such as low back pain. Reliability of QST has been evaluated on both sides of the face, hands and feet as well as on the trunk (Th3-L3). In order to apply these tests on other body parts such as the lower lumbar spine, it is important first to establish reliability in healthy individuals. The aim of this study was to investigate intra-rater reliability of thermal QST in healthy adults, on two sites within the L5 dermatome of the lumbar spine and lower extremity. METHODS: Test-retest reliability of thermal QST was determined at the L5 level of the lumbar spine and in the same dermatome on the lower extremity in 30 healthy persons under 40 years of age. Results were analyzed using descriptive statistics and the intraclass correlation coefficient (ICC). Values were compared to normative data, using Z-transformation. RESULTS: Mean intraindividual differences were small for cold and warm detection thresholds but larger for pain thresholds. ICC values showed excellent reliability for warm detection and heat pain threshold, good-to-excellent reliability for cold pain threshold and fair-to-excellent reliability for cold detection threshold. The 95% confidence intervals of the ICCs were wide. CONCLUSION: In healthy adults, thermal QST on the lumbar spine and lower extremity demonstrated fair-to-excellent test-retest reliability.
Abstract:
We present the first steps in the validation of an observational tool for father-mother-infant interactions: the FAAS (Family Alliance Assessment Scales). Family-level variables are acknowledged as unique contributors to the understanding of the socio-affective development of the child, yet producing reliable assessments of family-level interactions poses a methodological challenge. There is, therefore, a clear need for a validated and clinically relevant tool. This validation study has been carried out on three samples: one non-referred sample, of families taking part in a study on the transition to parenthood (normative sample; n = 30), one referred for medically assisted procreation (infertility sample; n = 30) and one referred for a psychiatric condition in one parent (clinical sample; n = 15). Results show that the FAAS scales have (1) good inter-rater reliability and (2) good validity, as assessed through known-group validity by comparing the three samples and through concurrent validity by checking family interactions against parents' self-reported marital satisfaction.
Abstract:
Introduction: Occupational therapists could play an important role in facilitating driving cessation for ageing drivers. This, however, requires an easy-to-learn, standardised on-road evaluation method. This study therefore investigates whether use of P-drive could be reliably taught to occupational therapists via a short half-day training session. Method: Using the English 26-item version of P-drive, two occupational therapists evaluated the driving ability of 24 home-dwelling drivers aged 70 years or over on a standardised on-road route. Experienced driving instructors' on-road, subjective evaluations were then compared with P-drive scores. Results: Following a short half-day training session, P-drive was shown to have almost perfect between-rater reliability (ICC(2,1) = 0.950, 95% CI 0.889 to 0.978). Reliability was stable across sessions including the training phase even if occupational therapists seemed to become slightly less severe in their ratings with experience. P-drive's score was related to the driving instructors' subjective evaluations of driving skills in a non-linear manner (R² = 0.445, p = 0.021). Conclusion: P-drive is a reliable instrument that can easily be taught to occupational therapists and implemented as a way of standardising the on-road driving test.
Abstract:
Experts in the field of conversion disorder have suggested that the upcoming DSM-V edition put less weight on the associated psychological factors and emphasise the role of clinical findings. Indeed, a critical step in reaching a diagnosis of conversion disorder is careful bedside neurological examination, aimed at excluding organic signs and identifying 'positive' signs suggestive of a functional disorder. These positive signs are well known to all trained neurologists but their validity is still not established. The aim of this study is to provide current evidence regarding their sensitivity and specificity. We conducted a systematic search on motor, sensory and gait functional signs in Embase, Medline and PsycINFO from 1965 to June 2012. Studies in English, German or French reporting objective data on more than 10 participants in a controlled design were included in a systematic review. Other relevant signs are discussed in a narrative review. Eleven controlled studies (out of 147 eligible articles) describing 14 signs (7 motor, 5 sensory, 2 gait) reported low sensitivity of 8-100% but high specificity of 92-100%. Studies were evidence class III, only two had a blinded design and none reported on inter-rater reliability of the signs. Clinical signs for functional neurological symptoms are numerous but only 14 have been validated; overall they have low sensitivity but high specificity and their use should thus be recommended, especially with the introduction of the new DSM-V criteria.
Abstract:
Crizotinib is a first-in-class oral anaplastic lymphoma kinase (ALK) inhibitor targeting ALK-rearranged non-small-cell lung cancer. The therapy was approved by the US FDA in August 2011 and received conditional marketing approval by the European Commission in October 2012 for advanced non-small-cell lung cancer. A break-apart FISH-based assay was jointly approved with crizotinib by the FDA. This assay and an immunohistochemistry assay that uses a D5F3 rabbit monoclonal primary antibody were also approved for marketing in Europe in October 2012. While ALK rearrangement has relatively low prevalence, a clinical benefit is exhibited in more than 85% of patients with median progression-free survival of 8-10 months. In this article, the authors summarize the therapy and alternative test strategies for identifying patients who are likely to respond to therapy, including key issues for effective and efficient testing. The key economic considerations regarding the joint companion diagnostic and therapy are also presented. Given the observed clinical benefit and relatively high cost of crizotinib therapy, companion diagnostics should be evaluated relative to response to therapy versus correlation alone whenever possible, and both high inter-rater reliability and external quality assessment programs are warranted.
Abstract:
Objective: To test the efficacy of teaching motivational interviewing (MI) to medical students. Methods: Thirteen 4th-year medical students volunteered to participate. Seven days before and 7 days after an 8-hour interactive MI training workshop, each student performed a videorecorded interview with two standardized patients: a 60-year-old alcohol-dependent woman and a 50-year-old cigarette-smoking man. Students' counseling skills were coded by two blinded clinicians using the Motivational Interviewing Treatment Integrity 3.0 (MITI). Inter-rater reliability was calculated for all interviews and a test-retest was completed in a sub-sample of 10 consecutive interviews three days apart. Differences between MITI scores before and after training were calculated and tested using non-parametric tests. Effect size was approximated by calculating the probability that posttest scores are greater than pretest scores (P*=P(Pre<Post)+1/2P(Pre=Post)), with P*>1/2 indicating greater scores in posttest, P*=1/2 no effect, and P*<1/2 smaller scores in posttest. Results: Median differences between MITI scores before and after MI training indicated a general progression in MI skills: MI spirit global score (median difference=1.5, interquartile range=1.5, p<0.001, P*=0.90); Empathy global score (med diff=1, IQR=0.5, p<0.001, P*=0.85); Percentage of MI adherent skills (med diff=36.6, IQR=50.5, p<0.001, P*=0.85); Percentage of open questions (med diff=18.6, IQR=21.6, p<0.001, P*=0.96); reflections/questions ratio (med diff=0.2, IQR=0.4, p<0.001, P*=0.81). Only Direction global score and the percentage of complex reflections were not significantly improved (med diff=0, IQR=1, p=0.53, P*=0.44, and med diff=4.3, IQR=24.8, p=0.48, P*=0.62, respectively). For inter-rater reliability, weighted kappa ranged from 0.14 for Direction to 0.51 for Collaboration and ICC ranged from 0.28 for Simple reflection to 0.95 for Closed question. For test-retest, weighted kappa ranged from 0.27 for Direction to 0.80 for Empathy and ICC ranged from 0.87 for Complex reflection to 0.98 for Closed question. Conclusion: This pilot study indicated that an 8-hour training in MI for voluntary 4th-year medical students resulted in significant improvement of MI skills. A larger sample of unselected medical students should be studied to generalize the benefit of MI training to medical students. Inter-rater reliability and test-retest results suggested that coders' training should be intensified.
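The effect size P* defined in this abstract can be sketched in a few lines. The version below assumes that P* is estimated over matched pre/post pairs from the same student; the abstract does not say whether pairs or all cross-comparisons were used, so this pairing, the function name and the sample scores are all assumptions for illustration.

```python
def p_star(pre, post):
    """Estimate P* = P(Pre < Post) + 1/2 * P(Pre = Post) over matched pairs."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    improved = sum(a < b for a, b in zip(pre, post))
    tied = sum(a == b for a, b in zip(pre, post))
    # Ties count for half, so that no change at all yields exactly 1/2.
    return (improved + 0.5 * tied) / len(pre)


# Hypothetical scores for four students (not study data).
pre_scores = [1, 2, 3, 4]
post_scores = [2, 2, 5, 3]
effect = p_star(pre_scores, post_scores)
```

Here two students improve, one is unchanged and one declines, so P* = (2 + 0.5) / 4 = 0.625, which would be read as a modest shift toward higher posttest scores.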
Abstract:
OBJECTIVE: The aim of this study was to evaluate a French language version of the Adolescent Drug Abuse Diagnosis (ADAD) instrument in a Swiss sample of adolescent illicit drug and/or alcohol users. PARTICIPANTS AND SETTING: The participants in the study were 102 French-speaking adolescents aged 13-19 years who fitted the criteria of illicit drug or alcohol use (at least one substance, except tobacco, once a week during the last 3 months). They were recruited in hospitals, institutions and leisure places. PROCEDURE: The ADAD was administered individually by trained psychologists. It was integrated into a broader protocol including alcohol and drug abuse DSM-IV diagnoses, the BDI-13 (Beck Depression Inventory), life events and treatment trajectories. RESULTS: The ADAD appears to show good inter-rater reliability; the subscales showed good internal coherence and the correlations between the composite scores and the severity ratings were moderate to high. Finally, the results confirmed good concurrent validity for three out of eight ADAD dimensions. CONCLUSIONS: The French language version of the ADAD appears to be an adequate instrument for assessing drug use and associated problems in adolescents. Despite its complexity, the instrument has acceptable validity, reliability and usefulness criteria, enabling international and transcultural comparisons.
Abstract:
Valid individualized case conceptualization methodologies, such as plan analysis, are rarely used for the psychotherapeutic treatment conceptualization and planning of bipolar affective disorder (BD), even if data do exist showing that psychotherapy interventions might be enhanced by applying such analyses for treatment planning for several groups of patients. We applied plan analysis as a research tool (Caspar, 1995) to N=30 inpatients presenting BD, who were interviewed twice. Our study aimed at producing a prototypical plan structure encompassing the most relevant data from the 30 individual case conceptualizations. Special focus was given to links with emotions and coping plans. Inter-rater reliability of these plan analyses was considered sufficient. Results suggest the presence of two subtypes based on plananalytic principles: emotion control and relationship control, along with a mixed form. These subtypes are discussed with regard to inherent plananalytic conflicts, specific emotions and coping plans, as well as symptom level and type. Finally, conclusions are drawn for enhancing psychotherapeutic practice with BD patients, based on the motive-oriented therapeutic relationship.