913 results for inter-rater reliability
Abstract:
Stroke is a leading cause of death and permanent disability worldwide, affecting millions of individuals. Traditional clinical scores for assessment of stroke-related impairments are inherently subjective and limited by inter-rater and intra-rater reliability, as well as floor and ceiling effects. In contrast, robotic technologies provide objective, highly repeatable tools for quantification of neurological impairments following stroke. KINARM is an exoskeleton robotic device that provides objective, reliable tools for assessment of sensorimotor, proprioceptive and cognitive brain function by means of a battery of behavioral tasks. As such, KINARM is particularly useful for assessment of neurological impairments following stroke. This thesis introduces a computational framework for assessment of neurological impairments using the data provided by KINARM. This is done by achieving two main objectives. First, to investigate how robotic measurements can be used to estimate current and future abilities to perform daily activities for subjects with stroke. We are able to predict clinical scores related to activities of daily living at present and future time points using a set of robotic biomarkers. The findings of this analysis provide a proof of principle that robotic evaluation can be an effective tool for clinical decision support and target-based rehabilitation therapy. The second main objective of this thesis is to address the emerging problem of long assessment time, which can potentially lead to fatigue when assessing subjects with stroke. To address this issue, we examine two time reduction strategies. The first strategy focuses on task selection, whereby KINARM tasks are arranged in a hierarchical structure so that an earlier task in the assessment procedure can be used to decide whether or not subsequent tasks should be performed. The second strategy focuses on time reduction on the longest two individual KINARM tasks. 
Both reduction strategies are shown to provide significant time savings, ranging from 30% to 90% using task selection and 50% using individual task reductions, thereby establishing a framework for reduction of assessment time on a broader set of KINARM tasks. All in all, findings of this thesis establish an improved platform for diagnosis and prognosis of stroke using robot-based biomarkers.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Objective: To assess the reliability and validity of a brief measure of quality of life recently developed by the World Health Organization, the WHOQOL-BREF, and to examine its association with a variety of clinical and sociodemographic factors in older depressed patients. Design: Cross-sectional study. Methods: Older depressed patients (N=41) underwent diagnostic assessment using the Composite International Diagnostic Interview (CIDI) and were independently assessed on a variety of measures, including the WHOQOL-BREF (a 26-item self-report questionnaire generating four domain scores), the Hamilton Depression Rating Scale (HAM-D), the Geriatric Depression Scale (GDS), the Mini-Mental State Examination (MMSE), the Modified Barthel Index (MBI), Instrumental Activities of Daily Living (IADL), and measures of physical health status and social relationships. Estimates of inter-rater reliability, test-retest reliability, and concurrent validity were made. Results: 39 subjects completed the study. The majority of subjects (94.9%) received a diagnosis of DSM-IV Major Depressive Disorder. Levels of comorbidity were high. Three of the four domains of the WHOQOL-BREF (Physical, Psychological and Environment) demonstrated satisfactory reliability and validity. However, the Social Relationships domain exhibited poor validity. Quality of life scores were strongly correlated with severity of depression, number of self-reported physical symptoms and self-assessed general health status. There was no relationship between diagnostic comorbidity and quality of life scores. Conclusions: The WHOQOL-BREF was successfully administered to older depressed patients, although the concurrent validity of one of its four domains was poor. Quality of life scores were strongly correlated with severity of depression, raising the issue of measurement redundancy.
Abstract:
BACKGROUND: The Health of the Nation Outcome Scales was developed to routinely measure outcomes for adults with mental illness. Comparable instruments were also developed for children and adolescents (the Health of the Nation Outcome Scales for Children and Adolescents) and older people (the Health of the Nation Outcome Scales 65+). All three are being widely used as outcome measures in the United Kingdom, Australia and New Zealand. There is, however, no comprehensive review of these instruments. This paper fills this gap by reviewing the psychometric properties of each. METHOD: Articles and reports relating to the instruments were retrieved, and their findings synthesised to assess the instruments' validity (content, construct, concurrent, predictive), reliability (test-retest, inter-rater), sensitivity to change, and feasibility/utility. RESULTS: Mostly, the instruments perform adequately or better on most dimensions, although some of their psychometric properties warrant closer examination. CONCLUSION: Collectively, the Health of the Nation Outcome Scales family of measures can assess outcomes for different groups on a range of mental health-related constructs, and can be regarded as appropriate for routinely monitoring outcomes.
Abstract:
This paper describes the development and evaluation of a new instrument, the Clinician Suicide Risk Assessment Checklist (CSRAC). The instrument assesses the clinician’s competency in three areas: clinical interviewing, assessment of specific suicide risk factors, and formulating a management plan. A draft checklist was constructed by integrating information from 1) a literature review, 2) an expert clinician focus group and 3) consultation with experts. It was used in a simulated clinical scenario with clinician trainees and a trained actor in order to test for inter-rater agreement. Agreement was calculated and the checklist was re-drafted with the aim of maximising agreement. A second phase of simulated clinical scenarios was then conducted and inter-rater agreement was calculated for the revised checklist. In the first phase of the study, 18 of 35 items had inadequate inter-rater agreement (<60%), while in the second phase, using the revised version, only 3 of 39 items failed to achieve adequate inter-rater agreement. Further evidence of reliability and validity is required. Continued development of the CSRAC will be necessary before it can be used to assess the effectiveness of risk assessment training programs.
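The abstract above does not state how item-level agreement was computed. As a minimal sketch, assuming simple percent agreement between two raters and the 60% cut-off it mentions, the screening of checklist items could look like the following. The function names and the data layout are hypothetical, chosen only for illustration.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage (0-100) of observations on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def items_below_threshold(ratings_by_item, threshold=60.0):
    """Return the names of items whose percent agreement is below the threshold.

    ratings_by_item maps an item name to a pair (rater_a_ratings, rater_b_ratings).
    The 60% default mirrors the cut-off quoted in the abstract; the data layout
    is an assumption of this sketch.
    """
    return [
        item
        for item, (a, b) in ratings_by_item.items()
        if percent_agreement(a, b) < threshold
    ]
```

Percent agreement does not correct for agreement expected by chance, which is one reason chance-corrected statistics such as Cohen's kappa are often reported alongside it.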
Abstract:
Higher education institutions across the United States have developed global learning initiatives to support student achievement of global awareness and global perspective, but assessment options for these outcomes are extremely limited. A review of research for a global learning initiative at a large, Hispanic-serving, urban, public, research university in South Florida found a lack of instruments designed to measure global awareness and global perspective in the context of an authentic performance assessment. This quasi-experimental study explored the development of two rubrics for the global learning initiative and the extent to which evidence supported the rubrics' validity and reliability. One holistic rubric was developed to measure students' global awareness and the second to measure their global perspective. The study utilized a pretest/posttest nonequivalent group design. Multiple linear regression was used to ascertain the rubrics' ability to discern and compare average learning gains of undergraduate students enrolled in two global learning courses and students enrolled in two non-global learning courses. Parallel pretest/posttest forms of the performance task required students to respond to two open-ended questions, aligned with the learning outcomes, concerning a complex case narrative. Trained faculty raters read responses and used the rubrics to measure students' global awareness and perspective. Reliability was tested by calculating the rates of agreement among raters. Evidence supported the finding that the global awareness and global perspective rubrics yielded scores that were highly reliable measures of students' development of these learning outcomes. Chi-square tests of frequency found significant rates of inter-rater agreement exceeding the study's .80 minimum requirement. Evidence also supported the finding that the rubrics yielded scores that were valid measures of students' global awareness and global perspective. 
Regression analyses found little evidence of main effects; however, post hoc analyses revealed a significant interaction between global awareness pretest scores and the treatment, the global learning course. Significant interaction was also found between global perspective pretest scores and the treatment. These crossover interactions supported the finding that the global awareness and global perspective rubrics could be used to detect learning differences between the treatment and control groups as well as differences within the treatment group.
Abstract:
Objective: Establish intra- and inter-examiner reliability of glenohumeral range of motion (ROM) measures taken by a single clinician using a mechanical inclinometer. Design: A single-session, repeated-measure, randomized, counterbalanced design. Setting: Athletic Training laboratory. Participants: Ten college-aged volunteers (9 right-hand dominant; 4 males, 6 females; age=23.2±2.4y, mass=73±16kg, height=170±8cm) without shoulder or neck injuries within one year. Interventions: Two Certified Athletic Trainers separately assessed passive glenohumeral (GH) internal (IR) and external (ER) rotation bilaterally. Each clinician secured the inclinometer to each subject’s distal forearm using elastic straps. Clinicians followed standard procedures for assessing ROM, with the participants supine on a standard treatment table with 90° of elbow flexion. A second investigator recorded the angle. Clinicians measured all shoulders once to assess inter-clinician reliability and eight shoulders twice to assess intra-clinician reliability. We used SPSS 14.0 (SPSS Inc., Chicago, IL) to calculate the standard error of measurement (SEM) and Intraclass Correlation Coefficients (ICC) to evaluate intra- and inter-clinician reliability. Main Outcome Measures: Dependent variables were degrees of IR, ER, glenohumeral internal rotation deficit (GIRD) and total arc of rotation. We calculated GIRD as the bilateral difference in IR (nondominant minus dominant) and total arc for each shoulder as IR+ER. Results: Intra-clinician reliability for each examiner was excellent (ICC[1,1] range=0.90-0.96; SEM=2.2°-2.5°) for all measures. Examiners displayed excellent inter-clinician reliability (ICC[2,1] range=0.79-0.97; SEM=1.7°-3.0°) for all measures except nondominant IR, which had good reliability (0.72). Conclusions: Results suggest that clinicians can achieve reliable measures of GH rotation and GIRD using a single-clinician technique and an inexpensive, readily available mechanical inclinometer.
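The derived measures in the abstract above are simple arithmetic on the measured angles. The sketch below follows the stated definitions of GIRD (nondominant minus dominant IR) and total arc (IR + ER). The SEM helper uses the common textbook relation SEM = SD × sqrt(1 − ICC), which is an assumption here, since the abstract does not say which SEM formula was applied. All function names are hypothetical.

```python
import math


def gird(ir_nondominant, ir_dominant):
    """Glenohumeral internal rotation deficit in degrees: nondominant IR minus dominant IR."""
    return ir_nondominant - ir_dominant


def total_arc(ir, er):
    """Total arc of rotation in degrees for one shoulder: IR + ER."""
    return ir + er


def sem_from_icc(sd, icc):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC).

    This relation is a widely used convention, not something stated in the abstract.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)
```

For example, a between-subject standard deviation of 10° combined with an ICC of 0.96 gives an SEM of about 2°, the same order of magnitude as the values reported in the abstract.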
Abstract:
Objective: Leadership is particularly important in complex, highly interprofessional health care contexts involving a number of staff, some from the same specialty (intraprofessional) and others from different specialties (interprofessional). The authors recently published the concept of “The Burns Suite” (TBS) as a novel simulation tool to deliver interprofessional and teamwork training. It is unclear which leadership behaviors are the most important in an interprofessional burns resuscitation scenario, and whether they can be mapped onto current leadership theory. The purpose of this study was to perform a comprehensive video analysis of leadership behaviors within TBS. Methods: A total of 3 burns resuscitation simulations within TBS were recorded. The video analysis was grounded-theory inspired. Using predefined criteria, actions/interactions deemed as leadership behaviors were identified. Using an inductive iterative process, 8 main leadership behaviors were identified. Cohen’s κ coefficient was used to measure inter-rater agreement and was calculated as κ = 0.7 (substantial agreement). Each video was watched 4 times, focusing on 1 of the 4 team members per viewing (senior surgeon, senior nurse, trainee surgeon, and trainee nurse). The frequency and types of leadership behavior of each of the 4 team members were recorded. Differences were tested using analysis of variance, with p < 0.05 taken to be significant. Leadership behaviors were triangulated with verbal cues and actions from the videos. Results: All 3 scenarios were successfully completed. The mean scenario length was 22 minutes. A total of 362 leadership behaviors were recorded from the 12 participants. 
The most evident leadership behaviors of all team members were adhering to guidelines (which effectively equates to following Advanced Trauma Life Support/Emergency Management of Severe Burns resuscitation guidelines and hence “maintaining standards”), followed by making decisions. Although in terms of total frequency the senior surgeon engaged in more leadership behaviors than the rest of the team, statistically there was no significant difference among the 4 members within the 8 leadership categories. This analysis highlights that “distributed leadership” was predominant, whereby leadership was “distributed” or “shared” among team members. The leadership behaviors within TBS also seemed to fall in line with the “direction, alignment, and commitment” ontology. Conclusions: Effective leadership is essential for successful functioning of work teams and accomplishment of task goals. As the resuscitation of a patient with major burns is a dynamic event, team leaders require flexibility in their leadership behaviors to effectively adapt to changing situations. Understanding leadership behaviors of different team members within an authentic simulation can identify important behaviors required to optimize nontechnical skills in a major resuscitation. Furthermore, attempting to map these behaviors onto leadership models can help further our understanding of leadership theory. Collectively, this can aid the development of refined simulation scenarios for team members, and can be extrapolated into other areas of simulation-based team training and interprofessional education.
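The abstract above reports inter-rater agreement as Cohen's κ. For readers unfamiliar with the statistic, the following is a minimal sketch of the standard two-rater formula, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's category frequencies. The function name and the example labels are hypothetical, and this is not the authors' analysis code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)
```

With two raters coding four video segments as `["decide", "decide", "guide", "guide"]` and `["decide", "guide", "guide", "guide"]`, observed agreement is 0.75, chance agreement is 0.5, and κ is 0.5.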
Abstract:
The main aims of this study were to revise the original version of the University of Aveiro Voice Quality Assessment Protocol (Protocolo de Avaliação da Qualidade Vocal da Universidade de Aveiro, PAQVUA), to create its manual, and to establish the content validity of the PAQVUA as well as its inter- and intra-rater reliability. To revise the original version, a narrative literature review was carried out to find information on the relevance of the original protocol, so that the necessary changes could be made. Content validity was analysed using a panel of six voice specialists, who evaluated the PAQVUA tasks, record sheet and manual. The experts evaluated all components of the PAQVUA by completing a questionnaire designed to establish the relevance, clarity and precision of each assessment task, as well as the content of the manual. The study also sought to establish the inter- and intra-rater reliability of the PAQVUA. To that end, the protocol was applied twice by two different raters, and at two different points in time by a single rater. The sample comprised twelve participants. The content validity results were analysed statistically using the modified Bland-Altman graphical method and the Intraclass Correlation Coefficient (ICC). Inter- and intra-rater reliability were analysed using Spearman's correlation coefficient (ρ) and Cohen's kappa (κ). With the modified Bland-Altman method, inspection of the plots showed agreement among the experts, with most points falling within the expected limits of agreement. The ICC values (between 0.379 and 0.479) indicated fair correlation. 
For inter-rater reliability, the correlation and agreement results were meaningful, with Spearman's ρ above 0.700 and Cohen's κ above 0.600 in most cases. The same was found for intra-rater reliability. It can therefore be concluded that the new version of the PAQVUA has content validity, as shown by the modified Bland-Altman method and the ICC values. As for inter- and intra-rater reliability, the PAQVUA can be said to be reliable; however, because the study sample is small and therefore not very representative, this conclusion needs to be substantiated by a larger study. Despite this limitation, the protocol is believed to be an asset for the study of voice pathology in clinical settings, as well as for scientific research in this area, because the PAQVUA makes it possible to collect a range of voice-related information that is useful for evidence-based therapeutic intervention.
Abstract:
Aim: In the current climate of medical education, there is an ever-increasing demand for and emphasis on simulation as both a teaching and training tool. The objective of our study was to compare the realism and practicality of a number of artificial blood products that could be used for high-fidelity simulation. Method: A literature and internet search was performed and 15 artificial blood products were identified from a variety of sources. One product was excluded due to its potential toxicity risks. Five observers, blinded to the products, performed two assessments on each product using an evaluation tool with 14 predefined criteria including color, consistency, clotting, and staining potential to manikin skin and clothing. Each criterion was rated using a five-point Likert scale. The products were left for 24 hours, both refrigerated and at room temperature, and then reassessed. Statistical analysis was performed to identify the most suitable products, and both inter- and intra-rater variability were examined. Results: Three products scored consistently well with all five assessors, with one product in particular scoring well in almost every criterion. This highest-rated product had a mean rating of 3.6 of 5.0 (95% posterior interval 3.4-3.7). Inter-rater variability was minor, with average ratings varying from 3.0 to 3.4 between the lowest and highest scorer. Intra-rater variability was negligible, with good agreement between first and second rating as per weighted kappa scores (κ = 0.67). Conclusion: The most realistic and practical form of artificial blood identified was a commercial product called KD151 Flowing Blood Syrup. It was found to be not only realistic in appearance but practical in terms of storage and stain removal.
Abstract:
Dehydration has been associated with increased morbidity and mortality. Dehydration risk increases with advancing age, and will become an increasing issue as the older population grows. Worldwide, those aged 60 years and over are the fastest growing segment of the population. The study aimed to develop a clinically practical means to identify dehydration amongst older people in the clinical care setting. Older people aged 60 years or over admitted to the Geriatric and Rehabilitation Unit (GARU) of two tertiary teaching hospitals were eligible for participation in the study. Ninety potential screening questions and 38 clinical parameters were initially tested on a single sample (n=33), with the most promising 11 parameters selected to undergo further testing in an independent group (n=86). Of the almost 130 variables explored, tongue dryness was most strongly associated with poor hydration status, demonstrating 64% sensitivity and 62% specificity within the study participants. The result was not confounded by age, gender or body mass index. With minimal training, inter-rater repeatability was over 90%. This study identified tongue dryness as a potentially practical tool to identify dehydration risk amongst older people in the clinical care setting. Further studies to validate the potential screen in larger and varied populations of older people are required.
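The sensitivity and specificity figures quoted above follow from a 2×2 comparison of the screening sign against a reference classification of hydration status. A minimal sketch of that arithmetic is shown below. The function name and the boolean input layout are assumptions made for illustration, and no study data are reproduced.

```python
def sensitivity_specificity(test_positive, condition_present):
    """Return (sensitivity, specificity) from paired boolean sequences.

    test_positive[i] is True when the screening sign (e.g. a dry tongue) is present;
    condition_present[i] is True when the reference standard indicates the condition.
    """
    if len(test_positive) != len(condition_present):
        raise ValueError("inputs must have equal length")
    tp = fn = tn = fp = 0
    for test, condition in zip(test_positive, condition_present):
        if condition and test:
            tp += 1  # true positive
        elif condition and not test:
            fn += 1  # false negative
        elif not condition and not test:
            tn += 1  # true negative
        else:
            fp += 1  # false positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the share of participants with the condition whom the sign detects, and specificity is the share of participants without it whom the sign correctly clears.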
Abstract:
Preterm infants commence breastfeeding when health-care professionals deem them to be ready. However, the optimal timing for commencement of breastfeeding is unclear. Currently, there is little guidance for neonatal care providers to decide when to initiate breastfeeding among preterm infants. A mixed-methods study was conducted to develop and test the Preterm Sucking Readiness (PTSR) scale in four phases. The first phase involved a chart audit to explore the use of age as a criterion by investigating when preterm infants meet feeding milestones as well as other factors that may affect an infant’s readiness to engage in nutritive sucking behaviour. The second phase used focus groups to explore and define how neonatal care providers decide when to commence breastfeeding. To gain consensus on the criteria mentioned by the focus groups, a Delphi survey was conducted in phase 3, involving neonatal providers across Australia and New Zealand. Phase 4 was an observational study used to test the six-item PTSR. The age at which specific feeding milestones were reached was consistent with what has been previously described in the literature. The chart audit showed that the time taken to the first feeding attempt in the preterm infant population was affected by gestational age at birth, birth weight, and specific interventions. Staff also considered age along with other criteria when deciding when to initiate feeding. Consensus on nine criteria for inclusion into the six-item PTSR was achieved using the Delphi technique. Three items of the PTSR showed significant differences between the preterm and full-term infant groups. Only two items, feeding-readiness behaviour and low pulse oximetry during handling, explained the variance in breastfeeding behaviour. The inter-rater variability ranged between moderate and very good for the PTSR items. 
The results of this study indicate the importance of assessing behavioural cues as an indication of breastfeeding readiness in the preterm infant population, once an infant is deemed physiologically stable. Age continues to be a factor in some clinicians' decisions to commence breastfeeding. However, age alone cannot be used to decide if an infant is ready to engage in breastfeeding. Further research is needed to confirm these findings.
Abstract:
Background. Vertebral rotation found in structural scoliosis contributes to truncal asymmetry, which is commonly measured with a simple Scoliometer device on a patient's thorax in the forward flexed position. The new generation of mobile 'smartphones' has an integrated accelerometer, making accurate angle measurement possible, which provides a potentially useful clinical tool for assessing rib hump deformity. This study aimed to compare rib hump angle measurements performed using a smartphone and a traditional Scoliometer on a set of plaster torsos representing the range of torsional deformities seen in clinical practice. Methods. Nine observers measured the rib hump found on eight plaster torsos moulded from scoliosis patients with both a Scoliometer and an Apple iPhone on separate occasions. Each observer repeated the measurements at least a week after the original measurements, and was blinded to previous results. Intra-observer reliability and inter-observer reliability were analysed using the method of Bland and Altman, and 95% confidence intervals were calculated. The Intra-Class Correlation Coefficients (ICC) were calculated for repeated measurements of each of the eight plaster torso moulds by the nine observers. Results. Mean absolute difference between pairs of iPhone/Scoliometer measurements was 2.1 degrees, with a small (1 degree) bias toward higher rib hump angles with the iPhone. 95% confidence intervals for intra-observer variability were +/- 1.8 degrees (Scoliometer) and +/- 3.2 degrees (iPhone). 95% confidence intervals for inter-observer variability were +/- 4.9 degrees (iPhone) and +/- 3.8 degrees (Scoliometer). The measurement errors and confidence intervals found were similar to or better than the range of previously published thoracic rib hump measurement studies. Conclusions. The iPhone is a clinically equivalent rib hump measurement tool to the Scoliometer in spinal deformity patients. 
The novel use of plaster torsos as rib hump models avoids the variables of patient fatigue and discomfort, inconsistent positioning and deformity progression using human subjects in a single or multiple measurement sessions.
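The abstract above analyses agreement with the method of Bland and Altman. As a minimal sketch of that method, the code below computes the mean difference (bias) between paired measurements and the usual 95% limits of agreement, bias ± 1.96 × SD of the differences. Whether the study's reported "95% confidence intervals" correspond exactly to these limits is an assumption, and the function name is hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the pairwise differences (a - b); the limits are
    bias -/+ 1.96 times the sample standard deviation of those differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A positive bias when `method_a` holds the iPhone angles and `method_b` the Scoliometer angles would correspond to the direction of bias described in the abstract, with the iPhone reading slightly higher.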
Abstract:
Nontechnical skills relating to team functioning are vital to the effective delivery of patient care and safety. In this study, we develop a reliable behavioral marker tool for assessing nontechnical skills that are critical to the success of ward-based multidisciplinary healthcare teams. The Team Functioning Assessment Tool (TFAT) was developed and refined using a literature review, focus groups, card-sorting exercise, field observations, and final questionnaire evaluation and refinement process. Results demonstrated that Clinical Planning, Executive Tasks, and Team Relations are important facets of effective multidisciplinary healthcare team functioning. The TFAT was also shown to yield acceptable inter-rater agreement.