817 results for Error of measurement
Abstract:
Background: For summative examinations, a minimum reliability of 0.8 is usually required; for practical examinations such as OSCEs, 0.7 is sometimes accepted (Downing 2004). But what does the precision of a measurement with a reliability of 0.7 or 0.8 actually mean? Method: Using statistical methods such as the standard error of measurement or generalizability theory, reliability can be translated into a confidence interval around an observed candidate score (Brennan 2003, Harvill 1991, McManus 2012). If a candidate scored 57 points on an examination, for example, his or her true performance varies around this value because of the measurement error of the examination (e.g., between 50 and 64 points). Measurement precision is particularly important near the pass mark. If the pass mark in our example were 60 points, the candidate with 57 points would formally have failed, but given the margin of error around the measured score he or she might in truth have narrowly passed. Applying this reasoning to all candidates in an examination, the number of borderline candidates can be determined, i.e., all candidates whose results lie so close to the pass mark that their individual outcome could be a false positive or a false negative. Results: The number of borderline candidates in an examination depends not only on reliability but also on candidate performance, the variance, the distance of the pass mark from the mean, and the skewness of the distribution. Using model data and real examination data, the relationship between reliability and the number of borderline candidates is presented in a way that is accessible to non-statisticians. It is shown why even a reliability of 0.8 will not provide satisfactory measurement precision in certain situations, whereas in some OSCEs reliability can almost be ignored. Conclusions: Calculating or estimating the number of borderline candidates instead of reliability improves the understanding of an examination's precision in an intuitive way. When deciding how many stations a summative OSCE requires or how long an MC examination should be, borderline candidates are a more valid decision criterion than reliability.
References: Brennan, R.L. (2003) Generalizability Theory. New York: Springer. Downing, S.M. (2004) 'Reliability: on the reproducibility of assessment data', Medical Education, 38, 1006-12. Harvill, L.M. (1991) 'Standard Error of Measurement', Educational Measurement: Issues and Practice, 33-41. McManus, I.C. (2012) 'The misinterpretation of the standard error of measurement in medical education: A primer on the problems, pitfalls and peculiarities of the three different standard errors of measurement', Medical Teacher, 34, 569-76.
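As a concrete illustration of the relationship described above, the following minimal Python sketch (not taken from the abstract; the scores, reliability and pass mark are invented) translates a reliability coefficient into a standard error of measurement, a confidence interval around each observed score, and a count of borderline candidates whose pass/fail decision could be wrong.

```python
import numpy as np

def borderline_candidates(scores, reliability, cut_score, z=1.96):
    """Count candidates whose true score could plausibly lie on the
    other side of the pass mark, given the test's reliability.

    SEM = SD * sqrt(1 - reliability); the confidence interval around an
    observed score x is x +/- z * SEM.
    """
    scores = np.asarray(scores, dtype=float)
    sem = scores.std(ddof=1) * np.sqrt(1.0 - reliability)
    lower, upper = scores - z * sem, scores + z * sem
    # Borderline: the interval straddles the cut score, so the pass/fail
    # decision could be a false positive or a false negative.
    borderline = (lower < cut_score) & (upper >= cut_score)
    return sem, int(borderline.sum())

# Illustrative data: 200 simulated exam scores, pass mark 60 (as in the example above).
rng = np.random.default_rng(0)
scores = rng.normal(loc=70, scale=9, size=200)
sem, n_borderline = borderline_candidates(scores, reliability=0.8, cut_score=60)
print(f"SEM = {sem:.1f} points, borderline candidates = {n_borderline}")
```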
Abstract:
Introduction: Clinical reasoning is essential for the practice of medicine. Theories of the development of medical expertise hold that clinical reasoning starts from analytical processes, namely the storage of isolated facts and the logical application of the 'rules' of diagnosis. Learners then successively develop so-called semantic networks and illness scripts, which are finally used in an intuitive, non-analytic fashion [1], [2]. The script concordance test (SCT) is one example of an instrument for assessing clinical reasoning [3]. However, the aggregate scoring [3] of the SCT is recognized as problematic [4]: it leads to logical inconsistencies and is likely to reflect construct-irrelevant differences in examinees' response styles [4], and the expert panel judgments may introduce an unintended error of measurement [4]. This PhD project addresses the following research questions: 1. What would a format look like that assesses clinical reasoning (similarly to the SCT) with multiple true-false questions or other formats with unambiguous correct answers, thereby addressing the above-mentioned pitfalls of traditional SCT scoring? 2. How well does this format fulfil the Ottawa criteria for good assessment, with special regard to educational and catalytic effects [5]? Methods: 1. A first study will assess whether designing a new format that uses multiple true-false items to assess clinical reasoning, similar to the SCT format, is theoretically and practically sound. For this study, focus groups or interviews with assessment experts and students will be undertaken. 2. In a study using focus groups and psychometric data, Norcini and colleagues' criteria for good assessment [5] will be evaluated for the new format in a real assessment. Furthermore, the scoring method for this new format will be optimized using real and simulated data.
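For readers unfamiliar with the scoring problem mentioned above, here is a small illustrative Python sketch (panel votes and answers are made up) contrasting the conventional SCT aggregate scoring, in which credit is proportional to the share of panel experts choosing an answer, with a dichotomous true/false score of the kind the project proposes.

```python
from collections import Counter

def sct_aggregate_score(panel_answers, examinee_answer):
    """Classic SCT aggregate scoring: credit proportional to the share of
    panel experts who chose the examinee's answer, normalised by the modal answer."""
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return counts.get(examinee_answer, 0) / modal_count

def true_false_score(correct_answer, examinee_answer):
    """Dichotomous scoring with a single unambiguous key, as in a
    multiple true-false variant."""
    return 1.0 if examinee_answer == correct_answer else 0.0

# Illustrative item: 10 panelists rate a diagnostic hypothesis on a -2..+2 scale.
panel = [1, 1, 2, 1, 0, 1, 2, 1, 1, -1]
print(sct_aggregate_score(panel, examinee_answer=2))          # partial credit: 2/6
print(true_false_score(correct_answer=1, examinee_answer=2))  # 0.0
```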
Abstract:
The Spanish National Statistical Institute (Instituto Nacional de Estadística) has decided to compile a new Migration Statistics series based on the Residential Variation Statistics. This article offers some arguments in support of that decision, given the persistent lack of coherence among the sources of the Spanish statistical system for capturing mobility. In particular, it examines the problems of underestimation and internal inconsistency of the Labour Force Survey (Encuesta de Población Activa) as a measure of immigration, in view of the differences among the three series of international immigration flows that can be estimated from it.
Abstract:
Study Design. Survey of intraobserver and interobserver measurement variability. Objective. To assess the use of reformatted computerized tomography (CT) images for manual measurement of coronal Cobb angles in idiopathic scoliosis. Summary of Background Data. Cobb angle measurements in idiopathic scoliosis are traditionally made from standing radiographs, whereas CT is often used for assessment of vertebral rotation. Correlating Cobb angles from standing radiographs with vertebral rotations from supine CT is problematic because the geometry of the spine changes significantly from the standing to the supine position, and two different imaging methods are involved. Methods. We assessed the use of reformatted thoracolumbar CT images for Cobb angle measurement. Preoperative CT scans of 12 patients with idiopathic scoliosis were used to generate reformatted coronal images. Five observers measured coronal Cobb angles on 3 occasions from each of the images. Intraobserver and interobserver variability associated with Cobb measurement from reformatted CT scans was assessed and compared with previous studies of measurement variability using plain radiographs. Results. For major curves, 95% confidence intervals for intraobserver and interobserver variability were ±6.6° and ±7.7°, respectively. For minor curves, the intervals were ±7.5° and ±8.2°, respectively. The intraobserver and interobserver technical errors of measurement were 2.4° and 2.7°, with reliability coefficients of 88% and 84%, respectively. There was no correlation between measurement variability and curve severity. Conclusions. Reformatted CT images may be used for manual measurement of coronal Cobb angles in idiopathic scoliosis with variability similar to that of manual measurement from plain radiographs.
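A minimal sketch of how the technical error of measurement and the associated reliability coefficient reported above can be computed from duplicate measurements; the formulas are the standard ones (TEM = sqrt(sum d² / 2n), R = 1 − TEM²/SD²) and the angle data are simulated for illustration.

```python
import numpy as np

def technical_error_of_measurement(m1, m2):
    """Intraobserver TEM for paired (repeat) Cobb angle measurements:
    TEM = sqrt(sum(d_i^2) / (2n)), with d_i the difference between the
    two measurements of subject i."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    d = m1 - m2
    tem = np.sqrt(np.sum(d**2) / (2 * len(d)))
    # Reliability coefficient R = 1 - TEM^2 / SD^2, with SD the total
    # standard deviation of all measurements.
    sd = np.std(np.concatenate([m1, m2]), ddof=1)
    reliability = 1.0 - tem**2 / sd**2
    return tem, reliability

# Illustrative repeat measurements (degrees) for 12 curves.
rng = np.random.default_rng(1)
true_angles = rng.uniform(20, 70, 12)
obs1 = true_angles + rng.normal(0, 2.5, 12)
obs2 = true_angles + rng.normal(0, 2.5, 12)
tem, rel = technical_error_of_measurement(obs1, obs2)
print(f"TEM = {tem:.1f} deg, reliability = {rel:.2f}")
```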
Abstract:
Studies suggest that enjoyment, perceived benefits and perceived barriers may be important mediators of physical activity. However, the psychometric properties of these scales have not been assessed using Rasch modeling. The purpose of this study was to use Rasch modeling to evaluate the properties of three scales commonly used in physical activity studies: the Physical Activity Enjoyment Scale, the Benefits of Physical Activity Scale and the Barriers to Physical Activity Scale. The scales were administered to 378 healthy adults, aged 25–75 years (50% women, 62% Whites), at the baseline assessment for a lifestyle physical activity intervention trial. The ConQuest software was used to assess model fit, item difficulty, item functioning and standard error of measurement. For all scales, the partial credit model fit the data. Item content of one scale did not adequately cover all respondents. Response options of each scale were not targeting respondents appropriately, and standard error of measurement varied across the total score continuum of each scale. These findings indicate that each scale's effectiveness at detecting differences among individuals may be limited unless changes in scale content and response format are made.
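The link between item targeting and the standard error of measurement noted above can be illustrated with a dichotomous Rasch sketch (the partial credit model used in the study follows the same principle); item difficulties and ability values below are invented.

```python
import numpy as np

def rasch_sem(theta, item_difficulties):
    """Conditional standard error of measurement under a dichotomous
    Rasch model: SEM(theta) = 1 / sqrt(test information at theta)."""
    b = np.asarray(item_difficulties, float)
    p = 1.0 / (1.0 + np.exp(-(theta - b)))   # item response probabilities
    information = np.sum(p * (1.0 - p))      # test information at theta
    return 1.0 / np.sqrt(information)

# Illustrative 20-item scale with difficulties clustered around 0 logits.
items = np.linspace(-1.5, 1.5, 20)
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(rasch_sem(theta, items), 2))
# SEM is smallest where items target respondents and grows toward the extremes,
# which is the pattern the abstract describes.
```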
Abstract:
Left ventricular ejection fraction is an excellent marker of cardiac function. Several techniques, invasive or not, are used to calculate it: angiography, echocardiography, cardiac magnetic resonance imaging, cardiac CT, radionuclide ventriculography, and myocardial perfusion imaging in nuclear medicine. More than 40 years of scientific publications have praised radionuclide ventriculography for its speed of execution, availability, low cost, and intra-observer and inter-observer reproducibility. Left ventricular ejection fraction was calculated twice in 47 patients, by two technologists, on two separate acquisitions, using three methods: manual, automatic and semi-automatic. Overall, the automatic and semi-automatic methods showed better reproducibility, a smaller standard error of measurement, and a smaller minimal detectable difference. The manual method yielded a result that was systematically and significantly lower than the other two methods, and it was the only technique showing a significant difference in the intra-observer analysis. Its standard error of measurement was 40 to 50% larger than with the other techniques, as was its minimal detectable difference. Although all three methods are excellent, reproducible techniques for evaluating left ventricular ejection fraction, the reliability estimates of the automatic and semi-automatic methods are superior to those of the manual method.
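A brief sketch, under assumed numbers, of how a reliability coefficient (ICC) translates into the standard error of measurement and the minimal detectable difference discussed above; the ejection-fraction data and ICC value are illustrative only.

```python
import numpy as np

def sem_and_mdd(measurements_1, measurements_2, icc):
    """Standard error of measurement and minimal detectable difference
    (95%) from paired LVEF measurements and a reliability coefficient:
    SEM = SD * sqrt(1 - ICC), MDD95 = 1.96 * sqrt(2) * SEM."""
    all_values = np.concatenate([measurements_1, measurements_2]).astype(float)
    sd = all_values.std(ddof=1)
    sem = sd * np.sqrt(1.0 - icc)
    mdd95 = 1.96 * np.sqrt(2.0) * sem
    return sem, mdd95

# Illustrative ejection fractions (%) measured twice in the same 47 patients.
rng = np.random.default_rng(2)
true_ef = rng.normal(55, 10, 47)
run1 = true_ef + rng.normal(0, 2, 47)
run2 = true_ef + rng.normal(0, 2, 47)
print(sem_and_mdd(run1, run2, icc=0.96))
```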
Abstract:
A new method to measure reciprocal four-port structures, using a 16-term error model, is presented. The measurement is based on 5 two-port calibration standards connected to two of the ports, while the network analyzer is connected to the two remaining ports. Least-squares-fit data reduction techniques are used to lower error sensitivity. The effect of connectors is deembedded using closed-form equations. (C) 2007 Wiley Periodicals, Inc.
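The abstract does not give the 16-term formulation, so the following is only a generic sketch of the least-squares data-reduction step it mentions: with redundant measurements from the calibration standards, the unknown error terms of a linearised model are estimated by least squares, which lowers sensitivity to measurement noise. Matrix sizes and data are invented.

```python
import numpy as np

# Generic sketch of least-squares data reduction: with more calibration
# measurements than unknowns, the error terms x of a linearised model
# A @ x = b are estimated by least squares, reducing the influence of
# noise in the measured data b. A, x and b here are purely illustrative.
rng = np.random.default_rng(3)
n_terms = 16                     # unknown error-model terms
n_measurements = 40              # redundant measurements from the standards
A = rng.normal(size=(n_measurements, n_terms))            # model matrix
x_true = rng.normal(size=n_terms)
b = A @ x_true + rng.normal(scale=0.01, size=n_measurements)  # noisy data

x_hat, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print("max |error| in recovered terms:", np.max(np.abs(x_hat - x_true)))
```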
Abstract:
A growing literature considers the impact of uncertainty using SVAR models that include proxies for uncertainty shocks as endogenous variables. In this paper we consider the impact of measurement error in these proxies on the estimated impulse responses. We show via a Monte-Carlo experiment that measurement error can result in attenuation bias in impulse responses. In contrast, the proxy SVAR that uses the uncertainty shock proxy as an instrument does not suffer from this bias. Applying this latter method to the Bloom (2009) data-set results in impulse responses to uncertainty shocks that are larger in magnitude and more persistent than those obtained from a recursive SVAR.
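A toy Monte-Carlo sketch of the mechanism described above (not the paper's SVAR setup): regressing on a noisy proxy of the true shock attenuates the estimated response, whereas using a second noisy proxy as an instrument recovers it.

```python
import numpy as np

# Attenuation bias vs. instrumental-variable correction in a toy setting.
rng = np.random.default_rng(4)
beta, n, reps = 1.0, 500, 2000
ols_est, iv_est = [], []
for _ in range(reps):
    shock = rng.normal(size=n)                           # true uncertainty shock
    proxy = shock + rng.normal(scale=1.0, size=n)        # mismeasured proxy
    instrument = shock + rng.normal(scale=1.0, size=n)   # second noisy proxy
    y = beta * shock + rng.normal(scale=0.5, size=n)
    ols_est.append(np.cov(proxy, y)[0, 1] / np.var(proxy))
    iv_est.append(np.cov(instrument, y)[0, 1] / np.cov(instrument, proxy)[0, 1])
print("OLS mean (attenuated):", round(np.mean(ols_est), 2))   # ~0.5
print("IV mean  (unbiased):  ", round(np.mean(iv_est), 2))    # ~1.0
```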
Abstract:
If change over time is compared across several groups, it is important to take baseline values into account so that the comparison is carried out under the same preconditions. Because the observed baseline measurements are distorted by measurement error, it may not be sufficient to include them as a covariate. By fitting a longitudinal mixed-effects model to all data, including the baseline observations, and subsequently calculating the expected change conditional on the underlying baseline value, a solution to this problem has recently been provided so that groups with the same baseline characteristics can be compared. In this article, we present an extended approach in which a broader set of models can be used. Specifically, it is possible to include any desired set of interactions between the time variable and the other covariates, and time-dependent covariates can also be included. Additionally, we extend the method to adjust for baseline measurement error in other time-varying covariates. We apply the methodology to data from the Swiss HIV Cohort Study to address the question of whether co-infection with HIV-1 and hepatitis C virus leads to a slower increase in CD4 lymphocyte counts over time after the start of antiretroviral therapy.
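A hedged sketch of the modelling idea, not the authors' exact estimator: fit a longitudinal mixed-effects model to all observations, including the baseline, with a time-by-group interaction, rather than adjusting for the error-prone observed baseline as a covariate. Variable names and data are invented; the sketch uses statsmodels' MixedLM.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated longitudinal CD4 counts with measurement error; group = 1
# indicates (hypothetically) HIV/HCV co-infection.
rng = np.random.default_rng(5)
n, times = 120, np.array([0.0, 0.5, 1.0, 1.5, 2.0])
subject = np.repeat(np.arange(n), len(times))
group = np.repeat(rng.integers(0, 2, n), len(times))
time = np.tile(times, n)
true_baseline = np.repeat(rng.normal(350, 80, n), len(times))
cd4 = (true_baseline + (60 - 25 * group) * time        # slower rise if co-infected
       + rng.normal(0, 40, n * len(times)))            # measurement error
data = pd.DataFrame(dict(id=subject, group=group, time=time, cd4=cd4))

# Mixed-effects model fitted to all data, baseline included.
model = smf.mixedlm("cd4 ~ time * group", data, groups=data["id"])
result = model.fit()
print(result.params[["time", "time:group"]])  # slope and group difference in slope
```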
Abstract:
The reliability of measurement refers to unsystematic error in observed responses. Investigations of the prevalence of random error in stated estimates of willingness to pay (WTP) are important to an understanding of why tests of validity in contingent valuation (CV) can fail. However, published reliability studies have tended to adopt empirical methods that have practical and conceptual limitations when applied to WTP responses. This contention is supported by a review of contingent valuation reliability studies that demonstrates important limitations of existing approaches to WTP reliability. It is argued that empirical assessments of the reliability of contingent values may be better dealt with by using multiple indicators to measure the latent WTP distribution. This latent variable approach is demonstrated with data obtained from a WTP study for stormwater pollution abatement. Attitude variables were employed as a way of assessing the reliability of open-ended WTP (with benchmarked payment cards) for stormwater pollution abatement. The results indicated that participants' decisions to pay were reliably measured, but not the magnitude of the WTP bids. This finding highlights the need to better discern what is actually being measured in WTP studies. (C) 2003 Elsevier B.V. All rights reserved.
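The multiple-indicator idea can be illustrated with a small sketch (not the paper's latent variable model): several attitude indicators assumed to reflect the same latent disposition to pay are checked for internal consistency with Cronbach's alpha, a simple reliability measure. Data are simulated.

```python
import numpy as np

def cronbach_alpha(items):
    """Internal-consistency reliability of a set of indicator variables
    (rows = respondents, columns = items)."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Illustrative data: 300 respondents, 4 attitude indicators assumed to
# reflect the same latent disposition to pay for stormwater abatement.
rng = np.random.default_rng(6)
latent = rng.normal(size=300)
indicators = latent[:, None] + rng.normal(scale=0.8, size=(300, 4))
print(round(cronbach_alpha(indicators), 2))
```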
Abstract:
Understanding the relationship between diet, physical activity and health in humans requires accurate measurement of body composition and daily energy expenditure. Stable isotopes provide a means of measuring total body water (TBW) and daily energy expenditure under free-living conditions. While the use of isotope ratio mass spectrometry (IRMS) for the analysis of 2H (deuterium) and 18O (oxygen-18) is well established in the field of human energy metabolism research, numerous questions remain regarding the factors that influence analytical and measurement error with this methodology. This thesis comprised four studies with the following emphases. The aim of Study 1 was to determine the analytical and measurement error of the IRMS with regard to sample handling under certain conditions. Study 2 compared TEE (total daily energy expenditure) computed with two commonly employed equations; in addition, saliva and urine samples collected at different times were used to determine whether clinically significant differences would occur. Study 3 was undertaken to determine the appropriate collection times for TBW estimates and the body composition values derived from them. Finally, Study 4 was a single case study investigating whether TEE measures are affected when the human condition changes due to altered exercise and water intake. The aim of Study 1 was to validate laboratory approaches to measuring isotopic enrichment to ensure accurate (to international standards), precise (reproducible across three replicate samples) and linear (isotope ratio constant over the expected concentration range) results. This established the machine variability of the IRMS equipment in use at Queensland University for both TBW and TEE. Using either 0.4 mL or 0.5 mL sample volumes for both oxygen-18 and deuterium was statistically acceptable (p>0.05) and showed a within-analysis variance of 5.8 δ VSMOW units for deuterium and 0.41 δ VSMOW units for oxygen-18. This variance was used as "within-analytical noise" to determine sample deviations. It was also found that there was no influence of equilibration time on oxygen-18 or deuterium values when comparing the minimum (oxygen-18: 24 hr; deuterium: 3 days) and maximum (oxygen-18 and deuterium: 14 days) equilibration times. With regard to preparation using the vacuum line, any order of preparation is suitable, as the TEE values fall within 8% of each other regardless of preparation order; an 8% variation is acceptable for TEE values due to biological and technical errors (Schoeller, 1988). However, for the automated line, deuterium must be assessed first, followed by oxygen-18, because the automated line does not evacuate tubes but merely refills them with an injection of gas for a predetermined time; any fractionation (which may occur for both isotopes) would cause a slight elevation in the values and hence a lower TEE. The purpose of the second and third studies was to investigate the use of IRMS to measure TEE and TBW and to validate the current IRMS practices in use with regard to sample collection times for urine and saliva, the use of two TEE equations from different research centres, and the body composition values derived from these TEE and TBW values. Following the collection of a fasting baseline urine and saliva sample, 10 people (8 women, 2 men) were dosed with a doubly labeled water dose comprising 1.25 g of 10% oxygen-18 and 0.1 g of 100% deuterium per kg body weight.
The samples were collected hourly for 12 hours on the first day, and then morning, midday and evening samples were collected for the next 14 days. The samples were analyzed using an isotope ratio mass spectrometer. For TBW, time to equilibration was determined using three commonly employed data analysis approaches. Isotopic equilibration was reached in 90% of the sample by hour 6 and in 100% of the sample by hour 7. With regard to the TBW estimations, the optimal time for urine collection was found to be between hours 4 and 10, as there was no significant difference among values in this window. In contrast, statistically significant differences in TBW estimation were found between hours 1-3 and hours 11-12 when compared with hours 4-10. Most of the individuals in this study were in equilibrium after 7 hours. The TEE equations of Prof. Dale Schoeller (Chicago, USA, IAEA) and Prof. K. Westerterp were compared with that of Prof. Andrew Coward (Dunn Nutrition Centre). When comparing values derived from samples collected in the morning and evening, there was no effect of time or equation on the resulting TEE values. The fourth study was a pilot study (n=1) to test the variability in TEE resulting from manipulations of fluid consumption and level of physical activity, of the magnitude of change that may be expected in a sedentary adult. Physical activity levels were manipulated by increasing the number of steps per day to mimic the increases that may result when a sedentary individual commences an activity program. The study comprised three sub-studies completed on the same individual over a period of 8 months. There were no significant changes in TBW across the studies, even though the elimination rates changed with the supplemented water intake and additional physical activity. The extra activity may not have been sufficiently strenuous, nor the water intake high enough, to cause a significant change in TBW and hence in the CO2 production and TEE values. The measured TEE values showed good agreement with estimated values calculated from an RMR of 1455 kcal/day, a DIT of 10% of TEE and activity based on measured steps. The covariance values tracked by plotting the residuals were found to be representative of "well-behaved" data and are indicative of the analytical accuracy. The ratio and product plots were found to reflect water turnover and CO2 production and thus could, with further investigation, be employed to identify changes in physical activity.
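The principle behind the doubly labeled water calculation of TEE can be sketched as follows (a deliberately simplified illustration: deuterium leaves the body only as water, oxygen-18 as water and CO2, so the difference in their turnover reflects CO2 production). The published equations compared in the thesis (e.g. Schoeller, 1988) add fractionation and pool-size corrections omitted here, and all numbers below are invented.

```python
import numpy as np

def elimination_rate(enrichment_start, enrichment_end, days):
    """Isotope elimination rate k (per day) from two post-dose enrichments,
    assuming simple exponential washout."""
    return (np.log(enrichment_start) - np.log(enrichment_end)) / days

# Illustrative dilution spaces (mol) and two-point enrichments (excess units).
N_O, N_D = 2330.0, 2400.0
k_O = elimination_rate(150.0, 28.0, 14.0)   # oxygen-18
k_D = elimination_rate(900.0, 222.0, 14.0)  # deuterium

# Simplified principle: oxygen leaves as water and CO2 (2 O per molecule),
# deuterium only as water, so rCO2 = (kO*NO - kD*ND) / 2.
rCO2_mol_per_day = (k_O * N_O - k_D * N_D) / 2.0
rCO2_L_per_day = rCO2_mol_per_day * 22.4          # litres at STP (approximation)

# Weir equation with an assumed respiratory quotient of 0.85.
tee_kcal_per_day = rCO2_L_per_day * (3.941 / 0.85 + 1.106)
print(round(k_O, 3), round(k_D, 3), round(tee_kcal_per_day))
```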
Abstract:
Bayesian networks (BNs) are graphical probabilistic models used for reasoning under uncertainty. These models are becoming increasingly popular in a range of fields including ecology, computational biology, medical diagnosis, and forensics. In most of these cases, the BNs are quantified using information from experts or from user opinions, so an interest lies in the way in which multiple opinions can be represented and used in a BN. This paper proposes the use of a measurement error model to combine opinions for the quantification of a BN. The multiple opinions are treated as a realisation of measurement error, and the model uses the posterior probabilities ascribed to each node in the BN, which are computed from the prior information given by each expert. The proposed model addresses the issues associated with current methods of combining opinions, such as the absence of a coherent probability model, the failure to maintain the conditional independence structure of the BN, and the provision of only a point estimate for the consensus. The proposed model was applied to an existing Bayesian network and performed well when compared to existing methods of combining opinions.
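A toy sketch of the general idea of treating expert opinions as noisy measurements (this is an illustration only, not the model proposed in the paper): each expert's stated probability is taken as a noisy observation of an underlying logit, combined with a normal prior by precision weighting.

```python
import numpy as np

def combine_expert_probabilities(probs, obs_sd=0.5, prior_mean=0.0, prior_sd=2.0):
    """Toy measurement-error combination of expert opinions: each expert's
    stated probability is treated as a noisy observation (on the logit scale)
    of an underlying true logit, with a normal prior. Returns the posterior
    mean probability."""
    probs = np.asarray(probs, float)
    logits = np.log(probs / (1.0 - probs))
    prior_prec = 1.0 / prior_sd**2
    obs_prec = 1.0 / obs_sd**2
    post_prec = prior_prec + obs_prec * len(logits)
    post_mean_logit = (prior_prec * prior_mean + obs_prec * logits.sum()) / post_prec
    return 1.0 / (1.0 + np.exp(-post_mean_logit))

# Three experts give different probabilities for the same BN node state.
print(round(combine_expert_probabilities([0.6, 0.75, 0.9]), 2))
```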
Abstract:
In this study, we assessed a broad range of barley breeding lines and commercial varieties, grown at multiple sites, using three hardness methods (two particle size methods and one crush resistance method, the Single-Kernel Characterization System, SKCS) to see whether there was variation in barley hardness and whether that variation was genetically or environmentally controlled. We also developed near-infrared reflectance (NIR) calibrations for these three hardness methods to ascertain whether NIR technology is suitable for rapid screening of breeding lines or specific populations. In addition, we used these data to identify genetic regions that may be associated with hardness. There were significant (p<0.05) genetic effects for the three hardness methods. There were also environmental effects, possibly linked to the effect of protein on hardness, i.e. increasing protein resulted in harder grain. Heritability values were calculated at >85% for all methods. The NIR calibrations, with R2 values of >90%, had standard error of prediction values of 0.90, 72 and 4.0, respectively, for the three hardness methods. These equations were used to predict hardness values of a mapping population, which resulted in genetic markers being identified on all chromosomes, although only chromosomes 2H, 3H, 5H, 6H and 7H had markers with significant LOD scores. The two regions on 5H were at the distal ends of the long and short arms; the region with a significant LOD score was on the long arm, whereas the region on the short arm associated with the hardness (hordoindoline) genes did not have significant LOD scores. The results indicate that barley hardness is influenced by both genotype and environment and that the trait is heritable, which would allow breeders to develop very hard or soft varieties if required. In addition, NIR was shown to be a reliable tool for screening for hardness. While the data set used in this study has relatively low variation in hardness, the tools developed could be applied to breeding populations with large variation in barley grain hardness.
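A short sketch of how the standard error of prediction and R² reported for the NIR calibrations are typically computed from predicted versus reference hardness values; the validation data below are simulated.

```python
import numpy as np

def sep_and_r2(reference, predicted):
    """Standard error of prediction (bias-corrected RMSE of NIR predictions
    against the reference hardness method) and the squared correlation R^2."""
    reference = np.asarray(reference, float)
    predicted = np.asarray(predicted, float)
    residuals = predicted - reference
    bias = residuals.mean()
    sep = np.sqrt(np.sum((residuals - bias) ** 2) / (len(residuals) - 1))
    r2 = np.corrcoef(reference, predicted)[0, 1] ** 2
    return sep, r2

# Illustrative validation set: SKCS-style hardness index for 60 lines.
rng = np.random.default_rng(7)
ref = rng.normal(60, 12, 60)
pred = ref + rng.normal(0, 3.5, 60)   # NIR prediction with some error
sep, r2 = sep_and_r2(ref, pred)
print(f"SEP = {sep:.1f}, R^2 = {r2:.2f}")
```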
Abstract:
Visual acuities at the time of referral and on the day before surgery were compared in 124 patients operated on for cataract in Vaasa Central Hospital, Finland. Preoperative visual acuity and the occurrence of ocular and general disease were compared in samples of consecutive cataract extractions performed in 1982, 1985, 1990, 1995 and 2000 in two hospitals in the Vaasa region in Finland. The repeatability and standard deviation of random measurement error in visual acuity and refractive error determination in a clinical environment in cataractous, pseudophakic and healthy eyes were estimated by re-examining the visual acuity and refractive error of patients referred for cataract surgery or consultation by ophthalmic professionals. Altogether 99 eyes of 99 persons (41 cataractous, 36 pseudophakic and 22 healthy eyes) with a visual acuity range of Snellen 0.3 to 1.3 (0.52 to -0.11 logMAR) were examined. During an average waiting time of 13 months, visual acuity in the study eye decreased from 0.68 logMAR to 0.96 logMAR (from 0.2 to 0.1 in Snellen decimal values). The average decrease in vision was 0.27 logMAR per year. In the fastest quartile, the visual acuity change per year was 0.75 logMAR, and in the second fastest 0.29 logMAR; the third and fourth quartiles were virtually unaffected. From 1982 to 2000, the incidence of cataract surgery increased from 1.0 to 7.2 operations per 1000 inhabitants per year in the Vaasa region. The average preoperative visual acuity in the operated eye increased by 0.85 logMAR (in decimal values from 0.03 to 0.2) and in the better eye by 0.27 logMAR (in decimal values from 0.23 to 0.43) over this period. The proportion of patients profoundly visually handicapped (VA in the better eye <0.1) before the operation fell from 15% to 4%, and that of patients less profoundly visually handicapped (VA in the better eye 0.1 to <0.3) from 47% to 15%. The repeatability of visual acuity measurement, estimated as a coefficient of repeatability for all 99 eyes, was ±0.18 logMAR, and the standard deviation of measurement error was 0.06 logMAR. Eyes with the lowest visual acuity (0.3-0.45) had the largest variability, with coefficient of repeatability values of ±0.24 logMAR, and eyes with a visual acuity of 0.7 or better had the smallest, ±0.12 logMAR. The repeatability of refractive error measurement was studied in the same patient material as the repeatability of visual acuity. Differences between measurements 1 and 2 were calculated as three-dimensional vector values and spherical equivalents and expressed as coefficients of repeatability. Coefficients of repeatability for all eyes for the vertical, torsional and horizontal vectors were ±0.74 D, ±0.34 D and ±0.93 D, respectively, and for the spherical equivalent for all eyes ±0.74 D. Eyes with lower visual acuity (0.3-0.45) had larger variability in vector and spherical equivalent values (±1.14), but the difference between visual acuity groups was not statistically significant. The difference in the mean defocus equivalent between measurements 1 and 2 was, however, significantly greater in the lower visual acuity group. If a change of ±0.5 D (measured in defocus equivalents) is accepted as a basis for a change of spectacles for eyes with good vision, the corresponding basis for eyes in the visual acuity range of 0.3-0.65 would be ±1 D. Differences in repeated visual acuity measurements are partly explained by errors in refractive error measurements.
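A minimal sketch of how the coefficient of repeatability and the standard deviation of measurement error quoted above are obtained from two repeated measurements per eye (s_w = SD(differences)/sqrt(2), CoR = 1.96 * sqrt(2) * s_w); the logMAR data are simulated.

```python
import numpy as np

def repeatability(measure1, measure2):
    """Coefficient of repeatability and standard deviation of measurement
    error from two repeated logMAR visual acuity measurements per eye:
    s_w = SD(differences) / sqrt(2), CoR = 1.96 * sqrt(2) * s_w."""
    d = np.asarray(measure1, float) - np.asarray(measure2, float)
    s_w = d.std(ddof=1) / np.sqrt(2.0)
    cor = 1.96 * np.sqrt(2.0) * s_w
    return cor, s_w

# Illustrative repeat logMAR measurements for 99 eyes.
rng = np.random.default_rng(8)
true_va = rng.uniform(-0.1, 0.5, 99)
m1 = true_va + rng.normal(0, 0.06, 99)
m2 = true_va + rng.normal(0, 0.06, 99)
cor, s_w = repeatability(m1, m2)
print(f"coefficient of repeatability = ±{cor:.2f} logMAR, SD of error = {s_w:.2f} logMAR")
```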