976 results for accumulative test item


Abstract:

The Work Limitations Questionnaire (WLQ) is used to determine the amount of work loss and lost productivity stemming from certain health conditions, including rheumatoid arthritis and cancer. The questionnaire is currently scored using methodology from Classical Test Theory. Item Response Theory, on the other hand, is a theory based on analyzing item responses. This study aimed to determine the validity of using Item Response Theory (IRT) to analyze data from the WLQ. Item responses from 572 employed adults with dysthymia, major depressive disorder (MDD), double depressive disorder (both dysthymia and MDD), rheumatoid arthritis and healthy individuals were used to determine the validity of IRT (Adler et al., 2006). PARSCALE, which is IRT software from Scientific Software International, Inc., was used to calculate estimates of the work limitations based on item responses from the WLQ. These estimates, also known as ability estimates, were then correlated with the raw score estimates calculated from the sum of all the item responses. Concurrent validity, under which a new measurement is considered valid if its correlation with an established valid measurement is greater than or equal to .90, was used to determine the validity of IRT methodology for the WLQ. Ability estimates from IRT were found to be fairly highly correlated with the raw scores from the WLQ (above .80). However, the only subscale with a correlation high enough for IRT to be considered valid was the time management subscale (r = .90). All other subscales (mental/interpersonal, physical, and output) did not produce valid IRT ability estimates. These lower-than-expected correlations may be explained by the outliers found in the sample.
Also, acquiescent responding (AR) bias, which is caused by the tendency for people to respond the same way to every question on a questionnaire, and the multidimensionality of the questionnaire (the WLQ is composed of four dimensions and thus four different latent variables) probably had a major impact on the IRT estimates. Furthermore, it is possible that the mental/interpersonal dimension violated the monotonicity assumption of IRT, causing PARSCALE to fail to run for these estimates. The monotonicity assumption needs to be checked for the mental/interpersonal dimension. Finally, the use of multidimensional IRT methods would most likely remove the AR bias and increase the validity of using IRT to analyze data from the WLQ.
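
The concurrent validity criterion described in this abstract (a correlation of at least .90 between the new and the established measurement) can be illustrated with a minimal sketch. The function names and the sample numbers below are hypothetical and are not taken from the study; the sketch only shows the Pearson correlation and the threshold check, not the IRT estimation performed by PARSCALE.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def concurrent_validity(new_scores, reference_scores, threshold=0.90):
    """Return (r, is_valid), where is_valid means r >= threshold."""
    r = pearson_r(new_scores, reference_scores)
    return r, r >= threshold


# Hypothetical ability estimates and raw sum scores for four respondents.
ability_estimates = [1, 2, 3, 4]
raw_scores = [1, 3, 2, 4]
r, is_valid = concurrent_validity(ability_estimates, raw_scores)
```

The threshold is a parameter because the .90 cut-off is the convention stated in the abstract rather than a universal rule.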

Abstract:

Understanding the structure and dynamics of the intricate network of connections among people who consume products through the Internet is an extremely useful asset for studying emergent properties related to social behavior. This knowledge could be useful, for example, to improve the performance of personal recommendation algorithms. In this contribution, we analyzed five-year records of movie-rating transactions provided by Netflix, a movie rental platform where users rate movies from an online catalog. This dataset can be studied as a bipartite user-item network whose structure evolves in time. Even though several topological properties from subsets of this bipartite network have been reported with a model that combines random and preferential attachment mechanisms [Beguerisse Díaz et al., 2010], there are still many aspects worth exploring, as they are connected to relevant phenomena underlying the evolution of the network. In this work, we test the hypothesis that bursty human behavior is essential in order to describe how a bipartite user-item network evolves in time. To that end, we propose a novel model that combines, for user nodes, a network growth prescription based on a preferential attachment mechanism acting not only in the topological domain (i.e. based on node degrees) but also in the time domain. In the case of items, the model mixes degree preferential attachment and random selection. With these ingredients, the model is not only able to reproduce the asymptotic degree distribution, but also shows excellent agreement with the Netflix data in several time-dependent topological properties.
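
The item-side rule mentioned in this abstract, a mix of degree preferential attachment and random selection, can be sketched roughly as follows. This is an illustrative simplification and not the authors' model: the mixing probability `p_pref`, the function names and the fixed item catalog are assumptions, and the time-domain (bursty) mechanism for user nodes is not modeled at all.

```python
import random


def pick_item(degrees, p_pref, rng):
    """Choose an item index.

    With probability p_pref the choice is proportional to current item
    degree (preferential attachment); otherwise it is uniform at random.
    When no item has any ratings yet, the choice is always uniform.
    """
    if rng.random() < p_pref and sum(degrees) > 0:
        return rng.choices(range(len(degrees)), weights=degrees, k=1)[0]
    return rng.randrange(len(degrees))


def simulate_item_degrees(n_items, n_ratings, p_pref, seed=0):
    """Distribute n_ratings rating events over n_items; return item degrees."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    degrees = [0] * n_items
    for _ in range(n_ratings):
        degrees[pick_item(degrees, p_pref, rng)] += 1
    return degrees


# Hypothetical run: 10 items, 100 rating events, half of them preferential.
degrees = simulate_item_degrees(n_items=10, n_ratings=100, p_pref=0.5)
```

Setting `p_pref` closer to 1 concentrates ratings on already popular items, while values closer to 0 spread them evenly across the catalog.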

Abstract:

The purposes of this study were (1) to validate the item-attribute matrix using two levels of attributes (Level 1 attributes and Level 2 sub-attributes), and (2) through retrofitting the diagnostic models to the mathematics test of the Trends in International Mathematics and Science Study (TIMSS), to evaluate the construct validity of the TIMSS mathematics assessment by comparing the results of two assessment booklets. Item data were extracted from Booklets 2 and 3 for the 8th grade in TIMSS 2007, which included a total of 49 mathematics items and every student's response to every item. The study developed three categories of attributes at two levels: content, cognitive process (TIMSS or new), and comprehensive cognitive process (or IT) based on the TIMSS assessment framework, cognitive procedures, and item type. At level one, there were 4 content attributes (number, algebra, geometry, and data and chance), 3 TIMSS process attributes (knowing, applying, and reasoning), and 4 new process attributes (identifying, computing, judging, and reasoning). At level two, the level 1 attributes were further divided into 32 sub-attributes. There was only one level of IT attributes (multiple steps/responses, complexity, and constructed-response). Twelve Q-matrices (4 originally specified, 4 random, and 4 revised) were investigated with eleven Q-matrix models (QM1 ~ QM11) using multiple regression and the least squares distance method (LSDM). Comprehensive analyses indicated that the proposed Q-matrices explained most of the variance in item difficulty (i.e., 64% to 81%). The cognitive process attributes contributed to the item difficulties more than the content attributes, and the IT attributes contributed much more than both the content and process attributes. The new retrofitted process attributes explained the items better than the TIMSS process attributes. Results generated from the level 1 attributes and the level 2 attributes were consistent.
Most attributes could be used to recover students' performance, but some attributes' probabilities showed unreasonable patterns. The analysis approaches could not demonstrate whether the same construct validity was supported across booklets. The proposed attributes and Q-matrices explained the items of Booklet 2 better than the items of Booklet 3. The specified Q-matrices explained the items better than the random Q-matrices.

Abstract:

Many models have been proposed in recent decades for assessing the reliability, availability and maintainability (RAM) of safety equipment. Many of them focus on assessing the risk level of a technological system or on searching for appropriate design and/or surveillance and maintenance policies that assure an optimum level of RAM of safety systems throughout the plant's operational life. This paper proposes a new approach for RAM modelling that accounts, in an integrated manner, for equipment ageing and for the effectiveness of maintenance and testing of equipment consisting of multiple items. This model is then used to perform the simultaneous optimization of testing and maintenance for ageing equipment consisting of multiple items. An example of application is provided, which considers a simplified High Pressure Injection System (HPIS) of a typical Pressurized Water Reactor (PWR). This system consists of motor driven pumps (MDP) and motor operated valves (MOV), where both types of components consist of two items each. These components present different failure modes, causes and behaviours, and they undergo complex test and maintenance activities depending on the item involved. The results of the example of application demonstrate that the optimization algorithm provides the best solutions when the optimization problem is formulated and solved considering full flexibility in the implementation of the testing and maintenance activities that form part of such an integrated RAM model.

Abstract:

BACKGROUND Screening of aphasia in acute stroke is crucial for directing patients to early language therapy. The Language Screening Test (LAST), originally developed in French, is a validated language screening test that allows detection of a language deficit within a few minutes. The aim of the present study was to develop and validate two parallel German versions of the LAST. METHODS The LAST includes subtests for naming, repetition, automatic speech, and comprehension. For the translation into German, task constructs and psycholinguistic criteria for item selection were identical to the French LAST. A cohort of 101 stroke patients were tested, all of whom were native German speakers. Validation of the LAST was based on (1) analysis of equivalence of the German versions, which was established by administering both versions successively in a subset of patients, (2) internal validity by means of internal consistency analysis, and (3) external validity by comparison with the short version of the Token Test in another subset of patients. RESULTS The two German versions were equivalent as demonstrated by a high intraclass correlation coefficient of 0.91. Furthermore, an acceptable internal structure of the LAST was found (Cronbach's α = 0.74). A highly significant correlation (r = 0.74, p < 0.0001) between the LAST and the short version of the Token Test indicated good external validity of the scale. CONCLUSION The German version of the LAST, available in two parallel versions, is a new and valid language screening test in stroke.
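
The internal consistency statistic reported in this abstract, Cronbach's α, can be computed with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The sketch below is a minimal illustration under that formula; the function name and the score matrix are hypothetical and unrelated to the LAST data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a score matrix.

    rows: one list per respondent, each holding that respondent's score
    on every item (all rows must have the same length, at least 2 items).
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 items and equally long rows")
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores of three respondents on two items.
scores = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(scores)
```

The function raises an error from `statistics.variance` when fewer than two respondents are supplied, and it divides by zero when all respondents have the same total score, so real data should be checked for those cases first.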

Abstract:

Item 247

Abstract:

Frequency of exposure to very low- and high-frequency words was manipulated in a three-phase (familiarisation, study, and test) design. During familiarisation, words were presented with their definition (once, four times, or not presented). One week (Experiment 1) or one day (Experiment 2) later, participants studied a list of homogeneous pairs (i.e., pair members were matched on background and familiarisation frequency). Item and associative recognition of high- and very low-frequency words presented in intact, rearranged, old-new, or new-new pairs were tested in Experiment 1. Associative recognition of very low-frequency words was tested in Experiment 2. Results showed that prior familiarisation improved associative recognition of very low-frequency pairs, but had no effect on high-frequency pairs. The role of meaning in the formation of item-to-item and item-to-context associations and the implications for current models of memory are discussed.

Abstract:

Accurate monitoring of prevalence and trends in population levels of physical activity (PA) is a fundamental public health need. Test-retest reliability (repeatability) was assessed in population samples for four self-report PA measures: the Active Australia survey (AA, N=356), the short International Physical Activity Questionnaire (IPAQ, N=104), the physical activity items in the Behavioral Risk Factor Surveillance System (BRFSS, N=127) and in the Australian National Health Survey (NHS, N=122). Percent agreement and Kappa statistics were used to assess reliability of classification of activity status as 'active', 'insufficiently active' or 'sedentary'. Intraclass correlations (ICCs) were used to assess agreement on minutes of activity reported for each item of each survey and for total minutes. Percent agreement scores for activity status were very good on all four instruments, ranging from 60% for the NHS to 79% for the IPAQ. Corresponding Kappa statistics ranged from 0.40 (NHS) to 0.52 (AA). For individual items, ICCs were highest for walking (0.45 to 0.78) and vigorous activity (0.22 to 0.64) and lowest for the moderate questions (0.16 to 0.44). All four measures provide acceptable levels of test-retest reliability for assessing both activity status and sedentariness, and moderate reliability for assessing total minutes of activity.
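
The two classification statistics named in this abstract, percent agreement and the Kappa statistic, can be illustrated with a minimal sketch for a test-retest design. Cohen's kappa is assumed here as the Kappa variant, and the function names and the example classifications are hypothetical, not drawn from the surveys.

```python
from collections import Counter


def percent_agreement(first, second):
    """Share of respondents classified identically on both occasions."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first, second):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(first)
    p_observed = percent_agreement(first, second)
    counts_first = Counter(first)
    counts_second = Counter(second)
    # Chance agreement from the marginal category proportions.
    p_expected = sum(
        counts_first[c] * counts_second[c] for c in counts_first
    ) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical activity status of four respondents at test and retest.
test = ["active", "active", "sedentary", "sedentary"]
retest = ["active", "sedentary", "sedentary", "sedentary"]
agreement = percent_agreement(test, retest)
kappa = cohens_kappa(test, retest)
```

Kappa is lower than percent agreement whenever some agreement would be expected by chance alone, which is consistent with the abstract reporting Kappa values well below the corresponding percent agreement scores.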