967 results for ROC curves


Relevance:

100.00%

Abstract:

Traditionally, machine learning algorithms have been evaluated in applications where assumptions can be reliably made about class priors and/or misclassification costs. In this paper, we consider the case of imprecise environments, where little may be known about these factors and they may well vary significantly when the system is applied. Specifically, the use of precision-recall analysis is investigated and compared to better-known performance measures such as error rate and the receiver operating characteristic (ROC). We argue that while ROC analysis is invariant to variations in class priors, this invariance in fact hides an important factor of the evaluation in imprecise environments. Therefore, we develop a generalised precision-recall analysis methodology in which variation due to prior class probabilities is incorporated into a multi-way analysis of variance (ANOVA). The increased sensitivity and reliability of this approach are demonstrated in a remote sensing application.
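The claim that ROC analysis is invariant to class priors while precision is not can be illustrated with a small sketch. It is not taken from the paper; the function name and the operating point (TPR 0.8, FPR 0.1) are assumptions chosen for illustration. It uses the standard identity precision = π·TPR / (π·TPR + (1 − π)·FPR), where π is the positive-class prior.

```python
def precision_from_roc_point(tpr, fpr, prior):
    """Precision implied by one ROC operating point at a given positive-class prior."""
    true_pos = prior * tpr            # expected fraction of all cases that are true positives
    false_pos = (1.0 - prior) * fpr   # expected fraction that are false positives
    if true_pos + false_pos == 0.0:
        raise ValueError("no predicted positives at this operating point")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Same (hypothetical) ROC point, three different priors.
    for prior in (0.5, 0.1, 0.01):
        print(prior, round(precision_from_roc_point(0.8, 0.1, prior), 3))
```

The ROC point stays fixed across the three priors, yet the implied precision falls sharply as positives become rarer, which is the kind of variation the abstract argues ROC analysis hides.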

Relevance:

100.00%

Abstract:

Optimal receiver operating characteristics (ROCs) have been derived analytically on the basis of the convolutional (Hamming) version of the recent Neural Network Assembly Memory Model (NNAMM) for an intact two-layer autoassociative Hopfield network. We introduce a method that explicitly takes into account the a priori probabilities of alternative hypotheses on the structure of the information initiating memory trace retrieval, together with modified ROCs (mROCs: a posteriori probabilities of correct recall vs. false alarm probability). The comparison of empirical and calculated ROCs (or mROCs) demonstrates that they coincide quantitatively, and in this way the intensities of cues used in the corresponding experiments may be estimated. We found that basic ROC properties, which are among the experimental findings underpinning dual-process models of recognition memory, can be explained within our one-factor NNAMM.

Relevance:

100.00%

Abstract:

Objective: The aim of the study was to examine the distribution of an integrated covariate and its association with blood pressure (BP) among children in Anhui province, China, and to assess the predictive value of the integrated covariate for childhood hypertension. Methods: A total of 2,828 subjects (1,588 male and 1,240 female) aged 7-17 years participated in this study. Height, weight, waist circumference, hip circumference and BP of all subjects were measured. Obesity and overweight were defined by an international standard specifying the measurement, the reference population, and the age- and sex-specific cut-off points. High BP status was defined as systolic blood pressure (SBP) and/or diastolic blood pressure (DBP) > 95th percentile for age and gender. Results: The prevalence of childhood hypertension was 11.03%, and the SBP and DBP of the obesity group were significantly higher than those of the normal group. Anthropometric obesity indices such as body mass index (BMI) were positively correlated with SBP and DBP. The integrated covariate performed better than any single covariate in the receiver operating characteristic (ROC) curve analysis; its cut-off value, sensitivity and specificity were 0.112, 0.577 and 0.683, respectively. Conclusion: The integrated covariate is a simple and effective anthropometric index for identifying childhood hypertension.

Relevance:

70.00%

Abstract:

A non-parametric method was developed and tested to compare the partial areas under two correlated receiver operating characteristic (ROC) curves. Based on the theory of generalized U-statistics, mathematical formulas were derived for computing the ROC area and the variance and covariance between the portions of two ROC curves. A practical SAS application was also developed to facilitate the calculations. The accuracy of the non-parametric method was evaluated by comparing it to other methods. When we applied our method to the data from a published ROC analysis of CT images, our results were very close to the published ones. A hypothetical example was used to demonstrate the effects of two crossed ROC curves: the two ROC areas are the same, yet each portion of the area between the two ROC curves was found to be significantly different by the partial ROC curve analysis. For computation of ROC curves at large scale, such as from a logistic regression model, we applied our method to a breast cancer study with Medicare claims data. It yielded the same ROC area as the SAS Logistic procedure. Our method also provides an alternative to the global summary of ROC area comparison by directly comparing the true-positive rates for two regression models and by determining the range of false-positive values where the models differ.
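To make the idea of a partial ROC area concrete, the following is a minimal sketch, not the U-statistic method or the SAS application described in the abstract. It builds an empirical ROC curve and integrates it with the trapezoidal rule over a chosen false-positive range. The function names and the toy scores are assumptions for illustration; the variance and covariance estimation from the abstract is not reproduced.

```python
def roc_points(scores_pos, scores_neg):
    """Empirical ROC points (fpr, tpr) in order of decreasing threshold."""
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points


def partial_auc(points, fpr_lo, fpr_hi):
    """Trapezoidal area under the ROC curve restricted to fpr in [fpr_lo, fpr_hi]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lo, hi = max(x0, fpr_lo), min(x1, fpr_hi)
        if hi <= lo:
            continue  # segment is vertical or lies outside the requested range
        # Linearly interpolate the true-positive rate at the clipped endpoints.
        y_lo = y0 + (y1 - y0) * (lo - x0) / (x1 - x0)
        y_hi = y0 + (y1 - y0) * (hi - x0) / (x1 - x0)
        area += 0.5 * (y_lo + y_hi) * (hi - lo)
    return area


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])  # toy scores
    print(partial_auc(pts, 0.0, 1.0))   # full area
    print(partial_auc(pts, 0.0, 0.5))   # area for false-positive rates up to 0.5
```

Two curves with equal full areas can still differ on a restricted false-positive range, which is why the abstract compares portions of the curves rather than only the global summary.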

Relevance:

70.00%

Abstract:

The growing need for fast sampling of explosives in high-throughput areas has increased the demand for improved technology for the trace detection of illicit compounds. Detection of the volatiles associated with the presence of the illicit compounds offers a different approach for sensitive trace detection of these compounds without increasing the false positive alarm rate. This study evaluated the performance of non-contact sampling and detection systems using statistical analysis through the construction of receiver operating characteristic (ROC) curves in real-world scenarios for the detection of volatiles in the headspace of smokeless powder, used as the model system for generalizing explosives detection. A novel sorbent-coated disk, termed planar solid phase microextraction (PSPME), was previously used for rapid, non-contact sampling of the headspace containers. The limits of detection for PSPME coupled to IMS detection were determined to be 0.5-24 ng for vapor sampling of volatile chemical compounds associated with illicit compounds. PSPME demonstrated an extraction efficiency three times greater than that of other commercially available substrates, retaining >50% of the analyte after 30 minutes of sampling of an analyte spike, in comparison to a non-detect for the unmodified filters. Both static and dynamic PSPME sampling were used, coupled with two ion mobility spectrometer (IMS) detection systems, in which 10-500 mg quantities of smokeless powders were detected within 5-10 minutes of static sampling and 1 minute of dynamic sampling time in 1-45 L closed systems, resulting in faster sampling and analysis times in comparison to conventional solid phase microextraction-gas chromatography-mass spectrometry (SPME-GC-MS) analysis. Similar real-world scenarios were sampled in low- and high-clutter environments with zero false positive rates. 
The ROC curves showed excellent PSPME-IMS detection of the volatile analytes, with areas under the curve (AUC) of 0.85-1.0 and 0.81-1.0 for portable and bench-top IMS systems, respectively. ROC curves were also constructed for SPME-GC-MS, giving AUCs of 0.95-1.0, comparable with PSPME-IMS detection. The PSPME-IMS technique provides fewer false positive results for non-contact vapor sampling, cutting cost and providing the effective sampling and detection needed in high-throughput scenarios, with performance similar to well-established techniques and the added advantage of fast detection in the field.

Relevance:

60.00%

Abstract:

Aims – To develop local contemporary coefficients for the Trauma Injury Severity Score in New Zealand, TRISS(NZ), and to evaluate their performance at predicting survival against the original TRISS coefficients. Methods – Retrospective cohort study of adults who sustained a serious traumatic injury, and who survived until presentation at Auckland City, Middlemore, Waikato, or North Shore Hospitals between 2002 and 2006. Coefficients were estimated using ordinary and multilevel mixed-effects logistic regression models. Results – 1735 eligible patients were identified, 1672 (96%) injured from a blunt mechanism and 63 (4%) from a penetrating mechanism. For blunt mechanism trauma, 1250 (75%) were male and average age was 38 years (range: 15-94 years). TRISS information was available for 1565 patients of whom 204 (13%) died. Area under the Receiver Operating Characteristic (ROC) curves was 0.901 (95%CI: 0.879-0.923) for the TRISS(NZ) model and 0.890 (95% CI: 0.866-0.913) for TRISS (P<0.001). Insufficient data were available to determine coefficients for penetrating mechanism TRISS(NZ) models. Conclusions – Both TRISS models accurately predicted survival for blunt mechanism trauma. However, TRISS(NZ) coefficients were statistically superior to TRISS coefficients. A strong case exists for replacing TRISS coefficients in the New Zealand benchmarking software with these updated TRISS(NZ) estimates.

Relevance:

60.00%

Abstract:

Protein-energy wasting (PEW) is commonly seen in patients with chronic kidney disease (CKD). The condition is characterised by chronic, systemic low-grade inflammation which affects nutritional status by a variety of mechanisms including reducing appetite and food intake and increasing muscle catabolism. PEW is linked with co-morbidities such as cardiovascular disease, and is associated with lower quality of life, increased hospitalisations and a 6-fold increase in risk of death. Significant gender differences have been found in the severity and effects of several markers of PEW. There have been limited studies testing the ability of anti-inflammatory agents or nutritional interventions to reduce the effects of PEW in dialysis patients. This thesis makes a significant contribution to the understanding of PEW in dialysis patients. It advances understanding of measurement techniques for two of the key components, appetite and inflammation, and explores the effect of fish oil, an anti-inflammatory agent, on markers of PEW in dialysis patients. The first part of the thesis consists of two methodological studies conducted using baseline data. The first study aims to validate retrospective ratings of hunger, desire to eat and fullness on visual analog scales (VAS) (paper and pen and electronic) as a new method of measuring appetite in dialysis patients. The second methodological study aims to assess the ability of a variety of methods available in routine practice to detect the presence of inflammation. The second part of the thesis aims to explore the effect of 12 weeks of supplementation with 2 g per day of eicosapentaenoic acid (EPA), a long-chain fatty acid found in fish oil, on markers of PEW. A combination of biomarkers and psychomarkers of appetite and inflammation are the main outcomes being explored, with nutritional status, dietary intake and quality of life included as secondary outcomes. 
A lead-in phase of 3 months prior to baseline was used so that each person acts as their own historical control. The study also examines whether there are gender differences in response to the treatment. Being an exploratory study, an important part of the work is to test the feasibility of the intervention, thus the level of adherence and factors associated with adherence are also presented. The studies were conducted at the hemodialysis unit of the Wesley Hospital. Participants met the following criteria: adult, stage 5 CKD on hemodialysis for at least 3 months, not expected to receive a transplant or switch to another dialysis modality during the study, absence of intellectual impairment or mental illness impairing ability to follow instructions or complete the intervention. A range of intermediate, clinical and patient-centred outcome measures were collected at baseline and 12 weeks. Inflammation was measured using five biomarkers: C-reactive protein (CRP), interleukin-6 (IL6), intercellular adhesion molecule (sICAM-1), vascular cell adhesion molecule (sVCAM-1) and white cell count (WCC). Subjective appetite was measured using the first question from the Appetite and Dietary Assessment (ADAT) tool and VAS for measurements of hunger, desire to eat and fullness. A novel feature of the study was the assessment of the appetite peptides leptin, ghrelin and peptide YY as biomarkers of appetite. Nutritional status/inflammation was assessed using the Malnutrition-Inflammation Score (MIS) and the Patient-Generated Subjective Global Assessment (PG-SGA). Dietary intake was measured using 3-day records. Quality of life was measured using the Kidney Disease Quality of Life Short Form version 1.3 (KDQOL-SF™ v1.3 © RAND University), which combines the Short-Form 36 (SF36) with a kidney-disease specific module. A smaller range of these variables was available for analysis during the control phase (CRP, ADAT, dietary intake and nutritional status). 
Statistical analysis was carried out using SPSS version 14 (SPSS Inc, Chicago IL, USA). Analysis of the first part of the thesis involved descriptive and bivariate statistics, as well as Bland-Altman plots to assess agreement between methods, and sensitivity analysis/ROC curves to test the ability of methods to predict the presence of inflammation. The unadjusted (paired t-tests) and adjusted (linear mixed model) change over time is presented for the main outcome variables of inflammation and appetite. Results are shown for the whole group followed by analyses according to gender and adherence to treatment. Due to the exploratory nature of the study, trends and clinical significance were considered as important as statistical significance. Twenty-eight patients (mean age 61±17y, 50% male, dialysis vintage 19.5 (4-101) months) underwent baseline assessment. Seven out of 28 patients (25%) reported sub-optimal appetite (self-reported as fair, poor or very poor) despite all being well nourished (100% SGA A). Using the VAS, ratings of hunger, but not desire to eat or fullness, were significantly (p<0.05) associated with a range of relevant clinical variables including age (r=-0.376), comorbidities (r=-0.380), nutritional status (PG-SGA score, r=-0.451), inflammatory markers (CRP r=-0.383; sICAM-1 r=-0.387) and seven domains of quality of life. Patients expressed a preference for the paper and pen method of administering VAS. None of the tools (appetite, MIS, PG-SGA, albumin or iron) showed an acceptable ability to detect patients who are inflamed. It is recommended that CRP should be tested more frequently as a matter of course rather than seeking alternative methods of measuring inflammation. Twenty-seven patients completed the 12-week intervention. Twenty patients were considered adherent based on changes in % plasma EPA, which rose from 1.3 (0.94)% to 5.2 (1.1)%, p<0.001, in this group. The major barriers to adherence were forgetting to take the tablets as well as their size. 
At 12 weeks, inflammatory markers remained steady apart from the white cell count which decreased (7.6(2.5) vs 7.0(2.2) ×10⁹/L, p=0.058) and sVCAM-1 which increased (1685(654) vs 2249(925) ng/mL, p=0.001). Subjective appetite using VAS increased (51mm to 57mm, +12%) and there was a trend towards reduction in peptide YY (660(31) vs 600(30) pg/mL, p=0.078). There were some gender differences apparent, with the following adjusted change between baseline and week 12: CRP (males -3% vs females +17%, p=0.19), IL6 (males +17% vs females +48%, p=0.77), sICAM-1 (males -5% vs females +11%, p=0.07), sVCAM-1 (males +54% vs females +19%, p=0.08) and hunger ratings (males 20% vs females -5%, p=0.18). On balance, males experienced a maintenance or reduction in three inflammatory markers and an improvement in hunger ratings, and therefore appeared to have responded better to the intervention. Compared to those who didn’t adhere, adherent patients maintained weight (mean(SE) change: +0.5(1.6) vs -0.8(1.2) kg, p=0.052) and fat-free mass (-0.1 (1.6) vs -1.8 (1.8) kg, p=0.045). There was no difference in change between the intervention and control phase for CRP, appetite, nutritional status or dietary intake. The thesis makes a significant contribution to the evidence base for understanding of PEW in dialysis patients. It has advanced knowledge of methods of assessing inflammation and appetite. Retrospective ratings of hunger on a VAS appear to be a valid method of assessing appetite although samples which include patients with very poor appetite are required to confirm this. Supplementation with fish oil appeared to improve subjective appetite and dampen the inflammatory response. The effectiveness of the intervention is influenced by gender and adherence. Males appear to be more responsive to the primary outcome variables than females, and the quality of response is improved with better adherence. 
These results provide evidence to support future interventions aimed at reducing the effects of PEW in dialysis patients.

Relevance:

60.00%

Abstract:

The tear film plays an important role in preserving the health of the ocular surface and maintaining the optimal refractive power of the cornea. Moreover, dry eye syndrome is one of the most commonly reported eye health problems. This syndrome is caused by abnormalities in the properties of the tear film. Current clinical tools to assess the tear film properties have shown certain limitations. The traditional invasive methods for the assessment of tear film quality, which are used by most clinicians, have been criticized for their lack of reliability and/or repeatability. A range of non-invasive methods of tear assessment have been investigated, but these also present limitations. Hence no “gold standard” test is currently available to assess the tear film integrity. Therefore, improving techniques for the assessment of the tear film quality is of clinical significance and the main motivation for the work described in this thesis. In this study the tear film surface quality (TFSQ) changes were investigated by means of high-speed videokeratoscopy (HSV). In this technique, a set of concentric rings formed in an illuminated cone or a bowl is projected on the anterior cornea and their reflection from the ocular surface is imaged on a charge-coupled device (CCD). The reflection of the light is produced in the outermost layer of the cornea, the tear film. Hence, when the tear film is smooth the reflected image presents a well-structured pattern. In contrast, when the tear film surface presents irregularities, the pattern also becomes irregular due to the light scatter and deviation of the reflected light. The videokeratoscope provides an estimate of the corneal topography associated with each Placido disk image. Topographical estimates, which have been used in the past to quantify tear film changes, may not always be suitable for the evaluation of all the dynamic phases of the tear film. 
However, the Placido disk image itself, which contains the reflected pattern, may be more appropriate to assess the tear film dynamics. A set of novel routines has been purposely developed to quantify the changes of the reflected pattern and to extract a time series estimate of the TFSQ from the video recording. The routine extracts from each frame of the video recording a maximized area of analysis. In this area a metric of the TFSQ is calculated. Initially two metrics, based on the Gabor filter and Gaussian gradient-based techniques, were used to quantify the consistency of the pattern’s local orientation as a metric of TFSQ. These metrics have helped to demonstrate the applicability of HSV to assess the tear film, and the influence of contact lens wear on TFSQ. The results suggest that the dynamic-area analysis method of HSV was able to distinguish and quantify the subtle, but systematic degradation of tear film surface quality in the inter-blink interval in contact lens wear. It was also able to clearly show a difference between bare eye and contact lens wearing conditions. Thus, the HSV method appears to be a useful technique for quantitatively investigating the effects of contact lens wear on the TFSQ. Subsequently a larger clinical study was conducted to perform a comparison between HSV and two other non-invasive techniques, lateral shearing interferometry (LSI) and dynamic wavefront sensing (DWS). Of these non-invasive techniques, HSV appeared to be the most precise method for measuring TFSQ, by virtue of its lower coefficient of variation, while LSI appeared to be the most sensitive method for analyzing the tear build-up time (TBUT). The capability of each of the non-invasive methods to discriminate dry eye from normal subjects was also investigated. The receiver operating characteristic (ROC) curves were calculated to assess the ability of each method to predict dry eye syndrome. 
The LSI technique gave the best results under both natural blinking conditions and suppressed blinking conditions, closely followed by HSV. The DWS did not perform as well as LSI or HSV. The main limitation of the HSV technique, which was identified during the former clinical study, was the lack of sensitivity to quantify the build-up/formation phase of the tear film cycle. For that reason an extra metric based on image transformation and block processing was proposed. In this metric, the area of analysis was transformed from Cartesian to polar coordinates, converting the concentric circles pattern into a quasi-straight lines image from which a block statistics value was extracted. This metric has shown better sensitivity under low pattern disturbance and has improved the performance of the ROC curves. Additionally a theoretical study, based on ray-tracing techniques and topographical models of the tear film, was proposed to fully comprehend the HSV measurement and the instrument’s potential limitations. Of special interest was the assessment of the instrument’s sensitivity under subtle topographic changes. The theoretical simulations have helped to provide some understanding of the tear film dynamics; for instance, the model extracted for the build-up phase has helped to provide some insight into the dynamics during this initial phase. Finally some aspects of the mathematical modeling of TFSQ time series have been reported in this thesis. Over the years, different functions have been used to model the time series as well as to extract the key clinical parameters (i.e., timing). Unfortunately those techniques to model the tear film time series do not simultaneously consider the underlying physiological mechanism and the parameter extraction methods. A set of guidelines is proposed to meet both criteria. 
Special attention was given to a commonly used fit, the polynomial function, and considerations to select the appropriate model order to ensure the true derivative of the signal is accurately represented. The work described in this thesis has shown the potential of using high-speed videokeratoscopy to assess tear film surface quality. A set of novel image and signal processing techniques have been proposed to quantify different aspects of the tear film assessment, analysis and modeling. The dynamic-area HSV has shown good performance in a broad range of conditions (i.e., contact lens, normal and dry eye subjects). As a result, this technique could be a useful clinical tool to assess tear film surface quality in the future.

Relevance:

60.00%

Abstract:

The quality of conceptual business process models is highly relevant for the design of corresponding information systems. In particular, a precise measurement of model characteristics can be beneficial from a business perspective, helping to save costs thanks to early error detection. This is just as true from a software engineering point of view. In this latter case, models facilitate stakeholder communication and software system design. Research has investigated several proposals as regards measures for business process models, from a rather correlational perspective. This is helpful for understanding, for example size and complexity as general driving forces of error probability. Yet, design decisions usually have to build on thresholds, which can reliably indicate that a certain counter-action has to be taken. This cannot be achieved only by providing measures; it requires a systematic identification of effective and meaningful thresholds. In this paper, we derive thresholds for a set of structural measures for predicting errors in conceptual process models. To this end, we use a collection of 2,000 business process models from practice as a means of determining thresholds, applying an adaptation of the ROC curves method. Furthermore, an extensive validation of the derived thresholds was conducted by using 429 EPC models from an Australian financial institution. Finally, significant thresholds were adapted to refine existing modeling guidelines in a quantitative way.

Relevance:

60.00%

Abstract:

The unique physical and movement characteristics of children necessitate the development of accelerometer equations and cut points that are population specific. The purpose of this study is to develop an ecologically valid cut point for the Biotrainer Pro monitor that reflects a threshold for moderate-intensity physical activity in elementary school children. A sample of 30 children (ages 8-12) wore a Biotrainer monitor while completing a series of 7 movement tasks (calibration phase) and while participating in an organized group activity (cross-validation phase). Videotapes from each session were processed using a computerized direct-observation technique to provide a criterion measure of physical activity. Analyses involved the use of mixed-model regression and receiver operating characteristic (ROC) curves. The results indicated that a cut point of 4 counts/min provides the optimal balance between the related needs for sensitivity (accurately detecting activity) and specificity (limiting misclassification of inactivity as activity). Results with the cross-validation data demonstrated that this value yielded the best overall kappa (.58) and a high classification agreement (84%) for activity determination. The specificity of 93% demonstrates that the proposed cut point can accurately detect activity; however, the lower sensitivity value of 61% suggests that some minutes of activity might be incorrectly classified as inactivity. The cut point of 4 counts/min provides an ecologically valid cut point to capture physical activity in children using the Biotrainer Pro activity monitor.

Relevance:

60.00%

Abstract:

We propose expected attainable discrimination (EAD) as a measure to select discrete valued features for reliable discrimination between two classes of data. EAD is an average of the area under the ROC curves obtained when a simple histogram probability density model is trained and tested on many random partitions of a data set. EAD can be incorporated into various stepwise search methods to determine promising subsets of features, particularly when misclassification costs are difficult or impossible to specify. Experimental application to the problem of risk prediction in pregnancy is described.
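The definition of EAD above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the add-one smoothing of the histogram, the likelihood-ratio score, the 50/50 split and the number of partitions are all assumptions.

```python
import random
from collections import Counter


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the ROC area; tied pairs count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def ead(values, labels, n_splits=200, train_frac=0.5, seed=0):
    """Average test-set AUC of a histogram model over random train/test partitions."""
    rng = random.Random(seed)
    data = list(zip(values, labels))
    k = len(set(values))  # number of discrete levels of the feature
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(data)
        cut = int(len(data) * train_frac)
        train, test = data[:cut], data[cut:]
        pos = [v for v, y in test if y == 1]
        neg = [v for v, y in test if y == 0]
        if not pos or not neg:
            continue  # the test part needs both classes to define an AUC
        # Histogram density model per class, with add-one smoothing (assumption).
        c1 = Counter(v for v, y in train if y == 1)
        c0 = Counter(v for v, y in train if y == 0)
        n1, n0 = sum(c1.values()), sum(c0.values())

        def score(v):
            # Smoothed likelihood ratio P(v | class 1) / P(v | class 0).
            return ((c1[v] + 1) / (n1 + k)) / ((c0[v] + 1) / (n0 + k))

        aucs.append(auc([score(v) for v in pos], [score(v) for v in neg]))
    if not aucs:
        raise ValueError("no partition contained both classes in its test part")
    return sum(aucs) / len(aucs)
```

Under these assumptions a feature that separates the classes perfectly scores 1.0 and a constant feature scores 0.5, so candidate features can be ranked by EAD without specifying misclassification costs.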

Relevance:

60.00%

Abstract:

Objective: To examine the clinical utility of the Cornell Scale for Depression in Dementia (CSDD) in nursing homes. Setting: 14 nursing homes in Sydney and Brisbane, Australia. Participants: 92 residents with a mean age of 85 years. Measurements: Consenting residents were assessed by care staff for depression using the CSDD as part of their routine assessment. Specialist clinicians conducted assessment of depression using the Semi-structured Clinical Diagnostic Interview for DSM-IV-TR Axis I Disorders for residents without dementia or the Provisional Diagnostic Criteria for Depression in Alzheimer Disease for residents with dementia to establish expert clinical diagnoses of depression. The diagnostic performance of the staff-completed CSDD was analyzed against expert diagnosis using receiver operating characteristic (ROC) curves. Results: The CSDD showed low diagnostic accuracy, with areas under the ROC curve being 0.69, 0.68 and 0.70 for the total sample, residents with dementia and residents without dementia, respectively. At the standard CSDD cutoff score, the sensitivity and specificity were 71% and 59% for the total sample, 69% and 57% for residents with dementia, and 75% and 61% for residents without dementia. The Youden index (for optimizing cut-points) suggested different depression cutoff scores for residents with and without dementia. Conclusion: When administered by nursing home staff, the clinical utility of the CSDD in identifying depression is highly questionable. The complexity of the scale, the time required for collecting relevant information, and staff skills and knowledge of assessing depression in older people must be considered when using the CSDD in nursing homes.
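The Youden index mentioned above is J = sensitivity + specificity − 1, and the cut-point that maximizes J is commonly taken as optimal. The sketch below applies the formula to the sensitivities and specificities reported at the standard cutoff; the function names and the small cutoff table are hypothetical, since the abstract does not report per-cutoff values.

```python
def youden_j(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(table):
    """Return the cutoff with the largest J from rows of (cutoff, sensitivity, specificity)."""
    return max(table, key=lambda row: youden_j(row[1], row[2]))[0]


if __name__ == "__main__":
    # Values reported in the abstract at the standard CSDD cutoff.
    for group, sens, spec in [("total sample", 0.71, 0.59),
                              ("with dementia", 0.69, 0.57),
                              ("without dementia", 0.75, 0.61)]:
        print(group, round(youden_j(sens, spec), 2))

    # Hypothetical per-cutoff table, for illustration only.
    hypothetical = [(5, 0.90, 0.30), (7, 0.71, 0.59), (9, 0.50, 0.80)]
    print(best_cutoff(hypothetical))
```

At the standard cutoff the reported figures give J of about 0.30, 0.26 and 0.36 for the three groups, all far below the maximum of 1, which is consistent with the low diagnostic accuracy the abstract describes.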

Relevance:

60.00%

Abstract:

Objective: Explosive ordnance disposal (EOD) often requires technicians to wear multiple protective garments in challenging environmental conditions. The cumulative effect of increased metabolic cost coupled with decreased heat dissipation associated with these garments predisposes technicians to high levels of physiological strain. It has been proposed that a perceptual strain index (PeSI), using subjective ratings of thermal sensation and perceived exertion as surrogate measures of core body temperature and heart rate, may provide an accurate estimation of physiological strain. Therefore, this study aimed to determine if the PeSI could estimate the physiological strain index (PSI) across a range of metabolic workloads and environments while wearing heavy EOD and chemical protective clothing. Methods: Eleven healthy males wore an EOD and chemical protective ensemble while walking on a treadmill at 2.5, 4 and 5.5 km·h⁻¹ at 1% grade in environmental conditions equivalent to wet bulb globe temperature (WBGT) 21, 30 and 37 °C. WBGT conditions were randomly presented and a maximum of three randomised treadmill walking trials were completed in a single testing day. Trials ceased after a maximum of 60 min or upon attainment of termination criteria. A Pearson's correlation coefficient, mixed linear model, absolute agreement and receiver operating characteristic (ROC) curves were used to determine the relationship between the PeSI and PSI. Results: A significant moderate relationship between the PeSI and the PSI was observed [r = 0.77; p < 0.001; mean difference = 0.8 ± 1.1 a.u. (modified 95% limits of agreement −1.3 to 3.0)]. The ROC curves indicated that the PeSI had good predictive power when used with two single-threshold cut-offs to differentiate between low and high levels of physiological strain (area under curve: PSI three cut-off = 0.936 and seven cut-off = 0.841). 
Conclusions: These findings support the use of the PeSI for monitoring physiological strain while wearing EOD and chemical protective clothing. However, future research is needed to confirm the validity of the PeSI for active EOD technicians operating in the field.

Relevance:

60.00%

Abstract:

Background: Depression is a common psychiatric disorder in older people. The study aimed to examine the screening accuracy of the Geriatric Depression Scale (GDS) and the Collateral Source version of the Geriatric Depression Scale (CS-GDS) in the nursing home setting. Methods: Eighty-eight residents from 14 nursing homes were assessed for depression using the GDS and the CS-GDS, and validated against clinician-diagnosed depression using the Semi-structured Clinical Diagnostic Interview for DSM-IV-TR Axis I Disorders (SCID) for residents without dementia and the Provisional Diagnostic Criteria for Depression in Alzheimer Disease (PDCdAD) for those with dementia. The screening performances of five versions of the GDS (30-, 15-, 10-, 8-, and 4-item) and two versions of the CS-GDS (30- and 15-item) were analyzed using receiver operating characteristic (ROC) curves. Results: Among residents without dementia, both the self-rated (AUC = 0.75–0.79) and proxy-rated (AUC = 0.67) GDS variations performed significantly better than chance in screening for depression. However, neither instrument adequately identified depression among residents with dementia (AUC between 0.57 and 0.70). Among the GDS variations, the 4- and 8-item scales had the highest AUC and the optimal cut-offs were >0 and >3, respectively. Conclusions: The validity of the GDS in detecting depression requires a certain level of cognitive functioning. While the CS-GDS is designed to remedy this issue by using an informant, it did not have adequate validity in detecting depression among residents with dementia. Further research is needed on informant selection and other factors that can potentially influence the validity of proxy-based measures in the nursing home setting.