7 results for Count Data
in DigitalCommons@The Texas Medical Center
Abstract:
Persistently low white blood cell count (WBC) and neutrophil count is a well-described phenomenon in persons of African ancestry, the etiology of which remains unknown. We recently used admixture mapping to identify an approximately 1-megabase region on chromosome 1, where ancestry status (African or European) almost entirely accounted for the difference in WBC between African Americans and European Americans. To identify the specific genetic change responsible for this association, we analyzed genotype and phenotype data from 6,005 African Americans from the Jackson Heart Study (JHS), the Health, Aging and Body Composition (Health ABC) Study, and the Atherosclerosis Risk in Communities (ARIC) Study. We demonstrate that the causal variant must be at least 91% different in frequency between West Africans and European Americans. An excellent candidate is the Duffy Null polymorphism (SNP rs2814778 at chromosome 1q23.2), which is the only polymorphism in the region known to be so differentiated in frequency and is already known to protect against Plasmodium vivax malaria. We confirm that rs2814778 is predictive of WBC and neutrophil count in African Americans above and beyond the previously described admixture association (P = 3.8 × 10⁻⁵), establishing a novel phenotype for this genetic variant.
Abstract:
OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. RESULTS: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects the performance of PageRank more than that of simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
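The two strongest strategies named in this abstract, simple citation count and PageRank, can be contrasted with a minimal sketch. The toy citation graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions, not the authors' data or settings.

```python
def citation_counts(citations):
    """Count incoming citations for every article in the graph."""
    counts = {article: 0 for article in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] += 1
    return counts


def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping article -> articles it cites."""
    nodes = list(citations)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by articles that cite nothing is spread evenly over all articles.
        dangling = sum(rank[node] for node in nodes if not citations[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for cited in citations[node]:
                new_rank[cited] += damping * rank[node] / len(citations[node])
        rank = new_rank
    return rank


# Hypothetical toy graph: each key cites the articles in its list.
toy_citations = {
    "A": [],
    "B": ["A"],
    "C": ["A", "B"],
    "D": ["A", "C"],
    "E": ["B"],
}

counts = citation_counts(toy_citations)
ranks = pagerank(toy_citations)
by_count = sorted(toy_citations, key=lambda a: counts[a], reverse=True)
by_rank = sorted(toy_citations, key=lambda a: ranks[a], reverse=True)
```

Citation count weighs every incoming citation equally, whereas PageRank weighs a citation by the rank of the citing article, which is one plausible reason citation lag could affect the two measures differently.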
Abstract:
Information overload is a significant problem for modern medicine. Searching MEDLINE for common topics often retrieves more relevant documents than users can review. Therefore, we must identify documents that are not only relevant, but also important. Our system ranks articles using citation counts and the PageRank algorithm, incorporating data from the Science Citation Index. However, citation data is usually incomplete. Therefore, we explore the relationship between the quantity of citation information available to the system and the quality of the result ranking. Specifically, we test the ability of citation count and PageRank to identify "important articles" as defined by experts from large result sets with decreasing citation information. We found that PageRank performs better than simple citation counts, but both algorithms are surprisingly robust to information loss. We conclude that even an incomplete citation database is likely to be effective for importance ranking.
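The idea of ranking with progressively less citation information can be sketched as follows. This is a toy illustration only: the edge list, the uniform random removal of citations, and the top-k overlap measure are all assumptions, and the abstract does not describe the authors' actual evaluation procedure in this detail.

```python
import random


def citation_counts(edges, articles):
    """Count incoming citations from a list of (citing, cited) pairs."""
    counts = {article: 0 for article in articles}
    for _citing, cited in edges:
        counts[cited] += 1
    return counts


def top_k(counts, k):
    """Return the k most-cited articles; ties are broken alphabetically."""
    return set(sorted(counts, key=lambda a: (-counts[a], a))[:k])


def overlap_after_loss(edges, articles, keep_fraction, k, seed=0):
    """Fraction of the full-data top-k that survives when citations are dropped."""
    rng = random.Random(seed)
    kept = [edge for edge in edges if rng.random() < keep_fraction]
    full = top_k(citation_counts(edges, articles), k)
    degraded = top_k(citation_counts(kept, articles), k)
    return len(full & degraded) / k


# Hypothetical articles and citation pairs, for illustration only.
articles = ["A", "B", "C", "D", "E", "F"]
edges = [
    ("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"),
    ("C", "B"), ("D", "B"), ("F", "B"),
    ("E", "C"), ("F", "D"),
]

full_overlap = overlap_after_loss(edges, articles, keep_fraction=1.0, k=2)
half_overlap = overlap_after_loss(edges, articles, keep_fraction=0.5, k=2)
```

Plotting the overlap against `keep_fraction` for a realistic citation graph would give a rough picture of how robust a ranking is to missing citation data.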
Abstract:
Information overload is a significant problem for modern medicine. Searching MEDLINE for common topics often retrieves more relevant documents than users can review. Therefore, we must identify documents that are not only relevant, but also important. Our system ranks articles using citation counts and the PageRank algorithm, incorporating data from the Science Citation Index. However, citation data is usually incomplete. Therefore, we explore the relationship between the quantity of citation information available to the system and the quality of the result ranking. Specifically, we test the ability of citation count and PageRank to identify "important articles" as defined by experts from large result sets with decreasing citation information. We found that PageRank performs better than simple citation counts, but both algorithms are surprisingly robust to information loss. We conclude that even an incomplete citation database is likely to be effective for importance ranking.
Abstract:
In numerous intervention studies and education field trials, random assignment to treatment occurs in clusters rather than at the level of observation. This departure from random assignment of units may be due to logistics, political feasibility, or ecological validity. Data within the same cluster or grouping are often correlated. Application of traditional regression techniques, which assume independence between observations, to clustered data produces consistent parameter estimates. However, such estimators are often inefficient compared to methods that incorporate the clustered nature of the data into the estimation procedure (Neuhaus 1993) [1]. Multilevel models, also known as random effects or random components models, can be used to account for the clustering of data by estimating higher level, or group, as well as lower level, or individual, variation. Designing a study in which the unit of observation is nested within higher level groupings requires the determination of sample sizes at each level. This study investigates how the design and analysis of various sampling strategies for a 3-level repeated measures design affect the parameter estimates when the outcome variable of interest follows a Poisson distribution.
Results suggest that second-order penalized quasi-likelihood (PQL) estimation produces the least biased estimates in the 3-level multilevel Poisson model, followed by first-order PQL and then second- and first-order marginal quasi-likelihood (MQL). The MQL estimates of both fixed and random parameters are generally satisfactory when the level 2 and level 3 variation is less than 0.10. However, as the higher level error variance increases, the MQL estimates become increasingly biased. If convergence of the estimation algorithm is not obtained by the PQL procedure and the higher level error variance is large, the estimates may be significantly biased. In this case, bias correction techniques such as bootstrapping should be considered as an alternative procedure. For larger sample sizes, those structures with 20 or more units sampled at levels with normally distributed random errors produced more stable estimates with less sampling variance than structures with an increased number of level 1 units. For small sample sizes, sampling fewer units at the level with Poisson variation produces less sampling variation; however, this criterion is no longer important when sample sizes are large.
[1] Neuhaus J (1993). "Estimation Efficiency and Tests of Covariate Effects with Clustered Binary Data". Biometrics, 49, 989–996.
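For orientation, a 3-level Poisson model of the kind this abstract describes can be written as below. The random-intercept form, the single covariate, and the symbol names are assumptions made for illustration; the abstract does not give the exact specification.

```latex
% i = measurement occasion (level 1), j = individual (level 2), k = cluster (level 3)
\begin{align}
  y_{ijk} \mid \lambda_{ijk} &\sim \mathrm{Poisson}(\lambda_{ijk}), \\
  \log \lambda_{ijk} &= \beta_0 + \beta_1 x_{ijk} + u_{jk} + v_{k}, \\
  u_{jk} &\sim N(0, \sigma_u^2), \qquad v_{k} \sim N(0, \sigma_v^2).
\end{align}
```

In this notation, the "level 2 and level 3 variation" in the abstract corresponds to \sigma_u^2 and \sigma_v^2, and the Poisson variation enters only at level 1.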
Abstract:
Patients who started HAART (Highly Active Anti-Retroviral Treatment) under the previous aggressive DHHS guidelines (1997) underwent lifelong continuous HAART, which was associated with many short-term as well as long-term complications. Many interventions have attempted to reduce those complications, including intermittent treatment, also called pulse therapy. Many studies have examined the determinants of the rate of fall in CD4 count after interruption, as these data would help guide treatment interruptions. The data set used here was part of a cohort study under way at the Johns Hopkins AIDS Service since January 1984, in which the data were collected both prospectively and retrospectively. The data set comprised 47 patients receiving pulse therapy with the aim of reducing long-term complications.
The aim of this project was to study the impact of virologic and immunologic factors on the rate of CD4 loss after treatment interruption. The exposure variables under investigation included CD4 cell count and viral load at treatment initiation. The rates of change of CD4 cell count after treatment interruption were estimated from observed data using advanced longitudinal data analysis methods (i.e., a linear mixed model). Random effects accounted for the repeated measures of CD4 per person after treatment interruption. The regression coefficient estimates from the model were then used to produce subject-specific rates of CD4 change, accounting for group trends in change. The exposure variables of interest were age, race, gender, and CD4 cell counts and HIV RNA levels at HAART initiation.
The rate of fall of CD4 count did not depend on CD4 cell count or viral load at initiation of treatment. Thus, these factors may not be used to determine who has a chance of successful treatment interruption. CD4 and viral load were studied again by t-tests and ANOVA after grouping based on medians and quartiles, to detect any difference in the mean rate of CD4 fall after interruption. There was no significant difference between the groups, suggesting that there was no association between the rate of fall of CD4 after treatment interruption and the above-mentioned exposure variables.
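One common way to obtain subject-specific rates of change from a linear mixed model is a random intercept and random slope on time. The form below is a sketch under that assumption; the abstract states only that a linear mixed model with random effects was used.

```latex
% i = patient, j = measurement occasion, t_{ij} = time since treatment interruption
\begin{align}
  \mathrm{CD4}_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij}, \\
  (b_{0i}, b_{1i})^{\top} &\sim N(\mathbf{0}, \mathbf{G}), \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2).
\end{align}
```

Under this form, the group trend is \beta_1 and the estimated rate of CD4 change for patient i is \hat{\beta}_1 + \hat{b}_{1i}, which can then be compared across exposure groups.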
Abstract:
Purpose. This project was designed to describe the association between wasting and CD4 cell counts in HIV-infected men in order to better understand the role of wasting in the progression of HIV infection.
Methods. Baseline and prevalence data were collected from a cross-sectional survey of 278 HIV-infected men seen at the Houston Veterans Affairs Medical Center Special Medicine Clinic from June 1, 1991 to January 1, 1994. A follow-up study was conducted among those at risk to investigate the incidence of wasting and the association between wasting and low CD4 cell counts. Wasting was described by four methods: Z-scores for age-, sex-, and height-adjusted weight; sex- and age-adjusted mid-arm muscle circumference (MAMC); fat-free mass (FFM); and a ratio of extracellular mass (ECM) to body-cell mass (BCM) > 1.20. FFM, ECM, and BCM were estimated from bioelectrical impedance analysis. MAMC was calculated from triceps skinfold and mid-arm circumference. The relationship between wasting and covariates was examined with logistic regression in the cross-sectional study and with Poisson regression in the follow-up study. The association between death and wasting was examined with Cox regression.
Results. The prevalence of wasting ranged from 5% (weight and ECM:BCM) to almost 14% (MAMC and FFM) among the 278 men examined. The odds ratio for wasting associated with baseline CD4 cell count < 200 was significant for each method except weight, and ranged from 4.6 to 12.7. Use of antiviral therapy was significantly protective against wasting by MAMC, FFM, and ECM:BCM (OR ≈ 0.2), whereas the need for antibacterial therapy was a risk (OR 3.1, 95% CI 1.1-8.7). The average incidence of wasting ranged from 4 to 16 per 100 person-years among the approximately 145 men followed for 160 person-years. Low CD4 cell count seemed to increase the risk of wasting, but statistical significance was not reached. The effect of the small sample size on the power to detect a significant association should be considered. Wasting by MAMC and FFM was significantly associated with death after adjusting for baseline serum albumin concentration and CD4 cell count.
Conclusions. Wasting by MAMC and FFM was strongly associated with baseline CD4 cell counts in both the prevalence and incidence studies and was a strong predictor of death. Of the two methods, MAMC is convenient, has available reference population data, and may be the most appropriate for assessing the nutritional status of HIV-infected men.
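The incidence figures in this abstract are expressed per 100 person-years. A minimal sketch of that arithmetic follows; the case count of 8 is a hypothetical value chosen for illustration (the abstract does not report case counts), while the 160 person-years of follow-up comes from the abstract.

```python
def incidence_per_100_person_years(new_cases, person_years):
    """Incidence rate scaled to 100 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100.0 * new_cases / person_years


# Hypothetical: 8 new cases of wasting over the 160 person-years of follow-up.
rate = incidence_per_100_person_years(new_cases=8, person_years=160.0)
```

With these illustrative inputs the rate is 5 per 100 person-years, which falls within the 4 to 16 range reported across the four wasting definitions.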