15 results for multiple discriminant analysis
in DigitalCommons@The Texas Medical Center
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises the problem of multiple testing and produces false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing, and false discovery rates, patterns of joint effects from several genes, each with a weak effect, may remain undetectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for the millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we took two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck (sIB) method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to examine the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) outperforms using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search of small subsets (one SNP, two SNPs, or three-SNP subsets built from the best 100 composite 2-SNP pairs) can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always improve the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.

Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that the sequential information bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to traditional LDA in this study.

From our results, the best test probability-HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls reaches 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be at least 0.4. Conversely, the highest test accuracy of sIB for diagnosing disease among cases reaches 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study using the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. The WTCCC study results detect only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease by chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which can currently be accomplished in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high predictive ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
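The HMSS filtering criterion described in this abstract can be computed directly from a classifier's confusion matrix. A minimal illustration (the function name and toy counts are ours, not the dissertation's):

```python
def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity (HMSS).

    Unlike raw accuracy, HMSS penalizes a classifier that ignores the
    minority class, which is why it suits imbalanced case/control data.
    """
    sensitivity = tp / (tp + fn)  # true-positive rate among cases
    specificity = tn / (tn + fp)  # true-negative rate among controls
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)

# A classifier that labels everything "control" in a 10%-case sample
# scores 90% accuracy but an HMSS of 0:
print(hmss(tp=0, fn=10, tn=90, fp=0))   # 0.0
print(hmss(tp=8, fn=2, tn=70, fp=20))   # ~0.789 (sens 0.8, spec ~0.778)
```

This is the property the abstract relies on: a degenerate classifier cannot score well on HMSS even when the data are heavily imbalanced.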
Abstract:
The relative influence of race, income, education, and Food Stamp Program participation/nonparticipation on the food and nutrient intake of 102 fecund women ages 18-45 years in a Florida urban clinic population was assessed using multiple regression analysis. Study subgroups were defined by race and Food Stamp Program participation status. Education was found to have the greatest influence on food and nutrient intake. Race was the next most influential factor, followed in order by Food Stamp Program participation and income. The combined effect of the four independent variables explained no more than 19 percent of the variance for any of the food and nutrient intake variables, indicating that a more complex model of influences is needed if variations in food and nutrient intake are to be fully explained.

A socioeconomic questionnaire was administered to investigate other factors of influence. The influence of the mother, the frequency and type of restaurant dining, and perceptions of food intake and weight were found to be factors deserving further study.

Dietary data were collected using the 24-hour recall and a food frequency checklist. Descriptive dietary findings indicated that iron and calcium were nutrients whose adequacy was of concern for all study subgroups. White Food Stamp Program participants had the greatest number of mean nutrient intake values falling below the 1980 Recommended Dietary Allowances (RDAs). When Food Stamp Program participants were contrasted with nonparticipants, mean intakes of six nutrients (kilocalories, calcium, iron, vitamin A, thiamin, and riboflavin) were below the 1980 RDA, compared to five (kilocalories, calcium, iron, thiamin, and riboflavin) for the nonparticipants. Use of the Index of Nutritional Quality (INQ), however, revealed that the quality of the diet of Food Stamp Program participants per 1000 kilocalories was adequate with the exception of calcium and iron.
Intakes of these nutrients were also inadequate on a 1000-kilocalorie basis for the nonparticipant group. When mean nutrient intakes of the groups were compared using Student's t-test, oleic acid intake was the only significant difference found. Nonparticipation in the Food Stamp Program was found to be associated with more frequent consumption of cookies, sweet rolls, doughnuts, and honey. The findings of this study contradict the negative image of the Food Stamp Program participant and emphasize the importance of education.
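The INQ used in this abstract expresses nutrient density per 1000 kilocalories relative to the density implied by the allowance. A sketch of the standard calculation (the example numbers are illustrative, not the study's data):

```python
def inq(nutrient_intake, energy_kcal, nutrient_rda, energy_rda_kcal):
    """Index of Nutritional Quality: the diet's nutrient density
    (per 1000 kcal) divided by the density implied by the RDA.
    INQ >= 1 means the diet would meet the nutrient allowance
    if energy needs were met."""
    diet_density = nutrient_intake / (energy_kcal / 1000.0)
    rda_density = nutrient_rda / (energy_rda_kcal / 1000.0)
    return diet_density / rda_density

# e.g. 9 mg iron on an 1800 kcal diet vs. an 18 mg RDA at 2000 kcal:
print(round(inq(9, 1800, 18, 2000), 2))  # 0.56
```

An INQ below 1, as sketched here for iron, is how a diet can look "adequate per 1000 kilocalories" for most nutrients yet still fall short on calcium and iron.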
Abstract:
Background. Accurate measurement of attitudes toward participation in cancer treatment trials (CTs) and cancer prevention trials (CPTs) across varied groups could assist health researchers and educators in addressing attitudinal barriers to participation in these trials.

Methods. Development of the Attitudes toward Cancer Trials Scales (ACTS) instrument was based on a conceptual model derived from the research literature, clinical practice experience, and empirical testing of items with a sample of 312 respondents. The ACTS contains two scales: the Cancer Trials (CT) scale (4 components; 18 items) and the Cancer Prevention Trials (CPT) scale (3 components; 16 items). Cronbach's alpha values for the CT and CPT scales were 0.86 and 0.89, respectively. These two scales, along with sociodemographic and cancer trial history variables, were distributed in a mail survey of former patients of a large cancer research center. A disproportionate stratified probability sampling procedure yielded 925 usable responses (54% response rate).

Results. The prevalence of favorable attitudes toward CTs and CPTs was 66% and 69%, respectively. There were no significant differences in mean scale scores by cancer site or gender, but African Americans had more favorable attitudes toward CTs than European Americans. Multiple regression analysis indicated that older age, lower education level, and prior CT participation were associated with more favorable attitudes toward CTs; prior CT participation and prior CPT participation were associated with more favorable attitudes toward CPTs. Results also provided evidence of reliability and construct validity for both scales.

Conclusions. Middle age, higher education, and European American ethnicity are associated with less positive attitudes about participating in cancer treatment trials.
The availability of a psychometrically sound instrument to measure attitudes may facilitate a better understanding of decision making regarding participation in CTs and CPTs. It is this author's intention that the ACTS scales will be used by other investigators to measure attitudes toward CTs and CPTs in various groups, and that the many issues regarding participation in trials might become more explicit.
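The Cronbach's alpha reliability estimates reported for the ACTS scales follow the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of totals). A self-contained sketch (the toy item scores are ours, not the survey's data):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of k item-score columns,
    each holding one item's scores for the same n respondents."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # each respondent's total score across the k items
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(c) for c in items) / var(totals))

# three items answered by four respondents; items track each other
# closely, so alpha is high:
items = [[2, 4, 6, 8], [1, 2, 3, 4], [2, 3, 4, 5]]
print(cronbach_alpha(items))  # ~0.9375
```

When items covary strongly, the variance of the totals dwarfs the sum of item variances and alpha approaches 1, which is the sense in which the reported 0.86 and 0.89 indicate internal consistency.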
Abstract:
This study compared three body measurements (height, hip width (bitrochanteric), and foot length) in 120 Hispanic women who had their first birth by cesarean section (N = 60) or by spontaneous vaginal delivery (N = 60). The objective of the study was to determine whether differences in these measurements could be useful in predicting cephalopelvic disproportion. Data were collected from two public hospitals in Houston, Texas over a 10-month period from December 1994 to October 1995. The statistical technique used to evaluate the measures was discriminant analysis.

Women who delivered by cesarean section were older, were shorter, had shorter feet, and delivered heavier infants. There were no differences in the bitrochanteric widths of the women or in the mean gestational age or Apgar scores of the infants.

Significantly more of the mothers and infants were ill following cesarean section delivery. Maternal illness was usually infection; infant illness was primarily infection or respiratory difficulties.

Discriminant analysis is a technique that allows for classification and prediction of the group to which a particular entity will belong, given a certain set of variables. Using discriminant analysis with a prior probability of cesarean section of 50 percent, the best combination for classifying who would have a cesarean section was height and hip width, correctly classifying 74.2 percent of those who needed surgery. When the prior probability of cesarean section was 10 percent and that of vaginal delivery was 90 percent, the best predictors of who would need operative delivery were height, hip width, and age, correctly classifying 56.2 percent.
In the population from which the study participants were selected, the incidence of cephalopelvic disproportion was low, approximately 1 percent.

With the technologic assistance available in most of the developed world, further pursuit of different measures is unlikely to be of much benefit in attempting to predict and diagnose disproportion. However, in areas of the world where much of obstetrics is "hands on," the availability of technology is extremely limited, and the incidence of disproportion is larger, anthropometric measures might be useful and of some potential benefit.
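Two-group discriminant analysis of the kind used here reduces, in its simplest form, to Fisher's linear discriminant: project each case onto a single axis and classify by a threshold. A minimal numpy sketch with invented toy measurements (the data and the equal-prior threshold are illustrative; the study's actual model and priors differ):

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-class Fisher discriminant: w = Sw^{-1} (m1 - m0),
    with the threshold at the projected midpoint of the class means
    (i.e., equal prior probabilities). Returns classify(x) -> 0 or 1."""
    X0, X1 = np.asarray(X0, float), np.asarray(X1, float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class scatter matrix
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)   # discriminant direction
    c = w @ (m0 + m1) / 2.0            # midpoint threshold
    return lambda x: int(np.asarray(x, float) @ w > c)

# toy (height cm, hip width cm): vaginal delivery = 0, cesarean = 1
vaginal  = [[158, 31], [162, 32], [165, 33], [160, 31]]
cesarean = [[150, 30], [152, 29], [149, 31], [153, 30]]
classify = fisher_lda(vaginal, cesarean)
print(classify([151, 30]), classify([163, 32]))  # 1 0
```

Changing the assumed prior probabilities shifts the threshold c, which is exactly why the abstract reports different best predictor sets at 50 percent versus 10 percent priors.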
Abstract:
It has recently been proposed that evaluating the effects of pollutants on aquatic organisms can provide an early warning system for potential environmental and human health risks (NRC 1991). Unfortunately, few methods are available to aquatic biologists for assessing the effects of pollutants on aquatic animal community health. The primary goal of this research was to develop and evaluate the feasibility of such a method. Specifically, the primary objective was to develop a prototype rapid bioassessment technique, similar to the Index of Biotic Integrity (IBI), for the upper Texas and northwestern Gulf of Mexico coastal tributaries. The IBI consists of a series of "metrics," each describing a specific attribute of the aquatic community. Each metric is given a score, and the scores are summed to derive a total assessment of the "health" of the aquatic community. This IBI procedure may provide an additional assessment tool for professionals in water quality management.

The experimental design consisted primarily of compiling data previously collected through monitoring conducted by the Texas Natural Resource Conservation Commission (TNRCC) at five bayous classified according to potential for anthropogenic impact and salinity regime. Standardized hydrological, chemical, and biological monitoring had been conducted in each of these watersheds. Candidate metrics for inclusion in the estuarine IBI were identified and evaluated through correlation analysis, cluster analysis, stepwise and normal discriminant analysis, and evaluation of cumulative distribution frequencies. Scores for each included metric were determined based on exceedances of specific percentiles. Individual scores were summed, and a total IBI score and rank for the community computed.

These analyses yielded the proposed metrics and rankings listed in this report.
Based on the results of this study, incorporation of an estuarine IBI method as a water quality assessment tool is warranted. Adopted metrics were correlated with seasonal trends and, to a lesser extent, with the salinity gradients observed during the study (0-25 ppt). Further refinement of this method is needed using a larger, more inclusive data set that includes additional habitat types, salinity ranges, and temporal variation.
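The percentile-based scoring and summing described above can be sketched as follows. Note the 1-3-5 scoring convention and the example metrics are assumptions for illustration; the study derived its own cutoffs from cumulative distribution frequencies:

```python
def score_metric(value, p25, p75, higher_is_better=True):
    """Score one IBI metric 1/3/5 by where its observed value falls
    relative to reference-distribution percentiles (an assumed
    1-3-5 convention, not the study's exact cutoffs)."""
    if not higher_is_better:
        # flip the axis so "exceeds the 75th percentile" stays "best"
        value, p25, p75 = -value, -p75, -p25
    if value >= p75:
        return 5
    if value >= p25:
        return 3
    return 1

def ibi_total(metrics):
    """Total IBI = sum of the individual metric scores."""
    return sum(score_metric(*m) for m in metrics)

# (observed value, 25th pct, 75th pct, higher_is_better)
community = [
    (18, 10, 20, True),          # taxa richness          -> 3
    (0.40, 0.15, 0.35, False),   # % tolerant individuals -> 1
    (6, 3, 5, True),             # number of trophic guilds -> 5
]
print(ibi_total(community))  # 9
```

The total is then compared across sites or ranked, which is what makes the index usable as a single "health" number in water quality management.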
Abstract:
Background. The increasing emphasis on medical outcomes and cost containment has made it imperative to identify patient populations in which aggressive nutritional care can improve quality of care. The aim of this prospective study was to implement a standardized early jejunal feeding protocol for patients undergoing small and large bowel resection, and to evaluate its effect on patient outcome and cost.

Methods. Treatment patients (n = 81) who met protocol inclusion criteria had a jejunal feeding tube inserted at the time of surgery. Feeding was initiated at 10 cc/hour within 12 hours after bowel resection and advanced if the patient was hemodynamically stable. The control group (n = 159) received usual care. Outcome measures included postoperative length of stay, total direct cost, nosocomial infection rate, and health status (SF-36) scores.

Results. By postoperative day 4, use of total parenteral nutrition (TPN) was significantly greater in the control group than in the treatment group, yet total nutritional intake was significantly less. Multiple regression analysis indicated an increased likelihood of infection with the use of TPN. The treatment group showed a reduction of 3.5 postoperative days (p = .013), 4.3 fewer TPN days per patient (p = .001), and a 9.6% reduction in infection rate (p = .042). There was no difference in health status scores between groups at discharge or at 3 months post-discharge.

Conclusion. These positive outcomes and an average total cost savings of $4,145 per treatment patient indicate that the treatment protocol was effective.
Abstract:
The purpose of this study was to examine the relationship between enterotoxigenic Escherichia coli (ETEC) and travelers' diarrhea over a period of five years in Guadalajara, Mexico. Specifically, this study identified and characterized ETEC from travelers with diarrhea. The objectives were to study the colonization factor antigens, toxins, and antibiotic sensitivity patterns in ETEC from 1992 to 1997, and to study the molecular epidemiology of ETEC by plasmid content and DNA restriction fragment patterns.

In this survey of travelers' diarrhea in Guadalajara, Mexico, 928 travelers with diarrhea were screened for enteric pathogens between 1992 and 1997. ETEC were isolated from 195 (19.9%) of the patients, representing the most frequently identified enteric pathogen.

A total of 31 antimicrobial susceptibility patterns were identified among ETEC isolates over the five-year period.

The 195 ETEC isolates contained two to six plasmids each, ranging in size from 2.0 to 23 kbp.

Three distinct, reproducible rRNA gene restriction patterns (ribotypes R-1 to R-3) were obtained among the 195 isolates with the enzyme HindIII.

Colonization factor antigens (CFAs) were identified in 99 (51%) of the 195 ETEC strains studied.

Cluster analysis of the observations from the four assays confirmed five distinct groups of study-year strains of ETEC. Each group had a >95% similarity level among strains within the group and a <60% similarity level between groups. In addition, discriminant analysis of the assay variables used to predict the ETEC strains revealed a >80% relationship between study-year and both the plasmid and rRNA content of the strains.

These findings, based on laboratory observations of differences in biochemical, antimicrobial susceptibility, plasmid, and ribotype content, suggest a complex epidemiology for ETEC strains in a population with travelers' diarrhea.
The findings of this study may have implications for our understanding of the epidemiology, transmission, treatment, control, and prevention of the disease. It has been suggested that an ETEC vaccine for humans should contain the most prevalent CFAs; it is therefore important to know the prevalence of these factors in ETEC in various geographical areas.

The CFAs described in this dissertation may be used in epidemiological studies evaluating the prevalence of CFAs and other properties of ETEC. Furthermore, despite an intense search among nearly 200 ETEC isolates for strains with a clonal relationship, we failed to identify such strains. Further studies are in progress to construct suitable live vaccine strains and to introduce several CFAs into the same host organism by recombinant DNA techniques (Dr. Ann-Mari Svennerholm's lab). (Abstract shortened by UMI.)
Abstract:
Objective. This study examines post-crisis family stress, coping, communication, and adaptation, using the Double ABC-X Model of Family Adaptation, in families with a pregnant or postpartum adolescent living at home.

Methods. Ninety-eight pregnant and parenting adolescents between ages 14 and 18 years (Group 1 at 20 or more weeks gestation; Group 2 at delivery and 8 weeks postpartum) and their parent(s) completed instruments congruent with the model to measure family stress, coping, communication, and adaptation. Descriptive family data were obtained. Mother-daughter data were analyzed for between-subject and within-subject differences using paired t-tests. Correlational analysis was used to examine relationships among variables.

Results. More than 90% of families were Hispanic. There were no significant differences between mother and daughter mean scores for family stress or communication. Adolescent coping was not significantly correlated with family coping at any interval. Adolescent family adaptation scores were significantly lower than mothers' scores at delivery and 8 weeks postpartum. Mean individual ratings of family variables did not differ significantly between delivery and 8 weeks postpartum. Simultaneous multiple regression analysis showed that stress, coping, and communication significantly influenced adaptation for mothers and daughters at all three intervals. The relative contributions of the three independent variables exhibited different patterns for mothers and daughters. Parent-adolescent communication accounted for most of the variability in adaptation for daughters at all three intervals. Daughters' family stress ratings were significant for adaptability (p = .01) during the pregnancy and for cohesion (p = .03) at delivery. Adolescent coping (p = .03) was significant for cohesion at 8 weeks postpartum. Family stress was a significant influence at all three intervals on mothers' ratings of family adaptation.
Parent-adolescent communication was significant for mothers' perception of both family cohesion (p < .001) and adaptability (p < .001) at delivery and 8 weeks, but not during pregnancy.

Conclusions. Mothers' and daughters' ratings of family processes were similar for family stress and communication, but differed significantly for family adaptation. Adolescent coping may not reflect family coping. Family communication is a powerful component of family functioning and may be an important focus for interventions with adolescents and parents.
Abstract:
Background/significance. The scarcity of reliable and valid Spanish-language instruments for health-related research has hindered research with the Hispanic population. Research suggests that fatalistic attitudes are related to poor cancer screening behaviors and may be one reason for the low participation of Mexican-Americans in cancer screening. This problem is of major concern because Mexican-Americans constitute the largest Hispanic subgroup in the U.S.

Purpose. The purposes of this study were: (1) to translate the Powe Fatalism Inventory (PFI) into Spanish and culturally adapt the instrument to the Mexican-American culture found along the U.S.-Mexico border, and (2) to test the equivalence between the Spanish-translated, culturally adapted version of the PFI (SPFI) and the English version with respect to clarity, content validity, reading level, and reliability.

Design. Descriptive, cross-sectional.

Methods. The Spanish-language translation used a translation model that incorporates a cultural adaptation process. The SPFI was administered to 175 bilingual participants residing in a midsize U.S.-Mexico border city. Data analysis included estimation of Cronbach's alpha, factor analysis, paired-samples t-test comparison, and multiple regression analysis using SPSS software, as well as measurement of the content validity and reading level of the SPFI.

Findings. The reliability estimate (Cronbach's alpha) was 0.81 for the SPFI, compared to 0.80 for the PFI in this study. Factor analysis extracted four factors that explained 59% of the variance. Paired t-test comparison revealed no statistically significant differences between SPFI and PFI total or individual item scores. The Content Validity Index was 1.0, and the reading level was assessed at below the 6th-grade level. The correlation coefficient between the SPFI and PFI was 0.95.

Conclusions.
This study provided strong psychometric evidence that the Spanish-translated, culturally adapted SPFI is equivalent to the English-version PFI in measuring cancer fatalism. This indicates that the two forms of the instrument can be used interchangeably in a single study to accommodate the reading and speaking abilities of respondents.
Abstract:
The main objective of this study was to develop indicators for measuring the food safety status of a country. A conceptual model was put forth by the investigator, on the assumption that food safety status is influenced multifactorially by medico-health levels, food-nutrition programs, and consumer protection activities, all of which in turn depend upon the socio-economic status of the country.

Twenty-six indicators were reviewed and examined. Seventeen passed an initial screening, and three were finally selected, by stepwise multiple regression analysis, to reflect food safety status. Sixty-one countries/areas were included in this study.

The three indicators were life expectancy at birth (R² contribution = 34.62%), adult literacy rate (R² contribution = 29.66%), and child mortality rate for ages 1-4 (R² contribution = 9.99%), for a cumulative R² of 57.79%.
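The stepwise selection described above adds, at each step, the indicator that most improves R². A simplified forward-selection sketch on synthetic data (real stepwise regression also drops variables that lose significance; the data here are invented):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X
    (an intercept column is added automatically)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

def forward_select(X, y, n_keep):
    """Greedy forward selection: at each step, add the predictor
    that most increases the fitted model's R^2."""
    chosen = []
    for _ in range(n_keep):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(rest, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen, r_squared(X[:, chosen], y)

# synthetic data: y driven mainly by predictor 2, weakly by predictor 0
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0] + 0.1 * rng.normal(size=60)
chosen, r2 = forward_select(X, y, 2)
print(chosen)  # [2, 0]
```

The per-step increments in R² are what the abstract reports for each indicator, and their sum is the cumulative R² of the final model.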
Abstract:
The purpose of this study was to assess the impact of the Arkansas Long-Term Care Demonstration Project upon Arkansas' Medicaid expenditures and upon the clients it serves. A retrospective Medicaid expenditure study component used analysis-of-variance techniques to test the Project's effects on aggregated expenditures for 28 demonstration and control counties, representing 25 percent of the State's population, over four years (1979-1982).

A second approach to the study question utilized a 1982 prospective sample of 458 demonstration and control clients from the same 28 counties. The disability level, or need for care, of each patient was established a priori. The extent to which an individual's variation in Medicaid utilization and costs was explained by patient need, the presence or absence of the channeling project's placement decision, or some other patient characteristic was examined by multiple regression analysis. Long-term and acute care Medicaid, Medicare, third-party, self-pay, and the grand total of all Medicaid claims were analyzed for project effects and explanatory relationships.

The main project effect was to increase personal care costs without reducing nursing home or acute care costs (prospective study). Expansion of clients appeared to occur in personal care (prospective study) and minimum-care nursing homes (retrospective study) in the project areas. Cost-shifting between Medicaid and Medicare in the project areas and two different patterns of utilization in the North and South projects tended to offset each other, such that no differences in total costs occurred between the project areas and demonstration areas. The project was significant (β = .22, p < .001) only for personal care costs. The explanatory power of this personal care regression model (R² = .36) was comparable to other reported health services utilization models.
Other variables (Medicare buy-in, level of disability, Social Security Supplemental Income (SSI), net monthly income, North/South area, and age) explained more of the variation in the other twelve cost regression models.
Abstract:
The National Health Planning and Resources Development Act of 1974 (Public Law 93-641) requires that health systems agencies (HSAs) plan for their health service areas using existing data to the maximum extent practicable. Health planning is based on the identification of health needs; however, HSAs at present identify health needs in their service areas only in approximate terms. This lack of specificity has greatly reduced the effectiveness of health planning. The intent of this study is therefore to explore the feasibility of predicting community levels of hospitalized morbidity by diagnosis from existing data, so as to allow health planners to plan for the services associated with specific diagnoses.

The specific objectives of this study are (a) to obtain, by means of multiple regression analysis, a prediction equation for hospital admissions by diagnosis, i.e., to select the variables related to demand for hospital admissions; (b) to examine how pertinent the selected variables are; and (c) to see whether each equation obtained predicts well for health service areas.

The existing data on hospital admissions by diagnosis are those collected from the National Hospital Discharge Surveys, which are available in a form aggregated to the nine census divisions. When equations established with such data are applied to local health service areas for prediction, the application is subject to the criticism of ecological fallacy. Since HSAs must rely on the availability of existing data, it is imperative to examine whether or not the ecological fallacy holds true in this case.

The results of the study show that the equations established are highly significant and that the independent variables in the equations explain the variation in demand for hospital admissions well.
The predictability of these equations is good when they are applied to areas at the same ecological level, but becomes poor, predominantly due to ecological fallacy, when they are applied to health service areas.

It is concluded that HSAs cannot predict hospital admissions by diagnosis without primary data collection, which Public Law 93-641 discourages.
Abstract:
The efficacy of waste stabilization lagoons for the treatment of five priority pollutants and two widely used commercial compounds was evaluated in laboratory model ponds. Three ponds were designed to simulate a primary anaerobic lagoon, a secondary facultative lagoon, and a tertiary aerobic lagoon. Biodegradation, volatilization, and sorption losses were quantified for bis(2-chloroethyl) ether, benzene, toluene, naphthalene, phenanthrene, ethylene glycol, and ethylene glycol monoethyl ether. A statistical model using a log-normal transformation indicated that biodegradation of bis(2-chloroethyl) ether followed first-order kinetics. Additionally, multiple regression analysis indicated that biochemical oxygen demand was the water quality variable most highly correlated with bis(2-chloroethyl) ether effluent concentration.
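First-order kinetics under a log transformation means ln C(t) = ln C0 - kt is a straight line, so the rate constant can be recovered by ordinary least squares on the logged concentrations. A sketch with synthetic decay data (the rate constant and concentrations are invented, not the study's measurements):

```python
import math

def fit_first_order(times, concs):
    """Fit C(t) = C0 * exp(-k t) by least squares on ln(C):
    ln C = ln C0 - k t, a straight line in t whose slope is -k."""
    n = len(times)
    y = [math.log(c) for c in concs]
    tbar = sum(times) / n
    ybar = sum(y) / n
    slope = sum((t - tbar) * (yi - ybar) for t, yi in zip(times, y)) \
          / sum((t - tbar) ** 2 for t in times)
    k = -slope
    c0 = math.exp(ybar + k * tbar)  # back-transform the intercept
    return c0, k

# synthetic decay with k = 0.3 per day, C0 = 100 mg/L
ts = [0, 1, 2, 4, 8]
cs = [100 * math.exp(-0.3 * t) for t in ts]
c0, k = fit_first_order(ts, cs)
print(round(c0, 1), round(k, 3))  # 100.0 0.3
```

With noisy measurements the same fit yields an estimated k with the usual least-squares properties, which is how a log-transformed statistical model supports the conclusion that degradation followed first-order kinetics.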
Abstract:
Using a retrospective cross-sectional approach, this study quantitatively analyzed foodborne illness data, restaurant inspection data, and census-derived socioeconomic and demographic data for Harris County, Texas between 2005 and 2010. The main research question was the extent to which contextual and regulatory conditions distinguish outbreak from non-outbreak establishments within Harris County. Two groups of Harris County establishments were analyzed: outbreak and non-outbreak restaurants. STATA 11 was employed to determine the average profile of each category across both the regulatory and the socioeconomic (contextual) variables. Cross-tabulations of all non-quantitative variables were also performed, and finally a discriminant analysis was conducted to assess how well the variables could allocate the restaurants into their respective categories. Contextual and regulatory conditions were found to be only minimally associated with the occurrence of foodborne outbreaks within Harris County. Across both categories (outbreak and non-outbreak establishments), the variables included were extremely similar in means and, where observable, in distributions. The variables analyzed in this study, both regulatory and contextual, did not significantly allocate the establishments into their correct outbreak or non-outbreak categories. The implications of these findings are that the regulatory processes and guidelines in place in Harris County do not effectively distinguish outbreak from non-outbreak restaurants, and that no socioeconomic or racial/ethnic patterns are apparent in the incidence of foodborne disease in the county.