923 results for Categorical variable
Abstract:
BACKGROUND Listeria (L.) monocytogenes causes fatal infections in many species including ruminants and humans. In ruminants, rhombencephalitis is the most prevalent form of listeriosis. Using multilocus variable number tandem repeat analysis (MLVA) we recently showed that L. monocytogenes isolates from ruminant rhombencephalitis cases are distributed over three genetic complexes (designated A, B and C). However, the majority of rhombencephalitis strains and virtually all those isolated from cattle cluster in MLVA complex A, indicating that strains of this complex may have increased neurotropism and neurovirulence. The aim of this study was to investigate whether ruminant rhombencephalitis strains have an increased ability to propagate in the bovine hippocampal brain-slice model and can be discriminated from strains of other sources. For this study, forty-seven strains were selected and assayed on brain-slice cultures, a bovine macrophage cell line (BoMac) and a human colorectal adenocarcinoma cell line (Caco-2). They were isolated from ruminant rhombencephalitis cases (n = 21) and other sources including the environment, food, human neurolisteriosis cases and ruminant/human non-encephalitic infection cases (n = 26). RESULTS All but one L. monocytogenes strain replicated in brain slices, irrespective of the source of the isolate or MLVA complex. The replication of strains from MLVA complex A was increased in hippocampal brain-slice cultures compared to complex C. Immunofluorescence revealed that microglia are the main target cells for L. monocytogenes and that strains from MLVA complex A caused larger infection foci than strains from MLVA complex C. Additionally, they caused larger plaques in BoMac cells, but not in Caco-2 cells. CONCLUSIONS Our brain-slice model data show that all L. monocytogenes strains should be considered potentially neurovirulent.
Secondly, encephalitis strains cannot be conclusively discriminated from non-encephalitis strains with the bovine organotypic brain-slice model. The data indicate that MLVA complex A strains are particularly adept at establishing encephalitis, possibly by virtue of their higher resistance to antibacterial defense mechanisms in microglial cells, the main target of L. monocytogenes.
Abstract:
Several theories assume that successful team coordination is partly based on knowledge that helps anticipate the individual contributions necessary in a situational task. It has been argued that a more ecological perspective needs to be considered in contexts evolving dynamically and unpredictably. In football, defensive plays are usually coordinated according to strategic concepts spanning all members and large areas of the playfield. On the other hand, fewer people are involved in offensive plays as these are less projectable and strongly constrained by ecological characteristics. The aim of this study is to test the effects of ecological constraints and player knowledge on decision making in offensive game scenarios. It is hypothesized that both knowledge about team members and situational constraints will influence decisional processes. Effects of situational constraints are expected to be of higher magnitude. Two teams playing in the fourth league of the Swiss Football Federation participate in the study. Forty customized game scenarios were developed based on the coaches’ information about player positions and game strategies. Each player was shown in ball possession four times. Participants were asked to take the perspective of the player on the ball and to choose a passing destination and a recipient. Participants then rated domain-specific strengths (e.g., technical skills, game intelligence) of each of their teammates. Multilevel models for categorical dependent variables (team members) will be specified. Player knowledge (rated skills) and ecological constraints (operationalized as each player’s proximity and availability for ball reception) are included as predictor variables. Data are currently being collected. Results will yield effects of parameters that are stable across situations as well as of variable parameters that are bound to situational context.
These will enable insight into the degree to which ecological constraints and more enduring team knowledge are involved in decisional processes aimed at coordinating interpersonal action.
Abstract:
The Interstellar Boundary Explorer (IBEX) has observed the interstellar neutral (ISN) gas flow over the past 6 yr during winter/spring when the Earth's motion opposes the ISN flow. Since IBEX observes the interstellar atom trajectories near their perihelion, we can use an analytical model based upon orbital mechanics to determine the interstellar parameters. Interstellar flow latitude, velocity, and temperature are coupled to the flow longitude and are restricted by the IBEX observations to a narrow tube in this parameter space. In our original analysis we found that pointing the spacecraft spin axis slightly out of the ecliptic plane significantly influences the ISN flow vector determination. Introducing the spacecraft spin axis tilt into the analytical model has shown that IBEX observations with various spin axis tilt orientations can substantially reduce the range of acceptable solutions to the ISN flow parameters as a function of flow longitude. The IBEX operations team pointed the IBEX spin axis almost exactly within the ecliptic plane during the 2012-2014 seasons, and about 5° below the ecliptic for half of the 2014 season. In its current implementation the analytical model describes the ISN flow most precisely for the spin axis orientation exactly in the ecliptic. This analysis refines the derived ISN flow parameters with a possible reconciliation between velocity vectors found with IBEX and Ulysses, resulting in a flow longitude lambda∞ = 74.°5 ± 1.°7 and latitude beta∞ = -5.°2 ± 0.°3, but at a substantially higher ISN temperature than previously reported.
Abstract:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data are multiply imputed. Missing data of predictor variables are multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. With this model, the correlation structure for the predictor variables is not well retained in multiply imputed data from small samples with more than 50% missing data. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data.
With all data types, a fully observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
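The abstract does not say how the regression coefficients from the imputed data sets were combined before Type I error was assessed. A common choice is Rubin's rules, and the sketch below assumes that choice. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the m point estimates of the same coefficient
    variances: the m squared standard errors of those estimates
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    if between > 0:
        # Rubin's degrees of freedom for the t reference distribution
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = float("inf")
    return q_bar, total, df


# Hypothetical estimates and squared standard errors from m = 3 imputations.
q_bar, total, df = pool_rubin([0.1, 0.2, 0.3], [0.04, 0.04, 0.04])
print(q_bar, total, df)
```

A test of the coefficient would then compare `q_bar / total ** 0.5` with a t distribution on `df` degrees of freedom. Repeating this over many simulated data sets generated under the null gives an empirical Type I error rate.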
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can deal with large-scale data sets without rigorous data preprocessing. It has a robust variable importance ranking measure. Proposed is a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of the important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variable-to-observation ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is well suited to such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
Abstract:
The discrete-time Markov chain is commonly used in describing changes of health states for chronic diseases in a longitudinal study. Statistical inferences on comparing treatment effects or on finding determinants of disease progression usually require estimation of transition probabilities. In many situations when the outcome data have some missing observations or the variable of interest (called a latent variable) cannot be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing-data mechanisms were considered for empirical studies using methods that include the EM algorithm, Monte Carlo EM and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated with a schizophrenia example. The relevance to public health, the strengths and limitations, and possible future research were also discussed.
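The forward-backward procedure mentioned for case (2) can be sketched for a small discrete hidden Markov model. This is a generic textbook version with per-step scaling, not the dissertation's implementation. The two-state parameters and the observation sequence are hypothetical.

```python
import math


def forward_backward(init, trans, emit, obs):
    """Scaled forward-backward pass for a discrete hidden Markov model.

    init[i]     : probability of starting in latent state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of observing surrogate category o in state i
    obs         : sequence of observed category indices
    Returns the smoothed state probabilities and the log-likelihood.
    """
    n, T = len(init), len(obs)
    alpha, scale = [], []
    # Forward pass, normalising at each step to avoid underflow.
    cur = [init[i] * emit[i][obs[0]] for i in range(n)]
    for t in range(T):
        if t > 0:
            cur = [
                emit[j][obs[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                for j in range(n)
            ]
        c = sum(cur)
        scale.append(c)
        alpha.append([v / c for v in cur])
    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            / scale[t + 1]
            for i in range(n)
        ]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    loglik = sum(math.log(c) for c in scale)
    return gamma, loglik


# Hypothetical two-state example with a binary surrogate observation.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]
gamma, loglik = forward_backward(init, trans, emit, [0, 1, 1])
print(gamma, loglik)
```

In an EM (Baum-Welch) fit, the smoothed probabilities in `gamma`, together with the analogous pairwise quantities, would supply the expected counts used to update the transition and emission probabilities.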
Abstract:
Studies on the relationship between psychosocial determinants and HIV risk behaviors have produced little evidence to support hypotheses based on theoretical relationships. One limitation inherent in many articles in the literature is the method of measurement of the determinants and the analytic approach selected. To reduce the misclassification associated with unit scaling of measures specific to internalized homonegativity, I evaluated the psychometric properties of the Reactions to Homosexuality scale in a confirmatory factor analytic framework. In addition, I assessed the measurement invariance of the scale across racial/ethnic classifications in a sample of men who have sex with men. The resulting measure contained eight items loading on three first-order factors. Invariance assessment identified metric and partial strong invariance between racial/ethnic groups in the sample. Application of the updated measure to a structural model allowed for the exploration of direct and indirect effects of internalized homonegativity on unprotected anal intercourse. Pathways identified in the model show that drug and alcohol use at last sexual encounter, the number of sexual partners in the previous three months and sexual compulsivity all contribute directly to risk behavior. Internalized homonegativity reduced the likelihood of exposure to drugs, alcohol or higher numbers of partners. For men who developed compulsive sexual behavior as a coping strategy for internalized homonegativity, there was an increase in the prevalence odds of risk behavior. In the final stage of the analysis, I conducted a latent profile analysis of the items in the updated Reactions to Homosexuality scale. This analysis identified five distinct profiles, which suggested that the construct was not homogeneous in samples of men who have sex with men.
Lack of prior consideration of these distinct manifestations of internalized homonegativity may have contributed to the analytic difficulty in identifying a relationship between the trait and high-risk sexual practices.
Abstract:
Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present the results of simulation studies that demonstrate that, overall, misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification.
Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.
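As a rough illustration of the correlation idea, the sketch below compares a hypothetical correct specification with two misspecified forms of the same predictor. The quadratic "true" form, the evenly spaced predictor values and the median split are all assumptions made for the example. They are not the models or data used in the paper.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical predictor values spread evenly over [0, 1].
x = [i / 1000 for i in range(1001)]

true_form = [v ** 2 for v in x]                 # assumed correct specification
linear = x                                      # misspecified as linear
binary = [1.0 if v > 0.5 else 0.0 for v in x]   # misspecified as a two-level category

r_linear = pearson(true_form, linear)
r_binary = pearson(true_form, binary)
print(round(r_linear, 3), round(r_binary, 3))
```

Under these assumptions the linear form tracks the quadratic form more closely than the two-level split does. The simulations reported in the abstract show that power also depends on the degrees of freedom a specification uses, which this correlation alone does not capture.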
Abstract:
This study evaluates the effectiveness of the Children and Youth Projects' Adolescent Family Life Program, a comprehensive program serving pregnant and parenting adolescents in the economically disadvantaged area of West Dallas. The underlying question is what the comprehensive, school-linked Adolescent Family Life (AFL) Program contributes, relative to the Maternal Health and Family Planning Program (MHFPP), a categorical provider of family planning and reproductive services, towards meeting the immediate and intermediate term needs of adolescent mothers. Also addressed are the protective effects of participation in the Dallas Independent School District Health Special Program, a segregated school for pregnant adolescents. A cohort of 339 West Dallas adolescent mothers who delivered babies during a two-year period, 1986 through 1987, is monitored by linking records from Parkland Hospital, the primary provider of hospital services to indigent women in Dallas, the Dallas Independent School District, and the prenatal care providers, the AFL and MHFP Programs. Information is collected on each teen describing her demographic, fertility, service utilization and educational characteristics. The study tests the hypothesis that adolescents receiving services from the comprehensive AFL program will be less likely to have a repeat birth and to discontinue school during the 24-month study period, compared with categorical provider clients. Although the study finds that there are no statistically significant differences in repeat deliveries, using survival analysis, or in school continuation between programs, important findings are revealed about ethnic differences. Black and Hispanic fertility and educational behaviors are compared, and their implications for program design and evaluation discussed.
Abstract:
When choosing among models to describe categorical data, the necessity to consider interactions makes selection more difficult. With just four variables, considering all interactions, there are 166 different hierarchical models and many more non-hierarchical models. Two procedures have been developed for categorical data which will produce the "best" subset or subsets of each model size, where size refers to the number of effects in the model. Both procedures are patterned after the Leaps and Bounds approach used by Furnival and Wilson for continuous data and do not generally require fitting all models. For hierarchical models, likelihood ratio statistics ($G^2$) are computed using iterative proportional fitting, and "best" is determined by comparing, among models with the same number of effects, $\Pr(\chi^2_k \geq G^2_{ij})$, where $k$ is the degrees of freedom of the $i$th model of size $j$. To fit non-hierarchical as well as hierarchical models, a weighted least squares procedure has been developed. The procedures are applied to published occupational data relating to the occurrence of byssinosis. These results are compared to previously published analyses of the same data. Also, the procedures are applied to published data on symptoms in psychiatric patients and again compared to previously published analyses. These procedures will make categorical data analysis more accessible to researchers who are not statisticians. The procedures should also encourage more complex exploratory analyses of epidemiologic data and contribute to the development of new hypotheses for study.
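For a single hierarchical model, the fitting step described above can be sketched as follows. The example fits the no-three-factor-interaction model [AB][AC][BC] to a three-way table by iterative proportional fitting and then computes the likelihood ratio statistic. It is a minimal illustration, not the subset-search procedure itself. The 2 x 2 x 2 counts are hypothetical, and the closed-form tail probability used at the end holds only for one degree of freedom.

```python
import math


def ipf_no_three_way(obs, iters=200):
    """Fit the log-linear model [AB][AC][BC] to an I x J x K table by IPF."""
    I, J, K = len(obs), len(obs[0]), len(obs[0][0])
    fit = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    for _ in range(iters):
        for i in range(I):            # rescale to match the AB margin
            for j in range(J):
                r = sum(obs[i][j]) / sum(fit[i][j])
                for k in range(K):
                    fit[i][j][k] *= r
        for i in range(I):            # rescale to match the AC margin
            for k in range(K):
                r = (sum(obs[i][j][k] for j in range(J))
                     / sum(fit[i][j][k] for j in range(J)))
                for j in range(J):
                    fit[i][j][k] *= r
        for j in range(J):            # rescale to match the BC margin
            for k in range(K):
                r = (sum(obs[i][j][k] for i in range(I))
                     / sum(fit[i][j][k] for i in range(I)))
                for i in range(I):
                    fit[i][j][k] *= r
    return fit


def g_squared(obs, fit):
    """Likelihood ratio statistic G^2 = 2 * sum(obs * log(obs / fit))."""
    total = 0.0
    for oi, fi in zip(obs, fit):
        for oj, fj in zip(oi, fi):
            for o, f in zip(oj, fj):
                if o > 0:
                    total += o * math.log(o / f)
    return 2.0 * total


# Hypothetical 2 x 2 x 2 table of counts (all margins positive).
table = [[[10, 20], [30, 40]], [[15, 25], [35, 60]]]
fitted = ipf_no_three_way(table)
g2 = g_squared(table, fitted)
# This model has 1 degree of freedom in a 2 x 2 x 2 table, and for 1 df
# Pr(chi^2 >= g) equals erfc(sqrt(g / 2)).
p_value = math.erfc(math.sqrt(max(g2, 0.0) / 2.0))
print(g2, p_value)
```

Ranking candidate models of the same size by this tail probability mirrors the "best" criterion in the abstract. For other degrees of freedom a general chi-square tail function would be needed.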
Abstract:
The purpose of this study was to analyze the implementation of national family planning policy in the United States, which was embedded in four separate statutes during the period of study, Fiscal Years 1976-81. The design of the study utilized a modification of the Sabatier and Mazmanian framework for policy analysis, which defined implementation as the carrying out of statutory policy. The study was divided into two phases. The first part of the study compared the implementation of family planning policy by each of the pertinent statutes. The second part of the study identified factors that were associated with implementation of federal family planning policy within the context of block grants. Implementation was measured here by federal dollars spent for family planning, adjusted for the size of the respective state target populations. Expenditure data were collected from the Alan Guttmacher Institute and from each of the federal agencies having administrative authority for the four pertinent statutes, respectively. Data from the former were used for most of the analysis because they were more complete and more reliable. The first phase of the study tested the hypothesis that the coherence of a statute is directly related to effective implementation. Equity in the distribution of funds to the states was used to operationalize effective implementation. To a large extent, the results of the analysis supported the hypothesis. In addition to their theoretical significance, these findings were also significant for policymakers insofar as they demonstrated the effectiveness of categorical legislation in implementing desired health policy. Given the current and historically intermittent emphasis on more state and less federal decision-making in health and human services, the second phase of the study focused on state-level factors that were associated with expenditures of social service block grant funds for family planning.
Using the Sabatier-Mazmanian implementation model as a framework, many factors were tested. Those factors showing the strongest conceptual and statistical relationship to the dependent variable were used to construct a statistical model. Using multivariable regression analysis, this model was applied cross-sectionally to each of the years of the study. The most striking finding here was that the dominant determinants of state spending varied for each year of the study (Fiscal Years 1976-1981). The significance of these results was that they provided empirical support for current implementation theory, showing that the dominant determinants of implementation vary greatly over time.
Abstract:
Mixture modeling is commonly used to model categorical latent variables that represent subpopulations in which population membership is unknown but can be inferred from the data. In relatively recent years, finite mixture models have been applied to time-to-event data. However, the commonly used survival mixture model assumes that the effects of the covariates involved in failure times differ across latent classes, but the covariate distribution is homogeneous. The aim of this dissertation is to develop a method to examine time-to-event data in the presence of unobserved heterogeneity under a framework of mixture modeling. A joint model is developed to incorporate the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. It is assumed that the effects of covariates on survival times and the distribution of covariates vary across different latent classes. The unobservable survival trajectories are identified through estimating the probability that a subject belongs to a particular class based on observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and distributions of prognostic factors. Our results from simulation studies and from the Hodgkin lymphoma study demonstrated the superiority of our joint model compared with the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while taking involved interactions between covariates into consideration.
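The step of estimating the probability that a subject belongs to a particular class can be illustrated with a much simpler model than the joint model described above. The sketch assumes a mixture of exponential survival times with an observed event time, no censoring and no covariates. The class weights and hazard rates are hypothetical.

```python
import math


def class_posteriors(t, weights, rates):
    """Posterior class probabilities for one observed event time t.

    Assumes class k has prior weight weights[k] and an exponential
    event-time density rates[k] * exp(-rates[k] * t).
    """
    joint = [w * r * math.exp(-r * t) for w, r in zip(weights, rates)]
    total = sum(joint)
    return [v / total for v in joint]


# Hypothetical two-class example: class 0 fails quickly, class 1 slowly.
weights = [0.5, 0.5]
rates = [1.0, 0.1]
print(class_posteriors(0.1, weights, rates))   # early event
print(class_posteriors(10.0, weights, rates))  # late event
```

In an EM-style fit these posteriors would form the E-step, after which the class weights and class-specific parameters are re-estimated. The joint model in the abstract additionally lets the covariate distribution differ by class, so its posteriors would also depend on the covariates.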
Abstract:
This thesis project is motivated by the potential problem of using observational data to draw inferences about a causal relationship in observational epidemiology research when controlled randomization is not applicable. The instrumental variable (IV) method is one of the statistical tools to overcome this problem. Mendelian randomization studies use genetic variants as IVs in genetic association studies. In this thesis, the IV method, as well as standard logistic and linear regression models, is used to investigate the causal association between risk of pancreatic cancer and the circulating levels of soluble receptor for advanced glycation end-products (sRAGE). Higher levels of serum sRAGE were found to be associated with a lower risk of pancreatic cancer in a previous observational study (255 cases and 485 controls). However, such a novel association may be biased by unknown confounding factors. In a case-control study, we aimed to use the IV approach to confirm or refute this observation in a subset of study subjects for whom the genotyping data were available (178 cases and 177 controls). A two-stage IV analysis using generalized method of moments-structural mean models (GMM-SMM) was conducted and the relative risk (RR) was calculated. In the first stage analysis, we found that the single nucleotide polymorphism (SNP) rs2070600 of the receptor for advanced glycation end-products (AGER) gene meets all three general assumptions for a genetic IV in examining the causal association between sRAGE and risk of pancreatic cancer. The variant allele of SNP rs2070600 of the AGER gene was associated with lower levels of sRAGE, and it was associated neither with risk of pancreatic cancer nor with the confounding factors. It was a potentially strong IV (F statistic = 29.2). However, in the second stage analysis, the GMM-SMM model failed to converge due to non-concavity, probably because of the small sample size.
Therefore, the IV analysis could not support the causality of the association between serum sRAGE levels and risk of pancreatic cancer. Nevertheless, these analyses suggest that rs2070600 was a potentially good genetic IV for testing the causality between the risk of pancreatic cancer and sRAGE levels. A larger sample size is required to conduct a credible IV analysis.
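The instrument-strength check reported above (a first-stage F statistic) can be sketched for the simplest case of one instrument and a continuous exposure fitted by ordinary least squares. This is not the GMM-SMM analysis used in the thesis. The genotype codes and exposure values below are hypothetical and serve only to make the example run.

```python
import math


def first_stage_f(z, x):
    """First-stage OLS of exposure x on a single instrument z.

    Returns the slope and the F statistic for the instrument, which with
    one instrument equals the squared t statistic of the slope.
    """
    n = len(z)
    z_bar, x_bar = sum(z) / n, sum(x) / n
    szz = sum((a - z_bar) ** 2 for a in z)
    szx = sum((a - z_bar) * (b - x_bar) for a, b in zip(z, x))
    slope = szx / szz
    intercept = x_bar - slope * z_bar
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, x))
    se = math.sqrt(rss / (n - 2) / szz)   # standard error of the slope
    return slope, (slope / se) ** 2


# Hypothetical data: z counts variant alleles (0, 1 or 2), x is the exposure level.
z = [0, 0, 0, 0, 1, 1, 1, 2, 2, 0]
x = [5.1, 4.8, 5.5, 5.0, 4.2, 4.6, 4.0, 3.5, 3.9, 5.3]
slope, f_stat = first_stage_f(z, x)
print(slope, f_stat)
```

A widely used rule of thumb treats a first-stage F above about 10 as a sign that the instrument is not weak. Instrument strength addresses only the relevance assumption, and the second-stage estimate still needs an adequate sample size, as the abstract concludes.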