7 results for equilibrium asset pricing models with latent variables
in DigitalCommons@The Texas Medical Center
Abstract:
Scholars have found that socioeconomic status was one of the key factors that influenced early-stage lung cancer incidence rates in a variety of regions. This thesis examined the association between median household income and lung cancer incidence rates in Texas counties. A total of 254 individual counties in Texas with corresponding lung cancer incidence rates from 2004 to 2008 and median household incomes in 2006 were collected from the National Cancer Institute Surveillance System. A simple linear model and spatial linear models with two structures, Simultaneous Autoregressive Structure (SAR) and Conditional Autoregressive Structure (CAR), were used to link median household income and lung cancer incidence rates in Texas. The residuals of the spatial linear models were analyzed with Moran's I and Geary's C statistics, and the statistical results were used to detect similar lung cancer incidence rate clusters and disease patterns in Texas.
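As an illustration of the residual diagnostics mentioned in this abstract, the following minimal sketch computes Moran's I and Geary's C for a vector of values under a spatial weight matrix. It is not the thesis's code: the function names `morans_i` and `gearys_c`, the binary adjacency weights, and the four-region toy layout are all assumptions made for illustration.

```python
def morans_i(values, weights):
    """Moran's I for `values` under a square spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def gearys_c(values, weights):
    """Geary's C for `values` under a square spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)


# Hypothetical toy layout: four regions in a row; adjacent regions are neighbours.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1.0, 2.0, 3.0, 4.0]        # neighbours have similar values
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours have opposite values

print(morans_i(smooth, W), gearys_c(smooth, W))
print(morans_i(alternating, W), gearys_c(alternating, W))
```

In this sketch, positive Moran's I together with Geary's C below 1 indicates that neighbouring values are similar, while negative Moran's I together with Geary's C above 1 indicates that they are dissimilar. With no spatial autocorrelation, Moran's I is expected to be near -1/(n-1) and Geary's C near 1.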
Abstract:
Ethnic violence appears to be the major source of violence in the world. Ethnic hostilities are potentially all-pervasive because most countries in the world are multi-ethnic. Public health's focus on violence documents its increasing role in this issue. The present study is based on a secondary analysis of a dataset of responses by 272 individuals from four ethnic groups (Anglo, African, Mexican, and Vietnamese Americans) who answered questions regarding variables related to ethnic violence from a general questionnaire which was distributed to ethnically diverse purposive, nonprobability, self-selected groups of individuals in Houston, Texas, in 1993. One goal was psychometric: learning about issues in analysis of datasets with modest numbers, comparison of two approaches to dealing with missing observations not missing at random (conducting analysis on two datasets), transformation analysis of continuous variables for logistic regression, and logistic regression diagnostics. Regarding the psychometric goal, it was concluded that measurement model analysis was not possible with a relatively small dataset with nonnormal variables, such as Likert-scaled variables; therefore, exploratory factor analysis was used. The two approaches to dealing with missing values resulted in comparable findings. Transformation analysis suggested that the continuous variables were in the correct scale, and diagnostics that the model fit was adequate. The substantive portion of the analysis included the testing of four hypotheses. Hypothesis One proposed that attitudes/efficacy regarding alternative approaches to resolving grievances from the general questionnaire represented underlying factors: nonpunitive social norms and strategies for addressing grievances--using the political system, organizing protests, using the system to punish offenders, and personal mediation.
Evidence was found to support all but one factor, nonpunitive social norms. Hypothesis Two proposed that the factor variables and the other independent variables--jail, grievance, male, young, and membership in a particular ethnic group--were associated with (non)violence. Jail, grievance, and not using the political system to address grievances were associated with a greater likelihood of intergroup violence. No evidence was found to support Hypotheses Three and Four, which proposed that grievance and ethnic group membership would interact with other variables (i.e., age, gender, etc.) to produce variant levels of subgroup (non)violence. The generalizability of the results of this study is constrained by the purposive, self-selected nature of the sample and the small sample size (n = 272). Suggestions for future research include incorporating other possible variables or factors predictive of intergroup violence in models of the kind tested here, and the development and evaluation of interventions that promote electoral and nonelectoral political participation as means of reducing interethnic conflict.
Abstract:
Cachexia is very common among patients with advanced pancreatic cancer and is a marker of poor prognosis. Weight loss in cachexia is due to both adipose and muscle compartments, and sarcopenia (severe muscle depletion) is associated with worse outcomes. Curcumin has shown a myriad of biological effects, including anti-cancer and anti-inflammatory activity. The ability of curcumin to attenuate cachexia and muscle loss has been tested in animal models, with conflicting results so far. The hypothesis of this study was that patients with advanced pancreatic cancer treated with curcumin for two months have less fat and muscle loss as compared to matched controls not treated with this compound. A matched 1:2 case-control retrospective study was conducted with 22 patients with pancreatic cancer who were treated with curcumin on a previous protocol and 44 untreated controls with the same diagnosis matched by age, gender, time from advanced cancer, body mass index, and number of prior therapies. Data were collected regarding oncologic treatment, medication use, weights, heights, and survival. Body composition was determined by computerized tomography analyses at two timepoints separated by 60±20 days. For treated patients, the first image was at the beginning of treatment and for controls it was determined by the matching time from advanced cancer. The evolution of body composition over time was quantitatively analyzed comparing both groups. All patients lost weight due to both fat and muscle losses, and patients treated with curcumin presented greater losses in both lean and adipose body mass. Use of medications, chemotherapy, age, time from advanced cancer, baseline albumin, performance status, and number of prior therapies were not independently correlated with changes in body composition variables. Patients treated with curcumin had borderline shorter survival when compared with untreated patients. 
Sarcopenic treated patients had significantly shorter survival than their non-sarcopenic counterparts, whereas sarcopenia status was not associated with survival among the controls. Treated patients with shorter survival showed a tendency to lose more lean and especially fat body mass as compared to untreated patients, possibly suggesting an effect of curcumin on shifting weight loss towards the end of life by impacting its mechanisms.
Abstract:
In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this study, we extended the work of Chan et al. (2008) on recovering the latent slope in a simple regression model to a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of the latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces an estimator that is efficient in the multiple regression model, especially when the measurement error variance of the surrogate variable is large.
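To make the measurement-error problem concrete, the sketch below simulates a latent covariate observed only through a noisy surrogate and shows how the naive least-squares slope is attenuated. The correction it applies is the classical reliability-ratio (method-of-moments) adjustment, which assumes the error variance is known. It is not the Bayesian Monte Carlo estimator described in the abstract, and every name and parameter value here (`ols_slope`, `simulate`, the variances, the seed) is a hypothetical choice for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=0):
    """Simulate outcome y and surrogate w = x + u for a latent covariate x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]        # latent covariate
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]         # error-prone surrogate
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]  # outcome depends on x
    return w, y


w, y = simulate()
naive = ols_slope(w, y)                      # attenuated towards zero
reliability = 1.0 ** 2 / (1.0 ** 2 + 1.0 ** 2)  # var(x) / (var(x) + var(u)), assumed known
corrected = naive / reliability

print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

With equal latent and error variances the reliability ratio is 0.5, so the naive slope should land near half of the true slope of 2.0, and dividing by the ratio should bring it back close to 2.0.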
Abstract:
Mixture modeling is commonly used to model categorical latent variables that represent subpopulations in which population membership is unknown but can be inferred from the data. In recent years, finite mixture models have been applied to time-to-event data. However, the commonly used survival mixture model assumes that the effects of the covariates involved in failure times differ across latent classes, but the covariate distribution is homogeneous. The aim of this dissertation is to develop a method to examine time-to-event data in the presence of unobserved heterogeneity under a framework of mixture modeling. A joint model is developed to incorporate the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. It is assumed that the effects of covariates on survival times and the distribution of covariates vary across different latent classes. The unobservable survival trajectories are identified through estimating the probability that a subject belongs to a particular class based on observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and distributions of prognostic factors. Our results from simulation studies and from the Hodgkin lymphoma study demonstrated the superiority of our joint model compared with the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while taking involved interactions between covariates into consideration.
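The step of estimating the probability that a subject belongs to a particular latent class can be sketched with Bayes' rule. The example below is a deliberately simplified assumption, not the dissertation's joint model: it uses exponential survival times within each class, fully observed (uncensored) event times, no covariates, and class proportions and hazard rates that are treated as known. The function name `class_posteriors` and all numeric values are hypothetical.

```python
import math


def class_posteriors(t, priors, rates):
    """Posterior class probabilities for an observed event time `t`.

    Assumes an exponential event-time density rate * exp(-rate * t) in each
    class, with class proportions `priors` and hazard rates `rates`.
    """
    joint = [p * r * math.exp(-r * t) for p, r in zip(priors, rates)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class example: a short-survival class (rate 1.0)
# and a long-survival class (rate 0.1), equally common a priori.
post = class_posteriors(10.0, priors=[0.5, 0.5], rates=[1.0, 0.1])
print(post)
```

An event time of 10 is far more plausible under the low-hazard class, so nearly all of the posterior mass falls on the second class. In a fitted mixture model the proportions and rates would be estimated, typically by alternating this posterior step with parameter updates.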
Abstract:
The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. 
The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one value per variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. 
Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. 
In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (i.e., without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
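One simple way to turn raw time series elements into a trend feature of the kind this abstract describes is to fit a least-squares line over a fixed window and use its slope as a single derived value. The sketch below assumes that approach; the abstract does not specify which operations the authors used, and the function name `trend_slope` and the pulse oximetry readings are hypothetical, not data from the study.

```python
def trend_slope(times, values):
    """Least-squares slope of `values` against `times` (units of value per time)."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    stv = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    stt = sum((t - mt) ** 2 for t in times)
    return stv / stt


# Hypothetical pulse oximetry readings (percent), one per minute.
minutes = [0, 1, 2, 3, 4]
spo2 = [98.0, 98.0, 97.0, 95.0, 93.0]

slope = trend_slope(minutes, spo2)  # negative slope indicates deterioration
last_value = spo2[-1]               # the single-snapshot value a traditional model would see

print(f"last value: {last_value}, trend: {slope:.2f} per minute")
```

The last value alone gives a point-in-time snapshot, whereas the slope summarizes the direction and rate of change over the window. Using one derived slope in place of the five raw readings also keeps the number of candidate features small.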
Abstract:
My dissertation focuses on developing methods for gene-gene/environment interactions and imprinting effect detections for human complex diseases and quantitative traits. It includes three sections: (1) generalizing the Natural and Orthogonal interaction (NOIA) model for the coding technique originally developed for gene-gene (GxG) interaction and also to reduced models; (2) developing a novel statistical approach that allows for modeling gene-environment (GxE) interactions influencing disease risk; and (3) developing a statistical approach for modeling genetic variants displaying parent-of-origin effects (POEs), such as imprinting. In the past decade, genetic researchers have identified a large number of causal variants for human genetic diseases and traits by single-locus analysis, and interaction has now become a hot topic in the effort to search for the complex network between multiple genes or environmental exposures contributing to the outcome. Epistasis, also known as gene-gene interaction, is the departure from additive genetic effects of several genes on a trait, which means that the same alleles of one gene could display different genetic effects under different genetic backgrounds. In this study, we propose to implement the NOIA model for association studies along with interaction for human complex traits and diseases. We compare the performance of the new statistical models we developed and the usual functional model by both simulation study and real data analysis. Both simulation and real data analysis revealed higher power of the NOIA GxG interaction model for detecting both main genetic effects and interaction effects. Through application on a melanoma dataset, we confirmed the previously identified significant regions for melanoma risk at 15q13.1, 16q24.3 and 9p21.3. We also identified potential interactions with these significant regions that contribute to melanoma risk. 
Based on the NOIA model, we developed a novel statistical approach that allows us to model effects from a genetic factor and a binary environmental exposure that jointly influence disease risk. Both simulation and real data analyses revealed higher power of the NOIA model for detecting both main genetic effects and interaction effects for both quantitative and binary traits. We also found that estimates of the parameters from logistic regression for binary traits are no longer statistically uncorrelated under the alternative model when there is an association. Applying our novel approach to a lung cancer dataset, we confirmed four SNPs in the 5p15 and 15q25 regions to be significantly associated with lung cancer risk in the Caucasian population: rs2736100, rs402710, rs16969968 and rs8034191. We also validated that rs16969968 and rs8034191 in the 15q25 region significantly interact with smoking in the Caucasian population. Our approach identified potential interactions of SNP rs2256543 in 6p21 with smoking in contributing to lung cancer risk. Genetic imprinting is the best-known cause of parent-of-origin effects (POEs), whereby a gene is differentially expressed depending on the parental origin of the same alleles. Genetic imprinting affects several human disorders, including diabetes, breast cancer, alcoholism, and obesity. This phenomenon has been shown to be important for normal embryonic development in mammals. Traditional association approaches ignore this important genetic phenomenon. In this study, we propose a NOIA framework for a single locus association study that estimates both main allelic effects and POEs. We develop statistical (Stat-POE) and functional (Func-POE) models, and demonstrate conditions for orthogonality of the Stat-POE model. We conducted simulations for both quantitative and qualitative traits to evaluate the performance of the statistical and functional models with different levels of POEs. 
Our results showed that the newly proposed Stat-POE model, which ensures orthogonality of variance components if Hardy-Weinberg Equilibrium (HWE) or equal minor and major allele frequencies is satisfied, had greater power for detecting the main allelic additive effect than a Func-POE model, which codes according to allelic substitutions, for both quantitative and qualitative traits. The power for detecting the POE was the same for the Stat-POE and Func-POE models under HWE for quantitative traits.