Biblioteca Digital

978 resultados para Categorical data

Brain areas involved in medial temporal lobe seizures: a principal component analysis of ictal SPECT data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The study describes brain areas involved in medial temporal lobe (mTL) seizures of 12 patients. All patients showed so-called oro-alimentary behavior within the first 20 s of clinical seizure manifestation characteristic of mTL seizures. Single photon emission computed tomography (SPECT) images of regional cerebral blood flow (rCBF) were acquired from the patients in ictal and interictal phases and from normal volunteers. Image analysis employed categorical comparisons with statistical parametric mapping and principal component analysis (PCA) to assess functional connectivity. PCA supplemented the findings of the categorical analysis by decomposing the covariance matrix containing images of patients and healthy subjects into distinct component images of independent variance, including areas not identified by the categorical analysis. Two principal components (PCs) discriminated the subject groups: patients with right or left mTL seizures and normal volunteers, indicating distinct neuronal networks implicated by the seizure. Both PCs were correlated with seizure duration, one positively and the other negatively, confirming their physiological significance. The independence of the two PCs yielded a clear clustering of subject groups. The local pattern within the temporal lobe describes critical relay nodes which are the counterpart of oro-alimentary behavior: (1) right mesial temporal zone and ipsilateral anterior insula in right mTL seizures, and (2) temporal poles on both sides that are densely interconnected by the anterior commissure. Regions remote from the temporal lobe may be related to seizure propagation and include positively and negatively loaded areas. These patterns, the covarying areas of the temporal pole and occipito-basal visual association cortices, for example, are related to known anatomic paths.

Estimating parameters in Markov models for longitudinal studies with missing data or surrogate outcomes

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The discrete-time Markov chain is commonly used in describing changes of health states for chronic diseases in a longitudinal study. Statistical inferences on comparing treatment effects or on finding determinants of disease progression usually require estimation of transition probabilities. In many situations when the outcome data have some missing observations or the variable of interest (called a latent variable) can not be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. ^ This dissertation research proposes methods to analyze longitudinal data (1) that have categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing mechanisms were considered for empirical studies using methods that include EM algorithm, Monte Carlo EM and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated by the Schizophrenia example. The relevance of public health, the strength and limitations, and possible future research were also discussed. ^

An ordinal logistic regression model with misclassification of the outcome variable and categorical covariate

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point using the status of mild, moderate or severe disease as outcome measures. As in many other outcome oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. Also, this study estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification was created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters. β1 was hypothesized at 0.50 and the mean estimate was 0.488, β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as β1 and β2, they validate this method. X 1 0-1 misclassification was hypothesized as 2.98% and the mean of the simulated estimates was 1.54% and, in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X 1 of having both copies of the APOE 4 allele changed from an estimate of 1.377 to an estimate 1.418, demonstrating that the estimates of the odds ratio changed when the analysis includes adjustment for misclassification. ^

Mixture modeling for joint analysis of survival, discrete, and continuous data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mixture modeling is commonly used to model categorical latent variables that represent subpopulations in which population membership is unknown but can be inferred from the data. In relatively recent years, the potential of finite mixture models has been applied in time-to-event data. However, the commonly used survival mixture model assumes that the effects of the covariates involved in failure times differ across latent classes, but the covariate distribution is homogeneous. The aim of this dissertation is to develop a method to examine time-to-event data in the presence of unobserved heterogeneity under a framework of mixture modeling. A joint model is developed to incorporate the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. It is assumed that the effects of covariates on survival times and the distribution of covariates vary across different latent classes. The unobservable survival trajectories are identified through estimating the probability that a subject belongs to a particular class based on observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and distributions of prognostic factors. Our results from simulation studies and from the Hodgkin lymphoma study demonstrated the superiority of our joint model compared with the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while taking involved interactions between covariates into consideration.^

MGOF: Stata module to perform goodness-of-fit tests for multinomial data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

mgof computes goodness-of-fit tests for the distribution of a discrete (categorical, multinomial) variable. The default is to perform classical large sample chi-squared approximation tests based on Pearson's X2 statistic and the log likelihood ratio (G2) statistic or a statistic from the Cressie-Read family. Alternatively, mgof computes exact tests using Monte Carlo methods or exhaustive enumeration. A Kolmogorov-Smirnov test for discrete data is also provided. The moremata package, also available from SSC, is required.

Measuring work group diversity of categorical variables

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The article presents abstracts of papers for a conference on research methods including "On the Folly of Rewarding A While Hoping for B: A Critical Assessment of Theory Development," "All That Jazz: A Methodological Story of Stories," and "An Accounting of Counting: Universalism, Particularism, and the Counting of Qualitative Data."

Classification of Chenopodium Genus Populations and Species Based on Continuous and Categorical Variables

Relevância:

30.00% 30.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62P10, 62H30

New Advancements of Scalable Statistical Methods for Learning Latent Structures in Big Data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.

Methods for Imputing Missing Values and Synthesizing Confidential Values for Continuous and Magnitude Data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract

Continuous variable is one of the major data types collected by the survey organizations. It can be incomplete such that the data collectors need to fill in the missingness. Or, it can contain sensitive information which needs protection from re-identification. One of the approaches to protect continuous microdata is to sum them up according to different cells of features. In this thesis, I represents novel methods of multiple imputation (MI) that can be applied to impute missing values and synthesize confidential values for continuous and magnitude data.

The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.

The second method is for releasing synthetic continuous micro data by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on the protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.

The third method is for imputing missing values in continuous and categorical variables. It is extended from a hierarchically coupled mixture model with local dependence. However, the new method separates the variables into non-focused (e.g., almost-fully-observed) and focused (e.g., missing-a-lot) ones. The sub-model structure of focused variables is more complex than that of non-focused ones. At the same time, their cluster indicators are linked together by tensor factorization and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving the strongly associated non-focused variables to the side of focused ones can help to improve estimation accuracy, which is examined by several simulation studies. And this method is applied to data from the American Community Survey.

Reference list of 265 sources used for the discovery of relationships between data clusters and metadata properties

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Visual cluster analysis provides valuable tools that help analysts to understand large data sets in terms of representative clusters and relationships thereof. Often, the found clusters are to be understood in context of belonging categorical, numerical or textual metadata which are given for the data elements. While often not part of the clustering process, such metadata play an important role and need to be considered during the interactive cluster exploration process. Traditionally, linked-views allow to relate (or loosely speaking: correlate) clusters with metadata or other properties of the underlying cluster data. Manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, specially for large data sets, where a large search problem arises. Fully interactive search for potentially useful or interesting cluster to metadata relationships may constitute a cumbersome and long process. To remedy this problem, we propose a novel approach for guiding users in discovering interesting relationships between clusters and associated metadata. Its goal is to guide the analyst through the potentially huge search space. We focus in our work on metadata of categorical type, which can be summarized for a cluster in form of a histogram. We start from a given visual cluster representation, and compute certain measures of interestingness defined on the distribution of metadata categories for the clusters. These measures are used to automatically score and rank the clusters for potential interestingness regarding the distribution of categorical metadata. Identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing an encompassing, yet extensible, set of interestingness scores for categorical metadata, which can also be extended to numerical metadata. Appropriate visual representations are provided for showing the visual correlations, as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are automatically identified, supporting the analyst discovering interesting and visually understandable relationships.

Does time ever fly or slow down? The difficult interpretation of psychophysical data on time perception.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Time perception is studied with subjective or semi-objective psychophysical methods. With subjective methods, observers provide quantitative estimates of duration and data depict the psychophysical function relating subjective duration to objective duration. With semi-objective methods, observers provide categorical or comparative judgments of duration and data depict the psychometric function relating the probability of a certain judgment to objective duration. Both approaches are used to study whether subjective and objective time run at the same pace or whether time flies or slows down under certain conditions. We analyze theoretical aspects affecting the interpretation of data gathered with the most widely used semi-objective methods, including single-presentation and paired-comparison methods. For this purpose, a formal model of psychophysical performance is used in which subjective duration is represented via a psychophysical function and the scalar property. This provides the timing component of the model, which is invariant across methods. A decisional component that varies across methods reflects how observers use subjective durations to make judgments and give the responses requested under each method. Application of the model shows that psychometric functions in single-presentation methods are uninterpretable because the various influences on observed performance are inextricably confounded in the data. In contrast, data gathered with paired-comparison methods permit separating out those influences. Prevalent approaches to fitting psychometric functions to data are also discussed and shown to be inconsistent with widely accepted principles of time perception, implicitly assuming instead that subjective time equals objective time and that observed differences across conditions do not reflect differences in perceived duration but criterion shifts. These analyses prompt evidence-based recommendations for best methodological practice in studies on time perception.

Sociodemographic factors influencing adherence to antenatal iron supplementation recommendations among pregnant women in Malawi: Analysis of data from the 2010 Malawi Demographic and Health Survey

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and Aim: Maternal morbidity and mortality statistics remain unacceptably high in Malawi. Prominent among the risk factors in the country is anaemia in pregnancy, which generally results from nutritional inadequacy (particularly iron deficiency) and malaria, among other factors. This warrants concerted efforts to increase iron intake among reproductive-age women. This study, among women in Malawi, examined factors determining intake of supplemental iron for at least 90 days during pregnancy. Methods: A weighted sample of 10,750 women (46.7%), from the 23,020 respondents of the 2010 Malawi Demographic and Health Survey (MDHS), were utilized for the study. Univariate, bivariate, and regression techniques were employed. While univariate analysis revealed the percent distributions of all variables, bivariate analysis was used to examine the relationships between individual independent variables and adherence to iron supplementation. Chi-square tests of independence were conducted for categorical variables, with the significance level set at P < 0.05. Two binary logistic regression models were used to evaluate the net effect of independent variables on iron supplementation adherence. Results: Thirty-seven percent of the women adhered to the iron supplementation recommendations during pregnancy. Multivariate analysis indicated that younger age, urban residence, higher education, higher wealth status, and attending antenatal care during the first trimester were significantly associated with increased odds of taking iron supplementation for 90 days or more during pregnancy (P < 0.01). Conclusions: The results indicate low adherence to the World Health Organization’s iron supplementation recommendations among pregnant women in Malawi, and this contributes to negative health outcomes for both mothers and children. Focusing on education interventions that target populations with low rates of iron supplement intake, including campaigns to increase the number of women who attend antenatal care clinics in the first trimester, are recommended to increase adherence to iron supplementation recommendations.

Model Consistency and Data Specification in Property DCF Studies

Relevância:

20.00% 20.00%

Publicador:

A Conditional Autoregressive Gaussiean Process for Irregularly Spaced Multivariate Data with Application to Modelling Large Sets of Binary Data

Relevância:

20.00% 20.00%

Publicador:

Web Data Mining and Reasoning Model

Relevância:

20.00% 20.00%

Publicador:

«
1
2
...
5
6
7
8
9
10
11
...
65
66
»