Digital Library

958 results for "multivariate binary data"

Assessing the fit of unidimensional IRT models for binary data under model misspecification

Relevance: 100.00%

Abstract:

Model misspecification affects the classical test statistics used to assess the fit of Item Response Theory (IRT) models. Robust tests that remain valid under model misspecification have been derived, such as the Generalized Lagrange Multiplier and Hausman tests, but their use has not been widely explored in the IRT framework. In the first part of the thesis, we introduce the Generalized Lagrange Multiplier test to detect differential item functioning in IRT models for binary data under model misspecification. By means of a simulation study and a real data analysis, we compare its performance with the classical Lagrange Multiplier test, computed using the Hessian and the cross-product matrix, and with the Generalized Jackknife Score test. The power of these tests is computed both empirically and asymptotically. The misspecifications considered are local dependence among items and a non-normal distribution of the latent variable. The results show that, under mild model misspecification, all tests perform well, whereas under strong model misspecification their performance deteriorates. None of the tests considered performs better overall than the others. In the second part of the thesis, we extend the Generalized Hausman test to detect non-normality of the latent variable distribution. To build the test, we consider a seminonparametric IRT model, which assumes a more flexible latent variable distribution. By means of a simulation study and two real applications, we compare the performance of the Generalized Hausman test with the M2 limited-information goodness-of-fit test and the Likelihood-Ratio test. Information criteria are also computed. The Generalized Hausman test performs better than the Likelihood-Ratio test in terms of Type I error rates and better than the M2 test in terms of power. The performance of the Generalized Hausman test and of the information criteria deteriorates when the sample size is small and the number of items is small.
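
The Generalized Hausman test in the thesis relies on a misspecification-robust variance estimate. For intuition only, the sketch below shows the classical Hausman contrast for a single parameter: an estimate that is efficient under the null (for example, an item parameter from the normal-latent IRT model) is compared with one that stays consistent under the alternative (for example, the same parameter from the more flexible seminonparametric model). The function name and all numbers are hypothetical, and the robust "generalized" variance is not implemented here.

```python
import math

def hausman_scalar(b_consistent: float, b_efficient: float,
                   var_consistent: float, var_efficient: float) -> tuple:
    """Classical Hausman statistic for one parameter and its chi-square(1) p-value.

    b_efficient is assumed efficient under H0; b_consistent is assumed consistent
    under both H0 and H1 (hypothetical roles, for illustration only).
    """
    var_diff = var_consistent - var_efficient
    if var_diff <= 0:
        raise ValueError("classical form needs Var(consistent) > Var(efficient)")
    h = (b_consistent - b_efficient) ** 2 / var_diff
    # For chi-square with 1 df: P(X > h) = P(|Z| > sqrt(h)) = erfc(sqrt(h / 2))
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value
```

For instance, `hausman_scalar(1.0, 0.5, 0.2, 0.1)` gives a statistic of 2.5 with a p-value of roughly 0.11, so with these made-up numbers normality would not be rejected at the 5% level.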

Fitting genetic models to twin data with binary and ordered categorical responses: A comparison of structural equation modelling and Bayesian hierarchical models

Relevance: 100.00%

Abstract:

We compare a Bayesian methodology using the freeware package BUGS (Bayesian Inference Using Gibbs Sampling) with the traditional structural equation modelling approach based on another freeware package, Mx. Dichotomous and ordinal (three-category) twin data were simulated according to different additive genetic and common environment models for phenotypic variation. Practical issues are discussed in using Gibbs sampling as implemented by BUGS to fit subject-specific Bayesian generalized linear models, in which the components of variation can be estimated directly. The simulation study (based on 2000 twin pairs) indicated a consistent advantage of the Bayesian method in detecting the correct model under certain specifications of additive genetic and common environmental effects. For binary data, both methods had difficulty detecting the correct model when the additive genetic effect was low (between 10 and 20%) or moderate (between 20 and 40%). Furthermore, neither method could adequately detect a correct model that included a modest common environmental effect (20%), even when the additive genetic effect was large (50%). Power was significantly improved with ordinal data in most scenarios, except in the case of low heritability under a true ACE model. We illustrate and compare both methods using data from 1239 twin pairs over the age of 50 years who were registered with the Australian National Health and Medical Research Council Twin Registry (ATR) and presented symptoms associated with osteoarthritis occurring in joints of the hand.
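
As background for the ACE models mentioned above: under the classical twin model, the expected correlations of the (liability-scale) phenotype are r_MZ = a² + c² and r_DZ = a²/2 + c², and solving these gives Falconer-style moment estimates. The sketch below is only this textbook calculation, not the Mx structural equation fit or the BUGS hierarchical model compared in the paper; for binary or ordinal data the input correlations would have to be tetrachoric or polychoric (an assumption here), and the example correlations are made up.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer-style moment estimates of A, C and E variance shares
    from monozygotic (MZ) and dizygotic (DZ) twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # common (shared) environment share
    e2 = 1.0 - r_mz            # unique environment share, incl. measurement error
    return {"A": a2, "C": c2, "E": e2}
```

With hypothetical correlations r_MZ = 0.6 and r_DZ = 0.4, the shares are A = 0.4, C = 0.2 and E = 0.4; by construction the three shares always sum to 1.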

Multivariate models for correlated count data

Relevance: 100.00%

Abstract:

In this study, we deal with the problem of overdispersion beyond extra zeros for a collection of counts that may be correlated. Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial distributions are considered. First, we propose a multivariate count model in which all counts follow the same distribution and are correlated. We then extend this model so that correlated counts may follow different distributions. To accommodate correlation among counts, we include correlated random effects for each individual in the mean structure, thus inducing dependence among observations from the same individual. The method is applied to real data to investigate variation in food resource use by a marsupial species at a locality in the Brazilian Cerrado biome. © 2013 Copyright Taylor and Francis Group, LLC.

Coreferentiality: A New Method for the Hypothesis-Based Analysis of Phenotypes Characterized by Multivariate Data

Relevance: 100.00%

Abstract:

Many multifactorial biologic effects, particularly in the context of complex human diseases, are still poorly understood. At the same time, the systematic acquisition of multivariate data has become increasingly easy. Using such data to analyze and model complex phenotypes, however, remains a challenge. Here, a new analytic approach, termed coreferentiality, is described together with an appropriate statistical test. Coreferentiality is the indirect relation of two variables of functional interest with respect to whether they parallel each other in their respective relatedness to multivariate reference data, which can be informative for a complex effect or phenotype. It is shown that the power of coreferentiality testing is comparable to that of multiple regression analysis, is sufficient even when the reference data are informative only to a relatively small extent (2.5%), and clearly exceeds the power of simple bivariate correlation testing. Thus, coreferentiality testing exploits the greater power of multivariate analysis while addressing a more straightforwardly interpretable bivariate relationship. Systematic application of this approach could substantially improve the analysis and modeling of complex phenotypes, particularly in human studies, where addressing functional hypotheses by direct experimentation is often difficult.

Lipid profile and biological and environmental factors: the role of physical activity

Relevance: 90.00%

Abstract:

ABSTRACT - Introduction: Physical inactivity is one of the major determinants of chronic noncommunicable diseases and the fourth leading cause of mortality worldwide, particularly from vascular diseases. Regular physical activity produces vascular adaptations responsible for beneficial effects in the prevention and treatment of the various vascular risk factors, notably through its effect on lipoprotein metabolism. Objectives: To analyse the influence of physical activity on the lipid profile of a population living in Portugal. Methods: Exploratory, descriptive, cross-sectional observational study of 1027 individuals (age: 18 to 80 years; 49% women). Data were analysed in SPSS (version 20) using descriptive statistics, bivariate analysis between vascular risk factors and lipid-profile variables, and a multivariable binary logistic regression analysis to measure risk through the odds ratio. The significance level was set at 5%. Results: Analysis of the relationship between physical activity and lipid-profile biomarkers showed that regular physical activity is associated with benefits in the form of higher HDL and apoA1 levels and lower TG levels. Conclusions: Physical activity plays an important role in regulating the lipid profile, highlighting the need to implement multisectoral strategies to prevent vascular risk factors, particularly in promoting healthy lifestyles, which are fundamental to preventing these health conditions and to generating health gains.
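
The bivariate analysis mentioned above typically reports a crude (unadjusted) odds ratio from a 2x2 table, whereas the multivariable logistic regression yields adjusted odds ratios (not shown here). The sketch below computes a crude odds ratio with a Woolf (log-scale) 95% confidence interval; the cell layout, function name and counts are illustrative assumptions, not data from the study.

```python
import math

def crude_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple:
    """Crude OR with a Woolf 95% CI from a 2x2 table (hypothetical layout):
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf interval undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper
```

For a made-up table with cells 20, 10, 10 and 20, the crude odds ratio is 4, with a 95% interval of roughly 1.4 to 11.7.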

Measuring an effect size from dichotomized data : contrasted results whether using a correlation or an odds-ratio

Relevance: 90.00%

Abstract:

It is well known that dichotomizing continuous data has the effect of decreasing statistical power when the goal is to test for a statistical association between two variables. Modern researchers, however, focus not only on statistical significance but also on estimating the "effect size" (i.e., the strength of association between the variables) to judge whether a significant association is also clinically relevant. In this article, we examine the consequences of dichotomizing continuous data for the value of an effect size in some classical settings. It turns out that the conclusions differ depending on whether a correlation or an odds ratio is used to summarize the strength of association between the variables: whereas the value of a correlation is typically decreased by a factor of pi/2 after each dichotomization, the value of an odds ratio is at the same time raised to the power 2. From a descriptive statistical point of view, it is thus not clear whether dichotomizing continuous data leads to a decrease or an increase in the effect size, as illustrated using a data set investigating the relationship between motor and intellectual functions in children and adolescents.
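
The contrast between a correlation and an odds ratio can be checked in one textbook setting, assumed here for illustration: a standard bivariate normal pair with both variables split at their medians. Sheppard's formula gives the 2x2 cell probabilities in closed form. The phi coefficient then equals (2/π)·arcsin ρ, so the correlation shrinks by a factor approaching π/2 for small ρ when both variables are dichotomized, while the odds ratio of the resulting table is a squared ratio of cell probabilities. This sketch is not the authors' derivation, and the function name is hypothetical.

```python
import math

def median_split_effects(rho: float) -> dict:
    """Effect sizes after splitting BOTH variables of a standard bivariate
    normal pair at their medians (assumed setting, for illustration)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    a = math.asin(rho)
    # Sheppard's formula: P(X > 0, Y > 0) = 1/4 + asin(rho) / (2*pi)
    p_conc = 0.25 + a / (2.0 * math.pi)   # each of the two concordant cells
    p_disc = 0.25 - a / (2.0 * math.pi)   # each of the two discordant cells
    # Phi coefficient of a 2x2 table whose margins are all 1/2
    phi = (p_conc * p_conc - p_disc * p_disc) / math.sqrt(0.5 ** 4)
    # Odds ratio of the same table: (p_conc / p_disc) ** 2
    odds_ratio = (p_conc * p_conc) / (p_disc * p_disc)
    attenuation = rho / phi if phi else math.nan
    return {"phi": phi, "odds_ratio": odds_ratio, "attenuation": attenuation}
```

For ρ = 0.5, the phi coefficient is 1/3 and the odds ratio of the dichotomized table is exactly 4.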

Implementing Green IT approach for transferring Big Data over Parallel Data Link

Relevance: 90.00%

Abstract:

The research in this Master's thesis project concerns Big Data transfer over Parallel Data Links. My main objective was to assist the research team at Saint Petersburg National Research University ITMO in accomplishing this project and in applying Green IT methods to the data transfer system. The team's goal is to transfer Big Data over parallel data links using an SDN OpenFlow approach. My task as a team member was to compare existing data transfer applications, determine which achieves the highest data transfer speed in which circumstances, and explain why. In this thesis, five utilities were compared: Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts were developed to create incompressible random binary data so that the utilities could be compared fairly, to execute the utilities with specified parameters, to create log files, results, and system parameter records, and to plot graphs comparing the results. Transferring such enormous volumes of data can take a long time, hence the need to reduce energy consumption and make transfers greener. Following a Green IT approach, our team used the OpenStack cloud computing infrastructure. It is more efficient to allocate a specific amount of hardware resources to test different scenarios than to use all the resources of our testbed. Testing our implementation on the OpenStack infrastructure showed that the virtual channel carries no other traffic, so the highest possible throughput can be achieved. With the final results, we can identify which utilities provide faster data transfer in different scenarios with specific TCP parameters and use them on real network data links.
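
The test-data scripts mentioned above are not reproduced in the abstract. A minimal sketch of the idea, assuming Python and the standard library only: random bytes from the operating system are effectively incompressible, which can be confirmed with zlib before the file is used, so no utility can gain an advantage through on-the-fly compression. All function names here are hypothetical.

```python
import os
import zlib

def random_payload(size_bytes: int) -> bytes:
    """OS-supplied random bytes, which are effectively incompressible."""
    return os.urandom(size_bytes)

def write_payload(path: str, size_bytes: int, chunk_size: int = 1 << 20) -> None:
    """Write size_bytes of random data to path in chunks to bound memory use."""
    remaining = size_bytes
    with open(path, "wb") as fh:
        while remaining > 0:
            n = min(chunk_size, remaining)
            fh.write(random_payload(n))
            remaining -= n

def compression_ratio(data: bytes) -> float:
    """zlib-compressed size / original size; about 1.0 means incompressible."""
    return len(zlib.compress(data)) / len(data) if data else 1.0
```

A ratio close to 1.0 for the random payload, compared with a tiny ratio for a run of identical bytes, confirms that the test files cannot be shrunk in transit.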

Regression models for bivariate survival data

Relevance: 90.00%

Abstract:

Multivariate lifetime data arise in various forms, including recurrent event data, when individuals are followed to observe the sequence of occurrences of a certain type of event, and correlated lifetimes, when an individual is followed for the occurrence of two or more types of events or when distinct individuals have dependent event times. Most studies include covariates, such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to the consideration of regression models. The well-known Cox proportional hazards model and its variants, which use the marginal hazard functions employed in the literature for the analysis of multivariate survival data, are not sufficient to explain the complete dependence structure of a pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2 we introduced a bivariate proportional hazards model based on the vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effects on the two components of the vector hazard function. The proposed model is useful in real-life situations for studying the dependence structure of a pair of lifetimes on the covariate vector. The well-known partial likelihood approach is used to estimate the parameter vectors. In Chapter 3, we introduced a bivariate proportional hazards model for gap times of recurrent events. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector. In many fields of application, the mean residual life function is considered a superior concept to the hazard function. Motivated by this, in Chapter 4 we considered a new semiparametric model, the bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap times of recurrent events.
The counting process approach is used for inference on the gap times of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the parameter estimators of the models developed in the previous chapters were studied, and the proposed models were applied to various real-life situations.
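
The bivariate models above build on the partial likelihood of the ordinary Cox model. As a minimal sketch of that building block only, assuming a single covariate and no tied event times (the bivariate and mean-residual-life extensions are not shown), the Cox partial log-likelihood is the sum over observed events of βxᵢ minus the log of the summed risk scores in the risk set. The function name is hypothetical.

```python
import math
from typing import Sequence

def cox_partial_loglik(beta: float, times: Sequence[float],
                       events: Sequence[int], x: Sequence[float]) -> float:
    """Cox partial log-likelihood for one covariate, assuming no tied times.

    events[i] is 1 if subject i's event was observed and 0 if censored.
    """
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        # Risk set: everyone still under observation just before t_i
        denom = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += beta * x[i] - math.log(denom)
    return loglik
```

At β = 0, each event contributes minus the log of its risk-set size, so three uncensored subjects give −log 6. Maximizing over β (for example with Newton-Raphson) yields the partial-likelihood estimate.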

Design considerations in the sequential analysis of matched case–control data

Relevance: 90.00%

Abstract:

A role for sequential test procedures is emerging in genetic and epidemiological studies that use banked biological resources. This stems from the methodology's potential to use information more efficiently than comparable fixed-sample designs. Studies in which cost, time, and ethics feature prominently are particularly suited to a sequential approach. In this paper, sequential procedures for matched case–control studies with binary data are investigated and assessed. Design issues such as sample size evaluation and error rates are identified and addressed. The methodology is illustrated and evaluated using both real and simulated data sets.
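
As a textbook illustration of a sequential procedure for 1:1 matched binary data, and not necessarily the design studied in the paper: under a conditional logistic model, in each discordant pair the case is the exposed member with probability ψ/(1+ψ), where ψ is the odds ratio. Wald's sequential probability ratio test of H0: ψ = 1 against H1: ψ = ψ₁ then accumulates a log-likelihood ratio pair by pair until it crosses one of two boundaries set by the target error rates α and β. The function name and default error rates are assumptions.

```python
import math
from typing import Iterable

def sprt_discordant_pairs(pairs: Iterable[bool], psi1: float,
                          alpha: float = 0.05, beta: float = 0.2) -> tuple:
    """Wald SPRT of H0: OR = 1 vs H1: OR = psi1 using discordant matched pairs.

    Each item of `pairs` is True if the case (not the control) was exposed.
    Returns (decision, number of discordant pairs used).
    """
    p0, p1 = 0.5, psi1 / (1.0 + psi1)
    upper = math.log((1 - beta) / alpha)   # crossing it rejects H0
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0
    llr, n = 0.0, 0
    for n, case_exposed in enumerate(pairs, start=1):
        if case_exposed:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n
```

With ψ₁ = 3, α = 0.05 and β = 0.2, seven consecutive pairs with an exposed case are enough to reject H0, and three consecutive pairs with an exposed control are enough to accept it. This early stopping is the information saving over fixed-sample designs that the abstract refers to.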

Choosing between Cox proportional hazards and logistic models for interval-censored data via bootstrap

Relevance: 90.00%

Abstract:

This work develops a new methodology for discriminating between models for interval-censored data, based on bootstrap residual simulation and the deviance difference between one model and another, following Hinde (1992). Data of this kind can generate a large number of tied observations, in which case survival time can be regarded as discrete. Therefore, the Cox proportional hazards model for grouped data (Prentice & Gloeckler, 1978) and the logistic model (Lawless, 1982) can be fitted as generalized linear models. Whitehead (1989) treated censoring as an indicator variable with a binomial distribution and fitted the Cox proportional hazards model using the complementary log-log link function. Likewise, a logistic model can be fitted using the logit link function. The proposed methodology is an alternative to the score tests developed by Colosimo et al. (2000), in which both models arise, for discrete binary data, as particular cases of the Aranda-Ordaz asymmetric family; these tests are therefore based on the link functions used to generate the fit. The example motivating this study is a dataset from an experiment on a flax cultivar planted on four substrata susceptible to the pathogen Fusarium oxysporum. The response variable, the time until blighting, was observed at intervals over 52 days. The results were compared with the model fit and the AIC values.
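
The link functions behind this comparison can be made concrete. Under grouped-data proportional hazards, the discrete hazard satisfies cloglog(h_j(x)) = cloglog(h0_j) + x'β, whereas the logistic model uses the logit. The Aranda-Ordaz asymmetric link log{[(1 − p)^(−λ) − 1]/λ} contains both: λ = 1 gives the logit and λ → 0 gives the complementary log-log. The sketch below only evaluates that link; fitting the models and the bootstrap deviance comparison are not shown, and the function name is hypothetical.

```python
import math

def aranda_ordaz(p: float, lam: float) -> float:
    """Aranda-Ordaz asymmetric link: log(((1 - p) ** (-lam) - 1) / lam).

    lam = 1 gives the logit; the limit lam -> 0 gives the complementary log-log.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if lam == 0.0:
        return math.log(-math.log1p(-p))  # limiting case: cloglog
    # expm1/log1p keep precision when lam is close to 0
    return math.log(math.expm1(-lam * math.log1p(-p)) / lam)
```

Evaluating the link at λ = 1 and at λ near 0 for the same p reproduces the logit and cloglog values, so a fitted λ indicates which of the two models the data favour.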

A new snowfall detection algorithm for high latitude regions based on a combination of active and passive sensors

Relevance: 90.00%

Abstract:

Precipitation retrieval over high latitudes, particularly snowfall retrieval over ice and snow using satellite-based passive microwave spectrometers, is currently an unsolved problem. The challenge results from the large variability of microwave emissivity spectra for snow and ice surfaces, which can mimic, to some degree, the spectral characteristics of snowfall. This work investigates a new snowfall detection algorithm specific to high-latitude regions, based on a combination of active and passive sensors able to discriminate between snowing and non-snowing areas. The space-borne Cloud Profiling Radar (on CloudSat), the Advanced Microwave Sounding Unit A and B (on NOAA-16), and the infrared spectrometer MODIS (on AQUA) were co-located for 365 days, from 1 October 2006 to 30 September 2007. CloudSat products were used as the truth to calibrate and validate all the proposed algorithms. The methodological approach can be summarised in two steps. In the first step, an empirical search for a threshold aimed at discriminating the no-snow case was performed, following Kongoli et al. [2003]. Because this single-channel approach did not produce satisfactory results, a more statistically sound approach was attempted. Two different techniques, which allow the probability above and below a Brightness Temperature (BT) threshold to be computed, were applied to the available data. The first technique is based on a Logistic Distribution to represent the probability of snow given the predictors. The second technique, called the Bayesian Multivariate Binary Predictor (BMBP), is a fully Bayesian technique that requires no hypothesis on the shape of the probabilistic model (such as, for instance, the logistic) and only requires the estimation of the BT thresholds.
The results show that both proposed methods are able to discriminate between snowing and non-snowing conditions over the polar regions with a probability of correct detection larger than 0.5, highlighting the importance of a multispectral approach.
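
The first technique can be sketched as a logistic model for the probability of snow given several brightness temperatures, together with the probability of detection (POD) scored against CloudSat flags taken as truth. This is a sketch under assumptions. The coefficients would have to be fitted and are hypothetical here. The 0.5 decision threshold is a choice. The abstract's "probability of correct detection" is assumed to correspond to the standard POD score, hits / (hits + misses). The BMBP technique is not shown.

```python
import math
from typing import Sequence

def snow_probability(bt: Sequence[float], intercept: float,
                     coefs: Sequence[float]) -> float:
    """Logistic model P(snow | brightness temperatures); coefficients are hypothetical."""
    z = intercept + sum(c * x for c, x in zip(coefs, bt))
    # Numerically stable logistic function (no overflow for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def probability_of_detection(predicted: Sequence[bool],
                             observed: Sequence[bool]) -> float:
    """POD = hits / (hits + misses), treating observed flags (e.g. CloudSat) as truth."""
    hits = sum(1 for p, o in zip(predicted, observed) if p and o)
    misses = sum(1 for p, o in zip(predicted, observed) if o and not p)
    return hits / (hits + misses) if hits + misses else math.nan
```

A typical use would be `predicted = [snow_probability(bt, b0, b) > 0.5 for bt in samples]`, followed by `probability_of_detection(predicted, cloudsat_flags)`, where `samples`, `b0`, `b` and `cloudsat_flags` are placeholders for the co-located data and fitted coefficients.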

Genetic distances between molecular profiles obtained from multilocus multiallelic markers

Relevance: 90.00%

Abstract:

To express the magnitude of genetic identity (similarity), or its complement (distance), between two individuals characterized molecularly by microsatellite (SSR) markers, which are multilocus and multiallelic, a metric consistent with the multivariate nature of the data must be chosen. Genetic distance metrics are commonly designed to express, in a single number, the genetic difference between two populations, and are expressed as a function of population allele frequencies. These metrics can also be used to calculate the distance between individual profiles, but in that case the allele frequencies are not continuous. Alternatively, geometric distances can be used, obtained as the complement of a similarity index for binary data indicating the presence/absence of each allele in an individual. The objective of this work was to evaluate simultaneously the performance of both types of metrics for ordering and classifying individuals in a database generated from SSR microsatellite marker loci. Eleven distance metrics were calculated from 17 SSR loci obtained from 17 accessions of a soybean [Glycine max (L.) Merr.] germplasm bank. The consensus among the classifications of the 17 molecular profiles obtained with the various metrics was evaluated. The results suggest that the different types of metrics provide similar information for comparing individuals. Nevertheless, the metrics were classified according to differences in the core of their computational expressions.
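
The second family of metrics, geometric distances computed as the complement of a similarity index on presence/absence data, can be sketched as follows. Each multilocus SSR profile is coded as the set of (locus, allele) pairs present in an individual, and two common indices, Jaccard and Dice, are turned into distances. These are two examples of the general approach, not necessarily among the eleven metrics used in the study, and the locus names and allele sizes below are hypothetical.

```python
from typing import Dict, Set, Tuple

Profile = Dict[str, Set[int]]  # locus name -> allele sizes present

def allele_items(profile: Profile) -> Set[Tuple[str, int]]:
    """Presence/absence coding: one binary item per (locus, allele) pair."""
    return {(locus, allele) for locus, alleles in profile.items() for allele in alleles}

def jaccard_distance(p1: Profile, p2: Profile) -> float:
    """1 - |shared alleles| / |alleles present in either profile|."""
    a, b = allele_items(p1), allele_items(p2)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def dice_distance(p1: Profile, p2: Profile) -> float:
    """1 - 2|shared alleles| / (|alleles in p1| + |alleles in p2|)."""
    a, b = allele_items(p1), allele_items(p2)
    total = len(a) + len(b)
    return 1.0 - 2 * len(a & b) / total if total else 0.0
```

For a heterozygous profile {"locus_A": {150, 154}, "locus_B": {200}} and a second profile {"locus_A": {150}, "locus_B": {204}}, one allele of four distinct ones is shared, giving a Jaccard distance of 0.75 and a Dice distance of 0.6. The two indices weight shared alleles differently, which is one source of the differences between metric cores that the study classifies.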

Reference list of sources used for two experimental data files dataBSRN and dataMixed

Relevance: 90.00%

Abstract:

Increasing amounts of data are collected in most areas of research and application. The degree to which these data can be accessed, analyzed, and retrieved is decisive for progress in fields such as scientific research or industrial production. We present a novel methodology supporting content-based retrieval and exploratory search in repositories of multivariate research data. In particular, our methods can describe two-dimensional functional dependencies in research data, e.g. the relationship between inflation and unemployment in economics. Our basic idea is to describe the data mathematically with feature vectors based on the goodness-of-fit of a set of regression models. We call this approach Regressional Features and use it for content-based search and, since the approach motivates an intuitive definition of interestingness, for exploring the most interesting data. We apply our method to extensive real-world research datasets, showing the usefulness of our approach for user-centered access to research data in a Digital Library system.

Factors associated with perineal integrity and episiotomy in normal birth: a cross-sectional study

Relevance: 90.00%

Abstract:

Introduction: Investigating the factors associated with perineal condition in vaginal birth may enable changes in perineal care that contribute to lower rates of episiotomy and perineal laceration. Objectives: To identify the factors associated with episiotomy; to identify the factors associated with perineal integrity in vaginal birth; to describe the reasons given by obstetric nurses for performing episiotomy; and to identify the perineal protection maneuvers performed by obstetric nurses in a Normal Birth Center. Method: Cross-sectional study with prospective data collection through a form completed by the obstetric nurses of an in-hospital Normal Birth Center in São Paulo, including data on all women who gave birth at this service from February 2014 to January 2015. In the statistical analysis, associations between the dependent variables (episiotomy and perineal integrity) and the sociodemographic, obstetric, and care variables were estimated as odds ratios (OR), calculated by univariate and multiple binary logistic regression with 95% confidence intervals (95% CI), in SPSS version 20. Separate analyses were performed for each dependent variable. The reasons for performing episiotomy and the use of perineal protection maneuvers were described by frequencies and percentages. The study was approved by the Research Ethics Committees of the proposing and co-participating institutions. Results: Data from 802 women were analyzed (episiotomy rate 23.8%, 191 women; perineal integrity 25.9%, 208 women; perineal laceration 50.3%, 403 women).
The factors independently associated with episiotomy were: no previous vaginal birth (OR 26.72; 95% CI 15.42-46.30), use of oxytocin during labor (OR 1.69; 95% CI 1.12-2.57), directed pushing (OR 2.05; 95% CI 1.23-3.43), complications during labor (OR 2.61; 95% CI 1.43-4.77), and a semi-sitting position at birth (OR 5.45; 95% CI 1.06-28.01). The use of one perineal protection maneuver (OR 0.11; 95% CI 0.04-0.26) or of two or more maneuvers (OR 0.09; 95% CI 0.04-0.22) was a protective factor against episiotomy. For perineal integrity, the independently associated factors were a previous vaginal birth (OR 3.88; 95% CI 2.41-6.23) and self-reported non-white skin color (OR 1.43; 95% CI 1.01-2.04). The indications for episiotomy were predominantly related to the condition and dimensions of the perineum. Perineal protection maneuvers were used in approximately 95% of vaginal births but did not affect perineal integrity rates. Conclusions: Most of the variables associated with episiotomy were factors that can be controlled by the health professional. These variables did not affect perineal integrity rates. Informing the professionals who provide childbirth care, and the women who seek this care, about the factors associated with perineal condition in vaginal birth may help reduce the episiotomy rate and preserve perineal integrity in vaginal birth.
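
Adjusted odds ratios such as those reported above are exp(β) from the logistic regression, and their 95% CIs are assumed here to be Wald intervals, exp(β ± 1.96·SE). Under that assumption, the standard error and an approximate two-sided p-value can be recovered from a reported OR and CI alone. Because the published limits are rounded, the recovered values are only approximate. The function name is hypothetical, and the usage below takes its inputs from the reported results.

```python
import math

def p_from_or_ci(or_: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Approximate two-sided Wald p-value recovered from a reported OR and 95% CI."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(OR)
    z = math.log(or_) / se_log
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))
```

For oxytocin use (OR 1.69; 95% CI 1.12-2.57) this gives p ≈ 0.013. For one perineal protection maneuver (OR 0.11; 95% CI 0.04-0.26) it gives p well below 0.001. An interval that only just excludes 1, such as 1.01-2.04, corresponds to p just under 0.05.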

Higher-order co-occurrences for exploratory point pattern analysis and decision tree clustering on spatial data

Relevance: 90.00%

Abstract:

Analyzing geographical patterns by collocating events, objects, or their attributes has a long history in surveillance and monitoring and is applied particularly in environmental contexts, such as ecology or epidemiology. Patterns or structures at particular scales can be identified using spatial statistics, particularly marked point process methodologies. Classification and regression trees also serve this goal of finding "patterns", by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging spatial dependence. Many methods routinely used in exploratory point pattern analysis are second-order statistics used in a univariate context, although there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, forms the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
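
A minimal sketch of the entropy idea, restricted to order 2 for brevity: the paper uses higher orders, and its neighbourhood definition may differ from the fixed radius assumed here. The sketch counts the unordered mark pairs among points lying within a given distance of each other. These counts form a multinomial distribution of co-occurrences, and its Shannon entropy is computed. The function name and point data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float, str]  # (x, y, mark)

def pair_cooccurrence_entropy(points: List[Point], radius: float) -> float:
    """Shannon entropy (nats) of the distribution of unordered mark pairs
    among points closer than `radius` (order-2 co-occurrences only)."""
    counts: Counter = Counter()
    for (x1, y1, m1), (x2, y2, m2) in combinations(points, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            counts[tuple(sorted((m1, m2)))] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Low entropy means a few mark combinations dominate the neighbourhoods. For instance, two separate clusters, each containing only one mark, yield two equally frequent pair types and an entropy of log 2.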
