978 results for VARIABLE SELECTION


Relevance:

60.00%

Publisher:

Abstract:

In longitudinal data analysis, our primary interest is in the regression parameters for the marginal expectations of the longitudinal responses; the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly for correlated discrete outcome data. Marginal modeling approaches such as generalized estimating equations (GEEs) have received much attention in the context of longitudinal regression. These methods are based on the estimates of the first two moments of the data and the working correlation structure. The confidence regions and hypothesis tests are based on the asymptotic normality. The methods are sensitive to misspecification of the variance function and the working correlation structure. Because of such misspecifications, the estimates can be inefficient and inconsistent, and inference may give incorrect results. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of a confidence region for the parameter of interest. We extend our approach to variable selection for high-dimensional longitudinal data with many covariates. In this situation it is necessary to identify a submodel that adequately represents the data. Including redundant variables may impact the model’s accuracy and efficiency for inference. We propose a penalized empirical likelihood (PEL) variable selection based on GEEs; the variable selection and the estimation of the coefficients are carried out simultaneously. We discuss its characteristics and asymptotic properties, and present an algorithm for optimizing PEL.
Simulation studies show that when the model assumptions are correct, our method performs as well as existing methods, and when the model is misspecified, it has clear advantages. We have applied the method to two case examples.

Relevance:

60.00%

Publisher:

Abstract:

Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and the Robust prior. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al, such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them to others in the literature via simulation and real examples.
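
As a point of orientation, the role of 1/(1+g) can be seen in the Gaussian linear model. This is a standard textbook result and is not taken from the abstract itself; the symbols X (design matrix), y (response) and the ordinary least squares estimate are notation assumed here for illustration.

```latex
\beta \mid g, \sigma^2 \sim \mathcal{N}\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right),
\qquad
\mathbb{E}[\beta \mid y, g] = \frac{g}{1+g}\,\hat{\beta}_{\mathrm{OLS}}
= \left(1 - \frac{1}{1+g}\right)\hat{\beta}_{\mathrm{OLS}} .
```

In this setting 1/(1+g) is the fraction by which the least squares estimate is shrunk toward zero, which may explain why a prior is placed on that quantity rather than on g directly.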

Relevance:

60.00%

Publisher:

Abstract:

This work focuses on the analysis of the major sugars in foods (glucose, fructose and sucrose) with a potentiometric electronic tongue through multivariate calibration with sensor selection. The analysis of these compounds contributes to assessing the impact of sugars on health and their physiological effect, and also makes it possible to relate sensory attributes and to support food quality and authenticity control. Although several analytical methodologies are routinely used to identify and quantify sugars in foods, these methods generally have several disadvantages, such as slow analyses, high consumption of chemical reagents and the need for destructive sample pre-treatments. It was therefore decided to apply a potentiometric electronic tongue, built with polymeric sensors selected on the basis of the sensitivities to sugars obtained in previous work, to the analysis of sugars in foods, with the aim of establishing an analytical methodology and mathematical procedures for quantifying these compounds. For this purpose, analyses were carried out on standard solutions of ternary mixtures of the sugars at different concentration levels and on solutions of dissolved honey samples, which had previously been analysed by HPLC to determine the reference concentrations of the sugars. An exploratory analysis of the data was then performed using principal component analysis, with the aim of removing discordant sensors or observations. Next, multiple linear regression models with variable selection were built using the stepwise algorithm, and it was found that, although a good relationship could be established between the sensor responses and the sugar concentrations, the models did not show satisfactory prediction performance on test-set data.
To overcome this problem, new approaches were tested by building and optimising the parameters of a genetic algorithm for variable selection that could be applied to the various regression tools, among them partial least squares regression. Good prediction results were obtained for the models built with partial least squares combined with the genetic algorithm, both for the standard solutions and for the honey solutions, with adjusted R² above 0.99 and RMSE below 0.5, obtained from the linear relationship between predicted and experimental values using test-set data. The multi-sensor system built proved to be a suitable tool for the analysis of these sugars when present at major concentrations, and an alternative to reference instrumental methods such as HPLC, since it reduces the analysis time and cost, requires minimal sample preparation and eliminates polluting end products.
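
The stepwise variable selection mentioned in this abstract can be illustrated with a minimal sketch of greedy forward selection for multiple linear regression. This is not the authors' procedure: the function names `rss` and `forward_stepwise`, the `min_gain` stopping rule and its default value are all assumptions made for illustration.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, max_vars=None, min_gain=0.05):
    """Greedy forward selection of columns (e.g. sensors) of X.

    At each step the column that lowers the RSS the most is added.
    Selection stops when the relative RSS reduction drops below
    min_gain (a hypothetical stopping rule, chosen for illustration).
    """
    n, p = X.shape
    max_vars = p if max_vars is None else min(p, max_vars)
    selected = []
    current = rss(np.empty((n, 0)), y)  # intercept-only model
    while len(selected) < max_vars:
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: rss(X[:, selected + [j]], y) for j in candidates}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_gain * current:
            break
        selected.append(best)
        current = scores[best]
    return selected
```

A selection chosen this way is tuned to the training data, so its predictive performance is best judged on a held-out test set, which is where the abstract reports the stepwise models fell short.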

Relevance:

60.00%

Publisher:

Abstract:

The current study was a cross-sectional examination of data collected during an HIV risk reduction intervention in south Florida. The purpose of the study was to explore the relationships between neighborhood stress, parenting, attitudes, and adolescent sexual intentions and behavior. The Theory of Planned Behavior was used as a model to guide variable selection and propose an interaction pathway between predictors and outcomes. Potential predictor variables measured for adolescents ages 13-18 (n=196) included communication about sex, parent-family connectedness, parental presence, parent-adolescent activity participation, attitudes about sex and condom use, neighborhood disorder, and exposure to violence. Outcomes were behavioral intentions and sexual behavior for the previous eight months. Neighborhood data was supplemented with ZIP Code level data from regional sources and included median household income, percentage of minority and Hispanic residents, and number of foreclosures. Statistical tests included t-tests, Pearson’s correlations, and hierarchical linear regressions. Results showed that males and older adolescents reported less positive behavioral intentions than females and adolescents younger than 16. Intentions were associated with condom attitudes, sexual attitudes, and parental presence; unprotected sexual behavior was associated with parental presence. The best fit model for intentions included gender, sexual attitudes, condom attitudes, parental presence, and neighborhood disorder. The unsafe sexual behavior model included whether the participant lived with both natural parents in the previous year, and the percent of Hispanic residents in the neighborhood. Study findings indicate that more research on adolescent sexual behavior is warranted, specifically examining the differentials between variables that affect intentions and those that affect behavior. 
A focus on gender and age differences during intervention development may allow for better targeting and more efficacious interventions. Adding peer and media influences to the framework of attitudes, parenting, and neighborhood may offer more insight into patterns of adolescent sexual behavior risk.

Relevance:

60.00%

Publisher:

Abstract:

Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore, the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype, distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods (mixed-effects models based on forward variable selection or l1 regularisation) on both synthetic and viral datasets. In addition, we test a number of different versions of the SABRE method, comparing conjugate and semi-conjugate prior specifications and an alternative to the spike and slab prior, the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over that of the established component-wise Gibbs sampler.
The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses of other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs equally to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, allowing it to be used on these datasets. The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.

Relevance:

60.00%

Publisher:

Abstract:

Combinatorial optimization problems are typically tackled by the branch-and-bound paradigm. We propose to learn a variable selection policy for branch-and-bound in mixed-integer linear programming, by imitation learning on a diversified variant of the strong branching expert rule. We encode states as bipartite graphs and parameterize the policy as a graph convolutional neural network. Experiments on a series of synthetic problems demonstrate that our approach produces policies that can improve upon expert-designed branching rules on large problems, and generalize to instances significantly larger than seen during training.
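
The bipartite-graph state encoding mentioned in this abstract can be sketched as follows, assuming a problem of the form min c^T x subject to A x <= b. The function name `milp_to_bipartite` and the choice of node features (only the right-hand sides and objective coefficients) are simplifying assumptions for illustration; the abstract does not specify which features the authors use.

```python
import numpy as np


def milp_to_bipartite(A, b, c):
    """Encode min c^T x s.t. A x <= b as a bipartite graph.

    One node per constraint and one node per variable; an edge links
    constraint i to variable j whenever A[i, j] is nonzero, and carries
    that coefficient. Node features here are deliberately minimal
    (hypothetical choice): b[i] for constraints, c[j] for variables.
    """
    A = np.asarray(A, dtype=float)
    rows, cols = np.nonzero(A)
    edges = [(int(i), int(j), float(A[i, j])) for i, j in zip(rows, cols)]
    constraint_feats = np.asarray(b, dtype=float).reshape(-1, 1)
    variable_feats = np.asarray(c, dtype=float).reshape(-1, 1)
    return constraint_feats, variable_feats, edges
```

A graph convolutional network would then pass messages along these edges between the two node sets and score the variable nodes as branching candidates.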

Relevance:

40.00%

Publisher:

Abstract:

The synovial membrane (SM) of affected joints in ankylosing spondylitis (AS) is infiltrated by germinal center-like aggregates (foci) of lymphocytes similar to rheumatoid arthritis (RA). We characterized the rearranged heavy chain variable segment (VH) genes in the SM for gene usage and the mutational pattern to elucidate the B lymphocyte involvement in AS.

Relevance:

30.00%

Publisher:

Abstract:

Background: Plasmodium vivax malaria is a major public health challenge in Latin America, Asia and Oceania, with 130-435 million clinical cases per year worldwide. Invasion of host blood cells by P. vivax mainly depends on a type I membrane protein called Duffy binding protein (PvDBP). The erythrocyte-binding motif of PvDBP is a 170 amino-acid stretch located in its cysteine-rich region II (PvDBP(II)), which is the most variable segment of the protein. Methods: To test whether diversifying natural selection has shaped the nucleotide diversity of PvDBP(II) in Brazilian populations, this region was sequenced in 122 isolates from six different geographic areas. A Bayesian method was applied to test for the action of natural selection under a population genetic model that incorporates recombination. The analysis was integrated with a structural model of PvDBP(II), and T- and B-cell epitopes were localized on the 3-D structure. Results: The results suggest that: (i) recombination plays an important role in determining the haplotype structure of PvDBP(II), and (ii) PvDBP(II) appears to contain neutrally evolving codons as well as codons evolving under natural selection. Diversifying selection preferentially acts on sites identified as epitopes, particularly on amino acid residues 417, 419, and 424, which show strong linkage disequilibrium. Conclusions: This study shows that some polymorphisms of PvDBP(II) are present near the erythrocyte-binding domain and might serve to elude antibodies that inhibit cell invasion. Therefore, these polymorphisms should be taken into account when designing vaccines aimed at eliciting antibodies to inhibit erythrocyte invasion.

Relevance:

30.00%

Publisher:

Abstract:

Context tree models have been introduced by Rissanen in [25] as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics. The present paper investigates non-asymptotic properties of two popular procedures of context tree estimation: Rissanen's algorithm Context and penalized maximum likelihood. First showing how they are related, we prove finite horizon bounds for the probability of over- and under-estimation. Concerning overestimation, no boundedness or loss-of-memory conditions are required: the proof relies on new deviation inequalities for empirical probabilities of independent interest. The under-estimation properties rely on classical hypotheses for processes of infinite memory. These results improve on and generalize the bounds obtained in Duarte et al. (2006) [12], Galves et al. (2008) [18], Galves and Leonardi (2008) [17], Leonardi (2010) [22], refining asymptotic results of Bühlmann and Wyner (1999) [4] and Csiszár and Talata (2006) [9]. (C) 2011 Elsevier B.V. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

¹H NMR spectra of the thyroid hormone thyroxine recorded at low temperature and high field show splitting into two peaks of the resonance due to the H2,6 protons of the inner (tyrosyl) ring. A single resonance is observed in 600 MHz spectra at temperatures above 185 K. An analysis of the line shape as a function of temperature shows that the coalescence phenomenon is due to an exchange process with a barrier of 37 kJ mol⁻¹. This is identical to the barrier for coalescence of the H2',6' protons of the outer (phenolic) ring reported previously for the thyroid hormones and their analogues. It is proposed that the separate peaks at low temperature are due to resonances for H2,6 in cisoid and transoid conformers, which are approximately equally populated. These two peaks are averaged resonances for the individual H2 and H6 protons. Conversion of cisoid to transoid forms can occur via rotation of either the alanyl side chain or the outer ring, from one face of the inner ring to the other. It is proposed that the latter process is the one responsible for the observed coalescence phenomenon. The barrier to rotation of the alanyl side chain is greater than or equal to 37 kJ mol⁻¹, which is significantly larger than has previously been reported for Csp²-Csp³ bonds in other Ph-CH2-X systems. The recent crystal structure of a hormone agonist bound to the ligand-binding domain of the rat thyroid hormone receptor (Wagner et al. Nature 1995, 378, 690-697) shows the transoid form to be the bound conformation. The significant energy barrier to cisoid/transoid interconversion determined in the current study combined with the tight fit of the hormone to its receptor suggests that interconversion between the forms cannot occur at the receptor site but that selection for the preferred bound form occurs from the 50% population of the transoid form in solution.

Relevance:

30.00%

Publisher:

Abstract:

This paper considers the instrumental variable regression model when there is uncertainty about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainty can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very flexible and can be easily adapted to analyze any of the different priors that have been proposed in the Bayesian instrumental variables literature. We show how to calculate the probability of any relevant restriction (e.g. the posterior probability that over-identifying restrictions hold) and discuss diagnostic checking using the posterior distribution of discrepancy vectors. We illustrate our methods in a returns-to-schooling application.

Relevance:

30.00%

Publisher:

Abstract:

The availability of rich firm-level data sets has recently led researchers to uncover new evidence on the effects of trade liberalization. First, trade openness forces the least productive firms to exit the market. Secondly, it induces surviving firms to increase their innovation efforts and thirdly, it increases the degree of product market competition. In this paper we propose a model aimed at providing a coherent interpretation of these findings. We introduce firm heterogeneity into an innovation-driven growth model, where incumbent firms operating in oligopolistic industries perform cost-reducing innovations. In this framework, trade liberalization leads to higher product market competition, lower markups and higher quantity produced. These changes in markups and quantities, in turn, promote innovation and productivity growth through a direct competition effect, based on the increase in the size of the market, and a selection effect, produced by the reallocation of resources towards more productive firms. Calibrated to match US aggregate and firm-level statistics, the model predicts that a 10 percent reduction in variable trade costs reduces markups by 1.15 percent, firm surviving probabilities by 1 percent, and induces an increase in productivity growth of about 13 percent. More than 90 percent of the trade-induced growth increase can be attributed to the selection effect.

Relevance:

30.00%

Publisher:

Abstract:

It is generally accepted that most plant populations are locally adapted. Yet, understanding how environmental forces give rise to adaptive genetic variation is a challenge in conservation genetics and crucial to the preservation of species under rapidly changing climatic conditions. Environmental variation, phylogeographic history, and population demographic processes all contribute to spatially structured genetic variation; however, few current models attempt to separate these confounding effects. To illustrate the benefits of using a spatially-explicit model for identifying potentially adaptive loci, we compared outlier locus detection methods with a recently-developed landscape genetic approach. We analyzed 157 loci from samples of the alpine herb Gentiana nivalis collected across the European Alps. Principal coordinates of neighbor matrices (PCNM), eigenvectors that quantify multi-scale spatial variation present in a data set, were incorporated into a landscape genetic approach relating AFLP frequencies with 23 environmental variables. Four major findings emerged. 1) Fifteen loci were significantly correlated with at least one predictor variable (adjusted R² > 0.5). 2) Models including PCNM variables identified eight more potentially adaptive loci than models run without spatial variables. 3) When compared to outlier detection methods, the landscape genetic approach detected four of the same loci plus 11 additional loci. 4) Temperature, precipitation, and solar radiation were the three major environmental factors driving potentially adaptive genetic variation in G. nivalis. Techniques presented in this paper offer an efficient method for identifying potentially adaptive genetic variation and associated environmental forces of selection, providing an important step forward for the conservation of non-model species under global change.

Relevance:

30.00%

Publisher:

Abstract:

Two populations of the wasp Trypoxylon rogenhoferi Kohl, 1884 from São Carlos and Luís Antônio, State of São Paulo, Brazil, were observed and sampled from May 1999 to February 2001 using trap-nests. This mass-provisioning wasp was used to test some aspects of optimal sex allocation theory. Both populations fit all the predictions of the models of Green and Brockmann and Grafen. Maternal provisions determined the size of each offspring, and females allocated well-stocked brood cells to daughters, the sex that benefits most from being large. This strategy resulted in a difference in size between the sexes. In São Carlos, female weight at emergence was 1.18 times that of males; in Luís Antônio this value was 1.13. The brood cell volume was correlated with both wing length and weight at emergence in both sexes, and the chance that a given brood cell contained a male offspring decreased with increased brood cell volume. In T. rogenhoferi female body size was related to fitness. Larger females were able to collect more mass of spiders per day, the spiders they captured were heavier, and they provisioned more brood cells per day. They also produced larger daughters. For males, no relationship between body size and fitness was found, but the data were scarce. Since the patterns of provisioning were variable among different females in both study sites, it is possible that the females do not follow a unique strategy for sex allocation. The sex ratio and/or investment ratio in the São Carlos population was female-biased and in Luís Antônio, male-biased. In spite of the influence of trap-nest diameters on male production in Luís Antônio, there is some evidence that in the São Carlos population the local availability of prey and/or lower rate of parasitism may be major forces in determining the observed sex ratio, but further studies are necessary to verify this hypothesis.