965 results for Variable selection
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are 2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
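As a rough illustration of an entropy built from co-occurrence counts, the sketch below computes the Shannon entropy of the distribution of mark pairs found within a fixed distance. It is not the paper's estimator: the function name `cooccurrence_entropy`, the `radius` parameter, the restriction to pairs (order 2) and the toy coordinates are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, radius):
    """Shannon entropy (in nats) of mark-pair co-occurrences within `radius`.

    points: list of (x, y, mark) tuples; hypothetical input format.
    """
    counts = Counter()
    for (x1, y1, m1), (x2, y2, m2) in combinations(points, 2):
        # Two events co-occur when they lie within `radius` of each other.
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            # Unordered pair of marks, so ("a", "b") and ("b", "a") coincide.
            counts[tuple(sorted((m1, m2)))] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no co-occurrences at this scale
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Toy marked point pattern (made-up coordinates and marks).
points = [(0.0, 0.0, "a"), (0.5, 0.0, "b"), (0.0, 0.5, "a"), (5.0, 5.0, "b")]
print(cooccurrence_entropy(points, radius=1.0))
```

Evaluating the function over a range of `radius` values gives one entropy value per scale, which is one simple way to explore how the mix of co-occurring marks changes with distance.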
Abstract:
Feature selection is important in the medical field for many reasons. However, selecting important variables is difficult in the presence of censoring, a feature unique to survival data analysis. This paper proposes an approach to deal with the censoring problem in endovascular aortic repair survival data through Bayesian networks. It is merged and embedded with a hybrid feature selection process that combines Cox's univariate analysis with machine learning approaches, such as ensemble artificial neural networks, to select the most relevant predictive variables. The proposed algorithm was compared with common survival variable selection approaches such as the least absolute shrinkage and selection operator (LASSO) and Akaike information criterion (AIC) methods. The results showed that it was capable of dealing with high censoring in the datasets. Moreover, ensemble classifiers increased the area under the ROC curves of the two datasets collected from two centers located in the United Kingdom separately. Furthermore, ensembles constructed with center 1 enhanced the concordance index of center 2 prediction compared to the model built with a single network. Although the final reduced model using the neural networks and its ensembles is larger than those of the other methods, the model outperformed the others in both concordance index and sensitivity for center 2 prediction. This indicates the reduced model is more powerful for cross-center prediction.
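The concordance index used above to compare models can be sketched as follows. This is a plain implementation of Harrell's C for right-censored data, not the authors' code; the function name and the toy follow-up times, event flags and risk scores are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times: observed follow-up times.
    events: 1 if the event was observed, 0 if the record is censored.
    risk_scores: higher score means the model expects an earlier event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if subject i had an observed event
            # strictly before subject j's (event or censoring) time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the second subject is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risk = [0.9, 0.5, 0.05, 0.1]
print(concordance_index(times, events, risk))
```

Censored subjects never start a comparable pair here, which is why heavy censoring shrinks the number of pairs the index is computed on.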
Predictors of adolescent sexual intentions and behavior: Attitudes, parenting, and neighborhood risk
Abstract:
The current study was a cross-sectional examination of data collected during an HIV risk reduction intervention in south Florida. The purpose of the study was to explore the relationships between neighborhood stress, parenting, attitudes, and adolescent sexual intentions and behavior. The Theory of Planned Behavior was used as a model to guide variable selection and propose an interaction pathway between predictors and outcomes. Potential predictor variables measured for adolescents ages 13–18 (n=196) included communication about sex, parent-family connectedness, parental presence, parent-adolescent activity participation, attitudes about sex and condom use, neighborhood disorder, and exposure to violence. Outcomes were behavioral intentions and sexual behavior for the previous eight months. Neighborhood data was supplemented with ZIP Code level data from regional sources and included median household income, percentage of minority and Hispanic residents, and number of foreclosures. Statistical tests included t-tests, Pearson's correlations, and hierarchical linear regressions. Results showed that males and older adolescents reported less positive behavioral intentions than females and adolescents younger than 16. Intentions were associated with condom attitudes, sexual attitudes, and parental presence; unprotected sexual behavior was associated with parental presence. The best fit model for intentions included gender, sexual attitudes, condom attitudes, parental presence, and neighborhood disorder. The unsafe sexual behavior model included whether the participant lived with both natural parents in the previous year, and the percent of Hispanic residents in the neighborhood. Study findings indicate that more research on adolescent sexual behavior is warranted, specifically examining the differentials between variables that affect intentions and those that affect behavior. 
A focus on gender and age differences during intervention development may allow for better targeting and more efficacious interventions. Adding peer and media influences to the framework of attitudes, parenting, and neighborhood may offer more insight into patterns of adolescent sexual behavior risk.
Abstract:
In longitudinal data analysis, our primary interest is in the regression parameters for the marginal expectations of the longitudinal responses; the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly for correlated discrete outcome data. Marginal modeling approaches such as generalized estimating equations (GEEs) have received much attention in the context of longitudinal regression. These methods are based on the estimates of the first two moments of the data and the working correlation structure. The confidence regions and hypothesis tests are based on the asymptotic normality. The methods are sensitive to misspecification of the variance function and the working correlation structure. Because of such misspecifications, the estimates can be inefficient and inconsistent, and inference may give incorrect results. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of a confidence region for the parameter of interest. We extend our approach to variable selection for high-dimensional longitudinal data with many covariates. In this situation it is necessary to identify a submodel that adequately represents the data. Including redundant variables may impact the model's accuracy and efficiency for inference. We propose a penalized empirical likelihood (PEL) variable selection based on GEEs; the variable selection and the estimation of the coefficients are carried out simultaneously. We discuss its characteristics and asymptotic properties, and present an algorithm for optimizing PEL.
Simulation studies show that when the model assumptions are correct, our method performs as well as existing methods, and when the model is misspecified, it has clear advantages. We have applied the method to two case examples.
Abstract:
Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and the Robust prior. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al, such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them to others in the literature via simulation and real examples.
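For orientation only, the sketch below evaluates the well-known closed-form Bayes factor for a Gaussian linear model under Zellner's g-prior with a fixed g, together with the shrinkage factor g/(1+g), whose complement 1/(1+g) is the quantity the abstract places a prior on. This is the fixed-g linear-model case, not the tCCH mixture or the GLM approximation developed in the paper; the function names and the example values are assumptions.

```python
import math


def log_bf_fixed_g(r2, n, p, g):
    """Log Bayes factor of a linear model against the intercept-only model.

    Fixed-g Zellner prior, Gaussian errors:
        BF = (1 + g)^((n - 1 - p) / 2) / (1 + g * (1 - R^2))^((n - 1) / 2)
    r2: coefficient of determination of the candidate model.
    n: sample size; p: number of predictors (excluding the intercept).
    """
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))


def shrinkage_factor(g):
    """Posterior mean shrinks the least-squares fit by g / (1 + g)."""
    return g / (1.0 + g)


# Hypothetical example: n = 50 observations, p = 3 predictors, g = n.
for r2 in (0.0, 0.3, 0.6):
    print(r2, log_bf_fixed_g(r2, n=50, p=3, g=50.0))
print(shrinkage_factor(50.0))
```

Mixtures of g-priors replace the fixed `g` with a prior distribution and integrate this expression over it, which is where the choice of prior on 1/(1+g) enters.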
Quantification of sugars with an electronic tongue: multivariate calibration with sensor selection
Abstract:
This work focuses on the analysis of the major sugars in foods (glucose, fructose and sucrose) with a potentiometric electronic tongue through multivariate calibration with sensor selection. Analyzing these compounds contributes to assessing the impact of sugars on health and their physiological effect, and also makes it possible to relate sensory attributes and to support quality control and authenticity testing of foods. Although several analytical methodologies are routinely used to identify and quantify sugars in foods, these methods generally have several disadvantages, such as slow analyses, high consumption of chemical reagents and the need for destructive sample pre-treatments. It was therefore decided to apply a potentiometric electronic tongue, built with polymeric sensors selected on the basis of the sugar sensitivities obtained in previous work, to the analysis of sugars in foods, with the aim of establishing an analytical methodology and mathematical procedures for quantifying these compounds. For this purpose, analyses were carried out on standard solutions of ternary mixtures of the sugars at different concentration levels and on solutions of dissolved honey samples, which had previously been analyzed by HPLC to determine the reference sugar concentrations. An exploratory data analysis was then performed, using principal component analysis, to remove discordant sensors or observations. Next, multiple linear regression models were built with variable selection using the stepwise algorithm; although a good relationship could be established between the sensor responses and the sugar concentrations, the models did not show satisfactory prediction performance on test-set data.
To work around this problem, new approaches were tested by building a genetic algorithm for variable selection, and optimizing its parameters, so that it could be applied to the various regression tools, among them partial least squares regression. Good prediction results were obtained for the models built with partial least squares combined with the genetic algorithm, both for the standard solutions and for the honey solutions, with adjusted R² above 0.99 and RMSE below 0.5, obtained from the linear relationship between predicted and experimental values using test-set data. The multi-sensor system that was built proved to be a suitable tool for the analysis of these sugars when present at major concentrations, and an alternative to reference instrumental methods such as HPLC, since it reduces the time and monetary cost of the analysis, requires minimal sample preparation and eliminates polluting end products.
Abstract:
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype, distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods, mixed-effects models based on forward variable selection or l1 regularisation, on both synthetic and viral datasets. In addition we also test a number of different versions of the SABRE method, compare conjugate and semi-conjugate prior specifications and an alternative to the spike and slab prior; the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over that of the established component-wise Gibbs sampler. 
The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses of other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs equally to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, allowing it to be used on these datasets. The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.
Abstract:
Many wireless applications demand a fast mechanism to detect the packet from a node with the highest priority ("best node") only, while packets from nodes with lower priority are irrelevant. In this paper, we introduce an extremely fast contention-based multiple access algorithm that selects the best node and requires only local information of the priorities of the nodes. The algorithm, which we call Variable Power Multiple Access Selection (VP-MAS), uses the local channel state information from the accessing nodes to the receiver, and maps the priorities onto the receive power. It is based on a key result that shows that mapping onto a set of discrete receive power levels is optimal, when the power levels are chosen to exploit packet capture that inherently occurs in a wireless physical layer. The VP-MAS algorithm adjusts the expected number of users that contend in each step and their respective transmission powers, depending on whether previous transmission attempts resulted in capture, idle channel, or collision. We also show how reliable information regarding the total received power at the receiver can be used to improve the algorithm by enhancing the feedback mechanism. The algorithm detects the packet from the best node in 1.5 to 2.1 slots, which is considerably lower than the 2.43 slot average achieved by the best algorithm known to date.
Abstract:
In many wireless applications, it is highly desirable to have a fast mechanism to resolve or select the packet from the user with the highest priority. Furthermore, individual priorities are often known only locally at the users. In this paper we introduce an extremely fast, local-information-based multiple access algorithm that selects the best node in 1.8 to 2.1 slots, which is much lower than the 2.43 slot average achieved by the best algorithm known to date. The algorithm, which we call Variable Power Multiple Access Selection (VP-MAS), uses the local channel state information from the accessing nodes to the receiver, and maps the priorities into the receive power. It is inherently distributed and scales well with the number of users. We show that mapping onto a discrete set of receive power levels is optimal, and provide a complete characterization for it. The power levels are chosen to exploit packet capture that inherently occurs in a wireless physical layer. The VP-MAS algorithm adjusts the expected number of users that contend in each step and their respective transmission powers, depending on whether previous transmission attempts resulted in capture, idle channel, or collision.
Abstract:
The synovial membrane (SM) of affected joints in ankylosing spondylitis (AS) is infiltrated by germinal center-like aggregates (foci) of lymphocytes similar to rheumatoid arthritis (RA). We characterized the rearranged heavy chain variable segment (VH) genes in the SM for gene usage and the mutational pattern to elucidate the B lymphocyte involvement in AS.
Abstract:
Purpose: Data from two randomized phase III trials were analyzed to evaluate prognostic factors and treatment selection in the first-line management of advanced non-small cell lung cancer patients with performance status (PS) 2. Patients and Methods: Patients randomized to combination chemotherapy (carboplatin and paclitaxel) in one trial and single-agent therapy (gemcitabine or vinorelbine) in the second were included in these analyses. Both studies had identical eligibility criteria and were conducted simultaneously. Comparison of efficacy and safety was performed between the two cohorts. A regression analysis identified prognostic factors and subgroups of patients that may benefit from combination or single-agent therapy. Results: Two hundred one patients were treated with combination and 190 with single-agent therapy. Objective responses were 37 and 15%, respectively. Median time to progression was 4.6 months in the combination arm and 3.5 months in the single-agent arm (p < 0.001). Median survival times were 8.0 and 6.6 months, and 1-year survival rates were 31 and 26%, respectively. Albumin <3.5 g, extrathoracic metastases, lactate dehydrogenase ≥200 IU, and 2 comorbid conditions predicted outcome. Patients with 0-2 risk factors had similar outcomes independent of treatment, whereas patients with 3-4 factors had a nonsignificant improvement in median survival with combination chemotherapy. Conclusion: Our results show that PS 2 non-small cell lung cancer patients are a heterogeneous group who have significantly different outcomes. Patients treated with first-line combination chemotherapy had a higher response and longer time to progression, whereas overall survival did not appear significantly different. A prognostic model may be helpful in selecting PS 2 patients for either treatment strategy. © 2009 by the International Association for the Study of Lung Cancer.
Abstract:
Ureaplasmas are the microorganisms most frequently isolated from the amniotic fluid of pregnant women and can cause chronic intrauterine infections. These tiny bacteria are thought to undergo rapid evolution and exhibit a hypermutatable phenotype; however, little is known about how ureaplasmas respond to selective pressures in utero. Using an ovine model of chronic intra-amniotic infection, we investigated if exposure of ureaplasmas to sub-inhibitory concentrations of erythromycin could induce phenotypic or genetic indicators of macrolide resistance. At 55 days gestation, 12 pregnant ewes received an intra-amniotic injection of a non-clonal, clinical U. parvum strain, followed by: (i) erythromycin treatment (IM, 30 mg/kg/day, n=6); or (ii) saline (IM, n=6) at 100 days gestation. Fetuses were then delivered surgically at 125 days gestation. Despite injecting the same inoculum into all ewes, significant differences between amniotic fluid and chorioamnion ureaplasmas were detected following chronic intra-amniotic infection. Numerous polymorphisms were observed in domain V of the 23S rRNA gene of ureaplasmas isolated from the chorioamnion (but not the amniotic fluid), resulting in a mosaic-like sequence. Chorioamnion isolates also harboured the macrolide resistance genes erm(B) and msr(D) and were associated with variable roxithromycin minimum inhibitory concentrations. Remarkably, this variability occurred independently of exposure of ureaplasmas to erythromycin, suggesting that low-level erythromycin exposure does not induce ureaplasmal macrolide resistance in utero. Rather, the significant differences observed between amniotic fluid and chorioamnion ureaplasmas suggest that different anatomical sites may select for ureaplasma sub-types within non-clonal, clinical strains. This may have implications for the treatment of intrauterine ureaplasma infections.