932 results for combinatorial protocol in multiple linear regressions
Abstract:
The HIV-1 RT inhibitory activity of 2-(2,6-dihalophenyl)-3-(substituted pyridin-2-yl)-thiazolidin-4-ones has been analyzed with different topological descriptors obtained from DRAGON software. Here, simple topological descriptors (TOPO), Galvez topological charge indices (GVZ) and 2D autocorrelation descriptors (2DAUTO) have been found to yield good predictive models for the activity of these compounds. The correlations obtained from the TOPO class descriptors suggest that less extended or compact saturated structural templates would be better for the activity. The participating GVZ class descriptors suggest that they have the same degree of influence on the activity. In the 2DAUTO class, the large participation of descriptors of lags seven and three indicates the association of activity information with the seven- and three-centered structural fragments of these compounds. The physicochemical weighting components of these descriptors suggest a homogeneous influence of mass, volume, electronegativity and/or polarizability on the activity.
Abstract:
A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR rests primarily on restricting the entry of correlated variables to the model development stage. It has been used for the analysis of the Selwood et al. data set [16], and the obtained models are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 values in the range of 0.632-0.518. These models are also divergent and unique. Even though the present study does not share any models with the GFA [8] and MUSEUM [9] results, several descriptors are common to all these studies, including the present one. A simulation is also carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR, one can identify divergent models and handle data sets larger than the present one without excessive computer time.
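The variable-selection idea in this abstract can be illustrated with a minimal sketch. This is not the authors' CP-MLR implementation: it only enumerates descriptor subsets, discards any subset containing a descriptor pair whose absolute correlation exceeds a cutoff, fits ordinary least squares on the remaining subsets, and ranks them by leave-one-out Q2. The function names `loo_q2` and `select_models`, the cutoff value of 0.3, and the synthetic data are all assumptions made for illustration.

```python
import itertools
import numpy as np


def loo_q2(X, y):
    """Leave-one-out cross-validated Q^2 of an OLS model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def select_models(X, y, size, r_cutoff=0.3):
    """Fit every descriptor subset of the given size that passes the
    inter-descriptor correlation filter; return (Q^2, subset), best first."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    models = []
    for subset in itertools.combinations(range(X.shape[1]), size):
        pairs = itertools.combinations(subset, 2)
        if any(corr[i, j] > r_cutoff for i, j in pairs):
            continue  # correlated descriptors never reach model building
        models.append((loo_q2(X[:, list(subset)], y), subset))
    return sorted(models, reverse=True)


# Synthetic stand-in data (not the Selwood set): descriptor 1 is a near copy
# of descriptor 0, so the pair (0, 1) should be rejected by the filter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=200)
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)

models = select_models(X, y, size=2)
best_q2, best_subset = models[0]
print(best_subset, round(float(best_q2), 3))
```

Lowering `r_cutoff` shrinks the number of subsets that are ever fitted, which is the mechanism the abstract credits for keeping computing time manageable; the exhaustive enumeration used here is only practical for small subset sizes.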
Abstract:
We present an exact test for whether two random variables that have known bounds on their support are negatively correlated. The alternative hypothesis is that they are not negatively correlated. No assumptions are made on the underlying distributions. We show by example that the Spearman rank correlation test, the competing exact test of correlation in nonparametric settings, rests on an additional assumption on the data generating process without which it is not valid as a test for correlation. We then show how to test for the significance of the slope in a linear regression analysis that involves a single independent variable and where outcomes of the dependent variable belong to a known bounded set.
Abstract:
This article evaluates different parameter estimation strategies for a multiple linear regression model. The model parameters were estimated using data from a clinical trial designed to determine whether the mechanical test of the maximum-force property (EM-FM) is associated with femoral mass, femoral diameter, and the experimental group of ovariectomized rats of the species Rattus norvegicus albinus, Wistar variety. Three methodologies for estimating the model parameters are compared: the classical methodology, based on the method of least squares; the Bayesian methodology, based on Bayes' theorem; and the bootstrap method, based on resampling procedures.
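Two of the three estimation strategies named in this abstract, least squares and the bootstrap, can be compared in a short sketch; the Bayesian fit is omitted. The data below are synthetic stand-ins, and the variable names (`mass`, `diameter`, `group`), the coefficient values, and the choice of a case-resampling percentile bootstrap are assumptions for illustration, not details taken from the article.

```python
import numpy as np


def ols(A, y):
    """Least-squares coefficients for design matrix A (intercept included)."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def bootstrap_coefs(A, y, n_boot=1000, seed=0):
    """Case-resampling bootstrap: refit OLS on rows drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, A.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = ols(A[idx], y[idx])
    return draws


# Synthetic stand-in for the trial data (hypothetical values and units).
rng = np.random.default_rng(1)
n = 40
mass = rng.normal(0.8, 0.1, n)
diameter = rng.normal(3.5, 0.3, n)
group = rng.integers(0, 2, n)  # 0/1 indicator for experimental group
A = np.column_stack([np.ones(n), mass, diameter, group])
y = 20 + 60 * mass + 10 * diameter - 5 * group + rng.normal(0, 3, n)

beta_hat = ols(A, y)                       # classical point estimates
draws = bootstrap_coefs(A, y)              # bootstrap distribution
ci_low, ci_high = np.percentile(draws, [2.5, 97.5], axis=0)
for name, b, lo, hi in zip(["const", "mass", "diameter", "group"],
                           beta_hat, ci_low, ci_high):
    print(f"{name:9s} {b:8.2f}  95% bootstrap CI [{lo:8.2f}, {hi:8.2f}]")
```

The bootstrap interval makes no normality assumption about the errors, which is the usual reason for setting it beside the classical fit.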
Abstract:
The human immunodeficiency virus-1 reverse transcriptase inhibitory activity of 2-(2,6-disubstituted phenyl)-3-(substituted pyrimidin-2-yl)-thiazolidin-4-ones has been analyzed using combinatorial protocol in multiple linear regression (CP-MLR) with several electronic and molecular surface area features of the compounds obtained from Molecular Operating Environment (MOE) software. The study has indicated the role of different charged molecular surface areas in modeling the inhibitory activity of the compounds. The derived models collectively suggested that the compounds should be compact, without bulky substitutions on their peripheries, for better HIV-1 RT inhibitory activity. It also emphasized the necessity of hydrophobicity and compact structural features for their activity. The scope of the descriptors identified for these analogues has been verified by extending the dataset with different 2-(disubstituted phenyl)-3-(substituted pyridin-2-yl)-thiazolidin-4-ones. The joint analysis of the extended dataset highlighted the information content of the identified descriptors in modeling the HIV-1 RT inhibitory activity of the compounds.
Abstract:
The antimycobacterial activity of nitro/acetamido alkenol derivatives and chloro/amino alkenol derivatives has been analyzed through combinatorial protocol in multiple linear regression (CP-MLR) using different topological descriptors obtained from Dragon software. Among the topological descriptor classes considered in the study, the activity is correlated with simple topological descriptors (TOPO) and more complex 2D autocorrelation descriptors (2DAUTO). In model building, the descriptors from other classes, that is, empirical, constitutional, molecular walk counts, modified Burden eigenvalues and Galvez topological charge indices, have made a secondary contribution in association with the TOPO and/or 2DAUTO classes. The structure-activity correlations obtained with the TOPO descriptors suggest that less branched and saturated structural templates would be better for the activity. For both series of compounds, in 2DAUTO the activity has been correlated to the descriptors having mass, volume and/or polarizability as weighting component. In these two series of compounds, however, the regression coefficients of the descriptors have opposite arithmetic signs with respect to one another. Outwardly these two series of compounds appear very similar, but in terms of activity they belong to different segments of descriptor-activity profiles. This difference in the activity of the two series may be mainly due to the spacing difference between the C1 (also C6) substituents and the rest of the functional groups in them.
Abstract:
Two series of closely related antimalarial agents, 7-chloro-4-(3’,5’-disubstitutedanilino) quinolines, have been analyzed using Combinatorial Protocol in Multiple Linear Regression (CP-MLR) for the structure-activity relations with more than 450 topological descriptors for each set. The study clearly suggested that 3’- and 5’- substituents of the anilino moiety map different domains in the activity space. While one domain favors the compact structural frames having aromatic, heterocyclic ring(s) substituted with closely spaced F, NO2 and O functional groups, the other prefers structural frames enriched with unsaturation, loops, branches, electronic content and devoid of carbonyl function. Also, this study gives an indication in favour of the electron rich centres in the aniline substituent groups for better antimalarial activity; an observation in line with several of the previous reports too. The models developed and the participating descriptors suggest that the substituent groups of the 4-anilino moiety of the 4-(3’, 5’-disubstitutedanilino)quinolines hold scope for further modification in the optimisation of the antimalarial activity.
Abstract:
2002 Mathematics Subject Classification: 62J05, 62G35.
Abstract:
Interaction effects are of scientific interest in many areas of research. A common approach for investigating the interaction effect of two continuous covariates on a response variable is through a cross-product term in multiple linear regression. In epidemiological studies, the two-way analysis of variance (ANOVA) type of method has also been utilized to examine the interaction effect by replacing the continuous covariates with their discretized levels. However, the implications of the model assumptions of either approach have not been examined, and the statistical validation has only focused on the general method, not specifically on the interaction effect. In this dissertation, we investigated the validity of both approaches based on the mathematical assumptions for non-skewed data. We showed that linear regression may not be an appropriate model when the interaction effect exists because it implies a highly skewed distribution for the response variable. We also showed that the normality and constant variance assumptions required by ANOVA are not satisfied in the model where the continuous covariates are replaced with their discretized levels. Therefore, naïve application of the ANOVA method may lead to an incorrect conclusion. Given the problems identified above, we proposed a novel method, modified from the traditional ANOVA approach, to rigorously evaluate the interaction effect. The analytical expression of the interaction effect was derived based on the conditional distribution of the response variable given the discretized continuous covariates. A testing procedure that combines the p-values from each level of the discretized covariates was developed to test the overall significance of the interaction effect. According to the simulation study, the proposed method is more powerful than the least squares regression and the ANOVA method in detecting the interaction effect when data come from a trivariate normal distribution.
The proposed method was applied to a dataset from the National Institute of Neurological Disorders and Stroke (NINDS) tissue plasminogen activator (t-PA) stroke trial, and the baseline age-by-weight interaction effect was found to be significant in predicting the change from baseline in NIHSS at Month-3 among patients who received t-PA therapy.
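The "common approach" that this abstract takes as its starting point, a cross-product term in multiple linear regression, can be sketched as follows. This illustrates only that baseline, not the dissertation's proposed method. The function name `fit_with_interaction`, the simulated coefficients, and the sample size are assumptions for illustration.

```python
import numpy as np


def fit_with_interaction(x1, x2, y):
    """OLS fit of y = b0 + b1*x1 + b2*x2 + b3*x1*x2; returns (coef, t_stats)."""
    A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof               # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)      # covariance of the estimates
    t_stats = coef / np.sqrt(np.diag(cov))
    return coef, t_stats


# Simulated data with a true interaction coefficient of 0.8 (hypothetical).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)

coef, t_stats = fit_with_interaction(x1, x2, y)
print("interaction estimate:", round(float(coef[3]), 3))
print("interaction t-statistic:", round(float(t_stats[3]), 1))
```

The t-statistic on the last coefficient is the quantity usually reported as the test of interaction in this setup.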
Abstract:
Graduate Program in Agronomy (Plant Production) - FCAV
Abstract:
Introduction: The Bolsa Família Program is Brazil's main strategy for alleviating poverty and social vulnerability, with varied impacts on beneficiaries' lives. The increase in income resulting from the benefit could improve nutrition, since it allows for greater dietary diversity. However, it could also have negative effects, such as excessive energy intake and a consequent increase in adiposity. There are no evaluations of the program's impact on obesity and fat mass in children. Objective: To evaluate the impact of the Bolsa Família Program on nutritional status (BMI-for-age) and body composition at 6 years of age among children of the 2004 Pelotas (RS) Birth Cohort. Methods: Data came from linking the databases of the 2004 Pelotas Birth Cohort and the Federal Government's Cadastro Único. A descriptive analysis of the program's coverage and targeting was performed, using information from birth and from 6 years of age (n=4231). Targeting was defined as the percentage of eligible families among all beneficiaries, and coverage as the percentage of eligible families that are program beneficiaries. In the impact models (n=3446), the main exposures were receipt of the benefit (beneficiary in 2010, in the period 2004-2010), the mean monthly amount received, and the duration of receipt. Linear regression models were fitted for the outcomes BMI-for-age z-score (BMI/A), fat mass percentage and fat mass index (FMI), and fat-free mass percentage and fat-free mass index (FFMI); Poisson regression models with robust adjustment were fitted for the outcome obesity (BMI/A z-score cutoff of 2), all stratified by sex. Anthropometric and body composition (BOD POD) data were obtained at the follow-up at 6-7 years of age. Potential confounders were identified using a hierarchical model and a causal diagram (DAG).
To analyze the impacts, the effect measures used were the difference in means in multiple linear regression (BMI/A, % FM, FMI, % FFM and FFMI, continuous variables) and the prevalence ratio (obesity, binary variable). A p-value threshold of 0.20 was used for a variable to remain in the model. Data analysis was performed using STATA software. Results: Between 2004 and 2010, the proportion of beneficiary families in the cohort increased (from 11% to 34%), while, according to family income, the proportion of eligible families decreased (from 29% to 16%). Over the same period, program coverage increased according to both family income and the IEN. Targeting, in turn, fell from 78% to 32% according to family income and remained at 37% according to the IEN. The (unadjusted) mean BMI and FM of non-beneficiaries were higher than those of beneficiaries in both boys and girls. Boys in the 3rd tertile of per capita amount received and girls with less than 7 months of benefit in 2010 had higher BMI than, respectively, boys in the other tertiles and girls with more than 7 months of benefit in 2010; this pattern was similar for obesity. Non-beneficiary girls had higher FM than beneficiary girls, and also higher than boys, regardless of beneficiary status. For FFM the opposite pattern was observed: beneficiary girls had higher FFM than non-beneficiary girls, and boys had higher FFM than girls. In the adjusted regression models, there was no significant difference between beneficiaries and non-beneficiaries for any outcome. Conclusions: According to the results, families that received higher per capita amounts appear to include children with higher mean BMI.
In this analysis, the program appears to have no impact on the children's body composition, in terms of either fat mass or fat-free mass.
Abstract:
The ecotoxicological response of the living organisms in an aquatic system depends on the physical, chemical and bacteriological variables, as well as the interactions between them. An important challenge for scientists is to understand the interaction and behaviour of the factors involved in a multidimensional process such as the ecotoxicological response. With this aim, multiple linear regression (MLR) and principal component regression were applied to the ecotoxicity bioassay response of Chlorella vulgaris and Vibrio fischeri in water collected at seven sites of the Leça River during five monitoring campaigns (February, May, June, August and September of 2006). The river water characterization included the analysis of 22 physicochemical and 3 microbiological parameters. The model that best fitted the data was MLR, which shows: (i) a negative correlation with dissolved organic carbon, zinc and manganese, and a positive one with turbidity and arsenic, regarding the C. vulgaris toxic response; (ii) a negative correlation with conductivity and turbidity and a positive one with phosphorus, hardness, iron, mercury, arsenic and faecal coliforms, concerning the V. fischeri toxic response. This integrated assessment may allow the evaluation of the effect of future pollution abatement measures on the water quality of the Leça River.
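The two techniques compared in this abstract differ in one step: principal component regression first projects the predictors onto a few principal components and regresses the response on those scores. The sketch below is a generic illustration under stated assumptions, not the study's analysis: `pcr_fit` is a hypothetical helper, the data are random stand-ins with 35 rows (mirroring seven sites times five campaigns) and only five predictors, and the predictors are centered but not standardized, which a real analysis of mixed-unit water parameters would likely need.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression; returns (intercept, beta) expressed
    in the original predictor space."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                      # map back to original predictors
    return y_mean - x_mean @ beta, beta


def mlr_fit(X, y):
    """Ordinary multiple linear regression with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


# Random stand-in data with predictors on different scales (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(35, 5)) * np.array([1.0, 10.0, 0.1, 5.0, 2.0])
y = 3.0 + X @ np.array([0.5, -0.2, 4.0, 0.0, 1.0]) + rng.normal(scale=0.3, size=35)

b0_mlr, beta_mlr = mlr_fit(X, y)
b0_pcr2, beta_pcr2 = pcr_fit(X, y, n_components=2)   # reduced model
b0_full, beta_full = pcr_fit(X, y, n_components=5)   # all components kept
print(np.round(beta_mlr, 3))
print(np.round(beta_pcr2, 3))
```

Keeping every component reproduces the MLR coefficients, so any difference between the two models comes entirely from the components that are dropped.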
Abstract:
In the literature on tests of normality, much concern has been expressed over the problems associated with residual-based procedures. Indeed, the specialized tables of critical points which are needed to perform the tests have been derived for the location-scale model; hence reliance on available significance points in the context of regression models may cause size distortions. We propose a general solution to the problem of controlling the size of normality tests for the disturbances of standard linear regression, which is based on using the technique of Monte Carlo tests.
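A Monte Carlo test of this kind can be sketched as follows, under stated assumptions. With a fixed design matrix and normal errors, OLS residuals depend only on the errors, and a scale-invariant statistic computed from them has a null distribution that can be simulated directly instead of read from tables. The Jarque-Bera statistic, the function names, the number of simulations, and the synthetic data are choices made here for illustration; the abstract does not specify them.

```python
import numpy as np


def jarque_bera(resid):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    r = resid - resid.mean()
    n = len(r)
    s2 = np.mean(r ** 2)
    skew = np.mean(r ** 3) / s2 ** 1.5
    kurt = np.mean(r ** 4) / s2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def ols_residuals(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


def mc_normality_pvalue(A, y, n_sim=999, seed=0):
    """Monte Carlo p-value: compare the observed statistic with statistics
    from residuals of pure normal noise regressed on the same design A."""
    rng = np.random.default_rng(seed)
    observed = jarque_bera(ols_residuals(A, y))
    sims = np.array([
        jarque_bera(ols_residuals(A, rng.standard_normal(len(y))))
        for _ in range(n_sim)
    ])
    return (1 + np.sum(sims >= observed)) / (n_sim + 1)


# Synthetic regression with skewed (centered exponential) disturbances.
rng = np.random.default_rng(42)
n = 200
A = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
y_skewed = A @ beta + (rng.exponential(size=n) - 1.0)
y_normal = A @ beta + rng.normal(size=n)

p_skewed = mc_normality_pvalue(A, y_skewed)
p_normal = mc_normality_pvalue(A, y_normal)
print(p_skewed, p_normal)
```

Because the simulated statistics use the same design matrix as the data, the reference distribution reflects the regression setting rather than the location-scale model.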
Abstract:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure, along with knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study [5]. This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
Abstract:
The paper considers a vector discrete optimization problem with linear-fractional criterion functions on a feasible set that has the combinatorial properties of combinations. Structural properties of the feasible solution domain and of the Pareto-optimal (efficient), weakly efficient, and strictly efficient solution sets are examined. A relation between vector optimization problems on a combinatorial set of combinations and on a continuous feasible set is determined. One possible approach is proposed for solving a multicriteria combinatorial problem with linear-fractional criterion functions on a set of combinations.