59 results for Sample selection model
Abstract:
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the globally optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternative heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model tree induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
Aqueous dispersions of the anionic phospholipid dimyristoyl phosphatidylglycerol (DMPG) at pH above the apparent pK of DMPG and concentrations in the interval 70-300 mM have been investigated by small-angle (SAXS) and wide-angle X-ray scattering, differential scanning calorimetry, and polarized optical microscopy. The order-disorder transition of the hydrocarbon chains occurs along an interval of about 10 °C (between T(m)(on) ~ 20 °C and T(m)(off) ~ 30 °C). Such a melting regime was previously characterized at lower concentrations, up to 70 mM DMPG, when sample transparency was correlated with the presence of pores across the bilayer. At the higher concentrations considered here, the melting regime persists but is not transparent. Defined SAXS peaks appear and a new lamellar phase L(p) with pores is proposed to exist above 70 mM DMPG, starting at ~23 °C (~3 °C above T(m)(on)) and losing correlation after T(m)(off). A new model for describing the X-ray scattering of bilayers with pores, presented here, is able to explain the broad band attributed to in-plane correlation between pores. The majority of cell membranes have a net negative charge, and the opening of pores across the membrane tuned by ionic strength, temperature, and lipid composition is likely to have biological relevance.
Abstract:
The Birnbaum-Saunders distribution has been used quite effectively to model times to failure for materials subject to fatigue and for modeling lifetime data. In this paper we obtain asymptotic expansions, up to order n^(-1/2) and under a sequence of Pitman alternatives, for the non-null distribution functions of the likelihood ratio, Wald, score and gradient test statistics in the Birnbaum-Saunders regression model. The asymptotic distributions of all four statistics are obtained for testing a subset of regression parameters and for testing the shape parameter. Monte Carlo simulation is presented in order to compare the finite-sample performance of these tests. We also present two empirical applications. (C) 2010 Elsevier B.V. All rights reserved.
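For reference, the four statistics named in this abstract have standard textbook forms. The sketch below writes them for a simple null hypothesis H0: θ = θ0, which is a simplification: the abstract tests a subset of parameters, which requires the partitioned versions. The notation (log-likelihood ℓ, score vector U, Fisher information K, unrestricted maximum-likelihood estimate θ̂, parameter dimension p) is assumed here for illustration and is not taken from the paper.

```latex
\begin{align*}
S_{LR} &= 2\,\{\ell(\hat\theta) - \ell(\theta_0)\}
  && \text{(likelihood ratio)} \\
S_{W}  &= (\hat\theta - \theta_0)^{\top} K(\hat\theta)\,(\hat\theta - \theta_0)
  && \text{(Wald)} \\
S_{R}  &= U(\theta_0)^{\top} K(\theta_0)^{-1}\, U(\theta_0)
  && \text{(score)} \\
S_{T}  &= U(\theta_0)^{\top} (\hat\theta - \theta_0)
  && \text{(gradient)}
\end{align*}
```

Under H0 and the usual regularity conditions, all four share the same limiting chi-squared distribution with p degrees of freedom, which is why their behavior under local alternatives and in finite samples is what distinguishes them.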
Abstract:
The Birnbaum-Saunders regression model is becoming increasingly popular in lifetime analyses and reliability studies. In this model, the signed likelihood ratio statistic provides the basis for testing inference and construction of confidence limits for a single parameter of interest. We focus on the small sample case, where the standard normal distribution gives a poor approximation to the true distribution of the statistic. We derive three adjusted signed likelihood ratio statistics that lead to very accurate inference even for very small samples. Two empirical applications are presented. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
We consider the problem of dichotomizing a continuous covariate when performing a regression analysis based on a generalized estimation approach. The problem involves estimation of the cutpoint for the covariate and testing the hypothesis that the binary covariate constructed from the continuous covariate has a significant impact on the outcome. Due to the multiple testing used to find the optimal cutpoint, we need to make an adjustment to the usual significance test to preserve the type-I error rates. We illustrate the techniques on one data set of patients given unrelated hematopoietic stem cell transplantation. Here the question is whether the CD34 cell dose given to the patient affects the outcome of the transplant and what is the smallest cell dose needed for good outcomes. (C) 2010 Elsevier B.V. All rights reserved.
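The need for an adjusted test when the cutpoint is chosen by scanning can be illustrated with a small sketch. It is not the generalized-estimation procedure of the abstract: as a stand-in, it uses a Welch t statistic maximized over candidate cutpoints and a permutation distribution of that maximum to obtain an adjusted p-value. The function names and the choice of statistic are assumptions made for illustration.

```python
import numpy as np

def max_abs_t(x, y, cutpoints):
    """Return the largest absolute Welch t statistic over the candidate
    cutpoints, together with the cutpoint that attains it."""
    best_stat, best_cut = 0.0, None
    for c in cutpoints:
        low, high = y[x <= c], y[x > c]
        if len(low) < 2 or len(high) < 2:
            continue  # skip cutpoints that leave a group too small
        se = np.sqrt(low.var(ddof=1) / len(low) + high.var(ddof=1) / len(high))
        if se == 0:
            continue
        t = abs(high.mean() - low.mean()) / se
        if t > best_stat:
            best_stat, best_cut = t, c
    return best_stat, best_cut

def adjusted_cutpoint_test(x, y, cutpoints, n_perm=500, seed=0):
    """Permutation p-value for the maximally selected statistic.

    Permuting y breaks any association with x; re-maximizing over the
    cutpoints in each permutation accounts for the multiple looks.
    """
    rng = np.random.default_rng(seed)
    observed, cut = max_abs_t(x, y, cutpoints)
    exceed = 0
    for _ in range(n_perm):
        stat = max_abs_t(x, rng.permutation(y), cutpoints)[0]
        exceed += int(stat >= observed)
    p_value = (exceed + 1) / (n_perm + 1)
    return cut, observed, p_value
```

Comparing the observed maximum with the usual single-test reference distribution would overstate significance, because the maximum over many cutpoints is stochastically larger than any one statistic; the permutation step is one simple way to restore the nominal type-I error.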
Abstract:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is interest in studying latent variables. These latent variables are directly considered in the Item Response Models (IRM) and they are usually called latent traits. A usual assumption for parameter estimation of the IRM, considering one group of examinees, is that the latent traits are random variables which follow a standard normal distribution. However, many works suggest that this assumption does not apply in many cases. Furthermore, when this assumption does not hold, the parameter estimates tend to be biased and misleading inference can be obtained. Therefore, it is important to model the distribution of the latent traits properly. In this paper we present an alternative approach to modeling latent traits based on the so-called skew-normal distribution; see Genton (2004). We used the centred parameterization, which was proposed by Azzalini (1985). This approach ensures the model identifiability as pointed out by Azevedo et al. (2009b). Also, a Metropolis-Hastings-within-Gibbs sampling (MHWGS) algorithm was built for parameter estimation by using an augmented data approach. A simulation study was performed in order to assess the parameter recovery in the proposed model and the estimation method, and the effect of the asymmetry level of the latent traits distribution on the parameter estimation. Also, our approach was compared with other estimation methods (which assume symmetric normality for the latent traits distribution). The results indicated that our proposed algorithm properly recovers all parameters. Specifically, the greater the asymmetry level, the better the performance of our approach compared with other approaches, mainly in the presence of small sample sizes (number of examinees).
Furthermore, we analyzed a real data set which shows indications of asymmetry in the latent traits distribution. The results obtained by using our approach confirmed the presence of strong negative asymmetry of the latent traits distribution. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
In this article, we deal with the issue of performing accurate small-sample inference in the Birnbaum-Saunders regression model, which can be useful for modeling lifetime or reliability data. We derive a Bartlett-type correction for the score test and numerically compare the corrected test with the usual score test and some other competitors.
Abstract:
We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that an MNAR model is misspecified because the estimate is on the boundary of the parameter space.
Abstract:
In chemical analyses performed by laboratories, one faces the problem of determining the concentration of a chemical element in a sample. In practice, one deals with the problem using the so-called linear calibration model, which considers that the errors associated with the independent variables are negligible compared with those of the dependent variable. In this work, a new linear calibration model is proposed assuming that the independent variables are subject to heteroscedastic measurement errors. A simulation study is carried out in order to verify some properties of the estimators derived for the new model, and the usual calibration model is also considered for comparison with the new approach. Three applications are considered to verify the performance of the new approach. Copyright (C) 2010 John Wiley & Sons, Ltd.
Abstract:
We analyse the finite-sample behaviour of two second-order bias-corrected alternatives to the maximum-likelihood estimator of the parameters in a multivariate normal regression model with general parametrization proposed by Patriota and Lemonte [A. G. Patriota and A. J. Lemonte, Bias correction in a multivariate regression model with general parameterization, Stat. Prob. Lett. 79 (2009), pp. 1655-1662]. The two finite-sample corrections we consider are the conventional second-order bias-corrected estimator and the bootstrap bias correction. We present numerical results comparing the performance of these estimators. Our results reveal that analytical bias correction outperforms numerical bias corrections obtained from bootstrapping schemes.
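The bootstrap bias correction mentioned in this abstract can be sketched on a toy problem. The example below is not the multivariate regression model of the paper: it assumes a univariate sample and uses the maximum-likelihood variance estimator, which is biased downward, as a stand-in. The function names are hypothetical.

```python
import numpy as np

def mle_variance(sample):
    """Maximum-likelihood variance estimator (divides by n, so it is biased)."""
    return np.mean((sample - sample.mean()) ** 2)

def bootstrap_bias_corrected(sample, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap bias correction.

    The bias is estimated as mean(bootstrap estimates) - original estimate,
    and the corrected estimator subtracts that estimated bias.
    """
    rng = np.random.default_rng(seed)
    n = len(sample)
    theta_hat = estimator(sample)
    boot = np.array([
        estimator(rng.choice(sample, size=n, replace=True))
        for _ in range(n_boot)
    ])
    bias_hat = boot.mean() - theta_hat
    return theta_hat - bias_hat
```

In this toy case the analytical correction is known exactly (multiply the maximum-likelihood estimate by n/(n-1)), so the bootstrap version can be compared against it directly; the bootstrap result carries extra Monte Carlo noise from the resampling.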
Abstract:
Chagas disease, caused by the protozoan Trypanosoma cruzi, is one of the most serious amongst the so-called neglected diseases in Latin America, especially in Brazil. So far there has been no effective treatment for the chronic phase of this disease. Cruzain is a major cysteine protease of T. cruzi and it is recognized as a valid target for Chagas disease chemotherapy. The mechanism of cruzain action is associated with the nucleophilic attack of an activated sulfur atom towards electrophilic groups. In this report, features of a putative pharmacophore model of the enzyme, developed as a virtual screening tool for the selection of potential cruzain inhibitors, are described. The final proposed model was applied to the ZINC v.7 database and afterwards experimentally validated by an enzymatic inhibition assay. One of the compounds selected by the model showed cruzain inhibition in the low micromolar range.
Abstract:
There is a need for scientific evidence of claimed nutraceutical effects, but there is also a social movement towards the use of natural products, and among them algae are seen as rich resources. Within this scenario, the development of methodology for rapid and reliable assessment of markers of efficacy and safety of these extracts is necessary. The rat treated with streptozotocin has been proposed as the most appropriate model of systemic oxidative stress for studying antioxidant therapies. Cystoseira is a brown alga containing fucoxanthin and other carotenes whose pressure-assisted extracts were assayed to discover a possible beneficial effect on complications related to diabetes evolution in an acute but short-term model. Urine was selected as the sample and CE-TOF-MS as the analytical technique to obtain the fingerprints in a non-targeted metabolomic approach. Multivariate data analysis revealed a good clustering of the groups and permitted the putative assignment of compounds that were statistically significant in the classification. Interestingly, a group of compounds associated with lysine glycation and cleavage from proteins was found to be increased in diabetic animals receiving vehicle as compared to control animals receiving vehicle (N6,N6,N6-trimethyl-L-lysine, N-methylnicotinamide, galactosylhydroxylysine, L-carnitine, N6-acetyl-N6-hydroxylysine, fructose-lysine, pipecolic acid, urocanic acid, amino-isobutanoate, formylisoglutamine). Fructose-lysine significantly decreased after the treatment, changing from a 24% increase to a 19% decrease. CE-MS fingerprinting of urine has provided a group of compounds different from those detected with other techniques and therefore proves the necessity of a cross-platform analysis to obtain a broad view of biological samples.
Abstract:
The possibility of compressing analyte bands at the beginning of CE runs has many advantages. Analytes at low concentration can be analyzed with high signal-to-noise ratios by using the so-called sample stacking methods. Moreover, sample injections with very narrow initial band widths (small initial standard deviations) are sometimes useful, especially if high resolution among the bands is required in the shortest run time. In the present work, a method of sample stacking is proposed and demonstrated. It is based on BGEs with highly temperature-sensitive pH values (high dpH/dT) and analytes with low dpK(a)/dT. High thermal sensitivity means that the working pK(a) of the BGE has a high dpK(a)/dT in absolute value. For instance, Tris and ethanolamine have dpH/dT = -0.028/°C and -0.029/°C, respectively, whereas carboxylic acids have low dpK(a)/dT values, i.e. in the -0.002/°C to +0.002/°C range. The action of cooling and heating sections along the capillary during the runs also affects the local viscosity, conductivity, and electric field strength. The effect of these variables on electrophoretic velocity and band compression is theoretically calculated using a simple model. Finally, this stacking method was demonstrated for amino acids derivatized with naphthalene-2,3-dicarboxaldehyde and fluorescamine using a temperature difference of 70 °C between two neighboring sections and Tris as separation buffer. In this case, the BGE has a high pH thermal coefficient whereas the carboxylic groups of the analytes have low pK(a) thermal coefficients. The application of these dynamic thermal gradients increased the peak heights of aspartic acid and glutamic acid derivatized with naphthalene-2,3-dicarboxaldehyde and of serine derivatized with fluorescamine by a factor of two (and decreased the standard deviations of the peaks by a factor of two).
The effect of thermal compression of bands was not observed when runs were accomplished using phosphate buffer at pH 7 (negative control). Phosphate has a low dpH/dT in this pH range, similar to the dpK(a)/dT of the analytes. It is shown that |dpK(a)/dT - dpH/dT| >> 0 is one determining factor for significant stacking produced by dynamic thermal junctions.
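The role of the difference between the two thermal coefficients can be illustrated with a minimal sketch. It is not the model of the paper: it assumes a monoprotic acid whose ionized fraction follows the Henderson-Hasselbalch relation, linear temperature coefficients, and it ignores the viscosity, conductivity, and field-strength effects that the abstract says also matter. The function names and any starting pH and pK(a) values passed in are hypothetical.

```python
def ionized_fraction_acid(pH, pKa):
    """Henderson-Hasselbalch fraction of a monoprotic acid in its anionic form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

def fraction_after_temperature_step(pH, pKa, dpH_dT, dpKa_dT, delta_T):
    """Ionized fraction after a temperature step delta_T (in degrees C),
    assuming pH and pKa both shift linearly with temperature.

    Only the difference (dpKa_dT - dpH_dT) changes pKa - pH, so equal
    coefficients leave the ionized fraction unchanged.
    """
    shifted_pH = pH + dpH_dT * delta_T
    shifted_pKa = pKa + dpKa_dT * delta_T
    return ionized_fraction_acid(shifted_pH, shifted_pKa)
```

With the Tris coefficient quoted in the abstract (-0.028/°C), a near-zero analyte coefficient, and the 70 °C difference, pK(a) - pH shifts by about 1.96 units, which changes the ionized fraction strongly whenever the starting pH is close to the pK(a). With matched coefficients the shift cancels, consistent with the phosphate negative control described above.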
Abstract:
In a previous work [M. Mandaji, et al., this issue] a sample stacking method was theoretically modeled and experimentally demonstrated for analytes with low dpK(a)/dT (analytes carrying carboxylic groups) and BGEs with high dpH/dT (high pH-temperature coefficients). In that work, buffer pH was modulated with temperature, inducing electrophoretic mobility changes in the analytes. In the present work, the opposite conditions are studied and tested, i.e. analytes with high dpK(a)/dT and BGEs that exhibit low dpH/dT. It is well known that organic bases such as amines, imidazoles, and benzimidazoles exhibit high dpK(a)/dT. Temperature variations induce instantaneous changes in the basicity of these and other basic groups. Therefore, the electrophoretic velocity of some analytes changes abruptly when temperature variations are applied along the capillary. This is true only if the BGE pH remains constant or if it changes in the opposite direction to the pK(a) of the analyte. The presence of hot and cold sections along the capillary also affects local viscosity, conductivity, and electric field strength. The effect of these variables on electrophoretic velocity and band stacking efficacy was also taken into account in the theoretical model presented. Finally, this stacking method is demonstrated for lysine partially derivatized with naphthalene-2,3-dicarboxaldehyde. In this case, the amino group of the side chain was left underivatized and only the alpha amino group was derivatized. Therefore, the basicity of the side-chain amino group, and consequently the electrophoretic mobility, was modulated with temperature while the pH of the buffer used remained unchanged.