836 results for Regression imputation
Abstract:
Background - Interindividual variation in the response to corticosteroids (CS) is an important problem in patients with inflammatory bowel disease. The problem is more pronounced in children, in whom the prevalence of corticosteroid dependence is extremely high (~40%). CS-refractory disease affects patients' development and their physical and psychological well-being, and it imposes high medical costs, particularly in active disease compared with disease in remission: costs are 2-3 times higher in ambulatory care and 20 times higher in hospital. It is therefore essential to identify predictive markers of the response to CS. Previous efforts to discover clinical and demographic markers have been equivocal, which further underscores the need for molecular markers. The action of CS relies on complex, genetically determined processes. Two genes, ABCB1, which belongs to the family of transmembrane transporters, and NR3C1, which encodes the glucocorticoid receptor, are important elements of the metabolic pathways. We postulated that variations in these genes play a role in the observed variability of the response to CS and could serve as predictive markers. Objectives - We aimed to: (1) examine the burden of CS-refractory disease in children with Crohn's disease (CD) and the role of clinical and demographic characteristics potentially related to the response; (2) study the association between DNA variants of the ABCB1 gene and the response to CS; (3) study the associations between DNA variants of the NR3C1 gene and the response to CS. Methods - To achieve these objectives, we conducted a cohort study of patients recruited from two tertiary pediatric gastroenterology clinics, in Ottawa (CHEO) and Montreal (HSJ). 
Patients with CD were diagnosed before the age of 18 years according to standard radiological, endoscopic and histopathological criteria. Corticosteroid resistance and corticosteroid dependence were defined by adapting recognized criteria. DNA, obtained from either blood or saliva, was genotyped for variations across the ABCB1 and NR3C1 genes selected using tag-SNP methodology. The frequency of corticosteroid resistance and corticosteroid dependence was estimated assuming a binomial distribution. Associations between clinical/demographic variables and the response to CS were examined using logistic regression, adjusting for potential confounding variables. Associations between ABCB1 and NR3C1 genetic variants and the response to CS were examined using logistic regression assuming different models of inheritance. Multimarker associations were examined using haplotype analysis. Ungenotyped variants were imputed using HapMap data, and associations with imputed SNPs were examined using standard methods. Results - Among 645 patients with CD, 364 (56.2%) received CS. The majority of patients were male (54.9%), presented with ileocolonic disease (51.7%) or inflammatory disease (84.6%) at diagnosis, and were Caucasian (95.6%). Eight percent of patients were corticosteroid resistant and 40.9% were corticosteroid dependent. Younger age at diagnosis (OR=1.34, 95% CI: 1.03-3.01, p=0.040), coexisting upper digestive tract disease (OR=1.35, 95% CI: 1.06-3.07, p=0.031) and concomitant use of immunomodulators (OR=0.35, 95% CI: 0.16-0.75, p=0.007) were associated with corticosteroid dependence. A total of 27 markers genotyped across ABCB1 (n=14) and NR3C1 (n=13) were in Hardy-Weinberg equilibrium, with the exception of one in the NR3C1 gene (rs258751, excluded). 
In ABCB1, the rare allele of rs2032583 (OR=0.56, 95% CI: 0.34-0.95, p=0.029) and the heterozygous genotype (OR=0.52, 95% CI: 0.28-0.95, p=0.035) were negatively associated with CS dependence. A 3-marker haplotype including the functional SNP rs1045642 was associated with CS dependence (empirical p=0.004). Twenty-four imputed intronic SNPs and six haplotypes were significantly associated with CS dependence. None of these associations, however, remained significant after correction for multiple comparisons. In NR3C1, three SNPs, rs10482682 (OR=1.43, 95% CI: 0.99-2.08, p=0.047), rs6196 (OR=0.55, 95% CI: 0.31-0.95, p=0.024), and rs2963155 (OR=0.64, 95% CI: 0.42-0.98, p=0.039), were associated under an additive model, whereas rs4912911 (OR=0.37, 95% CI: 0.13-1.00, p=0.03) and rs2963156 (OR=0.32, 95% CI: 0.07-1.12, p=0.047) were associated under a recessive model. Two haplotypes including these 5 SNPs (AAACA and GGGCG) were significantly associated with corticosteroid dependence (empirical p=0.006 and 0.01). Nineteen imputed SNPs were associated with CS dependence. Two multimarker haplotypes (p=0.001), including genotyped and imputed SNPs, were associated with CS dependence. Conclusion - Our studies suggest that the burden of corticosteroid dependence is high among children with CD. Children who were younger at diagnosis, those with coexisting upper digestive tract disease, and those with variations in the ABCB1 and NR3C1 genes were more likely to become corticosteroid dependent.
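The abstract above reports that genotyped markers were checked for Hardy-Weinberg equilibrium but does not say which test was used. As an illustration only, the following minimal sketch computes the usual chi-square statistic (1 degree of freedom) for a biallelic marker. The function name and the genotype counts are hypothetical and are not taken from the study.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg equilibrium at one biallelic marker.

    Assumes both alleles are observed; a monomorphic marker would give an
    expected count of zero and a division by zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical genotype counts (AA, AB, BB), for illustration only.
in_equilibrium = hwe_chi_square(360, 480, 160)      # matches p = 0.6 exactly
out_of_equilibrium = hwe_chi_square(500, 200, 300)  # strong heterozygote deficit
```

A statistic above 3.841 (the 5% critical value of a chi-square distribution with 1 degree of freedom) would suggest departure from equilibrium; exact tests are often preferred for small counts.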
Abstract:
The main objective of this letter is to formulate a new approach of learning a Mahalanobis distance metric for nearest neighbor regression from a training sample set. We propose a modified version of the large margin nearest neighbor metric learning method to deal with regression problems. As an application, the prediction of post-operative trunk 3-D shapes in scoliosis surgery using nearest neighbor regression is described. Accuracy of the proposed method is quantitatively evaluated through experiments on real medical data.
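The prediction step described above can be sketched as nearest neighbor regression under a squared Mahalanobis distance d(x, y) = (x - y)^T M (x - y). This sketch assumes the matrix M is already given; it does not implement the letter's metric learning procedure, and the function names, data and matrices are hypothetical.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    size = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(size) for j in range(size))


def knn_regress(query, X, y, M, k=1):
    """Predict the mean target of the k training points nearest to `query`."""
    order = sorted(range(len(X)), key=lambda i: mahalanobis_sq(query, X[i], M))
    nearest = order[:k]
    return sum(y[i] for i in nearest) / len(nearest)


# Hypothetical training set: three 2-D points with scalar targets.
X = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
y = [1.0, 10.0, 50.0]
query = [0.4, 0.3]

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean distance
weighted = [[10.0, 0.0], [0.0, 0.1]]  # a metric that stresses the first feature

pred_euclidean = knn_regress(query, X, y, identity, k=1)
pred_weighted = knn_regress(query, X, y, weighted, k=1)
```

In this toy example the two metrics select different neighbors, so the predictions differ, which is the effect a learned metric is meant to exploit.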
Abstract:
Multivariate lifetime data arise in various forms, including recurrent event data, when individuals are followed to observe the sequence of occurrences of a certain type of event, and correlated lifetimes, when an individual is followed for the occurrence of two or more types of events or when distinct individuals have dependent event times. In most studies there are covariates such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to a consideration of regression models. The well-known Cox proportional hazards model and its variations, which use the marginal hazard functions employed in the literature for the analysis of multivariate survival data, are not sufficient to explain the complete dependence structure of a pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2 we introduced a bivariate proportional hazards model using the vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effects on the two components of the vector hazard function. The proposed model is useful in real-life situations to study the dependence structure of a pair of lifetimes on the covariate vector. The well-known partial likelihood approach is used for the estimation of the parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector. In many fields of application, the mean residual life function is considered a superior concept to the hazard function. Motivated by this, in Chapter 4 we considered a new semi-parametric model, the bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap times of recurrent events. 
The counting process approach is used for the inference procedures for the gap times of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the parameter estimators of the different models developed in the previous chapters were studied. The proposed models were applied to various real-life situations.
Abstract:
An improved color video super-resolution technique using kernel regression and fuzzy enhancement is presented in this paper. A high-resolution frame is computed from a set of low-resolution video frames by kernel regression using an adaptive Gaussian kernel. A fuzzy smoothing filter is proposed to enhance the regression output. The proposed technique is a low-cost software solution for resolution enhancement of color video in multimedia applications. The performance of the proposed technique is evaluated using several color videos, and it is found to be better than other techniques in producing high-quality, high-resolution color videos.
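To make the idea of kernel regression with a Gaussian kernel concrete, here is a minimal one-dimensional sketch of the Nadaraya-Watson estimator with a fixed bandwidth. It is a simplification under stated assumptions: the paper's kernel is adaptive and operates on video frames, neither of which is reproduced here, and the function name and sample values are hypothetical.

```python
import math


def gaussian_kernel_regression(x_query, xs, ys, bandwidth=1.0):
    """Nadaraya-Watson estimate at x_query: a Gaussian-weighted mean of ys.

    Assumes at least one sample lies close enough to x_query that the
    weights do not all underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical low-resolution samples of a 1-D signal.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 2.0, 4.0]

at_sample = gaussian_kernel_regression(1.0, xs, ys, bandwidth=0.5)
between_samples = gaussian_kernel_regression(1.5, xs, ys, bandwidth=0.5)
```

Because the estimate is a weighted average with positive weights, it always stays between the smallest and largest observed values; a smaller bandwidth follows the samples more closely, while a larger one smooths more.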
Abstract:
In our study we use a kernel-based technique, Support Vector Machine Regression, to predict the melting point of drug-like compounds in terms of topological descriptors, topological charge indices, connectivity indices and 2D autocorrelations. The machine learning model was designed, trained and tested using a dataset of 100 compounds, and it was found that an SVMReg model with an RBF kernel could predict the melting point with a mean absolute error of 15.5854 and a root mean squared error of 19.7576.
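The two error measures quoted above can be computed as in the following sketch. The melting-point values are hypothetical and serve only to show the calculation; they are not the study's data, and the function names are illustrative.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared prediction error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical measured and predicted melting points.
actual = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 230.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

The RMSE is never smaller than the MAE, and the gap between them widens when a few predictions have large errors, which is consistent with the pair of values reported in the abstract.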
Abstract:
Background: The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped. ----- Methods: Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets. ----- Results: Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams. 
----- Conclusions: Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.
Abstract:
We study the relation between support vector machines (SVMs) for regression (SVMR) and SVMs for classification (SVMC). We show that for a given SVMC solution there exists an SVMR solution which is equivalent for a certain choice of the parameters. In particular, our result is that for $\epsilon$ sufficiently close to one, the optimal hyperplane and threshold for the SVMC problem with regularization parameter $C_c$ are equal to $(1-\epsilon)^{-1}$ times the optimal hyperplane and threshold for SVMR with regularization parameter $C_r = (1-\epsilon)C_c$. A direct consequence of this result is that SVMC can be seen as a special case of SVMR.
Abstract:
Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik's $\epsilon$-insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
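For readers unfamiliar with the loss function discussed above, the following sketch contrasts the ε-insensitive loss, max(0, |r| - ε) for a residual r, with the quadratic loss r². The function names and the sample residuals are illustrative only.

```python
def eps_insensitive_loss(residual, epsilon):
    """Vapnik's epsilon-insensitive loss: zero inside the tube, linear outside."""
    return max(0.0, abs(residual) - epsilon)


def quadratic_loss(residual):
    """Usual squared-error loss."""
    return residual ** 2


# Illustrative residuals (observed value minus prediction).
residuals = [-0.5, -0.05, 0.0, 0.05, 0.5, 3.0]
epsilon = 0.1

insensitive = [eps_insensitive_loss(r, epsilon) for r in residuals]
quadratic = [quadratic_loss(r) for r in residuals]
```

Residuals smaller than ε in absolute value cost nothing, and large residuals are penalized linearly rather than quadratically, which is why the loss resembles the robust loss functions mentioned in the abstract.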
Abstract:
This paper presents a computation of the $V_\gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $\epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the $V_\gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_\epsilon$ or general $L_p$ loss functions. This paper presents a novel proof of this result also for the case that a bias is added to the functions in the RKHS.
Abstract:
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that the potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression, especially those allowing for serial correlation, which makes them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range of Generalised Linear Models as well as Generalised Additive Models. In addition, critical points in using statistical software for GAM were recently stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published results. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants.
Abstract:
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variables are components, we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram, facilitating the interpretation of the analysis in terms of components. An example with time-budgets illustrates the method and the graphical features.
Abstract:
There is hardly a case in exploration geology where the studied data do not include values below detection limits and/or zero values, and since most geological data follow lognormal distributions, these “zero data” represent a mathematical challenge for interpretation. We need to start by recognizing that there are zero values in geology. For example, the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot coexist with nepheline. Another common essential zero is a North azimuth; however, we can always replace that zero with the value of 360°. These are known as “essential zeros”, but what can we do with “rounded zeros”, which result from values below the detection limit of the equipment? Amalgamation, e.g. adding Na2O and K2O as total alkalis, is a solution, but sometimes we need to differentiate between a sodic and a potassic alteration. Pre-classification into groups requires a good knowledge of the distribution of the data and of the geochemical characteristics of the groups, which is not always available. Setting the zero values equal to the limit of detection of the equipment used will generate spurious distributions, especially in ternary diagrams. The same will occur if we replace the zero values by a small amount using non-parametric or parametric techniques (imputation). The method that we propose takes into consideration the well-known relationships between some elements. For example, in copper porphyry deposits there is always a good direct correlation between the copper values and the molybdenum ones, but while copper will always be above the limit of detection, many of the molybdenum values will be “rounded zeros”. So, we take the lower quartile of the real molybdenum values and establish a regression equation with copper, and then we estimate the “rounded zero” values of molybdenum from their corresponding copper values. 
The method could be applied to any type of data, provided we first establish their correlation dependency. One of the main advantages of this method is that we do not obtain a fixed value for the “rounded zeros”, but one that depends on the value of the other variable. Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
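The regression-based replacement of rounded zeros described in this abstract can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not specify: rounded zeros are marked as `None`, the regression is an ordinary least-squares line fitted on the raw (untransformed) scale, and the lower quartile is taken as the lowest 25% of detected molybdenum values. All names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_rounded_zeros(cu, mo, quantile=0.25):
    """Replace None entries of mo using a Mo-on-Cu line fitted to low Mo values."""
    # Detected (Mo, Cu) pairs, sorted by increasing Mo.
    detected = sorted((m, c) for c, m in zip(cu, mo) if m is not None)
    # Keep the lowest `quantile` share of detected Mo values (at least 2 points).
    n_low = max(2, int(len(detected) * quantile))
    low = detected[:n_low]
    slope, intercept = fit_line([c for _, c in low], [m for m, _ in low])
    return [m if m is not None else slope * c + intercept for c, m in zip(cu, mo)]


# Hypothetical assay values; None marks Mo below the detection limit.
cu = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 50.0, 80.0]
mo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, None, None]

mo_imputed = impute_rounded_zeros(cu, mo)
```

Each imputed value depends on the copper value of the same sample rather than being a single fixed constant. In practice one would also check that the imputed values are positive and fall below the detection limit.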
Abstract:
In CoDaWork’05, we presented an application of discriminant function analysis (DFA) to 4 different compositional datasets and modelled the first canonical variable using a segmented regression model based solely on an observation about the scatter plots. In this paper, multiple linear regressions are applied to different datasets to confirm the validity of our proposed model. In addition to dating the unknown tephras by calibration, as discussed previously, another method is proposed: mapping the unknown tephras onto samples of the reference set or onto missing samples between consecutive reference samples. The application of these methodologies is demonstrated with both simulated and real datasets. This newly proposed methodology provides an alternative, more acceptable approach for geologists, as their focus is on mapping the unknown tephra to relevant eruptive events rather than estimating the age of the unknown tephra. Key words: Tephrochronology; Segmented regression
Abstract:
Based on Rijt-Plooij and Plooij’s (1992) research on the emergence of regression periods in the first two years of life, the presence of such periods in a group of 18 babies (10 boys and 8 girls, aged between 3 weeks and 14 months) from a Catalonian population was analyzed. The measurements were a questionnaire filled in by the infants’ mothers, a semi-structured weekly tape-recorded interview, and observations in their homes. The procedure and the instruments used in the project follow those proposed by Rijt-Plooij and Plooij. Our results confirm the existence of regression periods in the first year of children’s lives. Inter-coder agreement for trained coders was 78.2% and within-coder agreement was 90.1%. In the discussion, the possible meaning and relevance of regression periods for understanding development from a psychobiological and social framework are considered.
Abstract:
Abstract taken from the publication