912 resultados para Cheminformatics, (Q)SAR, Cross-Validation, Visualization, 3D-Space
Resumo:
Ce mémoire s’intéresse à l’étude du critère de validation croisée pour le choix des modèles relatifs aux petits domaines. L’étude est limitée aux modèles de petits domaines au niveau des unités. Le modèle de base des petits domaines est introduit par Battese, Harter et Fuller en 1988. C’est un modèle de régression linéaire mixte avec une ordonnée à l’origine aléatoire. Il se compose d’un certain nombre de paramètres : le paramètre β de la partie fixe, la composante aléatoire et les variances relatives à l’erreur résiduelle. Le modèle de Battese et al. est utilisé pour prédire, lors d’une enquête, la moyenne d’une variable d’intérêt y dans chaque petit domaine en utilisant une variable auxiliaire administrative x connue sur toute la population. La méthode d’estimation consiste à utiliser une distribution normale, pour modéliser la composante résiduelle du modèle. La considération d’une dépendance résiduelle générale, c’est-à-dire autre que la loi normale donne une méthodologie plus flexible. Cette généralisation conduit à une nouvelle classe de modèles échangeables. En effet, la généralisation se situe au niveau de la modélisation de la dépendance résiduelle qui peut être soit normale (c’est le cas du modèle de Battese et al.) ou non-normale. L’objectif est de déterminer les paramètres propres aux petits domaines avec le plus de précision possible. Cet enjeu est lié au choix de la bonne dépendance résiduelle à utiliser dans le modèle. Le critère de validation croisée sera étudié à cet effet.
Resumo:
Introduction Prediction of soft tissue changes following orthognathic surgery has been frequently attempted in the past decades. It has gradually progressed from the classic “cut and paste” of photographs to the computer assisted 2D surgical prediction planning; and finally, comprehensive 3D surgical planning was introduced to help surgeons and patients to decide on the magnitude and direction of surgical movements as well as the type of surgery to be considered for the correction of facial dysmorphology. A wealth of experience was gained and numerous published literature is available which has augmented the knowledge of facial soft tissue behaviour and helped to improve the ability to closely simulate facial changes following orthognathic surgery. This was particularly noticed following the introduction of the three dimensional imaging into the medical research and clinical applications. Several approaches have been considered to mathematically predict soft tissue changes in three dimensions, following orthognathic surgery. The most common are the Finite element model and Mass tensor Model. These were developed into software packages which are currently used in clinical practice. In general, these methods produce an acceptable level of prediction accuracy of soft tissue changes following orthognathic surgery. Studies, however, have shown a limited prediction accuracy at specific regions of the face, in particular the areas around the lips. Aims The aim of this project is to conduct a comprehensive assessment of hard and soft tissue changes following orthognathic surgery and introduce a new method for prediction of facial soft tissue changes. Methodology The study was carried out on the pre- and post-operative CBCT images of 100 patients who received their orthognathic surgery treatment at Glasgow dental hospital and school, Glasgow, UK. Three groups of patients were included in the analysis; patients who underwent Le Fort I maxillary advancement surgery; bilateral sagittal split mandibular advancement surgery or bimaxillary advancement surgery. A generic facial mesh was used to standardise the information obtained from individual patient’s facial image and Principal component analysis (PCA) was applied to interpolate the correlations between the skeletal surgical displacement and the resultant soft tissue changes. The identified relationship between hard tissue and soft tissue was then applied on a new set of preoperative 3D facial images and the predicted results were compared to the actual surgical changes measured from their post-operative 3D facial images. A set of validation studies was conducted. To include: • Comparison between voxel based registration and surface registration to analyse changes following orthognathic surgery. The results showed there was no statistically significant difference between the two methods. Voxel based registration, however, showed more reliability as it preserved the link between the soft tissue and skeletal structures of the face during the image registration process. Accordingly, voxel based registration was the method of choice for superimposition of the pre- and post-operative images. The result of this study was published in a refereed journal. • Direct DICOM slice landmarking; a novel technique to quantify the direction and magnitude of skeletal surgical movements. This method represents a new approach to quantify maxillary and mandibular surgical displacement in three dimensions. The technique includes measuring the distance of corresponding landmarks digitized directly on DICOM image slices in relation to three dimensional reference planes. The accuracy of the measurements was assessed against a set of “gold standard” measurements extracted from simulated model surgery. The results confirmed the accuracy of the method within 0.34mm. Therefore, the method was applied in this study. The results of this validation were published in a peer refereed journal. • The use of a generic mesh to assess soft tissue changes using stereophotogrammetry. The generic facial mesh played a major role in the soft tissue dense correspondence analysis. The conformed generic mesh represented the geometrical information of the individual’s facial mesh on which it was conformed (elastically deformed). Therefore, the accuracy of generic mesh conformation is essential to guarantee an accurate replica of the individual facial characteristics. The results showed an acceptable overall mean error of the conformation of generic mesh 1 mm. The results of this study were accepted for publication in peer refereed scientific journal. Skeletal tissue analysis was performed using the validated “Direct DICOM slices landmarking method” while soft tissue analysis was performed using Dense correspondence analysis. The analysis of soft tissue was novel and produced a comprehensive description of facial changes in response to orthognathic surgery. The results were accepted for publication in a refereed scientific Journal. The main soft tissue changes associated with Le Fort I were advancement at the midface region combined with widening of the paranasal, upper lip and nostrils. Minor changes were noticed at the tip of the nose and oral commissures. The main soft tissue changes associated with mandibular advancement surgery were advancement and downward displacement of the chin and lower lip regions, limited widening of the lower lip and slight reversion of the lower lip vermilion combined with minimal backward displacement of the upper lip were recorded. Minimal changes were observed on the oral commissures. The main soft tissue changes associated with bimaxillary advancement surgery were generalized advancement of the middle and lower thirds of the face combined with widening of the paranasal, upper lip and nostrils regions. In Le Fort I cases, the correlation between the changes of the facial soft tissue and the skeletal surgical movements was assessed using PCA. A statistical method known as ’Leave one out cross validation’ was applied on the 30 cases which had Le Fort I osteotomy surgical procedure to effectively utilize the data for the prediction algorithm. The prediction accuracy of soft tissue changes showed a mean error ranging between (0.0006mm±0.582) at the nose region to (-0.0316mm±2.1996) at the various facial regions.
Resumo:
International audience
Resumo:
Cette thèse développe des méthodes bootstrap pour les modèles à facteurs qui sont couram- ment utilisés pour générer des prévisions depuis l'article pionnier de Stock et Watson (2002) sur les indices de diffusion. Ces modèles tolèrent l'inclusion d'un grand nombre de variables macroéconomiques et financières comme prédicteurs, une caractéristique utile pour inclure di- verses informations disponibles aux agents économiques. Ma thèse propose donc des outils éco- nométriques qui améliorent l'inférence dans les modèles à facteurs utilisant des facteurs latents extraits d'un large panel de prédicteurs observés. Il est subdivisé en trois chapitres complémen- taires dont les deux premiers en collaboration avec Sílvia Gonçalves et Benoit Perron. Dans le premier article, nous étudions comment les méthodes bootstrap peuvent être utilisées pour faire de l'inférence dans les modèles de prévision pour un horizon de h périodes dans le futur. Pour ce faire, il examine l'inférence bootstrap dans un contexte de régression augmentée de facteurs où les erreurs pourraient être autocorrélées. Il généralise les résultats de Gonçalves et Perron (2014) et propose puis justifie deux approches basées sur les résidus : le block wild bootstrap et le dependent wild bootstrap. Nos simulations montrent une amélioration des taux de couverture des intervalles de confiance des coefficients estimés en utilisant ces approches comparativement à la théorie asymptotique et au wild bootstrap en présence de corrélation sérielle dans les erreurs de régression. Le deuxième chapitre propose des méthodes bootstrap pour la construction des intervalles de prévision permettant de relâcher l'hypothèse de normalité des innovations. Nous y propo- sons des intervalles de prédiction bootstrap pour une observation h périodes dans le futur et sa moyenne conditionnelle. Nous supposons que ces prévisions sont faites en utilisant un ensemble de facteurs extraits d'un large panel de variables. Parce que nous traitons ces facteurs comme latents, nos prévisions dépendent à la fois des facteurs estimés et les coefficients de régres- sion estimés. Sous des conditions de régularité, Bai et Ng (2006) ont proposé la construction d'intervalles asymptotiques sous l'hypothèse de Gaussianité des innovations. Le bootstrap nous permet de relâcher cette hypothèse et de construire des intervalles de prédiction valides sous des hypothèses plus générales. En outre, même en supposant la Gaussianité, le bootstrap conduit à des intervalles plus précis dans les cas où la dimension transversale est relativement faible car il prend en considération le biais de l'estimateur des moindres carrés ordinaires comme le montre une étude récente de Gonçalves et Perron (2014). Dans le troisième chapitre, nous suggérons des procédures de sélection convergentes pour les regressions augmentées de facteurs en échantillons finis. Nous démontrons premièrement que la méthode de validation croisée usuelle est non-convergente mais que sa généralisation, la validation croisée «leave-d-out» sélectionne le plus petit ensemble de facteurs estimés pour l'espace généré par les vraies facteurs. Le deuxième critère dont nous montrons également la validité généralise l'approximation bootstrap de Shao (1996) pour les regressions augmentées de facteurs. Les simulations montrent une amélioration de la probabilité de sélectionner par- cimonieusement les facteurs estimés comparativement aux méthodes de sélection disponibles. L'application empirique revisite la relation entre les facteurs macroéconomiques et financiers, et l'excès de rendement sur le marché boursier américain. Parmi les facteurs estimés à partir d'un large panel de données macroéconomiques et financières des États Unis, les facteurs fortement correlés aux écarts de taux d'intérêt et les facteurs de Fama-French ont un bon pouvoir prédictif pour les excès de rendement.
Resumo:
Current research on achievement goals acknowledges that students can manifest different goal patterns. This study aimed to adapt and validate a self-report scale to assess the goal orientations of Portuguese students. A total of 2675 (age range 9–24 years) Portuguese students completed the Goal Orientations Scale (GOS). Through a cross-validation procedure, confirmatory factor analysis and descriptive statistics supports the existence of four different goal orientations: task, self-enhancing, self-defeating and avoidance orientations. The reliability and the internal validity estimates confirm that the GOS is an adequate instrument in assessing student goal orientations.
Resumo:
Agroforestry has large potential for carbon (C) sequestration while providing many economical, social, and ecological benefits via its diversified products. Airborne lidar is considered as the most accurate technology for mapping aboveground biomass (AGB) over landscape levels. However, little research in the past has been done to study AGB of agroforestry systems using airborne lidar data. Focusing on an agroforestry system in the Brazilian Amazon, this study first predicted plot-level AGB using fixed-effects regression models that assumed the regression coefficients to be constants. The model prediction errors were then analyzed from the perspectives of tree DBH (diameter at breast height)?height relationships and plot-level wood density, which suggested the need for stratifying agroforestry fields to improve plot-level AGB modeling. We separated teak plantations from other agroforestry types and predicted AGB using mixed-effects models that can incorporate the variation of AGB-height relationship across agroforestry types. We found that, at the plot scale, mixed-effects models led to better model prediction performance (based on leave-one-out cross-validation) than the fixed-effects models, with the coefficient of determination (R2) increasing from 0.38 to 0.64. At the landscape level, the difference between AGB densities from the two types of models was ~10% on average and up to ~30% at the pixel level. This study suggested the importance of stratification based on tree AGB allometry and the utility of mixed-effects models in modeling and mapping AGB of agroforestry systems.
Resumo:
Objetivo: El propósito del estudio fue describir estadísticamente las etapas de cambio comportamental frente al consumo de sustancias psicoactivas –SPA– (alcohol, tabaco y drogas ilegales) en escolares entre 9 y 17 años de Bogotá- Colombia, pertenecientes al estudio FUPRECOL. Método: Se trata de un estudio descriptivo y transversal en 6.965 niños y adolescentes entre 9 y 17 años, pertenecientes a 24 instituciones educativas oficiales de Bogotá - Colombia. La medición de los procesos de cambio propuestos por el Modelo Transteórico (MTT), aplicados al consumo de drogas, tabaco y alcohol se aplicaron de manera auto-diligenciada mediante un cuestionario estructurado. Resultados: De la muestra evaluada, el 58,4% fueron mujeres con un promedio de edad 12,74 ± 2.38 años. En la población en general, frente al consumo de drogas, el 6% de los escolares se encontraban en etapa de pre-contemplación, 44 % en contemplación; 30% en preparación/acción, 20% en mantenimiento. Con relación al consumo de alcohol, el 5% de los niños y adolescentes se encontraban en etapa de pre-contemplación, 36 % en contemplación; 12% en preparación/acción, 46% en mantenimiento. Frente al tabaco, el 4% de los niños y adolescentes se encontraban en etapa de pre-contemplación, 33 % en contemplación; 12% en preparación/acción, 51% en mantenimiento. Conclusiones: En los escolares evaluados, un importante porcentaje se ubica en la etapa de mantenimiento frente a la intención de consumo de tabaco y alcohol. Frente al consumo de drogas ilegales los niños y adolescentes están en la etapa de contemplación. Se requieren esfuerzos mayores para fomentar programas preventivos que enseñen sobre el riesgo del abuso/dependencia de este tipo de sustancias psicoactiva sobre la salud; dándole prioridad en las agendas y políticas públicas dentro del ámbito escolar.
Resumo:
Background There is a wide variation of recurrence risk of Non-small-cell lung cancer (NSCLC) within the same Tumor Node Metastasis (TNM) stage, suggesting that other parameters are involved in determining this probability. Radiomics allows extraction of quantitative information from images that can be used for clinical purposes. The primary objective of this study is to develop a radiomic prognostic model that predicts a 3 year disease free-survival (DFS) of resected Early Stage (ES) NSCLC patients. Material and Methods 56 pre-surgery non contrast Computed Tomography (CT) scans were retrieved from the PACS of our institution and anonymized. Then they were automatically segmented with an open access deep learning pipeline and reviewed by an experienced radiologist to obtain 3D masks of the NSCLC. Images and masks underwent to resampling normalization and discretization. From the masks hundreds Radiomic Features (RF) were extracted using Py-Radiomics. Hence, RF were reduced to select the most representative features. The remaining RF were used in combination with Clinical parameters to build a DFS prediction model using Leave-one-out cross-validation (LOOCV) with Random Forest. Results and Conclusion A poor agreement between the radiologist and the automatic segmentation algorithm (DICE score of 0.37) was found. Therefore, another experienced radiologist manually segmented the lesions and only stable and reproducible RF were kept. 50 RF demonstrated a high correlation with the DFS but only one was confirmed when clinicopathological covariates were added: Busyness a Neighbouring Gray Tone Difference Matrix (HR 9.610). 16 clinical variables (which comprised TNM) were used to build the LOOCV model demonstrating a higher Area Under the Curve (AUC) when RF were included in the analysis (0.67 vs 0.60) but the difference was not statistically significant (p=0,5147).
Resumo:
Universidade Estadual de Campinas. Faculdade de Educação Física
Resumo:
Universidade Estadual de Campinas . Faculdade de Educação Física
Resumo:
This paper describes 2D-QSAR and 3D-QSAR studies against Candida albicans and Cryptococcus neofarmans for a set of 20 bisbenzamidines. In the studies of 2D-QSAR with C. albicans it was obtained a correlation between log MIC-1 and lipolo component-Z (r² = 0.68; Q² = 0.51). In the case of C. neofarmans a correlation between log MIC-1 and lipolo component-Z and of Balaban index (r² = 0.85; Q² = 0.6) was obtained. 3D-QSAR studies using CoMFA showed that the steric fields contributed more to the predicted activities for Candida albicans (94.9%) and Cryptococcus neofarmans (97.9%).
Resumo:
Soils are an important component in the biogeochemical cycle of carbon, storing about four times more carbon than biomass plants and nearly three times more than the atmosphere. Moreover, the carbon content is directly related on the capacity of water retention, fertility. among other properties. Thus, soil carbon quantification in field conditions is an important challenge related to carbon cycle and global climatic changes. Nowadays. Laser Induced Breakdown Spectroscopy (LIBS) can be used for qualitative elemental analyses without previous treatment of samples and the results are obtained quickly. New optical technologies made possible the portable LIBS systems and now, the great expectation is the development of methods that make possible quantitative measurements with LIBS. The goal of this work is to calibrate a portable LIBS system to carry out quantitative measures of carbon in whole tropical soil sample. For this, six samples from the Brazilian Cerrado region (Argisoil) were used. Tropical soils have large amounts of iron in their compositions, so the carbon line at 247.86 nm presents strong interference of this element (iron lines at 247.86 and 247.95). For this reason, in this work the carbon line at 193.03 nm was used. Using methods of statistical analysis as a simple linear regression, multivariate linear regression and cross-validation were possible to obtain correlation coefficients higher than 0.91. These results show the great potential of using portable LIBS systems for quantitative carbon measurements in tropical soils. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
Objective: We carry out a systematic assessment on a suite of kernel-based learning machines while coping with the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely. Gaussian and exponential radial basis functions) were considered as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations whereby one can visually inspect their levels of sensitiveness to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results evidence that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions to be taken, albeit the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
Our objective was to develop a methodology to predict soil fertility using visible near-infrared (vis-NIR) diffuse reflectance spectra and terrain attributes derived from a digital elevation model (DEM). Specifically, our aims were to: (i) assemble a minimum data set to develop a soil fertility index for sugarcane (Sarcharum officinarum L.) (SFI-SC) for biofuel production in tropical soils; (ii) construct a model to predict the SFI-SC using soil vis-NIR spectra and terrain attributes; and (iii) produce a soil fertility map for our study area and assess it by comparing it with a green vegetation index (GVI). The study area was 185 ha located in sao Paulo State, Brazil. In total, 184 soil samples were collected and analyzed for a range of soil chemical and physical properties. Their vis-NIR spectra were collected from 400 to 2500 nm. The Shuttle Radar Topographic Mission 3-arcsec (90-m resolution) DEM of the area was used to derive 17 terrain attributes. A minimum data set of soil properties was selected to develop the SFI-SC. The SFI-SC consisted of three classes: Class 1, the highly fertile soils; Class 2, the fertile soils; and Class 3, the least fertile soils. It was derived heuristically with conditionals and using expert knowledge. The index was modeled with the spectra and terrain data using cross-validated decision trees. The cross-validation of the model correctly predicted Class 1 in 75% of cases, Class 2 in 61%, and Class 3 in 65%. A fertility map was derived for the study area and compared with a map of the GVI. Our approach offers a methodology that incorporates expert knowledge to derive the SFI-SC and uses a versatile spectro-spatial methodology that may be implemented for rapid and accurate determination of soil fertility and better exploration of areas suitable for production.
Resumo:
The DSSAT/CANEGRO model was parameterized and its predictions evaluated using data from five sugarcane (Sacchetrum spp.) experiments conducted in southern Brazil. The data used are from two of the most important Brazilian cultivars. Some parameters whose values were either directly measured or considered to be well known were not adjusted. Ten of the 20 parameters were optimized using a Generalized Likelihood Uncertainty Estimation (GLUE) algorithm using the leave-one-out cross-validation technique. Model predictions were evaluated using measured data of leaf area index (LA!), stalk and aerial dry mass, sucrose content, and soil water content, using bias, root mean squared error (RMSE), modeling efficiency (Eff), correlation coefficient, and agreement index. The Decision Support System for Agrotechnology Transfer (DSSAT)/CANEGRO model simulated the sugarcane crop in southern Brazil well, using the parameterization reported here. The soil water content predictions were better for rainfed (mean RMSE = 0.122mm) than for irrigated treatment (mean RMSE = 0.214mm). Predictions were best for aerial dry mass (Eff = 0.850), followed by stalk dry mass (Eff = 0.765) and then sucrose mass (Eff = 0.170). Number of green leaves showed the worst fit (Eff = -2.300). The cross-validation technique permits using multiple datasets that would have limited use if used independently because of the heterogeneity of measures and measurement strategies.