956 results for random forest regression
Abstract:
BACKGROUND: Conversion of glucose into lipid (de novo lipogenesis; DNL) is a possible fate of carbohydrate administered during nutritional support. It cannot be detected by conventional methods such as indirect calorimetry if it does not exceed lipid oxidation. OBJECTIVE: The objective was to evaluate the effects of carbohydrate administered as part of continuous enteral nutrition in critically ill patients. DESIGN: This was a prospective, open study including 25 patients nonconsecutively admitted to a medicosurgical intensive care unit. Glucose metabolism and hepatic DNL were measured in the fasting state or after 3 d of continuous isoenergetic enteral feeding providing 28%, 53%, or 75% carbohydrate. RESULTS: DNL increased with increasing carbohydrate intake (mean +/- SEM: 7.5 +/- 1.2% with 28% carbohydrate, 9.2 +/- 1.5% with 53% carbohydrate, and 19.4 +/- 3.8% with 75% carbohydrate) and was nearly zero in a group of patients who had fasted for an average of 28 h (1.0 +/- 0.2%). In multiple regression analysis, DNL was correlated with carbohydrate intake, but not with body weight or plasma insulin concentrations. Endogenous glucose production, assessed with a dual-isotope technique, was not significantly different between the 3 groups of patients (13.7-15.3 micromol * kg(-1) * min(-1)), indicating impaired suppression by carbohydrate feeding. Gluconeogenesis was measured with [(13)C]bicarbonate, and increased as the carbohydrate intake increased (from 2.1 +/- 0.5 micromol * kg(-1) * min(-1) with 28% carbohydrate intake to 3.7 +/- 0.3 micromol * kg(-1) * min(-1) with 75% carbohydrate intake, P < 0.05). CONCLUSION: Carbohydrate feeding fails to suppress endogenous glucose production and gluconeogenesis, but stimulates DNL in critically ill patients.
Abstract:
BACKGROUND: Therapy of chronic hepatitis C (CHC) with pegIFNα/ribavirin achieves a sustained virologic response (SVR) in ∼55%. Pre-activation of the endogenous interferon system in the liver is associated with non-response (NR). Recently, genome-wide association studies described associations of allelic variants near the IL28B (IFNλ3) gene with treatment response and with spontaneous clearance of the virus. We investigated if the IL28B genotype determines the constitutive expression of IFN stimulated genes (ISGs) in the liver of patients with CHC. METHODS: We genotyped 93 patients with CHC for 3 IL28B single nucleotide polymorphisms (SNPs, rs12979860, rs8099917, rs12980275), extracted RNA from their liver biopsies and quantified the expression of IL28B and of 8 previously identified classifier genes which discriminate between SVR and NR (IFI44L, RSAD2, ISG15, IFI22, LAMP3, OAS3, LGALS3BP and HTATIP2). Decision tree ensembles in the form of a random forest classifier were used to calculate the relative predictive power of these different variables in a multivariate analysis. RESULTS: The minor IL28B allele (bad risk for treatment response) was significantly associated with increased expression of ISGs, and, unexpectedly, with decreased expression of IL28B. Stratification of the patients into SVR and NR revealed that ISG expression was conditionally independent from the IL28B genotype, i.e. there was an increased expression of ISGs in NR compared to SVR irrespective of the IL28B genotype. The random forest feature score (RFFS) identified IFI27 (RFFS = 2.93), RSAD2 (1.88) and HTATIP2 (1.50) expression and the HCV genotype (1.62) as the strongest predictors of treatment response. ROC curves of the IL28B SNPs showed an AUC of 0.66 with an error rate (ERR) of 0.38. A classifier with the 3 best classifying genes showed an excellent test performance with an AUC of 0.94 and ERR of 0.15. 
The addition of IL28B genotype information did not improve the predictive power of the 3-gene classifier. CONCLUSIONS: IL28B genotype and hepatic ISG expression are conditionally independent predictors of treatment response in CHC. There is no direct link between altered IFNλ3 expression and pre-activation of the endogenous system in the liver. Hepatic ISG expression is by far the better predictor for treatment response than IL28B genotype.
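As a minimal sketch of how the two summary figures quoted in this abstract (AUC and error rate) can be computed from classifier scores, the following uses the rank interpretation of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function names, the 0.5 decision threshold, and the toy scores are assumptions for illustration and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Fraction of cases misclassified when scores >= threshold are called positive."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


# Toy data (hypothetical): 1 = responder, 0 = non-responder.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
auc = roc_auc(toy_scores, toy_labels)
err = error_rate(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of this size (93 patients); a rank-based formulation would be preferable for large datasets.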
Abstract:
Problems related to fire hazard and fire management have become in recent decades one of the most relevant issues in the Wildland-Urban Interface (WUI), that is the area where human infrastructures meet or intermingle with natural vegetation. In this paper we develop a robust geospatial method for defining and mapping the WUI in the Alpine environment, where most interactions between infrastructures and wildland vegetation concern the fire ignition through human activities, whereas no significant threats exist for infrastructures due to contact with burning vegetation. We used the three Alpine Swiss cantons of Ticino, Valais and Grisons as the study area. The features representing anthropogenic infrastructures (urban or infrastructural components of the WUI) as well as forest cover related features (wildland component of the WUI) were selected from the Swiss Topographic Landscape Model (TLM3D). Georeferenced forest fire occurrences derived from the WSL Swissfire database were used to define suitable WUI interface distances. The Random Forest algorithm was applied to estimate the importance of predictor variables to fire ignition occurrence. This revealed that buildings and drivable roads are the most relevant anthropogenic components with respect to fire ignition. We consequently defined the combination of drivable roads and easily accessible (i.e. 100 m from the next drivable road) buildings as the WUI-relevant infrastructural component. For the definition of the interface (buffer) distance between WUI infrastructural and wildland components, we computed the empirical cumulative distribution functions (ECDF) of the percentage of ignition points (observed and simulated) arising at increasing distances from the selected infrastructures. The ECDF facilitates the calculation of both the distance at which a given percentage of ignition points occurred and, in turn, the amount of forest area covered at a given distance. 
Finally, we developed a GIS ModelBuilder routine to map the WUI for the selected buffer distance. The approach was found to be reproducible, robust (based on statistical analyses for evaluating parameters) and flexible (buffer distances depending on the targeted final area covered) so that fire managers may use it to detect WUI according to their specific priorities.
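The ECDF step described in this abstract can be sketched as follows. This is an illustrative sketch only, not the authors' GIS routine: it assumes a plain list of distances (in metres) from each ignition point to the nearest selected infrastructure, and the function names and sample values are hypothetical.

```python
def ecdf(distances):
    """Return (distance, cumulative share) pairs for the sorted distances."""
    xs = sorted(distances)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def distance_for_share(distances, share):
    """Smallest observed distance within which at least `share` of points fall."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    for x, p in ecdf(distances):
        if p >= share:
            return x
    raise ValueError("distances must not be empty")


# Hypothetical distances (m) from ignition points to the nearest road or building.
sample = [10, 50, 120, 200, 400]
buffer_80 = distance_for_share(sample, 0.8)  # buffer covering 80% of ignitions
```

Choosing the buffer from a target share of ignition points, rather than fixing a distance in advance, is what lets the interface distance vary with the area a fire manager is prepared to cover.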
Methodology based on data mining techniques to support the certification of sheep breeds
Abstract:
ABSTRACT The objective of this work was to develop a methodology based on data mining techniques to select the main SNP (Single Nucleotide Polymorphism) markers for the sheep breeds Crioula, Morada Nova and Santa Inês. The data were obtained from the International Sheep Consortium and comprise 72 animals of these breeds, each with 49,034 SNP markers. Because the number of attributes (markers) is much larger than the number of observations (animals), the prediction techniques LASSO (Least Absolute Shrinkage and Selection Operator), Random Forest and Boosting were applied to generate predictive models that incorporate attribute selection methods. The results showed that the predictive models selected the main SNP markers for identifying the breeds studied. The LASSO model selected a total of 29 relevant markers. The Random Forest and Boosting models yielded 27 and 20 important markers, respectively. Intersecting the generated models identified a subset of 18 markers with the greatest potential for breed identification.
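The final intersection step can be sketched in a few lines. The marker identifiers below are hypothetical placeholders, and the three input lists stand in for whatever each model (LASSO, Random Forest, Boosting) selected; how each model produces its list is outside this sketch.

```python
def consensus_markers(*selections):
    """Markers chosen by every model: the intersection of all selections."""
    sets = [set(s) for s in selections]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical marker IDs selected by each model.
lasso_markers = ["snp_001", "snp_014", "snp_230", "snp_777"]
forest_markers = ["snp_014", "snp_230", "snp_512", "snp_777"]
boosting_markers = ["snp_014", "snp_777", "snp_901"]

shared = consensus_markers(lasso_markers, forest_markers, boosting_markers)
```

Keeping only the markers that all three models agree on trades recall for robustness: a marker selected by a single model may reflect that model's bias rather than a real breed signal.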
Abstract:
The growth of databases that contain increasingly difficult images and a larger number of categories is driving the development of image representation techniques that are discriminative when working with multiple classes, and of algorithms that are efficient in learning and classification. This thesis explores the problem of classifying images according to the object they contain when a large number of categories is available. First, we investigate how a hybrid system formed by a generative model and a discriminative model can benefit the image classification task when the level of human annotation is minimal. For this task we introduce a new vocabulary using a dense representation of color-SIFT descriptors, and then investigate how the different parameters affect the final classification. Next, a method is proposed to incorporate spatial information into the hybrid system, showing that context information is of great help for image classification. We then introduce a new shape descriptor that represents the image by its local shape and its spatial shape, together with a kernel that incorporates this spatial information in a pyramidal form. Shape is represented by a compact vector, yielding a descriptor well suited for use with kernel-based learning algorithms. The experiments show that this shape information gives results similar to (and sometimes better than) appearance-based descriptors. We also investigate how different features can be combined for image classification, and show that the proposed shape descriptor together with an appearance descriptor substantially improves classification. Finally, an algorithm is described that automatically detects regions of interest during training and classification.
This provides a method to suppress the image background and adds invariance to the position of objects within the images. We show that using shape and appearance over this region of interest, together with random forest classifiers, improves classification and computational time. Our results are compared with results from the literature using the same databases as the authors, as well as the same learning and classification protocols. All the innovations introduced are shown to improve the final image classification.
Abstract:
Wydział Nauk Geograficznych i Geologicznych: Instytut Geoekologii i Geoinformacji
Abstract:
Recent studies showed that features extracted from brain MRIs can discriminate well between Alzheimer’s disease and Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods to find the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies were then used to solve a multi-class problem with the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the predictive power of features has not yet been investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) IntraCranial Volume (ICV) normalization can lead to overfitting and worsen prediction accuracy on the test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper improves the accuracy of binary classification.
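The one-versus-one strategy mentioned in this abstract can be sketched as a majority vote over all pairwise binary classifiers. Everything concrete here is an assumption for illustration: the third class label ("CN", for cognitively normal), the single scalar feature, and the threshold rules standing in for trained binary classifiers.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, binary_classifiers):
    """Each pairwise classifier votes for one of its two classes; most votes wins."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_classifiers[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained binary classifiers on one scalar feature.
classes = ["AD", "MCI", "CN"]
pairwise = {
    ("AD", "MCI"): lambda x: "AD" if x < 0.3 else "MCI",
    ("AD", "CN"): lambda x: "AD" if x < 0.5 else "CN",
    ("MCI", "CN"): lambda x: "MCI" if x < 0.7 else "CN",
}

prediction = one_vs_one_predict(0.6, classes, pairwise)
```

With k classes this needs k(k-1)/2 binary classifiers, so three classes require three; ties between classes are possible in general and would need an explicit tie-breaking rule.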
Abstract:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty lies in parameter estimation, because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles (marginal likelihood, extended likelihood, and Bayesian analysis) via simulation studies. Real data on contact wrestling are used for illustration.
Abstract:
Vehicle activated signs (VAS) display a warning message when drivers exceed a particular threshold. VAS are often installed on local roads to display a warning message depending on the speed of the approaching vehicles. VAS are usually powered by electricity; however, battery and solar powered VAS are also commonplace. This thesis investigated development of an automatic trigger speed of vehicle activated signs in order to influence driver behaviour, the effect of which has been measured in terms of reduced mean speed and low standard deviation. A comprehensive understanding of the effectiveness of the trigger speed of the VAS on driver behaviour was established by systematically collecting data. Specifically, data on time of day, speed, length and direction of the vehicle have been collected for the purpose, using Doppler radar installed at the road. A data driven calibration method for the radar used in the experiment has also been developed and evaluated. Results indicate that trigger speed of the VAS had variable effect on drivers’ speed at different sites and at different times of the day. It is evident that the optimal trigger speed should be set near the 85th percentile speed, to be able to lower the standard deviation. In the case of battery and solar powered VAS, trigger speeds between the 50th and 85th percentile offered the best compromise between safety and power consumption. Results also indicate that different classes of vehicles report differences in mean speed and standard deviation; on a highway, the mean speed of cars differs slightly from the mean speed of trucks, whereas a significant difference was observed between the classes of vehicles on local roads. A differential trigger speed was therefore investigated for the sake of completion. A data driven approach using Random forest was found to be appropriate in predicting trigger speeds respective to types of vehicles and traffic conditions. 
The fact that the predicted trigger speed was found to be consistently around the 85th percentile speed justifies the choice of the automatic model.
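The percentile speeds this abstract refers to can be computed directly from a list of observed speeds. The sketch below uses the nearest-rank definition of a percentile, which is an assumption; the thesis may use an interpolating definition. The function name and sample speeds are hypothetical.

```python
def percentile_speed(speeds, pct):
    """Nearest-rank percentile: smallest speed with at least pct% of values at or below it."""
    if not speeds:
        raise ValueError("speeds must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be an integer in 1..100")
    ordered = sorted(speeds)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical radar readings in km/h: 31, 32, ..., 50.
observed = list(range(31, 51))
v85 = percentile_speed(observed, 85)  # candidate trigger speed for mains-powered VAS
v50 = percentile_speed(observed, 50)  # lower bound of the range for battery or solar VAS
```

Computing both values per site and per time of day gives the range within which a battery or solar powered sign could be tuned.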
Abstract:
Multidrug resistance is a major problem in anticancer therapy, and P-glycoprotein (P-gp) is one of the factors responsible for this resistance. This work focused mainly on the development of mathematical/statistical and “chemical” models. For the mathematical/statistical models we used Machine Learning methods such as the Support Vector Machine (SVM) and Random Forest (RF); for the chemical models, pharmacophores were used. These methods were applied to several proteins (P-gp, p53 and the p53-MDM2 complex), using two families, pifithrins for p53 and flavonoids for P-gp, and, to a lesser extent, a diverse group of molecules from several chemical families. For the SVM models applied to P-gp and the flavonoid family, we obtained good values with the Radial Basis Function (RBF) kernel, with a training-set precision of 94% and a specificity of 96%. For the test set, the prediction rate was 70% and the specificity 67%, and the number of false negatives was the lowest compared with the other kernels. Applying RF to the flavonoid family, the training set showed 86% precision and 90% specificity; for the test set we obtained a prediction rate of 70% and a specificity of 60%, with the particularity that the number of false negatives was the lowest. Repeating the previous procedure (RF) with a total of 63 descriptors gave lower values: 79% precision and 82% specificity for the training set. Applying the model to the test set gave a prediction rate of 70% and a specificity of 60%. Comparing the two methods, we chose SVM with the RBF kernel as the model that gives the best classification results.
We applied the SVM method to P-gp and a set of non-flavonoid molecules that are transported by P-gp, and obtained good values with the RBF kernel, with a training-set precision of 95% and a specificity of 93%. For the test set, we obtained a prediction rate of 70% and a specificity of 69%, with the particularity that the number of false negatives was the lowest. The pharmacophore method was applied to three targets: a set of flavonoid inhibitors and non-flavonoid substrates for P-gp, a group of pifithrins for p53, and a diverse set of structures for p53-MDM2 binding. In each of the four pharmacophore models obtained, three features were identified, and the features corresponding to the aromatic ring and the hydrogen-bond donor are present in all the models obtained. Screening several databases with the models yielded hits with great structural diversity.
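The precision and specificity figures reported in this abstract, together with its counts of false negatives, all derive from a binary confusion matrix. The sketch below shows the standard definitions; the label encoding (1 = active, 0 = inactive) and the toy predictions are assumptions and do not reproduce the study's data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """Share of true negatives that are correctly predicted negative."""
    return tn / (tn + fp)


# Hypothetical labels: 1 = active compound, 0 = inactive compound.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
```

Reporting the false-negative count alongside specificity matters in screening, because a missed active compound is never followed up, whereas a false positive is caught at the next experimental stage.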
Abstract:
Understanding the historical and ecological relationships that influence current biological diversity is one of the most challenging tasks of evolutionary biology. Recent systematics emphasizes the need for integrative approaches to delimit different lineages and species. Northeastern Brazil, mostly located in the Caatinga biome, is characterized by a semi-arid climate, low precipitation and seasonal rivers. This region is regarded as lacking in ichthyological knowledge and as one of the most threatened by anthropic activities. Furthermore, it will be affected by a massive water diversion project that will transfer water from the São Francisco basin to four other major basins: Jaguaribe, Apodi-Mossoró, Piranhas-Açu and Paraiba do Norte. Loss of diversity and richness, hybridization, changes in community interactions, population homogenization, and changes in water quality and flow regime are examples of environmental impacts already associated with similar projects. The present study aims to investigate morphological and molecular variation of Cichlasoma orientale Kullander 1983 and Crenicichla menezesi Ploeg 1991, two cichlid species present in northeastern Brazil basins. Further, the study aims to evaluate the influence of geomorphological and climatic processes on this variation, and to point out some possible impacts on their population dynamics of the artificial connectivity that the São Francisco interbasin water transfer may bring. Geometric morphometrics and phylogeographical analysis were used to investigate the populations from three different hydrological regions. Our results showed significant morphological variation among populations from basins involved in the São Francisco diversion project. This variation is not related to an ancient separation between populations and could represent a set of plastic responses to the variable hydrological regime in northeastern Brazil.
The role of plastic responses in naturally variable habitats, as well as the potential disturbances that could be brought by the interbasin water transfer works, are discussed here. Further, our molecular data allowed us to make inferences about species distribution and taxonomy, and to identify a potential new species of Crenicichla for the São Francisco river basin. Our data also allowed us to identify some shared haplotypes for both species, which could be related to lineage sorting scenarios or recent gene flow between populations. However, strong structure was revealed in most of the pairwise comparisons between populations for both species. Climatic events such as Atlantic forest regression during the Pleistocene, sea level fluctuations and dispersion by paleorivers in the mouth of the Apodi-Mossoró river, and neotectonic events regulating the connection between drainages are likely to have contributed to the current distribution of lineages in northeastern Brazil. Further, analysis of molecular variation (AMOVA and SAMOVA) showed that the current isolation of the basins is an important factor in molecular variation, in spite of the signal of recent contact between some basins. Different genetic diversity patterns between species could be related to multiple historic colonization events, basin landscapes or biological differences. The present study represents the first effort of integrative systematics involving fish species of northeastern Brazil, and showed important morphological and molecular patterns which could be irrecoverably affected by the artificial connection that might be caused by the São Francisco interbasin water transfer.
Abstract:
The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma mass spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8.
Abstract:
Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin (MUC) 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.
Abstract:
In this work, an experimental study of the capability of the LBP and HOG descriptors and color for clothing attribute classification is presented. Two variants of the LBP descriptor are considered: the original LBP and the uniform LBP. Two classifiers, Linear SVM and Random Forest, have been included in the comparison because they have been frequently used in clothing attribute classification. The experiments are carried out with a publicly available dataset, the clothing attribute dataset, which has 26 attributes in total. The obtained accuracies are over 75% in most cases, reaching 80% for the necktie and sleeve length attributes.