986 results for variable importance


Relevance:

70.00%

Publisher:

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can deal with large-scale data sets without rigorous data preprocessing, and it has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is a good choice for such high-dimensional data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
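The abstract does not spell out how the noise-level cut-off is computed, so the following is only a minimal sketch of one plausible reading: importance is measured as the mean decrease in accuracy when a column is permuted, and a predictor is kept only if its importance exceeds that of every synthetic noise column. The function names, the toy model, and the choice of the maximum noise importance as the cut-off are assumptions for illustration. The sketch also permutes a fixed data set rather than the out-of-bag samples that Random Forests uses.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j replaced by its shuffled copy.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_above_noise(importances, noise_columns):
    """Keep real predictors whose importance exceeds every noise column's.

    Assumption: the cut-off is the largest importance among noise columns.
    """
    cutoff = max(importances[j] for j in noise_columns)
    return [j for j, imp in enumerate(importances)
            if j not in noise_columns and imp > cutoff]


# Toy data: column 0 drives the label, column 1 is irrelevant,
# column 2 is a synthetic noise column added only to set the cut-off.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    # Hypothetical stand-in for a fitted classifier.
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
selected = select_above_noise(importances, noise_columns={2})
```

In this toy setting only column 0 survives the cut-off, because permuting the other columns leaves the accuracy unchanged.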

Relevance:

70.00%

Publisher:

Abstract:

Master's dissertation presented to ISPA - Instituto Universitário

Relevance:

60.00%

Publisher:

Abstract:

Objectives: Demonstrate the application of decision trees – classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs) – to understand structure in missing data. Setting: Data taken from employees at three different industry sites in Australia. Participants: 7915 observations were included. Materials and Methods: The approach was evaluated using an occupational health dataset comprising results of questionnaires, medical tests, and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured compared to structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusion: Researchers are encouraged to use CART and BRT models to explore and understand missing data.
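The study itself uses the ‘rpart’ and ‘gbm’ packages in R. As a language-neutral illustration of the underlying idea, the sketch below turns one variable into a 0/1 missingness flag and searches for the single threshold split that best separates missing from observed records, which is the first step a CART would take. The data, the column names (site, visits, blood_test) and the treatment of site as a numeric code are all invented for illustration.

```python
def gini(flags):
    """Gini impurity of a list of 0/1 missingness flags."""
    if not flags:
        return 0.0
    p = sum(flags) / len(flags)
    return 2 * p * (1 - p)


def best_missingness_split(rows, target, features):
    """Find the one threshold split that best separates missing from observed.

    Returns (weighted impurity, feature name, threshold).
    """
    missing = [int(row[target] is None) for row in rows]
    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [m for row, m in zip(rows, missing) if row[feature] <= threshold]
            right = [m for row, m in zip(rows, missing) if row[feature] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best


# Hypothetical records: the blood test is missing for every worker at site 3.
rows = [
    {"site": site, "visits": visits, "blood_test": None if site == 3 else 1.0}
    for site in (1, 2, 3)
    for visits in (1, 2, 3, 4)
]
impurity, feature, threshold = best_missingness_split(
    rows, "blood_test", ["visits", "site"]
)
```

Here the search picks the split on site at 2.5 with zero impurity, recovering the structure that was built into the toy data. A full tree would apply the same search recursively within each branch.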

Relevance:

60.00%

Publisher:

Abstract:

In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict surface moisture flux and daily precipitation from large-scale atmospheric fields at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) during the 1961-2001 period. Three types of downscaling models have been built: (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. To reduce the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. The available daily data have been divided into two parts: a training period (1961-1996), used to find a group of about 300 analogues to build the downscaling model, and a test period (1997-2001), where the models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capabilities can be attributed to the analogues-calculation stage.
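The simplest of the three models, averaging the observations on the nearest analogue days, can be sketched as follows. This is only an illustration under stated assumptions: each day is represented by a short vector of principal-component scores, similarity is plain Euclidean distance, and the function name and toy numbers are invented. The paper uses about 300 analogues; the example uses 3 to stay readable.

```python
import math


def analogue_forecast(train_pcs, train_obs, query_pc, k=3):
    """Average the observations on the k training days closest to the query day.

    train_pcs: principal-component scores for each training day
    train_obs: the local observation (e.g. precipitation) on each training day
    query_pc:  principal-component scores for the day to be downscaled
    """
    ranked = sorted(
        range(len(train_pcs)),
        key=lambda i: math.dist(train_pcs[i], query_pc),
    )
    nearest = ranked[:k]
    return sum(train_obs[i] for i in nearest) / len(nearest)


# Toy training set: three similar days and one very different day.
train_pcs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
train_obs = [1.0, 2.0, 3.0, 100.0]

forecast = analogue_forecast(train_pcs, train_obs, query_pc=[0.1, 0.1], k=3)
```

The distant fourth day is excluded, so the forecast is the mean of the first three observations. In the two-stage models described above, a random forest or a linear regression would be fitted on the selected analogues instead of taking their plain average.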

Relevance:

60.00%

Publisher:

Abstract:

This article highlights the potential benefits of the Kohonen method for classifying rivers with similar characteristics when determining regional ecological flows using the ELOHA (Ecological Limits of Hydrologic Alteration) methodology. Many methodologies currently exist for classifying rivers; however, none of them offers the characteristics of the Kohonen method: (i) it provides the number of groups that actually underlie the information presented, (ii) it can be used for variable importance analysis, (iii) it can display the classification process in two dimensions, and (iv) its clustering structure remains stable regardless of the parameters used in the model. To evaluate the potential benefits of the Kohonen method, 174 flow stations distributed along the great river basin “Magdalena-Cauca” (Colombia) were analyzed. In each case, 73 variables were obtained for the classification process. Six trials were done using different combinations of variables, and the results were validated against the reference classification obtained by Ingfocol in 2010, whose results were also framed using ELOHA guidelines. In the validation process, two of the tested models reproduced more than 80% of the reference classification in the first trial, meaning that more than 80% of the flow stations analyzed in both models formed invariant groups of streams.

Relevance:

60.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00%

Publisher:

Abstract:

Objectives: The aim of this preliminary study was to characterize the plasma lipid profile of women with preeclampsia. Design and methods: Plasma samples of 8 pregnant women with early-onset preeclampsia and 8 normal pregnant women were evaluated. Lipids were extracted from plasma using the Bligh-Dyer protocol. The extracts were subjected to MALDI-MS. The data matrix was exported for partial least squares discriminant analysis (PLS-DA), and the VIP parameter was used to reflect variable importance in the discriminant analysis. The major discriminant variables were selected and subjected to the Mann-Whitney U test. Results: A total of 1290 ions were initially identified and twelve m/z signals were highlighted as the most important lipids for the discrimination of patients with preeclampsia. The identification of these differential lipids was carried out through Lipid Database Search. Conclusions: The main classes identified were glycerophosphocholines [GP01], glycerophosphoserines [GP03], glycerophosphoglycerols [GP04], glycosyldiradylglycerols [GL05] and glycerophosphates [GP10]. (C) 2012 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
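The abstract does not give the formula behind the VIP parameter. For orientation, the definition commonly used for PLS models is shown below; whether the authors used exactly this form is an assumption. Here p is the number of variables (m/z signals), A the number of PLS components, w_{ja} the weight of variable j on component a, and SS_a the sum of squares of the response explained by component a.

```latex
\mathrm{VIP}_j \;=\; \sqrt{\,p \;
  \frac{\displaystyle\sum_{a=1}^{A} SS_a \left(\frac{w_{ja}}{\lVert w_a \rVert}\right)^{2}}
       {\displaystyle\sum_{a=1}^{A} SS_a}\,}
```

Because the squared VIP values average to 1 across all variables, a threshold of VIP > 1 is a common rule of thumb for selecting discriminant variables. The abstract does not say which threshold was applied here.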

Relevance:

60.00%

Publisher:

Abstract:

Acetylcholine interacts with muscarinic receptors (M) to mediate gastrointestinal (GI) smooth muscle contractions. We have compared mRNA levels and binding sites of M(1) to M(5) in muscle tissues from fundus abomasi, pylorus, ileum, cecum, proximal loop of the ascending colon (PLAC), and external loop of the spiral colon (ELSC) of healthy dairy cows. The mRNA levels were measured by quantitative RT-PCR. The inhibition of [(3)H]-QNB (1-quinuclidinyl-[phenyl-4-(3)H]-benzilate) binding by M antagonists [atropine (M(1 - 5)), pirenzepine (M(1)), methoctramine (M(2)), 4-DAMP (M(3)), and tropicamide (M(4))] was used to identify receptors at the functional level. Maximal binding (B(max)) was determined through saturation binding with atropine as a competitor. The mRNA levels of M(1), M(2), M(3), and M(5) represented 0.2, 48, 50, and 1.8%, respectively, of the total M population, whereas mRNA of M(4) was undetectable. The mRNA levels of M(2) and of M(3) in the ileum were lower (P < 0.05) than in other GI locations, which were similar to each other. Atropine, pirenzepine, methoctramine, and 4-DAMP inhibited [(3)H]-QNB binding according to either a low- or a high-affinity receptor pattern, whereas tropicamide had no effect on [(3)H]-QNB binding. The [(3)H]-QNB binding was dose-dependent and saturable. B(max) in fundus, pylorus, and PLAC was lower (P < 0.05) than in the ELSC, and in the pylorus lower (P < 0.05) than in the ileum. B(max) and mRNA levels were negatively correlated (r = -0.3; P < 0.05). In conclusion, densities of M are different among GI locations, suggesting variable importance of M for digestive functions along the GI tract.

Relevance:

60.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2016-08

Relevance:

30.00%

Publisher:

Abstract:

Rayleigh–Stokes problems have received much attention in recent years due to their importance in physics. In this article, we focus on the variable-order Rayleigh–Stokes problem for a heated generalized second grade fluid with fractional derivative. Implicit and explicit numerical methods are developed to solve the problem. The convergence and stability of the numerical methods and the solvability of the implicit numerical method are discussed via Fourier analysis. Moreover, a numerical example is given and the results support the effectiveness of the theoretical analysis.

Relevance:

30.00%

Publisher:

Abstract:

Enzymes belonging to the M1 family play important cellular roles and the key amino acids (aa) in the catalytic domain are conserved. However, C-terminal domain aa are highly variable and demonstrate distinct differences in organization. To address a functional role for the C-terminal domain, progressive deletions were generated in Tricorn interacting factor F2 from Thermoplasma acidophilum (F2) and Peptidase N from Escherichia coli (PepN). Catalytic activity was partially reduced in PepN lacking 4 C-terminal residues (PepNΔC4), whereas it was greatly reduced in F2 lacking 10 C-terminal residues (F2ΔC10) or PepN lacking 11 C-terminal residues (PepNΔC11). Notably, expression of PepNΔC4, but not PepNΔC11, in E. coli ΔpepN increased its ability to resist nutritional and high temperature stress, demonstrating physiological significance. Purified C-terminal deleted proteins demonstrated greater sensitivity to trypsin and bound more strongly to 8-amino 1-naphthalene sulphonic acid (ANS), revealing greater numbers of surface exposed hydrophobic aa. Also, F2 or PepN containing large aa deletions in the C-termini, but not smaller deletions, were present in high amounts in the insoluble fraction of cell extracts, probably due to reduced protein solubility. Modeling studies, using the crystal structure of E. coli PepN, demonstrated an increase in hydrophobic surface area and a change in accessibility of several aa from buried to exposed upon deletion of C-terminal aa. Together, these studies revealed that non-conserved distal C-terminal aa repress the surface exposure of apolar aa and enhance protein solubility and catalytic activity in two soluble and distinct members of the M1 family.

Relevance:

30.00%

Publisher:

Abstract:

Forestry has influenced forest dwelling organisms for centuries in Fennoscandia. For example, in Finland ca. 30% of the threatened species are threatened because of forestry. Nowadays forest management recommendations include practices aimed at maintaining biodiversity in harvesting, such as green-tree retention. However, the effects of these practices have been little studied. In variable retention, different numbers of trees are retained, varying from green-tree retention (at least a few live standing trees in clear-cuts) to thinning (only individual trees removed). I examined the responses of ground-dwelling spiders and carabid beetles to green-tree retention (with small and large tree groups), gap felling and thinning aimed at an uneven age structure of trees. The impacts of these harvesting methods were compared to those of clear-cutting and uncut controls. I aimed to test the hypothesis that retaining more trees positively affects populations of those species of spiders and carabids that were present before harvesting. The data come from two studies. First, spiders were collected with pitfall traps in south-central Finland in 1995 (pre-treatment) and 1998 (after-treatment) in order to examine the effects of clear-cutting, green-tree retention (with 0.01-0.02-ha sized tree groups), gap felling (with three 0.16-ha sized openings in a 1-ha stand), thinning aiming at an uneven age structure of trees and uncut control. Second, spiders and carabids were caught with pitfall traps in eastern Finland in 1998-2001 (pre-treatment and three post-treatment years) in eleven 0.09-0.55-ha sized retention-tree groups and clear-cuts adjacent to them. Original spider and carabid assemblages were better maintained after harvests that retained more trees. Thinning maintained forest spiders well. However, gap felling and large retention-tree groups maintained some forest spider and carabid species in the short-term, but negatively affected some species over time. 
However, use of small retention-tree groups was associated with negative effects on forest spider populations. Studies are needed on the long-term effects of variable retention on terrestrial invertebrates, especially studies aimed at defining the appropriate retention patch size and at assessing the importance of the structural diversity provided by variable retention for invertebrate populations. However, the aims of variable retention should be specified first. For example, are retention-tree groups planned to constitute life-boats, stepping-stones or to create structural diversity? Does it suffice that some species are maintained, or do we want to preserve the most sensitive ones, and how are these best defined? Moreover, the ecological benefits and economic costs of modified logging methods should be compared to other approaches aimed at maintaining biodiversity.

Relevance:

30.00%

Publisher:

Abstract:

Insulin-like growth factor binding protein 2 (IGFBP2) is highly upregulated in glioblastoma (GBM) tissues and has been one of the prognostic indicators. There is compelling evidence suggesting important roles for IGFBP2 in glioma cell proliferation, migration and invasion. Extracellular IGFBP2, through its carboxy terminal arginine glycine aspartate (RGD) motif, can bind to cell surface alpha 5 beta 1 integrins and activate pathways downstream of integrin signaling. This IGFBP2 activated integrin signaling is known to play a crucial role in IGFBP2 mediated invasion of glioma cells. Hence a molecular inhibitor of the carboxy terminal domain of IGFBP2 that can inhibit IGFBP2-cell surface interaction is of great therapeutic importance. In an attempt to develop molecular inhibitors of IGFBP2, we screened single chain variable fragment (scFv) phage display libraries, Tomlinson I (library size 1.47 × 10^8) and Tomlinson J (library size 1.37 × 10^8), using human recombinant IGFBP2. After screening we obtained three IGFBP2 specific binders, one of which, scFv B7J, showed better binding to IGFBP2 at its carboxy terminal domain, blocked IGFBP2-cell surface association, reduced the activity of matrix metalloprotease 2 in the conditioned medium of glioma cells and inhibited IGFBP2 induced migration and invasion of glioma cells. We demonstrate for the first time that in vitro inhibition of extracellular IGFBP2 activity using a human scFv results in a significant reduction of glioma cell migration and invasion. Therefore, the inhibition of IGFBP2 can serve as a potential therapeutic strategy in the management of GBM.

Relevance:

30.00%

Publisher:

Abstract:

Companies currently face a new challenge: integrating environmental considerations into business management and decision-making. Several triggering factors have led Business Economics and Marketing to incorporate the environment variable into their approaches, with a view to building a systematic theoretical basis for successfully addressing the environmental needs of customers and of society. This paper analyses the factors that have contributed to the growing importance of the environment variable in companies, as well as the causes that originally motivated its exclusion.