Biblioteca Digital

946 resultados para REGRESSION TREE

Approximation of Bayesian predictive p-values with regression ABC

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the Bayesian framework a standard approach to model criticism is to compare some function of the observed data to a reference predictive distribution. The result of the comparison can be summarized in the form of a p-value, and it's well known that computation of some kinds of Bayesian predictive p-values can be challenging. The use of regression adjustment approximate Bayesian computation (ABC) methods is explored for this task. Two problems are considered. The first is the calibration of posterior predictive p-values so that they are uniformly distributed under some reference distribution for the data. Computation is difficult because the calibration process requires repeated approximation of the posterior for different data sets under the reference distribution. The second problem considered is approximation of distributions of prior predictive p-values for the purpose of choosing weakly informative priors in the case where the model checking statistic is expensive to compute. Here the computation is difficult because of the need to repeatedly sample from a prior predictive distribution for different values of a prior hyperparameter. In both these problems we argue that high accuracy in the computations is not required, which makes fast approximations such as regression adjustment ABC very useful. We illustrate our methods with several samples.

Comparison of four expert elicitation methods : For Bayesian logistic regression and classification trees

Relevância:

20.00% 20.00%

Publicador:

Perturbed Lignification Impacts Tree Growth in Hybrid Poplar-A Function of Sink Strength, Vascular Integrity, and Photosynthetic Assimilation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The effects of reductions in cell wall lignin content, manifested by RNA interference suppression of coumaroyl 3'-hydroxylase, on plant growth, water transport, gas exchange, and photosynthesis were evaluated in hybrid poplar trees (Populus alba 3 grandidentata). The growth characteristics of the reduced lignin trees were significantly impaired, resulting in smaller stems and reduced root biomass when compared to wild-type trees, as well as altered leaf morphology and architecture. The severe inhibition of cell wall lignification produced trees with a collapsed xylem phenotype, resulting in compromised vascular integrity, and displayed reduced hydraulic conductivity and a greater susceptibility to wall failure and cavitation. In the reduced lignin trees, photosynthetic carbon assimilation and stomatal conductance were also greatly reduced, however, shoot xylem pressure potential and carbon isotope discrimination were higher and water-use efficiency was lower, inconsistent with water stress. Reductions in assimilation rate could not be ascribed to increased stomatal limitation. Starch and soluble sugars analysis of leaves revealed that photosynthate was accumulating to high levels, suggesting that the trees with substantially reduced cell wall lignin were not carbon limited and that reductions in sink strength were, instead, limiting photosynthesis.

A model-framed evaluation of elephant effects on tree and fire dynamics in African savannas

Relevância:

20.00% 20.00%

Publicador:

Resumo:

There is a concern that high densities of elephants in southern Africa could lead to the overall reduction of other forms of biodiversity. We present a grid-based model of elephant-savanna dynamics, which differs from previous elephant-vegetation models by accounting for woody plant demographics, tree-grass interactions, stochastic environmental variables (fire and rainfall), and spatial contagion of fire and tree recruitment. The model projects changes in height structure and spatial pattern of trees over periods of centuries. The vegetation component of the model produces long-term tree-grass coexistence, and the emergent fire frequencies match those reported for southern African savannas. Including elephants in the savanna model had the expected effect of reducing woody plant cover, mainly via increased adult tree mortality, although at an elephant density of 1.0 elephant/km2, woody plants still persisted for over a century. We tested three different scenarios in addition to our default assumptions. (1) Reducing mortality of adult trees after elephant use, mimicking a more browsing-tolerant tree species, mitigated the detrimental effect of elephants on the woody population. (2) Coupling germination success (increased seedling recruitment) to elephant browsing further increased tree persistence, and (3) a faster growing woody component allowed some woody plant persistence for at least a century at a density of 3 elephants/km2. Quantitative models of the kind presented here provide a valuable tool for exploring the consequences of management decisions involving the manipulation of elephant population densities. © 2005 by the Ecological Society of America.

Parallel streaming signature EM-tree: A clustering algorithm for web scale applications

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

A Monte-Carlo tree search in argumentation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Monte-Carlo Tree Search (MCTS) is a heuristic to search in large trees. We apply it to argumentative puzzles where MCTS pursues the best argumentation with respect to a set of arguments to be argued. To make our ideas as widely applicable as possible, we integrate MCTS to an abstract setting for argumentation where the content of arguments is left unspecified. Experimental results show the pertinence of this integration for learning argumentations by comparing it with a basic reinforcement learning.

Using decision trees to understand structure in missing data

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objectives Demonstrate the application of decision trees – classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs) – to understand structure in missing data. Setting Data taken from employees at three different industry sites in Australia. Participants 7915 observations were included. Materials and Methods The approach was evaluated using an occupational health dataset comprising results of questionnaires, medical tests, and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the Type of data (medical or environmental), the site in which it was collected, the number of visits and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusion Researchers are encouraged to use CART and BRT models to explore and understand missing data.

Micromorphology and anatomy of leaves of Syzygium floribundum (Myrtaceae: Syzygieae), a rainforest tree endemic to eastern Australia

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Although species of Syzygium are abundant components of the rainforests in Queensland and New South Wales, little is known about the anatomy of the Australian taxa. Here we describe the foliar anatomy and micromorphology of Syzygium floribundum (syn: Waterhousea floribunda) using standard protocols for scanning electron microscopy (SEM) and light microscopy. Syzygium floribundum possesses dorsiventral leaves with cyclo-staurocytic stomata, single epidermis, internal phloem, rhombus-shaped calcium oxalate crystals and complex-open midrib. In general, leaf anatomical and micromorphological characters are common with some species of the tribe Syzygieae. However, this particular combination of leaf characters has not been reported in a species of the genus. The anatomy of the species is typical of mesophytic taxa.

Bivariate genome-wide association study of genetically correlated neuroimaging phenotypes from DTI and MRI through a seemingly unrelated regression model

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Large multisite efforts (e.g., the ENIGMA Consortium), have shown that neuroimaging traits including tract integrity (from DTI fractional anisotropy, FA) and subcortical volumes (from T1-weighted scans) are highly heritable and promising phenotypes for discovering genetic variants associated with brain structure. However, genetic correlations (rg) among measures from these different modalities for mapping the human genome to the brain remain unknown. Discovering these correlations can help map genetic and neuroanatomical pathways implicated in development and inherited risk for disease. We use structural equation models and a twin design to find rg between pairs of phenotypes extracted from DTI and MRI scans. When controlling for intracranial volume, the caudate as well as related measures from the limbic system - hippocampal volume - showed high rg with the cingulum FA. Using an unrelated sample and a Seemingly Unrelated Regression model for bivariate analysis of this connection, we show that a multivariate GWAS approach may be more promising for genetic discovery than a univariate approach applied to each trait separately.

Discovery and replication of gene influences on brain structure using LASSO regression

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We implemented least absolute shrinkage and selection operator (LASSO) regression to evaluate gene effects in genome-wide association studies (GWAS) of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI). Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume. This was a substantially greater number of significant genes compared to those found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4, and CDH13. The top genes we identified with this method also displayed significant and widespread post hoc effects on voxelwise, tensor-based morphometry (TBM) maps of the temporal lobes. The most significantly associated gene was an autism susceptibility gene known as MACROD2.We were able to successfully replicate the effect of the MACROD2 gene in an independent cohort of 564 young, Australian healthy adult twins and siblings scanned with MRI (mean age: 23.8±2.2 SD years). Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain.

Non-linear regression model for spatial variation in precipitation chemistry for South India

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Chemical composition of rainwater changes from sea to inland under the influence of several major factors - topographic location of area, its distance from sea, annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO3, NO3 and Mg do not change much from coast to inland while, SO4 and Ca change is subjected to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power law reduction with distance from sea. Cl and Na decrease rapidly for the first 100 km distance from sea, then decrease marginally for the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent (R-2 similar to 0.8). Variation in one of the parameters accounted for urbanization. Model was validated using data points from the southern peninsular region of the country. Estimates are found to be within 99.9% confidence interval. Finally, this relationship between the three parameters - rainfall amount, coastline distance, and concentration (in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in the south-west India. Chemistry estimated using the model was in good correlation with observed values with a relative error of similar to 5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful in estimating the concentrations at different spatio-temporal scales and is especially applicable for south-west region of India. (C) 2008 Elsevier Ltd. All rights reserved.

Secondary attribute retrieval using tree data structures

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Several techniques are known for searching an ordered collection of data. The techniques and analyses of retrieval methods based on primary attributes are straightforward. Retrieval using secondary attributes depends on several factors. For secondary attribute retrieval, the linear structures—inverted lists, multilists, doubly linked lists—and the recently proposed nonlinear tree structures—multiple attribute tree (MAT), K-d tree (kdT)—have their individual merits. It is shown in this paper that, of the two tree structures, MAT possesses several features of a systematic data structure for external file organisation which make it superior to kdT. Analytic estimates for the complexity of node searchers, in MAT and kdT for several types of queries, are developed and compared.

Rank regression for analyzing ordinal qualitative data for treatment comparison

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ordinal qualitative data are often collected for phenotypical measurements in plant pathology and other biological sciences. Statistical methods, such as t tests or analysis of variance, are usually used to analyze ordinal data when comparing two groups or multiple groups. However, the underlying assumptions such as normality and homogeneous variances are often violated for qualitative data. To this end, we investigated an alternative methodology, rank regression, for analyzing the ordinal data. The rank-based methods are essentially based on pairwise comparisons and, therefore, can deal with qualitative data naturally. They require neither normality assumption nor data transformation. Apart from robustness against outliers and high efficiency, the rank regression can also incorporate covariate effects in the same way as the ordinary regression. By reanalyzing a data set from a wheat Fusarium crown rot study, we illustrated the use of the rank regression methodology and demonstrated that the rank regression models appear to be more appropriate and sensible for analyzing nonnormal data and data with outliers.

Efficient estimation for rank-based regression with clustered data

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Rank-based inference is widely used because of its robustness. This article provides optimal rank-based estimating functions in analysis of clustered data with random cluster effects. The extensive simulation studies carried out to evaluate the performance of the proposed method demonstrate that it is robust to outliers and is highly efficient given the existence of strong cluster correlations. The performance of the proposed method is satisfactory even when the correlation structure is misspecified, or when heteroscedasticity in variance is present. Finally, a real dataset is analyzed for illustration.

Quantile regression for longitudinal data with a working correlation model

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a linear quantile regression analysis method for longitudinal data that combines the between- and within-subject estimating functions, which incorporates the correlations between repeated measurements. Therefore, the proposed method results in more efficient parameter estimation relative to the estimating functions based on an independence working model. To reduce computational burdens, the induced smoothing method is introduced to obtain parameter estimates and their variances. Under some regularity conditions, the estimators derived by the induced smoothing method are consistent and have asymptotically normal distributions. A number of simulation studies are carried out to evaluate the performance of the proposed method. The results indicate that the efficiency gain for the proposed method is substantial especially when strong within correlations exist. Finally, a dataset from the audiology growth research is used to illustrate the proposed methodology.

«
1
2
...
9
10
11
12
13
14
15
...
63
64
»