876 results for Boosted regression trees
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The present study aimed to comparatively verify the relation between the hermit crabs and the shells they use in two populations of Loxopagurus loxochelis. Samples were collected monthly from July 2002 to June 2003, at Caraguatatuba and Ubatuba Bay, Sao Paulo, Brazil. The animals sampled had their sex identified, were weighed and measured; their shells were identified, measured and weighed, and their internal volume determined. To relate the hermit crab's characteristics and the shells' variables, principal component analysis (PCA) and a regression tree were used. According to the PCA analysis, the three gastropod shells most frequently used by L. loxochelis varied in size. The regression tree successfully explained the relationship between the hermit crab's characteristics and the internal volume of the inhabited shell. It can be inferred that the relationship between the morphometry of an individual hermit crab and its shell is not straightforward and it is impossible to explain only on the basis of direct correlations between the body's and the shell's attributes. Several factors (such as the morphometry and the availability of the shell, environmental conditions and inter- and intraspecific competition) interact and seem to be taken into consideration by the hermit crabs when they choose a shell, resulting in the diversified pattern of shell occupancy shown here and elsewhere.
Abstract:
Doctoral programme: Clinical and therapeutic research.
Abstract:
Accurate seasonal to interannual streamflow forecasts based on climate information are critical for optimal management and operation of water resources systems. Because most water supply systems are multipurpose, operating them to meet increasing demand under the growing stresses of climate variability and climate change, population and economic growth, and environmental concerns can be very challenging. This study investigated improvement in water resources systems management through the use of seasonal climate forecasts. Hydrological persistence (streamflow and precipitation) and large-scale recurrent oceanic-atmospheric patterns such as the El Niño/Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), the Atlantic Multidecadal Oscillation (AMO), the Pacific North American (PNA), and customized sea surface temperature (SST) indices were investigated for their potential to improve streamflow forecast accuracy and increase forecast lead-time in a river basin in central Texas. First, an ordinal polytomous logistic regression approach is proposed as a means of incorporating multiple predictor variables into a probabilistic forecast model. Forecast performance is assessed through a cross-validation procedure, using distributions-oriented metrics, and implications for decision making are discussed. Results indicate that, of the predictors evaluated, only hydrologic persistence and Pacific Ocean sea surface temperature patterns associated with ENSO and PDO provide forecasts which are statistically better than climatology. Secondly, a class of data mining techniques, known as tree-structured models, is investigated to address the nonlinear dynamics of climate teleconnections and screen promising probabilistic streamflow forecast models for river-reservoir systems. Results show that the tree-structured models can effectively capture the nonlinear features hidden in the data.
Skill scores of probabilistic forecasts generated by both classification trees and logistic regression trees indicate that seasonal inflows throughout the system can be predicted with sufficient accuracy to improve water management, especially in the winter and spring seasons in central Texas. Lastly, a simplified two-stage stochastic economic-optimization model was proposed to investigate improvement in water use efficiency and the potential value of using seasonal forecasts, under the assumption of optimal decision making under uncertainty. Model results demonstrate that incorporating the probabilistic inflow forecasts into the optimization model can provide a significant improvement in seasonal water contract benefits over climatology, with lower average deficits (increased reliability) for a given average contract amount, or improved mean contract benefits for a given level of reliability compared to climatology. The results also illustrate the trade-off between the expected contract amount and reliability, i.e., larger contracts can be signed at greater risk.
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Abstract:
Manual and low-tech well drilling techniques have potential to assist in reaching the United Nations' millennium development goal for water in sub-Saharan Africa. This study used publicly available geospatial data in a regression tree analysis to predict groundwater depth in the Zinder region of Niger to identify suitable areas for manual well drilling. Regression trees were developed and tested on a database for 3681 wells in the Zinder region. A tree with 17 terminal leaves provided a range of groundwater depth estimates that were appropriate for manual drilling, though much of the tree's complexity was associated with depths that were beyond manual methods. A natural log transformation of groundwater depth was tested to see if rescaling dataset variance would result in finer distinctions for regions of shallow groundwater. The RMSE for a log-transformed tree with only 10 terminal leaves was almost half that of the untransformed 17-leaf tree for groundwater depths less than 10 m. This analysis indicated important groundwater relationships for commonly available maps of geology, soils, elevation, and enhanced vegetation index from the MODIS satellite imaging system.
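The effect of the log transformation described in this abstract can be illustrated with a minimal sketch. It is not the study's model: the data are hypothetical toy values, and `sse` and `best_split` are illustrative helper names. The sketch searches for the single split that minimises the summed squared error of the two child nodes, once on raw depths and once on log depths.

```python
import math


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the single split on x that
    minimises the summed SSE of the two child nodes."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: elevation (m) against groundwater depth (m).
elevation = [400, 410, 420, 430, 440, 450]
depth_m = [1.0, 1.5, 10.0, 12.0, 40.0, 60.0]

raw_threshold, _ = best_split(elevation, depth_m)
log_threshold, _ = best_split(elevation, [math.log(d) for d in depth_m])
```

On these toy values the raw-depth split isolates the two deepest wells, while the log-depth split separates the two shallowest wells from the rest, which is the kind of finer distinction at shallow depths the abstract refers to.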
Abstract:
Visual traces of iron reduction and oxidation are linked to the redox status of soils and have been used to characterise the quality of agricultural soils. We tested whether this feature could also be used to explain the spatial pattern of the natural vegetation of tidal habitats. If so, an easy assessment of the effect of rising sea level on tidal ecosystems would be possible. Our study was conducted at the salt marshes of the northern lagoon of Venice, which are strongly threatened by erosion and rising sea level and are part of the world heritage 'Venice and its lagoon'. We analysed the abundance of plant species at 255 sampling points along a land-sea gradient. In addition, we surveyed the redox morphology (presence/absence of red iron oxide mottles in the greyish topsoil horizons) of the soils and the presence of disturbances. We used indicator species analysis, correlation trees and multivariate regression trees to analyse relations between soil properties and plant species distribution. Plant species with known sensitivity to anaerobic conditions (e.g. Halimione portulacoides) were identified as indicators for oxic soils (showing iron oxide mottles within a greyish soil matrix). Plant species that tolerate a low redox potential (e.g. Spartina maritima) were identified as indicators for anoxic soils (greyish matrix without oxide mottles). Correlation trees and multivariate regression trees indicate the dominant role of the redox morphology of the soils in plant species distribution. In addition, the distance from the mainland and the presence of disturbances were identified as tree-splitting variables. The small-scale variation of oxygen availability plays a key role for the biodiversity of salt marsh ecosystems. Our results suggest that the redox morphology of salt marsh soils indicates the plant availability of oxygen.
Thus, the consideration of this indicator may enable an understanding of the heterogeneity of biological processes in oxygen-limited systems and may be a sensitive and easy-to-use tool to assess human impacts on salt marsh ecosystems.
Abstract:
Survival, T-cell functions, and postmortem histopathology were studied in H-2 congenic strains of mice bearing H-2b, H-2k, and H-2d haplotypes. Males lived longer than females in all homozygous and heterozygous combinations except for H-2d homozygotes, which showed no differences between males and females. Association of heterozygosity with longer survival was observed only with H-2b/H-2b and H-2b/H-2d mice. Analysis using classification and regression trees (CART) showed that both males and females of H-2b homozygous and H-2k/H-2b mice had the shortest life-span of the strains studied. In histopathological analyses, lymphomas were noted to be more frequent in females, while hemangiosarcomas and hepatomas were more frequent in males. Lymphomas appeared earlier than hepatomas or hemangiosarcomas. The incidence of lymphomas was associated with the H-2 haplotype--e.g., H-2b homozygous mice had more lymphomas than did mice of the H-2d haplotype. More vigorous T-cell function was maintained with age (27 months) in H-2d, H-2b/H-2d, and H-2d/H-2k mice as compared with H-2b, H-2k, and H-2b/H-2k mice, which showed a decline of T-cell responses with age.
Abstract:
Areas of the landscape that are priorities for conservation should be those that are both vulnerable to threatening processes and that, if lost or degraded, will result in conservation targets being compromised. While much attention is directed towards understanding the patterns of biodiversity, much less is given to determining the areas of the landscape most vulnerable to threats. We assessed the relative vulnerability of remaining areas of native forest to conversion to plantations in the ecologically significant temperate rainforest region of south central Chile. The area of the study region is 4.2 million ha and the extent of plantations is approximately 200000 ha. First, the spatial distribution of native forest conversion to plantations was determined. The variables related to the spatial distribution of this threatening process were identified through the development of a classification tree and the generation of a multivariate, spatially explicit, statistical model. The model of native forest conversion explained 43% of the deviance and the discrimination ability of the model was high. Predictions were made of where native forest conversion is likely to occur in the future. Due to patterns of climate, topography, soils and proximity to infrastructure and towns, remaining forest areas differ in their relative risk of being converted to plantations. Another factor that may increase the vulnerability of remaining native forest in a subset of the study region is the proposed construction of a highway. We found that 90% of the area of existing plantations within this region is within 2.5 km of roads. When the predictions of native forest conversion were recalculated accounting for the construction of this highway, it was found that approximately 27000 ha of native forest had an increased probability of conversion. The areas of native forest identified to be vulnerable to conversion are outside of the existing reserve network. (C) 2004 Elsevier Ltd.
All rights reserved.
Abstract:
Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and scale assessments of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are second-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
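As a rough illustration of an entropy built from co-occurrences, the sketch below covers only the order-2 case and makes simplifying assumptions that are not taken from the paper: co-occurrence means two marked points lying within a fixed radius of each other, and the entropy is the plain Shannon entropy of the observed mark-pair frequencies. The function name `cooccurrence_entropy` and the toy points are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, radius):
    """Shannon entropy (in bits) of the distribution of unordered mark
    pairs among points lying within `radius` of each other.
    `points` is a list of (x, y, mark) tuples."""
    counts = Counter()
    for (x1, y1, m1), (x2, y2, m2) in combinations(points, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            counts[tuple(sorted((m1, m2)))] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no co-occurrences at this radius
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical marked point pattern with two mark types, "A" and "B".
points = [(0, 0, "A"), (1, 0, "B"), (0, 1, "A")]
entropy_bits = cooccurrence_entropy(points, radius=1.0)
```

With this radius the pattern contains one A-A pair and one A-B pair, so the two pair types are equally frequent and the entropy is one bit. Higher-order versions would count triples or larger groups of marks in the same way.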
Abstract:
In many e-commerce Web sites, product recommendation is essential to improve user experience and boost sales. Most existing product recommender systems rely on historical transaction records or Web-site-browsing history of consumers in order to accurately predict online users’ preferences for product recommendation. As such, they are constrained by limited information available on specific e-commerce Web sites. With the prolific use of social media platforms, it now becomes possible to extract product demographics from online product reviews and social networks built from microblogs. Moreover, users’ public profiles available on social media often reveal their demographic attributes such as age, gender, and education. In this paper, we propose to leverage the demographic information of both products and users extracted from social media for product recommendation. Specifically, we frame recommendation as a learning-to-rank problem which takes as input the features derived from both product and user demographics. An ensemble method based on gradient-boosting regression trees is extended to make it suitable for our recommendation task. We have conducted extensive experiments to obtain both quantitative and qualitative evaluation results. Moreover, we have also conducted a user study to gauge the performance of our proposed recommender system in a real-world deployment. All the results show that our system is more effective than the competitive baselines in generating recommendation results that match users’ preferences.
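The core idea behind gradient-boosting regression trees can be sketched in a few lines. This is a generic squared-error booster over one-split trees (stumps) on a single feature, not the extended ranking method of the paper; the function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split tree on a single feature.
    Returns (threshold, left_value, right_value).
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (the negative
    gradient of squared error) and adds a shrunken copy to the model."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        threshold, left_value, right_value = stump
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken output of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_value if x <= threshold else right_value
        for threshold, left_value, right_value in stumps
    )


# Hypothetical toy data: one demographic feature vs. a preference score.
feature = [1, 2, 3, 4, 5, 6]
preference = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = fit_boosted_stumps(feature, preference)
```

A learning-to-rank variant would replace the squared-error residuals with gradients of a ranking loss, a step this sketch does not attempt.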
Abstract:
This work proposes an alarm prediction system intended to support the deployment of an industrial predictive maintenance policy and to serve as a management tool for decision support. The system acquires readings from several sensors installed in the plant, extracts their features and assesses the health of the equipment. Diagnosis and prognosis involve classifying the operating conditions of the plant. Regression tree and unsupervised classification techniques are used in this article. A sample of measurements of 73 variables taken by sensors installed in a hydroelectric power plant was used to test and validate the proposal. The measurements were sampled over a period of 15 months.
Abstract:
Carbon (C) sequestration in soils is a means for increasing soil organic carbon (SOC) stocks and is a potential tool for climate change mitigation. One recommended management practice to increase SOC stocks is nitrogen (N) fertilisation; however, examples of positive, negative or null SOC effects in response to N addition exist. We evaluated the relative importance of plant molecular structure, soil physical properties and soil ecological stoichiometry in explaining the retention of SOC with and without N addition. We tracked the transformation of 13C pulse-labelled buffel grass (Cenchrus ciliaris L.), wheat (Triticum aestivum L.) and lucerne (Medicago sativa L.) material to the <53 μm silt + clay soil organic C fraction, hereafter named “humus”, over 365 days of incubation in four contrasting agricultural soils, with and without urea-N addition. We hypothesised that: a) humus retention would be soil and litter dependent; b) humus retention would be litter independent once litter C:N ratios were standardised with urea-N addition; and c) humus retention would be improved by urea-N addition. Two- and three-way factorial analysis of variance indicated that 13C humus was consistently soil and litter dependent, even when litter C:N ratios were standardised, and that the effect of urea-N addition on 13C humus was also soil and litter dependent. A boosted regression analysis of the effect of 44 plant and soil explanatory variables demonstrated that soil biological and chemical properties had the greatest relative influence on 13C humus. Regression tree analyses demonstrated that the greatest gains in 13C humus occurred in soils of relatively low total organic C, dissolved organic C and microbial biomass C (MBC), or with a combination of relatively high MBC and low C:N ratio. The greatest losses in 13C humus occurred in soils with a combination of relatively high MBC and low total N or increasing C:N ratio.
We conclude that soil variables involved in soil ecological stoichiometry exert a greater relative influence on incorporating organic matter as humus compared to plant molecular structure and soil physical properties. Furthermore, we conclude that the effect of N fertilisation on humus retention is dependent upon soil ecological stoichiometry.
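The "relative influence" reported by boosted regression analyses is commonly computed by summing, for each explanatory variable, the error reduction achieved by every split on that variable across the ensemble, then scaling the totals to sum to 100. The sketch below assumes that definition; the abstract does not state which software or formula was used, and the variable names and improvement values are hypothetical.

```python
from collections import defaultdict


def relative_influence(splits):
    """`splits` is an iterable of (variable_name, improvement) pairs, one
    per internal node across all trees in the ensemble. Returns a dict of
    percentages that sum to 100."""
    totals = defaultdict(float)
    for variable, improvement in splits:
        totals[variable] += improvement
    grand_total = sum(totals.values())
    return {v: 100.0 * t / grand_total for v, t in totals.items()}


# Hypothetical split log: (variable, reduction in squared error).
splits = [
    ("microbial_biomass_C", 4.0),
    ("soil_CN_ratio", 3.0),
    ("litter_lignin", 1.0),
    ("microbial_biomass_C", 2.0),
]
influence = relative_influence(splits)
```

In this toy log the soil variables account for 90% of the total improvement, which mirrors the kind of ranking the abstract describes without reproducing its results.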