883 results for REGRESSION TREES
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The present study compared the relationship between hermit crabs and the shells they use in two populations of Loxopagurus loxochelis. Samples were collected monthly from July 2002 to June 2003 at Caraguatatuba and Ubatuba Bay, São Paulo, Brazil. The sampled animals were sexed, weighed and measured; their shells were identified, measured and weighed, and their internal volume was determined. To relate the hermit crabs' characteristics to the shell variables, principal component analysis (PCA) and a regression tree were used. According to the PCA, the three gastropod shells most frequently used by L. loxochelis varied in size. The regression tree successfully explained the relationship between the hermit crab's characteristics and the internal volume of the inhabited shell. It can be inferred that the relationship between the morphometry of an individual hermit crab and its shell is not straightforward and cannot be explained solely on the basis of direct correlations between the body's and the shell's attributes. Several factors (such as the morphometry and availability of the shell, environmental conditions, and inter- and intraspecific competition) interact and seem to be taken into consideration by the hermit crabs when they choose a shell, resulting in the diversified pattern of shell occupancy shown here and elsewhere.
Abstract:
Doctoral program: Clinical and Therapeutic Research.
Abstract:
Accurate seasonal to interannual streamflow forecasts based on climate information are critical for optimal management and operation of water resources systems. Considering that most water supply systems are multipurpose, operating these systems to meet increasing demand under the growing stresses of climate variability and climate change, population and economic growth, and environmental concerns can be very challenging. This study investigated improvements in water resources systems management through the use of seasonal climate forecasts. Hydrological persistence (streamflow and precipitation) and large-scale recurrent oceanic-atmospheric patterns such as the El Niño/Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), Atlantic Multidecadal Oscillation (AMO), and Pacific North American (PNA) pattern, together with customized sea surface temperature (SST) indices, were investigated for their potential to improve streamflow forecast accuracy and increase forecast lead-time in a river basin in central Texas. First, an ordinal polytomous logistic regression approach is proposed as a means of incorporating multiple predictor variables into a probabilistic forecast model. Forecast performance is assessed through a cross-validation procedure, using distributions-oriented metrics, and implications for decision making are discussed. Results indicate that, of the predictors evaluated, only hydrologic persistence and Pacific Ocean sea surface temperature patterns associated with ENSO and PDO provide forecasts which are statistically better than climatology. Secondly, a class of data mining techniques, known as tree-structured models, is investigated to address the nonlinear dynamics of climate teleconnections and screen promising probabilistic streamflow forecast models for river-reservoir systems. Results show that the tree-structured models can effectively capture the nonlinear features hidden in the data.
Skill scores of probabilistic forecasts generated by both classification trees and logistic regression trees indicate that seasonal inflows throughout the system can be predicted with sufficient accuracy to improve water management, especially in the winter and spring seasons in central Texas. Lastly, a simplified two-stage stochastic economic-optimization model was proposed to investigate improvement in water use efficiency and the potential value of using seasonal forecasts, under the assumption of optimal decision making under uncertainty. Model results demonstrate that incorporating the probabilistic inflow forecasts into the optimization model can provide a significant improvement in seasonal water contract benefits over climatology, with lower average deficits (increased reliability) for a given average contract amount, or improved mean contract benefits for a given level of reliability compared to climatology. The results also illustrate the trade-off between the expected contract amount and reliability, i.e., larger contracts can be signed at greater risk.
Abstract:
Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death to the gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy by using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
Abstract:
Conservation and monitoring of forest biodiversity requires reliable information about forest structure and composition at multiple spatial scales. However, detailed data about forest habitat characteristics across large areas are often incomplete due to difficulties associated with field sampling methods. To overcome this limitation we employed a nationally available light detection and ranging (LiDAR) remote sensing dataset to develop variables describing forest landscape structure across a large environmental gradient in Switzerland. Using a model species indicative of structurally rich mountain forests (hazel grouse Bonasa bonasia), we tested the potential of such variables to predict species occurrence and evaluated the additional benefit of LiDAR data when used in combination with traditional, sample plot-based field variables. We calibrated boosted regression trees (BRT) models for both variable sets separately and in combination, and compared the models’ accuracies. While both field-based and LiDAR models performed well, combining the two data sources improved the accuracy of the species’ habitat model. The variables retained from the two datasets held different types of information: field variables mostly quantified food resources and cover in the field and shrub layer, LiDAR variables characterized heterogeneity of vegetation structure which correlated with field variables describing the understory and ground vegetation. When combined with data on forest vegetation composition from field surveys, LiDAR provides valuable complementary information for encompassing species niches more comprehensively. Thus, LiDAR bridges the gap between precise, locally restricted field-data and coarse digital land cover information by reliably identifying habitat structure and quality across large areas.
Abstract:
Manual and low-tech well drilling techniques have potential to assist in reaching the United Nations' Millennium Development Goal for water in sub-Saharan Africa. This study used publicly available geospatial data in a regression tree analysis to predict groundwater depth in the Zinder region of Niger and identify suitable areas for manual well drilling. Regression trees were developed and tested on a database of 3681 wells in the Zinder region. A tree with 17 terminal leaves provided a range of groundwater depth estimates that were appropriate for manual drilling, though much of the tree's complexity was associated with depths that were beyond manual methods. A natural log transformation of groundwater depth was tested to see if rescaling dataset variance would result in finer distinctions for regions of shallow groundwater. The RMSE for a log-transformed tree with only 10 terminal leaves was almost half that of the untransformed 17-leaf tree for groundwater depths less than 10 m. This analysis indicated important groundwater relationships for commonly available maps of geology, soils, elevation, and enhanced vegetation index from the MODIS satellite imaging system.
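The effect of log-transforming the target before fitting a regression tree can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the single predictor `x`, the toy depth values, and the small hand-written tree are hypothetical stand-ins, not the study's data, predictors, or software.

```python
import math

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_threshold(xs, ys):
    """Threshold on x that minimizes total squared error, or None if x is constant."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pairs[:k]]) + sse([y for _, y in pairs[k:]])
        if best is None or cost < best[1]:
            best = ((lo + hi) / 2, cost)
    return None if best is None else best[0]

def fit_tree(xs, ys, max_depth):
    """Grow a regression tree; a leaf is a float, a node is (threshold, left, right)."""
    t = best_threshold(xs, ys) if max_depth > 0 else None
    if t is None:
        return sum(ys) / len(ys)
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t, fit_tree(*zip(*left), max_depth - 1),
            fit_tree(*zip(*right), max_depth - 1))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def fit_log_tree(xs, ys, max_depth):
    """Fit on ln(y); predictions are mapped back to the original scale with exp()."""
    tree = fit_tree(xs, [math.log(y) for y in ys], max_depth)
    return lambda x: math.exp(predict(tree, x))

def rmse_shallow(predict_fn, xs, ys, limit=10.0):
    """RMSE restricted to wells whose observed depth is below `limit` metres."""
    errs = [(predict_fn(x) - y) ** 2 for x, y in zip(xs, ys) if y < limit]
    return math.sqrt(sum(errs) / len(errs))

# Hypothetical toy data: one predictor and groundwater depth in metres.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
depth_m = [2, 3, 4, 6, 8, 40, 55, 70, 90, 120]

raw_tree = fit_tree(x, depth_m, max_depth=2)
log_predict = fit_log_tree(x, depth_m, max_depth=2)
print("raw RMSE (<10 m):", rmse_shallow(lambda v: predict(raw_tree, v), x, depth_m))
print("log RMSE (<10 m):", rmse_shallow(log_predict, x, depth_m))
```

On this toy data the tree fitted to ln(depth) places an extra split among the shallow wells, so its shallow-well RMSE comes out lower than that of the untransformed tree. Real data with many predictors may behave differently.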
Abstract:
Visual traces of iron reduction and oxidation are linked to the redox status of soils and have been used to characterise the quality of agricultural soils. We tested whether this feature could also be used to explain the spatial pattern of the natural vegetation of tidal habitats. If so, an easy assessment of the effect of rising sea level on tidal ecosystems would be possible. Our study was conducted at the salt marshes of the northern lagoon of Venice, which are strongly threatened by erosion and rising sea level and are part of the world heritage 'Venice and its lagoon'. We analysed the abundance of plant species at 255 sampling points along a land-sea gradient. In addition, we surveyed the redox morphology (presence/absence of red iron oxide mottles in the greyish topsoil horizons) of the soils and the presence of disturbances. We used indicator species analysis, correlation trees and multivariate regression trees to analyse relations between soil properties and plant species distribution. Plant species with known sensitivity to anaerobic conditions (e.g. Halimione portulacoides) were identified as indicators for oxic soils (showing iron oxide mottles within a greyish soil matrix). Plant species that tolerate a low redox potential (e.g. Spartina maritima) were identified as indicators for anoxic soils (greyish matrix without oxide mottles). Correlation trees and multivariate regression trees indicate the dominant role of the redox morphology of the soils in plant species distribution. In addition, the distance from the mainland and the presence of disturbances were identified as tree-splitting variables. The small-scale variation of oxygen availability plays a key role for the biodiversity of salt marsh ecosystems. Our results suggest that the redox morphology of salt marsh soils indicates the plant availability of oxygen.
Thus, the consideration of this indicator may enable an understanding of the heterogeneity of biological processes in oxygen-limited systems and may be a sensitive and easy-to-use tool to assess human impacts on salt marsh ecosystems.
Abstract:
Survival, T-cell functions, and postmortem histopathology were studied in H-2 congenic strains of mice bearing H-2b, H-2k, and H-2d haplotypes. Males lived longer than females in all homozygous and heterozygous combinations except for H-2d homozygotes, which showed no differences between males and females. Association of heterozygosity with longer survival was observed only with H-2b/H-2b and H-2b/H-2d mice. Analysis using classification and regression trees (CART) showed that both males and females of H-2b homozygous and H-2k/H-2b mice had the shortest life-span of the strains studied. In histopathological analyses, lymphomas were noted to be more frequent in females, while hemangiosarcomas and hepatomas were more frequent in males. Lymphomas appeared earlier than hepatomas or hemangiosarcomas. The incidence of lymphomas was associated with the H-2 haplotype--e.g., H-2b homozygous mice had more lymphomas than did mice of the H-2d haplotype. More vigorous T-cell function was maintained with age (27 months) in H-2d, H-2b/H-2d, and H-2d/H-2k mice as compared with H-2b, H-2k, and H-2b/H-2k mice, which showed a decline of T-cell responses with age.
Abstract:
In the study of forest communities, establishing the relative importance of the factors that define species composition and distribution is a challenge. In terms of environmental gradients, studying the responses of tree species is essential for understanding ecological processes and for conservation decisions. Accordingly, to help elucidate ecological processes in the main forest formations of the State of São Paulo (Lowland Dense Ombrophilous Forest, Submontane Dense Ombrophilous Forest, Seasonal Semideciduous Forest and Forested Savanna), this work aimed to answer the following questions: (I) do the floristic composition and abundance of tree species, in each phytogeographic unit, vary along the edaphic and topographic gradient? (II) can soil characteristics and topography influence the predictability of occurrence of widely distributed tree species in different vegetation types? (III) is there a relationship between the spatial distribution pattern of tree species and soil and topographic parameters? The work was carried out in plots located in conservation units (UC) that contained stretches representative, in terms of conservation and size, of the four main forest formations present in the State of São Paulo. In each UC, tree individuals (circumference at breast height, CAP ≥ 15 cm), topography, soil texture data and soil chemical attributes were recorded in a 10.24 ha plot subdivided into 256 subplots. Canonical correspondence analyses (CCA) were applied to establish the correspondence between species abundance and the environmental gradient (soil and topography). The modified TWINSPAN method was applied to the CCA ordination diagram to assess the influence of the environmental variables (soil and topography) on species composition. Boosted regression trees (BRT) were fitted to predict species occurrence from the soil and topographic variables.
The Getis-Ord index (G) was used to determine the spatial autocorrelation of the environmental variables used in the species occurrence prediction models. In the phytogeographic units analyzed, the correspondence between the environmental gradient (soil and topography) and species abundance was significant, especially in the Forested Savanna, where the strongest relationship was observed. Soil and topography were also related to the similarity in floristic composition among subplots, except in the Seasonal Semideciduous Forest (EEC). The main soil and topographic variables related to the flora in each UC were: (1) in the Lowland Dense Ombrophilous Forest (PEIC): aluminum content in the deep layer (Al (80-100 cm)), which may reflect the Al content at the surface, soil acidity (pH(H2O) (5-25 cm)) and altitude, which delimited the flooded areas; (2) in the Submontane Dense Ombrophilous Forest (PECB): altitude, a factor that, because of the rugged relief, influences temperature and sunlight incidence in the understory; (3) in the Forested Savanna (EEA): fertility, aluminum tolerance and soil acidity. In the BRT prediction models, soil chemical variables were more important than texture, owing to the small variation of this attribute in the soil of the sampled areas. Among the soil chemical variables, cation exchange capacity was used to predict species occurrence in all four forest formations, being particularly important in the deepest soil layer of the Lowland Dense Ombrophilous Forest (PEIC). As for topography, altitude was included in most models and had different influences across the study areas. In general, for the presence of widely distributed species, the same trend was observed regarding the association with soil attributes, but with ranges of the edaphic descriptors that varied according to the study area.
The occurrence of Guapira opposita and Syagrus romanzoffiana, whose pattern varied with scale, was explained by variables with aggregated spatial patterns that together accounted for between 30% and 50% of relative importance in the BRT model. The presence of A. anthelmia, whose pattern also showed some degree of aggregation, was associated with only one variable with an aggregated pattern, altitude (21%), which may have exerted a strong influence on the species' distribution by delimiting flooded areas. T. guianensis was associated with predictor environmental variables with an aggregated spatial pattern that together accounted for about 70% of relative importance, which must have been sufficient to establish the aggregated pattern at all scales. However, the influence of environmental factors on the species' distribution pattern does not depend only on the species' environmental optimum, but results from the species-environment interaction. It was concluded that: (I) edaphic and topographic characteristics explained a small share of the floristic composition in each phytogeographic unit, although the occurrence of some species was associated with the edaphic and topographic gradient; (II) from soil and topographic characteristics it was possible to predict the presence of tree species, which showed particularities in their association with the soil of each phytophysiognomy; (III) based on descriptive associations, soil and topography influence the spatial distribution pattern of the species, in proportion to their contribution to the species' presence.
Abstract:
Areas of the landscape that are priorities for conservation should be those that are both vulnerable to threatening processes and that, if lost or degraded, will result in conservation targets being compromised. While much attention is directed towards understanding the patterns of biodiversity, much less is given to determining the areas of the landscape most vulnerable to threats. We assessed the relative vulnerability of remaining areas of native forest to conversion to plantations in the ecologically significant temperate rainforest region of south central Chile. The area of the study region is 4.2 million ha and the extent of plantations is approximately 200,000 ha. First, the spatial distribution of native forest conversion to plantations was determined. The variables related to the spatial distribution of this threatening process were identified through the development of a classification tree and the generation of a multivariate, spatially explicit, statistical model. The model of native forest conversion explained 43% of the deviance and the discrimination ability of the model was high. Predictions were made of where native forest conversion is likely to occur in the future. Due to patterns of climate, topography, soils and proximity to infrastructure and towns, remaining forest areas differ in their relative risk of being converted to plantations. Another factor that may increase the vulnerability of remaining native forest in a subset of the study region is the proposed construction of a highway. We found that 90% of the area of existing plantations within this region is within 2.5 km of roads. When the predictions of native forest conversion were recalculated accounting for the construction of this highway, it was found that approximately 27,000 ha of native forest had an increased probability of conversion. The areas of native forest identified to be vulnerable to conversion are outside of the existing reserve network. (C) 2004 Elsevier Ltd. All rights reserved.
Abstract:
Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and scale assessment of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are 2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
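One simple way to turn co-occurrence counts into an entropy value is sketched below. It is an illustration under stated assumptions, not the paper's measure: it considers only pairs of points (order 2), uses a fixed hypothetical `radius`, works on made-up marked points, and computes the Shannon entropy of the observed distribution of mark pairs.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_entropy(points, radius):
    """Shannon entropy (bits) of unordered mark pairs found within `radius`.

    `points` is a list of (x, y, mark) tuples. Two points co-occur when
    their Euclidean distance is at most `radius`.
    """
    counts = Counter()
    for (x1, y1, m1), (x2, y2, m2) in combinations(points, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            counts[tuple(sorted((m1, m2)))] += 1  # (a, b) and (b, a) are the same pair
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no co-occurrences at this radius
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical marked point pattern: coordinates plus a categorical mark.
points = [(0, 0, "a"), (1, 0, "b"), (0, 1, "a"), (10, 10, "b")]
print(cooccurrence_entropy(points, radius=1.5))
```

Low values mean that a few mark combinations dominate the neighbourhoods at that radius; higher values mean the combinations are more evenly mixed. Repeating the calculation over several radii would give a rough scale profile.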
Abstract:
In many e-commerce Web sites, product recommendation is essential to improve user experience and boost sales. Most existing product recommender systems rely on historical transaction records or Web-site-browsing history of consumers in order to accurately predict online users’ preferences for product recommendation. As such, they are constrained by the limited information available on specific e-commerce Web sites. With the prolific use of social media platforms, it now becomes possible to extract product demographics from online product reviews and social networks built from microblogs. Moreover, users’ public profiles available on social media often reveal their demographic attributes such as age, gender, and education. In this paper, we propose to leverage the demographic information of both products and users extracted from social media for product recommendation. Specifically, we frame recommendation as a learning-to-rank problem which takes as input the features derived from both product and user demographics. An ensemble method based on gradient-boosted regression trees is extended to make it suitable for our recommendation task. We have conducted extensive experiments to obtain both quantitative and qualitative evaluation results. Moreover, we have also conducted a user study to gauge the performance of our proposed recommender system in a real-world deployment. All the results show that our system is more effective than the competitive baselines in generating recommendation results that match users’ preferences.
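For readers unfamiliar with gradient-boosted regression trees, the following sketch shows the basic mechanism with depth-one trees (stumps) and squared-error loss, then ranks candidates by predicted score. It is a pointwise simplification and not the extended method the paper proposes. The feature names (`age_match`, `gender_match`), the relevance labels, and the product names are all hypothetical.

```python
def fit_stump(rows, residuals):
    """Best (feature, threshold, left_mean, right_mean) for squared error, or None."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(rows, residuals) if row[j] <= t]
            right = [r for row, r in zip(rows, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            cost = (sum((r - lm) ** 2 for r in left)
                    + sum((r - rm) ** 2 for r in right))
            if best is None or cost < best[0]:
                best = (cost, j, t, lm, rm)
    return None if best is None else best[1:]

def fit_gbrt(rows, labels, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(labels) / len(labels)
    preds = [base] * len(labels)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(labels, preds)]
        stump = fit_stump(rows, residuals)
        if stump is None:  # no feature can be split any further
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(rows, preds)]
    return base, learning_rate, stumps

def score(model, row):
    """Predicted relevance: base value plus the damped output of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)

def rank(model, candidates):
    """Candidate names ordered from highest to lowest predicted relevance."""
    return sorted(candidates, key=lambda name: score(model, candidates[name]),
                  reverse=True)

# Hypothetical training data: (age_match, gender_match) -> graded relevance.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1)]
train_y = [0, 1, 2, 3]
model = fit_gbrt(train_x, train_y)

candidates = {"camera": (0, 1), "headphones": (1, 1), "kettle": (0, 0), "laptop": (1, 0)}
print(rank(model, candidates))
```

A pointwise regressor like this only learns a score per user-product pair. Learning-to-rank variants typically change the loss so that the ordering within each user's candidate list is optimized directly.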