54 results for random forest data analysis
Effects of roads, topography, and land use on forest cover dynamics in the Brazilian Atlantic Forest
Abstract:
Roads and topography can determine patterns of land use and distribution of forest cover, particularly in tropical regions. We evaluated how road density, land use, and topography affected forest fragmentation, deforestation and forest regrowth in a Brazilian Atlantic Forest region near the city of Sao Paulo. We mapped roads and land use/land cover for three years (1962, 1981 and 2000) from historical aerial photographs, and summarized the distribution of roads, land use/land cover and topography within a grid of 94 non-overlapping 100 ha squares. We used generalized least squares regression models for data analysis. Our models showed that forest fragmentation and deforestation depended on topography, land use and road density, whereas forest regrowth depended primarily on land use. However, the relationships between these variables and forest dynamics differed between the two studied periods; land use and slope were the strongest predictors from 1962 to 1981, and past (1962) road density and land use were the strongest predictors for the following period (1981-2000). Roads had the strongest relationship with deforestation and forest fragmentation when the expansion of agriculture and buildings was limited to already deforested areas, and when there was a rapid expansion of development under the influence of Sao Paulo city. Furthermore, the past (1962) road network was more important than the recent road network (1981) when explaining forest dynamics between 1981 and 2000, suggesting a long-term effect of roads. Roads are permanent scars on the landscape and facilitate deforestation and forest fragmentation due to increased accessibility and land valorization, which control land-use and land-cover dynamics. Topography directly affected deforestation, agriculture and road expansion, mainly between 1962 and 1981.
Forests are thus in peril where there are more roads, and long-term conservation strategies should consider ways to mitigate roads as permanent landscape features and as drivers and facilitators of deforestation and forest fragmentation. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
In this paper a new parametric method to deal with discrepant experimental results is developed. The method is based on the fit of a probability density function to the data. This paper also compares the characteristics of different methods used to deduce recommended values and uncertainties from a discrepant set of experimental data. The methods are applied to the (137)Cs and (90)Sr published half-lives and special emphasis is given to the deduced confidence intervals. The obtained results are analyzed considering two fundamental properties expected from an experimental result: the probability content of confidence intervals and the statistical consistency between different recommended values. The recommended values and uncertainties for the (137)Cs and (90)Sr half-lives are 10,984 (24) days and 10,523 (70) days, respectively. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
When missing data occur in studies designed to compare the accuracy of diagnostic tests, a common, though naive, practice is to base the comparison of sensitivity, specificity, as well as of positive and negative predictive values on some subset of the data that fits into methods implemented in standard statistical packages. Such methods are usually valid only under the strong missing completely at random (MCAR) assumption and may generate biased and less precise estimates. We review some models that use the dependence structure of the completely observed cases to incorporate the information of the partially categorized observations into the analysis and show how they may be fitted via a two-stage hybrid process involving maximum likelihood in the first stage and weighted least squares in the second. We indicate how computational subroutines written in R may be used to fit the proposed models and illustrate the different analysis strategies with observational data collected to compare the accuracy of three distinct non-invasive diagnostic methods for endometriosis. The results indicate that even when the MCAR assumption is plausible, the naive partial analyses should be avoided.
A robust Bayesian approach to null intercept measurement error model with application to dental data
Abstract:
Measurement error models often arise in epidemiological and clinical research. Usually, in this setup it is assumed that the latent variable has a normal distribution. However, the normality assumption may not always be correct. The skew-normal/independent distribution is a class of asymmetric thick-tailed distributions which includes the skew-normal distribution as a special case. In this paper, we explore the use of the skew-normal/independent distribution as a robust alternative in the null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) jointly follow a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of the symmetric normal distribution in this type of model. Specific distributions examined include univariate and multivariate versions of the skew-normal distribution, the skew-t distributions, the skew-slash distributions and the skew contaminated normal distributions. The methods developed are illustrated using a real data set from a dental clinical trial. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
OBJECTIVE: This article analyzes and compares food consumption data from two riverine populations of the Amazon living in contrasting tropical forest ecosystems: the seasonal floodplain (várzea) and the upland (terra firme) forest. METHODS: Food consumption was studied in 11 households in the floodplain (Ituqui Island, Municipality of Santarém) and 17 in the upland forest (Caxiuanã National Forest, Municipalities of Melgaço and Portel). The method used was the 24-hour recall. Statistical analyses were performed with the Statistical Package for Social Sciences 12.0. RESULTS: In both ecosystems, the results confirm the centrality of fish and cassava in the local diet. However, the contribution of other, secondary food items, such as açaí (in Caxiuanã) and fresh milk (in Ituqui), was also significant. In addition, sugar proved to be a reliable source of energy for coping with seasonal fluctuations in natural resources. There also appears to be a greater energy contribution from fish to the diet in Ituqui, probably owing to the higher productivity of floodplain rivers and lakes relative to the upland forest. Finally, Ituqui showed a greater dependence on purchased food items, whereas Caxiuanã remained closely tied to agriculture and local exchange networks. CONCLUSION: Besides confirming the importance of fish and cassava, the results also showed that industrialized products, such as sugar, play an important role in the diets, which may point to trends in food consumption related to the current nutrition transition and to the erosion, at different levels, of local subsistence systems.
Abstract:
Orchid bees (Apini, Euglossina) have a mainly Neotropical distribution, with about 200 species and five genera described. Many local faunal surveys are available in the literature, but comparative studies on the composition and distribution of the Euglossina are still scarce. The aim of this study is to analyze the available data from 29 assemblages in order to understand the general patterns of spatial distribution in the areas sampled across the Neotropics. Ordination methods (DCA and NMDS) were used to describe the grouping of assemblages according to orchid bee occurrences. Forest localities in Central America and the Amazon formed cohesive groups in both analyses, whereas Atlantic Forest localities were more scattered in the plots. Localities on the eastern margin of the Amazon appear as characteristic transition areas between this subregion and the Atlantic Forest. Analyses of variance between the first DCA axis and selected variables yielded significant values for the influence of latitude, longitude and precipitation gradients, as well as of biogeographic subregions, on the grouping of assemblages. The general pattern found is congruent with the biogeographic patterns previously proposed for the Neotropical region. The DCA results also help to identify, independently, the faunal elements of each of the vegetation formations studied.
Abstract:
Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
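The abstract above does not say which distance-based techniques were evaluated. As an illustration only, the following minimal Python sketch shows one well-known filter of this family, an edited-nearest-neighbour rule: a training instance is kept only when its label agrees with the majority label of its k nearest neighbours. The function name, the toy data and the choice of Euclidean distance are assumptions made for this sketch, not details taken from the paper.

```python
import math
from collections import Counter


def knn_noise_filter(samples, labels, k=3):
    """Return indices of samples whose label matches the majority
    label of their k nearest neighbours (Euclidean distance)."""
    kept = []
    for i, x in enumerate(samples):
        # Distances from sample i to every other sample, nearest first.
        neighbours = sorted(
            (math.dist(x, y), j) for j, y in enumerate(samples) if j != i
        )[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label == labels[i]:
            kept.append(i)
    return kept


# Hypothetical toy data: two well-separated groups, with sample 3
# sitting inside group "a" but carrying the label "b".
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.15, 5.15)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]

kept = knn_noise_filter(samples, labels, k=3)
print(kept)  # sample 3 is flagged as noise and dropped
```

A classifier would then be trained only on the kept indices, and its accuracy compared with training on the unfiltered data, which mirrors the evaluation criterion the abstract describes.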
Abstract:
Owing to the growing expansion of American visceral leishmaniasis (AVL) in Brazil, this study aimed to identify sand fly species in areas vulnerable to transmission of this parasitic disease, as well as in other areas with no information on the presence of these dipterans in Paraná. Sand flies were collected at 46 localities distributed across 37 municipalities of Paraná from March 2004 to November 2005. At each locality, Falcão traps were set for three consecutive nights in natural vegetation and in anthropic environments (inside and around dwellings). Occasionally, Shannon traps were set, and wall inspections and aspiration were carried out in domiciliary, peridomiciliary and extradomiciliary environments. Data treatment was based on estimating the frequency and abundance of the species according to five regions with distinct original landscapes. A total of 38,662 sand flies of 23 species were collected. The predominant species were Nyssomyia neivai (Pinto) (75.6%), Ny. whitmani (Antunes & Coutinho) (10.1%), Migonemyia migonei (França) (7.8%), Expapillata firmatoi (Barreto et al.) (2.1%) and Pintomyia fischeri (Pinto) (1.6%), together representing 97.2% of the sand flies collected. Lutzomyia longipalpis (Lutz & Neiva), the main vector of AVL, was not found. However, Lu. gaminarai (Cordero et al.), whose females are morphologically similar to those of Lu. longipalpis, was captured. The most frequent and abundant species have been implicated as vectors of cutaneous leishmaniasis in Paraná and in other areas of the Southeast and South Regions of Brazil. The presence of Lu. gaminarai in Paraná raises the need for studies of its behavior, including its vector competence for the agent of visceral leishmaniasis.
Abstract:
Background: Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods: Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries. Results: Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues. Conclusion: To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. 
The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor.
Abstract:
Background: Dermatomyositis (DM) and polymyositis (PM) are rare systemic autoimmune rheumatic diseases with high fatality rates. There have been few population-based mortality studies of dermatomyositis and polymyositis in the world, and none have been conducted in Brazil. The objective of the present study was to employ multiple-cause-of-death methodology in the analysis of trends in mortality related to dermatomyositis and polymyositis in the state of Sao Paulo, Brazil, between 1985 and 2007. Methods: We analyzed mortality data from the Sao Paulo State Data Analysis System, selecting all death certificates on which DM or PM was listed as a cause of death. The variables sex, age and underlying, associated or total mentions of causes of death were studied using mortality rates, proportions and historical trends. Statistical analyses were performed using chi-square and Kruskal-Wallis H tests, analysis of variance and linear regression. A p value less than 0.05 was regarded as significant. Results: Over a 23-year period, there were 318 DM-related deaths and 316 PM-related deaths. Overall, DM/PM was designated as an underlying cause in 55.2% and as an associated cause in 44.8%; among the 634 total deaths, females accounted for 71.5%. During the study period, age- and gender-adjusted DM mortality rates did not change significantly, although PM as an underlying cause and total mentions of PM trended lower (p < 0.05). The mean ages at death were 47.76 +/- 20.81 years for DM and 54.24 +/- 17.94 years for PM (p = 0.0003). For DM/PM, respectively, as underlying causes, the principal associated causes of death were as follows: pneumonia (in 43.8%/33.5%); respiratory failure (in 34.4%/32.3%); interstitial pulmonary diseases and other pulmonary conditions (in 28.9%/17.6%); and septicemia (in 22.8%/15.9%).
For DM/PM, respectively, as associated causes, the following were the principal underlying causes of death: respiratory disorders (in 28.3%/26.0%); circulatory disorders (in 17.4%/20.5%); neoplasms (in 16.7%/13.7%); infectious and parasitic diseases (in 11.6%/9.6%); and gastrointestinal disorders (in 8.0%/4.8%). Of the 318 DM-related deaths, 36 involved neoplasms, compared with 20 of the 316 PM-related deaths (p = 0.03). Conclusions: Our study using multiple causes of death found that DM/PM were identified as the underlying cause of death in only 55.2% of the deaths, indicating that both diseases were underestimated in the primary mortality statistics. We observed a predominance of deaths in women and in older individuals, as well as a trend toward stability in the mortality rates. We have confirmed that the risk of death is greater when either disease is accompanied by neoplasm, albeit to a lesser degree in individuals with PM. The investigation of the underlying and associated causes of death related to DM/PM broadens the knowledge of the natural history of both diseases and could help integrate mortality data for use in the evaluation of control measures for DM/PM.
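The neoplasm comparison reported above (36 of 318 DM-related deaths versus 20 of 316 PM-related deaths, p = 0.03) can be checked with a 2x2 chi-square test. The sketch below assumes an uncorrected Pearson chi-square test; the abstract lists chi-square among its methods but does not state which test, or whether a continuity correction, was used for this particular comparison. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the abstract.
dm_total, dm_neoplasm = 318, 36
pm_total, pm_neoplasm = 316, 20

chi2, p_value = chi_square_2x2(
    dm_neoplasm, dm_total - dm_neoplasm,
    pm_neoplasm, pm_total - pm_neoplasm,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Under this assumption the statistic is about 4.9 and the p-value rounds to 0.03, consistent with the reported value.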
Abstract:
Recurrences are close returns of a given state in a time series, and can be used to identify different dynamical regimes and other related phenomena, being particularly suited for analyzing experimental data. In this work, we use recurrence quantification analysis to investigate dynamical patterns in scalar data series obtained from measurements of floating potential and ion saturation current at the plasma edge of the Tokamak Chauffage Alfvén Brésilien [R. M. O. Galvão, Plasma Phys. Controlled Fusion 43, 1181 (2001)]. We consider plasma discharges with and without the application of radial electric bias, and also with two different regimes of current ramp. Our results indicate that biasing improves confinement through destroying highly recurrent regions within the plasma column that enhance particle and heat transport.
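To make the notion of a recurrence concrete, the following minimal Python sketch builds a recurrence matrix for a scalar series and computes the recurrence rate, one of the basic measures of recurrence quantification analysis. It assumes no time-delay embedding and a fixed distance threshold `epsilon`, and it counts the main diagonal; these are simplifications chosen for the sketch, and the abstract does not state the settings used in the study.

```python
def recurrence_matrix(series, epsilon):
    """R[i][j] = 1 when states i and j are within epsilon of each other."""
    return [[1 if abs(x - y) <= epsilon else 0 for y in series]
            for x in series]


def recurrence_rate(series, epsilon):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    matrix = recurrence_matrix(series, epsilon)
    n = len(series)
    return sum(map(sum, matrix)) / (n * n)


# Hypothetical period-2 signal: every state recurs two steps later.
signal = [0.0, 1.0, 0.0, 1.0]
rate = recurrence_rate(signal, epsilon=0.1)
print(rate)  # 0.5
```

In a strictly periodic signal such as this one, half of all state pairs are recurrent; a noisier or more irregular series would yield a lower rate for the same threshold.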
Abstract:
Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error-prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over- or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data is from different cell lines of glioblastoma tumors assayed with Agilent microarrays.
Abstract:
This paper aims to find relations between the socioeconomic characteristics, activity participation, land use patterns and travel behavior of the residents in the Sao Paulo Metropolitan Area (SPMA) by using Exploratory Multivariate Data Analysis (EMDA) techniques. The variables influencing travel pattern choices are investigated using: (a) Cluster Analysis (CA), grouping and characterizing the Traffic Zones (TZ) and proposing the independent variable called Origin Cluster, and (b) Decision Tree (DT), to find a priori unknown relations among socioeconomic characteristics, land use attributes of the origin TZ and destination choices. The analysis was based on the origin-destination home-interview survey carried out in SPMA in 1997. The DT application revealed the variables of greatest influence on the travel pattern choice. The most important independent variable considered by DT is car ownership, followed by the use of transportation "credits" for transit tariff and, finally, activity participation variables and Origin Cluster. With these results, it was possible to analyze the influence of family income, car ownership, position of the individual in the family, use of transportation "credits" for transit tariff (mainly for travel mode sequence choice), activity participation (activity sequence choice) and Origin Cluster (destination/travel distance choice). (c) 2010 Elsevier Ltd. All rights reserved.
Abstract:
In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.
Abstract:
Survival models involving frailties are commonly applied in studies where correlated event time data arise due to natural or artificial clustering. In this paper we present an application of such models in the animal breeding field. Specifically, a mixed survival model with a multivariate correlated frailty term is proposed for the analysis of data from over 3611 Brazilian Nellore cattle. The primary aim is to evaluate parental genetic effects on the trait length in days that their progeny need to gain a commercially specified standard weight gain. This trait is not measured directly but can be estimated from growth data. Results point to the importance of genetic effects and suggest that these models constitute a valuable data analysis tool for beef cattle breeding.