936 results for Extremely random forest
Abstract:
Dissertation (master's), Universidade de Brasília, Faculdade de Economia, Administração e Contabilidade, Programa de Pós-Graduação em Administração, 2016.
Abstract:
At present, a large body of research uses machine learning techniques based on decision trees. Building on that work, methods using multiclassifiers (Random Forest, Boosting, Bagging) have been developed that solve the same problems addressed with simple decision trees while increasing accuracy. The problems traditionally solved with these techniques are very varied, although bioinformatics stands out. In any case, the classification can always be checked with an expert, whose answer is taken as correct. There are problems, however, where a subject expert is not always right. One example is football pools (quinielas, 1X2), where domain knowledge raises the hit rate but a wrong prediction remains quite possible. The reason is that the number of factors influencing a result is so large that, in many cases, prediction becomes an act of chance. In this work we aim to find a multiclassifier, built on the most widely studied simple classifiers such as the Multilayer Perceptron or Decision Trees, with the highest possible accuracy. To that end, we study and implement a series of our own classifier configurations together with multiclassifiers developed by third parties. Another line of study is the data itself, that is, the training set. Through a study of the problem domain we add new attributes that enrich the information available for each result, attempting to imitate the knowledge on which an expert relies. The developments described were implemented in R. In addition, an application was built that trains a multiclassifier (either one of our own or one developed by third parties) and outputs the confusion matrix together with the accuracy.
As for results, we obtain accuracies between 50% and 55%, above chance and close to those of experts.
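As a rough illustration of the confusion matrix and accuracy that the application above reports, here is a minimal Python sketch. It is not the authors' R implementation; the function names, the label order and the six match outcomes are made up for illustration.

```python
from collections import Counter

LABELS = ("1", "X", "2")  # home win, draw, away win (assumed label order)


def confusion_matrix(actual, predicted, labels=LABELS):
    """Return {actual_label: {predicted_label: count}} for paired outcomes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


def accuracy(matrix):
    """Fraction of predictions that fall on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total if total else 0.0


# Hypothetical outcomes for six matches, for illustration only.
actual = ["1", "X", "2", "1", "1", "2"]
predicted = ["1", "1", "2", "X", "1", "1"]

matrix = confusion_matrix(actual, predicted)
for label in LABELS:
    print(label, matrix[label])
print("accuracy:", accuracy(matrix))
```

Rows are the true outcomes and columns the predicted ones, so off-diagonal cells show which results the classifier confuses.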
Abstract:
Security defects are common in large software systems because of their size and complexity. Although efficient development processes, testing, and maintenance policies are applied to software systems, a large number of vulnerabilities can remain despite these measures. Some vulnerabilities stay in a system from one release to the next because they cannot be easily reproduced through testing. These vulnerabilities endanger the security of the systems. We propose vulnerability classification and prediction frameworks based on vulnerability reproducibility. The frameworks are effective at identifying the types and locations of vulnerabilities at an early stage and at improving the security of the software in subsequent versions (referred to as releases). We extend an existing concept of software bug classification to vulnerability classification (easily reproducible versus hard to reproduce) to develop a classification framework that differentiates between these vulnerabilities based on code fixes and textual reports. We then investigate the potential correlations between the vulnerability categories and classical software metrics, along with other runtime environmental factors of reproducibility, to develop a vulnerability prediction framework. The classification and prediction frameworks help developers adopt corresponding mitigation or elimination actions and develop appropriate test cases. The vulnerability prediction framework also helps security experts focus their effort on the top-ranked vulnerability-prone files. As a result, the frameworks decrease the number of attacks that exploit security vulnerabilities in subsequent versions of the software. To build the classification and prediction frameworks, different machine learning techniques (C4.5 Decision Tree, Random Forest, Logistic Regression, and Naive Bayes) are employed.
The effectiveness of the proposed frameworks is assessed based on collected software security defects of Mozilla Firefox.
Abstract:
The main purpose of this study is to evaluate the best set of features for automatically identifying argumentative sentences in unstructured text. As a corpus, we use case law from the European Court of Human Rights (ECHR). Three kinds of experiments are conducted: Basic Experiments, Multi Feature Experiments and Tree Kernel Experiments. These experiments are categorized according to the type of features available in the corpus. The features are extracted from the corpus, and Support Vector Machine (SVM) and Random Forest are used as the machine learning algorithms. We achieved an F1 score of 0.705 for identifying argumentative sentences, which is a promising result and can be used as the basis for a general argument-mining framework.
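For readers unfamiliar with the F1 score quoted above, the following Python sketch shows how it is computed from true positives, false positives and false negatives. The counts are invented for illustration and are not taken from the study.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for the positive class."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 argumentative sentences found correctly,
# 2 false alarms, 4 argumentative sentences missed.
example_f1 = f1_score(true_positives=8, false_positives=2, false_negatives=4)
print(example_f1)
```

Because F1 ignores true negatives, it suits tasks like this one where the class of interest (argumentative sentences) may be a minority of the corpus.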
Abstract:
The survey and analysis of the spatial distribution of soil attributes using geostatistical tools are essential for each hectare of land to be cultivated according to its real suitability. Synthetic aperture radar (SAR) images have great potential for estimating soil moisture, and these sensors can therefore assist in mapping the physical and physical-hydric properties of soils. The general objective of this study was to evaluate the potential of ALOS/PALSAR (microwave) radar images for identifying soils in an area of the Botucatu Formation dominated by sandy and medium-textured soils in the municipality of Mineiros, GO. The area covers approximately 946 ha, with relief ranging from flat to gently undulating and geology composed basically of sandstones of the Botucatu Formation. In this study, 84 points were sampled for calibration and 25 points for validation, collected at depths of 0-20 cm and 60-80 cm. The soil samples were analyzed to determine sand, silt, clay, field capacity (CC), permanent wilting point (PMP) and total available water (AD). Images from five dates and different polarizations were acquired, totaling 14 images, which were processed for geometric and radiometric correction using the digital elevation model (MDE). Terrain-attribute covariates were also generated: elevation (ELEV), slope (DECLIV), relative slope position (PR-DECL), vertical distance to the drainage channel (DVCD), LS factor (FATOR-LS) and Euclidean distance (D-EUCL). Soil attributes were predicted using Random Forest (RF) and Random Forest Kriging (RFK), with the radar images and terrain attributes as predictor covariates.
Processing of the ALOS/PALSAR radar images enabled geometric and radiometric correction, converting the data into backscatter coefficient (σ°) units corrected by the MDE. The acquired images broadly represented the variations in σ° across different dates. The soils of the study area are predominantly sandy, with most sampled points classified as NEOSSOLOS QUARTZARÊNICOS, followed by LATOSSOLOS. The RF models used to predict the physical and physical-hydric soil attributes allowed the contribution of the predictor covariates to be analyzed. The terrain attributes with the greatest influence on the prediction of the studied attributes are related to elevation. The images of 03/05/2009 (HH1, VV1, HV1 and VH1) and 26/09/2010 (HH3 and HV3), acquired in drier periods, correlated best with the soil attributes. Semivariogram analyses of the RF prediction residuals showed greater spatial dependence in the 60 to 80 cm layer. Adding Kriging to the RF model improved the prediction of sand, clay, CC and PMP. Using ALOS/PALSAR radar images and terrain attributes as covariates in RFK models showed potential for estimating the physical (sand and clay) and physical-hydric (CC and PMP) attributes, which can assist in mapping soils associated with the parent materials of the Botucatu Formation.
Abstract:
Markov random fields (MRFs) are popular in image processing applications to describe spatial dependencies between image units. Here, we take a look at the theory and models of MRFs, with an application to improving forest inventory estimates. Typically, autocorrelation between study units is a nuisance in statistical inference, but we take advantage of the dependencies to smooth noisy measurements by borrowing information from neighbouring units. We build a stochastic spatial model, which we estimate with a Markov chain Monte Carlo simulation method. The smoothed values are validated against another data set, increasing our confidence that the estimates are more accurate than the originals.
Abstract:
Research on multiple classifier systems includes the creation of an ensemble of classifiers and the proper combination of their decisions. To combine the decisions given by classifiers, methods based on fixed rules and decision templates are often used. As a result, the influence of and relationships between classifier decisions are often not considered in the combination schemes. In this paper we propose a framework to combine classifiers using a decision graph under a random field model and a game strategy approach to obtain the final decision. The results of combining Optimum-Path Forest (OPF) classifiers using the proposed model are reported, showing good performance in experiments on simulated and real data sets. The results encourage combining OPF ensembles with the framework to design multiple classifier systems. © 2011 Springer-Verlag.
Abstract:
Some machine learning methods do not exploit contextual information in the process of discovering, describing and recognizing patterns. However, spatially or temporally neighboring samples are likely to behave in the same way. Here, we propose an approach that unifies a supervised learning algorithm, namely Optimum-Path Forest, with a Markov Random Field in order to build a prior model holding a spatial smoothness assumption, which takes contextual information into account for classification purposes. We show its robustness for brain tissue classification on images from the well-known IBSR dataset. © 2013 Springer-Verlag.
Abstract:
A permanent 2 ha (200 m x 100 m) plot was established for long-term monitoring of plant diversity and dynamics in a tropical dry deciduous forest of Bhadra Wildlife Sanctuary, Karnataka, southern India. Enumeration of all woody plants >= 1 cm DBH (diameter at breast height) yielded a total of 1766 individuals belonging to 46 species, 37 genera and 24 families. Combretaceae was the most abundant family in the forest, with a family importance value of 68.3. Plant density varied from 20 to 90 individuals per quadrat (20 m x 20 m), with an average of 35. Randia dumetorum, with 466 individuals (representing 26.7% of the total density per 2 ha) and a species importance value of 36.25, was the dominant species in the plot. The total basal area of the plot was 18.09 m² ha⁻¹, with a mean of 0.72 m² per quadrat. The highest basal area was contributed by Combretaceae (12.93 m² per 2 ha) at family level and Terminalia tomentosa (5.58 m² per 2 ha) at species level. The lowest diameter class (1-10 cm) had the highest density (1054 individuals per 2 ha), but basal area was highest in the 80-90 cm diameter class (5.03 m² per 2 ha). Most of the species exhibited random or aggregated distribution over the plot. This study provides baseline information on the dry forests of Bhadra Wildlife Sanctuary.
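The basal-area figures above follow from the standard conversion of each stem's DBH to a cross-sectional area. A minimal Python sketch, assuming stems are circular in cross-section and DBH is given in centimetres; the function names and sample values are hypothetical and not from the study:

```python
import math


def tree_basal_area_m2(dbh_cm):
    """Cross-sectional area of one stem in square metres, from DBH in cm."""
    radius_m = dbh_cm / 200.0  # cm -> m, then diameter -> radius
    return math.pi * radius_m ** 2


def basal_area_per_ha(dbh_cm_values, plot_area_ha):
    """Total basal area of the measured stems, scaled to one hectare."""
    if plot_area_ha <= 0:
        raise ValueError("plot_area_ha must be positive")
    total_m2 = sum(tree_basal_area_m2(d) for d in dbh_cm_values)
    return total_m2 / plot_area_ha


# Hypothetical stems measured on a 2 ha plot.
sample_dbh_cm = [12.0, 35.5, 82.0]
print(basal_area_per_ha(sample_dbh_cm, plot_area_ha=2.0))
```

Because area grows with the square of the diameter, a few large stems can dominate basal area even when small stems dominate density, which matches the pattern the abstract reports for the 80-90 cm class.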
Abstract:
While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays may then be designed to understand message content. We demonstrate the utility of 'Random Forests' (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using data from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly being used by the bioinformatics, food technology and medical communities, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.
Abstract:
Designing a robust algorithm for visual object tracking has been a challenging task for many years. There are trackers in the literature that are reasonably accurate for many tracking scenarios, but most of them are computationally expensive. This narrows their applicability, as many tracking applications demand real-time response. In this paper, we present a tracker based on random ferns. Tracking is posed as a classification problem, and classification is done using ferns. We use ferns because they rely on binary features and are extremely fast at both training and classification compared to other classification algorithms. Our experiments show that the proposed tracker performs well on some of the most challenging tracking datasets and executes much faster than one of the state-of-the-art trackers, without much difference in tracking accuracy.
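Abstract:
In the upper reaches of the Minjiang River, the distribution of alpine and subalpine vegetation differs markedly with slope aspect: treelines on shady and sunny slopes differ clearly in species composition, and the treeline lies higher on shady slopes than on sunny slopes. On sunny slopes the treeline trees are mainly Sabina, the treeline is mostly gradual, and it lies at about 3,400 to 3,800 m; on shady slopes the treeline trees are mainly fir (Abies), the treeline is mostly abrupt, and it lies at about 3,800 to 4,400 m. Using physical sieving of the soil seed bank, laboratory germination experiments and field community surveys, this study investigated the soil seed bank and seedling bank of two treeline species, Abies faxoniana on shady slopes and Sabina przewalskii on sunny slopes, along altitudinal gradients; analyzed population regeneration of the treeline tree species from the perspective of soil seed bank and seedling recruitment; and explored the causes of the aspect-related difference in alpine treeline distribution in the area. The results show: 1. Soil seed bank. Shady slope: the mean density of Abies faxoniana soil seeds near the alpine treeline was about 50.96 seeds/m²: 1.00 seeds/m² at 10 m above the tree limit, about 19.33 seeds/m² at the tree limit, a maximum of 136.83 seeds/m² within the treeline ecotone, and only 30.50 seeds/m² in closed forest. On average 52% of seeds were empty, 34% were moldy, and only 6% were intact. Vertically, the litter layer held the largest share of seeds (about 67.50%), followed by the 0 to 2 cm layer (about 18.84%) and the 2 to 5 cm layer (about 13.66%). The number of moldy seeds was negatively correlated with soil depth. Sunny slope: the mean density of Sabina przewalskii soil seeds was 60.16 seeds/m²: 1.92 seeds/m² at 10 m above the tree limit, about 108.16 seeds/m² at the tree limit, 75.80 seeds/m² on average within the treeline ecotone, and only 20.00 seeds/m² in closed forest. On average 36% of seeds were empty, 49% were intact, and the mold rate was low, about 10%. Vertically, the litter layer held the most seeds, followed by the 0 to 4 cm layer and then the 4 to 10 cm layer; the number of moldy seeds was also negatively correlated with soil depth. 2. Seedling bank. Sunny slope: no seedlings were found above the tree limit; seedling density averaged 3,250 seedlings/hm² in the treeline ecotone and only 2,750 seedlings/hm² in closed forest. Seedlings aged 1 to 2 years were rare or absent across the plots, whereas seedlings aged 3 to 10 years were relatively common. Spatially, Sabina przewalskii seedlings were close to randomly distributed in the treeline ecotone and between random and uniform in closed forest. Shady slope: seedling density above the tree limit was 1,250 seedlings/hm², all aged 1 to 2 years; it averaged 7,000 seedlings/hm² in the treeline ecotone and 6,250 seedlings/hm² in closed forest. Near the treeline, Abies faxoniana seedlings were clearly more abundant and more frequent than those of Sabina przewalskii, with a more complete age structure. Abies faxoniana seedlings were randomly distributed at the tree limit and clumped at other altitudes. 3. Correlation tests between total seed numbers at different soil depths and seedling numbers showed that the number of current-year seedlings was very significantly correlated with the seed total in the surface layer, whereas the number of two-year-old seedlings was significantly correlated with the seed number in the bottom layer. The vertical distribution of seeds in the soil thus reflects, to some extent, the interannual character of the seed bank. The Abies faxoniana soil seed bank is relatively rich, and seed viability declines gradually after dispersal, so it is a seasonal transient seed bank; Sabina przewalskii soil seeds are dispersed in a clumped pattern, with most mature seeds falling within the crown of the parent tree, and form a persistent soil seed bank. 4. On shady slopes, tree seeds remained fairly abundant in the soil of the treeline ecotone and above, and a few Abies faxoniana seedlings were found above the tree limit. The age structure of the trees in the plots showed that, from the upper part of the ecotone to the tree limit, stands were mainly young with a complete age structure, largely matching the characteristics of an advancing treeline. On sunny slopes, seedlings were very infrequent in the ecotone; a seed bank existed above the tree limit but no seedlings appeared, tree diameter classes in the ecotone differed greatly, and the age structure was markedly incomplete. A treeline with these characteristics faces two possible outcomes: remaining in equilibrium in its current state, or degrading. The actual dynamics of the sunny-slope treeline await long-term fixed-site study.
Treelines in the upper reaches of the Minjiang River differ between the north aspect and the south aspect in appearance, altitude and tree species. On the north aspect, Abies trees form a sharp, abrupt treeline ranging from 3800 m to 4400 m, while on the south aspect the treeline is generally lower (3400 to 3800 m), more open and gradual, and mostly composed of Sabina.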
In this study, we examined the altitudinal gradients of soil seed banks and seedling recruitment at the treeline ecotones of an N-aspect and an S-aspect using soil sieving, germination experiments and field investigations, analyzed the characteristics of population regeneration of tree species in the transitional zone, and presented an analysis of the causes of the aspect-related difference in treeline patterns in the study area. Major results of our study include: 1. Soil seed bank. N-aspect: Of the 50 plots investigated, the average density of soil seeds is 50.96 seeds/m², of which well-formed seeds account for 6%, empty seeds 52%, parasitized seeds 34%, and seeds damaged by animals 8%. The size of the soil seed bank varies with altitude, being 1.00 seeds/m² at 10 m above the treeline and ca. 19.33 seeds/m² at the upper limit of the treeline. The highest density (136.83 seeds/m²) occurs in the treeline ecotone. By contrast, the density of soil seeds in the closed forest is only 30.50 seeds/m². In terms of vertical strata, 67.50% of the total seeds are in the surface layer, 18.84% in the middle layer (0~2 cm) and 13.66% in the deeper layer (2~5 cm). The number of parasitized seeds is negatively correlated with soil depth. S-aspect: Of the 50 plots investigated, the average density of soil seeds is 60.16 seeds/m², of which well-formed seeds account for 49%, empty seeds 36%, parasitized seeds 10%, and seeds damaged by animals 1%. The size of the soil seed bank varies with altitude, with 1.92 seeds/m² recorded at 10 m above the treeline, 108.16 seeds/m² at the upper limit of the treeline, and 75.80 seeds/m² in the treeline ecotone, while that for the closed forest is 20.00 seeds/m². The number of seeds decreases with soil depth. As on the N-aspect, the size of the soil seed bank, from large to small, follows the order of the surface layer, the middle layer (0~4 cm) and the bottom layer (4~10 cm). The number of parasitized seeds is also negatively correlated with soil depth.
2. Seedling bank. N-aspect: A mean maximum seedling abundance of 31 000 seedlings/hm² was recorded near the alpine treeline in the growing season. The density of seedlings is 1 250 seedlings/ha (all 1 or 2 years old) in the alpine meadow 10 m above the treeline, 7 000 seedlings/ha in the treeline ecotone and 6 250 seedlings/ha in closed forest. The spatial distribution of Abies faxoniana seedlings is random at the upper limit of the treeline but clumped at other altitudes. S-aspect: No seedlings were found in the alpine meadow 10 m above the treeline. The density of seedlings was 3 250 seedlings/ha in the treeline ecotone and 2 750 seedlings/ha in the closed forest. Hardly any current-year or 2-year-old seedlings appeared in the plots. The spatial distribution of Sabina przewalskii seedlings is random in the treeline ecotone and between "random" and "even" in closed forest. 3. Correlation tests of seedling population and seed bank at different soil layers indicated that emergents were strongly correlated with the seed bank in the surface layer, while the number of two-year-old seedlings was significantly correlated with the seed bank in the bottom soil layer, indicating that germination mainly occurs at the soil surface while the middle or bottom layer is the reserve for non-germinating or dead seeds. It can thus be postulated that the Abies faxoniana soil seed bank is of the seasonal transient type. By contrast, the soil seed bank of Sabina przewalskii is of the persistent type, and the soil seeds and seedlings of this species occurred more frequently near islands of adult trees. 4. A good many soil seeds of both tree species were found near the treeline ecotone and above on the N- and S-aspects. A few young seedlings were found above the Abies treeline. Investigation of five altitudinal transects on each of the N- and S-aspects indicated that Abies faxoniana has a more complete age structure than the stands of Sabina przewalskii.
The age of firs decreased from closed forest to the upper limit of the treeline, which suggests that the Abies treeline is advancing to higher altitude. On the south aspect, by contrast, only a few Sabina przewalskii soil seeds and nearly no seedlings were found above the treeline ecotone. The stands exhibit extremely large differences in diameter classes, with a significantly incomplete age structure. This would lead to two possible results for the treeline: maintaining an equilibrium state at the current position, or degenerating. More studies at longer time scales or larger spatial scales are needed to understand whether the Sabina treeline is degenerating.
Abstract:
In attempts to conserve the species diversity of trees in tropical forests, monitoring of diversity in inventories is essential. For effective monitoring it is crucial to be able to make meaningful comparisons between different regions, or comparisons of the diversity of a region at different times. Many species diversity measures have been defined, including the well-known abundance and entropy measures. All such measures share a number of problems in their effective practical use. However, probably the most problematic is that they cannot be used to meaningfully assess changes, since they are only concerned with the number of species or the proportions of the population/sample which they constitute. A natural (though simplistic) model of a species frequency distribution is the multinomial distribution. It is shown that the likelihood analysis of samples from such a distribution is closely related to a number of entropy-type measures of diversity. Hence a comparison of the species distribution on two plots, using the multinomial model and likelihood methods, leads to generalised cross-entropy as the LRT test statistic of the null hypothesis that the species distributions are the same. Data from 30 contiguous plots in a forest in Sumatra are analysed using these methods. Significance tests between all pairs of plots yield extremely low p-values, indicating strongly that it ought to have been "obvious" that the observed species distributions are different on different plots. In terms of how different the plots are, and how these differences vary over the whole study site, a display of the degrees of freedom of the test (equivalent to the number of shared species) seems to be the most revealing indicator, as well as the simplest.
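To make the multinomial likelihood-ratio comparison above concrete, here is a minimal Python sketch of the textbook G-statistic for homogeneity of two species-count samples. It is an assumption that this matches the abstract's statistic in form; the degrees-of-freedom convention and p-values discussed in the abstract are deliberately not reproduced, and the plot counts are hypothetical.

```python
import math


def g_statistic(counts_a, counts_b):
    """Likelihood-ratio (G) statistic for two multinomial samples.

    counts_a and counts_b map species name -> number of individuals.
    The null hypothesis is that both plots share one species distribution,
    estimated from the pooled counts.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("each plot needs at least one individual")
    n = n_a + n_b
    total = 0.0
    for species in set(counts_a) | set(counts_b):
        pooled = counts_a.get(species, 0) + counts_b.get(species, 0)
        for observed, n_plot in ((counts_a.get(species, 0), n_a),
                                 (counts_b.get(species, 0), n_b)):
            if observed > 0:  # zero counts contribute nothing to the sum
                expected = pooled * n_plot / n
                total += observed * math.log(observed / expected)
    return 2.0 * total


# Hypothetical species counts on two plots.
plot_1 = {"species_a": 10, "species_b": 20}
plot_2 = {"species_a": 5, "species_b": 10}
print(g_statistic(plot_1, plot_2))  # same proportions on both plots
```

Written this way, the statistic equals twice the sum, over plots, of the plot size times the Kullback-Leibler divergence of that plot's observed proportions from the pooled proportions, which is the link to entropy-type measures the abstract describes.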
Abstract:
The discovery that the hypotensive sequela of envenomation by the South American viper, Bothrops jararaca, was mediated by peptides, represented a milestone in drug discovery research that led to the introduction of ACE inhibitors. These bradykinin-potentiating peptides (BPPs) have been found in the venoms of many species of viper and molecular cloning of biosynthetic precursors has revealed that each encodes several different BPPs in tandem with a single copy of a C-type natriuretic peptide (CNP) located at the C-terminus. Venoms of the African forest vipers (Atheris) have been poorly studied possibly because they do not represent a major danger to humans. However, initial studies have indicated that they contain some of the “classical” protein toxins of viper venoms and a novel class of peptide, the polyglycine/polyhistidine (pGpH) peptides. These peptides occur in several molecular forms with different numbers of repetitive glycine and histidine repeats. We have cloned the biosynthetic precursor of A. squamigera pGpH peptides from a venom-derived cDNA library and have confirmed that a single copy of CNP is located at the C-terminus and additionally that, like BPPs in other vipers, pGpH peptides are encoded in tandem within this single precursor. Solid phase peptide synthesis of pGpH peptides has proven to be extremely difficult but is progressing and acquisition of synthetic replicates of each peptide is a necessary prerequisite for systematic pharmacological characterisation as establishment of a biological function for these peptides remains elusive. pGpH peptides may prove to play a role as fundamental as that of the BPPs.
Abstract:
Portuguese northern forests are often and severely affected by wildfires during the summer season. Some preventive actions, such as prescribed (or controlled) burning and clear-cut logging, are often used to reduce the occurrence of wildfires. In the particular case of the Serra da Cabreira forest, due to extreme difficulties in operational field work, prescribed (or controlled) burning is the most common preventive action used to reduce the existing fuel load. This paper focuses on a Fuzzy Boolean Nets analysis of the changes in some forest soil properties, namely pH, moisture and organic matter content, after a controlled fire, and on the difficulties found during the sampling process and how they were overcome. The monitoring process was conducted during a three-month period in Anjos, Vieira do Minho, Portugal, an area located in a contact zone between a two-mica coarse-grained porphyritic granite and a biotite with plagioclase granite. The sampling sites were located in a spot dominated by quartzphyllite with quartz veins whose bedrock is partially altered and covered by slightly thick humus, which maintains low undergrowth vegetation.