108 results for Random Forests Classifier


Relevance:

20.00%

Abstract:

This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier in which the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds’ algorithm, our structure learning procedure explores a superset of the structures considered by TAN, yet achieves global optimality of the learning score function very efficiently (quadratic in the number of features, the same complexity as learning TANs). A range of experiments shows that we obtain models with better accuracy than TAN and comparable to that of the state-of-the-art averaged one-dependence estimator (AODE) classifier.
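For context, the standard TAN structure-learning step that this work extends can be sketched as follows: estimate the conditional mutual information I(X_i; X_j | C) for every pair of features from counts, build a maximum-weight spanning tree over the features (Prim's algorithm here), and make the class a parent of every feature. This is a minimal illustration of plain TAN under those assumptions, not the extended procedure or its modified Edmonds' algorithm; the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cond_mutual_info(data, i, j):
    """Empirical I(X_i; X_j | C). Each row of `data` is (feature_tuple, class_label)."""
    n = len(data)
    n_xyc = Counter((x[i], x[j], c) for x, c in data)
    n_xc = Counter((x[i], c) for x, c in data)
    n_yc = Counter((x[j], c) for x, c in data)
    n_c = Counter(c for _, c in data)
    total = 0.0
    for (xi, xj, c), cnt in n_xyc.items():
        # P(xi, xj, c) * log[P(xi, xj | c) / (P(xi | c) P(xj | c))], written with raw counts
        total += (cnt / n) * math.log(cnt * n_c[c] / (n_xc[(xi, c)] * n_yc[(xj, c)]))
    return total

def tan_structure(data, n_features, root=0):
    """Maximum-weight spanning tree over features (Prim), directed away from `root`.
    Returns {feature: parent_feature}; the class is an implicit extra parent of every feature."""
    weights = {(i, j): cond_mutual_info(data, i, j)
               for i, j in combinations(range(n_features), 2)}

    def edge_weight(a, b):
        return weights[(min(a, b), max(a, b))]

    parent = {root: None}
    while len(parent) < n_features:
        # Heaviest edge joining the current tree to a feature not yet in it.
        a, b = max(((u, v) for u in parent for v in range(n_features) if v not in parent),
                   key=lambda e: edge_weight(*e))
        parent[b] = a
    return parent

# Toy data: X1 copies X0, while X2 is independent of both given the class.
data = [((x0, x0, x2), c) for c in (0, 1) for x0 in (0, 1) for x2 in (0, 1)]
print(tan_structure(data, 3)[1])  # 0: X1 attaches to X0
```

Prim's algorithm is used only because it is short to write over a dense weight table; any maximum spanning tree algorithm gives the same TAN structure up to ties.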

Relevance:

20.00%

Abstract:

In this paper, we propose a new learning approach to Web data annotation, where a support vector machine-based multiclass classifier is trained to assign labels to data items. For data record extraction, a data section re-segmentation algorithm based on visual and content features is introduced to improve the performance of Web data record extraction. We have implemented the proposed approach and tested it with a large set of Web query result pages in different domains. Our experimental results show that our proposed approach is highly effective and efficient.

Relevance:

20.00%

Abstract:

Models of complex systems with n components typically have order n² parameters because each component can potentially interact with every other. When it is impractical to measure these parameters, one may choose random parameter values and study the emergent statistical properties at the system level. Many influential results in theoretical ecology have been derived from two key assumptions: that species interact with random partners at random intensities, and that intraspecific competition is comparable between species. Under these assumptions, community dynamics can be described by a community matrix that is often amenable to mathematical analysis. We combine empirical data with mathematical theory to show that both assumptions lead to results that must be interpreted with caution. We examine 21 empirically derived community matrices constructed using three established, independent methods. The empirically derived systems are orders of magnitude more stable than random matrices. This consistent disparity is not explained by existing results on predator-prey interactions. We investigate the key properties of empirical community matrices that distinguish them from random matrices, and show that network topology is less important than the relationship between a species’ trophic position within the food web and its interaction strengths. We identify key features of empirical networks that must be preserved if random matrix models are to capture the features of real ecosystems.
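The random-matrix baseline that empirical community matrices are usually compared against is May's stability criterion. It is stated here as background under the standard assumptions (self-regulation −d on the diagonal, each off-diagonal interaction present with probability C and drawn with mean 0 and variance σ²), not as part of this study's analysis:

```latex
\begin{aligned}
&A = -d\,I_n + B, \qquad
B_{ij} =
\begin{cases}
X_{ij}, \quad X_{ij} \overset{\text{iid}}{\sim} (0,\sigma^2) & \text{with probability } C,\\
0 & \text{otherwise,}
\end{cases}
\quad (i \neq j), \qquad B_{ii} = 0,\\[4pt]
&\text{for large } n \text{ the eigenvalues of } B \text{ fill a disk of radius } \sigma\sqrt{nC}, \text{ so}\\[4pt]
&\max_k \operatorname{Re}\,\lambda_k(A) \approx -d + \sigma\sqrt{nC} < 0 \iff \sigma\sqrt{nC} < d .
\end{aligned}
```

The matrix A has n² entries, which is the order-n² parameter count mentioned in the abstract; choosing them at random is what makes this closed-form stability threshold available.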

Relevance:

20.00%

Abstract:

This paper uses the history of rubber extraction to explore competing attempts to control the forest environments of Assam and beyond in the second half of the nineteenth century. Forest communities faced rival efforts at environmental control from both European and Indian traders, as well as from various centres of authority within the Raj. Government attempts to regulate rubber collection were undermined by the weak authority of the Raj in these regions, leading to widespread smuggling. Partly in response to the disruptive influence of rubber traders on the frontier, the Raj began to restrict the presence of outsiders in tribal regions, which came to be understood as distinct areas outside British control. When rubber yields from the forests nearest the Brahmaputra fell in the wake of intensive exploitation, India's scientific foresters demanded, and from 1870 obtained, the power to regulate the Assamese forests. They blamed indigenous rubber-tapping strategies for the declining yields and argued that Indian rubber could be ‘equal [to] if not better’ than Amazonian rubber if only tappers would change their practices. The knowledge of the scientific foresters was fundamentally flawed, however, and their efforts to establish a new type of tapping practice failed. By 1880, the government had largely abandoned attempts to regulate wild Indian rubber, though wild sources continued to dominate the global rubber supply until after 1910.

Relevance:

20.00%

Abstract:

Background: Selection bias in HIV prevalence estimates occurs if non-participation in testing is correlated with HIV status. Longitudinal data suggest that individuals who know or suspect they are HIV positive are less likely to participate in testing in HIV surveys, in which case methods that correct for missing data by imputation based on observed characteristics will produce biased results. Methods: The identity of the HIV survey interviewer is typically associated with HIV testing participation, but is unlikely to be correlated with HIV status. Interviewer identity can thus be used as a selection variable, allowing estimation of Heckman-type selection models. These models produce asymptotically unbiased HIV prevalence estimates, even when non-participation is correlated with unobserved characteristics, such as knowledge of HIV status. We introduce a new random effects method for these selection models which overcomes three problems of existing approaches: non-convergence caused by collinearity, small-sample bias, and incorrect inference. Our method is easy to implement in standard statistical software, and allows the construction of bootstrapped standard errors that adjust for the fact that the relationship between testing and HIV status is uncertain and needs to be estimated. Results: Using nationally representative data from the Demographic and Health Surveys, we illustrate our approach with new point estimates and confidence intervals (CI) for HIV prevalence among men in Ghana (2003) and Zambia (2007). In Ghana, we find little evidence of selection bias: our selection model gives an HIV prevalence estimate of 1.4% (95% CI 1.2%–1.6%), compared to 1.6% among those with a valid HIV test. In Zambia, our selection model gives an HIV prevalence estimate of 16.3% (95% CI 11.0%–18.4%), compared to 12.1% among those with a valid HIV test. Those who decline to test in Zambia are therefore found to be more likely to be HIV positive.
Conclusions: Our approach corrects for selection bias in HIV prevalence estimates, can be implemented even when HIV prevalence or non-participation is very high or very low, and provides a practical way to account for both sampling and parameter uncertainty in the estimation of confidence intervals. The wide confidence intervals estimated in an example with high HIV prevalence indicate that it is difficult to correct statistically for the bias that may occur when a large proportion of people refuse to test.
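For readers unfamiliar with Heckman-type selection models, a generic version for a binary outcome (a bivariate probit with sample selection) is written out here. It is a textbook form assumed for illustration, not necessarily the authors' exact specification, and their random-effects extension and bootstrap procedure are not shown.

```latex
\begin{aligned}
h_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, &\quad h_i &= \mathbb{1}[h_i^{*} > 0] \quad \text{(HIV status, observed only if } s_i = 1\text{)}\\
s_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \mathbf{z}_i^{\top}\boldsymbol{\delta} + u_i, &\quad s_i &= \mathbb{1}[s_i^{*} > 0] \quad \text{(agrees to be tested)}\\
(\varepsilon_i, u_i) &\sim \mathcal{N}_2\!\left(\mathbf{0}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), &\quad \hat{\pi} &= \frac{1}{N}\sum_{i=1}^{N} \Phi\!\left(\mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}}\right)
\end{aligned}
```

Here z_i holds interviewer-identity indicators (the selection variable, excluded from the HIV equation), ρ < 0 means that unobserved factors raising willingness to test lower the probability of being HIV positive, and the prevalence estimate averages predicted probabilities over everyone sampled, testers and non-testers alike.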

Relevance:

20.00%

Abstract:

We describe a pre-processing correlation attack on an FPGA implementation of AES protected with a random clocking countermeasure, which produces complex variations in both the location and amplitude of the power consumption patterns of the AES rounds. We demonstrate that the merged round patterns can be pre-processed to identify and extract the individual round amplitudes, enabling a successful power analysis attack. We also show that the countermeasure's requirement to provide a varying execution time between processing rounds can be exploited to select a subset of data in which sufficient current decay has occurred, further improving the attack. Whereas the countermeasure's estimated security against an integration attack is 3 million traces, we show that by applying our proposed techniques it can be broken with as few as 13k traces.
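To illustrate the basic correlation step that attacks of this kind build on, here is a minimal first-round correlation power analysis (CPA) on simulated traces. Each trace is assumed to leak the Hamming weight of the AES S-box output S(p ⊕ k) plus Gaussian noise, and the key byte is recovered as the guess whose predicted leakage correlates most strongly with the traces. The leakage model, noise level and key byte are hypothetical, and the sketch omits this paper's contributions (pre-processing of merged round patterns under random clocking, and trace selection by current decay).

```python
import math
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def gf_inv(a):
    """Multiplicative inverse as a^254; AES maps 0 to 0."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result if a else 0

def rotl8(x, s):
    return ((x << s) | (x >> (8 - s))) & 0xFF

def sbox(x):
    """AES S-box: field inverse followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

SBOX = [sbox(x) for x in range(256)]
HW = [bin(x).count("1") for x in range(256)]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def simulate_traces(key_byte, n, noise_sd, seed=1):
    """Hypothetical leakage: one sample per trace = HW(S(p ^ k)) + Gaussian noise."""
    rng = random.Random(seed)
    pts = [rng.randrange(256) for _ in range(n)]
    traces = [HW[SBOX[p ^ key_byte]] + rng.gauss(0, noise_sd) for p in pts]
    return pts, traces

def cpa_recover_byte(pts, traces):
    """Return the key-byte guess whose Hamming-weight model best correlates with the traces."""
    scores = [abs(pearson([HW[SBOX[p ^ g]] for p in pts], traces)) for g in range(256)]
    return max(range(256), key=scores.__getitem__)

pts, traces = simulate_traces(0x2B, n=1000, noise_sd=1.0)
print(hex(cpa_recover_byte(pts, traces)))
```

The S-box is computed from its definition rather than pasted as a 256-entry table, which keeps the block short; a real attack would use measured traces with many samples per trace and take the maximum correlation over time.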

Relevance:

20.00%

Abstract:

Introduced browsing animals negatively impact New Zealand's indigenous ecosystems. Eradicating introduced browsers is currently unfeasible at large scales, but culling since the 1960s has successfully reduced populations to a fraction of their earlier sizes. Here we ask whether culling of ungulates has allowed populations of woody plant species to recover across New Zealand forests. Using 73 pairs of permanent fenced exclosure and unfenced control plots, we found rapid increases in sapling densities within exclosures located in disturbed forests, particularly where a seedling bank was already present. Recovery was slower in thinning stands and was hampered by dense fern cover. We inferred ungulate diet preference from species recovery rates inside exclosures to test whether culling increased the abundance of preferred species across a national network of 574 unfenced permanent forest plots. Across this network, saplings occurred irrespective of their preference by ungulates in the 1970s, but preferred species were rarer within disturbed sites in the 1990s after long-term culling, despite nationwide increases in sapling densities. This indicates that preferred species remain relatively heavily affected by browsing after culling, presumably because the remaining animals increase their consumption of preferred species as competition is reduced. Our results suggest that culling will not return preferred plants to the landscape immediately, even given suitable conditions for regeneration. Complete removal of ungulates, rather than simply reducing their densities, may be required for recovery in heavily browsed temperate forests; because this is feasible only at small spatial scales, management efforts must target sites of high conservation value. © 2012 Elsevier Ltd.

Relevance:

20.00%

Abstract:

Generative algorithms for random graphs have yielded insights into the structure and evolution of real-world networks. Most networks exhibit a well-known set of properties, such as heavy-tailed degree distributions, clustering and community formation. Usually, random graph models consider only structural information, but many real-world networks also have labelled vertices and weighted edges. In this paper, we present a generative model for random graphs with discrete vertex labels and numeric edge weights. The weights are represented as a set of Beta Mixture Models (BMMs) with an arbitrary number of mixture components, which are learned from real-world networks. We propose a Bayesian Variational Inference (VI) approach, which yields accurate estimates while keeping computation times tractable. We compare our approach to state-of-the-art random labelled graph generators and to an earlier approach based on Gaussian Mixture Models (GMMs). Our results allow us to draw conclusions about the contribution of vertex labels and edge weights to graph structure.
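To make the idea of label-dependent edge weights concrete, the sketch below draws each edge weight from a Beta mixture selected by the unordered pair of endpoint labels. It assumes i.i.d. vertex labels, an Erdős–Rényi edge process and fixed, hand-picked mixture parameters (all hypothetical); the paper's structural model and its variational-inference fitting are not reproduced.

```python
import random

def sample_bmm(components, rng):
    """Draw one value from a Beta mixture given as [(weight, alpha, beta), ...]."""
    weights = [w for w, _, _ in components]
    _, alpha, beta = rng.choices(components, weights=weights, k=1)[0]
    return rng.betavariate(alpha, beta)

def generate_labelled_graph(n, p_edge, labels, label_probs, bmm_by_pair, seed=0):
    """Hypothetical generator: i.i.d. vertex labels, Erdős–Rényi edges, and
    edge weights from the Beta mixture for the sorted pair of endpoint labels."""
    rng = random.Random(seed)
    vertex_labels = rng.choices(labels, weights=label_probs, k=n)
    edges = {}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p_edge:
                pair = tuple(sorted((vertex_labels[u], vertex_labels[v])))
                edges[(u, v)] = sample_bmm(bmm_by_pair[pair], rng)
    return vertex_labels, edges

# Hypothetical mixture parameters, one mixture per unordered label pair.
BMM_BY_PAIR = {
    ("A", "A"): [(0.7, 2.0, 5.0), (0.3, 5.0, 2.0)],
    ("A", "B"): [(1.0, 1.0, 1.0)],
    ("B", "B"): [(0.5, 8.0, 2.0), (0.5, 2.0, 8.0)],
}

vlabels, edges = generate_labelled_graph(30, 0.2, ["A", "B"], [0.6, 0.4], BMM_BY_PAIR, seed=42)
print(len(edges), "edges")
```

Beta components suit weights normalized to the unit interval, and a mixture can represent multimodal weight distributions that a single Beta cannot.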

Relevance:

20.00%

Abstract:

Risks are an essential feature of future climate change impacts. We explore whether knowledge that climate change might be the source of increasing pine beetle impacts on public or private forests affects stated risk estimates of damage, elicited using the exchangeability method. We find that across subjects the difference between public and private forest status does not influence stated risks, but the group told that global warming is the cause of pine beetle damage has significantly higher risk perceptions than the group not given this information.

Relevance:

20.00%

Abstract:

Camera traps are used to estimate densities or abundances using capture-recapture and, more recently, random encounter models (REMs). We deploy REMs to describe an invasive-native species replacement process, and to demonstrate their wider application beyond abundance estimation. The Irish hare Lepus timidus hibernicus is a high-priority endemic of conservation concern. It is threatened by an expanding population of non-native European hares L. europaeus, an invasive species of global importance. Camera traps were deployed in thirteen 1 km squares, wherein the ratio of invader to native densities was corroborated by night-driven line transect distance sampling throughout the study area of 1652 km². Spatial patterns of invasive and native densities between the invader’s core and peripheral ranges, and in native allopatry, were comparable between methods. Native densities in the peripheral range were comparable to those in native allopatry using REM, or marginally depressed using Distance Sampling. Numbers of the invader were substantially higher than those of the native in the core range, irrespective of method, with a 5:1 invader-to-native ratio indicating species replacement. We also describe a post hoc optimization protocol for REM which will inform subsequent (re-)surveys, allowing survey effort (camera hours) to be reduced by up to 57% without compromising the width of confidence intervals associated with density estimates. This approach will form the basis of a more cost-effective means of surveillance and monitoring for both the endemic and invasive species. The European hare undoubtedly represents a significant threat to the endemic Irish hare.
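The random encounter model estimates density from the camera trapping rate y/t together with animal speed v, camera detection radius r and detection arc θ, via D = (y/t) · π / (v · r · (2 + θ)). A minimal sketch of that standard estimator follows; the parameter values in the usage line are hypothetical placeholders rather than figures from this study, and the post hoc effort-optimization protocol is not shown.

```python
import math

def rem_density(y, t, v, r, theta):
    """Random encounter model density estimate.

    y: number of independent detections
    t: camera-trap effort (camera hours)
    v: mean animal speed (km per hour)
    r: camera detection radius (km)
    theta: camera detection arc (radians)
    Returns animals per square kilometre.
    """
    return (y / t) * math.pi / (v * r * (2 + theta))

# Hypothetical inputs: 20 detections in 1000 camera hours,
# speed 0.5 km/h, radius 10 m, arc of about 10 degrees.
print(round(rem_density(20, 1000, 0.5, 0.01, 0.175), 2))  # 5.78
```

Because the estimate is linear in the trapping rate, an invader-to-native density ratio reduces to a ratio of trapping rates only if both species share the same speed and detection parameters.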

Relevance:

20.00%

Abstract:

One of the most popular techniques for generating classifier ensembles is stacking, which is based on a meta-learning approach. In this paper, we introduce an alternative to stacking that is based on cluster analysis. As in stacking, instances from a validation set are first classified by all base classifiers, and the output of each classifier is then treated as a new attribute of the instance. The validation set is then divided into clusters according to these new attributes and a small subset of the original attributes of the instances. For each cluster, we find its centroid and calculate its class label. The collection of labelled centroids serves as the meta-classifier. Experimental results show that the new method outperformed all benchmark methods, namely Majority Voting, Stacking J48, Stacking LR, AdaBoost J48, and Random Forest, in 12 out of 22 data sets. The proposed method has two advantageous properties: it is very robust to relatively small training sets, and it can be applied to semi-supervised learning problems. We also provide a theoretical investigation of the proposed method, which shows that for the method to succeed, the base classifiers in the ensemble should each have an accuracy greater than 50%.
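The cluster-based alternative to stacking described in this abstract can be sketched as follows. The abstract does not specify the clustering algorithm or how a centroid's label is computed, so this sketch assumes k-means with deterministic farthest-point seeding, a majority vote among each cluster's validation instances, and nearest-centroid prediction; all names and toy data are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def farthest_point_init(points, k):
    """Deterministic seeding: the first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean(members)
    return centroids, assign

def fit_cluster_meta(meta_features, labels, k):
    """meta_features: base-classifier outputs plus a few original attributes per validation instance."""
    centroids, assign = kmeans(meta_features, k)
    model = []
    for j in range(k):
        members = [y for y, a in zip(labels, assign) if a == j]
        # Assumption: each centroid takes the majority class of its members.
        model.append((centroids[j], Counter(members).most_common(1)[0][0] if members else None))
    return model

def predict(model, x):
    """Label a new instance with the class of its nearest labelled centroid."""
    return min((m for m in model if m[1] is not None), key=lambda m: sq_dist(x, m[0]))[1]

# Toy validation set: outputs of three base classifiers (0/1) plus one original attribute.
val = [[0, 0, 0, 0.1], [0, 0, 1, 0.2], [0, 0, 0, 0.0],
       [1, 1, 1, 0.9], [1, 1, 0, 1.0], [1, 1, 1, 0.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_cluster_meta(val, y, k=2)
print(predict(model, [1, 1, 1, 0.7]))  # 1
```

Because clustering itself needs no labels, unlabelled validation instances could still shape the centroids, which is one way to read the abstract's remark about semi-supervised use.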

Relevance:

20.00%

Abstract:

This work presents a new general purpose classifier named Averaged Extended Tree Augmented Naive Bayes (AETAN), which is based on combining the advantageous characteristics of Extended Tree Augmented Naive Bayes (ETAN) and Averaged One-Dependence Estimator (AODE) classifiers. We describe the main properties of the approach and algorithms for learning it, along with an analysis of its computational time complexity. Empirical results with numerous data sets indicate that the new approach is superior to ETAN and AODE in terms of both zero-one classification accuracy and log loss. It also compares favourably against weighted AODE and hidden Naive Bayes. The learning phase of the new approach is slower than that of its competitors, while the time complexity for the testing phase is similar. Such characteristics suggest that the new classifier is ideal in scenarios where online learning is not required.
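The abstract does not give AETAN's combination rule, so as a hedged illustration this sketch shows only the AODE-style averaging it builds on, together with the two evaluation measures mentioned: each sub-model's estimate of the joint P(y, x) is averaged and then normalized into a class posterior, which is scored by zero-one accuracy and log loss. The function names and numbers are hypothetical.

```python
import math

def average_joint(joint_scores):
    """joint_scores[m][y]: sub-model m's estimate of P(y, x) for one instance x.
    Average over sub-models, then normalize to a class posterior (AODE-style)."""
    m = len(joint_scores)
    avg = [sum(s[y] for s in joint_scores) / m for y in range(len(joint_scores[0]))]
    total = sum(avg)
    return [a / total for a in avg]

def zero_one_accuracy(posteriors, labels):
    """Fraction of instances whose most probable class equals the true label."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y for p, y in zip(posteriors, labels))
    return hits / len(labels)

def log_loss(posteriors, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class (clipped to avoid log(0))."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(posteriors, labels)) / len(labels)

post = average_joint([[0.06, 0.02], [0.03, 0.05]])  # about [0.5625, 0.4375]
print(zero_one_accuracy([post], [0]), round(log_loss([post], [0]), 4))
```

Averaging joint probabilities before normalizing matches how AODE combines its one-dependence sub-models; averaging already-normalized posteriors would weight the sub-models differently.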