71 results for Classification Tree Pruning
Abstract:
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.
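The abstract does not say how candidate trees are scored or accepted, so the following is only a minimal sketch under stated assumptions: it scores a fixed rooted binary tree with Fitch's small-parsimony algorithm (a standard technique, not necessarily the one used in the paper) and applies the usual Metropolis acceptance rule for simulated annealing. The function names, the nested-tuple tree encoding, and the toy alignment are all hypothetical.

```python
import math
import random


def fitch_site(tree, states):
    """Fitch pass for one alignment column.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping leaf name -> character at this site
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


def accept_move(delta, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse trees."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


# Hypothetical four-taxon alignment, for illustration only.
alignment = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
```

In an annealing loop, `delta` would be the difference in `parsimony_length` between a proposed tree and the current one, with `temperature` lowered gradually so that worse trees are accepted less and less often.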
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful, and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be text, categorical, or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is used mainly for predictive modelling. So far, a number of classification algorithms have been used in practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probabilistic methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
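As a small illustration of one of the example-based methods named above, here is a minimal k-nearest-neighbors classifier. It is a sketch only: the function name, the Euclidean distance choice, and the toy training points are assumptions made for illustration and do not come from the cited works.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train : list of (feature_tuple, label) pairs
    query : feature tuple with the same number of dimensions
    """
    # Sort training examples by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest examples.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set, for illustration only.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
```

Calling `knn_predict(train, (0.5, 0.5))` votes among the three points nearest the origin, while `knn_predict(train, (5.5, 5.5))` votes among the three points in the other cluster.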
Abstract:
As part of ACIAR project ASEM/2003/052, Improving Financial Returns to Smallholder Tree Farmers in the Philippines, plantations of timber trees on Leyte Island, the Philippines, were located using a systematic survey of the island. The survey was undertaken to compile a database of plantations that could be used to guide the planning of project activities. In addition to recording a range of qualitative and quantitative information for each plantation, the survey spatially referenced each site using a Global Positioning System (GPS) to electronic maps of the island, which were held in a Geographical Information System (GIS). Microsoft Excel and Mapsource® software were used as the software links between GPS coordinates and the GIS. Mapping of farm positions was complicated by different datums being used for maps of Leyte Island, which caused GPS positions to be displaced from equivalent positions on the map. Photos of the sites were hyperlinked to their map positions in the GIS to help staff recall site characteristics.
Abstract:
Poor root development due to constraining soil conditions could be an important factor influencing the health of urban trees. Therefore, there is a need for efficient techniques to analyze the spatial distribution of tree roots. An analytical procedure for describing tree rooting patterns from X-ray computed tomography (CT) data is described and illustrated. Large, irregularly shaped specimens of undisturbed sandy soil were sampled from various positions around the base of trees using field impregnation with epoxy resin to stabilize the cohesionless soil. Cores approximately 200 mm in diameter by 500 mm in height were extracted from these specimens. These large core samples were scanned with a medical X-ray CT device, and contiguous images of soil slices (2 mm thick) were thus produced. X-ray CT images are regarded as regularly spaced sections through the soil, although they are not actual 2D sections but matrices of voxels of approximately 0.5 mm x 0.5 mm x 2 mm. The images were used to generate the equivalent of horizontal root contact maps from which three-dimensional objects, assumed to be roots, were reconstructed. The resulting connected objects were used to derive indices of the spatial organization of roots, namely: root length distribution, root length density, root growth angle distribution, root spatial distribution, and branching intensity. The successive steps of the method, from sampling to generation of indices of tree root organization, are illustrated through a case study examining rooting patterns of valuable urban trees. (C) 1999 Elsevier Science B.V. All rights reserved.
Abstract:
The Montreal Process indicators are intended to provide a common framework for assessing and reviewing progress toward sustainable forest management. The potential of a combined geometrical-optical/spectral mixture analysis model was assessed for mapping the Montreal Process age class and successional age indicators at a regional scale using Landsat Thematic Mapper data. The project location is an area of eucalyptus forest in Emu Creek State Forest, Southeast Queensland, Australia. A quantitative model was used relating the spectral reflectance of a forest to the illumination geometry; the slope and aspect of the terrain surface; and the size, shape, density, and canopy size of the trees. Inversion of this model necessitated the use of spectral mixture analysis to recover subpixel information on the fractional extent of ground scene elements (such as sunlit canopy, shaded canopy, sunlit background, and shaded background). Results obtained from a sensitivity analysis allowed improved allocation of resources to maximize the predictive accuracy of the model. It was found that modeled estimates of crown cover projection, canopy size, and tree densities had significant agreement with field and air photo-interpreted estimates. However, the accuracy of the successional stage classification was limited. The results obtained highlight the potential for future integration of high and moderate spatial resolution imaging sensors for monitoring forest structure and condition. (C) Elsevier Science Inc., 2000.
Abstract:
Strain-dependent hydraulic conductivities are uniquely defined by an environmental factor, representing applied normal and shear strains, combined with intrinsic material parameters representing mass and component deformation moduli, initial conductivities, and mass structure. The components representing mass moduli and structure are defined in terms of RQD (rock quality designation) and RMR (rock mass rating) to represent the response of a whole spectrum of rock masses, varying from highly fractured (crushed) rock to intact rock. These two empirical parameters determine the hydraulic response of a fractured medium to the induced deformations. The constitutive relations are verified against available published data and applied to study one-dimensional, strain-dependent fluid flow. Analytical results indicate that both normal and shear strains exert a significant influence on the processes of fluid flow and that the magnitude of this influence is regulated by the values of RQD and RMR.
Abstract:
Examples from the Murray-Darling basin in Australia are used to illustrate different methods of disaggregation of reconnaissance-scale maps. One approach to disaggregation revolves around the de-convolution of the soil-landscape paradigm elaborated during a soil survey. The descriptions of soil map units and block diagrams in a soil survey report detail soil-landscape relationships or soil toposequences that can be used to disaggregate map units into component landscape elements. Toposequences can be visualised on a computer by combining soil maps with digital elevation data. Expert knowledge or statistics can be used to implement the disaggregation. Use of a restructuring element and k-means clustering are illustrated. Another approach to disaggregation uses training areas to develop rules to extrapolate detailed mapping into other, larger areas where detailed mapping is unavailable. A two-level decision tree example is presented. At one level, the decision tree method is used to capture mapping rules from the training area; at another level, it is used to define the domain over which those rules can be extrapolated. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
The majority of past and current individual-tree growth modelling methodologies have failed to characterise and incorporate structured stochastic components. Rather, they have relied on deterministic predictions or have added an unstructured random component to predictions. In particular, spatial stochastic structure has been neglected, despite being present in most applications of individual-tree growth models. Spatial stochastic structure (also called spatial dependence or spatial autocorrelation) eventuates when spatial influences such as competition and micro-site effects are not fully captured in models. Temporal stochastic structure (also called temporal dependence or temporal autocorrelation) eventuates when a sequence of measurements is taken on an individual tree over time, and variables explaining temporal variation in these measurements are not included in the model. Nested stochastic structure eventuates when measurements are combined across sampling units and differences among the sampling units are not fully captured in the model. This review examines spatial, temporal, and nested stochastic structure and instances where each has been characterised in the forest biometry and statistical literature. Methodologies for incorporating stochastic structure in growth model estimation and prediction are described. Benefits from incorporation of stochastic structure include valid statistical inference, improved estimation efficiency, and more realistic and theoretically sound predictions. It is proposed in this review that individual-tree modelling methodologies need to characterise and include structured stochasticity. Possibilities for future research are discussed. (C) 2001 Elsevier Science B.V. All rights reserved.
Abstract:
The salticid spider Cosmophasis bitaeniata preys on the larvae of the green tree ant Oecophylla smaragdina. Gas chromatography (GC) and gas chromatography-mass spectrometry (GC-MS) reveal that the cuticle of C. bitaeniata mimics the mono- and dimethylalkanes of the cuticle of its prey. Recognition bioassays with extracts of the cuticular hydrocarbons of ants and spiders revealed that foraging major workers did not respond aggressively to the extracts of the spiders or conspecific nestmates, but reacted aggressively to conspecific nonnestmates. Typically, the ants either failed to react (as with control treatments with no extracts) or they reacted nonaggressively as with conspecific nestmates. These data indicate that the qualitative chemical mimicry of ants by C. bitaeniata allows the spiders to avoid detection by major workers of O. smaragdina.
Abstract:
Little is known about the responses of Australian plants to excess metal, including Mn. It is important to remedy this lack of information so that informed decisions can be made about managing Mn-contaminated sites inhabited by Australian vegetation. Acacia holosericea, Melaleuca leucadendra, Eucalyptus crebra and Eucalyptus camaldulensis were grown in dilute solution culture for 10 weeks. The seedlings (42 days old) were exposed to six Mn treatments, viz., 1, 8, 32, 128, 512 and 2048 µM. The order of tolerance to toxic concentrations of Mn was A. holosericea ≅ E. crebra < M. leucadendra < E. camaldulensis, the critical external concentrations being approximately 5.1, 5.0, 21 and 330 µM, respectively. The critical tissue Mn concentrations for the youngest fully expanded leaf and total shoots were, respectively, 265 and 215 µg g⁻¹ DM for A. holosericea, 445 and 495 µg g⁻¹ DM for M. leucadendra, 495 and 710 µg g⁻¹ DM for E. crebra and 7230 and 6510 µg g⁻¹ DM for E. camaldulensis. The high tolerance of E. camaldulensis (as opposed to the sensitivity of E. crebra) to excess Mn raises concern about fauna feeding on the plant and is consistent with hypotheses suggesting the Eucalyptus subgenus Symphomyrtus is particularly tolerant of stress, including excess Mn. The results from this paper provide the first comprehensive combination of growth responses, critical external concentrations, critical tissue concentrations and plant toxicity symptoms for three important Australian genera, viz., Eucalyptus, Acacia and Melaleuca, for use in the management of Mn-toxic sites.
Abstract:
CysView is a web-based application tool that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data with disulfide annotations. It allows the viewing of records grouped by connectivity patterns. CysView's utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has proved useful for rapid detection of irrelevant and partial records, or those with incomplete annotations. CysView can be used to support distant homology between proteins. CysView is publicly available at http://research.i2r.a-star.edu.sg/CysView/.
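To make the idea of grouping records by disulfide connectivity pattern concrete, here is a minimal sketch. It is not CysView's implementation: the record format (a name mapped to a list of bonded residue positions), the function names, and the example entries are all hypothetical. The sketch numbers the bonded cysteines in sequence order and writes each bond as a pair of those ordinal numbers, a common way of describing connectivity.

```python
from collections import defaultdict


def connectivity_pattern(bonds):
    """Turn residue-position bonds into an ordinal pattern such as '1-3,2-4'.

    bonds : list of (pos_a, pos_b) pairs giving the bonded cysteine positions
    """
    # Rank every bonded cysteine by its position along the sequence.
    positions = sorted({pos for bond in bonds for pos in bond})
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    # Express each bond as a sorted pair of ranks, then order the pairs.
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ",".join(f"{a}-{b}" for a, b in pairs)


def group_by_pattern(records):
    """Group record names by their connectivity pattern."""
    groups = defaultdict(list)
    for name, bonds in records.items():
        groups[connectivity_pattern(bonds)].append(name)
    return dict(groups)


# Hypothetical annotated records, for illustration only.
records = {
    "toxin_x": [(12, 63), (16, 36), (22, 46), (26, 48)],
    "toxin_y": [(10, 60), (14, 35), (21, 44), (25, 46)],
    "peptide_z": [(3, 20), (8, 25)],
}
groups = group_by_pattern(records)
```

Here `toxin_x` and `toxin_y` fall into the same group because their bonds pair the same ordinal cysteines even though the residue positions differ, while `peptide_z` forms its own group.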