Digital Library

889 results for Random trees

TESTING STATISTICAL HYPOTHESIS ON RANDOM TREES AND APPLICATIONS TO THE PROTEIN CLASSIFICATION PROBLEM

Relevance:

100.00%

Publisher:

Abstract:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
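The max-flow step mentioned in this abstract can be illustrated with a generic solver. The sketch below is an Edmonds-Karp variant of Ford-Fulkerson (augmenting paths found by breadth-first search). It does not reproduce the paper's graph construction, which the abstract does not describe; the function name `max_flow`, the dictionary layout and the toy network are assumptions made for illustration only.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with BFS-shortest augmenting paths.

    capacity is a dict of dicts, capacity[u][v] = capacity of edge u -> v
    (a hypothetical layout chosen for this sketch).
    """
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy network (made up for illustration).
toy = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
print(max_flow(toy, "s", "t"))  # 5
```

Choosing the shortest augmenting path at each round is what bounds the number of rounds polynomially in the size of the graph.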

Limit theorems for sequences of random trees

Relevance:

100.00%

Publisher:

Abstract:

We consider a random tree and introduce a metric in the space of trees to define the "mean tree" as the tree minimizing the average distance to the random tree. When the resulting metric space is compact we have laws of large numbers and central limit theorems for sequences of independent identically distributed random trees. As an application we propose tests to check whether two samples of random trees have the same law.
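The idea of a "mean tree" can be sketched with a concrete metric. The abstract does not state the metric, so the example below assumes a weighted symmetric-difference distance in which a node at depth k contributes z^k. Under that assumption the empirical mean of a sample is obtained node by node: keep every node present in more than half of the trees. Trees are represented as prefix-closed sets of strings with the root written as the empty string; the names and the weight z = 0.5 are illustrative choices.

```python
from collections import Counter


def tree_distance(t, y, z=0.5):
    """Weighted symmetric difference: a node v in exactly one tree costs z**depth(v)."""
    return sum(z ** len(v) for v in t ^ y)


def empirical_mean_tree(sample):
    """Majority rule: keep the nodes present in more than half of the sample.

    For the distance above this minimizes the average distance to the sample,
    because the total cost separates into one independent term per node.
    An ancestor occurs at least as often as its descendants, so the result
    is again a prefix-closed set, that is, a tree.
    """
    counts = Counter(v for tree in sample for v in tree)
    return {v for v, c in counts.items() if 2 * c > len(sample)}


# Three small binary trees (toy data).
t1 = {"", "0", "1"}
t2 = {"", "0", "00", "01"}
t3 = {"", "0", "1", "10"}

mean = empirical_mean_tree([t1, t2, t3])
print(sorted(mean))           # ['', '0', '1']
print(tree_distance(t1, t2))  # 1.0
```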

Multi-vehicle planning using RRT-connect

Relevance:

60.00%

Publisher:

Abstract:

The problem of planning multiple vehicles deals with the design of an effective algorithm that can cause multiple autonomous vehicles on the road to communicate and generate a collaborative optimal travel plan. Our modelling of the problem considers vehicles to vary greatly in terms of both size and speed, which makes it suboptimal to have a faster vehicle follow a slower vehicle or for vehicles to drive with predefined speed lanes. It is essential to have a fast planning algorithm whilst still being probabilistically complete. The Rapidly Exploring Random Trees (RRT) algorithm developed and reported on here uses a problem specific coordination axis, a local optimization algorithm, priority based coordination, and a module for deciding travel speeds. Vehicles are assumed to remain in their current relative position laterally on the road unless otherwise instructed. Experimental results presented here show regular driving behaviours, namely vehicle following, overtaking, and complex obstacle avoidance. The ability to showcase complex behaviours in the absence of speed lanes is characteristic of the solution developed.
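For readers unfamiliar with the underlying planner, the following is a minimal single-tree RRT in the unit square. It is only a sketch of the basic algorithm: the multi-vehicle coordination axis, the local optimizer, the priority scheme, the speed module and the two-tree "connect" variant described in the abstract are not reproduced. The function name `rrt`, its parameters and the `is_free` collision predicate are assumptions introduced here.

```python
import math
import random


def rrt(start, goal, is_free, step=0.05, goal_bias=0.1, max_iters=5000, seed=0):
    """Grow a tree from start by repeatedly stepping toward random samples.

    Returns a list of points from start to a node within `step` of goal,
    or None if no such node is found within max_iters iterations.
    Simplification: only the new node is collision-checked, not the segment.
    """
    rng = random.Random(seed)
    parent = {start: None}  # maps each tree node to its parent
    for _ in range(max_iters):
        # With probability goal_bias aim at the goal, otherwise sample uniformly.
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.random(), rng.random())
        nearest = min(parent, key=lambda p: math.dist(p, target))
        d = math.dist(nearest, target)
        if d == 0:
            continue
        scale = min(1.0, step / d)  # move at most `step` toward the target
        new = (nearest[0] + scale * (target[0] - nearest[0]),
               nearest[1] + scale * (target[1] - nearest[1]))
        if new in parent or not is_free(new):
            continue
        parent[new] = nearest
        if math.dist(new, goal) <= step:
            # Walk back to the root to recover the path.
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None


# Usage with a hypothetical circular obstacle in the middle of the square.
obstacle_free = lambda p: math.dist(p, (0.5, 0.5)) > 0.2
path = rrt((0.1, 0.1), (0.9, 0.9), obstacle_free)
```

The uniform sampling is what gives the method its probabilistic completeness; the goal bias only speeds up convergence.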

Comparing machine learning classifiers in potential distribution modelling

Relevance:

60.00%

Publisher:

Abstract:

Species' potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species' potential distribution. (C) 2010 Elsevier Ltd. All rights reserved.

Modeling height-diameter relationship in Pinus pinaster Ait. in the forest intervention zone of Lomba, NE-Portugal

Relevance:

60.00%

Publisher:

Abstract:

In this work, the relationship between diameter at breast height (d) and total height (h) of individual trees was modeled with the aim of establishing provisional height-diameter (h-d) equations for maritime pine (Pinus pinaster Ait.) stands in the Lomba ZIF, Northeast Portugal. Using data collected locally, several local and generalized h-d equations from the literature were tested, and adaptations were also considered. Model fitting was conducted using standard nonlinear least squares (nls) methods. The best local and generalized models selected were also tested as mixed models, applying a first-order conditional expectation (FOCE) approximation procedure and maximum likelihood methods to estimate fixed and random effects. For the calibration of the mixed models, and in order to be consistent with the fitting procedure, the FOCE method was also used to test different sampling designs. The results showed that the local h-d equations with two parameters performed better than the analogous models with three parameters. However, a unique set of parameter values for the local model cannot be applied to all maritime pine stands in the Lomba ZIF; thus a generalized model including covariates from the stand, in addition to d, was necessary to obtain adequate predictive performance. No evident superiority of the generalized mixed model over the generalized model with nonlinear least squares parameter estimates was observed. On the other hand, in the case of the local model, the predictive performance greatly improved when random effects were included. The results showed that the mixed model based on the selected local h-d equation is a viable alternative for estimating h if variables from the stand are not available. Moreover, an adequately calibrated response can be obtained using only 2 to 5 additional h-d measurements in quantile (or random) trees from the distribution of d in the plot (stand). Balancing sampling effort, accuracy and straightforwardness in practical applications, the generalized model from the nls fit is recommended. Examples of applications of the selected generalized equation to forest management are presented, namely how to use it to complete missing information from forest inventory and how such an equation can be incorporated in a stand-level decision support system that aims to optimize forest management for the maximization of wood volume production in Lomba ZIF maritime pine stands.

On random 3-2 trees

Relevance:

40.00%

Publisher:

Abstract:

"UIUCDCS-R-74-679"

INFLUENCE OF ALTITUDE, AGE AND DIAMETER ON YIELD AND ALPHA-BISABOLOL CONTENT OF CANDEIA TREES (Eremanthus erythropappus)

Relevance:

30.00%

Publisher:

Abstract:

The heartwood of the candeia tree is a source of essential oil rich in alpha-bisabolol, a substance widely used in the cosmetic and pharmaceutical industries. Bearing in mind the economic importance of alpha-bisabolol, this work aimed to evaluate the influence of tree age on the yield and content of alpha-bisabolol present in essential oil from candeia, considering two distinct reliefs and three diameter classes, in the Aiuruoca region, south Minas Gerais state. The two distinct reliefs correspond respectively to one section of the stand growing at 1,000 m of altitude (Area 1) and another section growing at 1,100 m of altitude (Area 2). In each section, 15 trees were felled from among 3 different diameter classes. Discs were removed from the base of each tree to estimate their age by counting growth rings. Soil samples were taken and subjected to physical and chemical analysis. The logs were reduced to chips and random samples were taken for distillation to extract essential oil. The method used was steam distillation at a pressure of 2 kgf/cm² for 2.5 h. The chemical analysis was performed in a gas chromatograph (GC) based on the alpha-bisabolol standard reference. The yield of essential oil from trees in Area 1 was higher than that from trees in Area 2, with the same pattern of influence for older trees. In Area 2, the alpha-bisabolol content was higher in younger trees. No differences were found between the relevant parameters in relation to diameter classes.

Random perturbations of stochastic processes with unbounded variable length memory

Relevance:

30.00%

Publisher:

Abstract:

We consider binary infinite order stochastic chains perturbed by a random noise. This means that at each time step, the value assumed by the chain can be randomly and independently flipped with a small fixed probability. We show that the transition probabilities of the perturbed chain are uniformly close to the corresponding transition probabilities of the original chain. As a consequence, in the case of stochastic chains with unbounded but otherwise finite variable length memory, we show that it is possible to recover the context tree of the original chain, using a suitable version of the algorithm Context, provided that the noise is small enough.
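The perturbation described here, an independent flip of each symbol with a small fixed probability, is easy to simulate. The sketch below samples a binary chain whose relevant past is the run of 0s since the last 1 (a standard example of unbounded variable length memory) and then applies the noise. The function names and the transition rule `p_one` are hypothetical, and the context tree recovery step with the algorithm Context is not implemented.

```python
import random


def sample_chain(n, p_one, rng):
    """Sample n binary symbols; P(next = 1) depends on the number of 0s since the last 1."""
    chain, zeros = [], 0
    for _ in range(n):
        x = int(rng.random() < p_one(zeros))
        chain.append(x)
        zeros = 0 if x else zeros + 1
    return chain


def perturb(chain, eps, rng):
    """Flip each symbol independently with probability eps."""
    return [x ^ (rng.random() < eps) for x in chain]


rng = random.Random(1)
# Hypothetical transition rule: a 1 becomes less likely the longer the run of 0s.
original = sample_chain(1000, lambda k: 1.0 / (k + 2), rng)
noisy = perturb(original, 0.01, rng)
flips = sum(a != b for a, b in zip(original, noisy))
```

With `eps` small, `flips` is a small fraction of the chain length, which is the regime in which the abstract states that the original context tree can still be recovered.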

GA topology optimization using random keys for tree encoding of structures

Relevance:

30.00%

Publisher:

Abstract:

Topology optimization consists in finding the spatial distribution of a given total volume of material for the resulting structure to have some optimal property, for instance, maximization of structural stiffness or maximization of the fundamental eigenfrequency. In this paper a Genetic Algorithm (GA) employing a representation method based on trees is developed to generate initial feasible individuals that remain feasible upon crossover and mutation and as such do not require any repairing operator to ensure feasibility. Several application examples are studied involving the topology optimization of structures where the objective functions are the maximization of the stiffness and the maximization of the first and second eigenfrequencies of a plate, all cases having a prescribed material volume constraint.
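One common way to obtain a tree encoding that stays feasible under crossover and mutation is to attach a real-valued random key to every candidate edge and decode the keys into a spanning tree. The abstract does not give the paper's decoding rule, so the sketch below is an assumption: it sorts the edges by key and runs Kruskal's algorithm with a union-find structure. The function name and the toy graph are illustrative.

```python
def decode_random_keys(n_nodes, edges, keys):
    """Decode one key per edge into a spanning tree (Kruskal ordered by key).

    Any real-valued key vector decodes to a valid tree of a connected graph,
    so recombining or mutating keys cannot produce an infeasible individual.
    """
    root = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    tree = []
    for _, (u, v) in sorted(zip(keys, edges)):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, keep it
            root[ru] = rv
            tree.append((u, v))
    return tree


# Toy graph: a 4-cycle plus one diagonal (made up for illustration).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
keys = [0.9, 0.1, 0.5, 0.3, 0.2]
print(decode_random_keys(4, edges, keys))  # [(1, 2), (0, 2), (3, 0)]
```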

Hierarchical conditional random fields for parts based models matching

Relevance:

30.00%

Publisher:

Abstract:

A parts based model is a parametrization of an object class using a collection of landmarks following the object structure. The matching of parts based models is one of the problems where pairwise Conditional Random Fields have been successfully applied. The main reason for their effectiveness is tractable inference and learning due to the simplicity of the graphs involved, usually trees. However, these models do not consider possible patterns of statistics among sets of landmarks, and thus they suffer from using overly myopic information. To overcome this limitation, we propose a novel structure based on hierarchical Conditional Random Fields, which we explain in the first part of this thesis. We build a hierarchy of combinations of landmarks, where matching is performed taking into account the whole hierarchy. To preserve tractable inference we effectively sample the label set. We test our method on facial feature selection and human pose estimation on two challenging datasets: Buffy and MultiPIE. In the second part of this thesis, we present a novel approach to multiple kernel combination that relies on stacked classification. This method can be used to evaluate the landmarks of the parts-based model approach. Our method is based on combining responses of a set of independent classifiers for each individual kernel. Unlike earlier approaches that linearly combine kernel responses, our approach uses them as inputs to another set of classifiers. We will show that we outperform state-of-the-art methods on most of the standard benchmark datasets.

Temporal variation in the composition of ant assemblages (Hymenoptera, Formicidae) on trees in the Pantanal floodplain, Mato Grosso do Sul, Brazil

Relevance:

30.00%

Publisher:

Abstract:

In this paper we investigate how seasonal flooding influences the composition of assemblages of ants foraging on trees in the Pantanal of Mato Grosso do Sul. During the flood season in the Pantanal, a large area is covered by floods, which are the main forces regulating the pattern of diversity in these areas. However, the effects of such natural disturbances on ant communities are poorly known. The objective of this study was therefore to evaluate the effect of temporal variation on assemblages of ants foraging on trees in the Pantanal of Miranda. Samples were collected over one year in two adjacent areas, one that flooded during the wet period and another that did not flood at any time of the year. In 10 sites for each evaluated habitat, five pitfall traps were installed at random in trees 25 m apart from each other. In the habitat with flooding, the highest richness was observed during the flooding period, while there was no significant change in richness in the area that does not flood. The diversity of species between the two evaluated habitats varied significantly during the two seasons. Most ants sampled belong to species that forage and nest in soil. This suggests that, during the flood in flooded habitats, ants that did not migrate to higher areas without flooding adopt the strategy of searching for resources in the tree canopy.

Inferring epidemic contact structure from phylogenetic trees.

Relevance:

30.00%

Publisher:

Abstract:

Contact structure is believed to have a large impact on epidemic spreading and consequently using networks to model such contact structure continues to gain interest in epidemiology. However, detailed knowledge of the exact contact structure underlying real epidemics is limited. Here we address the question whether the structure of the contact network leaves a detectable genetic fingerprint in the pathogen population. To this end we compare phylogenies generated by disease outbreaks in simulated populations with different types of contact networks. We find that the shape of these phylogenies strongly depends on contact structure. In particular, measures of tree imbalance allow us to quantify to what extent the contact structure underlying an epidemic deviates from a null model contact network and illustrate this in the case of random mixing. Using a phylogeny from the Swiss HIV epidemic, we show that this epidemic has a significantly more unbalanced tree than would be expected from random mixing.
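Tree imbalance can be quantified in several ways, and the abstract does not name the measure used. As one illustration, the sketch below computes the Colless index, the sum over internal nodes of the absolute difference between the leaf counts of the two subtrees. The nested-tuple representation and the function name are assumptions made for this example.

```python
def colless(tree):
    """Return (number of leaves, Colless index) for a binary tree.

    A tree is either a leaf (any non-tuple value) or a pair (left, right).
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, i_left = colless(left)
    n_right, i_right = colless(right)
    return n_left + n_right, i_left + i_right + abs(n_left - n_right)


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(colless(balanced)[1])     # 0
print(colless(caterpillar)[1])  # 3
```

A fully balanced tree scores 0 and the caterpillar tree attains the maximum for its number of leaves, so larger values indicate a more unbalanced phylogeny.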

Stratification of the severity of critically ill patients with classification trees

Relevance:

30.00%

Publisher:

Abstract:

Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC, 95% CI) and percent of correct classification (PCC, 95% CI); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR, 95% CI). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75 (0.71-0.81)), CHAID (0.76 (0.72-0.79)) and C4.5 (0.76 (0.73-0.80)). PCC: CART (72 (69-75)), CHAID (72 (69-75)) and C4.5 (76 (73-79)). Calibration (SMR) better in the CT: CART (1.04 (0.95-1.31)), CHAID (1.06 (0.97-1.15)) and C4.5 (1.08 (0.98-1.16)). Conclusion: With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
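Two of the metrics reported in this abstract have short definitions that a sketch can make concrete. The standardized mortality ratio is the number of observed deaths divided by the sum of predicted death probabilities, and the AUC is the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function names and the five-patient data set below are invented for illustration and are not taken from the study.

```python
def smr(outcomes, probs):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return sum(outcomes) / sum(probs)


def auc(outcomes, probs):
    """Area under the ROC curve by direct pairwise comparison."""
    dead = [p for y, p in zip(outcomes, probs) if y == 1]
    alive = [p for y, p in zip(outcomes, probs) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in dead for q in alive)
    return wins / (len(dead) * len(alive))


# Toy data: 1 = died in hospital, with a model's predicted probability.
outcomes = [1, 0, 1, 0, 0]
probs = [0.8, 0.3, 0.4, 0.5, 0.1]
print(round(auc(outcomes, probs), 3))  # 0.833
print(round(smr(outcomes, probs), 3))  # 0.952
```

An SMR near 1 means that the model predicts about as many deaths as were observed, which is how the calibration figures in the abstract should be read.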

Production costs and fruit yield profitability in the initial harvest of custard apple trees

Relevance:

30.00%

Publisher:

Abstract:

The aim of this study was to estimate the production cost and economic indicators associated with the production and sale of fruits from 20 custard apple progenies during the initial five harvests, in order to identify the harvest season from which custard apple exploitation becomes profitable, as well as the most promising progenies from an economic point of view. The fruit yield data upon which the present work was based were obtained during the period from 2001 to 2005, in an experiment that evaluated 20 custard apple half-sibling progenies under sprinkler irrigation. The progenies were evaluated in a random block design with five replicates and plots consisting of four plants each. The exploitation of custard apple progenies proved to be a profitable agribusiness only after the fourth year. Before that, only the A3 and A4 progenies in the second year, and P3 and P11 in the third year, provided profitable incomes. Considering the methodological assumptions imposed concerning the time period analysis and the prices as of July 2007, the most important profitability indicators (operating profit, return index and equilibrium price) showed that the A4 progeny is the most recommended, although other progenies also stand out, such as FJ1 and FJ2. As already discussed, the progenies showing the highest average yields over five harvests are not always the most economically recommendable ones.

Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

Relevance:

30.00%

Publisher:

Abstract:

PURPOSE: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information.
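The pairwise Kolmogorov distance that feeds the k-medoids clustering is the largest absolute gap between two empirical distribution functions. A minimal sketch, assuming each lithological unit is represented simply by a list of its IRC measurements, is given below. The unit names and values are invented, and the clustering step itself is not implemented.

```python
from bisect import bisect_right


def kolmogorov_distance(a, b):
    """Largest absolute difference between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    # Both CDFs are right-continuous step functions, so the supremum is
    # attained at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def distance_matrix(groups):
    """Pairwise Kolmogorov distances between named samples."""
    names = sorted(groups)
    return {(p, q): kolmogorov_distance(groups[p], groups[q])
            for p in names for q in names}


# Invented measurements for three hypothetical units.
units = {"gneiss": [120, 340, 560, 900], "limestone": [40, 60, 90, 150], "granite": [100, 300, 600, 950]}
d = distance_matrix(units)
```

The resulting dictionary can be passed to any k-medoids routine that accepts precomputed dissimilarities.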
