22 results for tree-based

in CentAUR: Central Archive University of Reading - UK


Relevance:

70.00%

Publisher:

Abstract:

One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Relevance:

60.00%

Publisher:

Abstract:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.
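
To illustrate how a KD-Tree can cut the number of distance computations, the sketch below stores the centroids in a small KD-Tree and uses it for the K-Means assignment step. It is a minimal pure-Python illustration under assumed names (`build_kdtree`, `nearest`, `assign` are hypothetical helpers), not the parallel formulation described in the abstract.

```python
from math import inf

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(items, depth=0):
    """Build a KD-Tree from a list of (index, point) pairs.

    Each node is a tuple ((index, point), axis, left, right); None is an empty subtree.
    """
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def nearest(node, query, best=(inf, None)):
    """Return (squared distance, index) of the stored point closest to query."""
    if node is None:
        return best
    (idx, point), axis, left, right = node
    d = sq_dist(point, query)
    if d < best[0]:
        best = (d, idx)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best

def assign(points, centroids):
    """K-Means assignment step: index of the nearest centroid for every point."""
    tree = build_kdtree(list(enumerate(centroids)))
    return [nearest(tree, p)[1] for p in points]

if __name__ == "__main__":
    centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # illustrative values
    points = [(1.0, 1.0), (9.0, 9.0), (1.0, 8.0), (6.0, 6.0)]
    print(assign(points, centroids))
```

The pruning test in `nearest` is what saves work: whole subtrees of centroids are skipped when the splitting plane is already farther away than the best candidate found.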

Relevance:

60.00%

Publisher:

Abstract:

We have performed microarray hybridization studies on 40 clinical isolates from 12 common serovars within Salmonella enterica subspecies I to identify the conserved chromosomal gene pool. We were able to separate the core invariant portion of the genome by a novel mathematical approach using a decision tree based on genes ranked by increasing variance. All genes within the core component were confirmed using available sequence and microarray information for S. enterica subspecies I strains. The majority of genes within the core component had conserved homologues in Escherichia coli K-12 strain MG1655. However, many genes present in the conserved set which were absent or highly divergent in K-12 had close homologues in pathogenic bacteria such as Shigella flexneri and Pseudomonas aeruginosa. Genes within previously established virulence determinants such as SPI1 to SPI5 were conserved. In addition several genes within SPI6, all of SPI9, and three fimbrial operons (fim, bcf, and stb) were conserved within all S. enterica strains included in this study. Although many phage and insertion sequence elements were missing from the core component, approximately half the pseudogenes present in S. enterica serovar Typhi were conserved. Furthermore, approximately half the genes conserved in the core set encoded hypothetical proteins. Separation of the core and variant gene sets within S. enterica subspecies I has offered fundamental biological insight into the genetic basis of phenotypic similarity and diversity across S. enterica subspecies I and shown how the core genome of these pathogens differs from the closely related E. coli K-12 laboratory strain.

Relevance:

60.00%

Publisher:

Abstract:

Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time, as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
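
The abstaining behaviour described above can be sketched in a few lines. The following is not the eRules algorithm; it is a hypothetical toy classifier (`Rule`, `AbstainingRuleClassifier`, `max_error_rate` and `min_trials` are all assumed names) that returns no label when no rule covers an instance and drops rules that perform badly on the stream.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict  # attribute name -> required value
    label: str
    hits: int = 0     # instances covered so far
    errors: int = 0   # covered instances with a different true label

    def covers(self, instance):
        return all(instance.get(k) == v for k, v in self.conditions.items())

class AbstainingRuleClassifier:
    def __init__(self, max_error_rate=0.5, min_trials=4):
        self.max_error_rate = max_error_rate
        self.min_trials = min_trials
        self.rules = []

    def add_rule(self, conditions, label):
        self.rules.append(Rule(dict(conditions), label))

    def classify(self, instance):
        """Label of the first covering rule, or None to abstain."""
        for rule in self.rules:
            if rule.covers(instance):
                return rule.label
        return None

    def update(self, instance, true_label):
        """Score the first covering rule on a labelled instance, then prune weak rules."""
        for rule in self.rules:
            if rule.covers(instance):
                rule.hits += 1
                if rule.label != true_label:
                    rule.errors += 1
                break
        self.rules = [
            r for r in self.rules
            if r.hits < self.min_trials or r.errors / r.hits <= self.max_error_rate
        ]

if __name__ == "__main__":
    clf = AbstainingRuleClassifier()
    clf.add_rule({"weather": "sunny"}, "play")
    print(clf.classify({"weather": "sunny"}))  # covered by the rule
    print(clf.classify({"weather": "rain"}))   # no rule covers it, so the classifier abstains
```

How new rules are induced from the stream is left out here, since the abstract does not specify it.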

Relevance:

60.00%

Publisher:

Abstract:

This paper presents a novel mobile sink area allocation scheme for consumer based mobile robotic devices with a proven application to robotic vacuum cleaners. In the home or office environment, rooms are physically separated by walls and an automated robotic cleaner cannot make a decision about which room to move to and perform the cleaning task. Likewise, state-of-the-art cleaning robots do not move to other rooms without direct human interference. In a smart home monitoring system, sensor nodes may be deployed to monitor each separate room. In this work, a quad tree based data gathering scheme is proposed whereby the mobile sink physically moves through every room and logically links all separated sub-networks together. The proposed scheme sequentially collects data from the monitoring environment and transmits the information back to a base station. Based on the information from the sensor nodes, the base station can command a cleaning robot to move to a specific location in the home environment. The quad tree based data gathering scheme minimizes the data gathering tour length and time through the efficient allocation of data gathering areas. A calculated shortest path data gathering tour can efficiently be allocated to the robotic cleaner to complete the cleaning task within a minimum time period. Simulation results show that the proposed scheme can effectively allocate and control the cleaning area to the robot vacuum cleaner without any direct interference from the consumer. The performance of the proposed scheme is then validated with a set of practical sequential data gathering tours in a typical office/home environment.
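
As a rough illustration of quad tree based area allocation, the sketch below recursively splits a rectangular floor area into four quadrants until each cell holds at most a few sensor nodes. The function name `quadtree_cells`, the `capacity` and `max_depth` parameters, and the sample coordinates are assumptions for illustration; the scheme in the paper may allocate areas differently.

```python
def quadtree_cells(points, x0, y0, x1, y1, capacity=2, max_depth=8):
    """Split the area [x0, x1) x [y0, y1) into quad tree leaf cells.

    Returns a list of (x0, y0, x1, y1, points_in_cell) tuples, where every
    cell holds at most `capacity` points unless max_depth is exhausted.
    """
    inside = [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]
    if len(inside) <= capacity or max_depth == 0:
        return [(x0, y0, x1, y1, inside)]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = (
        (x0, y0, mx, my),  # south-west
        (mx, y0, x1, my),  # south-east
        (x0, my, mx, y1),  # north-west
        (mx, my, x1, y1),  # north-east
    )
    cells = []
    for cx0, cy0, cx1, cy1 in quadrants:
        cells += quadtree_cells(inside, cx0, cy0, cx1, cy1, capacity, max_depth - 1)
    return cells

if __name__ == "__main__":
    sensors = [(1, 1), (2, 2), (3, 1), (9, 9), (12, 3)]  # hypothetical sensor positions
    for cell in quadtree_cells(sensors, 0, 0, 16, 16):
        print(cell)
```

Each leaf cell can then be treated as one data gathering area to be visited by the mobile sink; dense regions end up with smaller cells than sparse ones.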

Relevance:

40.00%

Publisher:

Abstract:

A fast Knowledge-based Evolution Strategy, KES, for the multi-objective minimum spanning tree is presented. The proposed algorithm is validated, for the bi-objective case, against an exhaustive search for small problems (4-10 nodes), and compared with a deterministic algorithm, EPDA, and with NSGA-II for larger problems (up to 100 nodes) using benchmark hard instances. Experimental results show that KES finds the true Pareto fronts for small instances of the problem and calculates good approximate Pareto sets for the larger instances tested. It is shown that the fronts calculated by KES are superior to NSGA-II fronts and almost as good as those established by EPDA. KES is designed to be scalable to multi-objective problems and fast due to its low complexity.
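
For readers unfamiliar with Pareto fronts, the following sketch filters a set of bi-objective cost vectors (for example, the two total costs of candidate spanning trees) down to the non-dominated ones. The function names and the sample cost vectors are illustrative assumptions and are not taken from the paper.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(costs):
    """Return the non-dominated cost vectors (minimisation), in input order."""
    return [c for c in costs if not any(dominates(other, c) for other in costs)]

if __name__ == "__main__":
    # Hypothetical (cost1, cost2) totals of five candidate spanning trees.
    candidates = [(4, 9), (5, 7), (6, 8), (7, 3), (8, 3)]
    print(pareto_front(candidates))
```

This quadratic filter is adequate for small candidate sets; comparing two algorithms' fronts, as the abstract does, amounts to checking how many points of one front are dominated by the other.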

Relevance:

30.00%

Publisher:

Abstract:

Paternity analysis based on eight microsatellite loci was used to investigate pollen and seed dispersal patterns of the dioecious wind-pollinated tree, Araucaria angustifolia. The study sites were a 5.4 ha isolated forest fragment and a small tree group situated 1.7 km away, located in Paraná State, Brazil. In the forest fragment, 121 males, 99 females, 66 seedlings and 92 juveniles were mapped and genotyped, together with 210 seeds. In the tree group, nine male and two female adults were mapped and genotyped, together with 20 seeds. Paternity analysis within the forest fragment indicated that at least 4% of the seeds, 3% of the seedlings and 7% of the juveniles were fertilized by pollen from trees in the adjacent group, and 6% of the seeds were fertilized by pollen from trees outside these stands. The average pollination distance within the forest fragment was 83 m; when the tree group was included the pollination distance was 2006 m. The average number of effective pollen donors was estimated as 12.6. Mother-trees within the fragment could be assigned to all seedlings and juveniles, suggesting an absence of seed immigration. The distance of seedlings and juveniles from their assigned mother-trees ranged from 0.35 to 291 m (with an average of 83 m). Significant spatial genetic structure among adult trees, seedlings, and juveniles was detected up to 50 m, indicating seed dispersal over a short distance. The effective pollination neighborhood ranged from 0.4 to 3.3 ha. The results suggest that seed dispersal is restricted but that there is long-distance pollen dispersal between the forest fragment and the tree group; thus, the two stands of trees are not isolated.

Relevance:

30.00%

Publisher:

Abstract:

The long-term variability of the Siberian High, the dominant Northern Hemisphere anticyclone during winter, is largely unknown. To investigate how this feature varied prior to the instrumental record, we present a reconstruction of a Dec-Feb Siberian High (SH) index based on Eurasian and North American tree rings. Spanning 1599-1980, it provides information on SH variability over the past four centuries. A decline in the instrumental SH index since the late 1970s, related to Eurasian warming, is the most striking feature over the past four hundred years. It is associated with a highly significant (p < 0.0001) step change in 1989. Significant ~3-4 yr spectral peaks in the reconstruction fall within the range of variability of the East Asian winter monsoon (which has also declined recently) and lend further support to proposed relationships between these large-scale features of the climate system.

Relevance:

30.00%

Publisher:

Abstract:

The aim of the study was to establish and verify a predictive vegetation model for plant community distribution in the alti-Mediterranean zone of the Lefka Ori massif, western Crete. Based on previous work, three variables were identified as significant determinants of plant community distribution, namely altitude, slope angle and geomorphic landform. The response of four community types to these variables was tested using classification tree analysis in order to model community type occurrence. V-fold cross-validation plots were used to determine the length of the best fitting tree. The final 9-node tree selected correctly classified 92.5% of the samples. The results were used to provide decision rules for the construction of a spatial model for each community type. The model was implemented within a Geographical Information System (GIS) to predict the distribution of each community type in the study site. The evaluation of the model in the field using an error matrix gave an overall accuracy of 71%. The user's accuracy was higher for the Crepis-Cirsium (100%) and Telephium-Herniaria community type (66.7%) and relatively lower for the Peucedanum-Alyssum and Dianthus-Lomelosia community types (63.2% and 62.5%, respectively). Misclassification and field validation point to the need for improved geomorphological mapping and suggest the presence of transitional communities between existing community types.
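
The two accuracy figures quoted above are standard summaries of an error matrix. As a minimal sketch, assuming rows hold the mapped (predicted) class and columns the class observed in the field, overall accuracy is the diagonal sum divided by the total, and user's accuracy is each diagonal entry divided by its row total. The matrix values below are made up for illustration and are not the study's data.

```python
def error_matrix_accuracies(matrix):
    """Return (overall accuracy, list of user's accuracies) for a square error matrix.

    matrix[i][j] = number of validation points mapped as class i and observed as class j.
    A class with no mapped points gets a user's accuracy of None.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    users = [
        matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else None
        for i in range(n)
    ]
    return overall, users

if __name__ == "__main__":
    hypothetical = [
        [8, 2],  # mapped as class A: 8 correct, 2 actually class B
        [1, 9],  # mapped as class B: 1 actually class A, 9 correct
    ]
    overall, users = error_matrix_accuracies(hypothetical)
    print(overall, users)
```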

Relevance:

30.00%

Publisher:

Abstract:

A new heuristic for the Steiner Minimal Tree problem is presented here. The method described is based on the detection of particular sets of nodes in networks, the “Hot Spot” sets, which are used to obtain better approximations of the optimal solutions. An algorithm is also proposed which is capable of improving the solutions obtained by classical heuristics, by means of a stirring process of the nodes in solution trees. Classical heuristics and an enumerative method are used as comparison terms in the experimental analysis which demonstrates the goodness of the heuristic discussed in this paper.

Relevance:

30.00%

Publisher:

Abstract:

A new heuristic for the Steiner minimal tree problem is presented. The method described is based on the detection of particular sets of nodes in networks, the “hot spot” sets, which are used to obtain better approximations of the optimal solutions. An algorithm is also proposed which is capable of improving the solutions obtained by classical heuristics, by means of a stirring process of the nodes in solution trees. Classical heuristics and an enumerative method are used as comparison terms in the experimental analysis which demonstrates the capability of the heuristic discussed

Relevance:

30.00%

Publisher:

Abstract:

A hybridised and Knowledge-based Evolutionary Algorithm (KEA) is applied to the multi-criterion minimum spanning tree problems. Hybridisation is used across its three phases. In the first phase a deterministic single objective optimization algorithm finds the extreme points of the Pareto front. In the second phase a K-best approach finds the first neighbours of the extreme points, which serve as an elitist parent population to an evolutionary algorithm in the third phase. A knowledge-based mutation operator is applied in each generation to reproduce individuals that are at least as good as the unique parent. The advantages of KEA over previous algorithms include its speed (making it applicable to large real-world problems), its scalability to more than two criteria, and its ability to find both the supported and unsupported optimal solutions.
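
The first phase described above can be sketched for the bi-criterion case by solving one single-objective minimum spanning tree per criterion. The abstract does not name the deterministic algorithm used, so Kruskal's algorithm is an assumption here, as are the function names and the tiny example graph. Ties on one criterion are broken on the other so that each result is a lexicographic minimum.

```python
def kruskal(n, edges, key):
    """Minimum spanning tree of a connected graph with nodes 0..n-1.

    edges: list of (u, v, cost) tuples; key: function giving each edge's sort key.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, cost in sorted(edges, key=key):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it cannot close a cycle
            parent[ru] = rv
            tree.append((u, v, cost))
    return tree

def extreme_points(n, edges):
    """Total (cost1, cost2) of the MST that is minimal for each criterion in turn."""
    points = []
    for k in (0, 1):
        tree = kruskal(n, edges, key=lambda e: (e[2][k], e[2][1 - k]))
        points.append(tuple(sum(cost[i] for _, _, cost in tree) for i in (0, 1)))
    return points

if __name__ == "__main__":
    # Hypothetical triangle graph; each edge carries a (cost1, cost2) pair.
    edges = [(0, 1, (1, 5)), (1, 2, (2, 4)), (0, 2, (5, 1))]
    print(extreme_points(3, edges))
```

The two resulting cost vectors bound the Pareto front; the later phases described in the abstract search between them.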

Relevance:

30.00%

Publisher:

Abstract:

Current forest growth models and yield tables are almost exclusively based on data from mature trees, reducing their applicability to young and developing stands. To address this gap, young European beech, sessile oak, Scots pine and Norway spruce trees approximately 0 to 10 years old were destructively sampled in a range of naturally regenerated forest stands in Central Europe. Diameter at base and height were first measured in situ for up to 175 individuals per species. Subsequently, the trees were excavated and dry biomass of foliage, branches, stems and roots was measured. Allometric relations were then used to calculate biomass allocation coefficients (BAC) and growth efficiency (GE) patterns in young trees. We found large differences in BAC and GE between broadleaves and conifers, but also between species within these categories. Both BAC and GE are strongly age-specific in young trees, their rapidly changing values reflecting different growth strategies in the earliest stages of growth. We show that linear relationships describing biomass allocation in older trees are not applicable in young trees. To accurately predict forest biomass and carbon stocks, forest growth models need to include species and age specific parameters of biomass allocation patterns.

Relevance:

30.00%

Publisher:

Abstract:

According to climate change predictions, water availability might change dramatically in Europe and adjacent regions. This change will undoubtedly have an adverse effect on existing tree species and affect their ability to cope with a lack or an excess of water, changes in annual precipitation patterns, soil salinity and fire disturbance. The following chapter will describe tree species and provenances used in European forestry practice which are the most suitable to deal with water stress, salinity and fire. Each subchapter starts with a brief description of each of the stress factors and discusses the predictions of the likelihood of their occurrence in the near future according to the climate change scenarios. Tree species and their genotypes able to cope with a particular stress factor, together with an indication of their use by forest managers, are then introduced in greater detail.