888 results for Random trees
Abstract:
Generally, classifiers tend to overfit if the training data contain noise or missing values. Ensemble learning methods are often used to improve a classifier's accuracy, and most of them aim to improve the accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. Like any ensemble learner, however, Random Prism suffers from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest-sized datasets may pose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
Abstract:
Ensemble learning can be used to increase the overall classification accuracy of a classifier by generating multiple base classifiers and combining their classification results. A frequently used family of base classifiers for ensemble learning is decision trees. However, alternative approaches can potentially be used, such as the Prism family of algorithms, which also induces classification rules. Compared with decision trees, Prism algorithms generate modular classification rules that cannot necessarily be represented in the form of a decision tree. Prism algorithms achieve a classification accuracy similar to that of decision trees, and in some cases, for example if there is noise in the training and test data, they can outperform decision trees. Nevertheless, Prism still tends to overfit on noisy data; hence, ensemble learners have been adopted in this work to reduce the overfitting. This paper describes the development of an ensemble learner that uses a member of the Prism family as the base classifier to reduce the overfitting of Prism algorithms on noisy datasets. The developed ensemble classifier is compared with a stand-alone Prism classifier in terms of classification accuracy and resistance to noise.
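The general idea of generating multiple base classifiers and combining their results can be sketched as follows. This is a minimal illustration of bagging with majority voting, not the Random Prism algorithm itself: the base learner `train_one_rule` is a hypothetical stand-in (a one-attribute rule learner), and all function names and the toy data are assumptions made for this example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) examples with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]

def train_one_rule(rows):
    """Hypothetical stand-in base learner: for each value of a single
    categorical feature, predict the most common label seen with it."""
    by_value = {}
    for value, label in rows:
        by_value.setdefault(value, Counter())[label] += 1
    # Fallback label for feature values absent from this sample.
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return lambda value: rules.get(value, default)

def train_ensemble(rows, n_classifiers, seed=0):
    """Induce one base classifier per bootstrap sample."""
    rng = random.Random(seed)
    return [train_one_rule(bootstrap_sample(rows, rng))
            for _ in range(n_classifiers)]

def majority_vote(classifiers, value):
    """Combine the base classifiers' predictions by simple majority."""
    votes = Counter(clf(value) for clf in classifiers)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy data: (feature value, label), with one noisy example for "a".
    rows = [("a", "yes")] * 20 + [("b", "no")] * 20 + [("a", "no")]
    ensemble = train_ensemble(rows, n_classifiers=11)
    print(majority_vote(ensemble, "a"), majority_vote(ensemble, "b"))
```

Because each base classifier sees a different resample of the data, a single noisy example influences only some of the votes, which is the intuition behind using ensembles to reduce overfitting.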
Abstract:
The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as in commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well on large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism, which is based on the MapReduce programming paradigm. The paper provides, for the first time, a theoretical analysis of the proposed technique and an in-depth experimental study, which show that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user's trust in the system.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The species of the sandy plains forests (forests of the ''restingas'') have not yet had their spatial patterns studied as aids to understanding the diversity found in the different physiognomies along the Brazilian coast. In this paper, a framework of 10 x 10 m quadrats laid out in one hectare of a tree-dominated forest in the sandy plains of the Picinguaba area of the Serra do Mar State Park (municipality of Ubatuba, state of São Paulo, Brazil) was used to assess the spatial pattern of distribution of the ten most important species: Pera glabrata, Euterpe edulis, Eugenia brasiliensis, Alchornea triplinervia, Guatteria australis, Myrcia racemosa, Jacaranda semiserrata, Guarea macrophylla, Euplassa cantareirae and Nectandra oppositifolia. The spatial patterns were inferred by calculating their T-Square Index (C) and Dispersal Distance Index (I). P. glabrata shows a random pattern and E. edulis an aggregated one; E. brasiliensis, A. triplinervia, G. australis, E. cantareirae and N. oppositifolia tend towards a pattern between aggregated and uniform; and M. racemosa, J. semiserrata and G. macrophylla lie between aggregated and random. Although the indexes depend on the sample size and on the technique adjustments, the relationship of the pattern with the environmental factors is shown by clustering methods. The results confirm how the spatial patterns bring out associations between populations and the shape of the vegetation physiognomy.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
A real living cell is a complex system governed by many processes that are not yet fully understood; cell differentiation is one of them. In this thesis we use a cell differentiation model to develop gene regulatory networks (Boolean networks) with desired differentiation dynamics. To accomplish this task we introduced automatic design techniques and performed experiments using various differentiation trees. The results show that the developed algorithms, except the Random algorithm, are able to generate Boolean networks with interesting differentiation dynamics. Moreover, we present some possible future applications and developments of the cell differentiation model in robotics and in medical research. Understanding the mechanisms involved in biological cells can give us the possibility of explaining some dangerous diseases that are not yet understood, such as cancer.
Abstract:
This paper presents a natural coordinate system for phylogenetic trees using a correspondence with the set of perfect matchings in the complete graph. This correspondence produces a distance between phylogenetic trees, and a way of enumerating all trees in a minimal step order. It is useful in randomized algorithms because it enables moves on the space of trees that make random optimization strategies “mix” quickly. It also promises a generalization to intermediary trees when data are not decisive as to their choice of tree, and a new way of constructing Bayesian priors on tree space.
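A quick numerical check makes such a correspondence plausible. The sketch below assumes the trees in question are rooted binary trees with n labelled leaves (the abstract does not say so explicitly): their number, (2n-3)!!, equals the number of perfect matchings of the complete graph on 2n-2 points. The function names are illustrative only, and the code counts both sides rather than constructing the bijection.

```python
def perfect_matchings(points):
    """Yield every perfect matching of an even-length list of points,
    each matching given as a list of pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    # Pair the first point with each possible partner, then recurse.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching

def count_rooted_binary_trees(n_leaves):
    """(2n-3)!! = 1 * 3 * 5 * ... * (2n-3): the number of rooted binary
    trees with n labelled leaves (assumed tree class, see lead-in)."""
    count = 1
    for odd in range(1, 2 * n_leaves - 2, 2):
        count *= odd
    return count

if __name__ == "__main__":
    for n in range(2, 7):
        n_matchings = sum(1 for _ in perfect_matchings(list(range(2 * n - 2))))
        print(n, count_rooted_binary_trees(n), n_matchings)
```

Equal counts alone do not give a coordinate system; they only show that a one-to-one correspondence between the two sets is possible for each n.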
Abstract:
In Australia more than 300 vertebrates, including 43 insectivorous bat species, depend on hollows in habitat trees for shelter, with many species using a network of multiple trees as roosts. We used roost-switching data on white-striped freetail bats (Tadarida australis; Microchiroptera: Molossidae) to construct a network representation of day roosts in suburban Brisbane, Australia. Bats were caught from a communal roost tree with a roosting group of several hundred individuals and released with transmitters. Each roost used by the bats represented a node in the network, and the movements of bats between roosts formed the links between nodes. Despite differences in gender and reproductive stages, the bats exhibited the same behavior throughout three radiotelemetry periods and over 500 bat days of radio tracking: each roosted in separate roosts, switched roosts very infrequently, and associated with other bats only at the communal roost. This network resembled a scale-free network in which the distribution of the number of links from each roost followed a power law. Despite being spread over a large geographic area (> 200 km²), each roost was connected to others by less than three links. One roost (the hub or communal roost) defined the architecture of the network because it had the most links. That the network showed scale-free properties has profound implications for the management of the habitat trees of this roosting group. Scale-free networks provide high tolerance against stochastic events such as random roost removals but are susceptible to the selective removal of hub nodes.
Network analysis is a useful tool for understanding the structural organization of habitat tree usage and allows the informed judgment of the relative importance of individual trees and hence the derivation of appropriate management decisions. Conservation planners and managers should emphasize the differential importance of habitat trees and think of them as being analogous to vital service centers in human societies.
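The contrast between random removals and hub removal can be illustrated on a toy network. The sketch below uses an idealized hub-and-spoke graph, which is an assumption made for illustration and not the roost network measured in the study; the node names and helper functions are hypothetical.

```python
from collections import deque

def star_network(n_satellites):
    """Toy network: one hub (communal roost) linked to every satellite roost."""
    adjacency = {"hub": set()}
    for i in range(n_satellites):
        name = f"roost{i}"
        adjacency[name] = {"hub"}
        adjacency["hub"].add(name)
    return adjacency

def largest_component_size(adjacency, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best

if __name__ == "__main__":
    network = star_network(10)
    print(largest_component_size(network))              # intact network
    print(largest_component_size(network, {"roost3"}))  # one ordinary roost lost
    print(largest_component_size(network, {"hub"}))     # communal roost lost
```

In this toy case, losing any single satellite roost leaves the rest of the network connected, whereas losing the hub leaves every remaining roost isolated.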
Abstract:
We prove that a random Hilbert scheme that parametrizes the closed subschemes with a fixed Hilbert polynomial in some projective space is irreducible and nonsingular with probability greater than $0.5$. To consider the set of nonempty Hilbert schemes as a probability space, we transform this set into a disjoint union of infinite binary trees, reinterpreting Macaulay's classification of admissible Hilbert polynomials. Choosing discrete probability distributions with infinite support on the trees establishes our notion of random Hilbert schemes. To bound the probability that random Hilbert schemes are irreducible and nonsingular, we show that at least half of the vertices in the binary trees correspond to Hilbert schemes with unique Borel-fixed points.