975 resultados para Convex extendable trees


Relevância:

10.00% 10.00%

Publicador:

Resumo:

The success rate of carrier phase ambiguity resolution (AR) is the probability that the ambiguities are successfully fixed to their correct integer values. In existing works, an exact success rate formula for integer bootstrapping estimator has been used as a sharp lower bound for the integer least squares (ILS) success rate. Rigorous computation of success rate for the more general ILS solutions has been considered difficult, because of complexity of the ILS ambiguity pull-in region and computational load of the integration of the multivariate probability density function. Contributions of this work are twofold. First, the pull-in region mathematically expressed as the vertices of a polyhedron is represented by a multi-dimensional grid, at which the cumulative probability can be integrated with the multivariate normal cumulative density function (mvncdf) available in Matlab. The bivariate case is studied where the pull-region is usually defined as a hexagon and the probability is easily obtained using mvncdf at all the grid points within the convex polygon. Second, the paper compares the computed integer rounding and integer bootstrapping success rates, lower and upper bounds of the ILS success rates to the actual ILS AR success rates obtained from a 24 h GPS data set for a 21 km baseline. The results demonstrate that the upper bound probability of the ILS AR probability given in the existing literatures agrees with the actual ILS success rate well, although the success rate computed with integer bootstrapping method is a quite sharp approximation to the actual ILS success rate. The results also show that variations or uncertainty of the unit–weight variance estimates from epoch to epoch will affect the computed success rates from different methods significantly, thus deserving more attentions in order to obtain useful success probability predictions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Developing safe and sustainable road systems is a common goal in all countries. Applications to assist with road asset management and crash minimization are sought universally. This paper presents a data mining methodology using decision trees for modeling the crash proneness of road segments using available road and crash attributes. The models quantify the concept of crash proneness and demonstrate that road segments with only a few crashes have more in common with non-crash roads than roads with higher crash counts. This paper also examines ways of dealing with highly unbalanced data sets encountered in the study.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Road crashes cost world and Australian society a significant proportion of GDP, affecting productivity and causing significant suffering for communities and individuals. This paper presents a case study that generates data mining models that contribute to understanding of road crashes by allowing examination of the role of skid resistance (F60) and other road attributes in road crashes. Predictive data mining algorithms, primarily regression trees, were used to produce road segment crash count models from the road and traffic attributes of crash scenarios. The rules derived from the regression trees provide evidence of the significance of road attributes in contributing to crash, with a focus on the evaluation of skid resistance.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

All Australian governments recognize the need to ensure that land and natural resources are used sustainably. In this context, ‘resources’ includes natural resources found on land such as trees and other vegetation, fauna, soil and minerals, and cultural resources found on land such as archaeological sites and artefacts. Regulators use a wide range of techniques to promote sustainability. To achieve their objectives, they may, for example, create economic incentives through bounties, grants and subsidies, encourage the development of self-regulatory codes, or enter into agreements with landowners specifying how the land is to be managed. A common way of regulating is by making administrative orders, determinations or decisions under powers given to regulators by Acts of Parliament (statutes) or by regulations (delegated legislation). Generally the legislation provides for specified rights or duties, and authorises a regulator to make an order or decision to apply the legislative provisions to particular land or cases. For example, legislation might empower a regulator to make an order that requires the owner of a contaminated site to remediate it. When the regulator exercises the power by making an order in relation to particular land, the owner is placed under a statutory duty to remediate. When regulators exercise their statutory powers to manage the use of private land or natural or cultural resources on private land, property law issues can arise. The owner of land has a private property right that the law will enforce against anybody else who interferes with the enjoyment of the right, without legal authority to do so. The law dealing with the enforcement of private property rights forms part of private law. This report focuses on the relationship between the law of private property and the regulation of land and resources by legislation and by administrative decisions made under powers given by legislation (statutory powers).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scales up to very large data sets. Pyktree is highly modular and well suited for rapid-prototyping of novel distance measures and centroid representations. It is easy to install and provides a python package for library use as well as command line tools.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen's inequality for a concave functional--the minimizer over the player's actions of expected loss--defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary. Peter L. Bartlett, Alexander Rakhlin

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background Phylogeographic reconstruction of some bacterial populations is hindered by low diversity coupled with high levels of lateral gene transfer. A comparison of recombination levels and diversity at seven housekeeping genes for eleven bacterial species, most of which are commonly cited as having high levels of lateral gene transfer shows that the relative contributions of homologous recombination versus mutation for Burkholderia pseudomallei is over two times higher than for Streptococcus pneumoniae and is thus the highest value yet reported in bacteria. Despite the potential for homologous recombination to increase diversity, B. pseudomallei exhibits a relative lack of diversity at these loci. In these situations, whole genome genotyping of orthologous shared single nucleotide polymorphism loci, discovered using next generation sequencing technologies, can provide very large data sets capable of estimating core phylogenetic relationships. We compared and searched 43 whole genome sequences of B. pseudomallei and its closest relatives for single nucleotide polymorphisms in orthologous shared regions to use in phylogenetic reconstruction. Results Bayesian phylogenetic analyses of >14,000 single nucleotide polymorphisms yielded completely resolved trees for these 43 strains with high levels of statistical support. These results enable a better understanding of a separate analysis of population differentiation among >1,700 B. pseudomallei isolates as defined by sequence data from seven housekeeping genes. We analyzed this larger data set for population structure and allele sharing that can be attributed to lateral gene transfer. Our results suggest that despite an almost panmictic population, we can detect two distinct populations of B. pseudomallei that conform to biogeographic patterns found in many plant and animal species. That is, separation along Wallace's Line, a biogeographic boundary between Southeast Asia and Australia. Conclusion We describe an Australian origin for B. pseudomallei, characterized by a single introduction event into Southeast Asia during a recent glacial period, and variable levels of lateral gene transfer within populations. These patterns provide insights into mechanisms of genetic diversification in B. pseudomallei and its closest relatives, and provide a framework for integrating the traditionally separate fields of population genetics and phylogenetics for other bacterial species with high levels of lateral gene transfer.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with large difference in genome size. The trees from our analyses are in good agreement to the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions The present method provides a new way for recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights on the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses with a much smaller genome size.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The role of ions in the production of atmospheric particles has gained wide interest due to their profound impact on climate. Away from anthropogenic sources, molecules are ionized by alpha radiation from radon exhaled from the ground and cosmic gamma radiation from space. These molecular ions quickly form into ‘cluster ions’, typically smaller than about 1.5 nm. Using our measurements and the published literature, we present evidence to show that cluster ion concentrations in forest areas are consistently higher than outside. Since alpha radiation cannot penetrate more than a few centimetres of soil, radon present deep in the ground cannot directly contribute to the measured cluster ion concentrations. We propose an additional mechanism whereby radon, which is water soluble, is brought up by trees and plants through the uptake of groundwater and released into the atmosphere by transpiration. We estimate that, in a forest comprising eucalyptus trees spaced 4m apart, approximately 28% of the radon in the air may be released by transpiration. Considering that 24% of the earth’s land area is still covered in forests; these findings have potentially important implications for atmospheric aerosol formation and climate.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Operation in urban environments creates unique challenges for research in autonomous ground vehicles. Due to the presence of tall trees and buildings in close proximity to traversable areas, GPS outage is likely to be frequent and physical hazards pose real threats to autonomous systems. In this paper, we describe a novel autonomous platform developed by the Sydney-Berkeley Driving Team for entry into the 2007 DARPA Urban Challenge competition. We report empirical results analyzing the performance of the vehicle while navigating a 560-meter test loop multiple times in an actual urban setting with severe GPS outage. We show that our system is robust against failure of global position estimates and can reliably traverse standard two-lane road networks using vision for localization. Finally, we discuss ongoing efforts in fusing vision data with other sensing modalities.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0–1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function—that it satisfies a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise, and show that in this case, strictly convex loss functions lead to faster rates of convergence of the risk than would be implied by standard uniform convergence arguments. Finally, we present applications of our results to the estimation of convergence rates in function classes that are scaled convex hulls of a finite-dimensional base class, with a variety of commonly used loss functions.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space - classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data this gives a powerful transductive algorithm -using the labeled part of the data one can learn an embedding also for the unlabeled part. The similarity between test points is inferred from training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Furthermore, this approach leads directly to a convex method for learning the 2-norm soft margin parameter in support vector machines, solving an important open problem.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexity of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks and support vector machines.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present new expected risk bounds for binary and multiclass prediction, and resolve several recent conjectures on sample compressibility due to Kuzmin and Warmuth. By exploiting the combinatorial structure of concept class F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy. The key step in their proof is a d = VC(F) bound on the graph density of a subgraph of the hypercube—oneinclusion graph. The first main result of this paper is a density bound of n [n−1 <=d-1]/[n <=d] < d, which positively resolves a conjecture of Kuzmin and Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved one-inclusion mistake bound. The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is an algebraic topological property of maximum classes of VC-dimension d as being d contractible simplicial complexes, extending the well-known characterization that d = 1 maximum classes are trees. We negatively resolve a minimum degree conjecture of Kuzmin and Warmuth—the second part to a conjectured proof of correctness for Peeling—that every class has one-inclusion minimum degree at most its VCdimension. Our final main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This result improves on known PAC-based expected risk bounds by a factor of O(logn) and is shown to be optimal up to an O(logk) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.