808 resultados para Distance metric


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The main objective of this letter is to formulate a new approach of learning a Mahalanobis distance metric for nearest neighbor regression from a training sample set. We propose a modified version of the large margin nearest neighbor metric learning method to deal with regression problems. As an application, the prediction of post-operative trunk 3-D shapes in scoliosis surgery using nearest neighbor regression is described. Accuracy of the proposed method is quantitatively evaluated through experiments on real medical data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The continuous growth of peer-to-peer networks has made them responsible for a considerable portion of the current Internet traffic. For this reason, improvements in P2P network resources usage are of central importance. One effective approach for addressing this issue is the deployment of locality algorithms, which allow the system to optimize the peers` selection policy for different network situations and, thus, maximize performance. To date, several locality algorithms have been proposed for use in P2P networks. However, they usually adopt heterogeneous criteria for measuring the proximity between peers, which hinders a coherent comparison between the different solutions. In this paper, we develop a thoroughly review of popular locality algorithms, based on three main characteristics: the adopted network architecture, distance metric, and resulting peer selection algorithm. As result of this study, we propose a novel and generic taxonomy for locality algorithms in peer-to-peer networks, aiming to enable a better and more coherent evaluation of any individual locality algorithm.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Thesis submitted in the fulfillment of the requirements for the Degree of Master in Biomedical Engineering

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose using the affinity propagation (AP) clustering algorithm for detecting multiple disjoint shoals, and we present an extension of AP, denoted by STAP, that can be applied to shoals that fusion and fission across time. STAP incorporates into AP a soft temporal constraint that takes cluster dynamics into account, encouraging partitions obtained at successive time steps to be consistent with each other. We explore how STAP performs under different settings of its parameters (strength of the temporal constraint, preferences, and distance metric) by applying the algorithm to simulated sequences of collective coordinated motion. We study the validity of STAP by comparing its results to partitioning of the same data obtained from human observers in a controlled experiment. We observe that, under specific circumstances, AP yields partitions that agree quite closely with the ones made by human observers. We conclude that using the STAP algorithm with appropriate parameter settings is an appealing approach for detecting shoal fusion-fission dynamics.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tässä työssä johdetaan lineaarimuunnoksella CIE x y z-värinsovitusfunktioista uudet värinsovitusfunktiot. Tarvittava muunnosmatriisi etsitään optimoimalla CIE ja BFD-RIT värieroellipsejä Matlab-ympäristössä. Työn tuloksena saatiin muunnosmatriisi, ja sillä muunnetut uudet värinsovitusfunktiot ja CIELAB-tyyppinen väriavaruus. Euklidisella etäisyydellä mitattuna CIE ja BFD-RIT värieroellipsien muoto ja koko paranivat noin kolmanneksen, mikä oli myös tavoitteena.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A new structure of Radial Basis Function (RBF) neural network called the Dual-orthogonal RBF Network (DRBF) is introduced for nonlinear time series prediction. The hidden nodes of a conventional RBF network compare the Euclidean distance between the network input vector and the centres, and the node responses are radially symmetrical. But in time series prediction where the system input vectors are lagged system outputs, which are usually highly correlated, the Euclidean distance measure may not be appropriate. The DRBF network modifies the distance metric by introducing a classification function which is based on the estimation data set. Training the DRBF networks consists of two stages. Learning the classification related basis functions and the important input nodes, followed by selecting the regressors and learning the weights of the hidden nodes. In both cases, a forward Orthogonal Least Squares (OLS) selection procedure is applied, initially to select the important input nodes and then to select the important centres. Simulation results of single-step and multi-step ahead predictions over a test data set are included to demonstrate the effectiveness of the new approach.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A new database of weather and circulation type catalogs is presented comprising 17 automated classification methods and five subjective classifications. It was compiled within COST Action 733 "Harmonisation and Applications of Weather Type Classifications for European regions" in order to evaluate different methods for weather and circulation type classification. This paper gives a technical description of the included methods using a new conceptual categorization for classification methods reflecting the strategy for the definition of types. Methods using predefined types include manual and threshold based classifications while methods producing types derived from the input data include those based on eigenvector techniques, leader algorithms and optimization algorithms. In order to allow direct comparisons between the methods, the circulation input data and the methods' configuration were harmonized for producing a subset of standard catalogs of the automated methods. The harmonization includes the data source, the climatic parameters used, the classification period as well as the spatial domain and the number of types. Frequency based characteristics of the resulting catalogs are presented, including variation of class sizes, persistence, seasonal and inter-annual variability as well as trends of the annual frequency time series. The methodological concept of the classifications is partly reflected by these properties of the resulting catalogs. It is shown that the types of subjective classifications compared to automated methods show higher persistence, inter-annual variation and long-term trends. Among the automated classifications optimization methods show a tendency for longer persistence and higher seasonal variation. However, it is also concluded that the distance metric used and the data preprocessing play at least an equally important role for the properties of the resulting classification compared to the algorithm used for type definition and assignment.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We consider the problem of illusory or artefactual structure from the visualisation of high-dimensional structureless data. In particular we examine the role of the distance metric in the use of topographic mappings based on the statistical field of multidimensional scaling. We show that the use of a squared Euclidean metric (i.e. the SSTRESs measure) gives rise to an annular structure when the input data is drawn from a high-dimensional isotropic distribution, and we provide a theoretical justification for this observation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We analyze an approach to a similarity preserving coding of symbol sequences based on neural distributed representations and show that it can be viewed as a metric embedding process.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Internet has become an integral part of our nation’s critical socio-economic infrastructure. With its heightened use and growing complexity however, organizations are at greater risk of cyber crimes. To aid in the investigation of crimes committed on or via the Internet, a network forensics analysis tool pulls together needed digital evidence. It provides a platform for performing deep network analysis by capturing, recording and analyzing network events to find out the source of a security attack or other information security incidents. Existing network forensics work has been mostly focused on the Internet and fixed networks. But the exponential growth and use of wireless technologies, coupled with their unprecedented characteristics, necessitates the development of new network forensic analysis tools. This dissertation fostered the emergence of a new research field in cellular and ad-hoc network forensics. It was one of the first works to identify this problem and offer fundamental techniques and tools that laid the groundwork for future research. In particular, it introduced novel methods to record network incidents and report logged incidents. For recording incidents, location is considered essential to documenting network incidents. However, in network topology spaces, location cannot be measured due to absence of a ‘distance metric’. Therefore, a novel solution was proposed to label locations of nodes within network topology spaces, and then to authenticate the identity of nodes in ad hoc environments. For reporting logged incidents, a novel technique based on Distributed Hash Tables (DHT) was adopted. Although the direct use of DHTs for reporting logged incidents would result in an uncontrollably recursive traffic, a new mechanism was introduced that overcome this recursive process. These logging and reporting techniques aided forensics over cellular and ad-hoc networks, which in turn increased their ability to track and trace attacks to their source. These techniques were a starting point for further research and development that would result in equipping future ad hoc networks with forensic components to complement existing security mechanisms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Internet has become an integral part of our nation's critical socio-economic infrastructure. With its heightened use and growing complexity however, organizations are at greater risk of cyber crimes. To aid in the investigation of crimes committed on or via the Internet, a network forensics analysis tool pulls together needed digital evidence. It provides a platform for performing deep network analysis by capturing, recording and analyzing network events to find out the source of a security attack or other information security incidents. Existing network forensics work has been mostly focused on the Internet and fixed networks. But the exponential growth and use of wireless technologies, coupled with their unprecedented characteristics, necessitates the development of new network forensic analysis tools. This dissertation fostered the emergence of a new research field in cellular and ad-hoc network forensics. It was one of the first works to identify this problem and offer fundamental techniques and tools that laid the groundwork for future research. In particular, it introduced novel methods to record network incidents and report logged incidents. For recording incidents, location is considered essential to documenting network incidents. However, in network topology spaces, location cannot be measured due to absence of a 'distance metric'. Therefore, a novel solution was proposed to label locations of nodes within network topology spaces, and then to authenticate the identity of nodes in ad hoc environments. For reporting logged incidents, a novel technique based on Distributed Hash Tables (DHT) was adopted. Although the direct use of DHTs for reporting logged incidents would result in an uncontrollably recursive traffic, a new mechanism was introduced that overcome this recursive process. These logging and reporting techniques aided forensics over cellular and ad-hoc networks, which in turn increased their ability to track and trace attacks to their source. These techniques were a starting point for further research and development that would result in equipping future ad hoc networks with forensic components to complement existing security mechanisms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Obnoxious single facility location models are models that have the aim to find the best location for an undesired facility. Undesired is usually expressed in relation to the so-called demand points that represent locations hindered by the facility. Because obnoxious facility location models as a rule are multimodal, the standard techniques of convex analysis used for locating desirable facilities in the plane may be trapped in local optima instead of the desired global optimum. It is assumed that having more optima coincides with being harder to solve. In this thesis the multimodality of obnoxious single facility location models is investigated in order to know which models are challenging problems in facility location problems and which are suitable for site selection. Selected for this are the obnoxious facility models that appear to be most important in literature. These are the maximin model, that maximizes the minimum distance from demand point to the obnoxious facility, the maxisum model, that maximizes the sum of distance from the demand points to the facility and the minisum model, that minimizes the sum of damage of the facility to the demand points. All models are measured with the Euclidean distances and some models also with the rectilinear distance metric. Furthermore a suitable algorithm is selected for testing multimodality. Of the tested algorithms in this thesis, Multistart is most appropriate. A small numerical experiment shows that Maximin models have on average the most optima, of which the model locating an obnoxious linesegment has the most. Maximin models have few optima and are thus not very hard to solve. From the Minisum models, the models that have the most optima are models that take wind into account. In general can be said that the generic models have less optima than the weighted versions. Models that are measured with the rectilinear norm do have more solutions than the same models measured with the Euclidean norm. This can be explained for the maximin models in the numerical example because the shape of the norm coincides with a bound of the feasible area, so not all solutions are different optima. The difference found in number of optima of the Maxisum and Minisum can not be explained by this phenomenon.