820 results for Euclidean distance
Abstract:
Spam and phishing scams are forms of Internet abuse that have grown considerably in recent years. These abuses affect users' e-mail routines and the Internet communication infrastructure. This paper proposes a new message-filtering model based on Euclidean distance, and also surveys the containment methodologies currently in widest use. The new filter is based on the frequency distribution of the characters present in a message's content and on signature generation. An architecture to combat phishing scams and spam is proposed, contributing to the containment of attempted fraud by e-mail.
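A minimal sketch of the kind of signature comparison the abstract describes, assuming a character-frequency representation and an illustrative threshold (function names and the cutoff value are hypothetical, not taken from the paper):

```python
# Hypothetical sketch: compare a message's character-frequency signature
# against a stored spam signature using Euclidean distance.
import math
from collections import Counter

def char_frequency_signature(text: str) -> dict:
    """Relative frequency of each character in the message body."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def euclidean_distance(sig_a: dict, sig_b: dict) -> float:
    """Euclidean distance between two frequency signatures."""
    chars = set(sig_a) | set(sig_b)
    return math.sqrt(sum((sig_a.get(c, 0.0) - sig_b.get(c, 0.0)) ** 2 for c in chars))

# A message is flagged when its signature is close to a stored spam signature.
spam_signature = char_frequency_signature("click here to claim your free prize now")
incoming = char_frequency_signature("claim your free prize, click now")
THRESHOLD = 0.05  # illustrative cutoff, not from the paper
is_suspect = euclidean_distance(spam_signature, incoming) < THRESHOLD
```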
Abstract:
In this work, we study a version of the general question of how well a Haar-distributed orthogonal matrix can be approximated by a random Gaussian matrix. Here, we consider a Gaussian random matrix (Formula presented.) of order n and apply to it the Gram–Schmidt orthonormalization procedure by columns to obtain a Haar-distributed orthogonal matrix (Formula presented.). If (Formula presented.) denotes the vector formed by the first m-coordinates of the ith row of (Formula presented.) and (Formula presented.), our main result shows that the Euclidean norm of (Formula presented.) converges exponentially fast to (Formula presented.), up to negligible terms. To show the extent of this result, we use it to study the convergence of the supremum norm (Formula presented.) and we find a coupling that improves by a factor (Formula presented.) the recently proved best known upper bound on (Formula presented.). Our main result also has applications in Quantum Information Theory.
Abstract:
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
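As a point of reference, the weighted Euclidean distance being fitted can be written in a few lines; the weights here are fixed placeholders rather than the output of the majorization step described in the abstract:

```python
# Minimal sketch of a weighted Euclidean distance between two individuals
# (rows of a cases-by-variables matrix); the nonnegative weights w_j are
# illustrative, whereas in the paper they are estimated by majorization.
import numpy as np

def weighted_euclidean(x: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """sqrt(sum_j w_j * (x_j - y_j)^2) with nonnegative weights w_j."""
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

x = np.array([1.2, 0.5, 3.0])
y = np.array([0.8, 1.1, 2.4])
w = np.array([0.5, 2.0, 1.0])   # illustrative nonnegative variable weights
print(weighted_euclidean(x, y, w))
```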
Abstract:
The p-median model is used to locate P facilities to serve a geographically distributed population. Conventionally, it is assumed that the population patronizes the nearest facility and that the distance between a resident and a facility can be measured by the Euclidean distance. Carling, Han, and Håkansson (2012) compared two network distances with the Euclidean distance in a rural region with a sparse, heterogeneous network and a non-symmetric distribution of the population. For a coarse network and small P, they found, in contrast to the literature, the Euclidean distance to be problematic. In this paper we extend their work by using a refined network and systematically study the case where P varies in size (2-100 facilities). We find that the network distance gives as good a solution as the travel-time network. The Euclidean distance gives solutions some 2-7 per cent worse than the network distances, and the solutions deteriorate with increasing P. Our conclusions extend to intra-urban location problems.
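For illustration only, a sketch of the p-median objective being compared, i.e. the total demand-weighted distance from residents to their nearest open facility; the distance matrix can be Euclidean, network, or travel-time, and the data below are synthetic:

```python
# Illustrative sketch (not the authors' code): evaluate a p-median objective
# so that Euclidean and network distances can be compared on the same solution.
import numpy as np

def p_median_cost(dist: np.ndarray, demand: np.ndarray, facilities: list) -> float:
    """dist[i, j]: distance from resident i to candidate site j;
    facilities: indices of the P open sites; each resident uses the nearest one."""
    nearest = dist[:, facilities].min(axis=1)
    return float(np.sum(demand * nearest))

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))          # resident locations
euclidean = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
demand = rng.integers(1, 20, size=50)
print(p_median_cost(euclidean, demand, facilities=[3, 17, 42]))  # P = 3 example
```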
Abstract:
We develop spatial statistical models for stream networks that can estimate relationships between a response variable and other covariates, make predictions at unsampled locations, and predict an average or total for a stream or a stream segment. There have been very few attempts to develop valid spatial covariance models that incorporate flow, stream distance, or both. Typical spatial autocovariance functions based on Euclidean distance, such as the spherical covariance model, are not valid when using stream distance. In this paper we develop a large class of valid models that incorporate flow and stream distance by using spatial moving averages. These methods integrate a moving average function, or kernel, against a white noise process. By running the moving average function upstream from a location, we develop models that use flow, and by construction they are valid models based on stream distance. We show that with proper weighting, many of the usual spatial models based on Euclidean distance have a counterpart for stream networks. Using sulfate concentrations from an example data set, the Maryland Biological Stream Survey (MBSS), we show that models using flow may be more appropriate than models that only use stream distance. For the MBSS data set, we use restricted maximum likelihood to fit a valid covariance matrix that uses flow and stream distance, and then we use this covariance matrix to estimate fixed effects and make kriging and block kriging predictions.
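For reference, the spherical covariance model mentioned above, written as a function of (Euclidean) separation distance; the sill and range parameters are illustrative:

```python
# Sketch of the spherical covariance function, which is valid for Euclidean
# distance but not, in general, when applied to stream (network) distance.
import numpy as np

def spherical_cov(h: np.ndarray, sill: float = 1.0, range_: float = 10.0) -> np.ndarray:
    """C(h) = sill * (1 - 1.5*h/r + 0.5*(h/r)^3) for h < r, and 0 beyond the range."""
    h = np.asarray(h, dtype=float)
    c = sill * (1 - 1.5 * h / range_ + 0.5 * (h / range_) ** 3)
    return np.where(h < range_, c, 0.0)

print(spherical_cov(np.array([0.0, 5.0, 12.0])))
```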
Abstract:
The Iterative Closest Point algorithm (ICP) is commonly used in engineering applications to solve the rigid registration problem of partially overlapping point sets that are pre-aligned with a coarse estimate of their relative positions. This iterative algorithm is applied in many areas, such as medicine, for volumetric reconstruction of tomography data; in robotics, to reconstruct surfaces or scenes from range-sensor data; in industrial systems, for quality control of manufactured objects; and even in biology, to study the structure and folding of proteins. One of the algorithm's main problems is its high computational complexity (quadratic in the number of points for the non-optimized original variant) in a context where high-density point sets, acquired by high-resolution scanners, are processed. Many variants have been proposed in the literature that aim to improve performance, either by reducing the number of points or the required iterations, or by lowering the complexity of the most expensive phase: the closest-neighbor search. Despite decreasing the complexity, some of these variants tend to have a negative impact on the final registration precision or on the convergence domain, thus limiting the possible application scenarios. The goal of this work is to reduce the algorithm's computational cost so that a wider range of the computationally demanding problems described above can be addressed. For that purpose, an experimental and mathematical convergence analysis and validation of point-to-point distance metrics has been performed, considering distances with a lower computational cost than the Euclidean distance, which is the de facto standard in the algorithm's implementations in the literature. In that analysis, the behavior of the algorithm in diverse topological spaces, characterized by different metrics, has been studied to check the convergence, efficacy, and cost of the method and to determine the metric that offers the best results. Given that the distance calculation represents a significant part of the computations performed by the algorithm, any reduction in the cost of that operation is expected to significantly and positively affect the overall performance of the method. As a result, a performance improvement has been achieved by applying these reduced-cost metrics, whose quality in terms of convergence and error has been experimentally analyzed and validated as comparable to the Euclidean distance, using a heterogeneous set of objects, scenarios, and initial situations.
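A hedged sketch of the idea of swapping the point-to-point metric in the costly closest-neighbor search, using brute-force matching on synthetic points; this is not the thesis implementation, only an illustration of pluggable metrics:

```python
# Sketch: single ICP correspondence step with a pluggable point-to-point metric,
# illustrating the replacement of the Euclidean distance by cheaper alternatives
# such as Manhattan or Chebyshev.
import numpy as np

def euclidean(a, b):  return np.linalg.norm(a - b, axis=-1)
def manhattan(a, b):  return np.abs(a - b).sum(axis=-1)
def chebyshev(a, b):  return np.abs(a - b).max(axis=-1)

def closest_points(source: np.ndarray, target: np.ndarray, metric=euclidean) -> np.ndarray:
    """For each source point, return the index of the closest target point
    under the chosen metric (brute force, as in the non-optimized ICP)."""
    d = metric(source[:, None, :], target[None, :, :])   # pairwise distances
    return d.argmin(axis=1)

src = np.random.rand(100, 3)
tgt = np.random.rand(120, 3)
idx_cheap = closest_points(src, tgt, metric=manhattan)   # reduced-cost metric
idx_ref = closest_points(src, tgt, metric=euclidean)     # reference metric
```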
Abstract:
Objectives: The aim of this work was to verify the differentiation between normal and pathological human carotid artery tissues by using fluorescence and reflectance spectroscopy in the 400- to 700-nm range and the spectral characterization by means of principal components analysis. Background Data: Atherosclerosis is the most common and serious pathology of the cardiovascular system. Principal components represent the main spectral characteristics that occur within the spectral data and could be used for tissue classification. Materials and Methods: Sixty postmortem carotid artery fragments (26 non-atherosclerotic and 34 atherosclerotic with non-calcified plaques) were studied. The excitation radiation consisted of a 488-nm argon laser. Two 600-μm core optical fibers were used, one for excitation and one to collect the fluorescence radiation from the samples. The reflectance system was composed of a halogen lamp coupled to an excitation fiber positioned in one of the ports of an integrating sphere that delivered 5 mW to the sample. The photo-reflectance signal was coupled to a 1/4-m spectrograph via an optical fiber. Euclidean distance was then used to classify each principal component score into one of two classes, normal and atherosclerotic tissue, for both fluorescence and reflectance. Results: The principal components analysis allowed classification of the samples with 81% sensitivity and 88% specificity for fluorescence, and 81% sensitivity and 91% specificity for reflectance. Conclusions: Our results showed that principal components analysis could be applied to differentiate between normal and atherosclerotic tissue with high sensitivity and specificity.
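A minimal sketch of the classification step as described, assuming each sample is reduced to a vector of principal-component scores and assigned to the class whose mean score is nearest in Euclidean distance (the mean scores below are made up):

```python
# Minimal sketch (assumed details) of the classification step: each spectrum's
# principal-component score is assigned to the class whose mean score is
# closest in Euclidean distance.
import numpy as np

def classify_by_distance(score: np.ndarray, class_means: dict) -> str:
    """Return the label of the class mean nearest to the PC score vector."""
    return min(class_means, key=lambda label: np.linalg.norm(score - class_means[label]))

class_means = {
    "normal": np.array([0.8, -0.1]),          # illustrative mean PC scores
    "atherosclerotic": np.array([-0.5, 0.4]),
}
print(classify_by_distance(np.array([0.6, 0.0]), class_means))  # -> "normal"
```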
Abstract:
In this paper the continuous Verhulst dynamic model is used to synthesize a new distributed power control algorithm (DPCA) for use in direct sequence code division multiple access (DS-CDMA) systems. The Verhulst model was originally designed to describe the population growth of biological species under food and physical space restrictions. The discretization of the corresponding differential equation is accomplished via the Euler numerical integration (ENI) method. Analytical convergence conditions for the proposed DPCA are also established. Several properties of the proposed recursive algorithm, such as the Euclidean distance from the optimum vector after convergence, convergence speed, normalized mean squared error (NSE), average power consumption per user, performance under dynamic channels, and implementation complexity, are analyzed through simulations. The simulation results are compared with two other DPCAs: the classic algorithm derived by Foschini and Miljanic and the sigmoidal algorithm of Uykan and Koivo. Under estimation-error conditions, the proposed DPCA exhibits a smaller discrepancy from the optimum power vector solution and better convergence (under both fixed and adaptive convergence factors) than the classic and sigmoidal DPCAs.
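For context, a sketch of the Euler discretization of the continuous Verhulst (logistic) equation that the algorithm builds on; the actual DPCA update couples each user's power to its measured interference, which is not reproduced here:

```python
# Sketch of the Euler discretization of dp/dt = r * p * (1 - p / K),
# the Verhulst model underlying the proposed DPCA (illustrative parameters).
def verhulst_euler(p0: float, r: float, K: float, h: float, steps: int) -> list:
    """Explicit Euler integration: p_{k+1} = p_k + h * r * p_k * (1 - p_k / K)."""
    p = [p0]
    for _ in range(steps):
        p.append(p[-1] + h * r * p[-1] * (1 - p[-1] / K))
    return p

trajectory = verhulst_euler(p0=0.1, r=1.0, K=1.0, h=0.1, steps=50)
print(trajectory[-1])  # converges toward the carrying capacity K
```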
Abstract:
Studies on the feeding habits of aquatic organisms are a requirement for the management and sustainable use of marine ecosystems. The aim of the present research was to analyze the habits and trophic similarities of decapods, starfish and fish in order to propose trophic relationships between taxa, using Hennigian methods of phylogenetic systematics. This new grouping hypothesis, based on shared and exclusive food items and food types, corresponds to the broad taxonomic groups used in the analysis. Our results indicate that algae, Mollusca, Polychaeta, Crustacea, Echinodermata and Actinopterygii are the most exploited common resources among the species studied. Starfish were differentiated from other organisms for being stenophagic, and were grouped for feeding on bivalve mollusks. A larger group of fish and crustaceans shares algae and mainly crustaceans as food items. A third group united all eight species of Actinopterygii. This largest subgroup of fish is typically carnivorous, feeding on Anthozoa and a great quantity of Crustacea. Synodus foetens has a special position among fishes, due to its unique feeding on nematodes. A Euclidean distance dendrogram obtained in a previous publication grouped S. foetens with starfish. That result was based on a few non-exclusive shared similarities in feeding modes, as well as on shared absences of items, which are not an adequate grouping factor. Starfish are stenophagic, eating bivalves almost exclusively. Synodus foetens and Isopisthus parvipinnis have restricted food items, and are thus intermediary in relation to starfish, decapods, and other fish, which are euryphagous. The trophic cladogram displays details of food items, whether or not shared by all species. The resulting trophic analysis is consistent with known historical relationships.
Abstract:
Dissertation to Obtain the Degree of Master in Biomedical Engineering
Abstract:
The objective of this dissertation was to study a set of companies listed on the Lisbon stock exchange, in order to identify those that behave similarly over time. For this purpose we used clustering algorithms such as K-Means, PAM, hierarchical models, Funny, and C-Means, with both the Euclidean distance and the Manhattan distance. To select the best number of clusters identified by each of the tested algorithms, we used cluster evaluation/validation indices such as Davies-Bouldin and Calinski-Harabasz, among others.
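A rough sketch, not the dissertation's code, of the workflow described: cluster (synthetic) return series with Euclidean and Manhattan distances and compare candidate numbers of clusters with the Davies-Bouldin index:

```python
# Hedged sketch: K-Means (Euclidean) and hierarchical clustering (Manhattan)
# on synthetic company return series, scored with the Davies-Bouldin index.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(1)
returns = rng.normal(size=(30, 250))          # 30 companies x 250 trading days (synthetic)

for k in range(2, 6):
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(returns)  # Euclidean
    hc_labels = fcluster(linkage(returns, method="average", metric="cityblock"),       # Manhattan
                         t=k, criterion="maxclust")
    print(k, davies_bouldin_score(returns, km_labels), davies_bouldin_score(returns, hc_labels))
```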
Abstract:
Report on a scientific stay at the University of Bern, Switzerland, from March to June 2008. Writer identification consists of determining the writer of a piece of handwriting from a set of known writers. Even though a significant number of compositions contain handwritten text in their music scores, the aim of this work is to use only the music notation to determine the author. Two approaches for writer identification in old handwritten music scores have been developed. The proposed methods extract features from every music line, as well as features from a texture image of music symbols. The music sheet is first preprocessed to obtain a binarized music score without the staff lines. Classification is performed with a k-NN classifier based on Euclidean distance. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving encouraging identification rates.
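An illustrative sketch of the classification stage only, a k-NN classifier with Euclidean distance over feature vectors; the music-specific feature extraction is not reproduced and the data below are synthetic:

```python
# Sketch of k-NN classification with Euclidean distance; the feature vectors
# stand in for the line/texture features described in the report.
import numpy as np

def knn_predict(query: np.ndarray, features: np.ndarray, labels: np.ndarray, k: int = 3):
    """Assign the writer label most frequent among the k nearest training vectors."""
    dists = np.linalg.norm(features - query, axis=1)       # Euclidean distances
    nearest = labels[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

rng = np.random.default_rng(2)
train_features = rng.normal(size=(40, 16))     # synthetic feature vectors
train_writers = np.repeat(np.arange(8), 5)     # 8 writers, 5 samples each
print(knn_predict(rng.normal(size=16), train_features, train_writers, k=3))
```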
Abstract:
Quantitative or algorithmic trading is the automation of investment decisions that obey a fixed or dynamic set of rules to determine trading orders. It has grown to account for up to 70% of the trading volume on some of the biggest financial markets, such as the New York Stock Exchange (NYSE). However, there is not a significant amount of academic literature devoted to it, owing to the private nature of investment banks and hedge funds. This project aims to review the literature and discuss the available models in a subject where publications are scarce and infrequent. We review the basic mathematical concepts needed for modeling financial markets, such as stochastic processes, stochastic integration, and basic models for price and spread dynamics, which are necessary for building quantitative strategies. We also contrast these models with real market data sampled at one-minute frequency from the Dow Jones Industrial Average (DJIA). Quantitative strategies try to exploit two types of behavior: trend following or mean reversion. The former is grouped in the so-called technical models and the latter in so-called pairs trading. Technical models have been dismissed by financial theoreticians, but we show that they can be properly cast as a well-defined scientific predictor if the signal they generate passes the test of being a Markov time; that is, we can tell whether the signal has occurred or not by examining the information up to the current time, or, more technically, if the event is F_t-measurable. On the other hand, the concept of pairs trading, or market-neutral strategy, is fairly simple. However, it can be cast in a variety of mathematical models, ranging from a method based on a simple Euclidean distance, to a co-integration framework, to models involving stochastic differential equations such as the well-known mean-reverting Ornstein-Uhlenbeck equation and its variations. A model for forecasting any economic or financial magnitude could be properly defined with scientific rigor and yet lack any economic value, making it useless from a practical point of view. This is why this project would not be complete without a backtest of the strategies mentioned. Conducting a useful and realistic backtest is by no means a trivial exercise, since the "laws" that govern financial markets are constantly evolving in time. For this reason we place emphasis on the calibration of the strategies' parameters to adapt to the given market conditions. We find that the parameters of technical models are more volatile than their counterparts from market-neutral strategies, and that calibration must be done at high sampling frequency to constantly track the current market situation. As a whole, the goal of this project is to provide an overview of a quantitative approach to investment, reviewing basic strategies and illustrating them by means of a backtest with real financial market data. The data sources used in this project are Bloomberg for intraday time series and Yahoo! for daily prices. All numerical computations and graphics used and shown in this project were implemented in MATLAB from scratch as part of this thesis. No other mathematical or statistical software was used.
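A hedged sketch of the simple Euclidean-distance criterion for pair selection mentioned above: normalize each price path and rank candidate pairs by the distance between the paths (tickers and data are synthetic):

```python
# Sketch of the distance method for pairs trading: smaller Euclidean distance
# between normalized price paths = stronger mean-reversion candidate.
import itertools
import numpy as np

def normalized_path(prices: np.ndarray) -> np.ndarray:
    """Rescale a price series to start at 1 (cumulative return path)."""
    return prices / prices[0]

def pair_distance(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.linalg.norm(normalized_path(p) - normalized_path(q)))

rng = np.random.default_rng(3)
prices = {t: 100 * np.cumprod(1 + rng.normal(0, 0.01, 250)) for t in ["AAA", "BBB", "CCC"]}
ranked = sorted(itertools.combinations(prices, 2),
                key=lambda pair: pair_distance(prices[pair[0]], prices[pair[1]]))
print(ranked[0])   # the closest pair under the Euclidean distance criterion
```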
Abstract:
A statistical methodology for the objective comparison of LDI-MS mass spectra of blue gel pen inks was evaluated. Thirty-three blue gel pen inks previously studied by Raman spectroscopy were analyzed directly on the paper in both positive and negative mode. The resulting mass spectra were first compared using the relative areas of selected peaks, with the Pearson correlation coefficient and the Euclidean distance. Intra-variability among results from one ink and inter-variability between results from different inks were compared in order to choose a differentiation threshold that minimizes the rate of false negatives (i.e., avoiding false differentiation of the inks). This yielded a discriminating power (DP) of up to 77% for analyses made in the negative mode. The whole mass spectra were then compared using the same methodology, giving a better DP of 92% in the negative mode with the Pearson correlation on standardized data. The positive-mode results generally yielded a lower DP than the negative mode, owing to a higher intra-variability relative to the inter-variability in the mass spectra of the ink samples.
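A minimal sketch, under an assumed data layout, of the comparison step: two spectra represented as peak-area vectors on a common m/z grid, compared by Pearson correlation and Euclidean distance against an illustrative differentiation threshold:

```python
# Sketch: compare two mass spectra (vectors of relative peak areas) with the
# Pearson correlation and the Euclidean distance; threshold value is illustrative.
import numpy as np

def compare_spectra(a: np.ndarray, b: np.ndarray):
    pearson = float(np.corrcoef(a, b)[0, 1])
    euclid = float(np.linalg.norm(a - b))
    return pearson, euclid

rng = np.random.default_rng(4)
ink_1 = rng.random(200)                      # relative areas on a common m/z grid
ink_2 = ink_1 + rng.normal(0, 0.05, 200)     # same ink, replicate measurement
r, d = compare_spectra(ink_1, ink_2)
DIFFERENTIATION_THRESHOLD = 0.95             # illustrative, chosen to limit false negatives
print("different inks" if r < DIFFERENTIATION_THRESHOLD else "not differentiated")
```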
Abstract:
The complex relationship between structural and functional connectivity, as measured by noninvasive imaging of the human brain, poses many unresolved challenges and open questions. Here, we apply analytic measures of network communication to the structural connectivity of the human brain and explore the capacity of these measures to predict resting-state functional connectivity across three independently acquired datasets. We focus on the layout of shortest paths across the network and on two communication measures, search information and path transitivity, which account for how these paths are embedded in the rest of the network. Search information is an existing measure of the information needed to access or trace shortest paths; we introduce path transitivity to measure the density of local detours along the shortest path. We find that both search information and path transitivity predict the strength of functional connectivity among both connected and unconnected node pairs. They do so at levels that match or significantly exceed path length measures, Euclidean distance, as well as computational models of neural dynamics. This capacity suggests that dynamic couplings due to interactions among neural elements in brain networks are substantially influenced by the broader network context adjacent to the shortest communication pathways.
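For illustration (not the paper's pipeline), a sketch of comparing two simple predictors of functional connectivity, shortest-path length and Euclidean distance between node coordinates, on a toy network with placeholder functional-connectivity values:

```python
# Sketch: correlate shortest-path length and Euclidean distance with (placeholder)
# functional connectivity; search information and path transitivity themselves
# are defined in the paper and not reproduced here.
import numpy as np
import networkx as nx

rng = np.random.default_rng(5)
n = 20
coords = rng.random((n, 3))                          # toy node positions
G = nx.erdos_renyi_graph(n, 0.3, seed=5)             # toy structural network
spl = dict(nx.all_pairs_shortest_path_length(G))

pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if j in spl[i]]
path_len = np.array([spl[i][j] for i, j in pairs], dtype=float)
eucl = np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in pairs])
fc = rng.random(len(pairs))                          # placeholder functional connectivity
print(np.corrcoef(path_len, fc)[0, 1], np.corrcoef(eucl, fc)[0, 1])
```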