953 results for Euclidean distance model


Relevance:

100.00%

Publisher:

Abstract:

Spam and phishing scams are among the forms of Internet abuse that have grown considerably in recent years. These abuses affect users' e-mail routines and the Internet's communication infrastructure. This paper therefore proposes a new message-filtering model based on Euclidean distance, in addition to surveying the containment methodologies currently in wider use. The new filtering model, based on the frequency distribution of the characters present in a message's content and on signature generation, is described. An architecture to combat phishing scams and spam is proposed in order to contribute to the containment of attempted fraud by e-mail.
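
As a rough illustration of the idea (not the paper's actual implementation), a filter of this kind can compare the normalized character-frequency vector of an incoming message against stored signatures; the signature, message and threshold below are all hypothetical:

```python
from collections import Counter
import math
import string

def char_frequency_vector(text: str) -> list[float]:
    """Normalized frequency of each printable ASCII character in the text."""
    counts = Counter(text)
    total = max(len(text), 1)
    return [counts.get(c, 0) / total for c in string.printable]

def euclidean_distance(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical usage: flag a message whose character distribution is
# close to the signature of a known spam campaign.
spam_signature = char_frequency_vector("WIN A FREE PRIZE!!! CLICK HERE NOW!!!")
incoming = "Win a free prize! Click here now!"
THRESHOLD = 0.05  # illustrative value; would be tuned on labeled mail
if euclidean_distance(char_frequency_vector(incoming), spam_signature) < THRESHOLD:
    print("message matches a known spam signature")
```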

Relevance:

100.00%

Publisher:

Abstract:

A Work Project, presented as part of the requirements for the award of a Master's degree in Management from the NOVA – School of Business and Economics

Relevance:

100.00%

Publisher:

Abstract:

2010 Mathematics Subject Classification: 62P15.

Relevance:

100.00%

Publisher:

Abstract:

In this work, we study a version of the general question of how well a Haar-distributed orthogonal matrix can be approximated by a random Gaussian matrix. Here, we consider a Gaussian random matrix G of order n and apply to it the Gram–Schmidt orthonormalization procedure by columns to obtain a Haar-distributed orthogonal matrix Γ. If g_i^(m) and γ_i^(m) denote the vectors formed by the first m coordinates of the ith rows of G and Γ, respectively, our main result shows that the Euclidean norm of the difference √n γ_i^(m) − g_i^(m) converges exponentially fast to an explicit limit, up to negligible terms. To show the extent of this result, we use it to study the convergence of the corresponding supremum norm, and we find a coupling that improves by an explicit factor the recently proved best known upper bound on that norm. Our main result also has applications in Quantum Information Theory.
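
A quick numerical experiment makes the coupling concrete; this sketch assumes the comparison is between the rows of G and the √n-scaled rows of the Gram-Schmidt output, a scaling borrowed from the related coupling literature, and the printed statistics are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 10

# Gaussian matrix G with i.i.d. N(0, 1) entries.
G = rng.standard_normal((n, n))

# QR with the sign convention diag(R) > 0 reproduces column-wise
# Gram-Schmidt, so Gamma is Haar-distributed and coupled to G.
Q, R = np.linalg.qr(G)
Gamma = Q * np.sign(np.diag(R))

# Euclidean norm, per row, of the difference between the first m
# coordinates of the rows of G and of sqrt(n) * Gamma.
diffs = np.linalg.norm(np.sqrt(n) * Gamma[:, :m] - G[:, :m], axis=1)
print(f"mean row distance: {diffs.mean():.3f}, max: {diffs.max():.3f}")
```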

Relevance:

100.00%

Publisher:

Abstract:

India is the largest producer and processor of cashew in the world; the export value of cashew was about Rupees 2,600 crore during 2004-05. Kerala is the main processing and exporting center of cashew, and within Kerala most of the cashew processing factories are located in Kollam district. Since the industry provides a livelihood for about 6-7 lakh employees and farmers, it is of national importance. In Kollam district alone there are more than 2.5 lakh employees directly involved in the industry, which comes to about 10 per cent of the district's population, of whom 95 per cent are women workers. Any amount received by a woman worker tends to be utilized directly for the benefit of the family, so the link to family welfare is quite clear. Even though the Government of Kerala has incorporated the Kerala State Cashew Development Corporation (KSCDC) and the Kerala State Cashew Workers Apex Industrial Co-operative Society (CAPEX) to develop the cashew industry, the industry and its ancillary industries did not grow as expected. In this context, an attempt has been made to analyze the problems and potential of the industry so as to make it viable and sustainable for perpetual employment and income generation as well as for the overall development of Kollam district.

Relevance:

100.00%

Publisher:

Abstract:

In this paper the continuous Verhulst dynamic model is used to synthesize a new distributed power control algorithm (DPCA) for use in direct sequence code division multiple access (DS-CDMA) systems. The Verhulst model was initially designed to describe the population growth of biological species under food and physical space restrictions. The discretization of the corresponding differential equation is accomplished via the Euler numeric integration (ENI) method. Analytical convergence conditions for the proposed DPCA are also established. Several properties of the proposed recursive algorithm, such as the Euclidean distance from the optimum vector after convergence, convergence speed, normalized mean squared error (NSE), average power consumption per user, performance under dynamic channels, and implementation complexity, are analyzed through simulations. The simulation results are compared with two other DPCAs: the classic algorithm derived by Foschini and Miljanic and the sigmoidal algorithm of Uykan and Koivo. Under estimation error conditions, the proposed DPCA exhibits a smaller discrepancy from the optimum power vector solution and better convergence (under fixed and adaptive convergence factors) than the classic and sigmoidal DPCAs. (C) 2010 Elsevier GmbH. All rights reserved.
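
The update rule can be sketched as follows, assuming a Verhulst-type recursion p ← (1 + α)p − α(γ/γ*)p obtained by Euler discretization of a logistic equation driven by the SINR ratio; the gain matrix and all parameters are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4                                   # number of users
H = rng.uniform(0.01, 0.05, (K, K))     # illustrative cross-link gains
np.fill_diagonal(H, rng.uniform(0.5, 1.0, K))
noise = 1e-3
gamma_target = 5.0                      # target SINR (linear scale)
alpha = 0.5                             # convergence factor

p = np.full(K, 1e-2)                    # initial transmit powers
for _ in range(200):
    signal = np.diag(H) * p
    interference = H @ p - signal + noise
    gamma = signal / interference       # current SINR of each user
    # Euler-discretized Verhulst update: power grows logistically
    # until the measured SINR reaches the target.
    p = np.maximum((1 + alpha) * p - alpha * (gamma / gamma_target) * p,
                   1e-12)

signal = np.diag(H) * p
print(np.round(signal / (H @ p - signal + noise), 3))  # ~ gamma_target
```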

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a method based on articulated models for the registration of spine data extracted from multimodal medical images of patients with scoliosis. With the ultimate aim being the development of a complete geometrical model of the torso of a scoliotic patient, this work presents a method for the registration of vertebral column data using 3D magnetic resonance images (MRI) acquired in the prone position and X-ray data acquired in the standing position for five patients with scoliosis. The 3D shape of the vertebrae is estimated from both image modalities for each patient, and an articulated model is used to calculate the intervertebral transformations required to align the vertebrae between the two postures. Euclidean distances between anatomical landmarks are calculated in order to assess the multimodal registration error. Results show a decrease in the Euclidean distance using the proposed method compared to rigid registration, as well as more physically realistic vertebral deformations compared to thin-plate-spline (TPS) registration, thus improving alignment.
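
The landmark-based error assessment reduces to computing Euclidean distances between corresponding 3D points; a minimal sketch with hypothetical landmark coordinates:

```python
import numpy as np

def landmark_registration_error(landmarks_a: np.ndarray,
                                landmarks_b: np.ndarray):
    """Per-landmark Euclidean distances between corresponding 3D points,
    e.g. vertebral landmarks after registration (illustrative helper)."""
    d = np.linalg.norm(landmarks_a - landmarks_b, axis=1)
    return d.mean(), d.max()

# Hypothetical example: 4 corresponding landmarks, in millimetres.
mri = np.array([[0.0, 0.0, 0.0], [10, 0, 0], [0, 10, 0], [0, 0, 10]])
xray = mri + np.random.default_rng(2).normal(0, 1.5, mri.shape)
mean_err, max_err = landmark_registration_error(mri, xray)
print(f"mean error {mean_err:.2f} mm, max error {max_err:.2f} mm")
```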

Relevance:

100.00%

Publisher:

Abstract:

The p-median model is used to locate P facilities to serve a geographically distributed population. Conventionally, it is assumed that the population patronizes the nearest facility and that the distance between a resident and the facility may be measured by the Euclidean distance. Carling, Han, and Håkansson (2012) compared two network distances with the Euclidean distance in a rural region with a sparse, heterogeneous network and a non-symmetric distribution of the population. For a coarse network and small P, they found, in contrast to the literature, the Euclidean distance to be problematic. In this paper we extend their work by using a refined network and systematically study the case where P is of varying size (2-100 facilities). We find that the network distance gives as good a solution as the travel-time network. The Euclidean distance gives solutions some 2-7 per cent worse than the network distances, and the solutions deteriorate with increasing P. Our conclusions extend to intra-urban location problems.
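
A toy comparison of the two metrics can be set up as follows; the network construction (a distance-cutoff graph) and all parameters are illustrative, and the exhaustive search stands in for a proper p-median solver:

```python
import itertools
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
n, P = 40, 3
pts = rng.uniform(0, 10, (n, 2))        # residents, also candidate sites

d_euc = cdist(pts, pts)                 # Euclidean distance matrix
# Illustrative road network: connect points closer than a cutoff and
# take shortest paths, so network distances >= Euclidean distances.
adj = np.where(d_euc < 3.0, d_euc, 0.0)
d_net = shortest_path(adj)

def p_median_cost(d, facilities):
    """Total distance from each resident to the nearest open facility."""
    return d[:, list(facilities)].min(axis=1).sum()

def best_p_median(d):
    """Exhaustive search over site subsets (fine at this toy size)."""
    return min(itertools.combinations(range(n), P),
               key=lambda f: p_median_cost(d, f))

f_euc, f_net = best_p_median(d_euc), best_p_median(d_net)
# Evaluate both solutions under the network metric.
print("network cost, network solution  :", round(p_median_cost(d_net, f_net), 2))
print("network cost, Euclidean solution:", round(p_median_cost(d_net, f_euc), 2))
```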

Relevance:

100.00%

Publisher:

Abstract:

We develop spatial statistical models for stream networks that can estimate relationships between a response variable and other covariates, make predictions at unsampled locations, and predict an average or total for a stream or a stream segment. There have been very few attempts to develop valid spatial covariance models that incorporate flow, stream distance, or both. The application of typical spatial autocovariance functions based on Euclidean distance, such as the spherical covariance model, is not valid when using stream distance. In this paper we develop a large class of valid models that incorporate flow and stream distance by using spatial moving averages. These methods integrate a moving average function, or kernel, against a white noise process. By running the moving average function upstream from a location, we develop models that use flow, and by construction they are valid models based on stream distance. We show that with proper weighting, many of the usual spatial models based on Euclidean distance have a counterpart for stream networks. Using sulfate concentrations from an example data set, the Maryland Biological Stream Survey (MBSS), we show that models using flow may be more appropriate than models that only use stream distance. For the MBSS data set, we use restricted maximum likelihood to fit a valid covariance matrix that uses flow and stream distance, and then we use this covariance matrix to estimate fixed effects and make kriging and block kriging predictions.
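
To give the flavor of such a construction, the sketch below builds an illustrative "tail-up" exponential covariance in which only flow-connected pairs covary and flow weights damp the covariance; the exact kernels and weighting schemes of the paper are not reproduced, and a valid model in general requires weights built from the network's proportional influences:

```python
import numpy as np

def tail_up_exponential(stream_dist, flow_connected, weights,
                        sigma2=1.0, rho=5.0, nugget=0.1):
    """Illustrative tail-up covariance for sites on a stream network:
    flow-connected pairs get sigma2 * sqrt(w) * exp(-d / rho); pairs
    that are not flow-connected get zero covariance."""
    C = sigma2 * np.sqrt(weights) * np.exp(-stream_dist / rho)
    C = np.where(flow_connected, C, 0.0)
    return C + nugget * np.eye(len(C))

# Tiny hypothetical network: sites 0 -> 1 -> 2 along the main stem,
# site 3 on a tributary joining between sites 1 and 2, so site 3 is
# flow-connected to site 2 but not to sites 0 and 1.
d = np.array([[0, 2, 5, 9], [2, 0, 3, 7], [5, 3, 0, 4], [9, 7, 4, 0.]])
conn = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 1], [0, 0, 1, 1]],
                dtype=bool)
w = np.where(conn, 0.5, 0.0)       # illustrative flow weights
np.fill_diagonal(w, 1.0)
print(np.round(tail_up_exponential(d, conn, w), 3))
```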

Relevance:

90.00%

Publisher:

Abstract:

Quantitative or algorithmic trading is the automation of investment decisions obeying a fixed or dynamic set of rules to determine trading orders. It has grown to account for up to 70% of the trading volume in some of the biggest financial markets, such as the New York Stock Exchange (NYSE). However, there is not a significant amount of academic literature devoted to it, owing to the private nature of investment banks and hedge funds. This project aims to review the literature and discuss the available models in a subject where publications are scarce and infrequent. We review the basic and fundamental mathematical concepts needed for modeling financial markets, such as stochastic processes, stochastic integration, and basic models for price and spread dynamics necessary for building quantitative strategies. We also contrast these models with real market data sampled at a one-minute frequency from the Dow Jones Industrial Average (DJIA). Quantitative strategies try to exploit two types of behavior: trend following or mean reversion. The former is grouped into the so-called technical models and the latter into so-called pairs trading. Technical models have been discarded by financial theoreticians, but we show that they can be properly cast as a well-defined scientific predictor if the signal they generate passes the test of being a Markov time. That is, we can tell whether the signal has occurred or not by examining the information up to the current time; or, more technically, if the event is F_t-measurable. On the other hand, the concept of pairs trading, or market-neutral strategy, is fairly simple. However, it can be cast in a variety of mathematical models, ranging from a method based on a simple Euclidean distance, to a co-integration framework, to models involving stochastic differential equations such as the well-known mean-reverting Ornstein-Uhlenbeck equation and its variations. A model for forecasting any economic or financial magnitude could be properly defined with scientific rigor yet lack any economic value and be considered useless from a practical point of view. This is why this project could not be complete without a backtesting of the mentioned strategies. Conducting a useful and realistic backtesting is by no means a trivial exercise, since the "laws" that govern financial markets are constantly evolving in time. This is the reason why we place emphasis on the calibration of the strategies' parameters to the given market conditions. We find that the parameters of technical models are more volatile than their counterparts from market-neutral strategies, and calibration must be done at a high sampling frequency to constantly track the current market situation. As a whole, the goal of this project is to provide an overview of a quantitative approach to investment, reviewing basic strategies and illustrating them by means of a backtesting with real financial market data. The sources of the data used in this project are Bloomberg for intraday time series and Yahoo! for daily prices. All numeric computations and graphics used and shown in this project were implemented in MATLAB from scratch as part of this thesis. No other mathematical or statistical software was used.
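
The Euclidean-distance step of pair selection mentioned above can be sketched as follows, in the spirit of the classical distance method rather than the thesis's exact procedure; the price series are synthetic stand-ins for DJIA constituents:

```python
import itertools
import numpy as np

def select_pairs(prices: dict[str, np.ndarray], top: int = 3):
    """Rank pairs by the Euclidean distance between their normalized
    cumulative price paths (smaller distance = better pair candidate)."""
    norm = {t: p / p[0] for t, p in prices.items()}   # rebase to 1.0
    scores = [(np.linalg.norm(norm[a] - norm[b]), a, b)
              for a, b in itertools.combinations(prices, 2)]
    return sorted(scores)[:top]

# Synthetic price series: a shared market factor plus idiosyncratic noise.
rng = np.random.default_rng(4)
common = np.cumsum(rng.normal(0, 1, 250))
prices = {f"STK{i}": 100 * np.exp(0.01 * (common +
          np.cumsum(rng.normal(0, 0.5, 250)))) for i in range(6)}
for dist, a, b in select_pairs(prices):
    print(f"{a}-{b}: distance {dist:.3f}")
```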

Relevance:

90.00%

Publisher:

Abstract:

We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
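
The weight-estimation step can be approximated in a few lines; for simplicity, this sketch fits the weights by non-negative least squares on squared distances rather than the majorization algorithm described above:

```python
import itertools
import numpy as np
from scipy.optimize import nnls

def estimate_weights(X: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Fit nonnegative variable weights w so that the weighted squared
    Euclidean distances approximate the given squared dissimilarities:
        delta_ij^2 ~ sum_k w_k * (x_ik - x_jk)^2
    (NNLS stand-in for the majorization algorithm in the paper.)"""
    pairs = list(itertools.combinations(range(len(X)), 2))
    A = np.array([(X[i] - X[j]) ** 2 for i, j in pairs])
    b = np.array([delta[i, j] ** 2 for i, j in pairs])
    w, _ = nnls(A, b)
    return w

# Hypothetical data: 20 cases x 4 variables; target dissimilarities are
# city-block distances, which the weighted Euclidean model approximates.
rng = np.random.default_rng(5)
X = rng.normal(size=(20, 4))
delta = np.abs(X[:, None, :] - X[None, :, :]).sum(-1)
print(np.round(estimate_weights(X, delta), 3))
```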

Relevance:

90.00%

Publisher:

Abstract:

Two likelihood ratio (LR) approaches are presented to evaluate the strength of evidence of MDMA tablet comparisons. The first is based on a more 'traditional' comparison of MDMA tablets using distance measures (e.g., the Pearson correlation distance or a Euclidean distance). In this approach, LRs are calculated using the distribution of distances between tablets of the same batch and that between tablets of different batches. The second approach is based on methods used in some other fields of forensic comparison. Here LRs are calculated based on the distribution of values of MDMA tablet characteristics within a specific batch and across all batches. The data used in this paper must be seen as examples to illustrate both methods; in future research the methods can be applied to other and more complex data. In this paper, the methods and their results are discussed, considering their performance in evidence evaluation and several practical aspects. With respect to evidence in favor of the correct hypothesis, the second method proved to be better than the first: the LRs in same-batch comparisons are generally higher than with the first method, and the LRs in different-batch comparisons are generally lower. On the other hand, for operational purposes (where quick information is needed), the first method may be preferred because it is less time-consuming. With this method a model has to be estimated only once in a while, which means that only a few measurements have to be made, whereas with the second method more measurements are needed because a new model has to be estimated each time.
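
The first, distance-based approach can be sketched as follows, assuming kernel density estimates of the same-batch and different-batch distance distributions; all numbers are synthetic:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
# Hypothetical training distances between tablet profiles (e.g. Pearson
# correlation distances): same-batch pairs sit closer than different-batch.
d_same = rng.normal(0.05, 0.02, 500).clip(0)
d_diff = rng.normal(0.30, 0.10, 500).clip(0)

f_same = gaussian_kde(d_same)      # density of same-batch distances
f_diff = gaussian_kde(d_diff)      # density of different-batch distances

def likelihood_ratio(distance: float) -> float:
    """LR = P(distance | same batch) / P(distance | different batches)."""
    return float(f_same(distance) / f_diff(distance))

for d in (0.04, 0.15, 0.40):
    print(f"distance {d:.2f}: LR = {likelihood_ratio(d):.2f}")
```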

Relevance:

90.00%

Publisher:

Abstract:

As a first step, we modeled the structure of an RNA family with a graph grammar in order to identify the sequences that belong to it. Several other modeling methods have been developed, such as stochastic context-free grammars, covariance models, secondary-structure profiles, and constraint networks. These modeling methods are based on the classical secondary structure, whereas our graph grammars are based on nucleotide cyclic motifs. To illustrate our model, we used the E loop of the ribosome, which contains the Sarcin-Ricin motif, widely studied since its discovery by X-ray crystallography in the early 1990s. We built a graph grammar for the structure of the Sarcin-Ricin motif and derived all the sequences that can fold into it. The biological relevance of these sequences was confirmed by a comparison with an alignment of more than 800 bacterial ribosomal sequences. This comparison raised alternative alignments for a few of the sequences, which we supported with predictions of secondary and tertiary structures. Nucleotide cyclic motifs have been observed by members of our laboratory in RNA whose tertiary structure has been solved experimentally. A study of the sequences and tertiary structures of each cycle composing the Sarcin-Ricin structure revealed that the sequence space depends greatly on the interactions among all nucleotides that are close in three-dimensional space, that is, not only between two adjacent base pairs. The number of sequences generated by the graph grammar is smaller than those of methods based on the classical secondary structure. This suggests the importance of context for the relation between sequence and structure, hence the use of a contextual graph grammar, which is more expressive than context-free grammars. The graph grammars we developed take only the tertiary structure into account and neglect the interactions of specific chemical groups with extra-molecular elements, such as other macromolecules or ligands. As a second step, and to account for these interactions, we developed a model that considers the position of chemical groups on the surface of tertiary structures. The hypothesis is that chemical groups at positions that are conserved in sequences predetermined to be active, and that are displaced in sequences inactive for a given function, are more likely to be involved in interactions with factors. Continuing with the example of the E loop, we searched for the groups of this loop that could be involved in interactions with elongation factors. Once the groups are identified, three-dimensional modeling can predict the sequences that correctly position these groups in their tertiary structures. A few models exist to address this problem, such as molecular descriptors, nucleotide adjacency matrices, and thermodynamics-based models. However, all these models use an overly simplified representation of RNA structure, which limits their applicability.
We applied our model to the tertiary structures of a set of variants of a sequence of an instance of the Sarcin-Ricin motif from a bacterial ribosome. Wool's team at the University of Chicago had already studied this instance experimentally by testing the viability of 12 variants; they determined 4 viable and 8 lethal variants. We used this set of 12 sequences to train our model and determined a set of properties essential to their biological function. For each variant in the training set we built tertiary-structure models. We then measured the partial charges of the atoms exposed on the surface and encoded this information in vectors. We used principal component analysis to transform the vectors into a set of uncorrelated variables, called the principal components. Using the weighted Euclidean distance and the nearest-neighbour algorithm, we applied leave-one-out cross-validation to choose the best parameters for predicting the activity of a new sequence by matching it to these principal components. Finally, we confirmed the predictive power of the model on a new set of 8 variants whose viability was verified experimentally in our laboratory. In conclusion, graph grammars make it possible to model the relation between the sequence and the structure of an RNA structural element, such as the E loop containing the Sarcin-Ricin motif of the ribosome. Applications range from correcting and assisting sequence alignment to designing sequences with a predetermined structure. We also developed a model to account for the specific interactions tied to a given biological function, namely with surrounding factors. Our model is based on the conservation of the exposure of the chemical groups involved in these interactions. This model allowed us to predict the biological activity of a set of variants of the ribosomal E loop that binds elongation factors.
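
The parameter-selection step (weighted Euclidean distance, nearest neighbour, leave-one-out cross-validation) can be sketched as follows; the data and candidate weightings are synthetic stand-ins for the PCA-reduced partial-charge vectors of the 12 variants:

```python
import numpy as np

def loo_accuracy(Z: np.ndarray, labels: np.ndarray, w: np.ndarray) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier under
    a weighted Euclidean distance on principal components Z."""
    hits = 0
    for i in range(len(Z)):
        d = np.sqrt((w * (Z - Z[i]) ** 2).sum(axis=1))
        d[i] = np.inf                        # leave sample i out
        hits += labels[d.argmin()] == labels[i]
    return hits / len(Z)

# Hypothetical stand-in for the 12 training variants, with
# viable (1) / lethal (0) labels.
rng = np.random.default_rng(7)
Z = rng.normal(size=(12, 3))
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
# Try a few candidate weightings and keep the best, i.e. parameter
# choice by LOOCV as described above.
candidates = [np.ones(3), np.array([2.0, 1.0, 0.5]), np.array([0.5, 1.0, 2.0])]
best = max(candidates, key=lambda w: loo_accuracy(Z, labels, w))
print("best weights:", best, "LOO accuracy:", loo_accuracy(Z, labels, best))
```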

Relevance:

90.00%

Publisher:

Abstract:

This thesis addresses one of the emerging topics in Sonar Signal Processing, viz. the implementation of a target classifier for noise sources in the ocean, as operator-assisted classification turns out to be tedious, laborious and time-consuming. In the work reported in this thesis, various judiciously chosen components of the feature vector are used for realizing the newly proposed Hierarchical Target Trimming Model. The performance of the proposed classifier has been compared with the Euclidean distance and Fuzzy K-Nearest Neighbour model classifiers and is found to have better success rates. Procedures for generating the Target Feature Record, or feature vector, from the spectral, cepstral and bispectral features have also been suggested. The feature vector so generated from the noise data waveform is compared with the feature vectors available in the knowledge base, and the most closely matching pattern is identified for the purpose of target classification. In an attempt to improve the success rate of the feature-vector-based classifier, the proposed system has been augmented with an HMM-based classifier. In situations where the two classifiers' decisions disagree, a contention-resolving mechanism built around the DUET algorithm has been suggested.
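
The Euclidean-distance baseline amounts to nearest-template matching against the knowledge base; a minimal sketch with hypothetical target classes and features:

```python
import numpy as np

def classify(feature_vector: np.ndarray,
             knowledge_base: dict[str, np.ndarray]) -> str:
    """Return the stored target class whose feature template has the
    smallest Euclidean distance to the observed feature vector
    (illustrative baseline, as compared against in the thesis)."""
    return min(knowledge_base,
               key=lambda k: np.linalg.norm(knowledge_base[k] - feature_vector))

# Hypothetical knowledge base of spectral/cepstral feature templates.
rng = np.random.default_rng(8)
kb = {name: rng.normal(size=16)
      for name in ("merchant ship", "trawler", "biologic")}
observed = kb["trawler"] + rng.normal(0, 0.1, 16)   # noisy observation
print(classify(observed, kb))
```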

Relevance:

90.00%

Publisher:

Abstract:

A novel framework for multimodal semantic-associative collateral image labelling, aimed at associating image regions with textual keywords, is described. Both the primary image and the collateral textual modalities are exploited in a cooperative and complementary fashion. The collateral content- and context-based knowledge is used to bias the mapping from the low-level region-based visual primitives to the high-level visual concepts defined in a visual vocabulary. We introduce the notion of collateral context, represented as a co-occurrence matrix of the visual keywords. A collaborative mapping scheme is devised using statistical methods such as the Gaussian distribution or the Euclidean distance, together with a collateral content- and context-driven inference mechanism. Finally, we use Self-Organising Maps to examine the classification and retrieval effectiveness of the proposed high-level image feature vector model, which is constructed based on the image labelling results.
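
The mapping scheme can be caricatured in a few lines: regions are assigned to the nearest visual keyword by Euclidean distance, and the scores are then biased by the co-occurrence context of the other regions' provisional labels; the keywords, centroids and co-occurrence values are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(9)
keywords = ["sky", "sea", "sand"]
centroids = rng.normal(size=(3, 8))       # visual-keyword feature centroids
cooc = np.array([[1.0, 0.6, 0.3],         # illustrative co-occurrence
                 [0.6, 1.0, 0.7],         # matrix of the visual keywords
                 [0.3, 0.7, 1.0]])

def label_regions(regions: np.ndarray) -> list[str]:
    """Assign each region to a visual keyword by Euclidean distance,
    then bias each region's scores with the co-occurrence context of
    the other regions' provisional labels."""
    d = np.linalg.norm(regions[:, None] - centroids[None], axis=2)
    provisional = d.argmin(axis=1)
    labels = []
    for i in range(len(regions)):
        others = np.delete(provisional, i)
        context = cooc[others].mean(axis=0) if len(others) else 1.0
        labels.append(keywords[int((d[i] / context).argmin())])
    return labels

regions = centroids[[0, 1, 1]] + rng.normal(0, 0.2, (3, 8))
print(label_regions(regions))
```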