986 results for Similarity measure


Relevance:

60.00%

Publisher:

Abstract:

The great interest in nonlinear system identification is mainly due to the fact that a large number of real systems are complex and need to have their nonlinearities considered so that their models can be successfully used in applications of control, prediction, and inference, among others. This work evaluates the application of Fuzzy Wavelet Neural Networks (FWNN) to identify nonlinear dynamical systems subjected to noise and outliers. Generally, these elements have negative effects on the identification procedure, resulting in erroneous interpretations of the dynamical behavior of the system. The FWNN combines in a single structure the ability of fuzzy logic to deal with uncertainties, the multiresolution characteristics of wavelet theory, and the learning and generalization abilities of artificial neural networks. Usually, the learning procedure of these neural networks is carried out by a gradient-based method that uses the mean squared error as its cost function. This work proposes replacing this traditional function with an Information Theoretic Learning similarity measure called correntropy. With this similarity measure, higher-order statistics can be considered during the FWNN training process. For this reason, the measure is more suitable for non-Gaussian error distributions and makes the training less sensitive to the presence of outliers. In order to evaluate this replacement, FWNN models are obtained in two identification case studies: a real nonlinear system, consisting of a multisection tank, and a simulated system based on a model of the human knee joint. The results demonstrate that using correntropy as the cost function of the error backpropagation algorithm makes the identification procedure using FWNN models more robust to outliers. However, this is only achieved if the Gaussian kernel width of correntropy is properly adjusted.
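To illustrate why a correntropy-based cost is less sensitive to outliers than the mean squared error, the following minimal sketch compares the two on a list of residuals. It is not the FWNN training code from the work above: the function names, the sample residuals, and the kernel width `sigma=1.0` are assumptions chosen for illustration, and the Gaussian kernel is left unnormalised so that a perfect fit scores 1.

```python
import math


def mse(errors):
    """Mean squared error: every residual contributes quadratically."""
    return sum(e * e for e in errors) / len(errors)


def correntropy(errors, sigma=1.0):
    """Sample correntropy of the residuals with a Gaussian kernel.

    The kernel exp(-e^2 / (2 sigma^2)) is used without its normalising
    constant (an assumption made here for readability), so the value
    lies in (0, 1] and is maximised, not minimised, during training.
    """
    return sum(math.exp(-(e * e) / (2.0 * sigma ** 2)) for e in errors) / len(errors)


# Hypothetical residuals: a good fit, then the same fit plus one outlier.
clean = [0.1, -0.2, 0.05, 0.0]
with_outlier = clean + [10.0]

print("MSE        :", mse(clean), "->", mse(with_outlier))
print("correntropy:", correntropy(clean), "->", correntropy(with_outlier))
```

The single outlier multiplies the MSE by several orders of magnitude, while its kernel value is close to zero and the correntropy changes only by roughly the outlier's share of the samples. The kernel width controls this behaviour: a very large `sigma` makes the kernel nearly quadratic around zero and the robustness is lost, which is consistent with the abstract's remark that the width must be properly adjusted.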

Relevance:

60.00%

Publisher:

Abstract:

Phosphorylation is amongst the most crucial and well-studied post-translational modifications. It is involved in multiple cellular processes, which makes phosphorylation prediction vital for understanding protein functions. However, wet-lab techniques are labour- and time-intensive, so computational tools are required for efficiency. This project aims to provide a novel way to predict phosphorylation sites from protein sequences by adding protein flexibility and the Sezerman Grouping amino acid similarity measure to previous methods, as new protein sequences are discovered at a greater rate than protein structures are determined. The predictor, NOPAY, relies on Support Vector Machines (SVMs) for classification. The features include amino acid encoding, amino acid grouping, predicted secondary structure, predicted protein disorder, predicted protein flexibility, solvent accessibility, hydrophobicity and volume. As a result, we have managed to improve phosphorylation prediction accuracy by 3% for Homo sapiens and by 6.1% for Mus musculus. Sensitivity at 99% specificity was also increased by 6% for Homo sapiens and by 5% for Mus musculus on independent test sets. In this study, we have managed to increase phosphorylation prediction accuracy for Homo sapiens and Mus musculus. When there is enough data, future versions of the software may also be able to make predictions for other organisms.

Relevance:

60.00%

Publisher:

Abstract:

Artificial Immune Systems have been used successfully to build recommender systems for film databases. In this research, an attempt is made to extend this idea to web site recommendation. A collection of more than 1000 individuals' web profiles (alternatively called preferences / favourites / bookmarks file) will be used. URLs will be classified using the DMOZ (Directory Mozilla) database of the Open Directory Project as our ontology. This will then be used as the data for the Artificial Immune Systems rather than the actual addresses. The first attempt will involve using a simple classification code number coupled with the number of pages within that classification code. However, this implementation does not make use of the hierarchical tree-like structure of DMOZ. Consideration will then be given to the construction of a similarity measure for web profiles that makes use of this hierarchical information to build a better-informed Artificial Immune System.
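One way such a hierarchy-aware similarity measure could look is sketched below. This is only an illustration of the idea, not the measure the authors construct: it assumes each bookmark has already been mapped to a slash-separated category path, scores two paths by the depth of their shared prefix, and compares two profiles by averaging each category's best match in the other profile. The function names and the category strings are hypothetical.

```python
def path_similarity(path_a, path_b):
    """Fraction of the deeper path that is covered by the shared prefix."""
    parts_a, parts_b = path_a.split("/"), path_b.split("/")
    shared = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(parts_a), len(parts_b))


def profile_similarity(profile_a, profile_b):
    """Symmetric average of each category's best match in the other profile."""
    if not profile_a or not profile_b:
        return 0.0

    def directed(src, dst):
        return sum(max(path_similarity(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(profile_a, profile_b) + directed(profile_b, profile_a))


# Hypothetical category paths, used only to exercise the functions.
alice = ["Arts/Movies/Reviews", "Science/Physics"]
bob = ["Arts/Movies/Titles", "Sports/Cycling"]
print(profile_similarity(alice, bob))
```

Unlike a flat comparison of classification codes, two profiles that share no identical category can still score above zero here when their categories sit under a common branch of the tree.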

Relevance:

60.00%

Publisher:

Abstract:

An earlier Case-based Reasoning (CBR) approach developed by the authors for educational course timetabling problems employed structured cases to represent the complex relationships between courses. Previously solved cases represented by attribute graphs were organized hierarchically into a decision tree. The retrieval searches for graph isomorphism among these attribute graphs. In this paper, the approach is further developed to solve a wider range of problems. We also attempt to retrieve graphs that share similar structures but also have some differences. Costs assigned to these differences contribute to the similarity measure. A large number of experiments are performed on different randomly produced timetabling problems, and the results presented here strongly indicate that a CBR approach could provide a significant step forward in the development of automated systems to solve difficult timetabling problems. They show that, with relatively little effort, we can retrieve these structurally similar cases to provide high quality timetables for new timetabling problems.

Relevance:

60.00%

Publisher:

Abstract:

The paper catalogues the procedures and steps involved in agroclimatic classification. These vary from conventional descriptive methods to modern computer-based numerical techniques. There are three mutually independent numerical classification techniques, namely ordination, cluster analysis, and minimum spanning tree, and under each technique several forms of grouping exist. The choice of numerical classification procedure differs with the type of data set. In the case of numerical continuous data sets with both positive and negative values, the simple and least controversial procedures are the unweighted pair group method (UPGMA) and the weighted pair group method (WPGMA) under clustering techniques, with the similarity measure obtained from either the Gower metric or the standardized Euclidean metric. Where the number of attributes is large, these could be reduced to fewer new attributes defined by the principal components or coordinates obtained by ordination. The first few components or coordinates explain the maximum variance in the data matrix. These revised attributes are less affected by noise in the data set. It is possible to check misclassifications using a minimum spanning tree.
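A range-normalised similarity of the kind mentioned above can be sketched as follows. The sketch assumes that "Gower metric" refers to the quantitative form of Gower's general similarity coefficient, in which each attribute difference is divided by that attribute's range before averaging; the function name and the sample values are hypothetical. Dividing by the range is what lets attributes with both positive and negative values, and with different units, be combined.

```python
def gower_similarity(data, i, j):
    """Gower-style similarity between rows i and j of a numeric data matrix.

    Each absolute difference is divided by the attribute's range over the
    whole data set, so every attribute contributes a value in [0, 1].
    Attributes with zero range carry no information and contribute 0.
    """
    n_attrs = len(data[0])
    total = 0.0
    for k in range(n_attrs):
        column = [row[k] for row in data]
        spread = max(column) - min(column)
        if spread > 0:
            total += abs(data[i][k] - data[j][k]) / spread
    return 1.0 - total / n_attrs


# Hypothetical stations described by two continuous attributes,
# one of which takes both negative and positive values.
stations = [
    [-5.0, 300.0],
    [5.0, 500.0],
    [0.0, 400.0],
]
print(gower_similarity(stations, 0, 1))  # the two most different stations
print(gower_similarity(stations, 0, 2))
```

A matrix of these pairwise similarities is the kind of input that a pair-group clustering procedure such as UPGMA would then merge step by step.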

Relevance:

60.00%

Publisher:

Abstract:

Virtualization has brought immense change to modern technology, especially computer networks, over the last decade. With the growth of big data, massive graphs have grown exponentially in size in recent years, so that conventional tools and algorithms struggle to process them. Reducing the size of massive graphs is a major challenge, and extracting useful information from huge graphs is also problematic. In this paper, we present a concept for designing a virtual graph, vGraph, in a virtual plane above the original plane that holds the original massive graph, and we propose a novel cumulative similarity measure for vGraph. Using vGraph in place of the massive graph saves space and time. Our proposed algorithm has two main parts. In the first part, virtual nodes are designed from the original nodes based on the cumulative similarity calculated among them. In the second part, virtual edges are designed to link the virtual nodes based on the similarity measure calculated among the original edges of the original massive graph. The algorithm is tested on synthetic and real-world datasets, and the results show its efficiency.

Relevance:

60.00%

Publisher:

Abstract:

The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose a challenge for treatment. Personalisation of the syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of the syllabus. We investigate a data-driven approach in an attempt to disentangle this heterogeneity for personalisation of the syllabus. With the help of technology and a structured syllabus, it becomes possible to collect data while a child with ASD masters the skills. The performance data collected are, however, growing and contain missing elements, depending on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, we present for the first time learning patterns deduced automatically by data mining and machine learning methods from intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which, being parametric, not only require us to specify the number of learning patterns in advance but also lack a principled approach to dealing with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus, it may be possible to observe the progress and dynamically modify the syllabus for improved learning.
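The Jaccard index used as the similarity measure above is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the function name and the example skill identifiers are hypothetical, and returning 1.0 for two empty sets is a convention assumed here, since the ratio is otherwise undefined.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical example: skills grouped into a discovered pattern
# versus skills in a reference pattern.
discovered = {"skill_1", "skill_2", "skill_3"}
reference = {"skill_2", "skill_3", "skill_4"}
print(jaccard_index(discovered, reference))  # 2 shared out of 4 distinct
```

Evaluating a score such as F1 "over varying degrees" of this measure then amounts to counting a discovered pattern as a match only when its Jaccard index with a reference pattern reaches a chosen threshold, and repeating the evaluation for several thresholds.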

Relevance:

60.00%

Publisher:

Abstract:

Failure mode and effect analysis (FMEA) is a popular safety and reliability analysis tool for examining potential failures of products, processes, designs, or services in a wide range of industries. While FMEA is a popular tool, the limitations of the traditional Risk Priority Number (RPN) model in FMEA have been highlighted in the literature. Even though many alternatives to the traditional RPN model have been proposed, there are not many investigations of the use of clustering techniques in FMEA. The main aim of this paper was to examine the use of a new Euclidean distance-based similarity measure and an incremental-learning clustering model, i.e., the fuzzy adaptive resonance theory neural network, for similarity analysis and clustering of failure modes in FMEA, thereby allowing the failure modes to be analyzed, visualized, and clustered. In this paper, the concept of a risk interval encompassing a group of failure modes is investigated. In addition, a new approach to analyzing the risk ordering of different failure groups is introduced. These proposed methods are evaluated using a case study related to the edible bird nest industry in Sarawak, Malaysia. In short, the contributions of this paper are threefold: (1) a new Euclidean distance-based similarity measure, (2) a new risk interval measure for a group of failure modes, and (3) a new analysis of the risk ordering of different failure groups.

Relevance:

40.00%

Publisher:

Abstract:

This paper examines different ways of measuring similarity between software design models for the purpose of software reuse. Current approaches to this problem are discussed, and a set of suitable similarity metrics is proposed and evaluated. Work on the optimisation of weights to increase the competence of a CBR system is presented. A graph matching algorithm and associated metrics capturing the structural similarity between UML class diagrams are presented and demonstrated through an example case.

Relevance:

30.00%

Publisher:

Abstract:

In vector space based approaches to natural language processing, similarity is commonly measured by taking the angle between two vectors representing words or documents in a semantic space. This is natural from a mathematical point of view, as the angle between unit vectors is, up to constant scaling, the only unitarily invariant metric on the unit sphere. However, similarity judgement tasks reveal that human subjects fail to produce data that satisfy the symmetry and triangle inequality requirements for a metric space. A possible conclusion, reached in particular by Tversky et al., is that some of the most basic assumptions of geometric models are unwarranted in the case of psychological similarity, a result which would impose strong limits on the validity and applicability of vector space based (and hence also quantum inspired) approaches to the modelling of cognitive processes. This paper proposes a resolution to this fundamental criticism of the applicability of vector space models of cognition. We argue that pairs of words imply a context which in turn induces a point of view, allowing a subject to estimate semantic similarity. Context is here introduced as a point of view vector (POVV), and the expected similarity is derived as a measure over the POVVs. Different pairs of words will invoke different contexts and different POVVs. Hence the triangle inequality ceases to be a valid constraint on the angles. We test the proposal on a few triples of words and outline further research.
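The baseline measure the abstract starts from, the angle between two vectors, can be sketched as below, together with a check of the triangle inequality that this angle satisfies when all three vectors are compared in one fixed space. The vectors and their labels are made up for illustration, and the sketch does not implement the point of view vectors proposed in the paper.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical three-dimensional "word vectors".
word_a = [1.0, 0.0, 0.0]
word_b = [1.0, 1.0, 0.0]
word_c = [0.0, 1.0, 1.0]

direct = angle(word_a, word_c)
via_b = angle(word_a, word_b) + angle(word_b, word_c)
print(direct, "<=", via_b)
```

In a single fixed space the direct angle can never exceed the detour through a third vector. The human judgements described above violate exactly this constraint, which is why the paper lets each pair of words be compared under its own context.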

Relevance:

30.00%

Publisher:

Abstract:

Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approx 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.

Relevance:

30.00%

Publisher:

Abstract:

We propose a new characterization of protein structure based on the natural tetrahedral geometry of the β carbon and a new geometric measure of structural similarity, called visible volume. In our model, the side-chains are replaced by an ideal tetrahedron, the orientation of which is fixed with respect to the backbone and corresponds to the preferred rotamer directions. Visible volume is a measure of the non-occluded empty space surrounding each residue position after the side-chains have been removed. It is a robust, parameter-free, locally-computed quantity that accounts for many of the spatial constraints that are of relevance to the corresponding position in the native structure. When computing visible volume, we ignore the nature of both the residue observed at each site and the ones surrounding it. We focus instead on the space that, together, these residues could occupy. By doing so, we are able to quantify a new kind of invariance beyond the apparent variations in protein families, namely, the conservation of the physical space available at structurally equivalent positions for side-chain packing. Corresponding positions in native structures are likely to be of interest in protein structure prediction, protein design, and homology modeling. Visible volume is related to the degree of exposure of a residue position and to the actual rotamers in native proteins. In this article, we discuss the properties of this new measure, namely, its robustness with respect to both crystallographic uncertainties and naturally occurring variations in atomic coordinates, and the remarkable fact that it is essentially independent of the choice of the parameters used in calculating it. We also show how visible volume can be used to align protein structures, to identify structurally equivalent positions that are conserved in a family of proteins, and to single out positions in a protein that are likely to be of biological interest. 
These properties qualify visible volume as a powerful tool in a variety of applications, from the detailed analysis of protein structure to homology modeling, protein structural alignment, and the definition of better scoring functions for threading purposes.

Relevance:

30.00%

Publisher:

Abstract:

Seabirds are effective samplers of the marine environment, and can be used to measure resource partitioning among species and sites via food loads destined for chicks. We examined the composition, overlap, and relationships to changing climate and oceanography of 3,216 food loads from Least, Crested, and Whiskered Auklets (Aethia pusilla, A. cristatella, A. pygmaea) breeding in Alaska during 1994–2006. Meals comprised calanoid copepods (Neocalanus spp.) and euphausiids (Thysanoessa spp.) that reflect secondary marine productivity, with no difference among Buldir, Kiska, and Kasatochi islands across 585 km of the Aleutian Islands. Meals were very similar among species (mean Least–Crested Auklet overlap C = 0.68; Least–Whiskered Auklet overlap C = 0.96) and among sites, indicating limited partitioning of prey resources for auklets feeding chicks. The biomass of copepods and euphausiids in Least and Crested Auklet food loads was related negatively to the summer (June–July–August) North Pacific Gyre Oscillation, while in Whiskered Auklet food loads, this was negatively related to the winter (December–January–February) Pacific Decadal Oscillation, both of which track basin-wide sea-surface temperature (SST) anomalies. We found a significant quadratic relationship between the biomass of calanoid copepods in Least Auklet food loads at all three study sites and summer (June–July) SST, with maximal copepod biomass between 3–6°C (r² = 0.71). Outside this temperature range, zooplankton becomes less available to auklets through delayed development. Overall, our results suggest that auklets are able to buffer climate-mediated bottom-up forcing of demographic parameters like productivity, as the composition of chick meals has remained constant over the course of our study.

Relevance:

30.00%

Publisher:

Abstract:

In the present paper we introduce an efficient approach to measuring the structural similarity of so-called directed universal hierarchical graphs. We underline that directed universal hierarchical graphs can be obtained from generalized trees, which have already been introduced. In order to classify these graphs, we state our novel graph similarity method. As a main result, we note that our novel algorithm has low computational complexity. (c) 2007 Elsevier Inc. All rights reserved.