989 resultados para Text similarity measures


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We introduced a spectral clustering algorithm based on the bipartite graph model for the Manufacturing Cell Formation problem in [Oliveira S, Ribeiro JFF, Seok SC. A spectral clustering algorithm for manufacturing cell formation. Computers and Industrial Engineering. 2007 [submitted for publication]]. It constructs two similarity matrices; one for parts and one for machines. The algorithm executes a spectral clustering algorithm on each separately to find families of parts and cells of machines. The similarity measure in the approach utilized limited information between parts and between machines. This paper reviews several well-known similarity measures which have been used for Group Technology. Computational clustering results are compared by various performance measures. (C) 2008 The Society of Manufacturing Engineers. Published by Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Es descriu l'aproximació de Capes Atòmiques dins de la teoria de la Semblança Molecular Quàntica. Partint només de dades teòriques, s'ha trobat una relació entre estructura molecular i activitat biològica per a diversos conjunts de molècules. Es descriuen els aspectes teòrics de la Semblança Molecular Quàntica i alguns exemples d'aplicació

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A procedure based on quantum molecular similarity measures (QMSM) has been used to compare electron densities obtained from conventional ab initio and density functional methodologies at their respective optimized geometries. This method has been applied to a series of small molecules which have experimentally known properties and molecular bonds of diverse degrees of ionicity and covalency. Results show that in most cases the electron densities obtained from density functional methodologies are of a similar quality than post-Hartree-Fock generalized densities. For molecules where Hartree-Fock methodology yields erroneous results, the density functional methodology is shown to yield usually more accurate densities than those provided by the second order Møller-Plesset perturbation theory

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We analyse the use of the ordered weighted average (OWA) in decision-making giving special attention to business and economic decision-making problems. We present several aggregation techniques that are very useful for decision-making such as the Hamming distance, the adequacy coefficient and the index of maximum and minimum level. We suggest a new approach by using immediate weights, that is, by using the weighted average and the OWA operator in the same formulation. We further generalize them by using generalized and quasi-arithmetic means. We also analyse the applicability of the OWA operator in business and economics and we see that we can use it instead of the weighted average. We end the paper with an application in a business multi-person decision-making problem regarding production management

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We analyse the use of the ordered weighted average (OWA) in decision-making giving special attention to business and economic decision-making problems. We present several aggregation techniques that are very useful for decision-making such as the Hamming distance, the adequacy coefficient and the index of maximum and minimum level. We suggest a new approach by using immediate weights, that is, by using the weighted average and the OWA operator in the same formulation. We further generalize them by using generalized and quasi-arithmetic means. We also analyse the applicability of the OWA operator in business and economics and we see that we can use it instead of the weighted average. We end the paper with an application in a business multi-person decision-making problem regarding production management

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A complex network is an abstract representation of an intricate system of interrelated elements where the patterns of connection hold significant meaning. One particular complex network is a social network whereby the vertices represent people and edges denote their daily interactions. Understanding social network dynamics can be vital to the mitigation of disease spread as these networks model the interactions, and thus avenues of spread, between individuals. To better understand complex networks, algorithms which generate graphs exhibiting observed properties of real-world networks, known as graph models, are often constructed. While various efforts to aid with the construction of graph models have been proposed using statistical and probabilistic methods, genetic programming (GP) has only recently been considered. However, determining that a graph model of a complex network accurately describes the target network(s) is not a trivial task as the graph models are often stochastic in nature and the notion of similarity is dependent upon the expected behavior of the network. This thesis examines a number of well-known network properties to determine which measures best allowed networks generated by different graph models, and thus the models themselves, to be distinguished. A proposed meta-analysis procedure was used to demonstrate how these network measures interact when used together as classifiers to determine network, and thus model, (dis)similarity. The analytical results form the basis of the fitness evaluation for a GP system used to automatically construct graph models for complex networks. The GP-based automatic inference system was used to reproduce existing, well-known graph models as well as a real-world network. Results indicated that the automatically inferred models exemplified functional similarity when compared to their respective target networks. This approach also showed promise when used to infer a model for a mammalian brain network.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A procedure based on quantum molecular similarity measures (QMSM) has been used to compare electron densities obtained from conventional ab initio and density functional methodologies at their respective optimized geometries. This method has been applied to a series of small molecules which have experimentally known properties and molecular bonds of diverse degrees of ionicity and covalency. Results show that in most cases the electron densities obtained from density functional methodologies are of a similar quality than post-Hartree-Fock generalized densities. For molecules where Hartree-Fock methodology yields erroneous results, the density functional methodology is shown to yield usually more accurate densities than those provided by the second order Møller-Plesset perturbation theory

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Es descriu l'aproximació de Capes Atòmiques dins de la teoria de la Semblança Molecular Quàntica. Partint només de dades teòriques, s'ha trobat una relació entre estructura molecular i activitat biològica per a diversos conjunts de molècules. Es descriuen els aspectes teòrics de la Semblança Molecular Quàntica i alguns exemples d'aplicació

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A large number of heuristic algorithms have been developed over the years which have been aimed at solving examination timetabling problems. However, many of these algorithms have been developed specifically to solve one particular problem instance or a small subset of instances related to a given real-life problem. Our aim is to develop a more general system which, when given any exam timetabling problem, will produce results which are comparative to those of a specially designed heuristic for that problem. We are investigating a Case based reasoning (CBR) technique to select from a set of algorithms which have been applied successfully to similar problem instances in the past. The assumption in CBR is that similar problems have similar solutions. For our system, the assumption is that an algorithm used to find a good solution to one problem will also produce a good result for a similar problem. The key to the success of the system will be our definition of similarity between two exam timetabling problems. The study will be carried out by running a series of tests using a simple Simulated Annealing Algorithm on a range of problems with differing levels of similarity and examining the data sets in detail. In this paper an initial investigation of the key factors which will be involved in this measure is presented with a discussion of how the definition of good impacts on this.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A large number of heuristic algorithms have been developed over the years which have been aimed at solving examination timetabling problems. However, many of these algorithms have been developed specifically to solve one particular problem instance or a small subset of instances related to a given real-life problem. Our aim is to develop a more general system which, when given any exam timetabling problem, will produce results which are comparative to those of a specially designed heuristic for that problem. We are investigating a Case based reasoning (CBR) technique to select from a set of algorithms which have been applied successfully to similar problem instances in the past. The assumption in CBR is that similar problems have similar solutions. For our system, the assumption is that an algorithm used to find a good solution to one problem will also produce a good result for a similar problem. The key to the success of the system will be our definition of similarity between two exam timetabling problems. The study will be carried out by running a series of tests using a simple Simulated Annealing Algorithm on a range of problems with differing levels of similarity and examining the data sets in detail. In this paper an initial investigation of the key factors which will be involved in this measure is presented with a discussion of how the definition of good impacts on this.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Luokittelujärjestelmää suunniteltaessa tarkoituksena on rakentaa systeemi, joka pystyy ratkaisemaan mahdollisimman tarkasti tutkittavan ongelma-alueen. Hahmontunnistuksessa tunnistusjärjestelmän ydin on luokitin. Luokittelun sovellusaluekenttä on varsin laaja. Luokitinta tarvitaan mm. hahmontunnistusjärjestelmissä, joista kuvankäsittely toimii hyvänä esimerkkinä. Myös lääketieteen parissa tarkkaa luokittelua tarvitaan paljon. Esimerkiksi potilaan oireiden diagnosointiin tarvitaan luokitin, joka pystyy mittaustuloksista päättelemään mahdollisimman tarkasti, onko potilaalla kyseinen oire vai ei. Väitöskirjassa on tehty similaarisuusmittoihin perustuva luokitin ja sen toimintaa on tarkasteltu mm. lääketieteen paristatulevilla data-aineistoilla, joissa luokittelutehtävänä on tunnistaa potilaan oireen laatu. Väitöskirjassa esitetyn luokittimen etuna on sen yksinkertainen rakenne, josta johtuen se on helppo tehdä sekä ymmärtää. Toinen etu on luokittimentarkkuus. Luokitin saadaan luokittelemaan useita eri ongelmia hyvin tarkasti. Tämä on tärkeää varsinkin lääketieteen parissa, missä jo pieni tarkkuuden parannus luokittelutuloksessa on erittäin tärkeää. Väitöskirjassa ontutkittu useita eri mittoja, joilla voidaan mitata samankaltaisuutta. Mitoille löytyy myös useita parametreja, joille voidaan etsiä juuri kyseiseen luokitteluongelmaan sopivat arvot. Tämä parametrien optimointi ongelma-alueeseen sopivaksi voidaan suorittaa mm. evoluutionääri- algoritmeja käyttäen. Kyseisessä työssä tähän on käytetty geneettistä algoritmia ja differentiaali-evoluutioalgoritmia. Luokittimen etuna on sen joustavuus. Ongelma-alueelle on helppo vaihtaa similaarisuusmitta, jos kyseinen mitta ei ole sopiva tutkittavaan ongelma-alueeseen. Myös eri mittojen parametrien optimointi voi parantaa tuloksia huomattavasti. Kun käytetään eri esikäsittelymenetelmiä ennen luokittelua, tuloksia pystytään parantamaan.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Aquesta memòria està estructurada en sis capítols amb l'objectiu final de fonamentar i desenvolupar les eines matemàtiques necessàries per a la classificació de conjunts de subconjunts borrosos. El nucli teòric del treball el formen els capítols 3, 4 i 5; els dos primers són dos capítols de caire més general, i l'últim és una aplicació dels anteriors a la classificació dels països de la Unió Europea en funció de determinades característiques borroses. En el capítol 1 s'analitzen les diferents connectives borroses posant una especial atenció en aquells aspectes que en altres capítols tindran una aplicació específica. És per aquest motiu que s'estudien les ordenacions de famílies de t-normes, donada la seva importància en la transitivitat de les relacions borroses. La verificació del principi del terç exclòs és necessària per assegurar que un conjunt significatiu de mesures borroses generalitzades, introduïdes en el capítol 3, siguin reflexives. Estudiem per a quines t-normes es verifica aquesta propietat i introduïm un nou conjunt de t-normes que verifiquen aquest principi. En el capítol 2 es fa un recorregut general per les relacions borroses centrant-nos en l'estudi de la clausura transitiva per a qualsevol t-norma, el càlcul de la qual és en molts casos fonamental per portar a terme el procés de classificació. Al final del capítol s'exposa un procediment pràctic per al càlcul d'una relació borrosa amb l'ajuda d'experts i de sèries estadístiques. El capítol 3 és un monogràfic sobre mesures borroses. El primer objectiu és relacionar les mesures (o distàncies) usualment utilitzades en les aplicacions borroses amb les mesures conjuntistes crisp. Es tracta d'un enfocament diferent del tradicional enfocament geomètric. El principal resultat és la introducció d'una família parametritzada de mesures que verifiquen unes propietats de caràcter conjuntista prou satisfactòries. L'estudi de la verificació del principi del terç exclòs té aquí la seva aplicació sobre la reflexivitat d'aquestes mesures, que són estudiades amb una certa profunditat en alguns casos particulars. El capítol 4 és, d'entrada, un repàs dels principals resultats i mètodes borrosos per a la classificació dels elements d'un mateix conjunt de subconjunts borrosos. És aquí on s'apliquen els resultats sobre les ordenacions de les famílies de t-normes i t-conormes estudiades en el capítol 1. S'introdueix un nou mètode de clusterització, canviant la matriu de la relació borrosa cada vegada que s'obté un nou clúster. Aquest mètode permet homogeneïtzar la metodologia del càlcul de la relació borrosa amb el mètode de clusterització. El capítol 5 tracta sobre l'agrupació d'objectes de diferent naturalesa; és a dir, subconjunts borrosos que pertanyen a diferents conjunts. Aquesta teoria ja ha estat desenvolupada en el cas binari; aquí, el que es presenta és la seva generalització al cas n-ari. Més endavant s'estudien certs aspectes de les projeccions de la relació sobre un cert espai i el recíproc, l'estudi de cilindres de relacions predeterminades. Una aplicació sobre l'agrupació de les comarques gironines en funció de certes variables borroses es presenta al final del capítol. L'últim capítol és eminentment pràctic, ja que s'aplica allò estudiat principalment en els capítols 3 i 4 a la classificació dels països de la Unió Europea en funció de determinades característiques borroses. Per tal de fer previsions per a anys venidors s'han utilitzat sèries temporals i xarxes neuronals. S'han emprat diverses mesures i mètodes de clusterització per tal de poder comparar els diversos dendogrames que resulten del procés de clusterització. Finalment, als annexos es poden consultar les sèries estadístiques utilitzades, la seva extrapolació, els càlculs per a la construcció de les matrius de les relacions borroses, les matrius de mesura i les seves clausures.