989 resultados para Text similarity measures
Resumo:
Les Mesures de Semblança Quàntica Molecular (MSQM) requereixen la maximització del solapament de les densitats electròniques de les molècules que es comparen. En aquest treball es presenta un algorisme de maximització de les MSQM, que és global en el límit de densitats electròniques deformades a funcions deltes de Dirac. A partir d'aquest algorisme se'n deriva l'equivalent per a densitats no deformades
Resumo:
Es presenta un mètode de selecció d'orbitals atòmics relacionat amb la teoria de la Semblança Molecular Quàntica, que permet reduir l'espai actiu quan es vol dur a terme un càlcul a nivell d'Interacció de Configuracions per a l'àtom d'heli
Resumo:
En aquest treball es descriu l'ús de les mesures de semblança molecular quàntica (MSMQ) per a caracteritzar propietats i activitats biològiques moleculars, i definir descriptors emprables per a construir models QSAR i QSPR. L'estudi que es presenta consisteix en la continuació d'un treball recent, on es descrivien relacions entre el paràmetre log P i MSMQ, donant així una alternativa a aquest parimetre hidrofòbic empíric. L'actual contribució presenta una nova mesura, capaç d'estendre l'ús de les MSMQ, que consisteix en l'energia de repulsió electró-electró (Vee). Aquest valor, disponible normalment a partir de programari de química quàntica, considera la molècula com una sola entitat, i no cal recórrer a l'ús de contribucions de fragments. La metodologia s'ha aplicat a cinc tipus diferents de compostos on diferents propietats moleculars i activitats biològiques s'han correlacionat amb Vee com a únic descriptor molecular. En tots els casos estudiats, s'han obtingut correlacions satisfactòries.
Resumo:
En aquest treball es presenta l'ús de funcions de densitat electrònica de forat de Fermi per incrementar el paper que pren una regió molecular concreta, considerada com a responsable de la reactivitat molecular, tot i mantenir la mida de la funció de densitat original. Aquestes densitats s'utilitzen per fer mesures d'autosemblança molecular quàntica i es presenten com una alternativa a l'ús de fragments moleculars aillats en estudis de relació entre estructura i propietat. El treball es complementa amb un exemple pràctic, on es correlaciona l'autosemblanca molecular a partir de densitats modificades amb l'energia d'una reacció isodòsmica
Resumo:
Partint de les definicions usuals de Mesures de Semblança Quàntica (MSQ), es considera la dependència d'aquestes mesures respecte de la superposició molecular. Pel cas particular en qnè els sistemes comparats siguin una molècula i un Àtom i que les mesures es calculin amb l'aproximació EASA, les MSQ esdevenen funcions de les tres coordenades de l'espai. Mantenint fixa una de les tres coordenades, es pot representar fàcilment la variació del valor de semblança en un pla determinat, i obtenir els anomenats mapes de semblança. En aquest article, es comparen els mapes de semblança obtinguts amb diferents MSQ per a sistemes senzills
Resumo:
Es presenta una sèrie de conceptes de semblança molecular quàtica i uns procediments associats de càlcul i representació gràfica dels resultats. Donada una sèrie de molècules es poden obtenir diversos tipus de gràfics que mostren les relacions entre elles. Com a exemple d'aplicació d'aquest procés s'estudia una família de drogues antitumorals
Resumo:
[EU]Lan honetan semantika distribuzionalaren eta ikasketa automatikoaren erabilera aztertzen dugu itzulpen automatiko estatistikoa hobetzeko. Bide horretan, erregresio logistikoan oinarritutako ikasketa automatikoko eredu bat proposatzen dugu hitz-segiden itzulpen- probabilitatea modu dinamikoan modelatzeko. Proposatutako eredua itzulpen automatiko estatistikoko ohiko itzulpen-probabilitateen orokortze bat dela frogatzen dugu, eta testuinguruko nahiz semantika distribuzionaleko informazioa barneratzeko baliatu ezaugarri lexiko, hitz-cluster eta hitzen errepresentazio bektorialen bidez. Horretaz gain, semantika distribuzionaleko ezagutza itzulpen automatiko estatistikoan txertatzeko beste hurbilpen bat lantzen dugu: hitzen errepresentazio bektorial elebidunak erabiltzea hitz-segiden itzulpenen antzekotasuna modelatzeko. Gure esperimentuek proposatutako ereduen baliagarritasuna erakusten dute, emaitza itxaropentsuak eskuratuz oinarrizko sistema sendo baten gainean. Era berean, gure lanak ekarpen garrantzitsuak egiten ditu errepresentazio bektorialen mapaketa elebidunei eta hitzen errepresentazio bektorialetan oinarritutako hitz-segiden antzekotasun neurriei dagokienean, itzulpen automatikoaz haratago balio propio bat dutenak semantika distribuzionalaren arloan.
Resumo:
The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used.In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by “suggesting” to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted.We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
Resumo:
The purpose of this study was to assess the performance of a new motion correction algorithm. Twenty-five dynamic MR mammography (MRM) data sets and 25 contrast-enhanced three-dimensional peripheral MR angiographic (MRA) data sets which were affected by patient motion of varying severeness were selected retrospectively from routine examinations. Anonymized data were registered by a new experimental elastic motion correction algorithm. The algorithm works by computing a similarity measure for the two volumes that takes into account expected signal changes due to the presence of a contrast agent while penalizing other signal changes caused by patient motion. A conjugate gradient method is used to find the best possible set of motion parameters that maximizes the similarity measures across the entire volume. Images before and after correction were visually evaluated and scored by experienced radiologists with respect to reduction of motion, improvement of image quality, disappearance of existing lesions or creation of artifactual lesions. It was found that the correction improves image quality (76% for MRM and 96% for MRA) and diagnosability (60% for MRM and 96% for MRA).
Resumo:
Twitter lists organise Twitter users into multiple, often overlapping, sets. We believe that these lists capture some form of emergent semantics, which may be useful to characterise. In this paper we describe an approach for such characterisation, which consists of deriving semantic relations between lists and users by analyzing the cooccurrence of keywords in list names. We use the vector space model and Latent Dirichlet Allocation to obtain similar keywords according to co-occurrence patterns. These results are then compared to similarity measures relying on WordNet and to existing Linked Data sets. Results show that co-occurrence of keywords based on members of the lists produce more synonyms and more correlated results to that of WordNet similarity measures.
Resumo:
En los modelos promovidos por las normativas internacionales de análisis de riesgos en los sistemas de información, los activos están interrelacionados entre sí, de modo que un ataque sobre uno de ellos se puede transmitir a lo largo de toda la red, llegando a alcanzar a los activos más valiosos para la organización. Es necesario entonces asignar el valor de todos los activos, así como las relaciones de dependencia directas e indirectas entre estos, o la probabilidad de materialización de una amenaza y la degradación que ésta puede provocar sobre los activos. Sin embargo, los expertos encargados de asignar tales valores, a menudo aportan información vaga e incierta, de modo que las técnicas difusas pueden ser muy útiles en este ámbito. Pero estas técnicas no están libres de ciertas dificultades, como la necesidad de uso de una aritmética adecuada al modelo o el establecimiento de medidas de similitud apropiadas. En este documento proponemos un tratamiento difuso para los modelos de análisis de riesgos promovidos por las metodologías internacionales, mediante el establecimiento de tales elementos.Abstract— Assets are interrelated in risk analysis methodologies for information systems promoted by international standards. This means that an attack on one asset can be propagated through the network and threaten an organization’s most valuable assets. It is necessary to valuate all assets, the direct and indirect asset dependencies, as well as the probability of threats and the resulting asset degradation. However, the experts in charge to assign such values often provide only vague and uncertain information. Fuzzy logic can be very helpful in such situation, but it is not free of some difficulties, such as the need of a proper arithmetic to the model under consideration or the establishment of appropriate similarity measures. Throughout this paper we propose a fuzzy treatment for risk analysis models promoted by international methodologies through the establishment of such elements.
Resumo:
En esta tesis se estudia la representación, modelado y comparación de colecciones mediante el uso de ontologías en el ámbito de la Web Semántica. Las colecciones, entendidas como agrupaciones de objetos o elementos con entidad propia, son construcciones que aparecen frecuentemente en prácticamente todos los dominios del mundo real, y por tanto, es imprescindible disponer de conceptualizaciones de estas estructuras abstractas y de representaciones de estas conceptualizaciones en los sistemas informáticos, que definan adecuadamente su semántica. Mientras que en muchos ámbitos de la Informática y la Inteligencia Artificial, como por ejemplo la programación, las bases de datos o la recuperación de información, las colecciones han sido ampliamente estudiadas y se han desarrollado representaciones que responden a multitud de conceptualizaciones, en el ámbito de la Web Semántica, sin embargo, su estudio ha sido bastante limitado. De hecho hasta la fecha existen pocas propuestas de representación de colecciones mediante ontologías, y las que hay sólo cubren algunos tipos de colecciones y presentan importantes limitaciones. Esto impide la representación adecuada de colecciones y dificulta otras tareas comunes como la comparación de colecciones, algo crítico en operaciones habituales como las búsquedas semánticas o el enlazado de datos en la Web Semántica. Para solventar este problema esta tesis hace una propuesta de modelización de colecciones basada en una nueva clasificación de colecciones de acuerdo a sus características estructurales (homogeneidad, unicidad, orden y cardinalidad). Esta clasificación permite definir una taxonomía con hasta 16 tipos de colecciones distintas. Entre otras ventajas, esta nueva clasificación permite aprovechar la semántica de las propiedades estructurales de cada tipo de colección para realizar comparaciones utilizando las funciones de similitud y disimilitud más apropiadas. De este modo, la tesis desarrolla además un nuevo catálogo de funciones de similitud para las distintas colecciones, donde se han recogido las funciones de (di)similitud más conocidas y también algunas nuevas. Esta propuesta se ha implementado mediante dos ontologías paralelas, la ontología E-Collections, que representa los distintos tipos de colecciones de la taxonomía y su axiomática, y la ontología SIMEON (Similarity Measures Ontology) que representa los tipos de funciones de (di)similitud para cada tipo de colección. Gracias a estas ontologías, para comparar dos colecciones, una vez representadas como instancias de la clase más apropiada de la ontología E-Collections, automáticamente se sabe qué funciones de (di)similitud de la ontología SIMEON pueden utilizarse para su comparación. Abstract This thesis studies the representation, modeling and comparison of collections in the Semantic Web using ontologies. Collections, understood as groups of objects or elements with their own identities, are constructions that appear frequently in almost all areas of the real world. Therefore, it is essential to have conceptualizations of these abstract structures and representations of these conceptualizations in computer systems, that define their semantic properly. While in many areas of Computer Science and Artificial Intelligence, such as Programming, Databases or Information Retrieval, the collections have been extensively studied and there are representations that match many conceptualizations, in the field Semantic Web, however, their study has been quite limited. In fact, there are few representations of collections using ontologies so far, and they only cover some types of collections and have important limitations. This hinders a proper representation of collections and other common tasks like comparing collections, something critical in usual operations such as semantic search or linking data on the Semantic Web. To solve this problem this thesis makes a proposal for modelling collections based on a new classification of collections according to their structural characteristics (homogeneity, uniqueness, order and cardinality). This classification allows to define a taxonomy with up to 16 different types of collections. Among other advantages, this new classification can leverage the semantics of the structural properties of each type of collection to make comparisons using the most appropriate (dis)similarity functions. Thus, the thesis also develops a new catalog of similarity functions for the different types of collections. This catalog contains the most common (dis)similarity functions as well as new ones. This proposal is implemented through two parallel ontologies, the E-Collections ontology that represents the different types of collections in the taxonomy and their axiomatic, and the SIMEON ontology (Similarity Measures Ontology) that represents the types of (dis)similarity functions for each type of collection. Thanks to these ontologies, to compare two collections, once represented as instances of the appropriate class of E-Collections ontology, we can know automatically which (dis)similarity functions of the SIMEON ontology are suitable for the comparison. Finally, the feasibility and usefulness of this modeling and comparison of collections proposal is proved in the field of oenology, applying both E-Collections and SIMEON ontologies to the representation and comparison of wines with the E-Baco ontology.
Resumo:
This paper considers the problem of low-dimensional visualisation of very high dimensional information sources for the purpose of situation awareness in the maritime environment. In response to the requirement for human decision support aids to reduce information overload (and specifically, data amenable to inter-point relative similarity measures) appropriate to the below-water maritime domain, we are investigating a preliminary prototype topographic visualisation model. The focus of the current paper is on the mathematical problem of exploiting a relative dissimilarity representation of signals in a visual informatics mapping model, driven by real-world sonar systems. A realistic noise model is explored and incorporated into non-linear and topographic visualisation algorithms building on the approach of [9]. Concepts are illustrated using a real world dataset of 32 hydrophones monitoring a shallow-water environment in which targets are present and dynamic.
Resumo:
What role do state party organizations play in twenty-first century American politics? What is the nature of the relationship between the state and national party organizations in contemporary elections? These questions frame the three studies presented in this dissertation. More specifically, I examine the organizational development of the state party organizations and the strategic interactions and connections between the state and national party organizations in contemporary elections.
In the first empirical chapter, I argue that the Internet Age represents a significant transitional period for state party organizations. Using data collected from surveys of state party leaders, this chapter reevaluates and updates existing theories of party organizational strength and demonstrates the importance of new indicators of party technological capacity to our understanding of party organizational development in the early twenty-first century. In the second chapter, I ask whether the national parties utilize different strategies in deciding how to allocate resources to state parties through fund transfers and through the 50-state-strategy party-building programs that both the Democratic and Republican National Committees advertised during the 2010 elections. Analyzing data collected from my 2011 state party survey and party-fund-transfer data collected from the Federal Election Commission, I find that the national parties considered a combination of state and national electoral concerns in directing assistance to the state parties through their 50-state strategies, as opposed to the strict battleground-state strategy that explains party fund transfers. In my last chapter, I examine the relationships between platforms issued by Democratic and Republican state and national parties and the strategic considerations that explain why state platforms vary in their degree of similarity to the national platform. I analyze an extensive platform dataset, using cluster analysis and document similarity measures to compare platform content across the 1952 to 2014 period. The analysis shows that, as a group, Democratic and Republican state platforms exhibit greater intra-party homogeneity and inter-party heterogeneity starting in the early 1990s, and state-national platform similarity is higher in states that are key players in presidential elections, among other factors. Together, these three studies demonstrate the significance of the state party organizations and the state-national party partnership in contemporary politics.
Resumo:
Conceptual interpretation of languages has gathered peak interest in the world of artificial intelligence. The challenge in modeling various complications involved in a language is the main motivation behind our work. Our main focus in this work is to develop conceptual graphical representation for image captions. We have used discourse representation structure to gain semantic information which is further modeled into a graphical structure. The effectiveness of the model is evaluated by a caption based image retrieval system. The image retrieval is performed by computing subgraph based similarity measures. Best retrievals were given an average rating of . ± . out of 4 by a group of 25 human judges. The experiments were performed on a subset of the SBU Captioned Photo Dataset. This purpose of this work is to establish the cognitive sensibility of the approach to caption representations