64 resultados para query rewriting
Resumo:
This paper presents the 2005 Miracle’s team approach to the Ad-Hoc Information Retrieval tasks. The goal for the experiments this year was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and improving our basic processing and indexing tools, adapting them to new languages with strange encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper nouns extraction, paragraph extraction, and pseudo-relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. Second-order combinations were also tested, by averaging or selective combination of the documents retrieved by different approaches for a particular query. In the multilingual track, we concentrated our work on the merging process of the results of monolingual runs to get the overall multilingual result, relying on available translations. In both cross-lingual tracks, we have used available translation resources, and in some cases we have used a combination approach.
Resumo:
The main goal of the bilingual and monolingual participation of the MIRACLE team in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components were used in different combinations and order of application for document indexing and for query processing. A second order combination was also tested, mainly by averaging or selective combination of the documents retrieved by different approaches for a particular query.
Resumo:
This article describes the work performed over the database of questions belonging to the different opinion polls carried during the last 50 years in Spain. Approximately half of the questions are provided with a title while the other half remain untitled. The work and implemented techniques in order to automatically generate the titles for untitled questions are described. This process is performed over very short texts and generated titles are subject to strong stylistic conventions and should be fully grammatical pieces of Spanish
Resumo:
El cálculo de relaciones binarias fue creado por De Morgan en 1860 para ser posteriormente desarrollado en gran medida por Peirce y Schröder. Tarski, Givant, Freyd y Scedrov demostraron que las álgebras relacionales son capaces de formalizar la lógica de primer orden, la lógica de orden superior así como la teoría de conjuntos. A partir de los resultados matemáticos de Tarski y Freyd, esta tesis desarrolla semánticas denotacionales y operacionales para la programación lógica con restricciones usando el álgebra relacional como base. La idea principal es la utilización del concepto de semántica ejecutable, semánticas cuya característica principal es el que la ejecución es posible utilizando el razonamiento estándar del universo semántico, este caso, razonamiento ecuacional. En el caso de este trabajo, se muestra que las álgebras relacionales distributivas con un operador de punto fijo capturan toda la teoría y metateoría estándar de la programación lógica con restricciones incluyendo los árboles utilizados en la búsqueda de demostraciones. La mayor parte de técnicas de optimización de programas, evaluación parcial e interpretación abstracta pueden ser llevadas a cabo utilizando las semánticas aquí presentadas. La demostración de la corrección de la implementación resulta extremadamente sencilla. En la primera parte de la tesis, un programa lógico con restricciones es traducido a un conjunto de términos relacionales. La interpretación estándar en la teoría de conjuntos de dichas relaciones coincide con la semántica estándar para CLP. Las consultas contra el programa traducido son llevadas a cabo mediante la reescritura de relaciones. Para concluir la primera parte, se demuestra la corrección y equivalencia operacional de esta nueva semántica, así como se define un algoritmo de unificación mediante la reescritura de relaciones. La segunda parte de la tesis desarrolla una semántica para la programación lógica con restricciones usando la teoría de alegorías—versión categórica del álgebra de relaciones—de Freyd. Para ello, se definen dos nuevos conceptos de Categoría Regular de Lawvere y _-Alegoría, en las cuales es posible interpretar un programa lógico. La ventaja fundamental que el enfoque categórico aporta es la definición de una máquina categórica que mejora e sistema de reescritura presentado en la primera parte. Gracias al uso de relaciones tabulares, la máquina modela la ejecución eficiente sin salir de un marco estrictamente formal. Utilizando la reescritura de diagramas, se define un algoritmo para el cálculo de pullbacks en Categorías Regulares de Lawvere. Los dominios de las tabulaciones aportan información sobre la utilización de memoria y variable libres, mientras que el estado compartido queda capturado por los diagramas. La especificación de la máquina induce la derivación formal de un juego de instrucciones eficiente. El marco categórico aporta otras importantes ventajas, como la posibilidad de incorporar tipos de datos algebraicos, funciones y otras extensiones a Prolog, a la vez que se conserva el carácter 100% declarativo de nuestra semántica. ABSTRACT The calculus of binary relations was introduced by De Morgan in 1860, to be greatly developed by Peirce and Schröder, as well as many others in the twentieth century. Using different formulations of relational structures, Tarski, Givant, Freyd, and Scedrov have shown how relation algebras can provide a variable-free way of formalizing first order logic, higher order logic and set theory, among other formal systems. Building on those mathematical results, we develop denotational and operational semantics for Constraint Logic Programming using relation algebra. The idea of executable semantics plays a fundamental role in this work, both as a philosophical and technical foundation. We call a semantics executable when program execution can be carried out using the regular theory and tools that define the semantic universe. Throughout this work, the use of pure algebraic reasoning is the basis of denotational and operational results, eliminating all the classical non-equational meta-theory associated to traditional semantics for Logic Programming. All algebraic reasoning, including execution, is performed in an algebraic way, to the point we could state that the denotational semantics of a CLP program is directly executable. Techniques like optimization, partial evaluation and abstract interpretation find a natural place in our algebraic models. Other properties, like correctness of the implementation or program transformation are easy to check, as they are carried out using instances of the general equational theory. In the first part of the work, we translate Constraint Logic Programs to binary relations in a modified version of the distributive relation algebras used by Tarski. Execution is carried out by a rewriting system. We prove adequacy and operational equivalence of the semantics. In the second part of the work, the relation algebraic approach is improved by using allegory theory, a categorical version of the algebra of relations developed by Freyd and Scedrov. The use of allegories lifts the semantics to typed relations, which capture the number of logical variables used by a predicate or program state in a declarative way. A logic program is interpreted in a _-allegory, which is in turn generated from a new notion of Regular Lawvere Category. As in the untyped case, program translation coincides with program interpretation. Thus, we develop a categorical machine directly from the semantics. The machine is based on relation composition, with a pullback calculation algorithm at its core. The algorithm is defined with the help of a notion of diagram rewriting. In this operational interpretation, types represent information about memory allocation and the execution mechanism is more efficient, thanks to the faithful representation of shared state by categorical projections. We finish the work by illustrating how the categorical semantics allows the incorporation into Prolog of constructs typical of Functional Programming, like abstract data types, and strict and lazy functions.
Resumo:
The access to medical literature collections such as PubMed, MedScape or Cochrane has been increased notably in the last years by the web-based tools that provide instant access to the information. However, more sophisticated methodologies are needed to exploit efficiently all that information. The lack of advanced search methods in clinical domain produce that even using well-defined questions for a particular disease, clinicians receive too many results. Since no information analysis is applied afterwards, some relevant results which are not presented in the top of the resultant collection could be ignored by the expert causing an important loose of information. In this work we present a new method to improve scientific article search using patient information for query generation. Using federated search strategy, it is able to simultaneously search in different resources and present a unique relevant literature collection. And applying NLP techniques it presents semantically similar publications together, facilitating the identification of relevant information to clinicians. This method aims to be the foundation of a collaborative environment for sharing clinical knowledge related to patients and scientific publications.
Resumo:
Non-failure analysis aims at inferring that predicate calis in a program will never fail. This type of information has many applications in functional/logic programming. It is essential for determining lower bounds on the computational cost of calis, useful in the context of program parallelization, instrumental in partial evaluation and other program transformations, and has also been used in query optimization. In this paper, we re-cast the non-failure analysis proposed by Debray et al. as an abstract interpretation, which not only allows to investígate it from a standard and well understood theoretical framework, but has also several practical advantages. It allows us to incorpórate non-failure analysis into a standard, generic abstract interpretation engine. The analysis thus benefits from the fixpoint propagation algorithm, which leads to improved information propagation. Also, the analysis takes advantage of the multi-variance of the generic engine, so that it is now able to infer sepárate non-failure information for different cali patterns. Moreover, the implementation is simpler, and allows to perform non-failure and covering analyses alongside other analyses, such as those for modes and types, in the same framework. Finally, besides the precisión improvements and the additional simplicity, our implementation (in the Ciao/CiaoPP multiparadigm programming system) also shows better efRciency.
Resumo:
It is generally recognized that information about the runtime cost of computations can be useful for a variety of applications, including program transformation, granularity control during parallel execution, and query optimization in deductive databases. Most of the work to date on compile-time cost estimation of logic programs has focused on the estimation of upper bounds on costs. However, in many applications, such as parallel implementations on distributed-memory machines, one would prefer to work with lower bounds instead. The problem with estimating lower bounds is that in general, it is necessary to account for the possibility of failure of head unification, leading to a trivial lower bound of 0. In this paper, we show how, given type and mode information about procedures in a logic program, it is possible to (semi-automatically) derive nontrivial lower bounds on their computational costs. We also discuss the cost analysis for the special and frequent case of divide-and-conquer programs and show how —as a pragmatic short-term solution —it may be possible to obtain useful results simply by identifying and treating divide-and-conquer programs specially.
Resumo:
This paper presents and proves some fundamental results for independent and-parallelism (IAP). First, the paper treats the issues of correctness and efficiency: after defining strict and non-strict goal independence, it is proved that if strictly independent goals are executed in parallel the solutions obtained are the same as those produced by standard sequential execution. It is also shown that, in the absence of failure, the parallel proof procedure doesn't genérate any additional work (with respect to standard SLDresolution) while the actual execution time is reduced. The same results hold even if non-strictly independent goals are executed in parallel, provided a trivial rewriting of such goals is performed. In addition, and most importantly, treats the issue of compile-time generation of IAP by proposing conditions, to be written at compile-time, to efficiently check strict and non-strict goal independence at run-time and proving the sufficiency of such conditions. It is also shown how simpler conditions can be constructed if some information regarding the binding context of the goals to be executed in parallel is available to the compiler trough either local or program-level analysis. These results therefore provide a formal basis for the automatic compile-time generation of IAP. As a corollary of such results, the paper also proves that negative goals are always non-strictly independent, and that goals which share a first occurrence of an existential variable are never independent.
Resumo:
The Linked Data initiative offers a straight method to publish structured data in the World Wide Web and link it to other data, resulting in a world wide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, including life sciences data. However, key information for biological research is still missing in the Linked Open Data cloud. For example, the relation between orthologs genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes orthologs/diseases information using Linked Data. This gives the scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.
Resumo:
Information about the computational cost of programs is potentially useful for a variety of purposes, including selecting among different algorithms, guiding program transformations, in granularity control and mapping decisions in parallelizing compilers, and query optimization in deductive databases. Cost analysis of logic programs is complicated by nondeterminism: on the one hand, procedures can return múltiple Solutions, making it necessary to estímate the number of solutions in order to give nontrivial upper bound cost estimates; on the other hand, the possibility of failure has to be taken into account while estimating lower bounds. Here we discuss techniques to address these problems to some extent.
Resumo:
This introduction gives a general perspective of the debugging methodology and the tools developed in the ESPRIT IV project DiSCiPl Debugging Systems for Constraint Programming. It has been prepared by the editors of this volume by substantial rewriting of the DiSCiPl deliverable CP Debugging Tools [1]. This introduction is organised as follows. Section 1 outlines the DiSCiPl view of debugging, its associated debugging methodology, and motivates the kinds of tools proposed: the assertion based tools, the declarative diagnoser and the visualisation tools. Sections 2 through 4 provide a short presentation of the tools of each kind. Finally, Section 5 presents a summary of the tools developed in the project. This introduction gives only a general view of the DiSCiPl debugging methodology and tools. For details and for specific bibliographic referenees the reader is referred to the subsequent chapters.
Resumo:
Tabled evaluation has been proved an effective method to improve several aspeets of goal-oriented query evaluation, including termination and complexity. Several "native" implementations of tabled evaluation have been developed which offer good performance, but many of them need significant changes to the underlying Prolog implementation. More portable approaches, generally using program transformation, have been proposed but they often result in lower efficieney. We explore some techniques aimed at combining the best of these worlds, i.e., developing a portable and extensible implementation, with minimal modifications at the abstract machine level, and with reasonably good performance. Our preliminary results indícate promising results.
Resumo:
Con el surgir de los problemas irresolubles de forma eficiente en tiempo polinomial en base al dato de entrada, surge la Computación Natural como alternativa a la computación clásica. En esta disciplina se trata de o bien utilizar la naturaleza como base de cómputo o bien, simular su comportamiento para obtener mejores soluciones a los problemas que los encontrados por la computación clásica. Dentro de la computación natural, y como una representación a nivel celular, surge la Computación con Membranas. La primera abstracción de las membranas que se encuentran en las células, da como resultado los P sistemas de transición. Estos sistemas, que podrían ser implementados en medios biológicos o electrónicos, son la base de estudio de esta Tesis. En primer lugar, se estudian las implementaciones que se han realizado, con el fin de centrarse en las implementaciones distribuidas, que son las que pueden aprovechar las características intrínsecas de paralelismo y no determinismo. Tras un correcto estudio del estado actual de las distintas etapas que engloban a la evolución del sistema, se concluye con que las distribuciones que buscan un equilibrio entre las dos etapas (aplicación y comunicación), son las que mejores resultados presentan. Para definir estas distribuciones, es necesario definir completamente el sistema, y cada una de las partes que influyen en su transición. Además de los trabajos de otros investigadores, y junto a ellos, se realizan variaciones a los proxies y arquitecturas de distribución, para tener completamente definidos el comportamiento dinámico de los P sistemas. A partir del conocimiento estático –configuración inicial– del P sistema, se pueden realizar distribuciones de membranas en los procesadores de un clúster para obtener buenos tiempos de evolución, con el fin de que la computación del P sistema sea realizada en el menor tiempo posible. Para realizar estas distribuciones, hay que tener presente las arquitecturas –o forma de conexión– de los procesadores del clúster. La existencia de 4 arquitecturas, hace que el proceso de distribución sea dependiente de la arquitectura a utilizar, y por tanto, aunque con significativas semejanzas, los algoritmos de distribución deben ser realizados también 4 veces. Aunque los propulsores de las arquitecturas han estudiado el tiempo óptimo de cada arquitectura, la inexistencia de distribuciones para estas arquitecturas ha llevado a que en esta Tesis se probaran las 4, hasta que sea posible determinar que en la práctica, ocurre lo mismo que en los estudios teóricos. Para realizar la distribución, no existe ningún algoritmo determinista que consiga una distribución que satisfaga las necesidades de la arquitectura para cualquier P sistema. Por ello, debido a la complejidad de dicho problema, se propone el uso de metaheurísticas de Computación Natural. En primer lugar, se propone utilizar Algoritmos Genéticos, ya que es posible realizar alguna distribución, y basada en la premisa de que con la evolución, los individuos mejoran, con la evolución de dichos algoritmos, las distribuciones también mejorarán obteniéndose tiempos cercanos al óptimo teórico. Para las arquitecturas que preservan la topología arbórea del P sistema, han sido necesarias realizar nuevas representaciones, y nuevos algoritmos de cruzamiento y mutación. A partir de un estudio más detallado de las membranas y las comunicaciones entre procesadores, se ha comprobado que los tiempos totales que se han utilizado para la distribución pueden ser mejorados e individualizados para cada membrana. Así, se han probado los mismos algoritmos, obteniendo otras distribuciones que mejoran los tiempos. De igual forma, se han planteado el uso de Optimización por Enjambres de Partículas y Evolución Gramatical con reescritura de gramáticas (variante de Evolución Gramatical que se presenta en esta Tesis), para resolver el mismo cometido, obteniendo otro tipo de distribuciones, y pudiendo realizar una comparativa de las arquitecturas. Por último, el uso de estimadores para el tiempo de aplicación y comunicación, y las variaciones en la topología de árbol de membranas que pueden producirse de forma no determinista con la evolución del P sistema, hace que se deba de monitorizar el mismo, y en caso necesario, realizar redistribuciones de membranas en procesadores, para seguir obteniendo tiempos de evolución razonables. Se explica, cómo, cuándo y dónde se deben realizar estas modificaciones y redistribuciones; y cómo es posible realizar este recálculo. Abstract Natural Computing is becoming a useful alternative to classical computational models since it its able to solve, in an efficient way, hard problems in polynomial time. This discipline is based on biological behaviour of living organisms, using nature as a basis of computation or simulating nature behaviour to obtain better solutions to problems solved by the classical computational models. Membrane Computing is a sub discipline of Natural Computing in which only the cellular representation and behaviour of nature is taken into account. Transition P Systems are the first abstract representation of membranes belonging to cells. These systems, which can be implemented in biological organisms or in electronic devices, are the main topic studied in this thesis. Implementations developed in this field so far have been studied, just to focus on distributed implementations. Such distributions are really important since they can exploit the intrinsic parallelism and non-determinism behaviour of living cells, only membranes in this case study. After a detailed survey of the current state of the art of membranes evolution and proposed algorithms, this work concludes that best results are obtained using an equal assignment of communication and rules application inside the Transition P System architecture. In order to define such optimal distribution, it is necessary to fully define the system, and each one of the elements that influence in its transition. Some changes have been made in the work of other authors: load distribution architectures, proxies definition, etc., in order to completely define the dynamic behaviour of the Transition P System. Starting from the static representation –initial configuration– of the Transition P System, distributions of membranes in several physical processors of a cluster is algorithmically done in order to get a better performance of evolution so that the computational complexity of the Transition P System is done in less time as possible. To build these distributions, the cluster architecture –or connection links– must be considered. The existence of 4 architectures, makes that the process of distribution depends on the chosen architecture, and therefore, although with significant similarities, the distribution algorithms must be implemented 4 times. Authors who proposed such architectures have studied the optimal time of each one. The non existence of membrane distributions for these architectures has led us to implement a dynamic distribution for the 4. Simulations performed in this work fix with the theoretical studies. There is not any deterministic algorithm that gets a distribution that meets the needs of the architecture for any Transition P System. Therefore, due to the complexity of the problem, the use of meta-heuristics of Natural Computing is proposed. First, Genetic Algorithm heuristic is proposed since it is possible to make a distribution based on the premise that along with evolution the individuals improve, and with the improvement of these individuals, also distributions enhance, obtaining complexity times close to theoretical optimum time. For architectures that preserve the tree topology of the Transition P System, it has been necessary to make new representations of individuals and new algorithms of crossover and mutation operations. From a more detailed study of the membranes and the communications among processors, it has been proof that the total time used for the distribution can be improved and individualized for each membrane. Thus, the same algorithms have been tested, obtaining other distributions that improve the complexity time. In the same way, using Particle Swarm Optimization and Grammatical Evolution by rewriting grammars (Grammatical Evolution variant presented in this thesis), to solve the same distribution task. New types of distributions have been obtained, and a comparison of such genetic and particle architectures has been done. Finally, the use of estimators for the time of rules application and communication, and variations in tree topology of membranes that can occur in a non-deterministic way with evolution of the Transition P System, has been done to monitor the system, and if necessary, perform a membrane redistribution on processors to obtain reasonable evolution time. How, when and where to make these changes and redistributions, and how it can perform this recalculation, is explained.
Resumo:
Bruynooghe described a framework for the top-down abstract interpretation of logic programs. In this framework, abstract interpretation is carried out by constructing an abstract and-or tree in a top-down fashion for a given query and program. Such an abstract interpreter requires fixpoint computation for programs which contain recursive predicates. This paper presents in detail a fixpoint algorithm that has been developed for this purpose and the motivation behind it. We start off by describing a simple-minded algorithm. After pointing out its shortcomings, we present a series of refinements to this algorithm, until we reach the final version. The aim is to give an intuitive grasp and provide justification for the relative complexity of the final algorithm. We also present an informal proof of correctness of the algorithm and some results obtained from an implementation.
Resumo:
Because of the growing availability of third-party APIs, services, widgets and any other reusable web component, mashup developers now face a vast amount of candidate components for their developments. Moreover, these components quite often are scattered in many different repositories and web sites, which makes difficult their selection or discovery. In this paper, we discuss the problem of component selection in Service-Oriented Architectures (SOA) and Mashup-Driven Development, and introduce the Linked Mashups Ontology (LiMOn), a model that allows describing mashups and their components for integrating and sharing mashup information such as categorization or dependencies. The model has allowed the building of an integrated, centralized metadirectory of web components for query and selection, which has served to evaluate the model. The metadirectory allows accessing various heterogeneous repositories of mashups and web components while using external information from the Linked Data cloud, helping mashup development.