21 results for Vector Space IR, Search Engines, Document Clustering, Document
at Universitat de Girona, Spain
Abstract:
When publishing information on the web, one expects it to reach all the people that could be interested in it. This is mainly achieved with general-purpose indexing and search engines like Google, currently the most widely used. In the particular case of the geographic information (GI) domain, exposing content to mainstream search engines is a complex task that needs specific actions. On many occasions it is convenient to provide a web site with a specially tailored search engine. Such is the case for on-line dictionaries (wikipedia, wordreference), stores (amazon, ebay), and generally all those holding thematic databases. Due to the proliferation of these engines, A9.com proposed a standard interface called OpenSearch, used by modern web browsers to manage custom search engines. Geographic information can also benefit from the use of specific search engines. We can distinguish between two main approaches in GI retrieval efforts: on one hand, classical OGC standardization (CSW, WFS filters), which is very complex for the mainstream user; on the other hand, the neogeographers' approach, usually in the form of specific APIs lacking a common query interface and standard geographic formats. A draft 'geo' extension for OpenSearch has been proposed. It adds geographic filtering for queries and recommends a set of simple standard response geographic formats, such as KML, Atom and GeoRSS. This proposal enables standardization while keeping simplicity, thus covering a wide range of use cases in both the OGC and neogeography paradigms. In this article we analyze the OpenSearch geo extension in detail along with its use cases, demonstrating its applicability to both the SDI and the geoweb. Open source implementations are presented as well
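As a hedged sketch of how a client might consume such an interface, the following fills an OpenSearch URL template with a bounding-box filter. The endpoint is invented, and while parameter names such as `{searchTerms}` and `{geo:box}` follow the draft 'geo' extension, treat the exact template here as an assumption rather than a normative example:

```python
# Hypothetical OpenSearch URL template; {geo:box} takes
# "west,south,east,north" in decimal degrees per the draft extension.
TEMPLATE = "https://example.org/search?q={searchTerms}&bbox={geo:box}&format=atom"

def fill_template(template, values):
    """Substitute OpenSearch template parameters with concrete values."""
    url = template
    for key, value in values.items():
        url = url.replace("{" + key + "}", value)
    return url

url = fill_template(TEMPLATE, {
    "searchTerms": "hotels",
    "geo:box": "2.81,41.96,2.84,41.99",  # rough bounding box around Girona
})
print(url)
```

A real client would first fetch the site's OpenSearch description document and read the template from its `Url` element instead of hard-coding it.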
Abstract:
Considering the evolution on the Internet of the news portals of the mass media, the idea arose of a search engine aimed at gathering news scattered across the web pages of the major Spanish media outlets, which would allow users of a portal to obtain information on "contracted descriptors". The first objective is the analysis of the needs to be covered for a hypothetical client of the application; the second is algorithmic: obtaining a working methodology that makes it possible to retrieve the news item. At the programming level, three stages are considered: downloading the necessary web pages, done with the tools provided by the cURL library; the analysis of the news items (obtaining all the links that correspond to news items, filtering the descriptors to decide whether the item should be stored, and analysing the internal structure of the selected items so that only the established parts are kept); and the database that allows us to organize and manage the selected news items
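The descriptor-filtering stage described above can be sketched as follows. This is a minimal illustration, not the project's code: the page download (done with cURL in the project) is replaced by a hard-coded HTML sample, and the descriptor list is hypothetical:

```python
import re

# Hypothetical "contracted descriptors" a portal user subscribed to.
DESCRIPTORS = ["economy", "elections"]

def extract_links(html):
    """Collect (url, anchor text) pairs for candidate news links."""
    return re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)

def matches_descriptors(text, descriptors):
    """Keep a news item only if its headline mentions a descriptor."""
    lowered = text.lower()
    return any(d in lowered for d in descriptors)

sample = ('<a href="/news/1">Elections called for May</a>'
          '<a href="/news/2">Local sports roundup</a>')
selected = [(url, txt) for url, txt in extract_links(sample)
            if matches_descriptors(txt, DESCRIPTORS)]
print(selected)  # only the elections story survives the filter
```

The surviving items would then be parsed for their established parts (headline, body) and inserted into the database.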
Abstract:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as Information or Likelihood, can be identified in the algebraic structure of A2(P), as can their corresponding notions in compositional data analysis, such as Aitchison distance or the centered log-ratio transform. In this way very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information such as Bayesian updating, combination of likelihoods and robust M-estimation functions are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood-based statistics for general exponential families turns out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite dimensional linear subspaces of A2(P) and they correspond to finite dimensional subspaces formed by their posteriors in the dual information space A2(Pprior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be the Kullback-Leibler directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P)-valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale-free understanding of unbiased reasoning
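For reference, the two compositional notions the abstract leans on can be written out. These are the standard definitions from compositional data analysis, not reconstructions of the paper's generalized A2(P) formulas; here x is a composition with D parts and g(x) its geometric mean:

```latex
\operatorname{clr}(x) = \left( \ln\frac{x_1}{g(x)}, \ldots, \ln\frac{x_D}{g(x)} \right),
\qquad g(x) = \left( \prod_{i=1}^{D} x_i \right)^{1/D},
\qquad d_A(x, y) = \bigl\| \operatorname{clr}(x) - \operatorname{clr}(y) \bigr\|_2 .
```

The generalization in the paper replaces the finite geometric mean by integration against a reference measure P, turning log-ratios into elements of the Hilbert space A2(P).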
Abstract:
The objective of the Intelligent Citizen Attention Services (SAC) is to respond to citizens' information needs regarding the services and actions of the municipality and, by extension, the whole set of services of citizen interest. Since iSAC went into operation, the queries made to the system and the degree of citizen satisfaction with the service have been analysed periodically. Although the assessments are generally satisfactory, it has been observed that the system currently has a gap: there is a wide range of queries that, for the moment, iSAC is not able to resolve, and possibly neither is 010, the call centre of the citizen attention service. Some of the searches performed stray far from the municipal domain, and it is the experience of the citizens themselves that can offer a better result. For this reason the need arose to create wikiSAC, a tool whose main objectives are: to allow page content to be created, modified and deleted interactively, quickly and easily through a web browser; to control offensive and malicious content; to keep a history of changes; to encourage citizen participation, making it a place where citizens ask questions, make suggestions and give opinions on matters related to their municipality; and to make citizens feel more integrated in the workings of the administration by collaborating in information and citizen-attention tasks
Abstract:
The iSAC project (Intelligent Citizen Attention Service via the web) began in January 2006, drawing on new scientific knowledge on intelligent agents together with the application of Information and Communication Technologies (ICT) and search engines. The current citizen attention service comprises two areas: face-to-face attention at the offices and telephone attention through the call centre. Staffing and opening-hours limitations make this service lose effectiveness. The aim is to develop a product with a technology capable of extending and improving the capacity and quality of citizen attention in public administrations, whatever their size. Even so, this project will be exploited especially by town councils, which citizens approach with all kinds of questions and doubts, usually not restricted to the local domain. More specifically, the aim is to automate citizen attention through a web portal in order to obtain a more effective service
Abstract:
The incorporation of space allows the establishment of a more precise relationship between a contaminating input, a contaminating byproduct and the emissions that reach the final receptor. However, the presence of asymmetric information impedes the implementation of the first-best policy. As a solution to this problem, site-specific deposit-refund systems for the contaminating input and the contaminating byproduct are proposed. Moreover, the use of a successive optimization technique, first over space and then over time, enables the definition of the optimal intertemporal site-specific deposit-refund system
Abstract:
Study, design and implementation of different fibre-clustering techniques, in order to integrate into the DTIWeb platform different clustering algorithms and techniques for visualizing fibre clusters, so as to make DTI data easier for specialists to interpret
Abstract:
Ethernet is becoming the dominant aggregation technology for carrier transport networks; however, as it is a LAN technology, native bridged ethernet does not fulfill all the carrier requirements. One of the schemes proposed by the research community to make ethernet fulfill carrier requirements is ethernet VLAN-label switching (ELS). ELS allows the creation of label switched data paths using a 12-bit label encoded in the VLAN TAG control information field. Previous label switching technologies such as MPLS use more bits for encoding the label. Hence, they do not suffer from label sparsity issues as ELS might. This paper studies the sparsity issues resulting from the reduced ELS VLAN-label space and proposes the use of the label merging technique to improve label space usage. Experimental results show that label merging considerably improves label space usage
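The sparsity issue is easy to see in numbers: the VLAN TAG control information field is 16 bits, of which only 12 carry the ELS label, so at most 2^12 = 4096 labels exist per link, against 2^20 for MPLS. A small sketch of packing and unpacking that field (the 3-bit priority / 1-bit DEI / 12-bit VID layout is the standard 802.1Q one; its use as an ELS label is the scheme described above):

```python
LABEL_MASK = 0x0FFF  # 12 bits -> at most 4096 distinct ELS labels

def els_label(tci):
    """Extract the ELS label (low 12 bits) from a 16-bit TCI value."""
    return tci & LABEL_MASK

def make_tci(priority, dei, label):
    """Pack priority (3 bits), DEI (1 bit) and label (12 bits) into a TCI."""
    assert 0 <= label <= LABEL_MASK, "ELS label space is only 2^12"
    return (priority << 13) | (dei << 12) | label

tci = make_tci(priority=5, dei=0, label=100)
print(els_label(tci))   # 100
print(LABEL_MASK + 1)   # 4096 usable labels
```

Label merging attacks exactly this 4096-label ceiling by letting several switched paths share one downstream label.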
Abstract:
This paper proposes a high-level reinforcement learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach when using RL has been to apply value function based algorithms, the system detailed here is characterized by the use of direct policy search methods. Rather than approximating a value function, these methodologies approximate a policy using an independent function approximator with its own parameters, trying to maximize the future expected reward. The policy based algorithm presented in this paper is used for learning the internal state/action mapping of a behavior. In this preliminary work, we demonstrate its feasibility with simulated experiments using the underwater robot GARBI in a target reaching task
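The direct policy search idea can be illustrated with a deliberately tiny sketch, unrelated to the GARBI experiments themselves: a softmax policy over two actions, with one parameter per action, updated by a REINFORCE-style gradient on hypothetical rewards. Nothing here is taken from the paper beyond the general technique:

```python
import math, random

random.seed(0)
theta = [0.0, 0.0]    # policy parameters, one per action
REWARD = [0.2, 1.0]   # hypothetical expected reward of each action
ALPHA = 0.1           # learning rate

def probs(theta):
    """Softmax action probabilities for the current parameters."""
    e = [math.exp(t) for t in theta]
    s = sum(e)
    return [x / s for x in e]

for _ in range(500):
    p = probs(theta)
    a = 0 if random.random() < p[0] else 1   # sample an action
    r = REWARD[a]
    # REINFORCE update: theta_i += alpha * r * d log pi(a) / d theta_i,
    # which for a softmax is (indicator(i == a) - p_i).
    for i in range(2):
        theta[i] += ALPHA * r * ((1.0 if i == a else 0.0) - p[i])

print(probs(theta))  # probability mass shifts toward the better action
```

Note there is no value function anywhere: the policy parameters are adjusted directly along the gradient of expected reward, which is the defining trait of the methods the abstract describes.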
A new approach to segmentation based on fusing circumscribed contours, region growing and clustering
Abstract:
One of the major problems in machine vision is the segmentation of images of natural scenes. This paper presents a new proposal for the image segmentation problem, based on the integration of edge and region information. The main contours of the scene are detected and used to guide the posterior region growing process. The algorithm places a number of seeds on both sides of a contour, allowing a set of concurrent growing processes to be started. A prior analysis of the seeds makes it possible to adjust the homogeneity criterion to the regions' characteristics. A new homogeneity criterion based on clustering analysis and convex hull construction is proposed
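The region growing step at the core of this scheme can be sketched in a few lines. This is the generic technique only, not the paper's seed placement or its clustering-based homogeneity criterion; the fixed intensity threshold stands in for that criterion:

```python
from collections import deque

def grow(image, seed, thresh):
    """Grow a region from 'seed', absorbing 4-neighbours whose
    intensity stays within 'thresh' of the seed's intensity."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    target = image[sy][sx]
    region, queue = {seed}, deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(image[ny][nx] - target) <= thresh):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

img = [[1, 1, 9],
       [1, 2, 9]]   # a dim region beside a bright one
print(sorted(grow(img, (0, 0), thresh=2)))  # growth stops at the contour
```

In the paper's scheme, several such processes run concurrently from seeds on both sides of each detected contour, so the contour itself bounds the competing regions.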
Abstract:
In image segmentation, clustering algorithms are very popular because they are intuitive and some of them are easy to implement. For instance, k-means is one of the most widely used in the literature, and many authors successfully compare their new proposals with the results achieved by k-means. However, it is well known that clustering-based image segmentation has many problems. For instance, the number of regions of the image has to be known a priori, and different initial seed placements (initial clusters) can produce different segmentation results. Most of these algorithms could be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitatively evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach
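The spatial-feature idea the abstract mentions in passing is easy to demonstrate: each pixel contributes (intensity, x, y) instead of intensity alone, so clusters become spatially coherent. The sketch below is that baseline idea with a toy k-means, not the paper's improved method; the weight `w` trading intensity against position is an illustrative knob:

```python
import random

def pixel_features(image, w=1.0):
    """One feature vector (intensity, w*x, w*y) per pixel."""
    return [(v, w * x, w * y)
            for y, row in enumerate(image)
            for x, v in enumerate(row)]

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm; returns the final cluster centers."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sum((a - b) ** 2
                                                for a, b in zip(p, centers[i])))
            groups[j].append(p)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

image = [[0, 0, 9, 9],
         [0, 0, 9, 9]]  # two homogeneous regions side by side
print(kmeans(pixel_features(image, w=0.1), k=2))
```

The two centers separate cleanly in intensity, and their x-coordinates land over the two regions, which is the spatial coherence plain intensity clustering cannot guarantee.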
Abstract:
The objective of traffic engineering is to optimize network resource utilization. Although several works have been published about minimizing network resource utilization, few works have focused on LSR (label switched router) label space. This paper proposes an algorithm that takes advantage of the MPLS label stack features in order to reduce the number of labels used in LSPs. Some tunnelling methods and their MPLS implementation drawbacks are also discussed. The described algorithm sets up NHLFE (next hop label forwarding entry) tables in each LSR, creating asymmetric tunnels when possible. Experimental results show that the described algorithm achieves a large reduction factor in the label space. The work presented applies to both types of connections: P2MP (point-to-multipoint) and P2P (point-to-point)
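Why stacking labels into a tunnel shrinks the label space can be shown with a back-of-the-envelope count. This is a deliberate simplification, not the paper's algorithm: assume every LSR on a path allocates one label per LSP it carries, while inside a tunnel the interior LSRs allocate a single label for the shared outer tunnel:

```python
def labels_without_tunnel(n_lsps, path_len):
    """Naive count: every LSR on the path allocates one label per LSP."""
    return n_lsps * path_len

def labels_with_tunnel(n_lsps, path_len):
    """Tunnel ingress and egress still see every LSP individually;
    each interior LSR allocates just one label for the shared tunnel."""
    interior = max(path_len - 2, 0)
    return 2 * n_lsps + interior

print(labels_without_tunnel(10, 5))  # 50 labels without tunneling
print(labels_with_tunnel(10, 5))     # 23 labels with a shared tunnel
```

The saving grows with both the number of LSPs and the tunnel length, which is the intuition behind building (possibly asymmetric) tunnels wherever the NHLFE tables allow it.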
Abstract:
The aim of traffic engineering is to optimise network resource utilization. Although several works on minimizing network resource utilization have been published, few works have focused on LSR label space. This paper proposes an algorithm that uses MPLS label stack features in order to reduce the number of labels used in LSP forwarding. Some tunnelling methods and their MPLS implementation drawbacks are also discussed. The algorithm described sets up the NHLFE tables in each LSR, creating asymmetric tunnels when possible. Experimental results show that the algorithm achieves a large reduction factor in the label space. The work presented here applies to both types of connections: P2MP and P2P
Abstract:
The objective of traffic engineering is to optimize network resource utilization. Although several works have been published about minimizing network resource utilization in MPLS networks, few of them have focused on LSR label space reduction. This letter studies Asymmetric Merged Tunneling (AMT) as a new method for reducing the label space in MPLS networks. The proposed method may be regarded as a combination of label merging (proposed in the MPLS architecture) and asymmetric tunneling (proposed recently in our previous works). Finally, simulations comparing AMT with both of its ancestors show a great improvement in the label space reduction factor
Abstract:
Most network operators have considered reducing LSR label spaces (the number of labels used) as a way of simplifying the management of underlying virtual private networks (VPNs) and therefore reducing operational expenditure (OPEX). The IETF outlined the label merging feature in MPLS, allowing the configuration of multipoint-to-point (MP2P) connections, as a means of reducing label space in LSRs. We found two main drawbacks in this label space reduction: a) it must be applied separately to each set of LSPs with the same egress LSR, which decreases the options for better reductions, and b) LSRs close to the edge of the network experience a greater label space reduction than those close to the core. The latter implies that MP2P connections reduce the number of labels asymmetrically