891 results for 080704 Information Retrieval and Web Search
Abstract:
OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as those included in a pre-existing bibliography of important literature in surgical oncology. RESULTS: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results, compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects the performance of PageRank more than that of simple citation count; however, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
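The comparison above turns on two citation-based signals, simple citation count and PageRank. As a hedged illustration of how those two rankings can differ (not the study's implementation), the following Python sketch runs power-iteration PageRank with an assumed damping factor of 0.85 on an invented toy citation graph and ranks the same articles by raw citation count.

```python
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each article to the list of articles it cites."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1.0 - damping) / n for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling article: spread its rank uniformly
                for node in nodes:
                    new[node] += damping * rank[source] / n
        rank = new
    return rank

# Invented toy citation graph: article -> articles it cites.
citations = {"A": ["C"], "B": ["C", "D"], "C": ["D"], "D": [], "E": ["C", "D"]}

nodes = set(citations) | {t for ts in citations.values() for t in ts}
citation_count = {node: 0 for node in nodes}
for targets in citations.values():
    for t in targets:
        citation_count[t] += 1

pr = pagerank(citations)
print(sorted(nodes, key=citation_count.get, reverse=True))  # ranked by citation count
print(sorted(nodes, key=pr.get, reverse=True))              # ranked by PageRank
```

In this toy graph both signals put the most-cited article first, but PageRank additionally weights who cites an article, which is the behavioural difference the study exploits.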
Abstract:
Technological and cultural factors influence access to health information on the web in multifarious ways. We evaluated structural differences and availability of communication services on the web in three diverse language and cultural groups: Chinese, English, and Spanish. A total of 382 web sites were analyzed: 144 were English language sites (38%), 129 were Chinese language sites (34%), and 108 were Spanish language sites (28%). We did not find technical differences in the number of outgoing links per domain or the total availability of communication services between the three groups. There were differences in the distribution of available services between Chinese and English sites. In the Chinese sites, there were more communication services between consumers and health experts. Our results suggest that the health-related web presence of these three cultural groups is technologically comparable, but reflects differences that may be attributable to cultural factors.
Abstract:
BACKGROUND Anthelmintic drugs have been widely used in sheep as a cost-effective means for gastro-intestinal nematode (GIN) control. However, growing anthelmintic resistance (AHR) has created a compelling need to identify evidence-based management recommendations that reduce the risk of further development and impact of AHR. OBJECTIVE To identify, critically assess, and synthesize available data from primary research on factors associated with AHR in sheep. METHODS Publications reporting original observational or experimental research on selected factors associated with AHR in sheep GINs and published after 1974 were identified through two processes. Three electronic databases (PubMed, Agricola, CAB) and Web of Science (a collection of databases) were searched for potentially relevant publications. Additional publications were identified through consultation with experts, manual search of the references of included publications and conference proceedings, and information solicited from small ruminant practitioner list-serves. Two independent investigators screened abstracts for relevance. Relevant publications were assessed for risk of systematic bias. Where sufficient data were available, random-effects meta-analyses (MAs) were performed to estimate the pooled odds ratios (ORs) and 95% confidence intervals (CIs) of AHR for factors reported in ≥2 publications. RESULTS Of the 1712 abstracts screened for eligibility, 131 were deemed relevant for full publication review. Thirty publications describing 25 individual studies (15 observational studies, 7 challenge trials, and 3 controlled trials) were included in the qualitative synthesis and assessed for systematic bias. Unclear (i.e. not reported, or unable to assess) or high risk of selection bias and confounding bias was found in 93% (14/15) and 60% (9/15) of the observational studies, respectively, while unclear risk of selection bias was identified in all of the trials. Ten independent studies were included in the quantitative synthesis, and MAs were performed for five factors. Only high frequency of treatment was a significant risk factor (OR=4.39; 95% CI=1.59, 12.14), while the remaining four factors did not reach statistical significance: mixed-species grazing (OR=1.63; 95% CI=0.66, 4.07); flock size (OR=1.02; 95% CI=0.97, 1.07); use of long-acting drug formulations (OR=2.85; 95% CI=0.79, 10.24); and drench-and-shift pasture management (OR=4.08; 95% CI=0.75, 22.16). CONCLUSIONS While there is abundant literature on the topic of AHR in sheep GINs, few studies have explicitly investigated the association between putative risk or protective factors and AHR. Consequently, several of the current recommendations on parasite management are not evidence-based. Moreover, many of the studies included in this review had a high or unclear risk of systematic bias, highlighting the need to improve study design and/or reporting of future research carried out in this field.
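The abstract reports pooled odds ratios with 95% confidence intervals from random-effects meta-analyses. As a sketch of how such pooling can be done (assuming the common DerSimonian-Laird estimator; the abstract does not state which estimator was used), the Python code below derives log-OR standard errors from each study's confidence interval and combines them. The input numbers are placeholders, not the review's data.

```python
import math

def pool_random_effects(studies, z=1.96):
    """DerSimonian-Laird random-effects pooling of odds ratios.
    studies: list of (OR, lower 95% CI bound, upper 95% CI bound) tuples."""
    y = [math.log(or_) for or_, lo, hi in studies]            # log odds ratios
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for _, lo, hi in studies]
    w = [1 / s ** 2 for s in se]                               # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))     # heterogeneity statistic Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)              # between-study variance
    w_star = [1 / (s ** 2 + tau2) for s in se]                 # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))

# Placeholder per-study estimates (OR, 95% CI), not the review's data.
example = [(3.2, 1.1, 9.5), (5.0, 1.4, 17.8), (4.1, 0.9, 18.0)]
print(pool_random_effects(example))  # pooled OR with its 95% CI
```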
Abstract:
Background. Decision-making on reproductive issues is influenced by an interplay of individual, familial, medical, religious and socio-cultural factors. Women with chronic medical illnesses such as HIV infection and cancer are often fraught with decisional conflicts about child-bearing. With the increase in the incidence of these illnesses as well as the improvement in survival rates, there is a need to pay due attention to the issue of reproductive decision-making. Examining the prevalence and determinants of fertility desires in the two groups in a comparative manner would help bring to light the perceptions of the medical community and of society in general about the two illnesses and the issue of motherhood. Methods. A systematic literature search was undertaken using databases such as MEDLINE (PubMed), MEDLINE (Ovid), PsycInfo and Web of Science. Articles published in English and English-language abstracts of foreign-language articles were included. Studies that explored ‘fertility desires’ as the outcome variable were included: quantitative studies that assessed the prevalence of fertility desires as well as qualitative studies that provided a descriptive understanding of the factors governing reproductive desires. Results. A total of 34 articles were included (29 studies examining HIV and 5 studies examining cancer in relation to fertility desires). Variables such as age, stage of illness, support of spouse and family, perception of the medical community and one's own view of motherhood were key determinants in both groups. Conclusion. There is a need for uniform, systematic research in this field. It is important that health care workers acknowledge these decisional conflicts, include them as part of the medical care of these patients and provide guidance with the right balance of information, practicality and compassion.
Abstract:
Background: Hypertension and diabetes are public health and economic concerns in the United States. The utilization of medical home concepts increases the receipt of preventive services; however, does it also increase adherence to treatments? This study examined the effect of patient-centered medical home technologies such as the electronic health record, clinical decision support systems, and web-based care management in improving health outcomes related to hypertension and diabetes. Methods: A systematic review of the literature used a best-evidence synthesis approach to address the general question "Do patient-centered medical home technologies have an effect on diabetes and hypertension treatment?" This was followed by an evaluation of specific examples of the technologies utilized, such as computer-assisted recommendations and web-based care management provided through the patient's electronic health record. EBSCOhost, Ovid, and Google Scholar were the databases used to conduct the literature search. Results: The initial search identified over 25 studies, selected on content and quality, that implemented technology interventions to improve communication between provider and patient. After further assessing the articles for risk of bias and study design, 13 randomized controlled studies were chosen. All of the chosen studies were conducted in various primary care settings, in both private practices and hospitals, between the years 2000 and 2007. The sample sizes of the studies ranged from 42 to 2,924 participants. The mean age across the studies ranged from 56 to 71 years. The percentage of women in the studies ranged from 1 to 78 percent. Over one-third of the studies did not provide the racial composition of the participants. For the seven studies that did provide information about ethnic composition, 64% of the intervention participants were White. All of the studies utilized some type of web-based or computer-based communication to manage hypertension or diabetes care. Findings on outcomes were mixed, with nine of the 13 studies showing no significant effect on the outcomes examined and four showing a significant, positive impact on health outcomes related to hypertension or diabetes. Conclusion: Although the technologies improved patient and provider satisfaction, results for outcome measures such as blood pressure control and glucose control were inconclusive. Further research is needed with ethnically and socioeconomically diverse populations to investigate the role of patient-centered technologies in hypertension and diabetes control. Further research is also needed to investigate the effects of innovative medical home technologies that can be used by both patients and providers to increase the quality of communication concerning adherence to treatments.
Abstract:
The municipality is considered a space in which its inhabitants share not only the territory but also the existing problems and resources. The municipal institution, as local government, is the arena in which decisions about the territory that affect its inhabitants are made. The actors involved may be officials, employees, and the community (individuals and organized NGOs); all contribute their knowledge and values, but they have different interests and different time frames. Linked to these decisions, the way territorial information is managed is decisive if the aim is to pursue actions with positive impact that are environmentally sustainable and lasting over time. This study examines three municipalities: San Salvador de Jujuy, the provincial capital located in the Valles Templados; San Pedro de Jujuy, the main municipality of the Yungas region; and Tilcara, in the Quebrada de Humahuaca. The contribution of Territorial Intelligence, through the OIDTe observatory, makes it possible to analyze how information is managed, especially through the use of information and communication technologies (the municipal web page, computer equipment in the offices, communication strategies and outreach to the population) and through the organization of the administrative structures (organizational chart) through which municipal information circulates, enriched by the participation of multidisciplinary teams at the different stages. Starting from a diagnosis, the aim is to generate strategies for introducing innovations together with the municipal actors themselves, based on the situations and cultural practices of each place and incorporating the conceptual frameworks of Territorial Intelligence. In this sense, by promoting understanding among institutional actors and society, the OIDTe facilitates the coordination of different interests and fosters decision-making by agreement. Likewise, the Portulano method can guide the introduction of innovations in the coordination of cartographic information, so that the different offices can complement each other's contributions and communicate beyond the institution. In the diagnostic phase, interviews with key informants were conducted and a workshop was held with permanent technical staff and officials from the areas that handle territorial and planning information. Given the importance of installed human-resource capacity, the level of education and training of the permanent staff of each area was also analyzed.
Abstract:
Enriching knowledge bases with multimedia information makes it possible to complement textual descriptions with visual and audio information. Such complementary information can help users to understand the meaning of assertions, and in general improve the user experience with the knowledge base. In this paper we address the problem of how to enrich ontology instances with candidate images retrieved from existing Web search engines. DBpedia has evolved into a major hub in the Linked Data cloud, interconnecting millions of entities organized under a consistent ontology. Our approach taps into the Wikipedia corpus to gather context information for DBpedia instances and takes advantage of image tagging information when this is available to calculate semantic relatedness between instances and candidate images. We performed experiments with focus on the particularly challenging problem of highly ambiguous names. Both methods presented in this work outperformed the baseline. Our best method leveraged context words from Wikipedia, tags from Flickr and type information from DBpedia to achieve an average precision of 80%.
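A minimal sketch of the kind of relatedness scoring described above: the Wikipedia-derived context of a DBpedia instance is compared against each candidate image's tags, and images are ranked by similarity. The token sets, the use of plain cosine similarity over bags of words, and the "jaguar" example are illustrative assumptions, not the paper's actual method.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical context words gathered from the Wikipedia article of a DBpedia
# instance with an ambiguous name, and the tags of two candidate images.
context = Counter("jaguar big cat feline predator rainforest south america".split())
image_tags = {
    "img1": Counter("jaguar cat wildlife rainforest".split()),
    "img2": Counter("jaguar car vehicle road speed".split()),
}

ranked = sorted(image_tags, key=lambda i: cosine(context, image_tags[i]), reverse=True)
print(ranked)  # candidate images ordered by estimated relatedness to the instance
```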
Abstract:
Cultural content on the Web is available in various domains (cultural objects, datasets, geospatial data, moving images, scholarly texts and visual resources), concerns various topics, is written in different languages, is targeted at both laymen and experts, and is provided by different communities (libraries, archives, museums and the information industry) and by individuals (Figure 1). The integration of information technologies and cultural heritage content on the Web is expected to have an impact on everyday life from the point of view of institutions, communities and individuals. In particular, collaborative environments can recreate 3D navigable worlds that can offer new insights into our cultural heritage (Chan 2007). However, the main barrier is for end-users of cultural content, as well as for the organisations and communities managing and producing it, to find and relate cultural heritage information. In this paper, we explore several visualisation techniques for supporting cultural interfaces, where the role of metadata is essential for supporting search and communication among end-users (Figure 2). A conceptual framework was developed to integrate the data, purpose, technology, impact, and form components of a collaborative environment. Our preliminary results show that collaborative environments can help with cultural heritage information sharing and communication tasks because of the way in which they provide a visual context to end-users. They can be regarded as distributed virtual reality systems that offer graphically realised, potentially infinite, digital information landscapes. Moreover, collaborative environments also provide a new way of interaction between an end-user and a cultural heritage data set. Finally, the visualisation of a data set's metadata plays an important role in helping end-users in their search for heritage content on the Web.
Abstract:
In the beginning of the 90s, ontology development was similar to an art: ontology developers did not have clear guidelines on how to build ontologies but only some design criteria to be followed. Work on principles, methods and methodologies, together with supporting technologies and languages, made ontology development become an engineering discipline, the so-called Ontology Engineering. Ontology Engineering refers to the set of activities that concern the ontology development process and the ontology life cycle, the methods and methodologies for building ontologies, and the tool suites and languages that support them. Thanks to the work done in the Ontology Engineering field, the development of ontologies within and between teams has increased and improved, as well as the possibility of reusing ontologies in other developments and in final applications. Currently, ontologies are widely used in (a) Knowledge Engineering, Artificial Intelligence and Computer Science, (b) applications related to knowledge management, natural language processing, e-commerce, intelligent information integration, information retrieval, database design and integration, bio-informatics, education, and (c) the Semantic Web, the Semantic Grid, and the Linked Data initiative. In this paper, we provide an overview of Ontology Engineering, mentioning the most outstanding and used methodologies, languages, and tools for building ontologies. In addition, we include some words on how all these elements can be used in the Linked Data initiative.
Abstract:
This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers to reuse services and contents to build new web applications. However, developers face a problem of information overload when searching for appropriate services or resources for their combination. To contribute to overcoming this problem, a framework is defined for the discovery of services and resources. In this framework, discovery is performed at three levels: content, service, and agent. The content level involves the information available in web resources. The web follows the Representational State Transfer (REST) architectural style, in which resources are returned as representations from servers to clients. These representations usually employ the HyperText Markup Language (HTML), which, along with Cascading Style Sheets (CSS), describes the markup employed to render representations in a web browser. Although the use of Semantic Web standards such as the Resource Description Framework (RDF) makes this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML. This process, often referred to as Screen Scraping in the literature, constitutes content discovery in the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources’ representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained from them. The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking. To describe the possible services in RESTful architectures, a high-level, feature-oriented service methodology is proposed at this level. This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources. Discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Service discovery can also be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings in service registries such as ProgrammableWeb. For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow specifying behaviours for crawling and executing services, which results in the fulfilment of a high-level goal. Agent rules are plans that introspect the data and services discovered from the web, together with the knowledge present in service and content discovery rules, to anticipate the contents and services to be found on specific web resources. By defining plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework. The Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers, and the framework was used for the discovery and extraction of pieces of news from the web.
Similarly, the Resulta and VulneraNET projects cover the discovery of ideas and of security knowledge on the web, respectively. The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered in component repositories on the web. The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks on the web. The main contribution of the thesis is the unified discovery framework, which allows configuring agents to perform automated tasks. In addition, a scraping ontology has been defined for the construction of mappings for scraping web resources, and a novel first-order logic rule induction algorithm is defined for the automated construction and maintenance of these mappings out of the visual information in web resources. Additionally, a common unified model for the discovery of services is defined, which allows sharing service descriptions. Future work comprises the further extension of service probing, resource ranking, the extension of the Scraping Ontology, extensions of the agent model, and constructing a base of discovery rules.
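The content-discovery step described in this thesis maps pieces of an HTML representation onto semantic entities by means of discovery rules. A minimal sketch of that idea follows, using CSS-selector rules that emit (subject, property, value) triples; the selectors, the schema.org-style property names, and the use of BeautifulSoup are assumptions for illustration, not the thesis' scraping ontology or rule language.

```python
# Illustrative only: CSS-selector "discovery rules" that map HTML fragments
# onto semantic (subject, property, value) triples. BeautifulSoup is assumed
# to be installed; the rules and vocabulary below are hypothetical.
from bs4 import BeautifulSoup

DISCOVERY_RULES = {               # property -> CSS selector (hypothetical)
    "schema:headline": "article h1",
    "schema:datePublished": "article time",
    "schema:author": "article .byline",
}

def discover(html: str, subject: str):
    soup = BeautifulSoup(html, "html.parser")
    triples = []
    for prop, selector in DISCOVERY_RULES.items():
        for node in soup.select(selector):
            triples.append((subject, prop, node.get_text(strip=True)))
    return triples

html = """<article><h1>Sample headline</h1>
<time>2012-05-01</time><p class="byline">J. Doe</p></article>"""
for triple in discover(html, "http://example.org/news/1"):
    print(triple)
```

Applying such rules to a retrieved representation yields semantically described content that the service and agent levels can then reason over.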
Abstract:
Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results: We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions: CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems.
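A rough sketch of the query-building step that CDAPubMed automates: MeSH headings identified in an EHR are combined into a PubMed query and submitted to NCBI's public E-utilities esearch endpoint. The MeSH headings below are placeholders, and this is not CDAPubMed's own code.

```python
# Sketch: combine MeSH headings (placeholders) into a PubMed query and run it
# against NCBI E-utilities. Not CDAPubMed's implementation.
import json
import urllib.parse
import urllib.request

mesh_terms = ["Breast Neoplasms", "Tamoxifen", "Postmenopause"]  # hypothetical
query = " AND ".join(f'"{t}"[MeSH Terms]' for t in mesh_terms)

url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
       + urllib.parse.urlencode({"db": "pubmed", "term": query,
                                 "retmode": "json", "retmax": 20}))
with urllib.request.urlopen(url) as response:
    result = json.load(response)["esearchresult"]

print(result["count"], "citations; first PMIDs:", result["idlist"][:5])
```

Adding more patient-specific headings to the AND-joined term string narrows the result set, which is the effect the abstract reports when patient features are added to a broad query such as breast neoplasm.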
Abstract:
The emergence of cloud datacenters enhances the capability of online data storage. Since massive amounts of data are stored in datacenters, it is necessary to effectively locate and access data of interest in such a distributed system. However, traditional search techniques only allow users to search for images using exact-match keywords through a centralized index. These techniques cannot satisfy the requirements of content-based image retrieval (CBIR). In this paper, we propose a scalable image retrieval framework which can efficiently support content similarity search and semantic search in the distributed environment. Its key idea is to integrate image feature vectors into distributed hash tables (DHTs) by exploiting the property of locality sensitive hashing (LSH). Thus, images with similar content are most likely gathered onto the same node without the knowledge of any global information. For searching semantically close images, relevance feedback is adopted in our system to bridge the gap between low-level features and high-level semantics. We show that our approach yields a high recall rate with good load balance and requires only a small number of hops.
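A minimal sketch of the locality-sensitive hashing idea the framework relies on: random-hyperplane (sign) hashing turns an image feature vector into a short bit signature, which can then be hashed onto the DHT's identifier space so that similar images tend to map to the same node. The feature dimensionality, number of hyperplanes, and SHA-1 key derivation are assumptions, not the paper's parameters.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(42)
DIM, BITS = 128, 16                       # feature dimension and signature length (assumed)
hyperplanes = rng.normal(size=(BITS, DIM))

def lsh_signature(feature: np.ndarray) -> str:
    """Random-hyperplane LSH: one bit per hyperplane (sign of the projection)."""
    bits = (hyperplanes @ feature) >= 0
    return "".join("1" if b else "0" for b in bits)

def dht_key(signature: str) -> str:
    """Map the LSH bucket id onto a DHT identifier space (SHA-1 here, as an assumption)."""
    return hashlib.sha1(signature.encode()).hexdigest()

a = rng.normal(size=DIM)
b = a + 0.05 * rng.normal(size=DIM)       # a near-duplicate image feature
c = rng.normal(size=DIM)                  # an unrelated image feature

print(lsh_signature(a) == lsh_signature(b))  # likely True: same bucket, hence same node
print(lsh_signature(a) == lsh_signature(c))  # likely False
print(dht_key(lsh_signature(a)))             # key used to place/look up the bucket in the DHT
```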
Abstract:
This project deals mainly with web scraping of HTML documents on Android. As a result, a methodology is proposed for performing web scraping in applications built for this operating system, and an application based on this methodology was developed to be useful to the students of the school. Web scraping can be defined as a technique based on a set of content-search algorithms that extract specific information from web pages while discarding whatever is not relevant. A central part of the work was devoted to the study of web browsers and servers, of the HTML language present in almost every web page today, and of the mechanisms used for client-server communication, since these are the pillars on which the technique rests. The necessary techniques and tools were studied, all the required theoretical concepts are provided, and a possible methodology for their implementation is proposed. Finally, the UPMdroid application was implemented to exemplify the proposed methodology and, at the same time, to give ETSIST students mobile Android support that eases access to and display of the most important data of the academic year, namely class schedules and the grades of the courses in which they are enrolled. Besides implementing the proposed methodology, the application is a very useful tool for students, since it lets them use a large number of the school's services in a simple and intuitive way, thus solving the problems of viewing this web content on mobile devices.
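The scraping technique the project describes (fetch a page, parse its HTML, and keep only the relevant fragments) is language-independent; the sketch below illustrates it with Python's standard-library HTMLParser on an invented timetable table, whereas UPMdroid itself implements the methodology on Android.

```python
# Language-independent illustration of the scraping idea: parse an HTML
# timetable and keep only the cells of interest. The table markup is invented.
from html.parser import HTMLParser

class ScheduleScraper(HTMLParser):
    """Collects the text of <td> cells, row by row, from a (hypothetical) schedule table."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.rows, self.current = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
        elif tag == "tr":
            self.current = []

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.current:
            self.rows.append(self.current)

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.current.append(data.strip())

html = """<table id="schedule">
<tr><td>Mon 09:00</td><td>Signals and Systems</td><td>Room 4201</td></tr>
<tr><td>Mon 11:00</td><td>Web Engineering</td><td>Room 1302</td></tr>
</table>"""

scraper = ScheduleScraper()
scraper.feed(html)
print(scraper.rows)  # [['Mon 09:00', 'Signals and Systems', 'Room 4201'], ...]
```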