984 results for Unstructured content search


Relevance:

100.00%

Publisher:

Abstract:

Project submitted for the degree of Master in Informatics and Computer Engineering

Relevance:

90.00%

Publisher:

Abstract:

Background: Continuous content management of health information portals is vital for their sustainability and widespread acceptance. The knowledge and experience of a domain expert is essential for content management in the health domain. Online health resources are generated at an exponential rate, so manually examining them for relevance to a specific topic and audience is a formidable challenge for domain experts. Intelligent content discovery for effective content management is a less researched topic. An existing expert-endorsed content repository can provide the necessary leverage to automatically identify relevant resources and evaluate qualitative metrics.

Objective: This paper reports on design research towards an intelligent technique for automated content discovery and ranking for health information portals. The proposed technique aims to improve the efficiency of the current, mostly manual process of portal content management by utilising an existing expert-endorsed content repository as a supporting base and a benchmark to evaluate the suitability of new content.

Methods: A model for content management was established based on a field study of potential users. The proposed technique is integral to this content management model and executes in several phases (i.e., query construction, content search, text analytics, and fuzzy multi-criteria ranking). The construction of multi-dimensional search queries with input from Wordnet, the use of multi-word and single-word terms as representative semantics for text analytics, and the use of fuzzy multi-criteria ranking for subjective evaluation of quality metrics are original contributions reported in this paper.

Results: The feasibility of the proposed technique was examined with experiments conducted on an actual health information portal, the BCKOnline portal. Both intermediary and final results generated by the technique are presented, and these help to establish the benefits of the technique and its contribution towards effective content management.

Conclusions: The prevalence of large numbers of online health resources is a key obstacle for domain experts involved in content management of health information portals and websites. The proposed technique has proven successful at searching for and identifying resources and measuring their relevance. It can be used to support the domain expert in content management and thereby keep the health portal current and up to date.
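
As an illustration of the query-construction phase described above, here is a minimal Python sketch that expands seed topic terms with WordNet synonyms via NLTK. It assumes the `nltk` package with the WordNet corpus installed; the helpers `expand_term` and `build_query` and the boolean query grammar are invented for illustration, not taken from the paper.

```python
# Sketch of multi-dimensional query construction with WordNet synonyms.
# Requires: pip install nltk; then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def expand_term(term: str, max_synonyms: int = 5) -> set[str]:
    """Collect WordNet synonyms for a seed term (hypothetical helper)."""
    synonyms = {term}
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace('_', ' '))
            if len(synonyms) >= max_synonyms:
                return synonyms
    return synonyms

def build_query(topic_terms: list[str], audience_terms: list[str]) -> str:
    """Combine the topic and audience dimensions into one boolean query."""
    topic = ' OR '.join(sorted({s for t in topic_terms for s in expand_term(t)}))
    audience = ' OR '.join(audience_terms)
    return f'({topic}) AND ({audience})'

print(build_query(['cancer'], ['caregiver', 'patient']))
```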

Relevance:

80.00%

Publisher:

Abstract:

Bayesian networks (BNs) provide a statistical modelling framework which is ideally suited for modelling the many factors and components of complex problems such as healthcare-acquired infections. The methicillin-resistant Staphylococcus aureus (MRSA) organism is particularly troublesome since it is resistant to standard treatments for Staph infections. Overcrowding and understaffing are believed to increase infection transmission rates and also to inhibit the effectiveness of disease control measures. Clearly the mechanisms behind MRSA transmission and containment are very complicated and control strategies may only be effective when used in combination. BNs are growing in popularity in general and in medical sciences in particular. A recent Current Content search of the number of published BN journal articles showed a five-fold increase in general and a six-fold increase in medical and veterinary science from 2000 to 2009. This chapter introduces the reader to Bayesian network (BN) modelling and an iterative modelling approach to build and test the BN created to investigate the possible role of high bed occupancy on transmission of MRSA while simultaneously taking into account other risk factors.
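
To make the BN idea concrete, below is a toy network in plain Python with bed occupancy and staffing as parents of MRSA transmission, the kind of structure the chapter investigates. All probabilities are invented for illustration and are not taken from the chapter.

```python
# Toy Bayesian network: occupancy and staffing -> MRSA transmission.
# All probabilities below are invented for illustration.
P_UNDERSTAFFED = 0.2

# P(transmission = True | occupancy_high, understaffed)
P_TRANSMISSION = {
    (True, True): 0.40,
    (True, False): 0.25,
    (False, True): 0.20,
    (False, False): 0.05,
}

def p_transmission_given_occupancy(occ_high: bool) -> float:
    """Marginalise staffing out: P(T | O = occ_high)."""
    return sum(
        P_TRANSMISSION[(occ_high, under)]
        * (P_UNDERSTAFFED if under else 1 - P_UNDERSTAFFED)
        for under in (True, False)
    )

print(f"P(transmission | high occupancy)   = {p_transmission_given_occupancy(True):.3f}")
print(f"P(transmission | normal occupancy) = {p_transmission_given_occupancy(False):.3f}")
```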

Relevance:

80.00%

Publisher:

Abstract:

Although blogs have existed since the beginning of the Internet, their use has increased considerably in the last decade. Nowadays, they are ready to be used by a broad range of people: from teenagers to multinationals, everyone can have a global communication space.

Companies know blogs are a valuable publicity tool for sharing information with participants, and they know the importance of creating consumer communities around them: participants come together to exchange ideas, review and recommend new products, and even support each other. Companies can also use blogs for different purposes, such as a content management system to manage the content of websites, a bulletin board to support communication and document sharing in teams, a marketing instrument to communicate with Internet users, or a knowledge management tool. However, an increasing amount of blog content does not find its source in the personal experiences of the writer. The information may instead reside in the user's desktop documents, in a company's catalogues, or in other blogs. Although the gap between blog and data source can be traversed by manual coding, this is a cumbersome task that defeats the blog's easiness principle. Moreover, depending on the quantity of information and its characterisation (i.e., structured content, unstructured content, etc.), an automatic approach can be more effective.

Based on these observations, the aim of this dissertation is to assist blog publication through annotation, model transformation, and crossblogging techniques. These techniques have been implemented to give rise to Blogouse, Catablog, and BlogUnion, tools that strive to improve the publication process considering the aforementioned data sources.
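
A sketch of the model-transformation idea, assuming structured source data such as a product catalogue: rows of a CSV model are rendered as blog-post drafts. The field names and template are hypothetical and only illustrate the kind of automation the dissertation's tools target.

```python
# Render rows of a structured catalogue (the "model") as blog-post
# drafts; field names and template are hypothetical.
import csv
import io

CATALOGUE = """sku,name,price
A-1,Espresso machine,149.00
A-2,Milk frother,29.90
"""

POST_TEMPLATE = """Title: New product: {name}

We just added {name} (ref. {sku}) to the catalogue at {price} EUR.
"""

for row in csv.DictReader(io.StringIO(CATALOGUE)):
    print(POST_TEMPLATE.format(**row))
```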

Relevance:

80.00%

Publisher:

Abstract:

This project deals primarily with web scraping over HTML documents on Android. As a result, a methodology is proposed for performing web scraping in applications implemented for this operating system, and an application based on this methodology was developed to be useful to the students of the school. Web scraping can be defined as a technique based on a series of content search algorithms whose goal is to obtain specific information from web pages while discarding whatever is not relevant. As a central part of the work, considerable time was spent studying web browsers and servers, the HTML language present in almost all web pages today, and the mechanisms used for client-server communication, since these are the pillars on which the technique rests. A study of the necessary techniques and tools was carried out, providing all the required theoretical concepts as well as a proposal of a possible methodology for its implementation. Finally, the UPMdroid application was coded, developed to exemplify the implementation of the previously proposed methodology and, at the same time, to give ETSIST students a mobile Android tool that eases access to and visualisation of the most important data of the academic year, namely class timetables and the grades of the subjects in which they are enrolled. Beyond implementing the proposed methodology, this application is a very useful tool for students, since it lets them use a large number of the school's services in a simple and intuitive way, thereby solving the problems of viewing the school's web content on mobile devices.
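
For flavour, here is a minimal web-scraping sketch written in Python rather than the project's Android/Java stack, using only the standard library: it fetches a page and keeps only the text inside table cells, the select-and-discard pattern the abstract describes. The URL is a placeholder.

```python
# Fetch a page and keep only the text inside <td> cells; everything
# else is discarded. The URL is a placeholder, not a real endpoint.
from html.parser import HTMLParser
from urllib.request import urlopen

class CellExtractor(HTMLParser):
    """Collect the text content of every <td> cell."""

    def __init__(self) -> None:
        super().__init__()
        self.in_cell = False
        self.cells: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

html = urlopen('https://example.com/schedule').read().decode('utf-8')
parser = CellExtractor()
parser.feed(html)
print(parser.cells)   # e.g. a flat list of timetable entries
```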

Relevance:

80.00%

Publisher:

Abstract:

Colombia is undergoing a demobilisation process, and one of its goals is labour reintegration, understood as the process through which people who have been part of an illegal armed group obtain employment and are definitively reinserted into society. The main objective of this study is to understand, through a qualitative design, the attitudes of a group of three executives towards the employment of people undergoing labour reintegration (PPR). To this end, a series of semi-structured interviews was conducted with a sample of three executives from the public and private sectors. The information obtained was analysed through a process of axial coding. The results show that the attitudes of the three employers towards hiring people in the labour reintegration process can be positive or negative. Likewise, one of the predominant attitudes is the evaluation of the employers' beliefs and prejudices about the labour integration process, namely: uncertainty about the PPR's job performance, a perceived lack of commitment on the part of the PPR, possible labour conflicts, and the PPR's difficulty in building working relationships. In conclusion, the organisational behaviour model plays a very important role, since it encompasses the elements that influence and determine the construction of attitudes. These attitudes guide the evaluation of behaviours, for or against, in various areas of the process of hiring demobilised people.

Relevance:

40.00%

Publisher:

Abstract:

Search engines have forever changed the way people access and discover knowledge, allowing information about almost any subject to be quickly and easily retrieved within seconds. As increasingly more material becomes available electronically, the influence of search engines on our lives will continue to grow. This presents the problem of how to find what information is contained in each search engine, what bias a search engine may have, and how to select the best search engine for a particular information need. This research introduces a new method, search engine content analysis, to solve the above problem. Search engine content analysis is a new development of collection selection, a traditional information retrieval task that deals with general information repositories. Current research in collection selection relies on full access to the collection or estimations of the size of the collections, and collection descriptions are often represented as term occurrence statistics. An automatic ontology learning method is developed for search engine content analysis, which trains an ontology with world knowledge of hundreds of different subjects in a multilevel taxonomy. This ontology is then mined to find important classification rules, and these rules are used to perform an extensive analysis of the content of the largest general-purpose Internet search engines in use today. Instead of representing collections as a set of terms, as commonly occurs in collection selection, they are represented as a set of subjects, leading to a more robust representation of information and a decrease in synonymy. The ontology-based method was compared with ReDDE (Relevant Document Distribution Estimation, the current state-of-the-art collection selection method, which relies on collection size estimation) using the standard R-value metric, with encouraging results. The method was also used to analyse the content of the most popular search engines in use today, including Google and Yahoo, as well as several specialist search engines such as Pubmed and that of the U.S. Department of Agriculture. In conclusion, this research shows that the ontology-based method mitigates the need for collection size estimation.
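
The subject-based collection representation can be sketched as follows: each engine is modelled as a distribution over subjects, and engines are ranked by the weight they assign to the query's subject. The engines and numbers below are invented, and the sketch omits the ontology mining and the R-value evaluation.

```python
# Each engine is described as a distribution over subjects (invented
# numbers); engines are ranked by coverage of the query's subject.
ENGINE_SUBJECTS = {
    'engine_a': {'medicine': 0.50, 'biology': 0.30, 'sports': 0.20},
    'engine_b': {'agriculture': 0.60, 'biology': 0.25, 'medicine': 0.15},
}

def rank_engines(query_subject: str) -> list[tuple[str, float]]:
    """Order engines by how strongly they cover the query's subject."""
    scores = {name: subjects.get(query_subject, 0.0)
              for name, subjects in ENGINE_SUBJECTS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_engines('medicine'))   # engine_a first
```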

Relevance:

40.00%

Publisher:

Abstract:

Bioacoustic data can provide an important base for environmental monitoring. To explore the large volume of field recordings collected, this paper presents an automated similarity search algorithm. A region of an audio recording, defined by frequency and time bounds, is provided by a user; the content of the region is used to construct a query. In the retrieval process, our algorithm automatically scans through recordings to search for similar regions. In detail, we present a feature extraction approach based on the visual content of vocalisations, in this case ridges, and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations showing ridge characteristics. The regional representation method allows the content of an arbitrary region of a continuous recording to be described in a compressed format.
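
A minimal sketch of the region-query idea, assuming spectrograms are already available as 2-D NumPy arrays (frequency by time): the user's region is slid along the time axis and scored by normalised correlation. The paper's ridge features are simplified here to raw patch correlation.

```python
# Slide a user-selected spectrogram region along a recording's time
# axis and return the best-matching offset (simplified: no ridges).
import numpy as np

def search_region(query: np.ndarray, recording: np.ndarray) -> tuple[int, float]:
    """Return (time offset, correlation score) of the best match."""
    f, t = query.shape
    q = (query - query.mean()) / (query.std() + 1e-9)
    best_offset, best_score = -1, -np.inf
    for offset in range(recording.shape[1] - t + 1):
        patch = recording[:f, offset:offset + t]
        p = (patch - patch.mean()) / (patch.std() + 1e-9)
        score = float((q * p).mean())
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

rng = np.random.default_rng(0)
rec = rng.random((64, 500))        # synthetic spectrogram
query = rec[:32, 100:140]          # a user-selected region
print(search_region(query, rec))   # finds offset 100
```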

Relevance:

40.00%

Publisher:

Abstract:

Images represent a valuable source of information for the construction industry. Due to technological advancements in digital imaging, the increasing use of digital cameras is leading to an ever-increasing volume of images being stored in construction image databases, making it hard for engineers to retrieve useful information from them. Content-Based Search Engines are tools that utilize the rich image content and apply pattern recognition methods in order to retrieve similar images. In this paper, we illustrate several project management tasks and show how Content-Based Search Engines can facilitate the automatic retrieval and indexing of construction images in image databases.
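
As a hedged illustration of content-based retrieval, the baseline below compares images by grey-level histogram intersection, a common pattern-recognition starting point rather than the paper's specific method; the image data are synthetic.

```python
# Rank a database of images against a query image by histogram
# intersection; images here are synthetic grey-level arrays.
import numpy as np

def histogram(image: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalised grey-level histogram of an image array."""
    h, _ = np.histogram(image, bins=bins, range=(0, 256))
    return h / h.sum()

def intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Histogram intersection similarity in [0, 1]."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(1)
database = {f"site_photo_{i}.jpg": rng.integers(0, 256, (64, 64))
            for i in range(5)}
query = database["site_photo_3.jpg"]

q = histogram(query)
ranked = sorted(database,
                key=lambda k: intersection(q, histogram(database[k])),
                reverse=True)
print(ranked[0])   # the query image itself ranks first
```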

Relevance:

40.00%

Publisher:

Abstract:

Background: Search filters are combinations of words and phrases designed to retrieve an optimal set of records on a particular topic (subject filters) or study design (methodological filters). Information specialists are increasingly turning to reusable filters to focus their searches. However, the extent of the academic literature on search filters is unknown. We provide a broad overview of the academic literature on search filters.

Objectives: To map the academic literature on search filters from 2004 to 2015 using a novel form of content analysis.

Methods: We conducted a comprehensive search for literature between 2004 and 2015 across eight databases using a subjectively derived search strategy. We identified key words from titles, grouped them into categories, and examined their frequency and co-occurrences.

Results: The majority of records were housed in Embase (n = 178) and MEDLINE (n = 154). Over the last decade, both databases appeared to exhibit a bimodal distribution, with the number of publications on search filters rising until 2006, dipping in 2007, and then steadily increasing until 2012. Few articles appeared in social science databases over the same time frame (e.g. Social Services Abstracts, n = 3). Unsurprisingly, the term 'search' appeared in most titles and was quite often used as a noun adjunct for the words 'filter' and 'strategy'. Across the papers, the purpose of searches as a means of 'identifying' information and gathering 'evidence' from 'databases' emerged quite strongly. Other terms relating to the methodological assessment of search filters, such as precision and validation, also appeared, albeit less frequently.

Conclusions: Our findings show surprising commonality across the papers with regard to the literature on search filters. Much of the literature seems to be focused on developing search filters to identify and retrieve information, as opposed to testing or validating such filters. Furthermore, the literature is mostly housed in health-related databases, namely MEDLINE, CINAHL, and Embase, implying that it is medically driven. Relatively few papers focus on the use of search filters in the social sciences.
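
The keyword frequency and co-occurrence analysis described in the Methods can be sketched in a few lines; the titles and keyword list below are invented examples, not the study's data.

```python
# Count how often pairs of keywords co-occur in the same title.
from collections import Counter
from itertools import combinations

TITLES = [
    "Developing a search filter for adverse effects",
    "Validation of a methodological search filter in MEDLINE",
    "A precision-maximising strategy to identify diagnostic studies",
]
KEYWORDS = {"search", "filter", "strategy", "validation", "precision"}

pair_counts = Counter()
for title in TITLES:
    words = {w.strip(',.').lower() for w in title.split()}
    present = sorted(KEYWORDS & words)
    pair_counts.update(combinations(present, 2))

for pair, n in pair_counts.most_common():
    print(pair, n)
```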

Relevance:

40.00%

Publisher:

Abstract:

There are three key driving forces behind the development of Internet Content Management Systems (CMS): a desire to manage the explosion of content, a desire to provide structure and meaning to content in order to make it accessible, and a desire to work collaboratively to manipulate content in some meaningful way. Yet the traditional CMS has been unable to meet the last of these requirements, often failing to provide sufficient tools for collaboration in a distributed context. Peer-to-Peer (P2P) systems are networks in which every node is an equal participant (whether transmitting data, exchanging content, or invoking services) and there is an absence of any centralised administrative or coordinating authority. P2P systems are inherently more scalable than equivalent client-server implementations, as they tend to use resources at the edge of the network much more effectively. This paper details the rationale and design of a P2P middleware for collaborative content management.
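
To illustrate the decentralised collaboration idea (not the paper's actual middleware design), the sketch below floods a content update among equal peers with no central server; the class and graph are invented.

```python
# Flood a document update through a small peer graph; the seen-set
# stops the recursion once every peer has the update.
class Peer:
    def __init__(self, name: str):
        self.name = name
        self.neighbours: list['Peer'] = []
        self.store: dict[str, str] = {}   # document id -> content
        self.seen: set[str] = set()

    def publish(self, doc_id: str, content: str) -> None:
        """Store an update locally, then flood it to neighbours."""
        if doc_id in self.seen:
            return                        # already propagated; stop
        self.seen.add(doc_id)
        self.store[doc_id] = content
        for peer in self.neighbours:
            peer.publish(doc_id, content)

a, b, c = Peer('a'), Peer('b'), Peer('c')
a.neighbours, b.neighbours, c.neighbours = [b], [a, c], [b]
a.publish('doc-1', 'first collaborative draft')
print(c.store)   # the update reached c with no central server
```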

Relevance:

40.00%

Publisher:

Abstract:

This presentation was offered as part of the CUNY Library Assessment Conference, Reinventing Libraries: Reinventing Assessment, held at the City University of New York in June 2014.

Relevance:

40.00%

Publisher:

Abstract:

Peer-to-peer (P2P) networks are gaining increased attention from both the scientific community and the larger Internet user community. Data retrieval algorithms lie at the center of P2P networks, and this paper addresses the problem of efficiently searching for files in unstructured P2P systems. We propose an Improved Adaptive Probabilistic Search (IAPS) algorithm that is fully distributed and bandwidth efficient. IAPS uses ant-colony optimization and takes file types into consideration in order to search for file container nodes with a high probability of success. We have performed extensive simulations to study the performance of IAPS, and we compare it with the Random Walk and Adaptive Probabilistic Search algorithms. Our experimental results show that IAPS achieves high success rates, high response rates, and significant message reduction.
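
A hedged sketch of the adaptive probabilistic idea: each node keeps per-neighbour, pheromone-like weights that are reinforced along successful walks, so later searches favour productive neighbours. The update rule and parameters are invented and much simpler than IAPS itself (file types, for instance, are ignored here).

```python
# Probabilistic walk over an unstructured overlay with pheromone-like
# neighbour weights, reinforced when a walk finds the file.
import random

class Node:
    def __init__(self, name: str, files: set[str]):
        self.name = name
        self.files = files
        self.weights: dict['Node', float] = {}   # neighbour -> pheromone

    def pick_neighbour(self) -> 'Node':
        nodes = list(self.weights)
        return random.choices(nodes, [self.weights[n] for n in nodes])[0]

def search(start: Node, wanted: str, ttl: int = 8) -> bool:
    node, path = start, []
    for _ in range(ttl):
        nxt = node.pick_neighbour()
        path.append((node, nxt))
        if wanted in nxt.files:
            for src, dst in path:        # reinforce the successful path
                src.weights[dst] += 1.0
            return True
        node = nxt
    return False

a, b, c = Node('a', set()), Node('b', set()), Node('c', {'song.mp3'})
a.weights = {b: 1.0, c: 1.0}
b.weights = {a: 1.0, c: 1.0}
c.weights = {a: 1.0, b: 1.0}
print(search(a, 'song.mp3'), a.weights)
```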

Relevance:

40.00%

Publisher:

Abstract:

Since multimedia data, such as images and videos, are far more expressive and informative than ordinary text-based data, people find it more attractive to communicate and express themselves with them. Additionally, with the rising popularity of social networking tools such as Facebook and Twitter, multimedia information retrieval can no longer be considered a solitary task. Rather, people constantly collaborate with one another while searching for and retrieving information. But the very cause of the popularity of multimedia data, the huge and varied information a single data object can carry, makes its management a challenging task. Multimedia data are commonly represented as multidimensional feature vectors and carry high-level semantic information. These two characteristics make them very different from traditional alpha-numeric data, so trying to manage them with frameworks and rationales designed for primitive alpha-numeric data is inefficient. An index structure is the backbone of any database management system, and the index structures present in existing relational database management frameworks cannot handle multimedia data effectively. Thus, in this dissertation, a generalized multidimensional index structure is proposed which accommodates both the atypical multidimensional representation and the semantic information carried by different multimedia data seamlessly within one single framework. Additionally, the dissertation investigates the evolving relationships among multimedia data in a collaborative environment and how such information can help to customize the design of the proposed index structure when it is used to manage multimedia data in a shared environment. Extensive experiments were conducted to demonstrate the usability and better performance of the proposed framework over current state-of-the-art approaches.
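
The retrieval operation such an index accelerates is k-nearest-neighbour search over multidimensional feature vectors; the brute-force baseline below (synthetic data) shows the operation itself, not the proposed index structure.

```python
# Brute-force k-NN over multidimensional feature vectors, the query
# an index structure like the one proposed would speed up.
import numpy as np

rng = np.random.default_rng(2)
features = rng.random((1000, 64))        # 1000 media objects, 64-D features
query = rng.random(64)

def knn(query: np.ndarray, features: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k feature vectors closest to the query (L2)."""
    dists = np.linalg.norm(features - query, axis=1)
    return np.argsort(dists)[:k]

print(knn(query, features))
```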

Relevance:

40.00%

Publisher:

Abstract:

The growth of the Internet has made information search one of the most relevant activities in industry and one of the most current topics in research. The Internet is the largest information container in history, and the ease with which it generates new information leads to new challenges in retrieving information and discerning which items are more relevant than the rest. Parallel to the growth in the quantity of information, the way information is provided has also changed. One of these changes, which has generated more information traffic, has been the emergence of social networks; we have seen how social networks can drive more traffic than search engines themselves. From this we can draw conclusions that allow us to take a new approach to the information retrieval problem: the public trusts most the information coming from known contacts. In this document we explore a possible change to classic search engines to bring them closer to the social side and acquire those social advantages.
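
The proposed social turn can be sketched as a re-ranking step that blends classic relevance with a social-proximity signal; the weighting scheme and data below are invented for illustration, not the document's concrete design.

```python
# Re-rank results by mixing classic relevance with a contact bonus.
def social_rank(results, searcher_contacts, alpha: float = 0.7):
    """Combine relevance and a contact bonus; alpha weights relevance."""
    def score(r):
        social = 1.0 if r['author'] in searcher_contacts else 0.0
        return alpha * r['relevance'] + (1 - alpha) * social
    return sorted(results, key=score, reverse=True)

results = [
    {'url': 'https://example.com/a', 'relevance': 0.9, 'author': 'stranger'},
    {'url': 'https://example.com/b', 'relevance': 0.8, 'author': 'friend'},
]
print(social_rank(results, searcher_contacts={'friend'}))
```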