866 results for topic, web information gathering, web personalization
Abstract:
The emergence of the World Wide Web has given users a range of opportunities for accessing data and information. Such access has become routine for any Web user, ordinary or experienced, whether the information sought is basic or more complex. This technological progress has given users access to a vast amount of information scattered across the globe, most of which is not linked in any way. Gathering information of interest on a given topic, when several sources must be consulted to obtain and compare everything needed, is a slow process for the user. The aim is to make this process of collecting information from web pages as automated as possible, letting the user apply automatic analysis and processing algorithms and tools and thereby reducing the time and effort required to perform tasks on web pages. This process is called Web Scraping. This work describes an architecture for an automatic, configurable web scraping system based on existing technologies, namely in the context of the semantic web. To this end, the work analyses the effects of applying Web Scraping through the following points: • Identification and analysis of several web scraping tools; • Identification of the process carried out by humans that complements current web scraping tools; • Design of an architecture, complementary to web scraping tools, that supports the user's web scraping process; • Development of a prototype based on existing tools and technologies; • Experiments in the application domain of Portuguese supermarket web pages; • Analysis of the results obtained from these experiments.
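As a rough illustration of the kind of extraction step a web scraping tool automates, the sketch below pulls product names and prices out of a small HTML fragment using only Python's standard library. The markup, the class names `product`, `name` and `price`, and the helper `scrape_products` are assumptions made up for this example; they are not taken from the thesis, and real supermarket pages will be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for this sketch.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Arroz 1kg</span><span class="price">0,99</span></li>
  <li class="product"><span class="name">Leite 1L</span><span class="price">0,75</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> / <span class="price">."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price) records
        self._field = None   # span class currently being read, if any
        self._current = {}   # record under construction

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans of interest.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            # The sample prices use a decimal comma; convert to a float.
            price = float(self._current["price"].replace(",", "."))
            self.products.append((self._current["name"], price))
            self._current = {}


def scrape_products(html):
    """Return the list of (name, price) tuples found in the given HTML text."""
    parser = ProductScraper()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    for name, price in scrape_products(SAMPLE_HTML):
        print(name, price)
```

Running the same parser over fragments from several shops would give comparable records, which is the comparison task the abstract describes as slow when done by hand.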
Abstract:
BACKGROUND: Many users search the Internet for answers to health questions. Complementary and alternative medicine (CAM) is a particularly common search topic. Because many CAM therapies do not require a clinician's prescription, false or misleading CAM information may be more dangerous than information about traditional therapies. Many quality criteria have been suggested to filter out potentially harmful online health information. However, assessing the accuracy of CAM information is uniquely challenging since CAM is generally not supported by conventional literature. OBJECTIVE: The purpose of this study is to determine whether domain-independent technical quality criteria can identify potentially harmful online CAM content. METHODS: We analyzed 150 Web sites retrieved from a search for the three most popular herbs (ginseng, ginkgo and St. John's wort) and their purported uses on the ten most commonly used search engines. The presence of technical quality criteria as well as potentially harmful statements (commissions) and vital information that should have been mentioned (omissions) was recorded. RESULTS: Thirty-eight sites (25%) contained statements that could lead to direct physical harm if acted upon. One hundred forty-five sites (97%) had omitted information. We found no relationship between technical quality criteria and potentially harmful information. CONCLUSIONS: Current technical quality criteria do not identify potentially harmful CAM information online. Consumers should be warned to use other means of validation or to trust only known sites. Quality criteria that consider the uniqueness of CAM must be developed and validated.
Abstract:
Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentation on the Web is still a challenging task, even with the best resources available on the Web. In this paper we cast the problem of detecting Web API documentation as a text classification problem: classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labelled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines, namely naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
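To make the classification framing concrete, here is a minimal sketch of one of the baselines the abstract names, a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. It is not feaLDA and not the authors' implementation; the function names and the four training snippets are invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns the counts the classifier needs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of training documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior for the given text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set: "api" pages versus everything else.
TRAIN = [
    ("GET endpoint returns JSON response with api key parameter", "api"),
    ("REST api reference request parameters authentication token", "api"),
    ("our team blog shares company news and holiday photos", "other"),
    ("recipe for soup with fresh vegetables and herbs", "other"),
]

if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    print(classify(model, "api endpoint parameters and authentication"))
```

A real baseline would be trained on many labelled pages with proper tokenization; the toy data here only shows the mechanics of the page-level decision.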
Abstract:
Electrical activity is extremely broad and varied. On the one hand, it requires deep knowledge of rules, regulations, materials, equipment, technical solutions and technologies, as well as support in several areas, such as electrical equipment, telecommunications, security, and the efficient and rational use of energy. On the other hand, it requires further skills that depend on the specific projects to be implemented; this knowledge is characteristic of professionals with relevant experience in terms of the complexity and specificity of the projects they have carried out.
Abstract:
The Web has become an indispensable tool for modern society. The ability to access enormous amounts of information, available practically worldwide, is a great advantage in our lives. However, the overwhelming amount of available information becomes a problem: finding the information we need amid a great deal of irrelevant information. To help with this task, powerful online search engines were created, which scour the Web for the best results, according to their own criteria, for the data we need. Currently, popular search engines present results in a simple format, consisting only of a text box where the user enters keywords on the topic to be searched, with the results laid out as a list of hyperlinks ordered by the relevance the engine assigns to each result. However, there are other ways of presenting results. One alternative is to present results on 3-dimensional interfaces. This work focuses on this type of system: search engines with 3-dimensional interfaces. The problem is that Web pages are not prepared to be consumed by this type of search engine. To solve this problem, a general-purpose model for Web pages was built that can meet the requirements of the different variants of these search engines. An automatic instantiation prototype was also developed, which collects the necessary information from Web pages and fills in the model.
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
In the last few years, we have observed an exponential increase in information systems, and parking information is one more example. Obtaining reliable, up-to-date information on parking slot availability is very important for reducing traffic. Parking slot prediction is also a new topic that has already started to be applied. San Francisco in the United States and Santander in Spain are examples of projects carried out to obtain this kind of information. The aim of this thesis is to study and evaluate methodologies for parking slot prediction and to integrate them in a web application, where all kinds of users will be able to see the current parking status as well as the future status according to the parking model predictions. The source of the data is ancillary to this work, but it still needs to be understood in order to understand parking behaviour. Many modelling techniques are used for this purpose, such as time series analysis, decision trees, neural networks and clustering. In this work, the author explains the techniques best suited to this problem, analyzes the results and points out the advantages and disadvantages of each one. The model learns the periodic and seasonal patterns of parking status behaviour and, with this knowledge, can predict future status values for a given date. The data used come from Smart Park Ontinyent; they consist of parking occupancy status together with timestamps and are stored in a database. After data acquisition, data analysis and pre-processing were needed before the models could be implemented. The first test used a boosting ensemble classifier, applied over a set of decision trees created with the C5.0 algorithm from a set of training samples, to assign a prediction value to each object. In addition to the predictions, this work provides error measurements that indicate the reliability of the predictions.
The second test was done by fitting a seasonal exponential smoothing model (TBATS). Finally, as the last test, a combination of the previous two models was tried, just to see the result of this combination. The results were quite good for all of them, with average errors of 6.2, 6.6 and 5.4 in vacancy predictions for the three models respectively. This means, for a car park of 47 places, a 10% average error in parking slot predictions. This result could be even better with data covering a longer period. To make this kind of information visible and reachable by anyone with an internet-connected device, a web application was built. Besides displaying the data, this application also offers different functions to make searching for parking easier. The new functions, apart from parking prediction, were: - Park distances from user location: provides the distances from the user's current location to the different car parks in the city. - Geocoding: the service for matching a literal description or an address to a concrete location. - Geolocation: the service for positioning the user. - Parking list panel: this is neither a service nor a function, just a better visualization and handling of the information.
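The "park distances from user location" function can be sketched as follows. The abstract does not say how the application computes distances, so this example assumes straight-line (great-circle) distance via the haversine formula; the function names, the car park names and all coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def parks_by_distance(user, parks):
    """user: (lat, lon); parks: dict of name -> (lat, lon).

    Returns a list of (name, distance_km) sorted from nearest to farthest.
    """
    distances = [
        (name, haversine_km(user[0], user[1], lat, lon))
        for name, (lat, lon) in parks.items()
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical coordinates, for illustration only.
PARKS = {
    "Park A": (38.825, -0.605),
    "Park B": (38.820, -0.610),
    "Park C": (38.900, -0.700),
}

if __name__ == "__main__":
    for name, km in parks_by_distance((38.821, -0.609), PARKS):
        print(f"{name}: {km:.2f} km")
```

If road distance matters more than straight-line distance, a routing service would be needed instead; the haversine value is only a lower bound on the distance actually driven.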
Abstract:
ABSTRACT: In order to evaluate the one-year evolution of web-based information on alcohol dependence, we re-assessed alcohol-related sites in July 2007 with the same evaluation tool that had been used to assess these sites in June 2006. Websites were assessed with a standardized form designed to rate sites on the basis of accountability, presentation, interactivity, readability, and content quality. The DISCERN scale, which aims to assist persons without content expertise in assessing the quality of written health publications, was also used. Scores were highly stable for all components of the form one year later (r = .77 to .95, p < .01). Analysis of variance for repeated measures showed no time effect, no interaction between time and scale, no interaction between time and group (affiliation categories), and no interaction between time, group, and scale. The study highlights the lack of change in alcohol-dependence-related web pages across one year.
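The stability figures above are correlation coefficients between the two assessments of each site. Assuming they are Pearson correlations (the abstract reports r but does not name the coefficient), the computation can be sketched like this; the function name and the score lists are made up for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five sites at the first and second assessment.
SCORES_FIRST = [12, 18, 25, 30, 41]
SCORES_SECOND = [14, 17, 27, 29, 40]

if __name__ == "__main__":
    print(round(pearson_r(SCORES_FIRST, SCORES_SECOND), 3))
```

Values close to 1 mean that sites kept nearly the same relative ranking between the two assessments.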
Abstract:
This report describes the process of creating a news web application. It is divided into 3 areas: one where users with administrator permissions can post news to be viewed by everyone; another, accessible to registered users, that lets them view news from other servers via the RSS data format; and a third section for administrative management, where new news items can be added, existing ones modified, or new web pages containing news introduced. Registered users will be able to select the newspapers from which they receive information, as well as specify which topics they prefer when searching for news.
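Reading news from other servers over RSS, filtered by a user's preferred topics, could look roughly like the sketch below. It uses only Python's standard library and an inline RSS 2.0 sample; the feed content, the function name `read_items` and the use of the `<category>` element as the topic are assumptions for illustration, not details from the report.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example needs no network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Daily</title>
    <item>
      <title>Local team wins cup</title>
      <category>sports</category>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>New budget approved</title>
      <category>politics</category>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def read_items(rss_text, topics=None):
    """Return (title, link) pairs from an RSS document.

    If `topics` is given, only items whose <category> is in that set are kept.
    """
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        category = item.findtext("category", default="")
        if not topics or category in topics:
            items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in read_items(SAMPLE_RSS, topics={"sports"}):
        print(title, link)
```

In an application like the one described, the feed text would come from each selected newspaper's RSS URL and the topic set from the registered user's stored preferences.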
Abstract:
Summary report: Introduction: The Internet is an important source of information on mental health. Bipolar disorder is commonly associated with disability, comorbidities, poor insight and poor treatment compliance. The burden of the illness, through its depressive and manic episodes, may lead people (whether or not a diagnosis of bipolar disorder has already been made), as well as their families, to search for information on the Internet. It is therefore important that websites dealing with the topic contain high-quality, evidence-based information. Objective: to evaluate the quality of the information available on the Internet about bipolar disorder and to identify quality indicators. Method: two keywords, "bipolar disorder" and "manic depressive illness", were entered into the most commonly used Internet search engines. Websites were evaluated with a standard form designed to rate sites on the basis of authorship (private, university, company, ...), presentation, interactivity, readability and content quality. The "Health On the Net" (HON) quality label and the DISCERN tool were used to verify their effectiveness as quality indicators. Results: of the 80 sites identified, 34 were included. Based on the outcome measure, the content quality of the sites proved to be good. The content quality of websites dealing with bipolar disorder is significantly explained by readability, accountability and interactivity, as well as by a global score. Conclusions: overall, the content of the websites studied that deal with bipolar disorder is of good quality.
Abstract:
The Internet is increasingly used as a source of information on health issues and is probably a major source of patients' empowerment. This process is, however, limited by the frequently poor quality of web-based health information designed for consumers. Wider dissemination of information about the criteria that define the quality of website content, and about useful methods for searching for such information, could be particularly useful to patients and their relatives. A brief, six-item DISCERN version, characterized by a high specificity for detecting websites with good or very good content quality, was recently developed. This tool could facilitate the identification of high-quality information on the web by patients and may improve the empowerment process initiated by the development of the health-related web.
Abstract:
BACKGROUND: The Internet is increasingly used as a source of information for mental health issues. The burden of obsessive compulsive disorder (OCD) may lead persons with diagnosed or undiagnosed OCD, and their relatives, to search for good quality information on the Web. This study aimed to evaluate the quality of Web-based information on English-language sites dealing with OCD and to compare the quality of websites found through a general and a medically specialized search engine. METHODS: Keywords related to OCD were entered into Google and OmniMedicalSearch. Websites were assessed on the basis of accountability, interactivity, readability, and content quality. The "Health on the Net" (HON) quality label and the Brief DISCERN scale score were used as possible content quality indicators. Of the 235 links identified, 53 websites were analyzed. RESULTS: The content quality of the OCD websites examined was relatively good. The use of a specialized search engine did not offer an advantage in finding websites with better content quality. A score ≥16 on the Brief DISCERN scale is associated with better content quality. CONCLUSION: This study shows the acceptability of the content quality of OCD websites. There is no advantage in searching for information with a specialized search engine rather than a general one. Practical implications: The Internet offers a number of high quality OCD websites. It remains critical, however, to have a provider-patient talk about the information found on the Web.