948 results for Web search engines
Abstract:
This paper describes the implementation of a semantic web search engine for conversation-style transcripts. Our data source is Hansard, a publicly available conversation-style transcript of parliamentary debates. The current search engine for Hansard only runs queries based on keywords or phrases and hence cannot make semantic inferences from user queries. By making use of knowledge such as the relationships between members of parliament, constituencies, and terms of office, as well as the topics of debates, the search results can be improved in both relevance and coverage. Our contribution is not algorithmic; instead, we describe how we exploit a collection of external data sources, ontologies, semantic web vocabularies and named entity extraction to analyze the underlying semantics of user queries and to semantically enrich the search index, thereby improving the quality of results.
Abstract:
This study examines the evolution of prices in markets with Internet price-comparison search engines. The empirical study analyzes laboratory data of prices available to informed consumers, for two industry sizes and two conditions on the sample (complete and incomplete). Distributions are typically bimodal. One of the two modes of distribution, corresponding to monopoly pricing, tends to attract such pricing strategies increasingly over time. The second one, corresponding to interior pricing, follows a decreasing trend. Monopoly pricing can serve as a means of insurance against more competitive (but riskier) behavior. In fact, experimental subjects who initially earn low profits due to interior pricing are more likely to switch to monopoly pricing than subjects who experience good returns from the start.
Abstract:
This article is concerned with the liability of search engines for algorithmically produced search suggestions, such as those of Google’s ‘autocomplete’ function. Liability in this context may arise when automatically generated associations have an offensive or defamatory meaning, or even induce infringement of intellectual property rights. The increasing number of cases brought before courts all over the world raises questions about the conflict between the fundamental freedoms of speech and access to information on the one hand, and the personality rights of individuals, under a broader right of informational self-determination, on the other. In the light of the recent judgment of the Court of Justice of the European Union (EU) in Google Spain v AEPD, this article concludes that many requests for removal of suggestions that include private individuals’ information will be successful on the basis of EU data protection law, even absent prejudice to the person concerned.
Abstract:
In the judgment of the special appeal in the suit filed by the television presenter Xuxa Meneghel to compel Google Search to remove from its search indexes the results for the expression “Xuxa pedófila” (“Xuxa pedophile”) or any other expression associating the plaintiff's name with that criminal practice, the reporting judge, Justice Nancy Andrighi, clearly defined the controversy this work addresses: the daily lives of thousands of people now depend on information that is on the web and that would hardly be found without the search tools offered by search sites. On the other hand, these same horizontal search engines can be used to locate pages containing harmful information, that is, harmful URLs returned by a search on a person's name. What should be done about this? Does a right to be forgotten really exist, that is, a right to have a URL returned by a search on a person's name delinked from the search index of the horizontal search engine? Some argue that the most appropriate way to deal with this problem would be to pursue the third party who originally published the information on the web. Others maintain that protecting a right to be forgotten would pose too great a threat to freedom of expression and information. Against this background, this dissertation aims to establish what the characteristics and limits of the right to be forgotten in the digital age may be, under the current state of Brazilian legislation on the matter, weighing that right against other public and private rights and interests (especially the right to freedom of expression and information) and taking into account how the World Wide Web itself works, search tools in particular.
Given the importance of horizontal search engines for exercising access to information and, moreover, the difficulty of removing URLs from every site on which they have been published, our research focuses on the potential, and the difficulties, of regulating such search tools to protect the right to be forgotten effectively in the digital age.
Abstract:
VANTI, Nadia. Links hipertextuais na comunicação científica: análise webométrica dos sítios acadêmicos latino-americanos em Ciências Sociais [Hypertext links in scientific communication: a webometric analysis of Latin American academic websites in the Social Sciences]. Porto Alegre, 2007. 292 leaves. Thesis (Doctorate in Communication and Information), Universidade Federal do Rio Grande do Sul. Porto Alegre, 2007.
Abstract:
This study uses webometric indicators to identify which postgraduate courses in engineering recommended by the Coordination for the Improvement of Higher Education Personnel (CAPES) in Brazil stand out on the Web in communicating and disseminating scientific information in the academic environment. To this end, we analyzed the structure and content of the sites, their use (through investigations and searches), the quality of the information available, and the structure of the hypertext links within the sites studied. The tools adopted were search engines (Google, Yahoo), link-mapping software (Xenu Link Sleuth), and network analysis and visualization software (Ucinet 6 and NetDraw). The webometric indicators used include website size, visibility, web impact factor, luminosity, and network density. These instruments support a brief analysis and evaluation for this webometric study. The literature reviewed indicates that this type of metric study offers many advantages in the so-called Information Society. The results identify which postgraduate courses in engineering make their information most available on the Web, which stand out in the use of their information, which have the highest impact factor, and which offer the most links serving as information sources for their users, thereby contributing to the navigability of the network.
In summary, the webometric study yields promising results that meet the proposed objectives and identify the factors that contribute significantly to the visibility of these sites on the network, thus supporting the dissemination of information and scientific communication through the Web.
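Two of the indicators named above have simple arithmetic definitions. The following is a minimal sketch, assuming the common definitions of web impact factor (links pointing to a site divided by the number of pages it holds) and network density (existing links divided by possible directed links); the abstract does not state the exact variants the study used, and the function names and counts are hypothetical.

```python
def web_impact_factor(inlinks: int, pages: int) -> float:
    """Links pointing to a site divided by the number of pages the site holds."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return inlinks / pages


def network_density(num_nodes: int, num_links: int) -> float:
    """Share of the possible directed links (no self-links) that actually exist."""
    if num_nodes < 2:
        return 0.0
    return num_links / (num_nodes * (num_nodes - 1))


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not data from the study.
    print(web_impact_factor(inlinks=120, pages=400))
    print(network_density(num_nodes=4, num_links=6))
```

A higher web impact factor means more incoming links per page published, so it normalizes visibility by site size.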
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Graduate Program in Computer Science - IBILCE
Abstract:
The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which involves both semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well.
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
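To make the idea of a network representing a text concrete, here is a minimal sketch assuming one common construction, a word adjacency network in which a directed edge links each word to the word that immediately follows it. The abstract does not specify the construction or the metrics the authors used, so the function names and the choice of mean out-degree as a topological feature are illustrative assumptions.

```python
import re
from collections import defaultdict


def word_adjacency_network(text: str) -> dict:
    """Directed graph with an edge u -> v when word v immediately follows word u."""
    words = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    for u, v in zip(words, words[1:]):
        if u != v:  # skip self-loops from immediately repeated words
            graph[u].add(v)
    for w in words:  # make sure words with no successor still appear as nodes
        graph.setdefault(w, set())
    return dict(graph)


def mean_out_degree(graph: dict) -> float:
    """One simple topological feature: average number of distinct successors."""
    if not graph:
        return 0.0
    return sum(len(successors) for successors in graph.values()) / len(graph)


if __name__ == "__main__":
    g = word_adjacency_network("the cat saw the dog")
    print(sorted(g["the"]), mean_out_degree(g))
```

Features of this kind depend only on how words connect, not on which words they are, which is why they can complement semantic (word co-occurrence) measures.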
Abstract:
Models are becoming increasingly important in the software development process. As a consequence, the number of models being used is increasing, and so is the need for efficient mechanisms to search them. Various existing search engines could be used for this purpose, but they lack features to properly search models, mainly because they are strongly focused on text-based search. This paper presents Moogle, a model search engine that uses metamodeling information to create richer search indexes and to allow more complex queries to be performed. The paper also presents the results of an evaluation of Moogle, which showed that the metamodel information improves the accuracy of the search.
Quality evaluation of the available Internet information regarding pain during orthodontic treatment
Abstract:
OBJECTIVE To investigate the quality of the data disseminated via the Internet regarding pain experienced by orthodontic patients. MATERIALS AND METHODS A systematic online search was performed for 'orthodontic pain' and 'braces pain' separately using five search engines. The first 25 results from each search term-engine combination were pooled for analysis. After excluding advertising sites, discussion groups, video feeds, and links to scientific articles, 25 Web pages were evaluated in terms of accuracy, readability, accessibility, usability, and reliability using recommended research methodology: reference textbook material, the Flesch Reading Ease Score, and the LIDA instrument. Author and information details were also recorded. RESULTS Overall, the results indicated a variable quality of the available informational material. Although the readability of the Web sites was generally acceptable, the individual LIDA categories were rated of medium or low quality, with average scores ranging from 16.9% to 86.2%. The orthodontic relevance of the Web sites was not accompanied by the highest assessment results, and vice versa. CONCLUSIONS The quality of the orthodontic pain information cited by Web sources appears to be highly variable. Further structural development of health information technology, along with public referral to reliable sources by specialists, is recommended.
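The Flesch Reading Ease Score mentioned above is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). Below is a minimal sketch of that formula; the syllable counter is a crude vowel-group heuristic assumed here for illustration, not the tool the study used, so scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count runs of vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))
```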
Abstract:
Specialized search engines such as PubMed, MedScape or Cochrane have dramatically increased the visibility of biomedical scientific results. These web-based tools allow physicians to access scientific papers instantly. However, this decisive improvement has not had a proportional impact on clinical practice, owing to the lack of advanced search methods. Even queries highly specific to a particular pathology frequently retrieve too much information, so that publications relevant to the physician's patients fall beyond the scope of the results examined. In this work we present a new method to improve scientific article search using patient information. Two pathologies have been used within the project to retrieve literature relevant to patient data and to integrate it with other sources. Promising results suggest the suitability of the approach, which highlights publications dealing with patient features and facilitates literature search for physicians.
Abstract:
Evaluating and measuring the pedagogical quality of Learning Objects is essential for achieving successful web-based education. On the one hand, teachers need some assurance of the quality of teaching resources before making them part of the curriculum. On the other hand, Learning Object Repositories need to include quality information in the ranking metrics used by their search engines in order to save users time when searching. For these reasons, several models such as LORI (Learning Object Review Instrument) have been proposed to evaluate Learning Object quality from a pedagogical perspective. However, not much effort has been put into defining and evaluating quality metrics based on those models. This paper proposes and evaluates a set of pedagogical quality metrics based on LORI. The work presented in this paper shows that these metrics can be effectively and reliably used to provide quality-based sorting of search results. Moreover, it provides strong evidence that evaluating Learning Objects from a pedagogical perspective can notably enhance Learning Object search if suitable evaluation models and quality metrics are used. An evaluation of the LORI model is also described. Finally, all the presented metrics are compared and a discussion of their weaknesses and strengths is provided.
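Quality-based sorting of search results can be illustrated with a small sketch. It assumes each Learning Object carries reviewer ratings per criterion on a 1 to 5 scale, with `None` for criteria left unrated, and it uses a plain mean as the quality metric. This is an illustrative stand-in, not one of the metrics the paper proposes, and the criterion names and data are hypothetical.

```python
def mean_rating(ratings: dict) -> float:
    """Average the ratings that were actually given; None means 'not rated'."""
    given = [r for r in ratings.values() if r is not None]
    return sum(given) / len(given) if given else 0.0


def rank_by_quality(objects: list) -> list:
    """Return search results sorted from highest to lowest mean rating."""
    return sorted(objects, key=lambda o: mean_rating(o["ratings"]), reverse=True)


if __name__ == "__main__":
    # Hypothetical search results and ratings, for illustration only.
    results = [
        {"title": "Intro to fractions", "ratings": {"content": 3, "usability": 2}},
        {"title": "Fractions lab", "ratings": {"content": 5, "usability": None}},
        {"title": "Fraction quiz", "ratings": {"content": 4, "usability": 4}},
    ]
    for obj in rank_by_quality(results):
        print(obj["title"], mean_rating(obj["ratings"]))
```

Ignoring unrated criteria rather than counting them as zero avoids penalizing objects that reviewers evaluated only partially; a real metric would also need to decide how to treat objects with very few ratings.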
Abstract:
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
Abstract:
Geographic annotation of documents consists of adopting metadata to identify place names and the positions of their occurrences in the text. This information is useful, for example, for search engines. From the toponyms mentioned in the text it is possible to identify the spatial context of the text's subject, which makes it possible to group documents that refer to the same context by assigning each document a geographic scope. This Master's dissertation presents a new method, named Geofier, for determining the geographic scope of documents. The novelty of Geofier is that it can identify a document's geographic scope using machine learning classifiers trained without a gazetteer and without assumptions about the language of the texts analyzed. Wikipedia was used as the source of a set of geographically annotated documents for training a hierarchy of Naive Bayes and Support Vector Machine (SVM) classifiers. Geofier was compared with a reimplementation of the Web-a-Where system on the task of determining the geographic scope of Wikipedia texts. The Geofier hierarchy was trained and evaluated in two ways: using toponyms from the same gazetteer as Web-a-Where, and using n-grams extracted from the training documents. As a result, Geofier maintained better performance than the Web-a-Where reimplementation.