837 results for Search Engine Indexing
Abstract:
* This work was financially supported by RFBR-04-01-00858.
Abstract:
The design of interfaces that facilitate user search has become critical for search engines, e-commerce sites, and intranets. This study investigated the use of targeted instructional hints to improve search, measuring their quantitative effects on users' performance and satisfaction. The effects of syntactic, semantic and exemplar search hints on user behavior were evaluated in an empirical investigation using naturalistic scenarios. Combining the three search hint components, each with two levels of intensity, in a factorial design generated eight search engine interfaces. Eighty participants took part in the study, each completing six realistic search tasks. Results revealed that the inclusion of search hints improved user effectiveness, efficiency and confidence when using the search interfaces, but with complex interactions that require specific guidelines for search interface designers. These design guidelines will allow search designers to create more effective interfaces for a variety of search applications.
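As a sketch of how the factorial design yields the eight interfaces: crossing the three hint components, each at two intensity levels, gives 2 × 2 × 2 = 8 combinations. The level names below are hypothetical placeholders; the study only specifies that each component had two levels.

```python
# Enumerating the eight interface conditions of a 2x2x2 factorial design:
# three hint components, each at two intensity levels. The level names
# ("low"/"high") are placeholders, not taken from the study.
from itertools import product

hint_components = ["syntactic", "semantic", "exemplar"]
levels = ["low", "high"]

interfaces = list(product(levels, repeat=len(hint_components)))
for i, combo in enumerate(interfaces, 1):
    settings = ", ".join(f"{c}={l}" for c, l in zip(hint_components, combo))
    print(f"interface {i}: {settings}")
print(len(interfaces))  # 2**3 = 8
```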
Abstract:
As the Web evolves unexpectedly fast, information grows explosively. Useful resources become increasingly difficult to find because of their dynamic and unstructured characteristics. A vertical search engine is designed and implemented for a specific domain. Instead of processing the giant volume of miscellaneous information distributed across the Web, a vertical search engine targets relevant information in specific domains or topics and eventually provides users with up-to-date information, highly focused insights and actionable knowledge representation. As mobile devices become more popular, the nature of search is changing: acquiring information on a mobile device poses unique requirements on traditional search engines, which will potentially change every feature they used to have. In short, users strongly expect search engines that can satisfy their individual information needs, adapt to their current situation, and present highly personalized search results. In my research, the next-generation vertical search engine means utilizing and enriching existing domain information to close the loop of the vertical search engine's system, so that knowledge discovery, actionable information extraction, and user-interest modeling and recommendation mutually facilitate one another. I investigate three problems in which domain taxonomy plays an important role: taxonomy generation using a vertical search engine, actionable information extraction based on domain taxonomy, and the use of ensemble taxonomy to capture users' interests. As the fundamental theory, ultrametrics, dendrograms, and hierarchical clustering are discussed in depth. Methods for taxonomy generation based on my research on hierarchical clustering are developed. The related vertical search engine techniques are applied in practice in the disaster management domain. In particular, three disaster information management systems are developed and presented as real use cases of my research work.
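Since this abstract leans on hierarchical clustering, dendrograms and ultrametrics as its theoretical foundation, here is a minimal sketch of taxonomy generation by agglomerative clustering. The terms and vectors are hypothetical placeholders, and SciPy is an assumed tool, not anything prescribed by the thesis.

```python
# Minimal sketch: building a small domain taxonomy by agglomerative
# (hierarchical) clustering of term vectors. Terms and vectors are
# hypothetical placeholders, not data from the thesis.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

terms = ["flood", "levee", "evacuation", "shelter", "earthquake", "aftershock"]
vectors = np.random.RandomState(0).rand(len(terms), 8)  # stand-in term embeddings

# Average-linkage clustering; the cophenetic distances implied by the
# resulting dendrogram form an ultrametric, the theory the abstract cites.
Z = linkage(vectors, method="average", metric="cosine")

# Cutting the dendrogram at a distance threshold yields one taxonomy level.
cluster_ids = fcluster(Z, t=0.5, criterion="distance")
for term, cid in zip(terms, cluster_ids):
    print(f"{term}: cluster {cid}")
```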
Abstract:
This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. In this thesis, we consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. We show that historical user interaction data can aid in improving the accuracy or efficiency of each of these steps, and that, as a result of these improvements, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represent the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine's query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes that pass the offline evaluation step will later be rejected at the online evaluation step, which allows a higher efficiency of the entire evaluation pipeline. Secondly, we formulate the problem of optimised scheduling of online experiments. We tackle this problem with a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler at the second step of the evaluation pipeline. Consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft treats both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, e.g. in domains with a grid-based representation, such as image search. Our study, using datasets of interleaving experiments performed in both document and image search domains, demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in interleaving experiments. Finally, we propose applying sequential testing methods to reduce the mean deployment time of interleaving experiments, adapting two sequential tests for interleaving experimentation. We demonstrate that a significant decrease in experiment duration can be achieved by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, with the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improving the accuracy or efficiency of the steps of the evaluation pipeline: offline evaluation frameworks for query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for efficient online interleaving evaluation, and a sequential testing approach for online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine, and demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
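To make the interleaving step concrete, below is a sketch of classic Team Draft interleaving, the baseline that Generalised Team Draft extends with data-driven interleaving policies and click scoring. This is an illustration of the underlying technique, not the thesis's implementation.

```python
# Sketch of classic Team Draft interleaving: two rankers alternately
# contribute their highest-ranked document not yet shown; ties in
# contribution counts are broken by a coin flip.
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    rng = rng or random.Random(42)
    interleaved, teams = [], {}
    count_a = count_b = 0
    while True:
        rem_a = [d for d in ranking_a if d not in teams]
        rem_b = [d for d in ranking_b if d not in teams]
        if not rem_a and not rem_b:
            break
        a_turn = bool(rem_a) and (not rem_b or count_a < count_b or
                                  (count_a == count_b and rng.random() < 0.5))
        if a_turn:
            doc, team, count_a = rem_a[0], "A", count_a + 1
        else:
            doc, team, count_b = rem_b[0], "B", count_b + 1
        interleaved.append(doc)
        teams[doc] = team
    return interleaved, teams

# A click credits the team that contributed the clicked document; the ranker
# with more credited clicks across many sessions wins the comparison.
ranking, teams = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(ranking, teams)
```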
Abstract:
Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval, and controls the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks, such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Alleviating the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine, from crawling and indexing documents to query processing, across the network of users (or peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of P2P web search. Federated search systems are a form of distributed information retrieval model that routes the user's information need, formulated as a query, to distributed resources and merges the retrieved result lists into a final list. P2P-IR networks are one form of federated search in routing queries and merging results among participating peers. The query is propagated through disseminated nodes to reach the peers that are most likely to contain relevant documents, and the retrieved result lists are then merged at different points along the path from the relevant peers to the query initiator (the customer). However, query routing is considered one of the major challenges and a critical part of P2P-IR networks, as relevant peers might be lost through low-quality peer selection during query routing, inevitably leading to less effective retrieval results. This motivates this thesis to study and propose query routing techniques that improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise peers into semantically similar clusters, each managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level, as content representations gathered from cooperative peers, to propose a query routing approach called Inverted PeerCluster Index (IPI), which mimics the conventional inverted index of a centralised corpus in organising the statistics of peers' terms. The results show retrieval quality competitive with baseline approaches. Furthermore, I study the applicability of conventional Information Retrieval models as peer selection approaches, where each peer can be considered a big document of documents. The experimental evaluation shows competitive and significant results, indicating that document retrieval methods are very effective for peer selection, which reinforces the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results against state-of-the-art resource selection methods and results competitive with the corresponding classification-based approaches.
Finally, I propose reputation-based query routing approaches that exploit the idea, familiar from social community networks, of providing feedback on a specific item and keeping it for future decision-making. The system monitors users' behaviour when they click or download documents from the final ranked list, treats this as implicit feedback, and mines the information to build a reputation-based data structure. This data structure is used to score peers and then rank them for query routing. I conduct a set of experiments covering various scenarios, including noisy feedback (i.e., positive feedback given on non-relevant documents), to examine the robustness of the reputation-based approaches. The empirical evaluation shows significant results on almost all measurement metrics, with improvements of approximately 56% or more over baseline approaches. Thus, based on these results, if one were to choose a single technique, reputation-based approaches are clearly the natural choice, and they can also be deployed on any P2P network.
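A minimal sketch of the "peer as a big document" idea from this abstract: concatenate each peer's documents into one pseudo-document and rank peers with an ordinary retrieval score. The peer contents and the toy TF-IDF scorer below are illustrative assumptions; the thesis evaluates full IR models, not this toy.

```python
# "Peer as a big document": each peer's documents are collapsed into one
# pseudo-document, and a standard retrieval score then ranks peers for
# query routing. Contents and scorer are illustrative only.
import math
from collections import Counter

peers = {
    "peer1": "deep learning neural networks image classification",
    "peer2": "cooking recipes pasta sauce kitchen",
    "peer3": "information retrieval evaluation ranking metrics",
}

def peer_score(query, peer_text, all_peers):
    """Score one peer's pseudo-document against the query with toy TF-IDF."""
    tf = Counter(peer_text.split())
    total = 0.0
    for term in query.split():
        df = sum(term in text.split() for text in all_peers.values())
        if 0 < df < len(all_peers):  # ignore terms found in no peer or every peer
            total += tf[term] * math.log(len(all_peers) / df)
    return total

query = "retrieval evaluation ranking"
routing_order = sorted(peers, key=lambda p: peer_score(query, peers[p], peers),
                       reverse=True)
print(routing_order)  # route the query to the most promising peers first
```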
Abstract:
Chemical cross-linking has emerged as a powerful approach for the structural characterization of proteins and protein complexes. However, the correct identification of covalently linked (cross-linked, or XL) peptides analyzed by tandem mass spectrometry is still an open challenge. Here we present SIM-XL, a software tool that can analyze data generated with commonly used cross-linkers (e.g., BS3/DSS). Our software introduces a new paradigm for search-space reduction, which ultimately accounts for its gains in speed and sensitivity. Moreover, our search engine is the first to capitalize on reporter ions for selecting tandem mass spectra derived from cross-linked peptides. It also provides a 2D interaction map and a spectrum-annotation tool unmatched by any tool of its kind. We show SIM-XL to be more sensitive and faster than a competing tool when analyzing a data set obtained from the human HSP90. The software is freely available for academic use at http://patternlabforproteomics.org/sim-xl. A video demonstrating the tool is available at http://patternlabforproteomics.org/sim-xl/video. SIM-XL is the first tool to support XL data in the mzIdentML format; all data are thus available from the ProteomeXchange consortium (identifier PXD001677).
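As a hedged illustration of the reporter-ion idea mentioned above, the sketch below keeps only spectra containing a diagnostic reporter-ion peak before any cross-link search is attempted, which is the spirit of the search-space reduction. The m/z values and tolerance are hypothetical placeholders, not SIM-XL's actual settings.

```python
# Illustrative pre-filter in the spirit of SIM-XL's reporter-ion strategy:
# keep only MS2 spectra containing a diagnostic reporter-ion peak, shrinking
# the search space before cross-link identification. The m/z values and ppm
# tolerance are hypothetical placeholders, not SIM-XL's actual settings.
REPORTER_MZ = [222.15, 239.17]   # hypothetical diagnostic ions
TOLERANCE_PPM = 20.0

def has_reporter_ion(peaks, reporters=REPORTER_MZ, ppm=TOLERANCE_PPM):
    """peaks: list of (mz, intensity) pairs from one tandem mass spectrum."""
    for mz, _intensity in peaks:
        for ref in reporters:
            if abs(mz - ref) / ref * 1e6 <= ppm:
                return True
    return False

spectra = {
    "scan_001": [(110.07, 300.0), (222.151, 850.0), (415.20, 120.0)],
    "scan_002": [(101.10, 90.0), (300.20, 40.0)],
}
candidates = {sid: p for sid, p in spectra.items() if has_reporter_ion(p)}
print(sorted(candidates))  # only these spectra enter the cross-link search
```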
Abstract:
The study reported here is a classical bottom-up proteomic approach in which proteins from wasp venom were extracted and separated by 2-DE; the individual protein spots were proteolytically digested and subsequently identified by tandem mass spectrometry and database query with the protein search engine MASCOT. Eighty-four venom proteins belonging to 12 different molecular functions were identified. These proteins were classified into three groups. The first consists of typical venom proteins: antigens-5, hyaluronidases, phospholipases, heat shock proteins, metalloproteinases, metalloproteinase-disintegrin-like proteins, serine proteinases, proteinase inhibitors, vascular endothelial growth factor-related protein, arginine kinases, Sol i-II and -II-like proteins, alpha-glucosidase, and superoxide dismutases. The second contains proteins structurally related to the muscles that envelop the venom reservoir. The third group, associated with the housekeeping of cells from the venom glands, is composed of enzymes, membrane proteins of different types, and transcriptional factors. The composition of P. paulista venom permits us to hypothesize a general envenoming mechanism based on five actions: (i) diffusion of venom through the tissues and into the blood, (ii) tissue damage, (iii) hemolysis, (iv) inflammation, and (v) allergy, played by antigen-5, PLA1, hyaluronidase, HSP 60, HSP 90, and arginine kinases.
Abstract:
Semantic Web technologies such as RDF, OWL and SPARQL have seen strong growth and acceptance in recent years. Projects such as DBPedia and Open Street Map are beginning to show the true potential of Linked Open Data. However, semantic search engines still lag behind this surge of semantic technologies. The available solutions rely mostly on natural language processing techniques. Powerful Semantic Web tools such as ontologies, inference engines and semantic query languages are not yet common. In addition to this, there are certain difficulties in implementing a semantic search engine. As demonstrated in this dissertation, a federated architecture is needed in order to exploit the full potential of Linked Open Data. However, a federated system in that environment presents performance problems that must be solved through cooperation between data sources. The current standard query language of the Semantic Web, SPARQL, does not offer a mechanism for cooperation between data sources. This dissertation proposes a federated architecture with mechanisms that enable cooperation between data sources. It addresses the performance problem by proposing a centrally managed index as well as mappings between the data models of each data source. The proposed architecture is modular, allowing repositories and functionality to grow simply and in a decentralised way, in the spirit of Linked Open Data and of the World Wide Web itself. The architecture handles both natural language term searches and formal queries in SPARQL. However, the repositories considered contain only data in RDF format. This dissertation relies on multiple shared and interlinked ontologies.
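For context on the federation mechanism discussed here, below is a minimal sketch of a SPARQL 1.1 federated query issued from Python. It assumes the SPARQLWrapper library; the endpoints are real public ones used purely for illustration, and whether a given public endpoint permits outbound SERVICE calls varies. Note that plain SPARQL federation of this kind offers no cooperation between sources, which is exactly the gap the proposed centrally managed index addresses.

```python
# Minimal sketch of SPARQL 1.1 federation: the SERVICE clause delegates part
# of a query to a remote endpoint, letting one query span several Linked Open
# Data sources. Assumes the SPARQLWrapper library; endpoints are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

FEDERATED_QUERY = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?city ?name WHERE {
  SERVICE <https://dbpedia.org/sparql> {
    ?city dbo:country dbr:Portugal ;
          rdfs:label ?name .
    FILTER (lang(?name) = "en")
  }
}
LIMIT 5
"""

# Outer endpoint; some public endpoints require a descriptive user agent.
endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                         agent="federation-sketch/0.1")
endpoint.setQuery(FEDERATED_QUERY)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["name"]["value"])
```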
Abstract:
Several Web-based online judges, or online programming trainers, have been developed to allow students to train their programming skills. However, their pedagogical functionality in the learning of programming has not been clearly defined. EduJudge is a project which aims to integrate the “UVA On-line Judge”, an existing online programming trainer with a substantial number of problems and users, into an effective educational environment consisting of the e-learning platform Moodle and the competitive learning tool QUESTOURnament. The result is the EduJudge system, which allows teachers to apply different pedagogical approaches using a proven e-learning platform, makes problems easy to find through an effective search engine, and provides automated evaluation of the solutions submitted to these problems. The final objective is to provide new learning strategies to motivate students and present programming as an easy and attractive challenge. EduJudge has been tried and tested in three algorithms and programming courses in three different Engineering degrees. The students' motivation and satisfaction levels were analysed alongside the effects of the EduJudge system on students' academic outcomes. Results indicate that both students and teachers found that, among other benefits, the EduJudge system facilitates the learning process. Furthermore, the experiment also showed an improvement in students' academic outcomes. It must be noted that the students' level of satisfaction did not depend on their computer skills or their gender.
Abstract:
OBJECTIVE: To review studies on the readability of package leaflets of medicinal products for human use. METHODS: We conducted a systematic literature review covering 2008 to 2013, using the keywords “Readability and Package Leaflet” and “Readability and Package Insert” in the academic search engine Biblioteca do Conhecimento Online, which comprises different bibliographic resources/databases. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria were applied to prepare the draft of the report. Quantitative and qualitative original studies were included. Opinion or review studies, and studies not written in English, Portuguese, Italian, French, or Spanish, were excluded. RESULTS: We identified 202 studies, of which 180 were excluded and 22 were included (two enrolling healthcare professionals, ten enrolling other types of participants, including patients, three focused on adverse reactions, and seven descriptive studies). The package leaflets presented various readability problems, such as complex and difficult-to-understand texts, small font size, or few illustrations. The main methods used to assess the readability of package leaflets were usability tests and readability formulae. Limitations of these methods included a reduced number of participants; the lack of readability formulas validated for specific languages (e.g., Portuguese); and the absence of any assessment of patients' literacy, health knowledge, cognitive skills, levels of satisfaction, and opinions. CONCLUSIONS: Overall, the package leaflets presented various readability problems. In this review, some methodological limitations were identified, including the participation of a limited number of patients and healthcare professionals, the absence of prior assessment of participants' literacy, mood or satisfaction, and the predominance of studies not based on role-plays about the use of medicines. These limitations should be avoided in future studies and considered when interpreting the results.
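As one concrete example of the readability formulae this review mentions (the review itself does not single out a specific one), the widely used Flesch Reading Ease score can be computed as follows; the sample counts are made up for illustration.

```python
# Flesch Reading Ease, one common readability formula of the kind applied in
# the reviewed studies. Shown only as a representative example; the review
# does not specify which formulas the individual studies used.
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Higher scores (roughly 0-100) indicate text that is easier to read."""
    return (206.835
            - 1.015 * (total_words / total_sentences)
            - 84.6 * (total_syllables / total_words))

# e.g., a 100-word leaflet passage with 5 sentences and 170 syllables:
print(round(flesch_reading_ease(100, 5, 170), 1))  # 42.7 -> difficult text
```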
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
Project work presented to the Instituto Superior de Contabilidade e Administração do Porto in fulfilment of the requirements for the degree of Master in Digital Marketing, under the supervision of António da Silva Vieira, MSc.
Abstract:
Electronic markets are information systems (IS) used by several distinct organisational entities within one or several levels of economic value chains (Journal Electronic Markets, 2012). According to Bakos (1998), they play a central role in the economy, facilitating the exchange of information, products, services and payments. In the process, economic value is created for buyers, suppliers, market intermediaries and society at large. Electronic commerce is the act of conducting any type of business through electronic means, and it comprises diverse models, most notably Business to Business (B2B) and Business to Consumer (B2C). The B2B model accounts for 90% of all electronic commerce, a success largely attributable to the advantages its platforms offer the companies that join them (Anacom, 2004). The main scope of this work is the study of B2B markets, with the following objectives: to identify the current state and the evolution of B2B markets in Portugal; to characterise the functionality of the platforms operating nationally; and, finally, to create a set of guidelines to support companies wishing to enter these markets. To achieve the objectives proposed in the dissertation, several organisations were surveyed, and research was carried out on topics and articles related to electronic markets and B2B platforms, using sites such as B-On.pt and the Google search engine. This report is structured as follows: Chapter 1 presents a theoretical introduction to the matters covered in the following chapters; Chapter 2 focuses on electronic commerce, its most common definitions, and its most prominent models; Chapter 3 covers all aspects of the B2B model, the main B2B platforms operating nationally, their functionality and mode of operation, as well as an overview of the main platforms worldwide; Chapter 4 presents a set of topics to assist companies wishing to enter this type of market; finally, the main conclusions drawn from the dissertation are described. In short, this work gathers a set of useful data and guidelines, resulting from the study carried out, to help companies join electronic markets, which is a promising approach for organisations.
Abstract:
A large share of online traffic originates from search engine results pages. Search engines are now a fundamental tool that tourists use to find and filter the information needed to plan their trips, and they are therefore taken very much into account by tourism-related organisations when defining their marketing strategies. This document describes an investigation into how the Google search engine works and the metrics it uses to evaluate websites and web pages. This investigation resulted in the implementation of a content website about the tourism and travel market in Portugal, focused on the foreign tourism market: All About Portugal. The implementation of the website aims to prove, drawing on guidelines from the field of SEO, that content propagation based solely on search engines is viable, thereby confirming their importance. The usage data from this website introduce new elements that may serve as a basis for further studies.