891 results for 080704 Information Retrieval and Web Search
Abstract:
Because multimedia data, such as images and videos, are far more expressive and informative than ordinary text-based data, people find them more attractive media for communication and expression. Moreover, with the rising popularity of social networking tools such as Facebook and Twitter, multimedia information retrieval can no longer be considered a solitary task; rather, people constantly collaborate with one another while searching for and retrieving information. Yet the very source of multimedia data's popularity, the large and heterogeneous amount of information a single data object can carry, makes its management challenging. Multimedia data are commonly represented as multidimensional feature vectors and carry high-level semantic information. These two characteristics set them apart from traditional alphanumeric data, so managing them with frameworks and rationales designed for primitive alphanumeric data is inefficient. An index structure is the backbone of any database management system, and the index structures found in existing relational database management frameworks have been shown to handle multimedia data poorly. This dissertation therefore proposes a generalized multidimensional index structure that accommodates, within one single framework, both the atypical multidimensional representation and the semantic information carried by different multimedia data. Additionally, the dissertation investigates the evolving relationships among multimedia data in a collaborative environment and how such information can help customize the design of the proposed index structure when it is used to manage multimedia data in a shared environment. Extensive experiments demonstrate the usability and superior performance of the proposed framework over current state-of-the-art approaches.
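To illustrate why multidimensional feature vectors defeat conventional one-dimensional (B-tree-style) indexes, here is a minimal sketch of the access pattern a multimedia index must support: nearest-neighbour search over feature vectors. The dissertation's actual index structure is not specified in the abstract; the 128-dimensional vectors, dataset size, and use of scipy's k-d tree are illustrative assumptions.

```python
# Hypothetical sketch: nearest-neighbour search over multidimensional
# feature vectors. A B-tree orders scalars and cannot answer this directly.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)

# Assume each multimedia object is a 128-dimensional feature vector
# (e.g., a visual descriptor); 10,000 objects stand in for a collection.
features = rng.random((10_000, 128))

index = cKDTree(features)                 # multidimensional index

query = rng.random(128)                   # feature vector of a query image
distances, ids = index.query(query, k=5)  # the 5 most similar objects
print(ids, distances)
```

In high dimensions, tree-based indexes themselves degrade, which is one reason specialized multimedia index structures remain an active research topic.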
Abstract:
Background As the use of electronic health records (EHRs) becomes more widespread, so does the need to search them and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs. Methods We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to the associations among the entities within them. The Clinical ObjectRank (CO) system exploits these entity associations using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset from the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, two physicians measured sensitivity and specificity for a set of 11 queries related to congenital cardiac disease. Results Our pilot evaluation showed that CO outperformed BM25 in sensitivity (65% vs. 38%, a 71% relative improvement on average) while maintaining comparable specificity (64% vs. 61%). Conclusions Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
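For context, the BM25 baseline mentioned above scores a document against a query roughly as sketched below. This is the standard Okapi BM25 formula with common default parameters (k1 = 1.5, b = 0.75), not necessarily the authors' exact configuration; the toy corpus is illustrative.

```python
# Minimal Okapi BM25 sketch (standard formula, common defaults).
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N        # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)      # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[t]                                  # term frequency in doc
        score += idf * f * (k1 + 1) / (
            f + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [["aortic", "stenosis"], ["ventricular", "septal", "defect"],
          ["aortic", "valve", "repair"]]
print(bm25_score(["aortic"], corpus[0], corpus))
```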
Abstract:
As the Web evolves unexpectedly fast, information grows explosively, and useful resources become increasingly difficult to find because of their dynamic and unstructured characteristics. A vertical search engine is designed and implemented for a specific domain. Instead of processing the giant volume of miscellaneous information distributed across the Web, a vertical search engine targets relevant information in specific domains or topics and ultimately provides users with up-to-date information, highly focused insights, and actionable knowledge representation. As mobile devices become more popular, the nature of search is changing: acquiring information on a mobile device places unique requirements on traditional search engines and will potentially change every feature they used to have. In short, users strongly expect search engines that can satisfy their individual information needs, adapt to their current situation, and present highly personalized search results. In my research, the next-generation vertical search engine aims to utilize and enrich existing domain information to close the loop of the vertical search engine's system, in which knowledge discovery, actionable information extraction, and user-interest modeling and recommendation mutually facilitate one another. I investigate three problems in which domain taxonomy plays an important role: taxonomy generation using a vertical search engine, actionable information extraction based on domain taxonomy, and the use of ensemble taxonomy to capture users' interests. As the underlying theory, ultrametrics, dendrograms, and hierarchical clustering are discussed in depth. Methods for taxonomy generation building on my research on hierarchical clustering are developed. The related vertical search engine techniques are applied in practice in the disaster management domain; in particular, three disaster information management systems were developed and are presented as real use cases of my research work.
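The abstract leans on hierarchical clustering and dendrograms as the basis for taxonomy generation. A minimal sketch of that building block follows (agglomerative clustering over term vectors with scipy); the terms, random stand-in embeddings, linkage method, and cut threshold are illustrative assumptions, and the dissertation's actual method is more involved.

```python
# Sketch: agglomerative hierarchical clustering as a taxonomy skeleton.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

terms = ["flood", "levee", "hurricane", "storm surge", "wildfire", "evacuation"]
# Assume each term has an embedding (random stand-ins here).
vectors = np.random.default_rng(0).random((len(terms), 16))

# Average-linkage clustering yields a dendrogram encoded as a linkage matrix,
# whose cophenetic distances form an ultrametric over the terms.
Z = linkage(vectors, method="average")

# Cutting the dendrogram at a distance threshold gives one taxonomy level.
labels = fcluster(Z, t=1.0, criterion="distance")
for term, cluster_id in zip(terms, labels):
    print(cluster_id, term)
```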
Abstract:
In order to run a successful business, today’s manager needs to combine business skills with an understanding of information systems and the opportunities and benefits that they bring to an organisation. Starting from basic concepts, this book provides a comprehensive and accessible guide to:
• understanding the technology of business information systems;
• choosing the right information system for an organisation;
• developing and managing an efficient business information system;
• employing information systems strategically to achieve organisational goals.
Taking a problem-solving approach, Business Information Systems looks at information systems theory within the context of the most recent business and technological advances. This thoroughly revised new edition has updated and expanded coverage of contemporary key topics such as:
• Web 2.0
• enterprise systems
• implementation and design of IS strategy
• outsourcing
Abstract:
Aims: 1. To investigate the reliability and readability of information on adult orthodontics available on the Internet. 2. To evaluate the profile and treatment of adults by specialist orthodontists in the Republic of Ireland (ROI). Materials and methods: 1. An Internet search was conducted in May 2015 using three search engines (Google, Yahoo and Bing) and two search terms (“adult orthodontics” and “adult braces”). The first 50 websites from each engine were screened and exclusion criteria applied. Included websites were then assessed for reliability using the JAMA benchmarks, the DISCERN and LIDA tools, and the presence of the HON seal. Readability was assessed using the FRES. 2. A pilot-tested questionnaire about adult orthodontics was distributed to 122 eligible specialist orthodontists in the ROI. Questions addressed general and treatment information about adult orthodontic patients, methods of information provision, and respondent demographics. Results: 1. Thirteen websites met the inclusion criteria. Three websites contained all JAMA benchmarks and one displayed the HON seal. The mean overall DISCERN score was 3.9/5 and the mean total LIDA score was 115/120. The average FRES score was 63.1. 2. The questionnaire yielded a response rate of 83%. The typical adult orthodontic patient was a professional female aged 25 to 35 years. The most common incisor relationship and skeletal base were Class II, division 1 (51%) and Class II (61%), respectively. Aesthetic upper brackets and metal lower brackets were the most frequently used appliances. Only 30% of orthodontists advise their adult patients to seek additional information on the Internet. Conclusions: 1. The reliability and readability of information on adult orthodontics available on the Internet are of moderate quality. 2. The provision of adult orthodontic treatment is common among specialist orthodontists in the Republic of Ireland.
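For reference, the FRES cited above is the Flesch Reading Ease Score, computed from average sentence length and average syllables per word; a score of 63.1 falls in the 60-70 "standard English" band of the scale. A minimal sketch follows; the syllable counter here is a crude vowel-group heuristic, not the validated tooling such a study would use.

```python
# Flesch Reading Ease Score (FRES); higher scores mean easier reading.
# The syllable count below is a rough vowel-group heuristic.
import re

def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fres(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) \
                   - 84.6 * (syllables / len(words))

print(round(fres("Braces straighten teeth. Treatment can take two years."), 1))
```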
Abstract:
Multimedia objects, especially images and figures, are essential for the visualization and interpretation of research findings. The distribution and reuse of these scientific objects is significantly improved under open access conditions, for instance in Wikipedia articles, in research literature, and in education and knowledge dissemination, where the licensing of images often represents a serious barrier. Whereas scientific publications are retrievable through library portals or other online search services thanks to standardized indices, there is as yet no targeted retrieval of, or access to, the accompanying images and figures. Consequently, there is a great demand for standardized indexing methods for these multimedia open access objects in order to improve the accessibility of this material. With our proposal, we hope to serve the broad audience that looks up a scientific or technical term in a web search portal first. Until now, this audience has had little chance of finding an openly accessible and reusable image closely matching their search term on the first try, frustratingly so, even when such an image is in fact included in some open access article.
Abstract:
Individuals living in highly networked societies publish a large amount of personal, and potentially sensitive, information online. Web investigators can exploit such information for a variety of purposes, such as background vetting and fraud detection. However, such investigations require many hours of expensive human effort. This paper describes InfoScout, a search tool intended to reduce the time it takes to identify and gather subject-centric information on the Web. InfoScout collects relevance-feedback information from the investigator in order to re-rank search results, allowing the intended information to be discovered more quickly. Users may still direct their search as they see fit, issuing ad-hoc queries and filtering existing results by keywords. Design choices are informed by prior work and industry collaboration.
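The abstract does not spell out InfoScout's re-ranking model, so the following is only a generic illustration of relevance-feedback re-ranking in the Rocchio style: results the investigator marks relevant pull the query vector toward them, and the remaining results are re-scored against the updated vector. All names, vectors, and parameters are hypothetical.

```python
# Generic Rocchio-style relevance-feedback re-ranking sketch;
# not InfoScout's actual algorithm (unspecified in the abstract).
import numpy as np

def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant results and away from non-relevant ones."""
    q = alpha * query_vec
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return q

def rerank(doc_vecs, query_vec):
    """Order documents by cosine similarity to the (updated) query vector."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    scores = doc_vecs @ query_vec / np.maximum(norms, 1e-12)
    return np.argsort(-scores)

rng = np.random.default_rng(1)
docs = rng.random((20, 50))                 # hypothetical result vectors
q0 = rng.random(50)                         # initial query vector
q1 = rocchio(q0, relevant=docs[[3, 7]], nonrelevant=docs[[11]])
print(rerank(docs, q1)[:5])                 # top-5 after feedback
```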
Abstract:
Problem This dissertation presents a literature-based framework for communication in science (with the elements partners, purposes, message, and channel), which it then applies and refines through an empirical study of how geoscientists use two social computing technologies (SCTs), blogging and Twitter (both general use and tweeting from conferences). How are these technologies used, and what value do scientists derive from them? Method The empirical part used a two-pronged qualitative study, drawing on (1) purposive samples of ~400 blog posts and ~1000 tweets and (2) a purposive sample of interviews with 8 geoscientists. Blog posts, tweets, and interviews were coded using the framework, adding new codes as needed. The results were aggregated into 8 geoscientist case studies, and general patterns were derived through cross-case analysis. Results A detailed picture of how geoscientists use blogs and Twitter emerged, including a number of new functions not served by traditional channels. Some highlights: geoscientists use SCTs for communication among themselves as well as with the public. Blogs serve persuasion and personal knowledge management; Twitter often amplifies the signal of traditional communications such as journal articles. Blogs include tutorials for peers, reviews of basic science concepts, and book reviews. Twitter includes links to readings, requests for assistance, and discussions of politics and religion. Twitter at conferences provides live coverage of sessions. Conclusions Both blogs and Twitter are routine parts of scientists' communication toolbox: blogs for in-depth, well-prepared essays, Twitter for faster and broader interactions. Both have important roles in supporting community building, mentoring, and learning and teaching. The Framework of Communication in Science was a useful tool for studying these two SCTs in this domain. The results should encourage science administrators to facilitate SCT use by scientists in their organizations, and information providers to treat SCT documents as an important searchable source of information.
Abstract:
This thesis attempts to provide deeper historical and theoretical grounding for sense-making, thereby illustrating its applicability to practical information-seeking research. In Chapter One I trace the philosophical origins of Brenda Dervin’s theory known as “sense-making,” reaching beyond current scholarship that locates its origins in twentieth-century Phenomenology and Communication theory, and finding a rich ontological, epistemological, and etymological heritage that dates back to the Pre-Socratics. After exploring sense-making’s Greek roots, I examine its philosophical undercurrents in Hegel’s Phenomenology of Spirit (1807), where he likewise returns to the simplicity of the Greeks for his concept of sense. In Chapter Two I explore sense-making methodology and find, in light of the Greek and Hegelian dialectic, a dialogical bridge connecting sense-making’s theory with pragmatic uses. This bridge between Dervin’s situation and use occupies a distinct position in sense-making theory. Moreover, building upon Brenda Dervin’s model of sense-making, I use her metaphors of gap and bridge to discuss the dialectic and dialogic components of sense-making. The purpose of Chapter Three is pragmatic: to gain insight into the online information-seeking needs, experiences, and motivation of first-degree relatives (FDRs) of breast cancer survivors through the lens of sense-making. This research analyses four questions: 1) information-seeking behavior among FDRs of cancer survivors compared to survivors and to undiagnosed, non-related online cancer information seekers in the general population; 2) the types of information sought and the places where it is sought; 3) the barriers or gaps FDRs face in their cancer information quest, and their satisfaction rates; and 4) the types and degrees of cancer information and resources FDRs want and use in their information search for themselves and other family members. An online survey instrument designed to investigate these questions was developed and pilot tested. Via an email communication, the Susan Love Breast Cancer Research Foundation distributed 322,000 invitations to its membership to complete the survey; from March 24th to April 5th, 10,692 women agreed to take the survey, and 8,804 volunteers actually completed it. Among these 8,804 respondents, 95% of FDRs have searched for cancer information online, and 84% of FDRs use the Internet as a sense-making tool for supplementing information they have received from doctors or nurses. FDRs report needing much more information than either survivors or family/friends in ten out of fifteen categories related to breast and ovarian cancer. When searching for cancer information online, FDRs also report higher levels of several of sense-making’s emotional states (uncertainty, confusion, frustration, doubt, and disappointment) than either survivors or friends and family. The sense-making process has existed in theory and praxis since the early Greeks. In applying sense-making’s theory to a contemporary problem, the survey reveals unaddressed situations and gaps in FDRs’ information search process. FDRs are a highly motivated group of online information seekers whose needs are largely unaddressed as a result of gaps in the available online information targeted at their specific needs. Since FDRs represent a quarter of the population, further research addressing their specific online information needs and experiences is necessary.
Abstract:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores the datasets, features, learning methods, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collecting object image datasets from web pages by jointly analyzing the text around an image and the image's appearance. The method exploits established online knowledge resources (Wikipedia pages for text; the Flickr and Caltech datasets for images), which provide rich text and object appearance information. Results are reported on two datasets: on Berg’s collection of 10 animal categories the approach significantly outperforms previous work, and experiments on an additional set of 5 categories show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy; the method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples, but this text feature need not change, because the auxiliary dataset likely contains a similar picture: while the tags associated with images are noisy, they are more stable than appearance when viewing conditions change. The performance of this feature is tested using the PASCAL VOC 2006 and 2007 datasets. The feature performs well; it consistently improves the performance of visual object classifiers and is particularly effective when the training dataset is small. As more and more training data is collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called the Stochastic Intersection Kernel Machine (SIKMA). The proposed training method is useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in sequence, so memory cost is no longer the bottleneck for processing large-scale datasets. This dissertation applies this approach to train classifiers for Flickr groups with many training examples per group. The resulting Flickr group prediction scores can be used to measure the similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better for image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach that uses comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. The dissertation develops a regularized kernel machine algorithm that uses this category-dependent similarity regularization. Experiments on hundreds of categories show that the method yields significant improvements for categories with few or even no positive examples.
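A minimal sketch of the k-nearest-neighbour tag-transfer idea described above: an unannotated image borrows a tag histogram from its visually nearest neighbours in a tagged auxiliary collection. The feature dimensionality, value of k, and toy data are illustrative assumptions, not the dissertation's actual setup.

```python
# Sketch: build a text feature for an untagged image from the tags of its
# k nearest neighbours (by visual feature distance) in an auxiliary set.
import numpy as np
from collections import Counter

def knn_tag_feature(query_feat, aux_feats, aux_tags, vocab, k=3):
    dists = np.linalg.norm(aux_feats - query_feat, axis=1)
    neighbours = np.argsort(dists)[:k]
    counts = Counter(tag for i in neighbours for tag in aux_tags[i])
    # Normalised tag histogram over a fixed vocabulary.
    total = max(1, sum(counts.values()))
    return np.array([counts[t] / total for t in vocab])

rng = np.random.default_rng(0)
aux_feats = rng.random((6, 32))                  # visual features (toy)
aux_tags = [["cat", "pet"], ["cat"], ["dog", "pet"],
            ["dog"], ["car"], ["car", "road"]]
vocab = ["cat", "dog", "pet", "car", "road"]

query = aux_feats[0] + 0.01 * rng.random(32)     # near the first image
print(knn_tag_feature(query, aux_feats, aux_tags, vocab))
```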
Abstract:
I investigate the effects of information frictions on price-setting decisions. I show that firms' output prices and wages are less sensitive to aggregate economic conditions when firms and workers cannot perfectly understand (or know) the aggregate state of the economy. Prices and wages respond with a lag to aggregate innovations because agents learn slowly about those changes, and this delayed adjustment in prices makes output and unemployment more sensitive to aggregate shocks. In the first chapter of this dissertation, I show that workers' noisy information about the state of the economy helps explain why real wages are sluggish. In the context of a search and matching model, wages do not immediately respond to a positive aggregate shock because workers do not (yet) have enough information to demand higher wages. This increases firms' incentives to post more vacancies, and it makes unemployment volatile and sensitive to aggregate shocks. This mechanism is robust to two major criticisms of existing theories of sluggish wages and volatile unemployment: the flexibility of wages for new hires and the cyclicality of the opportunity cost of employment. Calibrated to U.S. data, the model explains 60% of the overall unemployment volatility. Consistent with empirical evidence, the response of unemployment to TFP shocks predicted by my model is large, hump-shaped, and peaks one year after the TFP shock, while the response of the aggregate wage is weak and delayed, peaking after two years. In the second chapter of this dissertation, I study the role of information frictions and inventories in firms' price-setting decisions in the context of a monetary model. In this model, intermediate goods firms accumulate output inventories, observe aggregate variables with a one-period lag, and observe their nominal input prices and demand at all times. Firms face idiosyncratic shocks and cannot perfectly infer the state of nature. After a contractionary nominal shock, nominal input prices go down, and firms accumulate inventories because they perceive some positive probability that the nominal price decline is due to a good productivity shock. This prevents firms' prices from decreasing and makes current profits, households' income, and aggregate demand go down. According to my model simulations, a 1% decrease in the money growth rate causes output to decline 0.17% in the first quarter and 0.38% in the second, followed by a slow recovery to the steady state. Contractionary nominal shocks also have significant effects on total investment, which remains 1% below the steady state for the first 6 quarters.
Abstract:
Today's society called information society, grows rapidly and undergoes changes in the sources of information under the Information Technology and Communication ("tics"), in this situation it is necessary to develop tools or reference sources that allow the (to) user (a) the accessibility and use of information. It systematized information on the Meritorious Citizen of the Fatherland and Honor that is Costa Rica, since 1847 to 2008, due to its contribution to culture, science, recreation, among others. The overall objective of this research was to make a work print and digital reference, which will collect each of the biographies and works written by (as) Benefactors (as) of the country and citizens (as) of Honor and to facilitate access to information and strengthen outreach conducted by the Library "Victor Manuel Sanabria Martínez" of the Legislature, through its publications, exhibitions and related activities, to publicize its documentary. The variables used for this investigation were:-sources (primary and secondary), Organization of information - tools in various documentation centers and libraries. This was carried out a questionnaire, which was structured in the Excel program, aimed at (as) directors (as) or officers (as) in different libraries and documentation centers, and visits to selected sites for search and selection information. It is important to spread this final graduation in different public and school libraries in the country, since history and culture rescues national, who gave identity to the Costa Rican people. The systematization made by a thematic CD-ROM, provide accessibility to all (as) the (as) citizens (as) who access the Internet through the website of the Legislative Assembly Library and other state institutions wishing through a hyperlink on your "web", to refer the same.
Abstract:
This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline, and we consider it to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. We show that historical user interaction data can aid in improving the accuracy or efficiency of each of these steps and that, as a result, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represent the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine’s query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step only to be rejected after the online evaluation step, which allows a higher efficiency of the entire evaluation pipeline to be achieved. Secondly, we address the problem of the optimised scheduling of online experiments. We tackle this problem by considering a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler at the second step of the evaluation pipeline; consequently, the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, e.g. in domains with a grid-based representation, such as image search. Our study, using datasets of interleaving experiments performed in both the document and image search domains, demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply sequential testing methods to reduce the mean deployment time of the interleaving experiments, and we adapt two sequential tests for interleaving experimentation.
We demonstrate that one can achieve a significant decrease in experiment duration by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in the online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, and the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: the offline evaluation frameworks for the query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for the efficient online interleaving evaluation, and a sequential testing approach for the online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine. These experiments demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
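As background for the interleaving discussion above, here is a minimal sketch of the classic Team Draft interleaving scheme that Generalised Team Draft extends (the generalised version additionally learns the interleaving policy and click weights from historical data). The ranking lists and click handling are illustrative.

```python
# Sketch of classic Team Draft interleaving: two rankers alternately
# (in randomised draft order) contribute their best not-yet-picked result;
# credit for a click goes to the team that contributed the clicked result.
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng=random.Random(0)):
    interleaved, teams, used = [], [], set()
    while len(interleaved) < length:
        order = [("A", ranking_a), ("B", ranking_b)]
        rng.shuffle(order)                      # coin toss for this round
        for team, ranking in order:
            pick = next(d for d in ranking if d not in used)
            used.add(pick)
            interleaved.append(pick)
            teams.append(team)
    return interleaved[:length], teams[:length]

def score_clicks(teams, clicked_positions):
    wins = {"A": 0, "B": 0}
    for pos in clicked_positions:
        wins[teams[pos]] += 1                   # unit credit per click
    return wins

a = ["d1", "d2", "d3", "d4"]
b = ["d3", "d1", "d5", "d2"]
shown, teams = team_draft_interleave(a, b, length=4)
print(shown, teams, score_clicks(teams, clicked_positions=[0, 2]))
```

Aggregated over many query impressions, the per-team click credits yield the preference signal whose sensitivity the thesis optimises.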