958 results for web users
Abstract:
Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web, and hence web users relying on search engines alone are unable to discover and access a large amount of information from the non-indexable part of the Web. Specifically, dynamic pages generated based on parameters provided by a user via web search forms (or search interfaces) are not indexed by search engines and cannot be found in search results. Such search interfaces provide web users with online access to myriads of databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results, a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is the huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterization of the deep Web, finding and classifying deep web resources, and querying web databases. Characterizing the deep Web: Though the term deep Web was coined in 2000, which is sufficiently long ago for any web-related concept or technology, we still do not know many important characteristics of the deep Web. Another matter of concern is that existing surveys of the deep Web are predominantly based on studies of deep web sites in English. One can therefore expect that findings from these surveys may be biased, especially owing to the steady increase in non-English web content. 
Thus, surveying national segments of the deep Web is of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from the national segment of the Web. Finding deep web resources: The deep Web has been growing at a very fast pace. It has been estimated that there are hundreds of thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most approaches assume that search interfaces to web databases of interest are already discovered and known to query systems. However, such assumptions do not hold, mostly because of the large scale of the deep Web: for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. Specifically, the I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other existing approaches to the deep Web, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms. Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user. This is all the more so as interfaces of conventional search engines are also web forms. 
At present, a user needs to manually provide input values to search interfaces and then extract the required data from the result pages. Manually filling out forms is cumbersome and infeasible for complex queries, yet such queries are essential for many web searches, especially in the area of e-commerce. Thus, automating the querying and retrieval of data behind search interfaces is desirable and essential for such tasks as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts, and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. In addition, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
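To make the idea of a data model for search interfaces more concrete, the following is a minimal sketch in Python. It is not the model from the thesis: the class names `FormField` and `SearchInterface`, the method `build_query`, and the URL `https://example.org/search` are all hypothetical, and only GET-style forms are covered.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class FormField:
    """One input of a search form (hypothetical structure)."""
    name: str                                          # HTML "name" attribute
    label: str                                         # label text extracted from the page
    options: list[str] = field(default_factory=list)   # allowed values, empty if free text


@dataclass
class SearchInterface:
    """A search form reduced to its action URL and labelled fields."""
    action_url: str
    fields: list[FormField]

    def build_query(self, values_by_label: dict[str, str]) -> str:
        """Map user-facing labels to field names and return a GET query URL."""
        by_label = {f.label: f for f in self.fields}
        params = {}
        for label, value in values_by_label.items():
            if label not in by_label:
                raise KeyError(f"no field labelled {label!r}")
            form_field = by_label[label]
            if form_field.options and value not in form_field.options:
                raise ValueError(f"{value!r} is not an option of {label!r}")
            params[form_field.name] = value
        return f"{self.action_url}?{urlencode(params)}"


# Hypothetical form with a free-text field and a select box.
interface = SearchInterface(
    action_url="https://example.org/search",
    fields=[
        FormField(name="q", label="Title"),
        FormField(name="fmt", label="Format", options=["pdf", "html"]),
    ],
)
url = interface.build_query({"Title": "deep web", "Format": "pdf"})
```

Addressing fields by their extracted labels rather than by internal names is one way a query language could stay readable to a user; whether the thesis does this is not stated in the abstract.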
Abstract:
This paper explores behavioral patterns of web users on an online magazine website. The goal of the study is first to find and visualize user paths within the data generated during collection, and then to identify some generic typologies of user behavior. To form a theoretical foundation for processing data and identifying behavioral archetypes, the study relies on established consumer behavior literature to propose typologies of behavior. For data processing, the study utilizes applied cluster analysis and sequential path analysis. The study utilizes a dataset of click-stream data generated from the real-life clicks of 250 randomly selected website visitors over a period of six weeks. Based on the data collected, an exploratory method is followed in order to find and visualize generally occurring paths of users on the website. Six distinct behavioral typologies were recognized, with the dominant user consuming mainly blog content, as opposed to editorial content. Most importantly, it was observed that approximately 80% of clicks were in the blog content category, meaning that the majority of web traffic on the site takes place in content other than the desired editorial content pages. The outcome of the study is a set of managerial recommendations for each identified behavioral archetype.
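As a rough illustration of what finding generally occurring paths in click-stream data can involve, the sketch below counts contiguous sub-paths across sessions. The function name `frequent_paths` and the toy sessions are assumptions made for this example, not the study's data or method.

```python
from collections import Counter


def frequent_paths(sessions, length=2):
    """Count every contiguous sub-path of `length` pages across all sessions."""
    counts = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts


# Toy sessions: each list is the ordered content categories one visitor clicked.
sessions = [
    ["home", "blog", "blog", "editorial"],
    ["home", "blog", "blog"],
    ["home", "editorial"],
]
paths = frequent_paths(sessions, length=2)
top_paths = paths.most_common(2)
```

Sub-path counts of this kind could then serve as features for the cluster analysis the abstract mentions.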
Abstract:
The protection of personal information is a central concern for everyone involved in the Web, merchants and Internet users alike. While some fear that too many rules in this area could slow the development of electronic commerce, others consider a framework governing practices essential to protecting their privacy. Although their motivations diverge, resolving this question appears to be an essential step in the development of the network. The Platform for Privacy Preference (P3P) proposes to contribute to this resolution through a technical protocol that allows automatic negotiation, between the user's computer and that of the site being visited, of an agreement that will govern exchanges of information. Its application raises many questions, including whether it can provide a solution acceptable to all and, above all, whether it complies with existing laws. The long and difficult development of the protocol, its successive dilutions, and its partial implementation testify to the difficulty of the task and to the resistance it encounters. The first phase of the project is thus limited to encoding sites' privacy policies and translating them into terms accessible to users' systems. In a second phase, P3P is expected to handle the negotiation and conclusion of agreements that legally bind the parties. This task proves more arduous, both from a legal standpoint and in terms of its adaptation to the customs and practices of the Web. Consolidating the functions put in place in the first version appears to offer a less risky and more profitable solution, by ruling out the possible conclusion of uncertain agreements based on a still-imperfect technique. Better informing Internet users' consent to the transmission of their personal data through the standardization of privacy policies could indeed be a simpler and more effective solution in the short term.
Abstract:
From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems. There are major technical, social, and economic impediments that stand in the way of using provenance effectively. This paper synthesizes requirements for provenance on the Web for a number of dimensions focusing on three key aspects of provenance: the content of provenance, the management of provenance records, and the uses of provenance information. To illustrate these requirements, we use three synthesized scenarios that encompass provenance problems faced by Web users today.
Abstract:
According to a new report (http://tinyurl.com/2g6ghps), if you are on the Web at all you’re not safe from hackers, phishers, and spammers (oh my!). The Norton Cybercrime Report: The Human Impact (http://cybercrime.newslinevine.com/), a survey of 7,000 Web users, tells us that 65% of all users globally, and 73% of U.S. users, have been victims of some sort of cybercrime. Globally, the U.S. ranks very high, but in this case we’re not first in line. China wins Number One with 83% of its users web-abused in some manner. These are figures to give one pause.
Abstract:
Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In this paper, we additionally propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Finally, we present an implementation of our inductive fuzzy grassroots ontology. Thus, this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.
Abstract:
Web transaction data between Web visitors and Web functionalities usually convey users' task-oriented behavior patterns. Mining this type of click-stream data makes it possible to capture usage pattern information. Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to a user-preferred style. Traditional Web usage mining techniques, such as Web user session or Web page clustering, association rule mining, and frequent navigational path mining, can only discover usage patterns explicitly. They cannot, however, reveal the underlying navigational activities or identify the latent relationships associated with the patterns among Web users and Web pages. In this work, we propose a Web recommendation framework that incorporates a Web usage mining technique based on the Probabilistic Latent Semantic Analysis (PLSA) model. The main advantage of this method is that it not only discovers usage-based access patterns but also reveals the underlying latent factors. With the discovered user access patterns, we then present users with content of greater interest via collaborative recommendation. To validate the effectiveness of the proposed approach, we conduct experiments on real-world datasets and make comparisons with some existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.
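For readers unfamiliar with PLSA, the sketch below shows the standard expectation-maximization updates for the symmetric formulation P(u, p) = Σ_z P(z) P(u|z) P(p|z), applied to a user-by-page count matrix. It is a textbook-style illustration in plain Python, not the authors' implementation; the function name `plsa`, the toy matrix, and all parameter defaults are assumptions.

```python
import random


def plsa(counts, n_factors, n_iter=50, seed=0):
    """Fit P(z), P(u|z), P(p|z) to a user-by-page count matrix with EM."""
    rng = random.Random(seed)
    n_users, n_pages = len(counts), len(counts[0])

    def random_distribution(n):
        row = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(row)
        return [v / total for v in row]

    p_z = [1.0 / n_factors] * n_factors
    p_u_z = [random_distribution(n_users) for _ in range(n_factors)]  # P(u|z)
    p_p_z = [random_distribution(n_pages) for _ in range(n_factors)]  # P(p|z)

    for _ in range(n_iter):
        new_z = [0.0] * n_factors
        new_u = [[0.0] * n_users for _ in range(n_factors)]
        new_p = [[0.0] * n_pages for _ in range(n_factors)]
        for u in range(n_users):
            for p in range(n_pages):
                n = counts[u][p]
                if n == 0:
                    continue
                # E-step: posterior P(z|u,p) is proportional to P(z) P(u|z) P(p|z).
                joint = [p_z[z] * p_u_z[z][u] * p_p_z[z][p] for z in range(n_factors)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_factors):
                    weight = n * joint[z] / total
                    new_z[z] += weight
                    new_u[z][u] += weight
                    new_p[z][p] += weight
        # M-step: renormalize the expected counts into probabilities.
        grand_total = sum(new_z)
        for z in range(n_factors):
            if new_z[z] > 0.0:
                p_u_z[z] = [v / new_z[z] for v in new_u[z]]
                p_p_z[z] = [v / new_z[z] for v in new_p[z]]
            p_z[z] = new_z[z] / grand_total
    return p_z, p_u_z, p_p_z


# Toy data: rows are users, columns are pages, entries are visit counts.
counts = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
p_z, p_u_z, p_p_z = plsa(counts, n_factors=2)
```

Each row of `p_p_z` can be read as one latent factor's distribution over pages, which is the kind of underlying latent factor the abstract refers to.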
Abstract:
Collaborative recommendation is one of the widely used recommendation approaches; it recommends items to a visitor by referring to the preferences of other users who are similar to the current user. User profiling techniques applied to Web transaction data can capture such informative knowledge of user tasks or interests. With the discovered usage pattern information, it becomes possible to recommend more preferred content to Web users or to customize the Web presentation for visitors via collaborative recommendation. In addition, it is helpful to identify the underlying relationships among Web users, items, and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on a user profiling technique. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model the co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as the representatives of usage patterns. Moreover, the hidden task model is derived by characterizing the meaningful latent factor space. With the discovered user profiles, we then choose the best-matching profile, the one whose preferences are most similar to those of the current user, and make collaborative recommendations based on the corresponding page weights in the selected user profile. The preliminary experimental results on real-world datasets show that the proposed approach is capable of making recommendations accurately and efficiently.
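The final step described above, choosing the best-matching profile and recommending by page weight, can be sketched as follows. Cosine similarity is used here as an assumed similarity measure; the abstract does not say which measure the authors use, and the names `cosine`, `recommend`, and the toy profiles are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(session, profiles, pages, top_n=2):
    """Pick the profile closest to the session, then return its heaviest unvisited pages."""
    best = max(profiles, key=lambda profile: cosine(session, profile))
    unseen = [(weight, page)
              for weight, page, visited in zip(best, pages, session)
              if visited == 0]
    ranked = sorted(unseen, key=lambda item: item[0], reverse=True)
    return [page for _, page in ranked[:top_n]]


pages = ["news", "sports", "blog", "shop"]
# Each profile holds one weight per page (toy values).
profiles = [
    [0.9, 0.7, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.9],
]
session = [1, 0, 0, 0]   # the current user has visited only "news"
suggestions = recommend(session, profiles, pages)
```

In this toy case the first profile matches the session, so the unvisited pages are ranked by that profile's weights.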
Abstract:
Because some Web users will be able to design a template to visualize information from scratch, while other users need to automatically visualize information by changing some parameters, providing different levels of customization of the information is a desirable goal. Our system allows the automatic generation of visualizations given the semantics of the data, as well as static or pre-specified visualization through an interface language. We address information visualization taking into consideration the Web, where the presentation of the retrieved information is a challenge. We provide a model to narrow the gap between the user's way of expressing queries and database manipulation languages (SQL) without changing the system itself, thus improving the query specification process. We develop a Web interface model that is integrated with the HTML language to create a powerful language that facilitates the construction of Web-based database reports. As opposed to other papers, this model offers a new way of exploring databases, focusing on providing Web connectivity to databases with minimal or no result buffering, formatting, or extra programming. We describe how to easily connect the database to the Web. In addition, we offer an enhanced way of viewing and exploring the contents of a database, allowing users to customize their views depending on the contents and the structure of the data. Current database front-ends typically attempt to display the database objects in a flat view, making it difficult for users to grasp the contents and the structure of their result. Our model narrows the gap between databases and the Web. The overall objective of this research is to construct a model that accesses different databases easily across the net and generates SQL, forms, and reports across all platforms without requiring the developer to code a complex application. This increases the speed of development. 
In addition, using only a Web browser, the end user can retrieve data from databases remotely and make the necessary modifications and manipulations of data using Web-formatted forms and reports, independent of the platform, without having to open different applications or learn to use anything but their Web browser. We introduce a strategic method to generate and construct SQL queries, enabling inexperienced users who have little exposure to SQL to build a syntactically and semantically valid SQL query and to understand the retrieved data. The generated SQL query will be validated against the database schema to ensure harmless and efficient SQL execution. (Abstract shortened by UMI.)
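One way to read validating a generated query against the database schema is sketched below with Python's standard `sqlite3` module. This is an assumption-laden illustration, not the system described in the abstract: the helper `build_select`, the `books` table, and the restriction to equality filters are all invented for the example.

```python
import sqlite3


def build_select(conn, table, columns, filters):
    """Build a parameterized SELECT after checking names against the live schema."""
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table}")
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    known = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for column in list(columns) + list(filters):
        if column not in known:
            raise ValueError(f"unknown column: {column}")
    select_list = ", ".join(f'"{c}"' for c in columns)
    sql = f'SELECT {select_list} FROM "{table}"'
    if filters:
        where = " AND ".join(f'"{c}" = ?' for c in filters)
        sql += f" WHERE {where}"
    # Filter values travel as bound parameters, never as part of the SQL text.
    return sql, list(filters.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
conn.execute("INSERT INTO books VALUES ('Deep Web', 2008)")
sql, params = build_select(conn, "books", ["title"], {"year": 2008})
rows = conn.execute(sql, params).fetchall()
```

Because identifiers are accepted only if they already exist in the schema and values are bound as parameters, user input never reaches the SQL text unchecked, which is one plausible meaning of harmless execution.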
Abstract:
Thanks to the growth, expansion, and popularization of the World Wide Web, its technological development is of increasing importance in society. The symbiosis between these two environments has led to greater social influence on the platform's innovations and to a much more practical approach. Our aim in this article is to describe, characterize, and analyze the emergence and diffusion of the new hypertext standard that governs the Web: HTML5. At the same time, we explore this process in the light of several theories that bring together technology and society. We pay special attention to the users of the World Wide Web and to the generic use they make of Social Media. We suggest that the development of web standards is influenced by the everyday use of this new type of technologies and applications.
Abstract:
Libraries license electronic information on behalf of their patrons and administrators. Access to most of the electronic resources available in libraries is provided through the World Wide Web by means of browser interfaces. The rapid evolution of the World Wide Web as the largest mechanism for distributing information has made the web a rich source of information for vendors about the interests, behaviors, and habits of users. Sometimes this type of information is collected during the use of a web product without the knowledge or permission of the individual using the product.
Abstract:
"Thesis submitted for the degree of Doctor of Law of the Université Panthéon-Assas (Paris II) and of Doctor of Law of the Faculty of Law of the Université de Montréal, in private law"
Abstract:
It is a two-sided platform whose fundamental aim is to bring talent seekers and actors together effectively and precisely. This business proposal seeks to improve the working conditions of the latter through available technology and digital know-how. It also aims to create the country's first consolidated talent bank and to contribute to the consolidation of a cultural segment.
Abstract:
The aim of this study is to analyze French pupils' spelling and, more specifically, their spelling errors. The study is based on the blogs of 78 young French students, in which spelling errors are analyzed using a typology by Nina Catach (1980). This typology explains what types of spelling errors can occur in the French language. La maitrise de l’orthographe lexicale du français et de l’espagnol is a study by Sony Mayard (2007) that explains the complexity of French spelling. The study shows, first of all, that pupils' spelling is not something that can be concretely judged or cataloged. One can never explain exactly why French pupils spell the way they do, since spelling is individual and depends on several key factors. However, this study presents one approach to differentiating spelling errors and the possible reasons why they are judged incorrect.
Abstract:
Each year search engines like Google, Bing and Yahoo complete trillions of search queries online. Students are especially dependent on these search tools because of their popularity, convenience and accessibility. However, what students are unaware of, by choice or naiveté, is the amount of personal information that is collected during each search session, how that data is used and who is interested in their online behavior profile. Privacy policies are frequently updated in favor of the search companies, but they are lengthy and are often perused briefly or ignored entirely, with little thought about how personal web habits are being exploited for analytics and marketing. As an Information Literacy instructor and a member of the Electronic Frontier Foundation, I believe in the importance of educating college students, and web users in general, that they have a right to privacy online. Class discussions on the topic of web privacy have yielded an interesting perspective on internet search usage. Students are unaware of how their online behavior is recorded and have consistently expressed hesitancy to use tools that disguise or delete their IP address because of the stigma that doing so may imply they have something to hide or are engaging in illegal activity. Additionally, students fear they will have to surrender the convenience of uber-connectivity in their applications to maintain their privacy. The purpose of this lightning presentation is to provide educators with a lesson plan highlighting and simplifying the privacy terms of the three major search engines: Google, Bing and Yahoo. This presentation focuses on what data these search engines collect about users, how that data is used, and alternative search solutions, like DuckDuckGo, for increased privacy. Students will directly benefit from this lesson because informed internet users can protect their data, feel safer online and become more effective web searchers.