884 resultados para web content
Resumo:
Web servers are accessible by anyone who can access the Internet. Although this universal accessibility is attractive for all kinds of Web-based applications, Web servers are exposed to attackers who may want to alter their contents. Alterations range from humorous additions or changes, which are typically easy to spot, to more sinister tampering, such as providing false or damaging information.
Resumo:
Currently we are facing an overburdening growth of the number of reliable information sources on the Internet. The quantity of information available to everyone via Internet is dramatically growing each year [15]. At the same time, temporal and cognitive resources of human users are not changing, therefore causing a phenomenon of information overload. World Wide Web is one of the main sources of information for decision makers (reference to my research). However our studies show that, at least in Poland, the decision makers see some important problems when turning to Internet as a source of decision information. One of the most common obstacles raised is distribution of relevant information among many sources, and therefore need to visit different Web sources in order to collect all important content and analyze it. A few research groups have recently turned to the problem of information extraction from the Web [13]. The most effort so far has been directed toward collecting data from dispersed databases accessible via web pages (related to as data extraction or information extraction from the Web) and towards understanding natural language texts by means of fact, entity, and association recognition (related to as information extraction). Data extraction efforts show some interesting results, however proper integration of web databases is still beyond us. Information extraction field has been recently very successful in retrieving information from natural language texts, however it is still lacking abilities to understand more complex information, requiring use of common sense knowledge, discourse analysis and disambiguation techniques.
Resumo:
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative XPath expressions, although not widely used, should be used in preference to absolute XPath expressions in extracting content from human-created Web documents. Evaluation of robustness covers four thousand queries executed on several hundred webpages. We show that in referencing parts of real world dynamic HTML documents, relative XPath expressions are on average significantly more robust than absolute XPath ones.
Resumo:
Plain Text - ASCII, Unicode, UTF-8 Content Formats - XML-based formats (RSS, MathML, SVG, Office) + PDF Text based data formats: CSV, JSON
Resumo:
This paper presents an approach for assisting low-literacy readers in accessing Web online information. The oEducational FACILITAo tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models regarding accessibility concerns. Especially, we propose an interaction model and a Web application that explore the natural language processing tasks of lexical elaboration and named entity labeling for improving Web accessibility. We report on the results obtained from a pilot study on usability analysis carried out with low-literacy users. The preliminary results show that oEducational FACILITAo improves the comprehension of text elements, although the assistance mechanisms might also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This fact evokes a future solution in which the correct sense for a complex word in a sentence is identified, solving this pervasive characteristic of natural languages. The pilot study also identified that experienced computer users find the tool to be more useful than novice computer users do.
Resumo:
Diversity-based designing, or the goal of ensuring that web-based information is accessible to as many diverse users as possible, has received growing international acceptance in recent years, with many countries introducing legislation to enforce it. This paper analyses web content accessibility levels in Spanish education portals according to the international guidelines established by the World Wide Web Consortium (W3C) and the Web Accessibility Initiative (WAI). Additionally, it suggests the calculation of an inaccessibility rate as a tool for measuring the degree of non-compliance with WAI Guidelines 2.0 as well as illustrating the significant gap that separates people with disabilities from digital education environments (with a 7.77% average). A total of twenty-one educational web portals with two different web depth levels (42 sampling units) were assessed for this purpose using the automated analysis tool Web Accessibility Test 2.0 (TAW, for its initials in Spanish). The present study reveals a general trend towards non-compliance with the technical accessibility recommendations issued by the W3C-WAI group (97.62% of the websites examined present mistakes in Level A conformance). Furthermore, despite the increasingly high number of legal and regulatory measures about accessibility, their practical application still remains unsatisfactory. A greater level of involvement must be assumed in order to raise awareness and enhance training efforts towards accessibility in the context of collective Information and Communication Technologies (ICTs), since this represents not only a necessity but also an ethical, social, political and legal commitment to be assumed by society.
Resumo:
This paper investigates how people return to information in a dynamic information environment. For example, a person might want to return to Web content via a link encountered earlier on a Web page, only to learn that the link has since been removed. Changes can benefit users by providing new information, but they hinder returning to previously viewed information. The observational study presented here analyzed instances, collected via a Web search, where people expressed difficulty re-finding information because of changes to the information or its environment. A number of interesting observations arose from this analysis, including that the path originally taken to get to the information target appeared important in its re-retrieval, whereas, surprisingly, the temporal aspects of when the information was seen before were not. While people expressed frustration when problems arose, an explanation of why the change had occurred was often sufficient to allay that frustration, even in the absence of a solution. The implications of these observations for systems that support re-finding in dynamic environments are discussed.
Resumo:
Search engines exploit the Web's hyperlink structure to help infer information content. The new phenomenon of personal Web logs, or 'blogs', encourage more extensive annotation of Web content. If their resulting link structures bias the Web crawling applications that search engines depend upon, there are implications for another form of annotation rapidly on the rise, the Semantic Web. We conducted a Web crawl of 160 000 pages in which the link structure of the Web is compared with that of several thousand blogs. Results show that the two link structures are significantly different. We analyse the differences and infer the likely effect upon the performance of existing and future Web agents. The Semantic Web offers new opportunities to navigate the Web, but Web agents should be designed to take advantage of the emerging link structures, or their effectiveness will diminish.
Resumo:
This work involves the organization and content perspectives on Enterprise Content Management (ECM) framework. The case study at the Federal University of Rio Grande do Norte was based on ECM model to analyse the information management provided by the three main administrative systems: The Integrated Management of Academic Activities (SIGAA), Integrated System of Inheritance, and Contracts Administration (SIPAC) and the Integrated System for Administration and Human Resources (SIGRH). A case study protocol was designed to provide greater reliability to research process. Four propositions were examined in order to reach the specific objectives of identification and evaluation of ECM components from UFRN perspective. The preliminary phase provided the guidelines for the data collection. In total, 75 individuals were interviewed. Interviews with four managers directly involved on systems design were recorded (average duration of 90 minutes). The 70 remaining individuals were approached in random way in UFRN s units, including teachers, administrative-technical employees and students. The results showed the presence of many ECM elements in the management of UFRN administrative information. The technological component with higher presence was "management of web content / collaboration". But initiatives of other components (e.g. email and document management) were found and are in continuous improvement. The assessment made use of eQual 4.0 to examine the effectiveness of applications under three factors: usability, quality of information and offered service. In general, the quality offered by the systems was very good and walk side by side with the obtained benefits of ECM strategy adoption in the context of the whole institution
Resumo:
In the context of Software Engineering, web accessibility is gaining more room, establishing itself as an important quality attribute. This fact is due to initiatives of institutions such as the W3C (World Wide Web Consortium) and the introduction of norms and laws such as Section 508 that underlie the importance of developing accessible Web sites and applications. Despite these improvements, the lack of web accessibility is still a persistent problem, and could be related to the moment or phase in which this requirement is solved within the development process. From the moment when Web accessibility is generally regarded as a programming problem or treated when the application is already developed entirely. Thus, consider accessibility already during activities of analysis and requirements specification shows itself a strategy to facilitate project progress, avoiding rework in advanced phases of software development because of possible errors, or omissions in the elicitation. The objective of this research is to develop a method and a tool to support requirements elicitation of web accessibility. The strategy for the requirements elicitation of this method is grounded by the Goal-Oriented approach NFR Framework and the use of catalogs NFRs, created based on the guidelines contained in WCAG 2.0 (Web Content Accessibility Guideline) proposed by W3C
Resumo:
Web content hosting, in which a Web server stores and provides Web access to documents for different customers, is becoming increasingly common. For example, a web server can host webpages for several different companies and individuals. Traditionally, Web Service Providers (WSPs) provide all customers with the same level of performance (best-effort service). Most service differentiation has been in the pricing structure (individual vs. business rates) or the connectivity type (dial-up access vs. leased line, etc.). This report presents DiffServer, a program that implements two simple, server-side, application-level mechanisms (server-centric and client-centric) to provide different levels of web service. The results of the experiments show that there is not much overhead due to the addition of this additional layer of abstraction between the client and the Apache web server under light load conditions. Also, the average waiting time for high priority requests decreases significantly after they are assigned priorities as compared to a FIFO approach.
Resumo:
Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In addition, in this paper, we propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Eventually, we present an implementation of our inductive fuzzy grassroots ontology Thus,this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.
Resumo:
In his in uential article about the evolution of the Web, Berners-Lee [1] envisions a Semantic Web in which humans and computers alike are capable of understanding and processing information. This vision is yet to materialize. The main obstacle for the Semantic Web vision is that in today's Web meaning is rooted most often not in formal semantics, but in natural language and, in the sense of semiology, emerges not before interpretation and processing. Yet, an automated form of interpretation and processing can be tackled by precisiating raw natural language. To do that, Web agents extract fuzzy grassroots ontologies through induction from existing Web content. Inductive fuzzy grassroots ontologies thus constitute organically evolved knowledge bases that resemble automated gradual thesauri, which allow precisiating natural language [2]. The Web agents' underlying dynamic, self-organizing, and best-effort induction, enable a sub-syntactical bottom up learning of semiotic associations. Thus, knowledge is induced from the users' natural use of language in mutual Web interactions, and stored in a gradual, thesauri-like lexical-world knowledge database as a top-level ontology, eventually allowing a form of computing with words [3]. Since when computing with words the objects of computation are words, phrases and propositions drawn from natural languages, it proves to be a practical notion to yield emergent semantics for the Semantic Web. In the end, an improved understanding by computers on the one hand should upgrade human- computer interaction on the Web, and, on the other hand allow an initial version of human- intelligence amplification through the Web.