937 resultados para web content


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Web servers are accessible by anyone who can access the Internet. Although this universal accessibility is attractive for all kinds of Web-based applications, Web servers are exposed to attackers who may want to alter their contents. Alterations range from humorous additions or changes, which are typically easy to spot, to more sinister tampering, such as providing false or damaging information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Currently we are facing an overburdening growth of the number of reliable information sources on the Internet. The quantity of information available to everyone via Internet is dramatically growing each year [15]. At the same time, temporal and cognitive resources of human users are not changing, therefore causing a phenomenon of information overload. World Wide Web is one of the main sources of information for decision makers (reference to my research). However our studies show that, at least in Poland, the decision makers see some important problems when turning to Internet as a source of decision information. One of the most common obstacles raised is distribution of relevant information among many sources, and therefore need to visit different Web sources in order to collect all important content and analyze it. A few research groups have recently turned to the problem of information extraction from the Web [13]. The most effort so far has been directed toward collecting data from dispersed databases accessible via web pages (related to as data extraction or information extraction from the Web) and towards understanding natural language texts by means of fact, entity, and association recognition (related to as information extraction). Data extraction efforts show some interesting results, however proper integration of web databases is still beyond us. Information extraction field has been recently very successful in retrieving information from natural language texts, however it is still lacking abilities to understand more complex information, requiring use of common sense knowledge, discourse analysis and disambiguation techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative XPath expressions, although not widely used, should be used in preference to absolute XPath expressions in extracting content from human-created Web documents. Evaluation of robustness covers four thousand queries executed on several hundred webpages. We show that in referencing parts of real world dynamic HTML documents, relative XPath expressions are on average significantly more robust than absolute XPath ones.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Plain Text - ASCII, Unicode, UTF-8 Content Formats - XML-based formats (RSS, MathML, SVG, Office) + PDF Text based data formats: CSV, JSON

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an approach for assisting low-literacy readers in accessing Web online information. The oEducational FACILITAo tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models regarding accessibility concerns. Especially, we propose an interaction model and a Web application that explore the natural language processing tasks of lexical elaboration and named entity labeling for improving Web accessibility. We report on the results obtained from a pilot study on usability analysis carried out with low-literacy users. The preliminary results show that oEducational FACILITAo improves the comprehension of text elements, although the assistance mechanisms might also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This fact evokes a future solution in which the correct sense for a complex word in a sentence is identified, solving this pervasive characteristic of natural languages. The pilot study also identified that experienced computer users find the tool to be more useful than novice computer users do.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a real application of Web-content mining using an incremental FP-Growth approach. We firstly restructure the semi-structured data retrieved from the web pages of Chinese car market to fit into the local database, and then employ an incremental algorithm to discover the association rules for the identification of car preference. To find more general regularities, a method of attribute-oriented induction is also utilized to find customer’s consumption preferences. Experimental results show some interesting consumption preference patterns that may be beneficial for the government in making policy to encourage and guide car consumption.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces an incremental FP-Growth approach for Web content based data mining and its application in solving a real world problem The problem is solved in the following ways. Firstly, we obtain the semi-structured data from the Web pages of Chinese car market and structure them and save them in local database. Secondly, we use an incremental FP-Growth algorithm for mining association rules to discover Chinese consumers' car consumption preference. To find more general regularities, an attribute-oriented induction method is also utilized to find customer's consumption preference among a range of car categories. Experimental results have revealed some interesting consumption preferences that are useful for the decision makers to make the policy to encourage and guide car consumption. Although the current data we used may not be the best representative of the actual market in practice, it is still good enough for the decision making purpose in terms of reflecting the real situation of car consumption preference under the two assumptions in the context.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Diversity-based designing, or the goal of ensuring that web-based information is accessible to as many diverse users as possible, has received growing international acceptance in recent years, with many countries introducing legislation to enforce it. This paper analyses web content accessibility levels in Spanish education portals according to the international guidelines established by the World Wide Web Consortium (W3C) and the Web Accessibility Initiative (WAI). Additionally, it suggests the calculation of an inaccessibility rate as a tool for measuring the degree of non-compliance with WAI Guidelines 2.0 as well as illustrating the significant gap that separates people with disabilities from digital education environments (with a 7.77% average). A total of twenty-one educational web portals with two different web depth levels (42 sampling units) were assessed for this purpose using the automated analysis tool Web Accessibility Test 2.0 (TAW, for its initials in Spanish). The present study reveals a general trend towards non-compliance with the technical accessibility recommendations issued by the W3C-WAI group (97.62% of the websites examined present mistakes in Level A conformance). Furthermore, despite the increasingly high number of legal and regulatory measures about accessibility, their practical application still remains unsatisfactory. A greater level of involvement must be assumed in order to raise awareness and enhance training efforts towards accessibility in the context of collective Information and Communication Technologies (ICTs), since this represents not only a necessity but also an ethical, social, political and legal commitment to be assumed by society.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper investigates how people return to information in a dynamic information environment. For example, a person might want to return to Web content via a link encountered earlier on a Web page, only to learn that the link has since been removed. Changes can benefit users by providing new information, but they hinder returning to previously viewed information. The observational study presented here analyzed instances, collected via a Web search, where people expressed difficulty re-finding information because of changes to the information or its environment. A number of interesting observations arose from this analysis, including that the path originally taken to get to the information target appeared important in its re-retrieval, whereas, surprisingly, the temporal aspects of when the information was seen before were not. While people expressed frustration when problems arose, an explanation of why the change had occurred was often sufficient to allay that frustration, even in the absence of a solution. The implications of these observations for systems that support re-finding in dynamic environments are discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Search engines exploit the Web's hyperlink structure to help infer information content. The new phenomenon of personal Web logs, or 'blogs', encourage more extensive annotation of Web content. If their resulting link structures bias the Web crawling applications that search engines depend upon, there are implications for another form of annotation rapidly on the rise, the Semantic Web. We conducted a Web crawl of 160 000 pages in which the link structure of the Web is compared with that of several thousand blogs. Results show that the two link structures are significantly different. We analyse the differences and infer the likely effect upon the performance of existing and future Web agents. The Semantic Web offers new opportunities to navigate the Web, but Web agents should be designed to take advantage of the emerging link structures, or their effectiveness will diminish.