Utility of web content blocks in content extraction


Author(s): Kowalkiewicz, M.
Date(s)

2007

Abstract

We currently face an overwhelming growth in the number of reliable information sources on the Internet. The quantity of information available to everyone via the Internet grows dramatically each year [15]. At the same time, the temporal and cognitive resources of human users do not change, which leads to information overload. The World Wide Web is one of the main sources of information for decision makers (reference to my research). However, our studies show that, at least in Poland, decision makers see some important problems when turning to the Internet as a source of decision information. One of the most common obstacles raised is that relevant information is distributed among many sources, so users need to visit different Web sources to collect all the important content and analyze it. A few research groups have recently turned to the problem of information extraction from the Web [13]. Most effort so far has been directed toward collecting data from dispersed databases accessible via web pages (referred to as data extraction or information extraction from the Web) and toward understanding natural language texts by means of fact, entity, and association recognition (referred to as information extraction). Data extraction efforts show some interesting results; however, proper integration of web databases is still beyond us. The field of information extraction has recently been very successful in retrieving information from natural language texts; however, it still lacks the ability to understand more complex information that requires common sense knowledge, discourse analysis, and disambiguation techniques.

Identifier

http://eprints.qut.edu.au/85993/

Publisher

Springer Netherlands

Relation

DOI:10.1007/1-4020-5634-6_22

Kowalkiewicz, M. (2007) Utility of web content blocks in content extraction. In Technologies for Business Information Systems. Springer Netherlands, pp. 253-262.

Type

Book Chapter