937 results for World Wide Web


Relevance: 100.00%

Abstract:

Nowadays, everyone can effortlessly access a range of information on the World Wide Web (WWW). As information resources on the web continue to grow tremendously, it becomes progressively more difficult to meet users' high expectations and find relevant information. Although existing search engine technologies can find valuable information, they suffer from the problems of information overload and information mismatch. This paper presents a hybrid Web Information Retrieval approach that enables personalised search using an ontology, user profiles and collaborative filtering. The approach uses the ontology to determine the context of a user query with minimal user involvement, updates the user profile automatically over time as the user's behaviour changes, and then incorporates recommendations from similar users through collaborative filtering. The proposed method is evaluated on the FIRE 2010 dataset and a manually generated dataset. Empirical analysis reveals that the Precision, Recall and F-Score of most queries improve for many users under the proposed method.
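
As a rough illustration of the collaborative-filtering step, the sketch below ranks unseen documents for a user by the similarity-weighted votes of the most similar other users. The rating vectors and parameters are invented for the example; this is not the authors' implementation, which also integrates the ontology and the time-based profile.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two user relevance vectors.
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v) / denom if denom else 0.0

    def recommend(target, others, k=2):
        # Rank items for `target` by similarity-weighted votes of the
        # k most similar users, hiding items the user has already seen.
        sims = sorted(((cosine(target, p), p) for p in others),
                      key=lambda t: t[0], reverse=True)[:k]
        scores = sum(s * p for s, p in sims)
        scores[target > 0] = -np.inf
        return np.argsort(scores)[::-1]

    # 3 other users x 5 documents; 1 = judged relevant, 0 = unseen.
    others = [np.array(r, dtype=float) for r in
              [[1, 0, 1, 0, 0], [1, 1, 0, 0, 1], [0, 1, 1, 1, 0]]]
    user = np.array([1, 0, 0, 0, 1], dtype=float)
    print(recommend(user, others))   # documents ranked for this user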

Relevance: 100.00%

Abstract:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve the accuracy and efficiency of WebPut. Several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple level and the database level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared with existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.
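
A minimal sketch of the greedy, confidence-driven scheduling idea: repeatedly impute the missing cell whose query currently has the highest estimated confidence, since each filled value adds context for the remaining ones. The query formulation, the confidence heuristic and the `lookup` stub are all invented stand-ins, not WebPut's actual components.

    def best_query(record, attr):
        # Stub: formulate a web query for one missing cell from the known
        # fields, with a toy confidence: more known context, more confidence.
        known = {k: v for k, v in record.items() if v is not None and k != attr}
        conf = len(known) / (len(record) - 1)
        return conf, '"%s" %s' % (" ".join(map(str, known.values())), attr)

    def greedy_impute(records, lookup):
        # Fill missing cells in descending confidence order; each fill can
        # raise the confidence of queries for cells imputed later.
        missing = [(r, a) for r in records for a, v in r.items() if v is None]
        while missing:
            conf, query, r, a = max(
                ((c, q, r, a) for r, a in missing
                 for c, q in [best_query(r, a)]),
                key=lambda t: t[0])
            r[a] = lookup(query)          # stand-in for issuing the web query
            missing.remove((r, a))
        return records

    rows = [{"title": "WebPut", "venue": None, "year": None}]
    print(greedy_impute(rows, lookup=lambda q: "<answer to: %s>" % q))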

Relevance: 100.00%

Abstract:

The proliferation of the web presents an unsolved problem: automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes: ClueWeb09 and ClueWeb12, which contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been demonstrated before. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection for clustering and produces several orders of magnitude more clusters than existing algorithms. Fine-grained clustering is necessary for meaningful clustering of massive collections, where the number of distinct topics grows linearly with collection size. These fine-grained clusters show improved cluster quality when assessed with two novel evaluations that use ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing cluster quality where categorical labeling is unavailable or infeasible.
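
The EM-tree algorithm itself builds a tree of cluster centres over compressed document signatures; as a much-simplified stand-in for the underlying EM-style loop, the sketch below runs one flat level of spherical k-means on unit-length document vectors (random data here, purely for illustration).

    import numpy as np

    def spherical_kmeans(docs, k, iters=10, seed=0):
        # docs: (n, d) array of L2-normalised document vectors.
        rng = np.random.default_rng(seed)
        centres = docs[rng.choice(len(docs), k, replace=False)]
        for _ in range(iters):
            assign = np.argmax(docs @ centres.T, axis=1)   # E-step: nearest centre
            for j in range(k):                             # M-step: recompute centres
                members = docs[assign == j]
                if len(members):
                    c = members.mean(axis=0)
                    centres[j] = c / np.linalg.norm(c)
        return assign, centres

    docs = np.random.default_rng(1).normal(size=(1000, 16))
    docs /= np.linalg.norm(docs, axis=1, keepdims=True)
    assign, _ = spherical_kmeans(docs, k=8)
    print(np.bincount(assign))   # cluster sizes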

Relevance: 100.00%

Abstract:

Currently we are facing an overwhelming growth in the number of reliable information sources on the Internet. The quantity of information available to everyone via the Internet grows dramatically each year [15]. At the same time, the temporal and cognitive resources of human users are not changing, causing the phenomenon of information overload. The World Wide Web is one of the main sources of information for decision makers (reference to my research). However, our studies show that, at least in Poland, decision makers face some important problems when turning to the Internet as a source of decision information. One of the most commonly raised obstacles is the distribution of relevant information among many sources, and thus the need to visit different Web sources in order to collect all important content and analyze it. A number of research groups have recently turned to the problem of information extraction from the Web [13]. Most effort so far has been directed toward collecting data from dispersed databases accessible via web pages (referred to as data extraction, or information extraction from the Web) and toward understanding natural language texts by means of fact, entity and association recognition (referred to as information extraction). Data extraction efforts show some interesting results; however, proper integration of web databases is still beyond us. The information extraction field has recently been very successful in retrieving information from natural language texts, but it still lacks the ability to understand more complex information, which requires common-sense knowledge, discourse analysis and disambiguation techniques.

Relevance: 100.00%

Abstract:

Web service and business process technologies are widely adopted to facilitate business automation and collaboration. Given the complexity of business processes, the ability to show a business process in different views, catering for the diverse interests and authority levels of different users, is a sought-after feature. Aiming to implement such flexible process views in the Web service environment, this paper presents a novel framework named FlexView to support view abstraction and concretisation of WS-BPEL processes. In the FlexView framework, a rigorous view model is proposed to specify the dependency and correlation between the structural components of process views, with emphasis on the characteristics of WS-BPEL, and a set of rules is defined to guarantee structural consistency between process views during transformations. A set of algorithms is developed to shift the abstraction and concretisation operations to the operational level. A prototype is also implemented as a proof of concept. © 2010 Springer Science+Business Media, LLC.
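
To illustrate what view abstraction means on a block-structured process, the toy sketch below collapses a named subtree of a process tree into a single opaque activity; concretisation would restore the original subtree. The node model and the `abstract` operation are hypothetical, not FlexView's API, and the paper's structural-consistency rules are omitted.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        name: str
        kind: str = "activity"        # e.g. "activity", "sequence", "flow"
        children: list = field(default_factory=list)

    def abstract(root, target):
        # Replace the subtree named `target` with one opaque activity,
        # yielding a coarser view of the same process.
        for i, child in enumerate(root.children):
            if child.name == target:
                root.children[i] = Node("<%s>" % target, "abstract-activity")
            else:
                abstract(child, target)
        return root

    proc = Node("process", "sequence", [
        Node("receive"),
        Node("payment", "flow", [Node("charge-card"), Node("send-invoice")]),
        Node("reply")])
    view = abstract(proc, "payment")
    print([c.name for c in view.children])   # ['receive', '<payment>', 'reply']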

Relevance: 100.00%

Abstract:

We present an empirical evaluation and comparison of two content extraction methods for HTML: absolute XPath expressions and relative XPath expressions. We argue that relative XPath expressions, although not widely used, should be preferred to absolute XPath expressions when extracting content from human-created Web documents. The evaluation of robustness covers four thousand queries executed on several hundred webpages. We show that, in referencing parts of real-world dynamic HTML documents, relative XPath expressions are on average significantly more robust than absolute ones.
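
The two expression styles compared above can be seen side by side in the sketch below (using lxml; the page markup is invented for the example). The absolute expression is anchored at the document root and breaks if any ancestor changes, while the relative one is anchored on a stable attribute near the target.

    from lxml import html

    page = html.fromstring("""
    <html><body><div id="main">
      <div class="article"><h1>Title</h1><p class="price">42</p></div>
    </div></body></html>""")

    # Absolute: every step from the root must keep matching.
    absolute = page.xpath("/html/body/div/div/p/text()")

    # Relative: survives layout changes above the anchoring attribute.
    relative = page.xpath("//div[@class='article']/p[@class='price']/text()")

    print(absolute, relative)   # ['42'] ['42']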

Relevance: 100.00%

Abstract:

The purpose of the study is to determine what kinds of search strategies information seekers use when looking for information on the Internet. Users are classified into three groups according to their search strategy. Search-oriented users mostly use search engines, both those covering the whole Internet and site-internal ones. Link-oriented users either know or guess the address of the target site, or use large hierarchical directories to find information. They also prefer to navigate within a site by following links and generally do not use the search function. Mixed-strategy users do not consistently favour either approach, but choose a strategy according to the task. Data was collected in two ways: with a questionnaire on a Web page and with a retrieval test in which users were given various search tasks to perform. The search tasks were sorted into three groups according to the strategy they favour: tasks favouring the search strategy, tasks favouring the link strategy, and neutral tasks. The research problem was to determine how task type and computing and Internet experience affect the choice of search strategy. It turned out that users' orientation towards a particular strategy does not affect the choice of search strategy; only the task type was a significant factor. In light of earlier research, experienced users favour the search-oriented strategy. In this study it was found that experience increased the use of both strategies equally, but this phenomenon was observable only in the questionnaire data, not in the tests. The use of both search strategies increases with experience, but their relative proportions remain the same. A proposed reason why experienced users did not favour the search strategy is that the tasks were too easy, so experience could not help. No substantial differences in completion times or in the frequency of switching search strategies were observed in relation to experience, only in relation to task type; this too was explained by the easiness of certain task types. The study also discusses the development of expertise in the context of information retrieval and presents a metaknowledge hypothesis, according to which an important factor in the choice of search strategy is the user's metaknowledge of search services. Metaknowledge includes knowing which search engines are available, what information is worth searching for on the Web, which companies and organizations have content-rich pages, and what types of information are generally available. All in all, three levels of knowledge are proposed to underlie strategy choice: 1) the user's own expertise in the domain being searched, 2) metaknowledge of Internet search services, and 3) technical knowledge of how search engines work. Keywords: information retrieval, search strategy, search engine, WWW, metaknowledge, cognitive psychology

Relevance: 100.00%

Abstract:

The study of social phenomena on the World Wide Web has been rather fragmentary, and there is no coherent, research-based theory about sense of community in the Web environment. Sense of community means the part of one's self-concept that has to do with perceiving oneself as belonging to, and feeling affinity for, a certain social grouping. The present study aimed to find evidence for sense of community in the Web environment, and specifically to find out what the most critical psychological factors of sense of community would be. Based on known characteristics of real-life communities and sense of community, and a few occasional studies of Web communities, it was hypothesized that the following factors would be the most critical ones and that they could be grouped as prerequisites, facilitators and consequences of sense of community: awareness and social presence (prerequisites); criteria for membership and borders, common purpose, social interaction and reciprocity, norms and conformity, and common history (facilitators); trust and accountability (consequences). In addition to the critical factors, the present study aimed to find out whether this kind of grouping would be valid. Furthermore, the effect of Web community members' background variables on sense of community was of interest. In order to answer these questions, an online questionnaire was created and tested. It included propositions reflecting factors that precede, facilitate and follow the sense of community in the Web environment. A factor analysis was calculated to find the critical factors, and analyses of variance were calculated to see whether the grouping into prerequisites, facilitators and consequences was right and how the background variables affect sense of community in the Web environment. The results indicated that the psychological structure of sense of community in the Web environment could not be represented by critical variables grouped as prerequisites, facilitators and consequences. Most factors did facilitate the sense of community, but based on this data it could not be argued that some of the factors chronologically precede sense of community and some follow it. Instead, the factor analysis revealed that the most critical factors in sense of community in the Web environment are 1) reciprocal involvement, 2) basic trust in others, 3) similarity and common purpose of members, and 4) shared history of members. The most influential background variables were the member's own participation activity (indicated by reading and writing messages) and the phase in the membership life cycle (from visitor to leader). The more the member participated, and the further along in the membership life cycle he was, the stronger his sense of community. There are many descriptions of sense of community, but the present study was one of the first to actually measure the phenomenon in the Web environment and to gain well-documented, valid results based on a large dataset, proving that sense of community in the Web environment is possible and clarifying its psychological structure, thus enhancing the understanding of sense of community in the Web environment. Keywords: sense of community, Web community, psychology of the Internet

Relevance: 100.00%

Abstract:

PDB Goodies is a web-based graphical user interface (GUI) for manipulating Protein Data Bank files containing the three-dimensional atomic coordinates of protein structures. The program also allows users to save the manipulated three-dimensional atomic coordinate file on their local client system; such fragments are used in various stages of structure elucidation and analysis. The software works with all the three-dimensional protein structures available in the Protein Data Bank, which presently holds approximately 18 000 structures. In addition, the program accepts a three-dimensional atomic coordinate file (in Protein Data Bank format) uploaded from the client machine. The program is written using CGI/PERL scripts and is platform independent. PDB Goodies can be accessed over the World Wide Web at http://144.16.71.11/pdbgoodies/.
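
As a minimal sketch of the kind of manipulation described (not the program's actual code), the snippet below extracts one chain's ATOM/HETATM records from a PDB-format file; in the PDB format, the chain identifier sits in column 22 of each record. The file names are hypothetical.

    def extract_chain(pdb_lines, chain_id):
        # Keep only the ATOM/HETATM records belonging to one chain.
        return [line for line in pdb_lines
                if line.startswith(("ATOM", "HETATM")) and line[21:22] == chain_id]

    with open("1abc.pdb") as f:                    # hypothetical input file
        chain_a = extract_chain(f.read().splitlines(), "A")
    with open("1abc_chainA.pdb", "w") as f:
        f.write("\n".join(chain_a + ["END"]) + "\n")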

Relevance: 100.00%

Abstract:

Hydrogen bonds in biological macromolecules play significant structural and functional roles. They are the key contributors to most of the interactions without which no living system exists. In view of this, a web-based computing server, the Hydrogen Bonds Computing Server (HBCS), has been developed to compute hydrogen-bond interactions and their standard deviations for any given macromolecular structure. The computing server is connected to a locally maintained Protein Data Bank (PDB) archive. Thus, the user can calculate the above parameters for any deposited structure, and options have also been provided for the user to upload a structure in PDB format from the client machine. In addition, the server has been interfaced with the molecular viewers Jmol and JSmol to visualize the hydrogen-bond interactions. The proposed server is freely available and accessible via the World Wide Web at http://bioserver1.physics.iisc.ernet.in/hbcs/.
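
A simplified sketch of the geometric test behind hydrogen-bond detection: flag donor-acceptor atom pairs whose separation falls inside a typical distance window (about 2.4-3.5 angstroms). HBCS itself applies full donor/acceptor chemistry and angular criteria; the coordinates below are invented.

    import numpy as np

    def hydrogen_bonds(donors, acceptors, lo=2.4, hi=3.5):
        # donors, acceptors: (n, 3) and (m, 3) coordinate arrays in angstroms.
        # Returns (donor index, acceptor index, distance) for pairs in range.
        d = np.linalg.norm(donors[:, None, :] - acceptors[None, :, :], axis=-1)
        return [(int(i), int(j), float(d[i, j]))
                for i, j in zip(*np.nonzero((d > lo) & (d < hi)))]

    donors = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
    acceptors = np.array([[2.9, 0.0, 0.0]])
    print(hydrogen_bonds(donors, acceptors))   # [(0, 0, 2.9)]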

Relevance: 100.00%

Abstract:

The World Wide Web consists of a vast number of web pages connected to one another by hyperlinks. Traditionally, the analysis and retrieval of information on the Web has relied on analyzing and processing page content. For example, traditional web search engines analyze and index the text on web pages, store the processed information in a database, and then analyze the user's query input to produce search results.
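
A toy inverted-index sketch of the content-based pipeline this passage describes: tokenize page text, index it, and answer a query by intersecting posting lists (naive tokenization, no ranking or link analysis).

    from collections import defaultdict

    def build_index(pages):
        # term -> set of page URLs containing that term
        index = defaultdict(set)
        for url, text in pages.items():
            for term in text.lower().split():
                index[term].add(url)
        return index

    def search(index, query):
        # Return the pages containing every query term.
        postings = [index.get(t, set()) for t in query.lower().split()]
        return set.intersection(*postings) if postings else set()

    pages = {"a.html": "world wide web pages",
             "b.html": "web search engines"}
    print(search(build_index(pages), "web pages"))   # {'a.html'}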

Relevance: 100.00%

Abstract:

Serious concerns have been raised about the ecological effects of industrialized fishing [1-3], spurring a United Nations resolution on restoring fisheries and marine ecosystems to healthy levels [4]. However, a prerequisite for restoration is a general understanding of the composition and abundance of unexploited fish communities, relative to contemporary ones. We constructed trajectories of community biomass and composition of large predatory fishes in four continental shelf and nine oceanic systems, using all available data from the beginning of exploitation. Industrialized fisheries typically reduced community biomass by 80% within 15 years of exploitation. Compensatory increases in fast-growing species were observed, but often reversed within a decade. Using a meta-analytic approach, we estimate that large predatory fish biomass today is only about 10% of pre-industrial levels. We conclude that declines of large predators in coastal regions [5] have extended throughout the global ocean, with potentially serious consequences for ecosystems [5-7]. Our analysis suggests that management based on recent data alone may be misleading, and provides minimum estimates for unexploited communities, which could serve as the 'missing baseline' [8] needed for future restoration efforts.