917 results for Web Log Data


Relevance:

30.00%

Publisher:

Abstract:

Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web, and hence web users who rely on search engines alone are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters that a user provides via web search forms (or search interfaces) are not indexed by search engines and cannot be found in search results. Such search interfaces give web users online access to myriads of databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for automatic agents of any kind, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is a huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterizing the deep Web, finding and classifying deep web resources, and querying web databases.

Characterizing the deep Web: Although the term deep Web was coined in 2000, which is a long time ago for any web-related concept or technology, we still do not know many important characteristics of the deep Web. Another concern is that existing surveys of the deep Web are predominantly based on studies of deep web sites in English. One can therefore expect that their findings are biased, especially given the steady increase in non-English web content. Surveying national segments of the deep Web is thus of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from this national segment of the Web.

Finding deep web resources: The deep Web has been growing at a very fast pace, and it has been estimated that there are hundreds of thousands of deep web sites. Because of the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most approaches assume that the search interfaces to the web databases of interest have already been discovered and are known to the query systems. Such assumptions, however, rarely hold, mainly because of the large scale of the deep Web: for any given domain of interest there are simply too many web databases with relevant content. The ability to locate search interfaces to web databases therefore becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other approaches to the deep Web so far, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms.

Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user, all the more so because the interfaces of conventional search engines are themselves web forms. At present, a user must manually provide input values to search interfaces and then extract the required data from the result pages. Manually filling out forms is cumbersome and infeasible for complex queries, yet such queries are essential for many web searches, especially in e-commerce. Automating the querying and retrieval of data behind search interfaces is therefore desirable and essential for tasks such as building domain-independent deep web crawlers and automated web agents, searching for domain-specific information (vertical search engines), and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. In addition, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
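
As a rough illustration of the kind of task being automated, the sketch below submits a query to a single, hypothetical search form and extracts records from the dynamic result page. The URL, the 'q' parameter and the result markup are assumptions, and the snippet glosses over the difficulties (JavaScript-rich forms, multi-field interfaces, sessions) that the thesis addresses; it is not the I-Crawler or the form query language described above.

```python
import requests                   # assumes the 'requests' package is available
from bs4 import BeautifulSoup     # assumes 'beautifulsoup4' for HTML parsing

# Hypothetical search interface to a web database; real deep-web
# interfaces vary widely in field names, methods and result layout.
SEARCH_URL = "https://books.example.com/search"

def query_database(term: str) -> list[str]:
    # Submit the form roughly as a browser would (here: an HTTP GET with
    # the query term in the hypothetical 'q' parameter).
    page = requests.get(SEARCH_URL, params={"q": term}, timeout=30)
    page.raise_for_status()
    # Extract the structured records embedded in the dynamic result page.
    soup = BeautifulSoup(page.text, "html.parser")
    return [row.get_text(strip=True) for row in soup.select("div.result-title")]

for title in query_database("web crawling"):
    print(title)
```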

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate for assessing the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. RESULTS: We have developed a web-based platform that gives user-friendly access to two phylogeny-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model by generalizing the alphabet used in the Markov model, so that both nucleotide and amino acid data sets can now be analysed. The simulation of coevolving positions is novel and builds upon the developments of the Coev model: it allows users to simulate pairs of dependent nucleotide or amino acid positions. CONCLUSIONS: The main focus of our paper is the new simulation method for coevolving positions. The implementation of this method is embedded within the Coev-web platform, which is freely accessible at http://coev.vital-it.ch/ and has been tested in most modern web browsers.
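
For intuition only, here is a toy sketch of simulating a pair of dependent positions: a proposed mutation at either site is mostly rejected if it takes the pair outside a favoured profile, so the two sites do not evolve independently. This is not the Coev model or its phylogenetic simulation; the profile, acceptance probability and number of generations are arbitrary illustrative choices.

```python
import random

NUCS = "ACGT"
PROFILE = {("A", "T"), ("G", "C")}   # hypothetical favoured pair profile
KEEP_OFF_PROFILE = 0.2               # chance to accept a pair outside the profile

def simulate_pair(generations: int = 1000, seed: int = 42) -> tuple[str, str]:
    rng = random.Random(seed)
    pair = ("A", "T")
    for _ in range(generations):
        # Propose a mutation at one of the two positions.
        idx = rng.randrange(2)
        proposal = list(pair)
        proposal[idx] = rng.choice(NUCS)
        proposal = (proposal[0], proposal[1])
        # Dependent acceptance: off-profile pairs are mostly rejected.
        if proposal in PROFILE or rng.random() < KEEP_OFF_PROFILE:
            pair = proposal
    return pair

sample = [simulate_pair(seed=s) for s in range(20)]
print(sample)  # most simulated pairs end up inside the favoured profile
```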

Relevance:

30.00%

Publisher:

Abstract:

This thesis investigated good practices and established conventions for building a long-lived web application. It was found that maintenance accounts for most of the costs over an application's life cycle. Since the goal was an application with a long service life, the share of maintenance costs had to be kept as small as possible. Defects found as early as possible in the software development process are substantially cheaper to fix than defects found in the finished product. Therefore, the web application built in this work emphasised the early phases of the process: requirements specification and design. Good software development practices have a substantial effect on the maintainability and clarity of a web application. Using an existing application framework and adding functionality through ready-made software components yields an application built according to good practice. The web application implemented in this work was built using an application framework and a component architecture, and the resulting implementation is clear. The application was divided into logical units that handle the views, the database, and the mapping of data between them. Each unit works independently, which helps in maintaining and testing the application.
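
A minimal sketch of the layered split described above, with the view, the mapping/business logic and the database access kept in separate, independently testable units. The class names and in-memory storage are illustrative assumptions, not the framework or components used in the thesis.

```python
class BookRepository:
    """Database layer: only knows how to store and fetch raw records."""
    def __init__(self) -> None:
        self._rows: dict[int, str] = {}

    def save(self, book_id: int, title: str) -> None:
        self._rows[book_id] = title

    def find(self, book_id: int) -> str | None:
        return self._rows.get(book_id)


class BookService:
    """Mapping layer: combines data for the views and holds business rules."""
    def __init__(self, repo: BookRepository) -> None:
        self._repo = repo

    def register(self, book_id: int, title: str) -> None:
        if not title.strip():
            raise ValueError("title must not be empty")
        self._repo.save(book_id, title.strip())

    def display_name(self, book_id: int) -> str:
        title = self._repo.find(book_id)
        return title if title is not None else "<unknown>"


def render_book_view(service: BookService, book_id: int) -> str:
    """View layer: formats data for the user, with no database access."""
    return f"<h1>{service.display_name(book_id)}</h1>"


service = BookService(BookRepository())
service.register(1, " Clean Architecture ")
print(render_book_view(service, 1))
```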

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a prototype of an interactive web-GIS tool for risk analysis of natural hazards, in particular floods and landslides, based on open-source geospatial software and technologies. The aim of the tool is to assist experts (risk managers) in analysing the impacts and consequences of a given hazard event in a considered region, providing essential input to the decision-making process when responsible authorities and decision makers select risk management strategies. The tool is built on the Boundless (OpenGeo Suite) framework and its client-side environment for prototype development, and it is one of the main modules of a web-based collaborative decision support platform for risk management. Within this platform, users can import the maps and information needed to analyse areas at risk. Based on the provided information and parameters, loss scenarios (amount of damage and number of fatalities) for a hazard event are generated on the fly and visualized interactively within the web-GIS interface of the platform. The annualized risk is calculated by combining the resulting loss scenarios for different return periods of the hazard event. The application of the developed prototype is demonstrated using a regional data set from one of the case study sites of the Marie Curie ITN CHANGES project, the Fella River in northeastern Italy.
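
One common way to combine loss scenarios with different return periods into an annualized figure is to integrate loss over annual exceedance probability; the sketch below does this with made-up numbers. The abstract does not state that the platform uses exactly this formula, so treat it as an illustration of the general idea.

```python
import numpy as np

# Hypothetical loss scenarios: return period (years) and estimated loss.
# The numbers are illustrative only; in the platform they are generated
# on the fly from user-supplied maps and hazard parameters.
return_periods = np.array([30.0, 100.0, 300.0])
losses = np.array([2.0e6, 8.0e6, 2.5e7])

# Annual exceedance probability of each scenario.
aep = 1.0 / return_periods

# Integrate loss over exceedance probability (trapezoidal rule) to obtain
# an annualized (expected annual) loss.
order = np.argsort(aep)
p, loss = aep[order], losses[order]
annualized = np.sum(0.5 * (loss[1:] + loss[:-1]) * np.diff(p))
print(f"annualized loss ~ {annualized:,.0f} per year")
```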

Relevance:

30.00%

Publisher:

Abstract:

Occupational hygiene practitioners typically assess the risk posed by occupational exposure by comparing exposure measurements to regulatory occupational exposure limits (OELs). In most jurisdictions, OELs are only available for exposure by the inhalation pathway. Skin notations are used to indicate substances for which dermal exposure may lead to health effects. However, these notations are either present or absent and provide no indication of acceptable levels of exposure. Furthermore, the methodology and framework for assigning skin notations differ widely across jurisdictions, resulting in inconsistencies in the substances that carry notations. The UPERCUT tool was developed in response to these limitations. It helps occupational health stakeholders assess the hazard associated with dermal exposure to chemicals. UPERCUT integrates dermal quantitative structure-activity relationships (QSARs) and toxicological data to provide users with a skin hazard index, the dermal hazard ratio (DHR), for the substance and scenario of interest. The DHR is the ratio between the estimated 'received' dose and the 'acceptable' dose. The 'received' dose is estimated from physico-chemical data and information on the exposure scenario provided by the user (body parts exposed and exposure duration), and the 'acceptable' dose is estimated from inhalation OELs and toxicological data. The uncertainty surrounding the DHR is estimated with Monte Carlo simulation. Additional information on the selected substances includes the intrinsic skin permeation potential of the substance and the existence of skin notations. UPERCUT is the only available tool that estimates the absorbed dose and compares it to an acceptable dose. In the absence of dermal OELs, it provides a simple and systematic approach for screening dermal exposure scenarios for 1686 substances.
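
The ratio-plus-Monte-Carlo idea can be sketched as follows. All distributions, exposure values and the dose formula below are invented for illustration; they are not UPERCUT's actual QSAR estimates or OEL-derived acceptable doses.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of Monte Carlo draws

# Hypothetical inputs; in UPERCUT these come from dermal QSARs,
# physico-chemical data and the user-described exposure scenario.
permeation = rng.lognormal(mean=np.log(0.05), sigma=0.5, size=n)  # mg/cm2/h
exposed_area = 840.0     # cm2 of exposed skin (illustrative value)
duration_h = 4.0         # exposure duration in hours
received_dose = permeation * exposed_area * duration_h            # mg

# 'Acceptable' dose derived (illustratively) from an inhalation OEL,
# with some uncertainty of its own.
acceptable_dose = rng.normal(loc=150.0, scale=15.0, size=n)       # mg

dhr = received_dose / acceptable_dose  # dermal hazard ratio per draw
print(f"median DHR = {np.median(dhr):.2f}, "
      f"95th percentile = {np.percentile(dhr, 95):.2f}")
```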

Relevance:

30.00%

Publisher:

Abstract:

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvements in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation for these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. As a result, scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations, including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for the dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.

Relevance:

30.00%

Publisher:

Abstract:

The most suitable method for estimating size diversity is investigated. Size diversity is computed on the basis of the Shannon diversity expression adapted for continuous variables such as size; it takes the form of an integral involving the probability density function (pdf) of the size of the individuals. Different approaches for estimating the pdf are compared: parametric methods, which assume that the data come from a particular family of pdfs, and nonparametric methods, where the pdf is estimated using some kind of local evaluation. Exponential, generalized Pareto, normal and log-normal distributions have been used to generate simulated samples, using parameters estimated from real samples. Nonparametric methods include discrete computation of data histograms based on size intervals and continuous kernel estimation of the pdf. The kernel approach gives an accurate estimation of size diversity, whilst parametric methods are only useful when the reference distribution has a shape similar to the real one. Special attention is given to data standardization. Dividing the data by the sample geometric mean is proposed as the most suitable standardization method, which has additional advantages: the same size diversity value is obtained whether original or log-transformed sizes are used, and size measurements of different dimensionality (lengths, areas, volumes or biomasses) can be compared directly by simply adding ln k, where k is the dimensionality (1, 2 or 3, respectively). Thus, kernel estimation, after standardizing the data by dividing by the sample geometric mean, emerges as the most reliable and generalizable method for evaluating size diversity.
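
A minimal sketch of the recommended procedure: standardize by the sample geometric mean and then evaluate H = -∫ p(x) ln p(x) dx from a Gaussian kernel estimate of the pdf. The simulated sizes, the integration grid and the default kernel bandwidth are illustrative assumptions, not the exact settings used in the study.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
sizes = rng.lognormal(mean=1.0, sigma=0.8, size=500)  # illustrative size data

# Standardize by the sample geometric mean, as proposed above.
geo_mean = np.exp(np.mean(np.log(sizes)))
x = sizes / geo_mean

# Gaussian kernel estimate of the pdf, then numerical evaluation of
# H = -integral p(x) ln p(x) dx on a fine grid (trapezoidal rule).
kde = gaussian_kde(x)
grid = np.linspace(1e-3, x.max() * 3.0, 5000)
p = np.clip(kde(grid), 1e-300, None)           # avoid log(0)
integrand = p * np.log(p)
H = -np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))
print(f"size diversity H = {H:.3f}")
```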

Relevance:

30.00%

Publisher:

Abstract:

A visual SLAM system has been implemented and optimised for real-time deployment on an AUV equipped with calibrated stereo cameras. The system incorporates a novel approach to landmark description in which landmarks are local sub-maps consisting of a cloud of 3D points and their associated SIFT/SURF descriptors. Landmarks are also sparsely distributed, which simplifies and accelerates data association and map updates. In addition to landmark-based localisation, the system uses visual odometry to estimate the pose of the vehicle in 6 degrees of freedom by identifying temporal matches between consecutive local sub-maps and computing the motion. Both the extended Kalman filter and the unscented Kalman filter have been considered for filtering the observations. The output of the filter is also smoothed using the Rauch-Tung-Striebel (RTS) method to obtain a better alignment of the sequence of local sub-maps and to deliver a large-scale 3D acquisition of the surveyed area. Synthetic experiments have been performed in a simulation environment in which ray tracing is used to generate synthetic images for the stereo system.
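
As a pared-down illustration of the filter-then-smooth step, the sketch below runs a Kalman filter on a 1-D constant-velocity model and then applies the Rauch-Tung-Striebel backward pass. It is not the 6-degree-of-freedom AUV system; the model, noise covariances and measurements are synthetic assumptions chosen only to show the mechanics of RTS smoothing.

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (position, velocity)
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 0.01 * np.eye(2)                    # assumed process noise covariance
R = np.array([[0.5]])                   # assumed measurement noise covariance

rng = np.random.default_rng(2)
truth = np.cumsum(np.full(50, 0.8))     # true positions, constant velocity 0.8
zs = truth + rng.normal(0.0, np.sqrt(R[0, 0]), size=truth.size)

# Forward Kalman filter, storing predicted and filtered estimates.
x, P = np.zeros(2), np.eye(2)
xp, Pp, xf, Pf = [], [], [], []
for z in zs:
    x_pred, P_pred = F @ x, F @ P @ F.T + Q               # predict
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)                   # Kalman gain
    x = x_pred + K @ (np.array([z]) - H @ x_pred)         # update
    P = (np.eye(2) - K @ H) @ P_pred
    xp.append(x_pred); Pp.append(P_pred); xf.append(x); Pf.append(P)

# Backward RTS pass: refine each filtered estimate using future information.
xs, Ps = xf[-1].copy(), Pf[-1].copy()
smoothed = [xs]
for k in range(len(zs) - 2, -1, -1):
    C = Pf[k] @ F.T @ np.linalg.inv(Pp[k + 1])
    xs = xf[k] + C @ (xs - xp[k + 1])
    Ps = Pf[k] + C @ (Ps - Pp[k + 1]) @ C.T
    smoothed.append(xs)
smoothed.reverse()
print("last smoothed position:", smoothed[-1][0])
```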

Relevance:

30.00%

Publisher:

Abstract:

Information systems integration is nowadays an important area in companies' operations and in maintaining their competitiveness. Service-oriented architecture and Web services are a new, flexible way to integrate information systems. One of the core components of Web services is UDDI, Universal Description, Discovery and Integration. UDDI acts as a service registry: it defines a way to publish, discover and take Web services into use. Web services can be searched for in UDDI using various criteria, such as the location of the service, the company name and the industry. UDDI is itself a Web service, based on XML and the SOAP protocol. This thesis examines UDDI in more detail, including its technical aspects. In the view of publishers and users, an essential shortcoming of UDDI has been the lack of security, which has considerably limited its use and adoption. The thesis therefore looks more closely at security-related issues and solutions, as well as the significance of UDDI for companies.
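
As a schematic illustration of UDDI being itself a SOAP/XML Web service, the sketch below posts a find_business inquiry for companies whose name starts with a given string. The registry URL is hypothetical, and the exact namespaces, headers and message details depend on the UDDI version supported by the registry, so this should be read as a sketch rather than a ready-to-use client.

```python
import requests  # assumes the 'requests' package is available

# Hypothetical UDDI inquiry endpoint; real registries publish their own URLs.
INQUIRY_URL = "https://uddi.example.com/inquiry"

soap_body = """<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <find_business generic="2.0" xmlns="urn:uddi-org:api_v2" maxRows="10">
      <name>Acme</name>
    </find_business>
  </Body>
</Envelope>"""

response = requests.post(
    INQUIRY_URL,
    data=soap_body.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
    timeout=30,
)
print(response.status_code)
print(response.text[:500])  # expected: a businessList with matching entries
```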

Relevance:

30.00%

Publisher:

Abstract:

Purpose - This article describes the use of web services to interconnect the GTBib interlibrary loan program with the OCLC WorldShare platform. Design/methodology/approach - We describe the current problem of duplicated procedures in libraries that have added their collections to the OCLC WorldCat catalogue in recent years and are therefore more likely to receive interlibrary loan requests through the WorldShare platform. Findings - A solution that uses web services to insert and retrieve requests between the two systems is presented. Autonomous agents periodically check the status of the requests and keep them updated and synchronized. These agents also inform the library staff of any variation or inconsistency that is detected. Practical implications - This technology reduces process management time by making it unnecessary to enter the request data in both systems. Agents are used to check the consistency of statuses between the two systems, thus avoiding errors and omissions and improving the efficiency of the whole interlibrary loan process. Originality/value - This paper describes in detail the technical aspects of the solution as a reference for the development of future applications.
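
The polling-agent idea can be sketched as below. The REST-style endpoints, JSON fields and ten-minute interval are invented for illustration and do not correspond to the actual GTBib or WorldShare web service interfaces described in the article.

```python
import time
import requests  # assumes the 'requests' package; endpoints below are hypothetical

GTBIB_API = "https://gtbib.example.org/api/requests"            # hypothetical
WORLDSHARE_API = "https://worldshare.example.org/ill/requests"  # hypothetical

def fetch_statuses(url: str) -> dict[str, str]:
    """Return {request_id: status} from one system."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return {item["id"]: item["status"] for item in resp.json()}

def sync_once() -> None:
    """One polling cycle: detect and reconcile status differences."""
    local = fetch_statuses(GTBIB_API)
    remote = fetch_statuses(WORLDSHARE_API)
    for req_id, remote_status in remote.items():
        if local.get(req_id) != remote_status:
            # Push the newer status to the local system and log the change
            # so library staff can review any inconsistency.
            requests.patch(f"{GTBIB_API}/{req_id}",
                           json={"status": remote_status}, timeout=30)
            print(f"request {req_id}: updated to '{remote_status}'")

if __name__ == "__main__":
    while True:          # the agent runs periodically in the background
        sync_once()
        time.sleep(600)  # poll every 10 minutes (illustrative interval)
```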

Relevance:

30.00%

Publisher:

Abstract:

Development of methods for exploring data from educational settings, in order to better understand the learning process.

Relevance:

30.00%

Publisher:

Abstract:

Owing to concerns about globalisation and sustainable development, corporate social responsibility (CSR) is topical in the business context and in the field of accounting. The main objective of this study was to review previous academic literature in the field of CSR reporting and to develop an insight into CSR reporting in the Web-based environment. The main purpose was to find out what Web-based CSR reporting is like and how companies are using the Internet to communicate on responsibility issues. I did not, however, collect empirical research data, but limited my study to a theoretical and descriptive examination. In order to create an insight into Web-based reporting, I examined the development, motives and current practices of CSR reporting. I concluded that the Internet is a unique, interactive communication channel that is used differently from annual reports. The number of companies engaging in Web-based CSR reporting is increasing, and reporting practices vary in terms of, for example, the content and accessibility of information. I also concluded that many companies have not yet discovered the true potential of the Web as an interactive communication medium.

Relevance:

30.00%

Publisher:

Abstract:

Web application performance testing is an emerging and important field of software engineering. As web applications become more commonplace and complex, the need for performance testing will only increase. This paper discusses common concepts, practices and tools that lie at the heart of web application performance testing. A pragmatic, hands-on approach is assumed where applicable; real-life examples of test tooling, execution and analysis are presented right next to the underpinning theory. At the client-side, web application performance is primarily driven by the amount of data transmitted over the wire. At the server-side, selection of programming language and platform, implementation complexity and configuration are the primary contributors to web application performance. Web application performance testing is an activity that requires delicate coordination between project stakeholders, developers, system administrators and testers in order to produce reliable and useful results. Proper test definition, execution, reporting and repeatable test results are of utmost importance. Open-source performance analysis tools such as Apache JMeter, Firebug and YSlow can be used to realise effective web application performance tests. A sample case study using these tools is presented in this paper. The sample application was found to perform poorly even under the moderate load incurred by the sample tests.
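
As a small hands-on example of client-side performance measurement, the sketch below issues concurrent requests against a hypothetical URL and reports response-time percentiles using only the Python standard library. A real test plan in a tool such as Apache JMeter would add ramp-up behaviour, assertions and much richer reporting; this is only a minimal illustration of the measurement idea.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://app.example.com/"   # hypothetical application under test
CONCURRENCY, REQUESTS = 10, 100    # illustrative load parameters

def timed_get(_: int) -> float:
    """Fetch the page once and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    durations = sorted(pool.map(timed_get, range(REQUESTS)))

print(f"median:   {statistics.median(durations) * 1000:.0f} ms")
print(f"95th pct: {durations[int(0.95 * len(durations))] * 1000:.0f} ms")
```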