998 resultados para Orb-web spiders
Resumo:
Understanding the nature of the workloads and system demands created by users of the World Wide Web is crucial to properly designing and provisioning Web services. Previous measurements of Web client workloads have been shown to exhibit a number of characteristic features; however, it is not clear how those features may be changing with time. In this study we compare two measurements of Web client workloads separated in time by three years, both captured from the same computing facility at Boston University. The older dataset, obtained in 1995, is well-known in the research literature and has been the basis for a wide variety of studies. The newer dataset was captured in 1998 and is comparable in size to the older dataset. The new dataset has the drawback that the collection of users measured may no longer be representative of general Web users; however using it has the advantage that many comparisons can be drawn more clearly than would be possible using a new, different source of measurement. Our results fall into two categories. First we compare the statistical and distributional properties of Web requests across the two datasets. This serves to reinforce and deepen our understanding of the characteristic statistical properties of Web client requests. We find that the kinds of distributions that best describe document sizes have not changed between 1995 and 1998, although specific values of the distributional parameters are different. Second, we explore the question of how the observed differences in the properties of Web client requests, particularly the popularity and temporal locality properties, affect the potential for Web file caching in the network. We find that for the computing facility represented by our traces between 1995 and 1998, (1) the benefits of using size-based caching policies have diminished; and (2) the potential for caching requested files in the network has declined.
Resumo:
In this paper, we propose and evaluate an implementation of a prototype scalable web server. The prototype consists of a load-balanced cluster of hosts that collectively accept and service TCP connections. The host IP addresses are advertised using the Round Robin DNS technique, allowing any host to receive requests from any client. Once a client attempts to establish a TCP connection with one of the hosts, a decision is made as to whether or not the connection should be redirected to a different host---namely, the host with the lowest number of established connections. We use the low-overhead Distributed Packet Rewriting (DPR) technique to redirect TCP connections. In our prototype, each host keeps information about connections in hash tables and linked lists. Every time a packet arrives, it is examined to see if it has to be redirected or not. Load information is maintained using periodic broadcasts amongst the cluster hosts.
Resumo:
Under high loads, a Web server may be servicing many hundreds of connections concurrently. In traditional Web servers, the question of the order in which concurrent connections are serviced has been left to the operating system. In this paper we ask whether servers might provide better service by using non-traditional service ordering. In particular, for the case when a Web server is serving static files, we examine the costs and benefits of a policy that gives preferential service to short connections. We start by assessing the scheduling behavior of a commonly used server (Apache running on Linux) with respect to connection size and show that it does not appear to provide preferential service to short connections. We then examine the potential performance improvements of a policy that does favor short connections (shortest-connection-first). We show that mean response time can be improved by factors of four or five under shortest-connection-first, as compared to an (Apache-like) size-independent policy. Finally we assess the costs of shortest-connection-first scheduling in terms of unfairness (i.e., the degree to which long connections suffer). We show that under shortest-connection-first scheduling, long connections pay very little penalty. This surprising result can be understood as a consequence of heavy-tailed Web server workloads, in which most connections are small, but most server load is due to the few large connections. We support this explanation using analysis.
Resumo:
One of the most vexing questions facing researchers interested in the World Wide Web is why users often experience long delays in document retrieval. The Internet's size, complexity, and continued growth make this a difficult question to answer. We describe the Wide Area Web Measurement project (WAWM) which uses an infrastructure distributed across the Internet to study Web performance. The infrastructure enables simultaneous measurements of Web client performance, network performance and Web server performance. The infrastructure uses a Web traffic generator to create representative workloads on servers, and both active and passive tools to measure performance characteristics. Initial results based on a prototype installation of the infrastructure are presented in this paper.
Resumo:
Web caching aims to reduce network traffic, server load, and user-perceived retrieval delays by replicating "popular" content on proxy caches that are strategically placed within the network. While key to effective cache utilization, popularity information (e.g. relative access frequencies of objects requested through a proxy) is seldom incorporated directly in cache replacement algorithms. Rather, other properties of the request stream (e.g. temporal locality and content size), which are easier to capture in an on-line fashion, are used to indirectly infer popularity information, and hence drive cache replacement policies. Recent studies suggest that the correlation between these secondary properties and popularity is weakening due in part to the prevalence of efficient client and proxy caches (which tend to mask these correlations). This trend points to the need for proxy cache replacement algorithms that directly capture and use popularity information. In this paper, we (1) present an on-line algorithm that effectively captures and maintains an accurate popularity profile of Web objects requested through a caching proxy, (2) propose a novel cache replacement policy that uses such information to generalize the well-known GreedyDual-Size algorithm, and (3) show the superiority of our proposed algorithm by comparing it to a host of recently-proposed and widely-used algorithms using extensive trace-driven simulations and a variety of performance metrics.
Resumo:
Temporal locality of reference in Web request streams emerges from two distinct phenomena: the popularity of Web objects and the {\em temporal correlation} of requests. Capturing these two elements of temporal locality is important because it enables cache replacement policies to adjust how they capitalize on temporal locality based on the relative prevalence of these phenomena. In this paper, we show that temporal locality metrics proposed in the literature are unable to delineate between these two sources of temporal locality. In particular, we show that the commonly-used distribution of reference interarrival times is predominantly determined by the power law governing the popularity of documents in a request stream. To capture (and more importantly quantify) both sources of temporal locality in a request stream, we propose a new and robust metric that enables accurate delineation between locality due to popularity and that due to temporal correlation. Using this metric, we characterize the locality of reference in a number of representative proxy cache traces. Our findings show that there are measurable differences between the degrees (and sources) of temporal locality across these traces, and that these differences are effectively captured using our proposed metric. We illustrate the significance of our findings by summarizing the performance of a novel Web cache replacement policy---called GreedyDual*---which exploits both long-term popularity and short-term temporal correlation in an adaptive fashion. Our trace-driven simulation experiments (which are detailed in an accompanying Technical Report) show the superior performance of GreedyDual* when compared to other Web cache replacement policies.
Resumo:
The relative importance of long-term popularity and short-term temporal correlation of references for Web cache replacement policies has not been studied thoroughly. This is partially due to the lack of accurate characterization of temporal locality that enables the identification of the relative strengths of these two sources of temporal locality in a reference stream. In [21], we have proposed such a metric and have shown that Web reference streams differ significantly in the prevalence of these two sources of temporal locality. These finding underscore the importance of a Web caching strategy that can adapt in a dynamic fashion to the prevalence of these two sources of temporal locality. In this paper, we propose a novel cache replacement algorithm, GreedyDual*, which is a generalization of GreedyDual-Size. GreedyDual* uses the metrics proposed in [21] to adjust the relative worth of long-term popularity versus short-term temporal correlation of references. Our trace-driven simulation experiments show the superior performance of GreedyDual* when compared to other Web cache replacement policies proposed in the literature.
Resumo:
Much work on the performance of Web proxy caching has focused on high-level metrics such as hit rate and byte hit rate, but has ignored all the information related to the cachability of Web objects. Uncachable objects include those fetched by dynamic requests, objects with uncachable HTTP status code, objects with the uncachable HTTP header, objects with an HTTP 1.0 cookie, and objects without a last-modified header. Although some researchers filter the Web traces before they use them for analysis or simulation,many do not have a comprehensive understanding of the cachability of Web objects. In this paper we evaluate all the reasons that a Web object might be uncachable. We use traces from NLANR. Since these traces do not contain HTTP header information, we replay them using request generator to get the response header information. We find that between 15% and 40% of Web objects in our traces can not be cached by a Web proxy server. We use a LRU simulator to show the performance gap when the cachability is either considered or not. We show the characteristics of the cachable data set and find that all its characteristics are fairly similar to that of total data set. Finally, we present some additional results for the cachable and total data set: (1) The main reasons for uncachability are: dynamic requests, responses without last-modified header, responses with HTTP "302 Moved Temporarily" status code, and responses with a HTTP/1.0 cookie. (2) The cachability of Web objects can not be ignored in simulation because uncachable objects comprise a huge percentage of the total trace. Simulations without cachability consideration will be misleading.
Resumo:
This paper presents the design and implementation of an infrastructure that enables any Web application, regardless of its current state, to be stopped and uninstalled from a particular server, transferred to a new server, then installed, loaded, and resumed, with all these events occurring "on the fly" and totally transparent to clients. Such functionalities allow entire applications to fluidly move from server to server, reducing the overhead required to administer the system, and increasing its performance in a number of ways: (1) Dynamic replication of new instances of applications to several servers to raise throughput for scalability purposes, (2) Moving applications to servers to achieve load balancing or other resource management goals, (3) Caching entire applications on servers located closer to clients.
Resumo:
We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number of all text characters on a web page. K-means clustering is used to create unique thresholds to differentiate index pages and article pages on individual web sites. Index pages contain mostly links to articles and other indices, while article pages contain mostly text. We also present a novel link grouping algorithm using agglomerative hierarchical clustering that groups links in the same spatial neighborhood together while preserving link structure. Grouping allows users with severe disabilities to use a scan-based mechanism to tab through a web page and select items. In experiments, we saw up to a 40-fold reduction in the number of commands needed to click on a link with a scan-based interface, which shows that we can vastly improve the rate of communication for users with disabilities. We used web page classification and link grouping to alter web page display on an accessible web browser that we developed to make a usable browsing interface for users with disabilities. Our classification method consistently outperformed a baseline classifier even when using minimal data to generate article and index clusters, and achieved classification accuracy of 94.0% on web sites with well-formed or slightly malformed HTML, compared with 80.1% accuracy for the baseline classifier.
Resumo:
Global biodiversity is eroding at an alarming rate, through a combination of anthropogenic disturbance and environmental change. Ecological communities are bewildering in their complexity. Experimental ecologists strive to understand the mechanisms that drive the stability and structure of these complex communities in a bid to inform nature conservation and management. Two fields of research have had high profile success at developing theories related to these stabilising structures and testing them through controlled experimentation. Biodiversity-ecosystem functioning (BEF) research has explored the likely consequences of biodiversity loss on the functioning of natural systems and the provision of important ecosystem services. Empirical tests of BEF theory often consist of simplified laboratory and field experiments, carried out on subsets of ecological communities. Such experiments often overlook key information relating to patterns of interactions, important relationships, and fundamental ecosystem properties. The study of multi-species predator-prey interactions has also contributed much to our understanding of how complex systems are structured, particularly through the importance of indirect effects and predator suppression of prey populations. A growing number of studies describe these complex interactions in detailed food webs, which encompass all the interactions in a community. This has led to recent calls for an integration of BEF research with the comprehensive study of food web properties and patterns, to help elucidate the mechanisms that allow complex communities to persist in nature. This thesis adopts such an approach, through experimentation at Lough Hyne marine reserve, in southwest Ireland. Complex communities were allowed to develop naturally in exclusion cages, with only the diversity of top trophic levels controlled. Species removals were carried out and the resulting changes to predator-prey interactions, ecosystem functioning, food web properties, and stability were studied in detail. The findings of these experiments contribute greatly to our understanding of the stability and structure of complex natural communities.
Resumo:
Understanding how dynamic ecological communities respond to anthropogenic drivers of change such as habitat loss and fragmentation, climate change and the introduction of alien species requires that there is a theoretical framework able to predict community dynamics. At present there is a lack of empirical data that can be used to inform and test predictive models, which means that much of our knowledge regarding the response of ecological communities to perturbations is obtained from theoretical analyses and simulations. This thesis is composed of two strands of research: an empirical experiment conducted to inform the scaling of intraspecific and interspecific interaction strengths in a three species food chain and a series of theoretical analyses on the changes to equilibrium biomass abundances following press perturbations. The empirical experiment is a consequence of the difficulties faced when parameterising the intraspecific interaction strengths in a Lotka-Volterra model. A modification of the dynamic index is used alongside the original dynamic index to estimate intraspecific interactions and interspecific interaction strengths in a three species food. The theoretical analyses focused on the effect of press perturbations to focal species on the equilibrium biomass densities of all species in the community; these perturbations allow for the quantification of a species total net effect. It was found that there is a strong and consistent positive relationship between a species body size and its total net effect for a set of 97 synthetic food webs and also for the Ythan Estuary and Tuesday Lake food webs (empirically described food webs). It is shown that ecological constraints (due to allometric scaling) on the magnitude of entries in the community matrix cause the patterns observed in the inverse community matrix and thus explain the relationship between a species body mass and its total net effect in a community.
Resumo:
OBJECTIVE: The Veterans Health Administration has developed My HealtheVet (MHV), a Web-based portal that links veterans to their care in the veteran affairs (VA) system. The objective of this study was to measure diabetic veterans' access to and use of the Internet, and their interest in using MHV to help manage their diabetes. MATERIALS AND METHODS: Cross-sectional mailed survey of 201 patients with type 2 diabetes and hemoglobin A(1c) > 8.0% receiving primary care at any of five primary care clinic sites affiliated with a VA tertiary care facility. Main measures included Internet usage, access, and attitudes; computer skills; interest in using the Internet; awareness of and attitudes toward MHV; demographics; and socioeconomic status. RESULTS: A majority of respondents reported having access to the Internet at home. Nearly half of all respondents had searched online for information about diabetes, including some who did not have home Internet access. More than a third obtained "some" or "a lot" of their health-related information online. Forty-one percent reported being "very interested" in using MHV to help track their home blood glucose readings, a third of whom did not have home Internet access. Factors associated with being "very interested" were as follows: having access to the Internet at home (p < 0.001), "a lot/some" trust in the Internet as a source of health information (p = 0.002), lower age (p = 0.03), and some college (p = 0.04). Neither race (p = 0.44) nor income (p = 0.25) was significantly associated with interest in MHV. CONCLUSIONS: This study found that a diverse sample of older VA patients with sub-optimally controlled diabetes had a level of familiarity with and access to the Internet comparable to an age-matched national sample. In addition, there was a high degree of interest in using the Internet to help manage their diabetes.
Resumo:
BACKGROUND: Web-based decision aids are increasingly important in medical research and clinical care. However, few have been studied in an intensive care unit setting. The objectives of this study were to develop a Web-based decision aid for family members of patients receiving prolonged mechanical ventilation and to evaluate its usability and acceptability. METHODS: Using an iterative process involving 48 critical illness survivors, family surrogate decision makers, and intensivists, we developed a Web-based decision aid addressing goals of care preferences for surrogate decision makers of patients with prolonged mechanical ventilation that could be either administered by study staff or completed independently by family members (Development Phase). After piloting the decision aid among 13 surrogate decision makers and seven intensivists, we assessed the decision aid's usability in the Evaluation Phase among a cohort of 30 surrogate decision makers using the Systems Usability Scale (SUS). Acceptability was assessed using measures of satisfaction and preference for electronic Collaborative Decision Support (eCODES) versus the original printed decision aid. RESULTS: The final decision aid, termed 'electronic Collaborative Decision Support', provides a framework for shared decision making, elicits relevant values and preferences, incorporates clinical data to personalize prognostic estimates generated from the ProVent prediction model, generates a printable document summarizing the user's interaction with the decision aid, and can digitally archive each user session. Usability was excellent (mean SUS, 80 ± 10) overall, but lower among those 56 years and older (73 ± 7) versus those who were younger (84 ± 9); p = 0.03. A total of 93% of users reported a preference for electronic versus printed versions. CONCLUSIONS: The Web-based decision aid for ICU surrogate decision makers can facilitate highly individualized information sharing with excellent usability and acceptability. Decision aids that employ an electronic format such as eCODES represent a strategy that could enhance patient-clinician collaboration and decision making quality in intensive care.
Resumo:
This study evaluated the effect of an online diet-tracking tool on college students’ self-efficacy regarding fruit and vegetable intake. A convenience sample of students completed online self-efficacy surveys before and after a six-week intervention in which they tracked dietary intake with an online tool. Group one (n=22 fall, n=43 spring) accessed a tracking tool without nutrition tips; group two (n=20 fall, n=33 spring) accessed the tool and weekly nutrition tips. The control group (n=36 fall, n=60 spring) had access to neither. Each semester there were significant changes in self-efficacy from pre- to post-test for men and for women when experimental groups were combined (p<0.05 for all); however, these changes were inconsistent. Qualitative data showed that participants responded well to the simplicity of the tool, the immediacy of feedback, and the customized database containing foods available on campus. Future models should improve user engagement by increasing convenience, potentially by automation.