Digital Library

35 results for Web data

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast

Web Data Extraction from Query Result Pages Based on Visual and Content Features

Relevance:

100.00%

Abstract:

A rapidly increasing number of Web databases have now become accessible via their HTML form-based query interfaces. Query result pages are dynamically generated in response to user queries; they encode structured data and are displayed for human use. Query result pages usually contain other types of information in addition to query results, e.g., advertisements and navigation bars. The problem of extracting structured data from query result pages is critical for web data integration applications, such as comparison shopping and meta-search engines, and has been intensively studied. A number of approaches have been proposed. As the structures of Web pages become more and more complex, the existing approaches start to fail, and most of them do not remove irrelevant contents, which may affect the accuracy of data record extraction. We propose an automated approach for Web data extraction. First, it makes use of visual features and query terms to identify data sections and extracts data records in these sections. We also represent several content and visual features of visual blocks in a data section, and use them to filter out noisy blocks. Second, it measures similarity between data items in different data records based on their visual and content features, and aligns them into different groups so that the data in the same group have the same semantics. The results of our experiments with a large set of Web query result pages in different domains show that our proposed approaches are highly effective.

Automatically Annotating Structured Web Data Using a SVM-Based Multiclass Classifier

Relevance:

100.00%

Abstract:

In this paper, we propose a new learning approach to Web data annotation, where a support vector machine-based multiclass classifier is trained to assign labels to data items. For data record extraction, a data section re-segmentation algorithm based on visual and content features is introduced to improve the performance of Web data record extraction. We have implemented the proposed approach and tested it with a large set of Web query result pages in different domains. Our experimental results show that our proposed approach is highly effective and efficient.

Manipulating Interaction Strengths and the Consequences for Trivariate Patterns in a Marine Food Web

Relevance:

70.00%

Abstract:

We are experiencing a global extinction crisis as a result of climate change and human-induced alteration of natural habitats, with large predators at high trophic levels in food webs being particularly vulnerable. Unfortunately, there is a scarcity of food web data that can be used to assess how species extinctions alter the structure and stability of temporally and spatially replicated networks. We established a series of large experimental mesocosms in a shallow subtidal benthic marine system and constructed food webs for each replicate. After 6 months of community assembly, we removed large predators from the core communities of 20 experimental food webs, based on the strength of their trophic interactions, and monitored the changes in the networks' structure and stability over an 8-month period. Our analyses revealed the importance of allometric relationships and size-structuring in natural communities as a means of preserving food web structure and sustainability, despite significant changes in the diversity, stability and productivity of the system.

Macroecological patterns and niche structure in a new marine food web

Relevance:

70.00%

Abstract:

The integration of detailed information on feeding interactions with measures of abundance and body mass of individuals provides a powerful platform for understanding ecosystem organisation. Metabolism and, by proxy, body mass constrain the flux, turnover and storage of energy and biomass in food webs. Here, we present the first food web data for Lough Hyne, a species-rich Irish Sea Lough. Through the application of individual- and size-based analysis of the abundance-body mass relationship, we tested predictions derived from the metabolic theory of ecology. We found that individual body mass constrained the flux of biomass and determined its distribution within the food web. Body mass was also an important determinant of diet width and niche overlap, and predator diets were nested hierarchically, such that diet width increased with body mass. We applied a novel measure of predator-prey biomass flux which revealed that most interactions in Lough Hyne were weak, whereas only a few were strong. Further, the patterning of interaction strength between prey sharing a common predator revealed that strong interactions were nearly always coupled with weak interactions. Our findings illustrate that important insights into the organisation, structure and stability of ecosystems can be achieved through the theoretical exploration of detailed empirical data.

Use of emerging technologies to assess differences in outdoor physical activity in St. Louis, Missouri.

Relevance:

60.00%

Abstract:

Introduction: Abundant evidence shows that regular physical activity (PA) is an effective strategy for preventing obesity in people of diverse socioeconomic status (SES) and racial groups. The proportion of PA performed in parks and how this differs by proximate neighborhood SES has not been thoroughly investigated. The present project analyzes online public web data feeds to assess differences in outdoor PA by neighborhood SES in St. Louis, MO, USA.
Methods: First, running and walking routes submitted by users of the website MapMyRun.com were downloaded. The website enables participants to plan, map, record, and share their exercise routes and outdoor activities like runs, walks, and hikes in an online database. Next, the routes were visually illustrated using geographic information systems. Thereafter, using park data and 2010 Missouri census poverty data, the odds of running and walking routes traversing a low-SES neighborhood, and traversing a park in a low-SES neighborhood were examined in comparison to the odds of routes traversing higher-SES neighborhoods and higher-SES parks.
Results: Results show that a majority of running and walking routes occur in or at least traverse through a park. However, this finding does not hold when comparing low-SES neighborhoods to higher-SES neighborhoods in St. Louis. The odds of running in a park in a low-SES neighborhood were 54% lower than running in a park in a higher-SES neighborhood (OR = 0.46, CI = 0.17-1.23). The odds of walking in a park in a low-SES neighborhood were 17% lower than walking in a park in a higher-SES neighborhood (OR = 0.83, CI = 0.26-2.61).
Conclusion: The novel methods of this study include the use of inexpensive, unobtrusive, and publicly available web data feeds to examine PA in parks and differences by neighborhood SES. Emerging technologies like MapMyRun.com present significant advantages to enhance tracking of user-defined PA across large geographic and temporal settings.
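The "54% lower" and "17% lower" figures in the Results follow directly from the reported odds ratios. The short sketch below reproduces that arithmetic and checks whether each reported interval spans 1. The function names are illustrative and not taken from the study, and the abstract does not state the confidence level of the intervals, so none is assumed here.

```python
def percent_lower_odds(odds_ratio: float) -> float:
    """Percent reduction in odds implied by an odds ratio (positive when OR < 1)."""
    return (1.0 - odds_ratio) * 100.0


def interval_spans_one(lower: float, upper: float) -> bool:
    """True when the interval contains 1, i.e. it is compatible with equal odds."""
    return lower <= 1.0 <= upper


# Values as reported in the abstract: (OR, CI lower bound, CI upper bound).
reported = {
    "running": (0.46, 0.17, 1.23),
    "walking": (0.83, 0.26, 2.61),
}

for activity, (odds_ratio, lower, upper) in reported.items():
    reduction = round(percent_lower_odds(odds_ratio))
    spans = interval_spans_one(lower, upper)
    print(f"{activity}: {reduction}% lower odds; interval spans 1: {spans}")
```

Both reported intervals include 1, so the point estimates are best read alongside their intervals rather than on their own.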

Visually extracting data records from the deep web

Relevance:

40.00%

Abstract:

Web sites that rely on databases for their content are now ubiquitous. Query result pages are dynamically generated from these databases in response to user-submitted queries. Automatically extracting structured data from query result pages is a challenging problem, as the structure of the data is not explicitly represented. While humans have shown good intuition in visually understanding data records on a query result page as displayed by a web browser, no existing approach to data record extraction has made full use of this intuition. We propose a novel approach, in which we make use of the common sources of evidence that humans use to understand data records on a displayed query result page. These include structural regularity, and visual and content similarity between data records displayed on a query result page. Based on these observations we propose new techniques that can identify each data record individually, while ignoring noise items, such as navigation bars and adverts. We have implemented these techniques in a software prototype, rExtractor, and tested it using two datasets. Our experimental results show that our approach achieves significantly higher accuracy than previous approaches. Furthermore, it establishes the case for use of vision-based algorithms in the context of data extraction from web sites.

Evaluation of information extraction techniques to label extracted data from e-commerce web page

Relevance:

40.00%

Abstract:

Automatically determining and assigning shared and meaningful text labels to data extracted from an e-Commerce web page is a challenging problem. An e-Commerce web page can display a list of data records, each of which can contain a combination of data items (e.g. product name and price) and explicit labels, which describe some of these data items. Recent advances in extraction techniques have made it much easier to precisely extract individual data items and labels from a web page; however, there are two open problems: 1. assigning an explicit label to a data item, and 2. determining labels for the remaining data items. Furthermore, improvements in the availability and coverage of vocabularies, especially in the context of e-Commerce web sites, mean that we now have access to a bank of relevant, meaningful and shared labels which can be assigned to extracted data items. However, there is a need for a technique which will take as input a set of extracted data items and automatically assign to them the most relevant and meaningful labels from a shared vocabulary. We observe that the Information Extraction (IE) community has developed a great number of techniques which solve problems similar to our own. In this work-in-progress paper we propose to theoretically and experimentally evaluate different IE techniques to ascertain which is most suitable for solving this problem.

Birthweight and the risk of childhood-onset type 1 diabetes: a meta-analysis of observational studies using individual patient data

Relevance:

30.00%

Abstract:

Aims/hypothesis: We investigated whether children who are heavier at birth have an increased risk of type 1 diabetes. Methods: Relevant studies published before February 2009 were identified from literature searches using MEDLINE, Web of Science and EMBASE. Authors of all studies containing relevant data were contacted and asked to provide individual patient data or conduct pre-specified analyses. Risk estimates of type 1 diabetes by category of birthweight were calculated for each study, before and after adjustment for potential confounders. Meta-analysis techniques were then used to derive combined ORs and investigate heterogeneity between studies. Results: Data were available for 29 predominantly European studies (five cohort, 24 case-control studies), including 12,807 cases of type 1 diabetes. Overall, studies consistently demonstrated that children with birthweight from 3.5 to 4 kg had an increased risk of diabetes of 6% (OR 1.06 [95% CI 1.01-1.11]; p=0.02) and children with birthweight over 4 kg had an increased risk of 10% (OR 1.10 [95% CI 1.04-1.19]; p=0.003), compared with children weighing 3.0 to 3.5 kg at birth. This corresponded to a linear increase in diabetes risk of 3% per 500 g increase in birthweight (OR 1.03 [95% CI 1.00-1.06]; p=0.03). Adjustments for potential confounders such as gestational age, maternal age, birth order, Caesarean section, breastfeeding and maternal diabetes had little effect on these findings. Conclusions/interpretation: Children who are heavier at birth have a significant and consistent, but relatively small increase in risk of type 1 diabetes. © 2010 Springer-Verlag.

Relocation, reproduction and remaining alive in the orb-web spider

Relevance:

30.00%

Abstract:

When mortality is high, animals run a risk if they wait to accumulate resources for improved reproduction so they may trade-off the time of reproduction with number and size of offspring. Animals may attempt to improve food acquisition by relocation, even in 'sit and wait' predators. We examine these factors in an isolated population of an orb-web spider Zygiella x-notata. The population was monitored for 200 days from first egg laying until all adults had died. Large females produced their first clutch earlier than did small females and there was a positive correlation between female size and the number and size of eggs produced. Many females, presumably without eggs, abandoned their web site and relocated their web position. This is presumed because female Zygiella typically guard their eggs. In total, c. 25% of females reproduced but those that relocated were less likely to do so, and if they did, they produced the clutch at a later date than those that remained. When the date of lay was controlled there was no effect of relocation on egg number but relocated females produced smaller eggs. The data are consistent with the idea that females in resource-poor sites are more likely to relocate. Relocation seems to be a gamble to find a more productive site but one that achieves only a late clutch of small eggs and few achieve that.

Predatory fish loss affects the structure and functioning of a model marine food web

Relevance:

30.00%

Abstract:

The rate of species loss is increasing on a global scale and predators are most at risk from human-induced extinction. The effects of losing predators are difficult to predict, even with experimental single species removals, because different combinations of species interact in unpredictable ways. We tested the effects of the loss of groups of common predators on herbivore and algal assemblages in a model benthic marine system. The predator groups were fish, shrimp and crabs. Each group was represented by at least two characteristic species based on data collected at local field sites. We examined the effects of the loss of predators while controlling for the loss of predator biomass. The identity, not the number of predator groups, affected herbivore abundance and assemblage structure. Removing fish led to a large increase in the abundance of dominant herbivores, such as Ampithoids and Caprellids. Predator identity also affected algal assemblage structure. It did not, however, affect total algal mass. Removing fish led to an increase in the final biomass of the least common taxa (red algae) and reduced the mass of the dominant taxa (brown algae). This compensatory shift in the algal assemblage appeared to facilitate the maintenance of a constant total algal biomass. In the absence of fish, shrimp at higher than ambient densities had a similar effect on herbivore abundance, showing that other groups could partially compensate for the loss of dominant predators. Crabs had no effect on herbivore or algal populations, possibly because they were not at carrying capacity in our experimental system. These findings show that contrary to the assumptions of many food web models, predators cannot be classified into a single functional group and their role in food webs depends on their identity and density in 'real' systems and carrying capacities.

Using specialist software for qualitative data analysis

Relevance:

30.00%

Part-whole relations between food webs and the validity of local food-web descriptions

Relevance:

30.00%

Abstract:

This work analyzes the relationship between large food webs describing potential feeding relations between species and smaller sub-webs thereof describing relations actually realized in local communities of various sizes. Special attention is given to the relationships between patterns of phylogenetic correlations encountered in large webs and sub-webs. Based on the current theory of food-web topology as implemented in the matching model, it is shown that food webs are scale invariant in the following sense: given a large web described by the model, a smaller, randomly sampled sub-web thereof is described by the model as well. A stochastic analysis of model steady states reveals that such a change in scale goes along with a re-normalization of model parameters. Explicit formulae for the renormalized parameters are derived. Thus, the topology of food webs at all scales follows the same patterns, and these can be revealed by data and models referring to the local scale alone. As a by-product of the theory, a fast algorithm is derived which yields sample food webs from the exact steady state of the matching model for a high-dimensional trophic niche space in finite time. (C) 2008 Elsevier B.V. All rights reserved.

Estimating trophic link density from quantitative but incomplete diet data

Relevance:

30.00%

Abstract:

The trophic link density and the stability of food webs are thought to be related, but the nature of this relation is controversial. This article introduces a method for estimating the link density from diet tables which do not cover the complete food web and do not resolve all diet items to species level. A simple formula for the error of this estimate is derived. Link density is determined as a function of a threshold diet fraction below which diet items are ignored (

Some properties of the speciation model for food-web structure - Mechanisms for degree distributions and intervality

Relevance:

30.00%

Abstract:

We present a mathematical analysis of the speciation model for food-web structure, which had in previous work been shown to yield a good description of empirical data of food-web topology. The degree distributions of the network are derived. Properties of the speciation model are compared to those of other models that successfully describe empirical data. It is argued that the speciation model unifies the underlying ideas of previous theories. In particular, it offers a mechanistic explanation for the success of the niche model of Williams and Martinez and the frequent observation of intervality in empirical food webs. (c) 2005 Elsevier Ltd. All rights reserved.

The first WASP public data release

Relevance:

30.00%

Abstract:

The WASP (wide angle search for planets) project is an exoplanet transit survey that has been automatically taking wide field images since 2004. Two instruments, one in La Palma and the other in South Africa, continually monitor the night sky, building up light curves of millions of unique objects. These light curves are used to search for the characteristics of exoplanetary transits. This first public data release (DR1) of the WASP archive makes available all the light curve data and images from 2004 up to 2008 in both the Northern and Southern hemispheres. A web interface to the data allows easy access over the Internet. The data set contains 3 631 972 raw images and 17 970 937 light curves. In total the light curves have 119 930 299 362 data points available between them.
