995 results for Web as a Corpus
Abstract:
Automatically determining and assigning shared, meaningful text labels to data extracted from an e-Commerce web page is a challenging problem. An e-Commerce web page can display a list of data records, each of which can contain a combination of data items (e.g. product name and price) and explicit labels that describe some of these data items. Recent advances in extraction techniques have made it much easier to precisely extract individual data items and labels from a web page; however, two problems remain open: (1) assigning an explicit label to a data item, and (2) determining labels for the remaining data items. Furthermore, improvements in the availability and coverage of vocabularies, especially in the context of e-Commerce web sites, mean that we now have access to a bank of relevant, meaningful and shared labels that can be assigned to extracted data items. What is still needed is a technique that takes as input a set of extracted data items and automatically assigns to them the most relevant and meaningful labels from a shared vocabulary. We observe that the Information Extraction (IE) community has developed a great number of techniques that solve problems similar to our own. In this work-in-progress paper we propose to theoretically and experimentally evaluate different IE techniques to ascertain which is most suitable for this problem.
Abstract:
In this paper, we propose a new learning approach to Web data annotation, where a support vector machine-based multiclass classifier is trained to assign labels to data items. For data record extraction, a data section re-segmentation algorithm based on visual and content features is introduced to improve the performance of Web data record extraction. We have implemented the proposed approach and tested it with a large set of Web query result pages in different domains. Our experimental results show that our proposed approach is highly effective and efficient.
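The labelling step described above can be sketched in miniature. Since a full SVM is out of scope here, the sketch below substitutes a one-vs-rest perceptron over bag-of-words features as a plainly-named stand-in for the paper's SVM-based multiclass classifier; the vocabulary, data items, and labels are all hypothetical:

```python
def featurize(text, vocab):
    # binary bag-of-words vector over a fixed vocabulary
    tokens = text.lower().split()
    return [1.0 if w in tokens else 0.0 for w in vocab]

def train_ovr(samples, labels, vocab, epochs=20):
    # one-vs-rest training: one linear separator per label
    classes = sorted(set(labels))
    W = {c: [0.0] * len(vocab) for c in classes}
    for _ in range(epochs):
        for text, y in zip(samples, labels):
            x = featurize(text, vocab)
            for c in classes:
                target = 1.0 if c == y else -1.0
                score = sum(wi * xi for wi, xi in zip(W[c], x))
                if target * score <= 0:  # misclassified: perceptron update
                    W[c] = [wi + target * xi for wi, xi in zip(W[c], x)]
    return W

def predict(text, W, vocab):
    # assign the label whose separator scores the item highest
    x = featurize(text, vocab)
    return max(W, key=lambda c: sum(wi * xi for wi, xi in zip(W[c], x)))

# hypothetical extracted data items and shared-vocabulary labels
vocab = ["$", "usd", "by", "pages", "hardcover"]
samples = ["$ 19.99 usd", "by jane doe", "320 pages hardcover"]
labels = ["price", "author", "format"]
W = train_ovr(samples, labels, vocab)
```

In the paper the features are derived from visual and content cues of the re-segmented data records; the bag-of-words features here are illustrative only.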
Abstract:
Repeat proteins have become increasingly important due to their capability to bind to almost any protein and their potential as an alternative therapy to monoclonal antibodies. In the past decade repeat proteins have been designed to mediate specific protein-protein interactions. The tetratricopeptide and ankyrin repeat proteins are two classes of helical repeat proteins that form different binding pockets to accommodate various partners. It is important to understand the factors that define the folding and stability of repeat proteins in order to prioritize the most stable designed repeat proteins for further exploration of their potential binding affinities. Here we developed distance-dependent statistical potentials using two classes of alpha-helical repeat proteins, tetratricopeptide and ankyrin repeat proteins, and evaluated their efficiency in predicting the stability of repeat proteins. We demonstrated that the repeat-specific statistical potentials based on these two classes of repeat proteins showed superior accuracy compared with non-specific statistical potentials in: 1) discriminating correct vs. incorrect models, and 2) ranking the stability of designed repeat proteins. In particular, the statistical scores correlate closely with the equilibrium unfolding free energies of repeat proteins and therefore can serve as a novel tool for quickly prioritizing designed repeat proteins with high stability. The StaRProtein web server was developed for predicting the stability of repeat proteins.
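A distance-dependent statistical potential of the kind described above is, at heart, an inverse-Boltzmann log-odds score over distance bins. A minimal sketch with hypothetical contact counts follows; the actual binning, residue-pair types, and reference state used by StaRProtein are not reproduced here:

```python
import math

def statistical_potential(observed, expected):
    """Inverse-Boltzmann score per distance bin: E(r) = -ln(N_obs(r) / N_exp(r)).
    Bins with no counts are given a neutral score of 0.0."""
    return [-math.log(o / e) if o > 0 and e > 0 else 0.0
            for o, e in zip(observed, expected)]

# hypothetical contact counts for one residue pair across four distance bins,
# tabulated from a training set of repeat-protein structures
obs = [2, 40, 25, 10]
exp = [10, 20, 25, 15]
energies = statistical_potential(obs, exp)  # negative = favourable bin
score = sum(energies)  # lower total -> model predicted more stable
```

A model's total score is the sum of such terms over all residue pairs and bins; ranking designed repeats by that total is what the stability prioritisation in the abstract amounts to.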
Abstract:
Carbon (C) and nitrogen (N) stable isotope analysis (SIA) has been used to identify the terrestrial subsidy of freshwater food webs. However, SIA fails to differentiate between the contributions of old and recently fixed terrestrial C and consequently cannot fully determine the source, age, and biochemical quality of terrestrial carbon. Natural abundance radiocarbon (∆14C) was used to examine the age and origin of carbon in Lower Lough Erne, Northern Ireland. 14C and stable isotope values were obtained from invertebrate, algae, and fish samples, and the results indicate that terrestrial organic C is evident at all trophic levels. High winter δ15N values in calanoid zooplankton (δ15N = 24‰) relative to phytoplankton and particulate organic matter (δ15N = 6‰ and 12‰, respectively) may reflect several microbial trophic levels between terrestrial C and calanoid invertebrates. Winter and summer calanoid ∆14C values show a seasonal switch between autochthonous and terrestrial carbon sources. Fish ∆14C values indicate terrestrial support at the highest trophic levels in littoral and pelagic food webs. 14C therefore is useful in attributing the source of carbon in freshwater in addition to tracing the pathway of terrestrial carbon through the food web.
Abstract:
BACKGROUND: Web-based programs are a potential medium for supporting weight loss because of their accessibility and wide reach. Research is warranted to determine the shorter- and longer-term effects of these programs in relation to weight loss and other health outcomes.
OBJECTIVE: The aim was to evaluate the effects of a Web-based component of a weight loss service (Imperative Health) in an overweight/obese population at risk of cardiovascular disease (CVD) using a randomized controlled design and a true control group.
METHODS: A total of 65 overweight/obese adults at high risk of CVD were randomly allocated to 1 of 2 groups. Group 1 (n=32) was provided with the Web-based program, which supported positive dietary and physical activity changes and assisted in managing weight. Group 2 continued with their usual self-care (n=33). Assessments were conducted face-to-face. The primary outcome was between-group change in weight at 3 months. Secondary outcomes included between-group change in anthropometric measurements, blood pressure, lipid measurements, physical activity, and energy intake at 3, 6, and 12 months. Interviews were conducted to explore participants' views of the Web-based program.
RESULTS: Retention rates for the intervention and control groups at 3 months were 78% (25/32) vs 97% (32/33), at 6 months were 66% (21/32) vs 94% (31/33), and at 12 months were 53% (17/32) vs 88% (29/33). Intention-to-treat analysis, using the baseline-observation-carried-forward imputation method, revealed that the intervention group lost more weight relative to the control group at 3 months (mean -3.41, 95% CI -4.70 to -2.13 kg vs mean -0.52, 95% CI -1.55 to 0.52 kg, P<.001) and at 6 months (mean -3.47, 95% CI -4.95 to -1.98 kg vs mean -0.81, 95% CI -2.23 to 0.61 kg, P=.02), but not at 12 months (mean -2.38, 95% CI -3.48 to -0.97 kg vs mean -1.80, 95% CI -3.15 to -0.44 kg, P=.77). More intervention group participants lost ≥5% of their baseline body weight at 3 months (34%, 11/32 vs 3%, 1/33, P<.001) and 6 months (41%, 13/32 vs 18%, 6/33, P=.047), but not at 12 months (22%, 7/32 vs 21%, 7/33, P=.95), versus the control group. The intervention group showed improvements in total cholesterol and triglycerides and adopted more positive dietary and physical activity behaviors for up to 3 months versus control; however, these improvements were not sustained.
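The baseline-observation-carried-forward (BOCF) imputation used in the intention-to-treat analysis simply treats a dropout as having returned to baseline, i.e. a change of zero. A minimal sketch with hypothetical weights (not the study's data):

```python
def bocf(baseline, follow_up):
    """Baseline observation carried forward: a missing follow-up (None)
    is replaced by the participant's baseline value (change = 0)."""
    return [f if f is not None else b for b, f in zip(baseline, follow_up)]

# hypothetical weights in kg; None marks a dropout
baseline = [90.0, 85.0, 100.0, 78.0]
month3 = [86.5, None, 96.0, None]
imputed = bocf(baseline, month3)
changes = [m - b for b, m in zip(baseline, imputed)]
mean_change = sum(changes) / len(changes)  # dropouts pull the mean toward 0
```

Because every dropout contributes a change of exactly zero, BOCF is conservative for weight loss: high attrition, as in the intervention arm here, shrinks the estimated mean loss toward zero.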
CONCLUSIONS: Although the intervention group had high attrition levels, this study provides evidence that this Web-based program can be used to initiate clinically relevant weight loss and lower CVD risk up to 3-6 months based on the proportion of intervention group participants losing ≥5% of their body weight versus control group. It also highlights a need for augmenting Web-based programs with further interventions, such as in-person support to enhance engagement and maintain these changes.
Abstract:
An orchestration is a multi-threaded computation that invokes a number of remote services. In practice, the responsiveness of a web-service fluctuates with demand; during surges in activity service responsiveness may be degraded, perhaps even to the point of failure. An uncertainty profile formalizes a user's perception of the effects of stress on an orchestration of web-services; it describes a strategic situation, modelled by a zero-sum angel–daemon game. Stressed web-service scenarios are analysed, using game theory, in a realistic way, lying between over-optimism (services are entirely reliable) and over-pessimism (all services are broken). The ‘resilience’ of an uncertainty profile can be assessed using the valuation of its associated zero-sum game. In order to demonstrate the validity of the approach, we consider two measures of resilience and a number of different stress models. It is shown how (i) uncertainty profiles can be ordered by risk (as measured by game valuations) and (ii) the structural properties of risk partial orders can be analysed.
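For a finite zero-sum game of this kind, the valuation used to assess resilience is the game value; when the payoff matrix has a saddle point, the value is simply where maximin meets minimax. A minimal sketch with a hypothetical angel–daemon payoff matrix (the paper's actual stress models and utilities are not reproduced):

```python
def maximin(M):
    # the angel picks the row whose worst case (over the daemon's columns) is best
    return max(min(row) for row in M)

def minimax(M):
    # the daemon picks the column whose best case (over the angel's rows) is worst
    return min(max(col) for col in zip(*M))

# hypothetical payoffs for a stressed-orchestration scenario:
# rows = angel's choices, columns = daemon's choices, entries = utility
M = [[3, 5],
     [1, 4]]
lower, upper = maximin(M), minimax(M)  # lower <= upper always holds
value = lower if lower == upper else None  # saddle point -> game value
```

When `lower != upper` the value still exists but is attained only in mixed strategies (e.g. via linear programming); comparing such values across uncertainty profiles is what induces the risk partial orders mentioned in the abstract.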
Abstract:
Globally, lakes bury and remineralise significant quantities of terrestrial C, and the associated flux of terrestrial C strongly influences their functioning. Changing deposition chemistry, land use, and climate-induced impacts on hydrology will affect soil biogeochemistry and terrestrial C export [1], and hence lake ecology, with potential feedbacks for regional and global C cycling. C and nitrogen stable isotope analysis (SIA) has identified the terrestrial subsidy of freshwater food webs. The approach relies on differing 13C fractionation in aquatic and terrestrial primary producers, but also on the fact that the inorganic C demands of aquatic primary producers are partly met by 13C-depleted C from respiration of terrestrial C and by ‘old’ C derived from weathering of catchment geology. SIA thus fails to differentiate between the contributions of old and recently fixed terrestrial C. Natural abundance 14C can be used as an additional biomarker to untangle riverine food webs [2] where aquatic and terrestrial δ13C overlap, but may also be valuable for examining the age and origin of C in the lake. Primary production in lakes is based on dissolved inorganic C (DIC). DIC in alkaline lakes is partially derived from weathering of carbonaceous bedrock, a proportion of which is 14C-free. The low 14C activity yields an artificial age offset, leading samples to appear hundreds to thousands of years older than their actual age. As such, 14C can be used to identify the proportion of autochthonous C in the food web. With terrestrial C inputs likely to increase, the origin and utilisation of ‘fossil’ or ‘recent’ allochthonous C in the food web can also be determined. Stable isotopes and 14C were measured for biota, particulate organic matter (POM), DIC and dissolved organic carbon (DOC) from Lough Erne, Northern Ireland, a humic alkaline lake. Temporal and spatial variation was evident in DIC, DOC and POM C isotopes, with implications for the fluctuation in terrestrial export processes.
Ramped pyrolysis of lake surface sediment indicates the burial of two C components. The 14C activity (507 ± 30 BP) of sediment combusted at 400˚C was consistent with algal values and younger than bulk sediment values (1097 ± 30 BP). The sample was subsequently combusted at 850˚C, yielding 14C values (1471 ± 30 BP) older than the bulk sediment age, suggesting that fossil terrestrial carbon is also buried in the sediment. Stable isotopes in the food web indicate that terrestrial organic C is also utilised by lake organisms. High winter δ15N values in calanoid zooplankton (δ15N = 24‰) relative to phytoplankton and POM (δ15N = 6‰ and 12‰, respectively) may reflect several microbial trophic levels between terrestrial C and calanoids. Furthermore, winter calanoid 14C ages are consistent with DOC from an inflowing river (75 ± 24 BP), not phytoplankton (367 ± 70 BP). Summer calanoid δ13C, δ15N and 14C (345 ± 80 BP) indicate greater reliance on phytoplankton.
[1] Monteith, D.T., et al. (2007) Dissolved organic carbon trends resulting from changes in atmospheric deposition chemistry. Nature, 450: 537-540.
[2] Caraco, N., et al. (2010) Millennial-aged organic carbon subsidies to a modern river food web. Ecology, 91: 2385-2393.
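The "hard-water" age offset described in the abstract follows directly from the conventional radiocarbon age equation, which uses the Libby mean-life of 8033 years. A minimal sketch, with a hypothetical 10% contribution of 14C-free bedrock-derived DIC (the actual mixing fraction in Lough Erne is not stated):

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; conventional 14C ages use the Libby half-life (5568 yr)

def radiocarbon_age(fraction_modern):
    """Conventional radiocarbon age (years BP) from normalised fraction modern."""
    return -LIBBY_MEAN_LIFE * math.log(fraction_modern)

# hypothetical mixing: 90% modern autochthonous C + 10% 14C-free bedrock DIC
fraction_modern = 0.9 * 1.0 + 0.1 * 0.0
offset = radiocarbon_age(fraction_modern)  # apparent age of otherwise-modern C
```

Under this illustrative mixing ratio, otherwise-modern carbon acquires an apparent age of roughly 850 years, which is why 14C-free DIC makes autochthonous C look artificially old and lets 14C separate autochthonous from terrestrial sources.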
Abstract:
This paper presents a new approach to single-channel speech enhancement addressing both additive noise and channel distortion (i.e., convolutional noise). The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. It adds three novel developments to our previous LMS research. First, we address channel distortion as well as additive noise. Second, we present an improved method for modeling noise. Third, we present an iterative algorithm for improved speech estimates. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to those of the other methods for noisy speech enhancement.
Index Terms: corpus-based speech model, longest matching segment, speech enhancement, speech recognition
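The core LMS idea, scanning a clean corpus for the longest run of frames consistent with the noisy input, can be sketched as follows. The sketch uses 1-D scalar "frames" and a fixed per-frame tolerance purely for illustration; the paper's actual spectral features, distance measure, and noise/channel models are not reproduced here:

```python
def longest_matching_segment(noisy, corpus, tol):
    """Return (start, length) of the longest contiguous run of corpus
    frames that tracks the noisy frames within a per-frame tolerance."""
    best = (0, 0)
    for s in range(len(corpus)):
        length = 0
        while (length < len(noisy) and s + length < len(corpus)
               and abs(corpus[s + length] - noisy[length]) <= tol):
            length += 1
        if length > best[1]:
            best = (s, length)
    return best

# hypothetical 1-D "frames": a clean corpus and a noisy observation
clean_corpus = [0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 1.0]
noisy = [1.2, 2.1, 2.9]
start, length = longest_matching_segment(noisy, clean_corpus, tol=0.5)
```

The matched clean segment, rather than the noisy input, then serves as the basis for the enhanced speech estimate; the paper iterates this matching with refined noise and channel estimates.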
Abstract:
The continued use of traditional lecturing across Higher Education as the main teaching and learning approach in many disciplines must be challenged. An increasing number of studies suggest that this approach, compared to more active learning methods, is the least effective. In counterargument, the use of traditional lectures is often justified as necessary given a large student population. By analysing the implementation of a web-based broadcasting approach which replaced the traditional lecture within a programming-based module, and thereby removed the student-population rationale, it was hoped that the student learning experience would become more active and ultimately enhance learning on the module. The implemented model replaces the traditional approach of students attending an on-campus lecture theatre with a web-based live broadcast approach that focuses on students being active learners rather than passive recipients. Students ‘attend’ by viewing a live broadcast of the lecturer, presented as a talking head, and the lecturer’s desktop, via a web browser. Video and audio communication is primarily from tutor to students, with text-based comments used to provide communication from students to tutor. This approach promotes active learning by allowing students to perform activities on their own computers rather than the passive viewing and listening commonly encountered in large lecture classes. Analysing this approach over two years (n = 234 students) indicates that 89.6% of students rated it as offering a highly positive learning experience. Comparing student performance across three academic years also indicates a positive change. A small analysis of student participation levels was also conducted and suggests that the cohort's willingness to engage with the broadcast lecture material is high.
Abstract:
We consider the problem of linking web search queries to entities from a knowledge base such as Wikipedia. Such linking enables converting a user’s web search session into a footprint in the knowledge base that could be used to enrich the user profile. Traditional methods for entity linking have been directed towards finding entity mentions in text documents such as news reports, each of which is possibly linked to multiple entities, enabling the use of measures like entity set coherence. Since web search queries are very small text fragments, such criteria, which rely on the existence of a multitude of mentions, do not work well on them. We propose a three-phase method for linking web search queries to Wikipedia entities. The first phase performs IR-style scoring of entities against the search query to narrow down to a subset of entities, which are expanded using hyperlink information in the second phase to a larger set. Lastly, we use a graph traversal approach to identify the top entities to link the query to. Through an empirical evaluation on real-world web search queries, we illustrate that our methods significantly enhance linking accuracy over state-of-the-art methods.
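The three phases can be sketched over a toy knowledge base. All entities, descriptions, and links below are hypothetical, raw term overlap stands in for real IR scoring, and link-support counting stands in for the paper's graph traversal:

```python
# hypothetical toy knowledge base: entity -> (description terms, hyperlinked entities)
KB = {
    "Python_(language)": ({"python", "programming", "language"}, {"Guido_van_Rossum"}),
    "Python_(snake)":    ({"python", "snake", "reptile"},        {"Reptile"}),
    "Guido_van_Rossum":  ({"guido", "python", "creator"},        {"Python_(language)"}),
    "Reptile":           ({"reptile", "animal"},                 {"Python_(snake)"}),
}

def link_query(query, kb, k=2):
    terms = set(query.lower().split())
    # Phase 1: IR-style scoring (term overlap here) narrows to k seed entities
    scored = sorted(kb, key=lambda e: len(terms & kb[e][0]), reverse=True)
    seeds = scored[:k]
    # Phase 2: expand the seed set along hyperlinks to a larger candidate set
    expanded = set(seeds) | {n for e in seeds for n in kb[e][1]}
    # Phase 3: graph-based ranking -- count in-links received from the expanded
    # set, breaking ties by query overlap
    support = {e: sum(e in kb[o][1] for o in expanded if o != e) for e in expanded}
    return max(expanded, key=lambda e: (support[e], len(terms & kb[e][0])))
```

For example, `link_query("python programming", KB)` resolves the ambiguous term "python" to the programming language because the language sense both matches more query terms and draws link support from its expanded neighbourhood.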