943 results for Data Utility
Abstract:
Server consolidation using virtualization technology has become an important means of improving the energy efficiency of data centers, and virtual machine placement is the key step in server consolidation. In the past few years, many approaches to virtual machine placement have been proposed. However, existing approaches consider only the energy consumed by the physical machines and ignore the energy consumed by the communication network of the data center, which is not trivial and should therefore also be considered in virtual machine placement. In our preliminary research, we proposed a genetic algorithm for a new virtual machine placement problem that considers the energy consumption of both the physical machines and the communication network in a data center. Aiming to improve the performance and efficiency of that genetic algorithm, this paper presents a hybrid genetic algorithm for the energy-efficient virtual machine placement problem. Experimental results show that the hybrid genetic algorithm significantly outperforms the original genetic algorithm and that it is scalable.
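As a toy illustration of the kind of objective such a placement algorithm optimizes (a sketch only, not the authors' algorithm; the energy model, constants, and names below are all hypothetical), one can combine physical-machine energy and network energy into a single fitness function and search over placements with a simple mutation loop:

```python
import random

# Hypothetical toy energy model. placement[i] = index of the physical machine
# (PM) hosting VM i. PM energy: a fixed idle cost per active machine plus a
# load-proportional term. Network energy: proportional to traffic between VMs
# placed on different PMs.

def energy(placement, cpu, traffic, idle=100.0, per_cpu=1.0, per_unit_traffic=0.5):
    active = set(placement)                      # PMs that host at least one VM
    pm_energy = idle * len(active) + per_cpu * sum(cpu)
    net_energy = sum(per_unit_traffic * t
                     for (i, j), t in traffic.items()
                     if placement[i] != placement[j])
    return pm_energy + net_energy

def mutate(placement, n_pms):
    # Move one randomly chosen VM to a random PM.
    child = placement[:]
    child[random.randrange(len(child))] = random.randrange(n_pms)
    return child

def evolve(cpu, traffic, n_pms, iters=2000, seed=0):
    # (1+1)-style search: keep a mutant only if it lowers total energy.
    random.seed(seed)
    best = [random.randrange(n_pms) for _ in cpu]
    best_e = energy(best, cpu, traffic)
    for _ in range(iters):
        cand = mutate(best, n_pms)
        cand_e = energy(cand, cpu, traffic)
        if cand_e < best_e:
            best, best_e = cand, cand_e
    return best, best_e
```

With a high idle cost, the search consolidates heavily communicating VMs onto few machines, which is exactly the trade-off the placement problem captures; a real genetic algorithm would add a population, crossover, and capacity constraints.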
Abstract:
OBJECTIVES: Four randomized phase II/III trials investigated the addition of cetuximab to platinum-based, first-line chemotherapy in patients with advanced non-small cell lung cancer (NSCLC). A meta-analysis was performed to examine the benefit/risk ratio for the addition of cetuximab to chemotherapy. MATERIALS AND METHODS: The meta-analysis included individual patient efficacy data from 2018 patients and individual patient safety data from 1970 patients comprising respectively the combined intention-to-treat and safety populations of the four trials. The effect of adding cetuximab to chemotherapy was measured by hazard ratios (HRs) obtained using a Cox proportional hazards model and odds ratios calculated by logistic regression. Survival rates at 1 year were calculated. All applied models were stratified by trial. Tests on heterogeneity of treatment effects across the trials and sensitivity analyses were performed for all endpoints. RESULTS: The meta-analysis demonstrated that the addition of cetuximab to chemotherapy significantly improved overall survival (HR 0.88, p=0.009, median 10.3 vs 9.4 months), progression-free survival (HR 0.90, p=0.045, median 4.7 vs 4.5 months) and response (odds ratio 1.46, p<0.001, overall response rate 32.2% vs 24.4%) compared with chemotherapy alone. The safety profile of chemotherapy plus cetuximab in the meta-analysis population was confirmed as manageable. Neither trials nor patient subgroups defined by key baseline characteristics showed significant heterogeneity for any endpoint. CONCLUSION: The addition of cetuximab to platinum-based, first-line chemotherapy for advanced NSCLC significantly improved outcome for all efficacy endpoints with an acceptable safety profile, indicating a favorable benefit/risk ratio.
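The abstract's analysis pooled individual patient data with a Cox model stratified by trial; a much simpler aggregate-data sketch of the same idea is fixed-effect inverse-variance pooling of per-trial log hazard ratios (the trial inputs below are hypothetical, not the study's data):

```python
import math

# Each trial contributes (log hazard ratio, standard error).
# Fixed-effect inverse-variance pooling: weight each trial by 1/SE^2,
# combine on the log scale, then exponentiate back to a hazard ratio.

def pooled_hr(trials):
    weights = [1.0 / se ** 2 for _, se in trials]
    log_hr = sum(w * lhr for (lhr, _), w in zip(trials, weights)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))           # SE of the pooled log HR
    lo, hi = log_hr - 1.96 * se, log_hr + 1.96 * se
    return math.exp(log_hr), (math.exp(lo), math.exp(hi))
```

Pooling on the log scale keeps the hazard ratio's multiplicative structure, and the inverse-variance weights give larger trials proportionally more influence, mirroring what a stratified Cox model does with individual data.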
Abstract:
Modern health information systems can generate several exabytes of patient data, the so-called "Health Big Data", per year. Many health managers and experts believe that with these data it is possible to discover useful knowledge to improve health policies, increase patient safety, and eliminate redundancies and unnecessary costs. The objective of this paper is to discuss the characteristics of Health Big Data as well as the challenges and solutions for health Big Data Analytics (BDA) – the process of extracting knowledge from sets of Health Big Data – and to design and evaluate a pipelined framework for use as a guideline/reference in health BDA.
Abstract:
This paper uses innovative content analysis techniques to map how the death of Oscar Pistorius' girlfriend, Reeva Steenkamp, was framed in Twitter conversations. Around 1.5 million posts from a two-week timeframe are analyzed with a combination of syntactic and semantic methods. This analysis is grounded in the frame analysis perspective and differs from sentiment analysis. Instead of looking for explicit evaluations, such as "he is guilty" or "he is innocent", we show through the results how opinions can be identified through complex articulations of more implicit symbolic devices, such as repeatedly mentioned examples and metaphors. Different frames are adopted by users as more information about the case is revealed: from a more episodic one, heavily used in the very beginning, to more systemic approaches highlighting the association of the event with urban violence, gun control issues, and violence against women. A detailed timeline of the discussions is provided.
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
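To make the path-based idea concrete, here is a minimal sketch of a kernel over labeled trees (not the authors' weighted kernel; the tree encoding and uniform weighting are assumptions): each tree is represented by the multiset of root-to-node label paths, and the kernel is the dot product of those feature vectors.

```python
from collections import Counter

# Toy tree encoding: (label, [children]). The feature map sends a tree to the
# multiset of label paths from the root to every node; the kernel is the dot
# product of two such feature vectors.

def paths(tree, prefix=()):
    label, children = tree
    path = prefix + (label,)
    yield path
    for child in children:
        yield from paths(child, path)

def path_kernel(t1, t2):
    f1, f2 = Counter(paths(t1)), Counter(paths(t2))
    return sum(count * f2[p] for p, count in f1.items())  # missing keys count 0
```

Because only paths present in both trees contribute, the kernel can be computed from explicit feature counts in time linear in tree size, which hints at the computational advantage over kernels that enumerate shared subtrees.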
Abstract:
After nearly fifteen years of the open access (OA) movement and its hard-fought struggle for a more open scholarly communication system, publishers are realizing that business models can be both open and profitable. Making journal articles available under an OA license is becoming an accepted strategy for maximizing the value of content to both research communities and the businesses that serve them. The first blog in this two-part series celebrating Data Innovation Day looks at the role that data innovation is playing in the shift to open access for journal articles.
Abstract:
OBJECTIVE Interest is growing in promoting utility cycling (i.e., for transport) as a means of incorporating daily physical activity (PA) into people's lives, but little is known about the correlates of utility cycling. Our primary aim was to examine cross-sectional relationships of socio-economic characteristics, neighborhood environment perceptions, and psychological disposition with utility cycling (with or without additional recreational cycling). A secondary aim was to compare these relationships with those for recreation-only cycling. METHOD Baseline survey data (2007) from 10,233 participants in HABITAT, a multilevel longitudinal study of PA, sedentary behavior, and health in Brisbane adults aged 40-65 years, were analysed using multinomial regression modelling. RESULTS Greater income, habitual PA, and positive beliefs about PA were associated with both utility and recreation-only cycling (p<0.05). Always having vehicle access and not being in the labor force were associated with recreation-only cycling (p<0.05). Some or no vehicle access, part-time employment, and perceived environmental factors (little crime, few cul-de-sacs, nearby transport and recreational destinations) were associated with utility cycling (p<0.05). CONCLUSION Our findings suggest differences in the associations of socio-economic, neighborhood-perception, and psychological factors with utility versus recreation-only cycling in Brisbane residents aged 40-65 years. Tailored approaches appear to be required to promote utility and recreational cycling.
Abstract:
Recent studies have linked the ability of novice (CS1) programmers to read and explain code with their ability to write code. This study extends earlier work by asking CS2 students to explain object-oriented data structures problems that involve recursion. Results show a strong correlation between ability to explain code at an abstract level and performance on code writing and code reading test problems for these object-oriented data structures problems. The authors postulate that there is a common set of skills concerned with reasoning about programs that explains the correlation between writing code and explaining code. The authors suggest that an overly exclusive emphasis on code writing may be detrimental to learning to program. Non-code writing learning activities (e.g., reading and explaining code) are likely to improve student ability to reason about code and, by extension, improve student ability to write code. A judicious mix of code-writing and code-reading activities is recommended.
Abstract:
Road networks are a national critical infrastructure. Road assets need to be monitored and maintained efficiently as their condition deteriorates over time. The condition of one such asset, road pavement, plays a major role in road network maintenance programmes. Pavement condition depends upon many factors, such as pavement type and traffic and environmental conditions. This paper presents a data analytics case study assessing the factors affecting the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes acquisition and integration of data from multiple sources, data pre-processing, mining useful information from the data, and utilising the data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary across different roads and traffic and environmental conditions. The generated data mining models map the TSD outputs to a set of classes and define correction factors for each class.
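One simple way to realize the per-class correction-factor idea (a sketch under assumed semantics, not the paper's models; the class labels and reference-class convention are hypothetical) is to group deflection readings by condition class and rescale each class mean onto a chosen reference class:

```python
from statistics import mean

# Hypothetical records: (condition_class, deflection_reading).
# The correction factor for a class rescales its mean reading onto the
# mean of a chosen reference class, so readings taken under different
# conditions become comparable after multiplication by their factor.

def correction_factors(records, reference):
    by_class = {}
    for cls, value in records:
        by_class.setdefault(cls, []).append(value)
    ref_mean = mean(by_class[reference])
    return {cls: ref_mean / mean(vals) for cls, vals in by_class.items()}
```

In practice the classes would come from a mined model (road type, temperature band, speed band) rather than given labels, but the deployment step of applying a per-class multiplier is the same.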
Abstract:
The quality of the data collection methods selected and the integrity of the data collected are integral to the success of a study. This chapter focuses on data collection and study validity. After reading the chapter, readers should be able to: define types of data collection methods in quantitative research; list advantages and disadvantages of each method; discuss factors related to internal and external validity; critically evaluate data collection methods; and discuss the need to operationalise variables of interest for data collection.
Abstract:
MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters - binding sites for regulatory proteins - across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
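The consensus-assembly step fits the MapReduce shape naturally; a minimal in-process sketch in plain Python (not Hadoop, and assuming already-aligned, equal-length sequences) maps each sequence to (position, base) pairs and reduces each position by majority vote:

```python
from collections import Counter, defaultdict

# Map: emit (position, base) for every base of every aligned sequence.
# Reduce: for each position, keep the most common base. In Hadoop the
# framework would shuffle the pairs to reducers keyed by position.

def map_phase(sequences):
    for seq in sequences:
        for pos, base in enumerate(seq):
            yield pos, base

def reduce_phase(pairs):
    groups = defaultdict(list)
    for pos, base in pairs:
        groups[pos].append(base)
    return ''.join(Counter(groups[p]).most_common(1)[0][0]
                   for p in sorted(groups))

def consensus(sequences):
    return reduce_phase(map_phase(sequences))
```

The "dominant decomposition" problem the abstract mentions shows up when, unlike here, the natural key for one stage (e.g. gene region) differs from the key needed by the next (e.g. organism), forcing re-keying between jobs.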
Abstract:
A spatial process observed over a lattice or a set of irregular regions is usually modeled using a conditionally autoregressive (CAR) model. The neighborhoods within a CAR model are generally formed deterministically using the inter-distances or boundaries between the regions. An extension of the CAR model is proposed in this article in which the selection of the neighborhood depends on unknown parameter(s). This extension is called a Stochastic Neighborhood CAR (SNCAR) model. The resulting model shows flexibility in accurately estimating covariance structures for data generated from a variety of spatial covariance models. Specific examples are illustrated using data generated from some common spatial covariance functions, as well as real data concerning radioactive contamination of the soil in Switzerland after the Chernobyl accident.
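For readers unfamiliar with the baseline model, a common proper-CAR parameterization builds the precision matrix from a fixed neighborhood adjacency matrix; this sketch shows that deterministic construction (the SNCAR extension would instead let the adjacency depend on unknown parameters):

```python
import numpy as np

# Proper CAR precision matrix Q = tau * (D - rho * W), where W is a symmetric
# 0/1 adjacency matrix over regions (W[i, j] = 1 iff regions i and j are
# neighbors), D = diag(row sums of W), tau > 0 is a precision parameter, and
# |rho| < 1 is sufficient to keep Q positive definite.

def car_precision(W, rho, tau=1.0):
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)
```

The implied covariance is the inverse of Q, so fixing W by distances or shared boundaries fixes the covariance structure up to (rho, tau); making the neighborhood itself stochastic is what gives the SNCAR model its extra flexibility.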
Abstract:
Environmental monitoring is becoming critical as human activity and climate change place greater pressures on biodiversity, leading to an increasing need for data to make informed decisions. Acoustic sensors can help collect data across large areas for extended periods, making them attractive for environmental monitoring. However, managing and analysing large volumes of environmental acoustic data is a great challenge and is consequently hindering the effective utilization of the large datasets collected. This paper presents an overview of our current techniques for collecting, storing, and analysing large volumes of acoustic data efficiently, accurately, and cost-effectively.
Abstract:
Objectives: To establish injury rates among a population of elite athletes, to provide normative data for psychological variables hypothesised to be predictive of sport injuries, and to establish relations between measures of mood, perceived life stress, and injury characteristics as a precursor to introducing a psychological intervention to ameliorate the injury problem. Methods: As part of annual screening procedures, athletes at the Queensland Academy of Sport report medical and psychological status. Data from 845 screenings (433 female and 412 male athletes) were reviewed. Population specific tables of normative data were established for the Brunel mood scale and the perceived stress scale. Results: About 67% of athletes were injured each year, and about 18% were injured at the time of screening. Fifty percent of variance in stress scores could be predicted from mood scores, especially for vigour, depression, and tension. Mood and stress scores collectively had significant utility in predicting injury characteristics. Injury status (current, healed, no injury) was correctly classified with 39% accuracy, and back pain with 48% accuracy. Among a subset of 233 uninjured athletes (116 female and 117 male), five mood dimensions (anger, confusion, fatigue, tension, depression) were significantly related to orthopaedic incidents over the preceding 12 months, with each mood dimension explaining 6–7% of the variance. No sex differences in these relations were found. Conclusions: The findings support suggestions that psychological measures have utility in predicting athletic injury, although the relatively modest explained variance highlights the need to also include underlying physiological indicators of allostatic load, such as stress hormones, in predictive models.
Abstract:
This chapter addresses data modelling as a means of promoting statistical literacy in the early grades. Consideration is first given to the importance of increasing young children’s exposure to statistical reasoning experiences and how data modelling can be a rich means of doing so. Selected components of data modelling are then reviewed, followed by a report on some findings from the third-year of a three-year longitudinal study across grades one through three.