21 results for on-disk data layout


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper assesses the impact of regional technological diversification on the emergence of new innovators across EU regions. Integrating analyses from the regional economics, economic geography and technological change literatures, we explore the role that the regional embeddedness of actors characterised by diverse technological competencies may have in fostering novel and sustained interactions leading to new technological combinations. In particular, we test whether greater technological diversification improves regional ‘combinatorial’ opportunities leading to the emergence of new innovators. The analysis is based on panel data obtained by merging regional economic data from Eurostat and patent data from the CRIOS-PATSTAT database over the period 1997–2006, covering 178 regions across 10 EU countries. Accounting for different measures of economic and innovative activity at the NUTS2 level, our findings suggest that the regional co-location of diverse technological competencies contributes to the entry of new innovators, thereby shaping technological change and industry dynamics. Thus, this paper brings to the fore a better understanding of the relationship between regional diversity and technological change.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours either imputing missing values or discarding records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is performed by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data, where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring that explanations of risk are based only on the data given, not on imputed data. This is important for clinical decision support systems that use human expertise for modelling and explaining predictions.
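The sequential feature selection described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two correlation criteria are combined, so the score used here (absolute correlation with the target minus mean absolute correlation with the features already selected) is an assumption, as are the names `pearson` and `select_features`. The sketch also assumes complete data and does not reproduce the paper's per-case handling of missing values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns, target, k):
    """Greedily pick up to k feature names from `columns` (name -> list of values).

    Assumed score: |corr(feature, target)| minus the mean |corr(feature, s)|
    over the features s already selected.
    """
    selected = []
    remaining = list(columns)  # a list keeps tie-breaking deterministic

    def score(name):
        relevance = abs(pearson(columns[name], target))
        if not selected:
            return relevance
        redundancy = sum(
            abs(pearson(columns[name], columns[s])) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5, 6]
    columns = {
        "a": [1, 2, 3, 4, 5, 7],      # strongly related to the target
        "b": [2, 4, 6, 8, 10, 14],    # exact multiple of "a", so redundant
        "c": [1, 0, 2, 1, 3, 2],      # weaker but less redundant
    }
    print(select_features(columns, target, 2))
```

In this toy data the redundant column `b` is passed over in favour of `c` once `a` has been chosen, which is the behaviour the selection rule is meant to produce.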

Relevance:

100.00% 100.00%

Publisher:

Abstract:

GraphChi is the first reported disk-based graph engine that can handle billion-scale graphs efficiently on a single PC. It can execute several advanced data mining, graph mining and machine learning algorithms on very large graphs. Using the novel technique of parallel sliding windows (PSW) to load subgraphs from disk into memory for updating vertices and edges, it achieves data processing performance close to, and sometimes better than, that of mainstream distributed graph engines. The GraphChi authors noted that memory is not effectively utilized with large datasets, which leads to suboptimal computation performance. In this paper, motivated by the concepts of 'pin' from TurboGraph and 'ghost' from GraphLab, we propose a new memory utilization mode for GraphChi, called Part-in-memory mode, to improve the performance of GraphChi algorithms. The main idea is to pin a fixed part of the data in memory during the whole computing process. Part-in-memory mode is implemented by adding only about 40 lines of code to the original GraphChi engine. Extensive experiments are performed with large real datasets (including a Twitter graph with 1.4 billion edges). The preliminary results show that the Part-in-memory memory management approach reduces the GraphChi running time by up to 60% for the PageRank algorithm. Interestingly, pinning a larger portion of data in memory does not always lead to better performance when the whole dataset cannot fit in memory: there exists an optimal portion of data that should be kept in memory to achieve the best computational performance.
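The idea of pinning a fixed part of the data in memory for the whole computation can be illustrated with a toy model. This is not GraphChi code: `ShardStore`, `run_iterations` and the notion of a "shard" as a plain list of edges are hypothetical stand-ins, and the sketch only counts simulated disk reads. It does not model the memory pressure that, according to the abstract, makes pinning more data counterproductive beyond some point.

```python
class ShardStore:
    """Toy stand-in for on-disk shards; counts how many shard reads occur."""

    def __init__(self, shards):
        self._shards = shards  # shard_id -> list of (src, dst) edges
        self.disk_reads = 0

    def read(self, shard_id):
        # Each call simulates loading one shard from disk.
        self.disk_reads += 1
        return list(self._shards[shard_id])


def run_iterations(store, shard_ids, pinned_ids, iterations, process):
    """Visit every shard once per iteration.

    Shards in `pinned_ids` are read once up front and kept in memory;
    all other shards are re-read from the store on every iteration.
    Returns the total number of simulated disk reads.
    """
    pinned = {sid: store.read(sid) for sid in pinned_ids}
    for _ in range(iterations):
        for sid in shard_ids:
            edges = pinned[sid] if sid in pinned else store.read(sid)
            process(sid, edges)
    return store.disk_reads


if __name__ == "__main__":
    shards = {0: [(0, 1)], 1: [(1, 2)], 2: [(2, 3)], 3: [(3, 0)]}
    ids = sorted(shards)
    no_pin = run_iterations(ShardStore(shards), ids, [], 3, lambda s, e: None)
    one_pin = run_iterations(ShardStore(shards), ids, [0], 3, lambda s, e: None)
    print(no_pin, one_pin)
```

With four shards and three iterations, the unpinned run performs one read per shard per iteration, while pinning one shard replaces its repeated reads with a single initial load.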

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we evaluate and compare two representative and popular distributed processing engines for large-scale big data analytics: Spark and the graph-based engine GraphLab. We design a benchmark suite including representative algorithms and datasets to compare the performance of the computing engines in terms of running time, memory and CPU usage, and network and I/O overhead. The benchmark suite is tested on both a local computer cluster and virtual machines in the cloud. By varying the number of computers and the amount of memory, we examine the scalability of the computing engines with increasing computing resources (such as CPU and memory). We also run a cross-evaluation of generic and graph-based analytic algorithms over graph processing and generic platforms to identify the potential performance degradation if only one processing engine is available. It is observed that both computing engines show good scalability as computing resources increase. While GraphLab largely outperforms Spark for graph algorithms, its running time is close to that of Spark for non-graph algorithms. Additionally, the running time of Spark for graph algorithms on cloud virtual machines is observed to be almost 100% higher than on local computer clusters.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The representation of serial position in sequences is an important topic in a variety of cognitive areas including the domains of language, memory, and motor control. In the neuropsychological literature, serial position data have often been normalized across different lengths, and an improved procedure for this has recently been reported by Machtynger and Shallice (2009). Effects of length and a U-shaped normalized serial position curve have been criteria for identifying working memory deficits. We present simulations and analyses to illustrate some of the issues that arise when relating serial position data to specific theories. We show that critical distinctions are often difficult to make based on normalized data. We suggest that curves for different lengths are best presented in their raw form and that binomial regression can be used to answer specific questions about the effects of length, position, and linear or nonlinear shape that are critical to making theoretical distinctions. © 2010 Psychology Press.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: There have been no published studies observing what happens to children after hospital discharge and whether medication discrepancies occur at the hospital and General Practitioner (GP) interface.1 Objectives: To identify the types of discrepancy between the hospital discharge prescription and the patient's medicines after their first GP prescription. Method: Over a 3 month period (March–June 2012) across two London NHS hospital sites, parents of children aged 18 years and under who were on long-term medications were approached and consented prior to discharge from the ward. The patients were followed up 21 days after discharge by telephone call or home visit, depending on their preference. The parent was asked whether they had contacted their GP for further medications during the follow up, and if not the follow up was rescheduled. The parents were interviewed to find out whether any discrepancies had occurred post discharge by comparing the patient's hospital discharge letter with the medication at follow up. All this information was captured on a data collection form. Results: Eighty-eight patients were consented and 60 patients (68%; 60/88) were followed up by telephone call 21 days post discharge. A total of 317 medications were ordered at discharge among the 60 patients. Of the 60 that were followed up, nine were lost to follow up, one died post discharge, one was excluded from the study, and 11 had not contacted the GP and were to be followed up at a later date. For the remaining 38 patients who were followed up, 254 medications had been ordered. Of these 38 patients, 12 (32%) had discrepancies between the discharge letter and the GP, 19 (50%) had no issues, and seven (18%) mentioned post-discharge issues that were not discrepancies.
Of the 12 patients who had at least one medication discrepancy (total 34 medications, range 1–7 discrepancies per patient), six patients had GP discrepancies, four had discrepancies resulting from a hospital outpatient appointment, one had a discrepancy related to the discharge letter order and one had a complex discrepancy. An example: a patient was discharged on amiodarone liquid 16.5 mg daily as opposed to 65 mg daily of amiodarone from the GP. At interview, the parent used volume units to communicate the dose rather than the actual dose itself, and the strength of the liquid had changed. Conclusions: The preliminary results from the study show that discrepancies due to several causes occur when paediatric patients leave hospital.