945 resultados para Graph databases


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: There are a lack of reliable data on the epidemiology and associated burden and costs of asthma. We sought to provide the first UK-wide estimates of the epidemiology, healthcare utilisation and costs of asthma. 

Methods: We obtained and analysed asthma-relevant data from 27 datasets: these comprised national health surveys for 2010-11, and routine administrative, health and social care datasets for 2011-12; 2011-12 costs were estimated in pounds sterling using economic modelling. 

Results: The prevalence of asthma depended on the definition and data source used. The UK lifetime prevalence of patient-reported symptoms suggestive of asthma was 29.5 % (95 % CI, 27.7-31.3; n = 18.5 million (m) people) and 15.6 % (14.3-16.9, n = 9.8 m) for patient-reported clinician-diagnosed asthma. The annual prevalence of patient-reported clinician-diagnosed-and-treated asthma was 9.6 % (8.9-10.3, n = 6.0 m) and of clinician-reported, diagnosed-and-treated asthma 5.7 % (5.7-5.7; n = 3.6 m). Asthma resulted in at least 6.3 m primary care consultations, 93,000 hospital in-patient episodes, 1800 intensive-care unit episodes and 36,800 disability living allowance claims. The costs of asthma were estimated at least £1.1 billion: 74 % of these costs were for provision of primary care services (60 % prescribing, 14 % consultations), 13 % for disability claims, and 12 % for hospital care. There were 1160 asthma deaths. 

Conclusions: Asthma is very common and is responsible for considerable morbidity, healthcare utilisation and financial costs to the UK public sector. Greater policy focus on primary care provision is needed to reduce the risk of asthma exacerbations, hospitalisations and deaths, and reduce costs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Verbal fluency is the ability to produce a satisfying sequence of spoken words during a given time interval. The core of verbal fluency lies in the capacity to manage the executive aspects of language. The standard scores of the semantic verbal fluency test are broadly used in the neuropsychological assessment of the elderly, and different analytical methods are likely to extract even more information from the data generated in this test. Graph theory, a mathematical approach to analyze relations between items, represents a promising tool to understand a variety of neuropsychological states. This study reports a graph analysis of data generated by the semantic verbal fluency test by cognitively healthy elderly (NC), patients with Mild Cognitive Impairment – subtypes amnestic(aMCI) and amnestic multiple domain (a+mdMCI) - and patients with Alzheimer’s disease (AD). Sequences of words were represented as a speech graph in which every word corresponded to a node and temporal links between words were represented by directed edges. To characterize the structure of the data we calculated 13 speech graph attributes (SGAs). The individuals were compared when divided in three (NC – MCI – AD) and four (NC – aMCI – a+mdMCI – AD) groups. When the three groups were compared, significant differences were found in the standard measure of correct words produced, and three SGA: diameter, average shortest path, and network density. SGA sorted the elderly groups with good specificity and sensitivity. When the four groups were compared, the groups differed significantly in network density, except between the two MCI subtypes and NC and aMCI. The diameter of the network and the average shortest path were significantly different between the NC and AD, and between aMCI and AD. SGA sorted the elderly in their groups with good specificity and sensitivity, performing better than the standard score of the task. These findings provide support for a new methodological frame to assess the strength of semantic memory through the verbal fluency task, with potential to amplify the predictive power of this test. Graph analysis is likely to become clinically relevant in neurology and psychiatry, and may be particularly useful for the differential diagnosis of the elderly.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A parallel method for the dynamic partitioning of unstructured meshes is described. The method introduces a new iterative optimisation technique known as relative gain optimisation which both balances the workload and attempts to minimise the interprocessor communications overhead. Experiments on a series of adaptively refined meshes indicate that the algorithm provides partitions of an equivalent or higher quality to static partitioners (which do not reuse the existing partition) and much more rapidly. Perhaps more importantly, the algorithm results in only a small fraction of the amount of data migration compared to the static partitioners.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an investigation of a simple generic hyper-heuristic approach upon a set of widely used constructive heuristics (graph coloring heuristics) in timetabling. Within the hyperheuristic framework, a Tabu Search approach is employed to search for permutations of graph heuristics which are used for constructing timetables in exam and course timetabling problems. This underpins a multi-stage hyper-heuristic where the Tabu Search employs permutations upon a different number of graph heuristics in two stages. We study this graph-based hyper-heuristic approach within the context of exploring fundamental issues concerning the search space of the hyper-heuristic (the heuristic space) and the solution space. Such issues have not been addressed in other hyper-heuristic research. These approaches are tested on both exam and course benchmark timetabling problems and are compared with the fine-tuned bespoke state-of-the-art approaches. The results are within the range of the best results reported in the literature. The approach described here represents a significantly more generally applicable approach than the current state of the art in the literature. Future work will extend this hyper-heuristic framework by employing methodologies which are applicable on a wider range of timetabling and scheduling problems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper is concerned with the hybridization of two graph coloring heuristics (Saturation Degree and Largest Degree), and their application within a hyperheuristic for exam timetabling problems. Hyper-heuristics can be seen as algorithms which intelligently select appropriate algorithms/heuristics for solving a problem. We developed a Tabu Search based hyper-heuristic to search for heuristic lists (of graph heuristics) for solving problems and investigated the heuristic lists found by employing knowledge discovery techniques. Two hybrid approaches (involving Saturation Degree and Largest Degree) including one which employs Case Based Reasoning are presented and discussed. Both the Tabu Search based hyper-heuristic and the hybrid approaches are tested on random and real-world exam timetabling problems. Experimental results are comparable with the best state-of-the-art approaches (as measured against established benchmark problems). The results also demonstrate an increased level of generality in our approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An earlier Case-based Reasoning (CBR) approach developed by the authors for educational course timetabling problems employed structured cases to represent the complex relationships between courses. Previous solved cases represented by attribute graphs were organized hierarchically into a decision tree. The retrieval searches for graph isomorphism among these attribute graphs. In this paper, the approach is further developed to solve a wider range of problems. We also attempt to retrieve those graphs that have common similar structures but also have some differences. Costs that are assigned to these differences have an input upon the similarity measure. A large number of experiments are performed consisting of different randomly produced timetabling problems and the results presented here strongly indicate that a CBR approach could provide a significant step forward in the development of automated system to solve difficult timetabling problems. They show that using relatively little effort, we can retrieve these structurally similar cases to provide high quality timetables for new timetabling problems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an investigation of a simple generic hyper-heuristic approach upon a set of widely used constructive heuristics (graph coloring heuristics) in timetabling. Within the hyperheuristic framework, a Tabu Search approach is employed to search for permutations of graph heuristics which are used for constructing timetables in exam and course timetabling problems. This underpins a multi-stage hyper-heuristic where the Tabu Search employs permutations upon a different number of graph heuristics in two stages. We study this graph-based hyper-heuristic approach within the context of exploring fundamental issues concerning the search space of the hyper-heuristic (the heuristic space) and the solution space. Such issues have not been addressed in other hyper-heuristic research. These approaches are tested on both exam and course benchmark timetabling problems and are compared with the fine-tuned bespoke state-of-the-art approaches. The results are within the range of the best results reported in the literature. The approach described here represents a significantly more generally applicable approach than the current state of the art in the literature. Future work will extend this hyper-heuristic framework by employing methodologies which are applicable on a wider range of timetabling and scheduling problems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we use concepts from graph theory and cellular biology represented as ontologies, to carry out semantic mining tasks on signaling pathway networks. Specifically, the paper describes the semantic enrichment of signaling pathway networks. A cell signaling network describes the basic cellular activities and their interactions. The main contribution of this paper is in the signaling pathway research area, it proposes a new technique to analyze and understand how changes in these networks may affect the transmission and flow of information, which produce diseases such as cancer and diabetes. Our approach is based on three concepts from graph theory (modularity, clustering and centrality) frequently used on social networks analysis. Our approach consists into two phases: the first uses the graph theory concepts to determine the cellular groups in the network, which we will call them communities; the second uses ontologies for the semantic enrichment of the cellular communities. The measures used from the graph theory allow us to determine the set of cells that are close (for example, in a disease), and the main cells in each community. We analyze our approach in two cases: TGF-β and the Alzheimer Disease.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this thesis, we define the spectrum problem for packings (coverings) of G to be the problem of finding all graphs H such that a maximum G-packing (minimum G- covering) of the complete graph with the leave (excess) graph H exists. The set of achievable leave (excess) graphs in G-packings (G-coverings) of the complete graph is called the spectrum of leave (excess) graphs for G. Then, we consider this problem for trees with up to five edges. We will prove that for any tree T with up to five edges, if the leave graph in a maximum T-packing of the complete graph Kn has i edges, then the spectrum of leave graphs for T is the set of all simple graphs with i edges. In fact, for these T and i and H any simple graph with i edges, we will construct a maximum T-packing of Kn with the leave graph H. We will also show that for any tree T with k ≤ 5 edges, if the excess graph in a minimum T-covering of the complete graph Kn has i edges, then the spectrum of excess graphs for T is the set of all simple graphs and multigraphs with i edges, except for the case that T is a 5-star, for which the graph formed by four multiple edges is not achievable when n = 12.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This dissertation investigates the connection between spectral analysis and frame theory. When considering the spectral properties of a frame, we present a few novel results relating to the spectral decomposition. We first show that scalable frames have the property that the inner product of the scaling coefficients and the eigenvectors must equal the inverse eigenvalues. From this, we prove a similar result when an approximate scaling is obtained. We then focus on the optimization problems inherent to the scalable frames by first showing that there is an equivalence between scaling a frame and optimization problems with a non-restrictive objective function. Various objective functions are considered, and an analysis of the solution type is presented. For linear objectives, we can encourage sparse scalings, and with barrier objective functions, we force dense solutions. We further consider frames in high dimensions, and derive various solution techniques. From here, we restrict ourselves to various frame classes, to add more specificity to the results. Using frames generated from distributions allows for the placement of probabilistic bounds on scalability. For discrete distributions (Bernoulli and Rademacher), we bound the probability of encountering an ONB, and for continuous symmetric distributions (Uniform and Gaussian), we show that symmetry is retained in the transformed domain. We also prove several hyperplane-separation results. With the theory developed, we discuss graph applications of the scalability framework. We make a connection with graph conditioning, and show the in-feasibility of the problem in the general case. After a modification, we show that any complete graph can be conditioned. We then present a modification of standard PCA (robust PCA) developed by Cand\`es, and give some background into Electron Energy-Loss Spectroscopy (EELS). We design a novel scheme for the processing of EELS through robust PCA and least-squares regression, and test this scheme on biological samples. Finally, we take the idea of robust PCA and apply the technique of kernel PCA to perform robust manifold learning. We derive the problem and present an algorithm for its solution. There is also discussion of the differences with RPCA that make theoretical guarantees difficult.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Graphs are powerful tools to describe social, technological and biological networks, with nodes representing agents (people, websites, gene, etc.) and edges (or links) representing relations (or interactions) between agents. Examples of real-world networks include social networks, the World Wide Web, collaboration networks, protein networks, etc. Researchers often model these networks as random graphs. In this dissertation, we study a recently introduced social network model, named the Multiplicative Attribute Graph model (MAG), which takes into account the randomness of nodal attributes in the process of link formation (i.e., the probability of a link existing between two nodes depends on their attributes). Kim and Lesckovec, who defined the model, have claimed that this model exhibit some of the properties a real world social network is expected to have. Focusing on a homogeneous version of this model, we investigate the existence of zero-one laws for graph properties, e.g., the absence of isolated nodes, graph connectivity and the emergence of triangles. We obtain conditions on the parameters of the model, so that these properties occur with high or vanishingly probability as the number of nodes becomes unboundedly large. In that regime, we also investigate the property of triadic closure and the nodal degree distribution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One way to do a bibliometric study is to examine each of the records that make up a database, each record and extract key areas that may disclose relevant information about the use of the database and documents in the collection . This article shows how a reference database allows to obtain important data that can reach conclusions that in some cases surprising. For this study we used the following fields of Database Control Documentary Indigenous Nationalities of Costa Rica 1979-2003: author, place of publication, publisher, year, language and support. The database analyzed has two thousand records and was developed in the Winisis. Moreover, analysis of documents was made after processing of the data, which was to export records to Excel software Winisis. After this information extracted from their chosen fields and are held by their respective separate chart or graph to present the results obtained. Furthermore, we show the application of different methods to learn more about the scientific aspects as: the Price Index, the Index of Collaboration This contribution will, first, for (as) students in the course of the race Metric Studies of Library and Information Science, National University, demonstrate and practice what you learned in this area. They may also benefit the (as) professionals from different areas, such as anthropologists (as), sociologists (as), linguists and librarians (as), among others (as).