Biblioteca Digital

738 results for Annotation de génomes

La exploración del litoral atlántico norteafricano según el periplo de Hannón de Cartago

Abstract:

We believe that Hanno's periplus, contrary to proposals that interpret it as a literary work, records an authentic voyage that reached only Cape Juby and some of the Canary Islands. The Carthaginian refoundations all took place in fertile Mauretania, during the first 7 days of the expedition. From the islet of Kérne onward, the expedition gave priority to a first evaluative exploration, which indicates that it involved barely 2 or 3 ships with a limited crew that avoided confrontations with the local population. The Lixítai interpreters seem to have known all the points explored: the river Chrétes, the Ethiopians of the coastal High Atlas, the great hot gulf that ended at the Hespérou Kéras, the volcano Theôn Óchema, and the savage people they called Goríllai. Probably the greatest surprise was finding an active volcano emitting lava, which may have been the ultimate reason for writing this periplus. The lack of water, food and game as the reason for ending the exploratory expedition is understandable only for a short journey that reached as far as the beginning of the Sahara desert. The same applies to the absence of major rivers south of the river Chrétes, clear proof that equatorial latitudes were not reached and that the ships gradually moved away from the North African coast.

Grammar in dictionaries revisited: the case of verbs with se

Abstract:

This paper studies how se structures are represented in 20 verb entries across nine dictionaries of the Spanish language. These structures are numerous and are problematic for native and non-native speakers alike. The verbs analysed are of medium-high frequency and, in most cases, highly polysemous, which makes it possible to observe interconnections between the different se structures and the different meanings of each verb. The data from the lexicographic analysis are cross-checked against a corpus analysis of the same units. The results show wide variation, both between and within dictionaries, in the data each dictionary offers and in the way they are presented. The reasons range from the overall theoretical approach of each project to practical execution. This leads to the conclusion that the dictionary model currently in use needs further development in order to present lexico-grammatical phenomena such as se verbs accurately, clearly and exhaustively.

Genome-wide association studies in oesophageal adenocarcinoma and Barrett's oesophagus: a large-scale meta-analysis

Abstract:

Background: Esophageal adenocarcinoma (EA) is one of the fastest rising cancers in western countries. Barrett’s Esophagus (BE) is the premalignant precursor of EA. However, only a subset of BE patients develop EA, which complicates clinical management in the absence of valid predictors. Genetic risk factors for BE and EA are incompletely understood. This study aimed to identify novel genetic risk factors for BE and EA.

Methods: Within an international consortium of groups involved in the genetics of BE/EA, we performed the first meta-analysis of all available genome-wide association studies (GWAS), involving 6,167 BE patients, 4,112 EA patients, and 17,159 representative controls, all of European ancestry, genotyped on Illumina high-density SNP arrays and collected from four separate studies within North America, Europe, and Australia. Meta-analysis was conducted using the fixed-effects inverse variance-weighting approach. We used the standard genome-wide significance threshold of 5×10⁻⁸ for this study. We also conducted an association analysis following reweighting of loci using an approach that investigates annotation enrichment among the genome-wide significant loci. The entire GWAS data set was also analyzed using bioinformatics approaches, including functional annotation databases as well as gene-based and pathway-based methods, in order to identify pathophysiologically relevant cellular pathways.

Findings: We identified eight new associated risk loci for BE and EA, within or near the CFTR (rs17451754, P=4·8×10⁻¹⁰), MSRA (rs17749155, P=5·2×10⁻¹⁰), BLK (rs10108511, P=2·1×10⁻⁹), KHDRBS2 (rs62423175, P=3·0×10⁻⁹), TPPP/CEP72 (rs9918259, P=3·2×10⁻⁹), TMOD1 (rs7852462, P=1·5×10⁻⁸), SATB2 (rs139606545, P=2·0×10⁻⁸), and HTR3C/ABCC5 genes (rs9823696, P=1·6×10⁻⁸). A further novel risk locus at LPA (rs12207195, posterior probability=0·925) was identified after re-weighting using significantly enriched annotations. This study thereby doubled the number of known risk loci. The strongest disease pathways identified (P<10⁻⁶) belong to muscle cell differentiation and to mesenchyme development/differentiation, which fit with current pathophysiological BE/EA concepts. To our knowledge, this study identified for the first time an EA-specific association (rs9823696, P=1·6×10⁻⁸) near HTR3C/ABCC5 which is independent of BE development (P=0·45).

Interpretation: The identified disease loci and pathways reveal new insights into the etiology of BE and EA. Furthermore, the EA-specific association at HTR3C/ABCC5 may constitute a novel genetic marker for the prediction of transition from BE to EA. Mutations in CFTR, one of the new risk loci identified in this study, cause cystic fibrosis (CF), the most common recessive disorder in Europeans. Gastroesophageal reflux (GER) belongs to the phenotypic CF spectrum and represents the main risk factor for BE/EA. Thus, the CFTR locus may trigger a common GER-mediated pathophysiology.
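
The fixed-effects inverse variance-weighting step named in the Methods can be illustrated with a short sketch. The abstract does not describe the consortium's software, so the function name and inputs below are hypothetical; the code only shows the standard textbook estimator for combining per-study effect sizes for one SNP.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Combine per-study effect estimates for one SNP.

    Hypothetical helper, not the study's pipeline: each study is weighted
    by the inverse of its variance (1 / SE^2).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    # Pooled effect: weighted mean of the per-study effects.
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    # Standard error of the pooled effect.
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# A SNP would be called genome-wide significant if p < 5e-8.
beta, se, z, p = fixed_effects_meta([0.1, 0.1], [0.1, 0.1])
```

Studies with smaller standard errors dominate the pooled estimate, which is why large cohorts drive the result in this kind of meta-analysis.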

A general method for human activity recognition in video

samExploreR: exploring reproducibility and robustness of RNA-seq results based on SAM files

Abstract:

MOTIVATION: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results are often unclear. This is in part attributed to the two counteracting goals of (a) a cost-efficient and (b) an optimal experimental design, leading to a compromise, e.g., in the sequencing depth of experiments.

RESULTS: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequencing-depth-related questions for the experimental design. Overall, this provides a systematic way for exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes.

AVAILABILITY: samExploreR is available as an R package from Bioconductor (after acceptance of the paper, download link: http://www.bio-complexity.com/samExploreR_1.0.0.tar.gz).
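
The idea of drawing m out of n reads from a SAM file can be sketched in a few lines. This is not the samExploreR API (the package is written in R and its functions are not described in the abstract). The function name is hypothetical, and sampling without replacement is an assumption made here for illustration.

```python
import random


def subsample_sam_lines(lines, fraction, seed=None):
    """Keep all SAM header lines and a random fraction of alignment lines.

    Hypothetical sketch: header lines start with '@'; alignment lines are
    sampled without replacement and kept in their original order.
    """
    rng = random.Random(seed)
    header = [ln for ln in lines if ln.startswith("@")]
    reads = [ln for ln in lines if not ln.startswith("@")]
    m = round(fraction * len(reads))  # m out of n reads
    keep = sorted(rng.sample(range(len(reads)), m))
    return header + [reads[i] for i in keep]


# Toy input: one header line and ten made-up alignment records.
toy = ["@HD\tVN:1.6"] + [f"read{i}\t0\tchr1\t{i + 1}" for i in range(10)]
half_depth = subsample_sam_lines(toy, 0.5, seed=1)
```

Repeating the draw with different seeds at several fractions gives a family of reduced-depth data sets on which a downstream analysis can be rerun to see how stable its results are.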

Automated Equation Formulation for Causal Loop Diagrams

Abstract:

The annotation of Business Dynamics models with parameters and equations, to simulate the system under study and further evaluate its simulation output, typically involves a lot of manual work. In this paper we present an approach for automated equation formulation of a given Causal Loop Diagram (CLD) and a set of associated time series with the help of neural network evolution (NEvo). NEvo enables the automated retrieval of surrogate equations for each quantity in the given CLD; hence it produces a fully annotated CLD that can be used for later simulations to predict future KPI development. At the end of the paper, we provide a detailed evaluation of NEvo on a business use case to demonstrate its single-step prediction capabilities.

Improving photofermentative hydrogen production through metabolic engineering and DOE (Design of Experiments)

Abstract:

Currently, renewable and environmentally benign biofuels are under intensive study because of growing health problems and the depletion of fossil fuels. H2 is one of the most promising candidates because of its unique characteristics, such as high energy density and low or no generation of pollutants. An attractive way to produce H2 is with photosynthetic bacteria, which can capture light energy to drive H2 production with their nitrogenase system. The main objective of this study was to improve the H2 yield of purple non-sulfur photosynthetic bacteria using a combination of metabolic engineering and design of experiments. One hypothesis is that the H2 yield could be improved by redirecting flux from the Calvin-Benson-Bassham cycle toward the nitrogenase system, which catalyzes the reduction of protons to H2. Thus, a PRK (phosphoribulose kinase) knock-out mutant of Rhodobacter capsulatus JP91 was created. Analysis of growth on different carbon sources showed that this mutant can grow only on acetate, without, however, producing H2. A spontaneous mutant, YL1, was recovered that retained the original cbbP (encoding PRK) mutation but had acquired the ability to grow on glucose and produce H2. A study of H2 production under different illumination levels showed that the yield of YL1 was 20-40% higher than that of the wild-type strain JP91. However, there was no notable improvement in the H2 production rate. A kinetic study showed that growth and hydrogen production are strongly linked, with electrons from glucose directed mainly toward H2 production and biomass formation.
Under low to intermediate light intensities, production of organic acids is substantial, suggesting that further improvement of the H2 yield might be possible through process optimization. In a related series of experiments, another spontaneous mutant, YL2, which has a phenotype similar to YL1, was tested for growth in a medium containing ammonium. The results showed that YL2 can grow only with acetate as the carbon source, again without producing H2. Prolonged incubation in media that do not support the growth of YL2 allowed the isolation of two interesting secondary spontaneous mutants, YL3 and YL4. Western blot analysis showed that both strains express nitrogenase constitutively over a range of ammonium concentrations. The genomes of YL2, YL3 and YL4 were sequenced to find the mutations responsible for this phenomenon. Interestingly, mutations in nifA1 and nifA2 were found in both YL3 and YL4. It is likely that a conformational change in NifA alters the protein-protein interaction between NifA and PII proteins (such as GlnB or GlnK), allowing it to escape regulation by ammonium and thus to activate transcription of nitrogenase in the presence of ammonium. It is not known how the synthesized nitrogenase is able to maintain its activity, because in theory it should also be subject to post-translational regulation by ammonium. Further evidence could be obtained by studying the transcriptomes of YL3 and YL4. A first study of H2 production by YL3 and YL4 showed that they are capable of much greater hydrogen production than JP91 in ammonium-containing medium, which opens the door to future studies with these strains using ammonium-containing wastes as substrates.
Finally, the biological reforming of ethanol to H2 with the photosynthetic bacterium Rhodopseudomonas palustris CGA009 was examined. Ethanol production by microbial fermentation of renewable resources has been treated as a mature technology. However, most studies of ethanol reforming to H2 have focused on chemical steam reforming, which generally requires a high energy input and results in toxic gas emissions. Thus, biological reforming of ethanol to H2 with photosynthetic bacteria, which can capture light to meet the energy requirements of this reaction, appears more promising. A previous study demonstrated hydrogen production from ethanol; however, neither the yield nor the duration of this reaction was examined. A response surface methodology (RSM) analysis was carried out in which the levels of three main factors (light intensity, ethanol concentration and glutamate concentration) were varied. Our results showed that nearly 2 moles of H2 can be obtained from one mole of ethanol, 33% of what is theoretically possible.
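
The 33% figure can be checked against the stoichiometry of complete ethanol reforming. The abstract does not state which reaction defines the theoretical maximum. The worked ratio below assumes full oxidation of ethanol to CO2, which gives 6 mol of H2 per mol of ethanol.

```latex
\[
\mathrm{C_2H_5OH} + 3\,\mathrm{H_2O} \longrightarrow 2\,\mathrm{CO_2} + 6\,\mathrm{H_2},
\qquad
\frac{2\ \text{mol H}_2\ \text{(observed)}}{6\ \text{mol H}_2\ \text{(theoretical)}} \approx 33\%.
\]
```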

Étude du génome chloroplastique des algues vertes de la classe Chlorophyceae

Abstract:

Unicellular algae of the class Chlorophyceae are studied in particular for their economic potential in biofuel production. The first taxonomy of this class was established with the advent of electron microscopy and subsequently with molecular phylogenies. This lineage is divided into two groups: OCC (Oedogoniales + Chaetophorales + Chaetopeltidales) and CS (Chlamydomonadales + Sphaeropleales). There are deep uncertainties about the phylogenetic positions of the organisms at the base of the CS group. To strengthen the phylogeny of these organisms, the chloroplast genomes of five basal algae were sequenced using 454 next-generation technology and assembled de novo. A phylogenetic analysis of 69 protein sequences showed that three of the five organisms classified in the order Chlamydomonadales by the current literature are in fact basal in the order Sphaeropleales. This phylogenetic reclassification implies new hypotheses about the evolution of flagellar bodies.

Trans-ethnic fine mapping highlights kidney-function genes linked to salt sensitivity

Abstract:

We analyzed genome-wide association studies (GWASs), including data from 71,638 individuals from four ancestries, for estimated glomerular filtration rate (eGFR), a measure of kidney function used to define chronic kidney disease (CKD). We identified 20 loci attaining genome-wide-significant evidence of association (p < 5 × 10⁻⁸) with kidney function and highlighted that allelic effects on eGFR at lead SNPs are homogeneous across ancestries. We leveraged differences in the pattern of linkage disequilibrium between diverse populations to fine-map the 20 loci through construction of "credible sets" of variants driving eGFR association signals. Credible variants at the 20 eGFR loci were enriched for DNase I hypersensitivity sites (DHSs) in human kidney cells. DHS credible variants were expression quantitative trait loci for NFATC1 and RGS14 (at the SLC34A1 locus) in multiple tissues. Loss-of-function mutations in ancestral orthologs of both genes in Drosophila melanogaster were associated with altered sensitivity to salt stress. Renal mRNA expression of Nfatc1 and Rgs14 in a salt-sensitive mouse model was also reduced after exposure to a high-salt diet or induced CKD. Our study (1) demonstrates the utility of trans-ethnic fine mapping through integration of GWASs involving diverse populations with genomic annotation from relevant tissues to define molecular mechanisms by which association signals exert their effect and (2) suggests that salt sensitivity might be an important marker for biological processes that affect kidney function and CKD in humans.
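
The construction of a "credible set" can be illustrated with a common recipe: rank the variants at a locus by posterior probability of driving the signal and keep the smallest group whose cumulative probability reaches a chosen coverage. The abstract does not give the study's exact procedure, so the function name, the coverage level and the variant labels below are assumptions for illustration only.

```python
def credible_set(posteriors, coverage=0.99):
    """Return the smallest set of variants reaching the target coverage.

    Hypothetical sketch: `posteriors` maps variant IDs to (possibly
    unnormalized) posterior probabilities at a single locus.
    """
    total = sum(posteriors.values())
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total  # normalize on the fly
        if cumulative >= coverage:
            break
    return chosen


# Made-up variant labels and probabilities, not data from the study.
locus = {"variant_a": 0.90, "variant_b": 0.07, "variant_c": 0.02, "variant_d": 0.01}
narrow = credible_set(locus, coverage=0.95)
```

When linkage disequilibrium differs between populations, combining them tends to concentrate posterior probability on fewer variants, which is what makes the resulting sets smaller.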

Learning to Interpret and Generate Instructional Recipes

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Étude de l'intégrité du génome chloroplastique de l'orge (Hordeum vulgare) en culture de microspores isolées

Abstract:

A current challenge in biotechnology is to obtain doubled haploid plants through the isolated microspore culture (IMC) technique. However, IMC sometimes generates a substantial proportion of albino plants, which can reach 100% in some cultivars. Previous work indicated that rearrangements of the chloroplast genome could be the cause of this albinism. To better understand the process leading to albinism, we set out to study the integrity of the chloroplast genome in barley microspores and albino plants using a large-scale sequencing approach. Total DNA extracted from microspores at an early stage of IMC, from a leaf of the mother plant (control), and from albino leaves was sequenced, and the chloroplast sequences were analyzed. This allowed us to document, for the first time, a decrease in chloroplast DNA in microspores. In addition, a study of structural variation demonstrated a generalized reduction in the quantity of chloroplast genomes in microspores. Finally, major rearrangements of the chloroplast genome were observed in albino plants, revealing a high abundance of altered chloroplast genomes in linear form.

Analyse transcriptomique de l'interaction tripartite "Pseudozyma flocculosa-Blumeria graminis f.sp. hordei-Hordeum vulgare"

Abstract:

To improve our agricultural practices in the context of sustainable agriculture, several biological control agents (BCAs) have been developed, tested and are now used around the world to combat yield losses caused by diseases. Blumeria graminis f. sp. hordei (Bgh) is the pathogen responsible for barley powdery mildew and can reduce the yields of this crop by up to 40%. An epiphytic fungus, Pseudozyma flocculosa, was discovered and identified in 1987 in close association with clover powdery mildew. Researchers then noticed that this fungus exhibited strong antagonistic activity against powdery mildew by destroying the structures of the pathogen. Following further work, it appeared that this antagonistic behavior was directed against all members of the Erysiphales and seemed to be linked to the synthesis of an antifungal glycolipid, flocculosin. However, the efficacy of the BCA has still not been linked to the production of this glycolipid. These observations suggest that other factors are involved when the two protagonists, the BCA and the powdery mildew, are in contact. The main objective of this project was therefore to look for other molecular mechanisms that could explain the interaction among P. flocculosa, powdery mildew and barley by carrying out a complete transcriptomic analysis of the three protagonists at the same time. The tripartite interaction was sampled at different times following inoculation of P. flocculosa onto barley leaves already showing a powdery mildew intensity of about 50%. The leaf samples collected were then used for RNA extraction, and the RNA was converted to cDNA for library preparation. Five replicates were performed for each time point, and everything was sequenced using Illumina HiSeq sequencing by synthesis. The sequences obtained (reads) were then analyzed using the CLC Genomics Workbench software.
Briefly, the sequences obtained were mapped to the three reference genomes. After mapping, expression analyses were conducted and differentially expressed genes were sought. This step was carried out with particular attention to genes encoding a group of proteins called CSEPs ("candidate secreted effector proteins"), which may be involved in the tripartite interaction. Among the proteins differentially expressed in the presence or absence of powdery mildew, we found that certain CSEPs were strongly expressed in the presence of powdery mildew. These results are promising and offer a definite lead for elucidating the mechanisms involved in this tripartite interaction.

Étude de la plasticité génomique des algues vertes de l’ordre Chlamydomonadales

Abstract:

Recent advances in genomics have confirmed the complexity of the origin of algae; from the standpoint of the phylogeny of the endosymbiosis hosts, algae form a polyphyletic evolutionary group. Green algae form two major divisions: the Streptophyta and the Chlorophyta. The chlorophytes comprise the majority of known green algae and are grouped into four classes. The first, the Prasinophyceae, occupies the most basal position, while the branching order of the other three classes (Ulvophyceae, Trebouxiophyceae and Chlorophyceae) remains uncertain. To clarify evolutionary relationships within the Chlorophyceae, eight chloroplast genomes belonging to the Chlamydomonadales lineage, a major lineage of the Chlorophyceae, were sequenced and analyzed. Phylogenetic studies confirmed the pre-established classifications, and new clades were formed. The genomes of these chlorophycean algae revealed a conserved architecture with a number of characters specific to the Chlamydomonadales. Analysis of their molecular characters revealed genomes marked by the reduction or rearrangement of their genomic repertoire compared with the chloroplast genomes of more ancestral green algae.

Transactions and schema evolution in a persistent object-oriented programming system

Abstract:

Applications are subject to a continuous evolution process with a profound impact on their underlying data model, hence requiring frequent updates to the applications' class structure and to the database structure as well. This twofold problem, schema evolution and instance adaptation, usually known as database evolution, is addressed in this thesis. Additionally, we address concurrency and error recovery problems with a novel meta-model and its aspect-oriented implementation. Modern object-oriented databases provide features that help programmers deal with object persistence, as well as all related problems such as database evolution, concurrency and error handling. In most systems there are transparent mechanisms to address these problems; nonetheless, the database evolution problem still requires some human intervention, which consumes much of programmers' and database administrators' work effort. Earlier research works have demonstrated that aspect-oriented programming (AOP) techniques enable the development of flexible and pluggable systems. In these earlier works, the schema evolution and the instance adaptation problems were addressed as database management concerns. However, none of this research focused on orthogonal persistent systems. We argue that AOP techniques are well suited to address these problems in orthogonal persistent systems. Regarding concurrency and error recovery, earlier research showed that only syntactic obliviousness between the base program and aspects is possible. Our meta-model and framework follow an aspect-oriented approach focused on the object-oriented orthogonal persistent context. The proposed meta-model is characterized by its simplicity in order to achieve efficient and transparent database evolution mechanisms. Our meta-model supports multiple versions of a class structure by applying a class versioning strategy, thus enabling bidirectional application compatibility among versions of each class structure.
That is to say, the database structure can be updated because earlier applications continue to work, as do later applications that know only the updated class structure. The specific characteristics of orthogonal persistent systems, as well as a metadata enrichment strategy within the application's source code, complete the inception of the meta-model and have motivated our research work. To test the feasibility of the approach, a prototype was developed. Our prototype is a framework that mediates the interaction between applications and the database, providing them with orthogonal persistence mechanisms. These mechanisms are introduced into applications as an aspect in the aspect-oriented sense. Objects are not required to extend any superclass, implement an interface or carry a particular annotation. Parametric type classes are also correctly handled by our framework. However, classes that belong to the programming environment must not be handled as versionable due to restrictions imposed by the Java Virtual Machine. Regarding concurrency support, the framework provides applications with a multithreaded environment that supports database transactions and error recovery. The framework keeps applications oblivious to the database evolution problem, as well as to persistence. Programmers can update the applications' class structure because the framework will produce a new version for it at the database metadata layer. Using our XML-based pointcut/advice constructs, the framework's instance adaptation mechanism is extended, hence keeping the framework also oblivious to this problem. The potential development gains provided by the prototype were benchmarked. In our case study, the results confirm that the mechanisms' transparency has positive repercussions on programmer productivity, simplifying the entire evolution process at the application and database levels.
The meta-model itself was also benchmarked in terms of complexity and agility. Compared with other meta-models, it requires fewer meta-object modifications in each schema evolution step. Other types of tests were carried out in order to validate the robustness of the prototype and the meta-model. To perform these tests, we used a small OO7 database because of the complexity of its data model. Since the developed prototype offers some features that were not observed in other known systems, performance benchmarks were not possible. However, the developed benchmark is now available for future performance comparisons with equivalent systems. To test our approach in a real-world scenario, we developed a proof-of-concept application. This application was developed without any persistence mechanisms. Using our framework and minor changes to the application's source code, we added these mechanisms. Furthermore, we tested the application in a schema evolution scenario. This real-world experience using our framework showed that applications remain oblivious to persistence and database evolution. In this case study, our framework proved to be a useful tool for programmers and database administrators. Performance issues and the single Java Virtual Machine concurrency model are the major limitations found in the framework.
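
The class versioning and instance adaptation ideas in this abstract can be made concrete with a toy sketch. It is written in Python rather than Java, uses no aspect-oriented machinery, and every name in it is hypothetical. It only illustrates how a store that tags each object with the class version that wrote it, plus registered conversion functions, can serve both older and newer applications.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Adapter = Callable[[dict], dict]


@dataclass
class VersionedStore:
    """Toy object store with per-class versions (not the thesis's framework)."""
    adapters: Dict[Tuple[str, int, int], Adapter] = field(default_factory=dict)
    objects: Dict[str, Tuple[str, int, dict]] = field(default_factory=dict)

    def register_adapter(self, cls: str, src: int, dst: int, fn: Adapter) -> None:
        # Conversion from class version `src` to class version `dst`.
        self.adapters[(cls, src, dst)] = fn

    def save(self, key: str, cls: str, version: int, state: dict) -> None:
        self.objects[key] = (cls, version, dict(state))

    def load(self, key: str, wanted_version: int) -> dict:
        cls, stored_version, state = self.objects[key]
        if stored_version == wanted_version:
            return dict(state)
        # Instance adaptation: convert to the version the caller knows.
        return self.adapters[(cls, stored_version, wanted_version)](dict(state))


def split_name(state: dict) -> dict:
    first, _, last = state["name"].partition(" ")
    return {"first": first, "last": last}


def join_name(state: dict) -> dict:
    return {"name": f"{state['first']} {state['last']}"}


store = VersionedStore()
store.register_adapter("Person", 1, 2, split_name)  # old data, new application
store.register_adapter("Person", 2, 1, join_name)   # new data, old application
store.save("p1", "Person", 1, {"name": "Ada Lovelace"})
store.save("p2", "Person", 2, {"first": "Alan", "last": "Turing"})
```

Registering adapters in both directions is what gives the bidirectional compatibility the abstract describes: an application built against version 1 can read objects written by version 2, and the reverse.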

Top-K Query Processing in Edge-Labeled Graph Data

Abstract:

Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. 
The second model, called Vertex Importance Query (VIQ), identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns the top-k of them according to user-specified approximation terms and scoring functions. In the fourth model, called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of answers, such as the number of mapped blocks and vertices' properties in each block, and the top-k most probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation they consider important, (ii) how to score answers, irrespective of whether they are vertices or substitutions, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.
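
The general shape of top-k retrieval with pruning can be sketched as follows. This is not one of the four models or the algorithms proposed in the thesis, whose details the abstract does not give. It is a generic illustration, with hypothetical names, of keeping the k best answers in a min-heap and skipping any candidate whose cheap upper bound cannot beat the current k-th best score.

```python
import heapq


def top_k(candidates, k, score, upper_bound):
    """Return the k highest-scoring candidates, best first.

    Hypothetical sketch: `upper_bound(c)` must never be smaller than
    `score(c)` and should be much cheaper to compute.
    """
    heap = []  # min-heap of (score, arrival index, candidate)
    for index, cand in enumerate(candidates):
        if len(heap) == k and upper_bound(cand) <= heap[0][0]:
            continue  # pruned: cannot enter the current top k
        s = score(cand)
        if len(heap) < k:
            heapq.heappush(heap, (s, index, cand))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, index, cand))
    ranked = sorted(heap, key=lambda item: (-item[0], item[1]))
    return [cand for _, _, cand in ranked]


# Toy answers: (label, part_a, part_b); the score is the sum of both parts,
# and part_b is assumed to be at most 30, giving the bound part_a + 30.
answers = [("b", 50, 1), ("d", 40, 30), ("a", 10, 5), ("c", 1, 1)]
best = top_k(answers, 2, lambda c: c[1] + c[2], lambda c: c[1] + 30)
```

In this toy run the last two candidates are never fully scored, because their bounds fall below the second-best score already found. This is the kind of saving pruning is meant to deliver on large graphs.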
