952 results for BIOINFORMATICS DATABASES
Abstract:
Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric form. Retrieving information constrained on both spatial and aspatial data from geodatabases gives GIS users the ability to perform more interesting spatial analyses and lets applications support composite location-aware searches; for example, in a real estate database: “Find the homes for sale nearest to my current location that have a backyard and whose prices are between $50,000 and $80,000”. Efficient processing of such queries requires combined indexing strategies for multiple types of data. Existing spatial query engines commonly apply a two-filter approach (a spatial filter followed by a nonspatial filter, or vice versa), which can incur large performance overheads. More recently, the amount of geolocation data in databases has grown rapidly, due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data with objects or events. This growth poses data-ingestion challenges for practical GIS databases handling large data volumes. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled with MapReduce, a widely adopted parallel programming model for data-intensive problems. The evaluation of our algorithms on a Hadoop cluster showed close-to-linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. To that end, novel techniques are developed for simultaneously indexing spatial data together with textual and numeric data.
Experimental evaluations with real-world, large spatial datasets measured query response times within the sub-second range in most cases, and up to a few seconds in a small number of cases, which is reasonable for interactive applications. Overall, these results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and that an appropriate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.
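The two-filter plan the abstract contrasts against can be sketched in a few lines. The listing data, field layout, and function names below are hypothetical, and a real engine would use an R-tree rather than a linear scan for the spatial step:

```python
from math import hypot

# Hypothetical real-estate listings: (id, x, y, has_backyard, price).
listings = [
    (1, 0.5, 0.5, True, 65_000),
    (2, 0.6, 0.4, False, 70_000),
    (3, 5.0, 5.0, True, 55_000),
    (4, 0.7, 0.6, True, 120_000),
]

def two_filter_query(records, loc, radius, lo, hi):
    """Two-filter plan: spatial filter first, then the aspatial predicate."""
    # Filter 1 (spatial): keep candidates within `radius` of `loc`.
    nearby = [r for r in records if hypot(r[1] - loc[0], r[2] - loc[1]) <= radius]
    # Filter 2 (aspatial): backyard required and price within [lo, hi].
    return [r for r in nearby if r[3] and lo <= r[4] <= hi]

result = two_filter_query(listings, (0.0, 0.0), 1.0, 50_000, 80_000)
```

Materializing and re-scanning the intermediate candidate set is what makes the two-filter plan costly at scale; the combined spatial/aspatial indexes developed in the dissertation aim to prune on both conditions at once.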
Abstract:
The etiology of central nervous system tumors (CNSTs) is mainly unknown. Aside from extremely rare genetic conditions, such as neurofibromatosis and tuberous sclerosis, the only unequivocally identified risk factor is exposure to ionizing radiation, and this explains only a very small fraction of cases. Using meta-analysis, gene networking and bioinformatics methods, this dissertation explored the hypothesis that environmental exposures produce genetic and epigenetic alterations that may be involved in the etiology of CNSTs. A meta-analysis of epidemiological studies of pesticides and pediatric brain tumors revealed a significantly increased risk of brain tumors among children whose mothers had farm-related exposures during pregnancy. A dose response was recognized when this risk estimate was compared to those for risk of brain tumors from maternal exposure to non-agricultural pesticides during pregnancy, and risk of brain tumors among children exposed to agricultural activities. Through meta-analysis of several microarray studies that compared normal tissue to astrocytomas, we were able to identify a list of 554 genes that were differentially expressed in the majority of astrocytomas. Many of these genes have in fact been implicated in the development of astrocytoma, including EGFR, HIF-1α, c-Myc, WNT5A, and IDH3A. Reverse engineering of these 554 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grades I-IV), and ‘key genes’ within each grade were identified. The genes found to be most influential in the development of the highest grade of astrocytoma, glioblastoma multiforme (GBM), were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. Lastly, bioinformatics analysis of environmental databases and curated published results on GBM identified numerous potential pathways and gene-environment interactions that may play key roles in astrocytoma development.
Findings from this research have strong potential to advance our understanding of the etiology and susceptibility to CNSTs. Validation of our ‘key genes’ and pathways could potentially lead to useful tools for early detection and novel therapeutic options for these tumors.
Abstract:
Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in applications such as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem with a new transaction scheduling technique that allows a large transaction to commit safely with a probability that can exceed that of regular optimistic concurrency algorithms by several orders of magnitude. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented. This dissertation also proposes a new query optimization technique (lazy queries). Lazy queries form an adaptive query execution scheme that optimizes itself as the query runs. They can be used to find an intersection of sub-queries very efficiently, without requiring full execution of large sub-queries or any statistical knowledge about the data. Finally, an efficient optimistic concurrency control algorithm used in a massively parallel B-tree with variable-length keys is introduced. B-trees with variable-length keys can be used effectively in a variety of database types; in particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. It ensures serializability and external consistency by using logical clocks and backward validation of transactional queries.
A formal proof of correctness of the proposed algorithm is also presented.
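Backward validation with logical clocks, as named in the abstract, can be illustrated with a minimal in-memory sketch; the class and method names are illustrative and omit the B-tree, variable-length keys, and semantic virtual locks of the actual system:

```python
import itertools

class BackwardValidationScheduler:
    """Minimal sketch of optimistic concurrency with backward validation:
    a committing transaction is checked against the write sets of
    transactions that committed after it started."""

    def __init__(self):
        self.clock = itertools.count(1)   # logical clock
        self.committed = []               # list of (commit_ts, write_key_set)
        self.data = {}                    # the "database"

    def begin(self):
        return {"start_ts": next(self.clock), "reads": set(), "writes": {}}

    def read(self, txn, key):
        txn["reads"].add(key)
        return txn["writes"].get(key, self.data.get(key))

    def write(self, txn, key, value):
        txn["writes"][key] = value        # buffered until commit (optimistic)

    def commit(self, txn):
        # Backward validation: abort if any transaction that committed after
        # this one started wrote a key this one read.
        for commit_ts, writes in self.committed:
            if commit_ts > txn["start_ts"] and writes & txn["reads"]:
                return False              # stale read detected: abort
        self.data.update(txn["writes"])
        self.committed.append((next(self.clock), set(txn["writes"])))
        return True

sched = BackwardValidationScheduler()
t1 = sched.begin()
t2 = sched.begin()
sched.read(t1, "x")          # t1 reads x
sched.write(t2, "x", 5)      # t2 writes x...
ok2 = sched.commit(t2)       # ...and commits first
ok1 = sched.commit(t1)       # t1 must abort: its read of x is now stale
```

The point of the dissertation's scheduling technique is precisely to keep large read-mostly transactions like `t1` from aborting at this rate.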
Abstract:
Current technology permits connecting local networks via high-bandwidth telephone lines. Central coordinator nodes may use Intelligent Networks to manage data flow over dialed data lines (e.g., ISDN) and to establish connections between LANs. This dissertation focuses on cost minimization and on establishing operational policies for query distribution over heterogeneous, geographically distributed databases. Based on our study of query distribution strategies, public network tariff policies, and database interface standards, we propose methods for communication cost estimation, strategies for reducing bandwidth allocation, and guidelines for central-to-node communication protocols. Our conclusion is that dialed data lines offer a cost-effective alternative for implementing distributed database query systems, and that existing commercial software may be adapted to support query processing in heterogeneous distributed database systems.
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Abstract:
During the summer of 2016, Duke University Libraries staff began a project to update the way that research databases are displayed on the library website. The new research databases page is a customized version of the default A-Z list that Springshare provides for its LibGuides content management system. Duke Libraries staff made adjustments to the content and interface of the page. In order to see how Duke users navigated the new interface, usability testing was conducted on August 9th, 2016.
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Abstract:
Background: Esophageal adenocarcinoma (EA) is one of the fastest-rising cancers in Western countries. Barrett’s Esophagus (BE) is the premalignant precursor of EA. However, only a subset of BE patients develop EA, which complicates clinical management in the absence of valid predictors. Genetic risk factors for BE and EA are incompletely understood. This study aimed to identify novel genetic risk factors for BE and EA. Methods: Within an international consortium of groups involved in the genetics of BE/EA, we performed the first meta-analysis of all available genome-wide association studies (GWAS), involving 6,167 BE patients, 4,112 EA patients, and 17,159 representative controls, all of European ancestry, genotyped on Illumina high-density SNP arrays and collected from four separate studies within North America, Europe, and Australia. Meta-analysis was conducted using the fixed-effects inverse variance-weighting approach. We used the standard genome-wide significance threshold of 5×10^-8 for this study. We also conducted an association analysis following re-weighting of loci using an approach that investigates annotation enrichment among the genome-wide significant loci. The entire GWAS data set was also analyzed using bioinformatics approaches, including functional annotation databases as well as gene-based and pathway-based methods, in order to identify pathophysiologically relevant cellular pathways. Findings: We identified eight new risk loci associated with BE and EA, within or near the CFTR (rs17451754, P=4.8×10^-10), MSRA (rs17749155, P=5.2×10^-10), BLK (rs10108511, P=2.1×10^-9), KHDRBS2 (rs62423175, P=3.0×10^-9), TPPP/CEP72 (rs9918259, P=3.2×10^-9), TMOD1 (rs7852462, P=1.5×10^-8), SATB2 (rs139606545, P=2.0×10^-8), and HTR3C/ABCC5 (rs9823696, P=1.6×10^-8) genes. A further novel risk locus at LPA (rs12207195, posterior probability=0.925) was identified after re-weighting using significantly enriched annotations. This study thereby doubled the number of known risk loci.
The strongest disease pathways identified (P<10^-6) belong to muscle cell differentiation and to mesenchyme development/differentiation, which fit with current pathophysiological BE/EA concepts. To our knowledge, this study identified for the first time an EA-specific association (rs9823696, P=1.6×10^-8) near HTR3C/ABCC5 that is independent of BE development (P=0.45). Interpretation: The identified disease loci and pathways reveal new insights into the etiology of BE and EA. Furthermore, the EA-specific association at HTR3C/ABCC5 may constitute a novel genetic marker for predicting the transition from BE to EA. Mutations in CFTR, one of the new risk loci identified in this study, cause cystic fibrosis (CF), the most common recessive disorder in Europeans. Gastroesophageal reflux (GER) belongs to the phenotypic CF spectrum and represents the main risk factor for BE/EA. Thus, the CFTR locus may trigger a common GER-mediated pathophysiology.
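The fixed-effects inverse variance-weighting approach named in the Methods can be sketched directly from its definition; the per-study effect sizes and standard errors below are invented for illustration:

```python
from math import sqrt

def fixed_effects_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis:
    pooled beta = sum(w_i * b_i) / sum(w_i), with w_i = 1 / se_i^2."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical per-study log-odds ratios and standard errors for one SNP.
beta, se = fixed_effects_meta([0.20, 0.25, 0.15], [0.05, 0.08, 0.06])
z = beta / se  # |z| is compared against the genome-wide threshold (P < 5e-8)
```

Studies with smaller standard errors (larger samples) dominate the pooled estimate, which is the intended behavior when combining GWAS of very different sizes.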
Abstract:
Background: There is a lack of reliable data on the epidemiology and the associated burden and costs of asthma. We sought to provide the first UK-wide estimates of the epidemiology, healthcare utilisation and costs of asthma.
Methods: We obtained and analysed asthma-relevant data from 27 datasets: these comprised national health surveys for 2010-11, and routine administrative, health and social care datasets for 2011-12; 2011-12 costs were estimated in pounds sterling using economic modelling.
Results: The prevalence of asthma depended on the definition and data source used. The UK lifetime prevalence of patient-reported symptoms suggestive of asthma was 29.5 % (95 % CI, 27.7-31.3; n = 18.5 million (m) people) and 15.6 % (14.3-16.9, n = 9.8 m) for patient-reported clinician-diagnosed asthma. The annual prevalence of patient-reported clinician-diagnosed-and-treated asthma was 9.6 % (8.9-10.3, n = 6.0 m) and of clinician-reported, diagnosed-and-treated asthma 5.7 % (5.7-5.7; n = 3.6 m). Asthma resulted in at least 6.3 m primary care consultations, 93,000 hospital in-patient episodes, 1800 intensive-care unit episodes and 36,800 disability living allowance claims. The costs of asthma were estimated to be at least £1.1 billion: 74 % of these costs were for provision of primary care services (60 % prescribing, 14 % consultations), 13 % for disability claims, and 12 % for hospital care. There were 1160 asthma deaths.
Conclusions: Asthma is very common and is responsible for considerable morbidity, healthcare utilisation and financial costs to the UK public sector. Greater policy focus on primary care provision is needed to reduce the risk of asthma exacerbations, hospitalisations and deaths, and reduce costs.
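The reported cost shares can be turned into rough absolute figures. This sketch assumes the £1.1 billion lower-bound total and treats the prescribing and consultation percentages as shares of the total (they sum to the 74 % primary-care share):

```python
total_cost = 1.1e9  # £1.1 billion, the study's lower-bound estimate

# Component shares as reported (they sum to 99% of the total).
shares = {"primary care": 0.74, "disability claims": 0.13, "hospital care": 0.12}
costs = {component: total_cost * share for component, share in shares.items()}

# Prescribing and consultations are read here as shares of the total cost,
# which together make up the primary-care component (60% + 14% = 74%).
prescribing = total_cost * 0.60
consultations = total_cost * 0.14
```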
Abstract:
The last decades of the 20th century saw the advent of genetic engineering, culminating in the development of techniques such as PCR and Sanger sequencing. This permitted the emergence of new techniques for sequencing whole genomes, known as next-generation sequencing. One of the many applications of these techniques is the in silico search for new secondary metabolites with antimicrobial properties synthesized by microorganisms. Peptide antibiotics can be classified into two classes according to their biosynthesis: ribosomal and nonribosomal peptides. Lanthipeptides are the most studied ribosomal peptides and are characterized by the presence of lanthionine and methyllanthionine residues that result from post-translational modifications. Lanthipeptides are divided into four classes, depending on their biosynthetic machinery. In class I, a LanB enzyme dehydrates serine and threonine residues in the C-terminal region of the precursor peptide; these residues then undergo a cyclization step performed by a LanC enzyme, forming the lanthionine rings. The cleavage and transport of the peptide are achieved by the LanP and LanT enzymes, respectively. In class II, by contrast, a single enzyme, LanM, is responsible for both the dehydration and cyclization steps, and a single enzyme, LanT, performs both cleavage and transport. Pedobacter sp. NL19 is a Gram-negative bacterium isolated from the sludge of an abandoned uranium mine in Viseu (Portugal). Antibacterial activity was detected in vitro against several Gram-positive and Gram-negative bacteria. Sequencing and in silico analysis of the NL19 genome revealed the presence of 21 biosynthetic clusters for secondary metabolites, including nonribosomal and ribosomal peptide biosynthetic clusters. Four lanthipeptide clusters were predicted, comprising the precursor peptides, the modifying enzymes (LanB and LanC), and a bifunctional LanT.
This result revealed the hybrid nature of the clusters, which combine characteristics from two distinct classes and are poorly described in the literature. Phylogenetic analysis of their enzymes showed that they cluster within the Bacteroidetes clade. Furthermore, hybrid gene clusters were also found in other species of this phylum, revealing that they are a common characteristic of this group. Finally, analysis of NL19 colonies by MALDI-TOF MS allowed the identification of a 3180 Da mass that corresponds to the predicted mass of a lanthipeptide encoded in one of the clusters. However, this result is not fully conclusive, and further experiments are needed to understand the full potential of the compounds encoded in this type of cluster. In conclusion, the NL19 strain has the potential to produce diverse secondary metabolites, including lanthipeptides that have not been functionally characterized so far.
Abstract:
Database schemas, in many organizations, are considered one of the critical assets to be protected. From database schemas it is possible to infer not only the information being collected but also the way organizations manage their businesses and/or activities. One of the ways database schemas are disclosed is through Create, Read, Update and Delete (CRUD) expressions. Their use can follow strict security rules or be exploited by malicious users. In the first case, users are required to master database schemas; this can be critical when applications that access the database directly, which we call database interface applications (DIAs), are developed by third-party organizations via outsourcing. In the second case, users can disclose database schemas, partially or totally, through malicious algorithms based on CRUD expressions. To overcome this vulnerability, we propose a new technique in which CRUD expressions can no longer be directly manipulated by DIAs. Whenever a DIA starts up, the associated database server generates a random codified token for each CRUD expression and sends it to the DIA; the DIA then presents the token to the database server to execute the corresponding CRUD expression. To validate our proposal, we present a conceptual architectural model and a proof of concept.
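A minimal sketch of the token scheme described above, with all names and the SQL placeholder invented for illustration: the server keeps the token-to-CRUD mapping, and the DIA only ever sees opaque tokens, never the expressions themselves.

```python
import secrets

class CrudTokenServer:
    """Sketch: the server maps random tokens to CRUD expressions at DIA
    start-up, so the client application never handles raw SQL."""

    def __init__(self, crud_expressions):
        # token -> CRUD expression; only the tokens are sent to the DIA.
        self.registry = {secrets.token_hex(16): expr for expr in crud_expressions}

    def tokens_for_client(self):
        return list(self.registry)  # what the DIA receives at start-up

    def execute(self, token, params):
        expr = self.registry.get(token)
        if expr is None:
            raise PermissionError("unknown token")
        # Placeholder for real parameterized execution against the database.
        return f"executing: {expr} with {params}"

server = CrudTokenServer(["SELECT name FROM users WHERE id = ?"])
tok = server.tokens_for_client()[0]
reply = server.execute(tok, (1,))
```

Because the tokens are random and regenerated at each start-up, a malicious DIA cannot enumerate CRUD expressions to reconstruct the schema.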
Abstract:
Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs, and the overall TALE diversity in Xanthomonas spp., are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present 'AnnoTALE', a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities. © 2016, Nature Publishing Group. All rights reserved.
Abstract:
Automation essentially means replacing humans in their job functions with a combination of people and automatic machinery; that is, documentation specialists trained in computing, together with computers, are the cornerstone of any modern documentation and information system. From this point of view, the problem immediately arises of deciding which resources should be applied to solve the specific problem in each particular case. We will not propose quick fixes or recipes for deciding what to do in every case; the solution must be worked out anew for each particular problem. What we want is to put forward some points that can serve as a basis for reflection and help in finding the best possible solution, once the problem has been correctly defined. The first thing to do before starting any automation project is to define exactly the domain it is intended to cover and to assess its importance as precisely as possible.
Abstract:
Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.
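Ensemble feature selection, one of the combinations ArrayMining.net offers, can be illustrated with a simple mean-rank aggregation across methods; the actual application uses richer algorithms, and the gene rankings below are invented:

```python
def ensemble_feature_selection(rankings, top_k=3):
    """Combine feature rankings from multiple selection methods by mean rank:
    a simple stand-in for ensemble feature selection, where each input list
    ranks the same features from best to worst."""
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    # Lower mean rank wins; ties broken alphabetically for determinism.
    return sorted(features, key=lambda f: (mean_rank[f], f))[:top_k]

# Hypothetical gene rankings from three different selection methods.
rankings = [
    ["EGFR", "TP53", "MYC", "BRCA1"],
    ["TP53", "EGFR", "BRCA1", "MYC"],
    ["EGFR", "MYC", "TP53", "BRCA1"],
]
selected = ensemble_feature_selection(rankings, top_k=2)
```

Aggregating over several selectors reduces the sensitivity of the chosen gene set to the quirks of any single algorithm, which is the motivation for the ensemble methods the abstract describes.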
Abstract:
In clinical medicine, it is crucial to be able to determine the safety and efficacy of current drugs and to accelerate the discovery of new active compounds. Laboratory assays are used for this purpose, but they are very costly and time-consuming. Bioinformatics, however, can greatly facilitate clinical research toward these ends, since it provides prediction of drug toxicity and of drug activity in new diseases, as well as the evolution of active compounds discovered in clinical trials. This can be achieved thanks to the availability of bioinformatics tools and computational virtual screening (VS) methods that allow all the necessary hypotheses to be tested before clinical trials are conducted, such as structural docking with the BINDSURF program. However, the accuracy of most VS methods is severely restricted by limitations in the affinity or scoring functions that describe biomolecular interactions, and even today these uncertainties are not fully understood. In this work we address this problem by proposing a new approach in which neural networks are trained on information from databases of known compounds (target proteins and drugs), and the resulting model is then used to increase the accuracy of the affinity predictions of the BINDSURF VS method.
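The rescoring idea (training a model on known target/drug data to correct scoring-function output) can be sketched with a single linear neuron fitted by gradient descent; the scores and measured affinities below are invented, and the real approach uses neural networks with hidden layers and richer descriptors:

```python
def train_rescoring_model(scores, measured, epochs=2000, lr=0.02):
    """Fit affinity ~ w * docking_score + b by gradient descent on squared
    error: a one-neuron stand-in for the neural-network rescoring of
    BINDSURF predictions described above."""
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum((w * s + b - y) * s for s, y in zip(scores, measured)) / n
        grad_b = sum((w * s + b - y) for s, y in zip(scores, measured)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Invented docking scores and measured affinities (roughly affinity = 2 * score).
scores = [1.0, 2.0, 3.0, 4.0]
measured = [2.1, 4.0, 6.2, 7.9]
w, b = train_rescoring_model(scores, measured)
corrected = [w * s + b for s in scores]  # rescored affinities
```

Once trained on databases of known compounds, such a model is applied on top of the raw scoring function to sharpen affinity estimates for new candidates.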