906 results for database searching
Abstract:
BACKGROUND: Fourmidable is an infrastructure to curate and share the emerging genetic, molecular, and functional genomic data and protocols for ants. DESCRIPTION: The Fourmidable assembly pipeline groups nucleotide sequences into clusters before independently assembling each cluster. Subsequently, assembled sequences are annotated via Interproscan and BLAST against general and insect-specific databases. Gene-specific information can be retrieved using gene identifiers, searching for similar sequences or browsing through inferred Gene Ontology annotations. The database will readily scale as ultra-high throughput sequence data and sequences from additional species become available. CONCLUSION: Fourmidable currently houses EST data from two ant species and microarray gene expression data for one of these. Fourmidable is publicly available at http://fourmidable.unil.ch.
Abstract:
Summary: The automation of genome sequencing and annotation, together with the large-scale application of methods for measuring gene expression, generates a phenomenal quantity of data for model organisms such as human and mouse. In this deluge of data, it becomes very difficult to obtain information specific to an organism or a gene, and such a search frequently yields fragmented or even incomplete answers. Creating a database able to manage and integrate both genomic and transcriptomic data can greatly improve search speed as well as the quality of the results obtained, by allowing direct comparison of gene expression measurements from experiments carried out with different techniques. The main objective of this project, called CleanEx, is to provide direct access to public expression data through official gene names, and to represent expression data produced under different protocols in a way that facilitates general analysis and comparison across several datasets. Consistent and regular updating of the gene nomenclature is ensured by associating each gene expression experiment with a permanent identifier of the target sequence, which gives a physical description of the RNA population targeted by the experiment. These identifiers are then associated at regular intervals with the constantly evolving gene catalogues of model organisms. This automatic tracking procedure relies in part on external genomic information resources such as UniGene and RefSeq. The central part of CleanEx is a gene index that is built weekly and contains links to all public expression data already incorporated into the system.
In addition, the target sequence database provides a link to the corresponding gene, as well as a quality control of that link, for different types of experimental resources such as clones or Affymetrix probes. The CleanEx online search system offers access to individual entries as well as to tools for cross-analysis of datasets. These tools have proven very effective for comparing gene expression and, to a certain extent, for detecting variation in that expression linked to alternative splicing. The CleanEx files and tools are accessible online (http://www.cleanex.isb-sib.ch/). Abstract: Automatic genome sequencing and annotation, as well as large-scale gene expression measurement methods, generate a massive amount of data for model organisms. Searching for gene-specific or organism-specific information throughout all the different databases has become a very difficult task, and often results in fragmented and unrelated answers. A database that federates and integrates genomic and transcriptomic data will greatly improve the search speed as well as the quality of the results by allowing a direct comparison of expression results obtained by different techniques. The main goal of this project, called the CleanEx database, is thus to provide access to public gene expression data via unique gene names and to represent heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons. A consistent and up-to-date gene nomenclature is achieved by associating each single gene expression experiment with a permanent target identifier consisting of a physical description of the targeted RNA population or the hybridization reagent used.
These targets are then mapped at regular intervals to the growing and evolving catalogues of genes from model organisms, such as human and mouse. The completely automatic mapping procedure relies partly on external genome information resources such as UniGene and RefSeq. The central part of CleanEx is a gene index, built weekly, containing cross-references to all public expression data already incorporated into the system. In addition, the expression target database of CleanEx provides gene mapping and quality control information for various types of experimental resources, such as cDNA clones or Affymetrix probe sets. The Affymetrix mapping files are accessible as text files, for further use in external applications, and as individual entries via the web-based interfaces. The CleanEx web-based query interfaces offer access to individual entries via text string searches or quantitative expression criteria, as well as cross-dataset analysis tools and cross-chip gene comparison. These tools have proven to be very efficient in expression data comparison and even, to a certain extent, in the detection of differentially expressed splice variants. The CleanEx flat files and tools are available online at: http://www.cleanex.isb-sib.ch/.
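The idea of permanent target identifiers that are periodically re-mapped to gene names can be illustrated with a minimal Python sketch. Everything in it is hypothetical (the target IDs, gene symbols, dataset names and the `build_gene_index` helper); it only shows why expression records keyed by a stable target ID survive a change in gene nomenclature, and it is not the CleanEx implementation.

```python
from collections import defaultdict

# Hypothetical expression records: each one points at a permanent target ID,
# never directly at a gene name.
experiments = [
    {"dataset": "array_A", "target": "T001", "value": 7.2},
    {"dataset": "array_B", "target": "T002", "value": 6.9},
    {"dataset": "array_B", "target": "T003", "value": 3.1},
]


def build_gene_index(records, target_to_gene):
    """Group expression records by the gene each target currently maps to."""
    index = defaultdict(list)
    for rec in records:
        gene = target_to_gene.get(rec["target"])
        if gene is None:
            continue  # target has no gene assignment in this mapping release
        index[gene].append((rec["dataset"], rec["target"], rec["value"]))
    return dict(index)


# Two hypothetical releases of the target-to-gene mapping: one gene is renamed.
mapping_week1 = {"T001": "GENE1", "T002": "GENE1", "T003": "GENE2_OLD"}
mapping_week2 = {"T001": "GENE1", "T002": "GENE1", "T003": "GENE2"}

index_week1 = build_gene_index(experiments, mapping_week1)
index_week2 = build_gene_index(experiments, mapping_week2)
```

Rebuilding the index from the unchanged records and the new mapping is enough to follow the renamed gene, and records from different datasets that hit the same gene end up side by side for comparison.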
Abstract:
Using free text and controlled vocabulary in Medline and CINAHL
Abstract:
We re-evaluated the larval support for families within majoids using the Wilcoxon signed-rank test with emphasis on Inachoididae. To accomplish our objectives, we added 10 new taxa, two of which are traditionally assigned to the family of special interest, to a previous larval database for majoids, and re-appraised the larval characters used in earlier studies. Phylogenetic analysis was performed with PAUP* using the heuristic search with 50 replicates or the branch-and-bound algorithm when possible. Multi-state transformation series were considered unordered; initially characters were equally weighted followed by successive weighting, and trees were rooted at the Oregoniidae node. Ten different topological constraints were enforced for families to evaluate tree length under the assumption of monophyly for each taxonomic entity. Our results showed that the tree length of most constrained topologies was not considerably greater than that of unconstrained analysis in which most families nested as paraphyletic taxa. This may indicate that the present larval database does not provide strong support for paraphyly of the taxa in question. For Inachoididae, although the Wilcoxon signed-rank test rejected a significant difference between unconstrained and constrained cladograms, we were unable to provide a single synapomorphy for this clade. Except for the conflicting position of Leurocyclus and Stenorhynchus, the two clades correspond to the traditional taxonomic arrangement. Among inachoidids, the clade (Anasimus (Paradasygyius (Collodes + Pyromaia))) is supported, whereas for inachids, the clade (Inachus (Macropodia + Achaeus)) is one of the most supported clades within majids. As often stated, only additional characters will provide a better test for the monophyly of Inachoididae and other families within Majoidea.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
We report a morphology-based approach for the automatic identification of outlier neurons, as well as its application to the NeuroMorpho.org database, with more than 5,000 neurons. Each neuron in a given analysis is represented by a feature vector composed of 20 measurements, which are then projected into a two-dimensional space by applying principal component analysis. Bivariate kernel density estimation is then used to obtain the probability distribution for the group of cells, so that the cells with highest probabilities are understood as archetypes while those with the smallest probabilities are classified as outliers. The potential of the methodology is illustrated in several cases involving uniform cell types as well as cell types for specific animal species. The results provide insights regarding the distribution of cells, yielding single and multi-variate clusters, and they suggest that outlier cells tend to be more planar and tortuous. The proposed methodology can be used in several situations involving one or more categories of cells, as well as for detection of new categories and possible artifacts.
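The pipeline described above (feature vectors, projection onto two principal components, bivariate kernel density estimation, ranking by density) can be sketched in plain Python. This is only an illustration under stated assumptions: the data are synthetic with 5 features instead of 20, the bandwidth is an arbitrary choice, the principal components are found by simple power iteration, and the function names are made up for this example.

```python
import math
import random


def pca_project_2d(rows, iters=300, seed=0):
    """Project rows onto their first two principal components (power iteration)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            v = [c / norm for c in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the component just found before looking for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in x]


def kde_2d(points, bandwidth):
    """Gaussian kernel density estimate evaluated at each input point."""
    n = len(points)
    scale = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    densities = []
    for px, py in points:
        total = sum(
            math.exp(-((px - qx) ** 2 + (py - qy) ** 2) / (2.0 * bandwidth ** 2))
            for qx, qy in points
        )
        densities.append(scale * total)
    return densities


# Synthetic "cells": 60 typical feature vectors plus one deliberately atypical one.
rng = random.Random(42)
cells = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
cells.append([8.0] * 5)

projected = pca_project_2d(cells)
density = kde_2d(projected, bandwidth=1.0)  # bandwidth is an assumption
ranked = sorted(range(len(cells)), key=lambda i: density[i])
outlier_index = ranked[0]     # lowest estimated probability density
archetype_index = ranked[-1]  # highest estimated probability density
```

Sorting by density gives both ends of the ranking at once: the lowest-density cells are outlier candidates and the highest-density cells play the role of archetypes.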
Abstract:
Starting from the AGN sample in the XMM-COSMOS survey, we searched for its optical counterpart in the DR10 database of the Sloan Digital Sky Survey (SDSS); the match yielded a selection of 200 objects, including stars, galaxies and quasars. From this sample we selected all objects with redshift z<0.86 to restrict the analysis to type 2 AGN, arriving at a final sample of 30 sources. The spectral analysis was carried out with the SPECFIT task in IRAF. We built two types of model: in the first we considered a single component for each emission line, while in the second an additional component was introduced, with the FWHM of the first limited to less than 500 km/s. The emission lines we modelled are the following: Hβ, [NII]λλ 6548,6583, Hα, [SII]λλ 6716,6731 and [OIII]λλ 4959,5007. In the models we took atomic physics into account for the theoretical flux ratios of the nitrogen and oxygen doublets, fixing them at 1:3 for both; in the one-component model we fixed the FWHMs of the emission lines, whereas in the two-component case we fixed the FWHMs of the narrow and broad components separately. Taking into account the chi-square obtained from each fit and the residuals, it was possible to choose between the two models for each source. Since our attention is focused on the kinematics of oxygen, we considered only the sources whose spectra showed that line, i.e. 25 objects. A non-parametric analysis was performed on this line, using the method proposed by Harrison et al. (2014) to characterize the line profile. Useful quantities were determined, such as the 2nd and 98th percentiles, corresponding to the maximum projected velocities of the outflow, and the line width containing 80% of the emission.
To investigate the possible role of the AGN in driving these outflows, we calculated the mass of ionized gas in the outflow and the kinetic energy rate, considering only the broad components of the [OIII] λ5007 line. For the energetic characterization we followed the approaches of Cano-Diaz et al. (2012) and Heckman (1990) to obtain lower and upper limits on the kinetic power, adopting the geometric mean of the two as an indicative value of the energetics involved. Comparing the power of the gas outflow with the bolometric luminosity of the AGN, we found that the kinetic energy of the outflow is about 0.3-30% of the AGN luminosity, consistent with models in which the AGN is the main driver of these gas outflows.
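The non-parametric quantities mentioned in this abstract can be illustrated with a short Python sketch. It assumes a line profile sampled on a regular velocity grid, takes the width containing 80% of the flux as the difference between the 90th and 10th flux percentiles (an assumption about the exact definition), and uses a synthetic Gaussian profile and made-up power limits; the function names are hypothetical.

```python
import math


def velocity_at_fraction(velocities, fluxes, fraction):
    """Velocity below which `fraction` of the total line flux is accumulated."""
    target = fraction * sum(fluxes)
    cumulative = 0.0
    for i, (v, f) in enumerate(zip(velocities, fluxes)):
        previous = cumulative
        cumulative += f
        if cumulative >= target:
            if i == 0 or f <= 0.0:
                return v
            # Linear interpolation between the previous sample and this one.
            v_prev = velocities[i - 1]
            return v_prev + (v - v_prev) * (target - previous) / f
    return velocities[-1]


def geometric_mean(lower, upper):
    """Indicative value between a lower and an upper limit."""
    return math.sqrt(lower * upper)


# Synthetic emission-line profile: a Gaussian with sigma = 200 km/s.
sigma = 200.0
velocities = [-2000.0 + 10.0 * k for k in range(401)]
fluxes = [math.exp(-0.5 * (v / sigma) ** 2) for v in velocities]

v02, v10, v90, v98 = (
    velocity_at_fraction(velocities, fluxes, f) for f in (0.02, 0.10, 0.90, 0.98)
)
w80 = v90 - v10  # width containing 80% of the line flux

# Hypothetical lower and upper limits on the kinetic power, in erg/s.
indicative_power = geometric_mean(1.0e41, 1.0e43)
```

For a pure Gaussian the 80% width is about 2.563 times sigma, which gives a quick sanity check on the percentile routine before applying it to asymmetric profiles with broad wings.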
Abstract:
In this thesis, the author presents a query language for an RDF (Resource Description Framework) database and discusses its applications in the context of the HELM project (the Hypertextual Electronic Library of Mathematics). This language aims at meeting the main requirements coming from the RDF community. In particular, it includes: a human-readable textual syntax and a machine-processable XML (Extensible Markup Language) syntax, both for queries and for query results; a rigorously stated formal semantics; a graph-oriented RDF data access model capable of exploring an entire RDF graph (including both RDF Models and RDF Schemata); a full set of Boolean operators to compose the query constraints; fully customizable and highly structured query results having a 4-dimensional geometry; and some constructions taken from ordinary programming languages that simplify the formulation of complex queries. The HELM project aims at integrating the modern tools for the automation of formal reasoning with the most recent electronic publishing technologies, in order to create and maintain a hypertextual, distributed virtual library of formal mathematical knowledge. In the spirit of the Semantic Web, the documents of this library include RDF metadata describing their structure and content in a machine-understandable form. Using the author's query engine, HELM exploits this information to implement functionalities allowing the interactive and automatic retrieval of documents on the basis of content-aware requests that take into account the mathematical nature of these documents.
Abstract:
In the late 19th century, F.A. FOREL led investigations of the Rhone River delta area of Lake Geneva that resulted in the discovery of a textbook example of a river-fed delta system containing impressive subaquatic channels. Well ahead of the marine counterparts, scientific observations and interpretations of water currents shaping the delta edifice for the first time documented how underflow currents carry cold, suspension-laden waters from the river mouth all the way to the deep basin. These early investigations of the Rhone delta laid the basis for follow-up studies in the 20th and 21st centuries. Sediment coring, water-column measurements, manned submersible diving, seismic reflection profiling and bathymetric surveying eventually provided a rich database to unravel the key erosional and depositional processes, further documenting the impact of human-induced changes in the catchment. With the merging of old and new scientific knowledge, today a comprehensive understanding prevails of how a delta changes through time, how its channels are formed, and what potential natural hazards may be related to its evolution. New and efficient bathymetric techniques, paired with novel coring operations, provided a time-series of morphologic evolution showing and quantifying the high dynamics of the delta/channel evolution at an unprecedented temporal and spatial resolution. Future investigations will continue to further quantify these dynamic processes and to link the evolution of the subaquatic domain with changes and processes in the catchment and with natural hazards. Its size, easy access, and large variety of states and processes will continue to make the Rhone delta area a perfect ‘laboratory’ in which general processes can be studied that could be upscaled or downscaled to other marine and lacustrine deltas.
Abstract:
BACKGROUND: Tools to explore large compound databases in search of analogs of query molecules provide a strategically important support in drug discovery to help identify available analogs of any given reference or hit compound by ligand based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures). RESULTS: Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. CONCLUSIONS: 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases.
Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects.
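The general principle of a through-space atom pair fingerprint compared by city-block distance can be sketched in Python as follows. This is a simplified illustration, not the published 3DAPfp: the 16 equal-width distance bins of 1 Å, the hard binning, the absence of any normalization and the toy coordinates are all assumptions made for the example.

```python
import math
from itertools import combinations


def atom_pair_fingerprint(coords, n_bins=16, bin_width=1.0):
    """Count atom pairs per through-space distance interval (hypothetical bins)."""
    fingerprint = [0] * n_bins
    for a, b in combinations(coords, 2):
        distance = math.dist(a, b)
        index = min(int(distance / bin_width), n_bins - 1)
        fingerprint[index] += 1
    return fingerprint


def city_block(fp1, fp2):
    """City-block (Manhattan) distance between two fingerprints."""
    return sum(abs(x - y) for x, y in zip(fp1, fp2))


# Toy 4-atom chain in an extended and in a folded conformation (coordinates in Å).
extended = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]

fp_extended = atom_pair_fingerprint(extended)
fp_folded = atom_pair_fingerprint(folded)
conformer_distance = city_block(fp_extended, fp_folded)
```

Both conformations have the same bonding pattern, so a purely topological atom pair count would not separate them, whereas the through-space counts differ because folding shortens the long-range pairs.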
Abstract:
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
Abstract:
Methylation of cytosine in the 5 position of the pyrimidine ring is a major modification of the DNA in most organisms. In eukaryotes, the distribution and number of 5-methylcytosines (5mC) along the DNA is heritable but can also change with the developmental state of the cell and as a response to modifications of the environment. While DNA methylation probably has a number of functions, scientific interest has recently focused on the gene silencing effect methylation can have in eukaryotic cells. In particular, the discovery of changes in the methylation level during cancer development has increased the interest in this field. In the past, a vast amount of data has been generated with different levels of resolution ranging from 5mC content of total DNA to the methylation status of single nucleotides. We present here a database for DNA methylation data that attempts to unify these results in a common resource. The database is accessible via WWW (http://www.methdb.de). It stores information about the origin of the investigated sample and the experimental procedure, and contains the DNA methylation data. Query masks allow for searching for 5mC content, species, tissue, gene, sex, phenotype, sequence ID and DNA type. The output lists all available information including the relative gene expression level. DNA methylation patterns and methylation profiles are shown both as a graphical representation and as G/A/T/C/5mC-sequences or tables with sequence positions and methylation levels, respectively.
Abstract:
A new thermodynamic database for normal and modified nucleic acids has been developed. This Thermodynamic Database for Nucleic Acids (NTDB) includes sequence, structure and thermodynamic information as well as experimental methods and conditions. In this release, there are 1851 sequences containing both normal and modified nucleic acids. A user-friendly web-based interface has been developed to allow data searching under different conditions. Useful thermodynamic tools for the study of nucleic acids have been collected and linked for easy usage. NTDB is available at http://ntdb.chem.cuhk.edu.hk.
Abstract:
There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
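The first stage described above, the optimal ungapped alignment score on every diagonal of the alignment matrix, can be written as a short scalar Python sketch. It is not the ParAlign code: there is no SIMD parallelism, the match and mismatch scores are placeholder values standing in for a substitution matrix, and the later heuristic that combines several diagonals is not reproduced.

```python
def diagonal_scores(query, subject, match=1, mismatch=-1):
    """Best ungapped local alignment score on each diagonal (diag = j - i)."""
    m, n = len(query), len(subject)
    scores = {}
    for diag in range(-(m - 1), n):
        i = max(0, -diag)  # row where this diagonal enters the matrix
        j = i + diag       # matching column
        best = running = 0
        while i < m and j < n:
            step = match if query[i] == subject[j] else mismatch
            # Restart the segment whenever the running score drops below zero.
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
        scores[diag] = best
    return scores


query = "ACGT"
subject = "TTACGTTT"
scores = diagonal_scores(query, subject)
best_diagonal = max(scores, key=scores.get)
```

Each diagonal is independent of the others, which is what makes this stage a natural fit for processing many diagonals at once with SIMD instructions.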
Abstract:
CyBase is a curated database and information source for backbone-cyclized proteins. The database incorporates naturally occurring cyclic proteins as well as synthetic derivatives, grafted analogues and acyclic permutants. The database provides a centralized repository of information on all aspects of cyclic protein biology and addresses issues pertaining to the management and searching of topologically circular sequences. The database is freely available at http://research.imb.uq.edu.au/cybase.
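Searching topologically circular sequences raises a specific issue: a motif may span the arbitrary point at which the circle was opened for storage. A minimal Python sketch of one common way to handle this is shown below; the peptide string and function names are hypothetical and this is not CyBase's own search code.

```python
def find_in_circular(sequence, motif):
    """Start positions of `motif` in a circular sequence, including wrap-around hits."""
    n = len(sequence)
    if not motif or len(motif) > n:
        return []
    # Append just enough of the start so matches can run past the end.
    extended = sequence + sequence[: len(motif) - 1]
    return [i for i in range(n) if extended.startswith(motif, i)]


def canonical_rotation(sequence):
    """Lexicographically smallest rotation, usable as a key for circular sequences."""
    return min(sequence[i:] + sequence[:i] for i in range(len(sequence)))


cyclic_peptide = "GLPVCGETC"  # hypothetical backbone-cyclized sequence
wrap_hits = find_in_circular(cyclic_peptide, "TCGL")
same_ring = canonical_rotation("CAB") == canonical_rotation("BCA")
```

The canonical rotation gives every linearization of the same ring an identical key, so duplicate entries that differ only in their starting residue can be detected with an ordinary string comparison.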