28 results for Genomic data integration

at Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevance:

100.00%

Publisher:

Abstract:

Analysing the effect of genes and environmental factors on the development of complex diseases is a major statistical and computational challenge. Among the data-mining methodologies proposed for interaction analysis, one of the most popular is the Multifactor Dimensionality Reduction method, MDR (Ritchie et al. 2001). The strategy of this method is to reduce the multifactorial dimension to one by pooling the different genotypes into two risk groups: high and low. Despite its demonstrated usefulness, MDR has some drawbacks: the excessive pooling of genotypes can leave important interactions undetected, and the method cannot adjust for main effects or confounding variables. In this article we illustrate the limitations of the MDR strategy and of other non-parametric approaches, and we show the advantages of using parametric methodologies to analyse interactions in case-control studies where adjustment for confounders and main effects is required. We propose a new methodology, a parametric version of the MDR method, which we call Model-Based Multifactor Dimensionality Reduction (MB-MDR). The proposed methodology aims to identify specific genotypes associated with the disease and allows adjustment for marginal effects and confounding variables. The new methodology is illustrated with data from the Spanish Bladder Cancer Study.
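The core MDR step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: each multi-locus genotype cell is labelled high- or low-risk by comparing its case/control ratio to the overall ratio in the sample.

```python
# Sketch of the MDR risk-grouping step: pool multi-locus genotypes into
# "high" and "low" risk groups by their case/control ratio.
from collections import defaultdict

def mdr_risk_groups(genotypes, status):
    """genotypes: one multi-locus genotype tuple per subject;
    status: 1 = case, 0 = control. Returns {genotype: 'high'|'low'}."""
    cases = defaultdict(int)
    controls = defaultdict(int)
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    # overall case/control ratio used as the threshold
    total_ratio = sum(status) / max(len(status) - sum(status), 1)
    labels = {}
    for g in set(cases) | set(controls):
        # guard against empty control cells (a crude choice for the sketch)
        ratio = cases[g] / max(controls[g], 1)
        labels[g] = "high" if ratio > total_ratio else "low"
    return labels

# toy example: two SNPs coded 0/1/2
geno = [(0, 0), (0, 0), (2, 2), (2, 2), (2, 2), (0, 1)]
stat = [0, 0, 1, 1, 0, 0]
print(mdr_risk_groups(geno, stat))
```

Note how this pooling discards within-cell detail, which is exactly the drawback the abstract raises: a genotype cell with a moderate but real effect can end up in the wrong group.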

Relevance:

100.00%

Publisher:

Abstract:

Background: Combining different sources of information to improve the available biological knowledge is a current challenge in bioinformatics. Among the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: first, an appropriate kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables from any of the datasets. In particular, for each input variable or linear combination of input variables, we can represent the local direction of maximum growth, which allows us to identify the samples with higher or lower values of the variables analyzed. Conclusions: The integration of different datasets and the simultaneous representation of samples and variables give us a better understanding of biological knowledge.
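The two-step scheme described above (one kernel per source, then a combination) can be sketched in a few lines. This is a generic illustration with toy data and assumed weights, not the paper's implementation: two linear kernels over the same samples are averaged, the result is double-centered, and the leading kernel-PCA component is extracted by power iteration.

```python
def linear_kernel(X):
    # Gram matrix K[i][j] = <x_i, x_j>
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def combine(K1, K2, w1=0.5, w2=0.5):
    # step two of the scheme: weighted sum of kernels over the same samples
    n = len(K1)
    return [[w1 * K1[i][j] + w2 * K2[i][j] for j in range(n)] for i in range(n)]

def center(K):
    # double-center the Gram matrix, as kernel PCA requires
    n = len(K)
    row = [sum(r) / n for r in K]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + tot for j in range(n)] for i in range(n)]

def leading_component(K, iters=200):
    # power iteration; start from a non-uniform vector because the
    # centered kernel annihilates the constant vector
    n = len(K)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# toy data: the same four samples described by two feature sets
X1 = [[0.0], [0.1], [2.0], [2.1]]
X2 = [[1.0, 0.0], [1.1, 0.1], [3.0, 1.0], [3.1, 1.1]]
K = center(combine(linear_kernel(X1), linear_kernel(X2)))
scores = leading_component(K)
print(scores)  # samples 1-2 separate from samples 3-4 on the first component
```

With both sources agreeing on the grouping, the first component separates the two pairs of samples; in practice the kernel choice and weights per source are the modelling decisions.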

Relevance:

100.00%

Publisher:

Abstract:

A newspaper content management system has to deal with a very heterogeneous information space, as our experience with the Diari Segre newspaper has shown. The greatest problem is to harmonise the different ways the users involved (journalists, archivists, etc.) structure the newspaper information space, i.e. news, topics, headlines, and so on. Our approach is based on ontologies and differentiated universes of discourse (UoD). Users interact with the system and, from this interaction, integration rules are derived. These rules are based on the Description Logic relations of subsumption and equivalence; they relate the different UoDs and produce a shared conceptualisation of the newspaper information domain.
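A minimal sketch of the kind of subsumption reasoning such integration rules rely on (the concept names below are hypothetical, not from the Diari Segre ontology): given axioms stating "A is subsumed by B", the transitive closure yields every broader concept of a term.

```python
def subsumers(concept, axioms):
    """axioms: set of (narrower, broader) pairs.
    Returns all concepts that subsume `concept`, transitively."""
    found = set()
    frontier = {concept}
    while frontier:
        # follow one subsumption step from everything on the frontier
        nxt = {b for (a, b) in axioms if a in frontier} - found
        found |= nxt
        frontier = nxt
    return found

# hypothetical fragment of a newspaper ontology
axioms = {("Headline", "NewsItem"), ("NewsItem", "Document"),
          ("Topic", "Subject")}
print(subsumers("Headline", axioms))
```

Equivalence would be handled analogously, by adding the pair in both directions; a real system would use a DL reasoner rather than this closure.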

Relevance:

100.00%

Publisher:

Abstract:

About 50% of living species are holometabolan insects. Therefore, unraveling the origin of insect metamorphosis, from the hemimetabolan mode (gradual metamorphosis) to the holometabolan mode (sudden metamorphosis at the end of the life cycle), is equivalent to explaining how all this biodiversity originated. One of the problems in studying the evolution from hemimetaboly to holometaboly is that most information is available only for holometabolan species. Within the hemimetabolan group, our model, the cockroach Blattella germanica, is the most studied species. However, given that the study of adult morphogenesis at the organismal level is still complex, we focused on the tergal gland (TG) as a minimal model of metamorphosis. The TG is formed in tergites 7 and 8 (T7-T8) during the last days of the final nymphal instar (nymph 6). The comparative study of four T7-T8 transcriptomes provided crucial clues about TG formation, as well as essential information about the mechanisms and circuitry that allow the shift from nymphal to adult morphogenesis.

Relevance:

90.00%

Publisher:

Abstract:

The recent revolution in genomic data generation techniques has led to exponential growth in the amount of data produced, making it more necessary than ever to optimise how this information is managed and handled. This work tackles three facets of the problem: information dissemination, integration of data from multiple sources, and visualisation. Building on the Distributed Annotation System, DAS, we created an application for the automated creation of new data sources, in a standardised and programmatically accessible format, from simple data files. This software, easyDAS, is running at the European Bioinformatics Institute; it facilitates and encourages the sharing and dissemination of genomic data in usable formats. jsDAS is a DAS client library that makes it quick and simple to incorporate DAS data into any web application; taking advantage of DAS, it can integrate data from multiple sources coherently and robustly. GenExp is a prototype of a highly interactive web-based genome browser that supports real-time exploration of genomes; it can integrate data from any DAS source and render it on the client using the latest advances in web technologies.
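The easyDAS idea, turning a flat annotation file into a standardised, machine-readable source, can be illustrated with a small sketch. This is a hedged approximation, not the easyDAS code: a tab-separated file becomes a DAS-style XML features document of the kind DAS clients consume (element names follow the DAS 1.x features response; the input file content is invented).

```python
# Sketch: serve genomic features from a flat file as DAS-style XML.
import io
import xml.etree.ElementTree as ET

FLAT = "gene1\tchr1\t100\t500\ngene2\tchr1\t900\t1200\n"

def das_features(flat_text, segment="chr1"):
    root = ET.Element("DASGFF")
    gff = ET.SubElement(root, "GFF")
    seg = ET.SubElement(gff, "SEGMENT", id=segment)
    for line in io.StringIO(flat_text):
        name, chrom, start, end = line.rstrip("\n").split("\t")
        if chrom != segment:
            continue  # only features on the requested segment
        feat = ET.SubElement(seg, "FEATURE", id=name)
        ET.SubElement(feat, "START").text = start
        ET.SubElement(feat, "END").text = end
    return ET.tostring(root, encoding="unicode")

print(das_features(FLAT))
```

The point of the standardised format is the client side: a library like jsDAS can consume any source emitting this shape without knowing how the underlying file was structured.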

Relevance:

90.00%

Publisher:

Abstract:

The present paper advocates the creation of a federated, hybrid database in the cloud, integrating law data from all available public sources in a single open-access system, adding in the process relevant metadata to the indexed documents, including the identification of social and semantic entities and the relationships between them, using linked open data techniques and standards such as RDF. Examples of the potential benefits and applications of this approach are also provided, including experiences from our previous research, in which data integration, graph databases, and social and semantic network analysis were used to identify power relations, litigation dynamics and cross-reference patterns both intra- and inter-institutionally, covering most of the world's international economic courts.
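The linked-data cross-referencing described above can be sketched with plain triples (using tuples rather than an RDF library; all URIs below are hypothetical): triples connect cases, courts and cited decisions, and a query walks citation links across sources.

```python
# Sketch: an RDF-style graph as (subject, predicate, object) triples.
triples = {
    ("ex:case42", "ex:decidedBy", "ex:courtA"),
    ("ex:case42", "ex:cites", "ex:case17"),
    ("ex:case17", "ex:decidedBy", "ex:courtB"),
}

def objects(subject, predicate, graph):
    """All objects reachable from `subject` via `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

# which courts does case42 reach through one citation hop?
cited = objects("ex:case42", "ex:cites", triples)
courts = {c for case in cited for c in objects(case, "ex:decidedBy", triples)}
print(courts)
```

In a real deployment the same query would be SPARQL over a federated endpoint; the tuple form only shows why cross-institutional links become queryable once documents share a triple vocabulary.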

Relevance:

80.00%

Publisher:

Abstract:

This report details the steps and processes carried out to build an application that enables the cross-referencing of genetic data from information held in remote databases. It presents an in-depth study of the content and structure of the remote NCBI and KEGG databases, documenting a data-mining effort aimed at extracting from them the information needed to develop the cross-referencing application. Finally, it describes the programs, scripts and graphical environments implemented to build and deploy the application that provides the cross-referencing functionality that is the object of this final-year project.

Relevance:

80.00%

Publisher:

Abstract:

This project is intended to help a group of researchers in the Department of Animal and Food Science (UAB) who collect genomic data obtained in experiments. The application consists of two parts. The first is the user side, where users can create projects; insert, modify or delete data; and query the information already stored in the application. The second is the administrator side, from which the administrator can register new users, restore previous versions of the database, remove users, and review the actions performed by users.

Relevance:

80.00%

Publisher:

Abstract:

One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets on which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semi-artificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200-kb BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, accuracy dropped when the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques remain rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments.
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
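The sensitivity and specificity figures discussed above are, at the nucleotide level, standard ratios over coding and non-coding positions. A small sketch of those definitions (textbook formulas, not the paper's evaluation code):

```python
# Nucleotide-level gene prediction accuracy: each position is coding (1)
# or non-coding (0) in both the annotation (truth) and the prediction.
def sn_sp(truth, pred):
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    sn = tp / (tp + fn) if tp + fn else 0.0  # fraction of real coding found
    sp = tp / (tp + fp) if tp + fp else 0.0  # fraction of predictions correct
    return sn, sp

truth = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
pred  = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(sn_sp(truth, pred))  # sensitivity 1.0, specificity 2/3
```

This also makes the abstract's observation concrete: adding long random intergenic sequence mostly adds opportunities for false positives, so specificity can collapse while sensitivity stays high.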

Relevance:

80.00%

Publisher:

Abstract:

The main objective of this final-year project is the construction and exploitation of a data warehouse. The work follows a practical case study: a scenario in which a data warehouse must be developed for the Fundació d'Estudis per a la Conducció Responsable, which wants to study the evolution of the number of motor-vehicle trips in Catalonia and analyse possible correlations between means of transport, driver profiles and several road-safety variables.

Relevance:

40.00%

Publisher:

Abstract:

The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data present a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm-related data thanks to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers can seamlessly and securely access and work on data distributed across multiple sites, while also providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.

Relevance:

30.00%

Publisher:

Abstract:

Report on the scientific stay carried out at the University of New South Wales from February to June 2007. Two different biogeochemical models are coupled to a three-dimensional configuration of the Princeton Ocean Model (POM) for the Northwestern Mediterranean Sea (Ahumada and Cruzado, 2007). The first biogeochemical model (BLANES) is the three-dimensional version of the model described by Bahamon and Cruzado (2003) and computes nitrogen fluxes through six compartments using semi-empirical descriptions of biological processes. The second (BIOMEC) is the biomechanical NPZD model described in Baird et al. (2004), which uses a combination of physiological and physical descriptions to quantify the rates of planktonic interactions; physical descriptions include, for example, the diffusion of nutrients to phytoplankton cells and the encounter rate of predators and prey. In both models, the link between physical and biogeochemical processes is expressed through the advection-diffusion of the non-conservative tracers. The similarities in the mathematical formulation of the biogeochemical processes in the two models are exploited to determine the parameter set for the biomechanical model that best matches the parameter set used in the first. Three years of integration were carried out for each model to reach the so-called perpetual-year run for biogeochemical conditions. Outputs from both models are averaged monthly and compared with remote-sensing chlorophyll images from the MERIS sensor.
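The nitrogen flow in an NPZD model can be sketched with a toy box model (illustrative rate constants, not the BLANES or BIOMEC configuration): nutrient feeds phytoplankton, phytoplankton feeds zooplankton, losses go to detritus, and remineralisation closes the loop, here integrated with a simple Euler step.

```python
def npzd_step(n, p, z, d, dt=0.1):
    uptake = 0.5 * n / (n + 1.0) * p   # nutrient-limited phytoplankton growth
    grazing = 0.3 * p * z              # zooplankton grazing on phytoplankton
    p_mort = 0.05 * p                  # phytoplankton mortality
    z_mort = 0.05 * z                  # zooplankton mortality
    remin = 0.1 * d                    # remineralisation back to nutrient
    n += dt * (remin - uptake)
    p += dt * (uptake - grazing - p_mort)
    z += dt * (0.7 * grazing - z_mort)           # 70% assimilation efficiency
    d += dt * (0.3 * grazing + p_mort + z_mort - remin)
    return n, p, z, d

state = (5.0, 1.0, 0.5, 0.1)  # initial N, P, Z, D in arbitrary nitrogen units
for _ in range(100):
    state = npzd_step(*state)
print(state)  # total nitrogen is conserved by construction
```

Because every flux leaving one compartment enters another, the summed nitrogen is invariant, which is the property the advection-diffusion of non-conservative tracers must preserve in the full 3-D models.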

Relevance:

30.00%

Publisher:

Abstract:

This article presents recent WMR (wheeled mobile robot) navigation experiments using local perception knowledge provided by monocular vision and odometry. A narrow local perception horizon is used to plan safe trajectories towards the objective. Monocular data are therefore used to obtain real-time local information by building two-dimensional occupancy grids through time integration of the frames. Path planning is accomplished using attraction potential fields, while trajectory tracking is performed with model predictive control techniques. The approach is tested in indoor situations on the available lab platform, a differentially driven mobile robot.
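Potential-field planning on an occupancy grid can be sketched as follows. This is a generic textbook formulation, not the authors' controller: the robot greedily moves to the neighbouring free cell with the lowest potential, where the potential is the distance to the goal (attraction) plus a penalty for nearby occupied cells (repulsion); already-visited cells are excluded as a crude way to escape local minima.

```python
def plan(grid, start, goal):
    """grid: list of rows, 1 = occupied, 0 = free. Returns the cell path."""
    rows, cols = len(grid), len(grid[0])

    def potential(cell):
        (r, c), (gr, gc) = cell, goal
        attract = ((r - gr) ** 2 + (c - gc) ** 2) ** 0.5
        # repulsion: fixed penalty per occupied cell in the 3x3 neighbourhood
        repulse = sum(4.0 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if 0 <= r + dr < rows and 0 <= c + dc < cols
                      and grid[r + dr][c + dc])
        return attract + repulse

    path, cell = [start], start
    while cell != goal and len(path) < 50:
        r, c = cell
        moves = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)
                 and 0 <= r + dr < rows and 0 <= c + dc < cols
                 and grid[r + dr][c + dc] == 0
                 and (r + dr, c + dc) not in path]  # avoid revisiting
        if not moves:
            break
        cell = min(moves, key=potential)
        path.append(cell)
    return path

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(plan(grid, (0, 0), (2, 3)))  # skirts the obstacle wall to the goal
```

In the article's setting the grid itself comes from time-integrated monocular frames, and the resulting path is handed to the model predictive tracker rather than executed cell by cell.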

Relevance:

30.00%

Publisher:

Abstract:

Studies of large sets of SNP data have proven to be a powerful tool in the analysis of the genetic structure of human populations. In this work, we analyze genotyping data for 2,841 SNPs in 12 Sub-Saharan African populations, including a previously unsampled region of south-eastern Africa (Mozambique). We show that robust results in a worldwide perspective can be obtained when analyzing only 1,000 SNPs. Our main results both confirm previous studies and reveal new and interesting features of Sub-Saharan African genetic complexity. There is a strong differentiation of Nilo-Saharans, well beyond what would be expected from geography. Hunter-gatherer populations (Khoisan and Pygmies) show a clear distinctiveness, with genetic features intrinsic to Pygmies (and not only to Khoisan). West African populations present an unexpected similarity to one another, possibly the result of a population expansion. Finally, we find a strong differentiation of the south-eastern Bantu population from Mozambique, which suggests the assimilation of a pre-Bantu substrate by Bantu speakers in the region.
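One common way to summarise the population differentiation discussed above is Wright's Fst averaged over loci, Fst = (Ht - Hs) / Ht, with Ht the expected heterozygosity of the pooled population and Hs the mean within-population expected heterozygosity. A back-of-envelope sketch with hypothetical allele frequencies (the textbook formula, not the paper's analysis pipeline):

```python
def fst(freqs_a, freqs_b):
    """Mean per-locus Fst from allele frequencies in two populations."""
    vals = []
    for p1, p2 in zip(freqs_a, freqs_b):
        pbar = (p1 + p2) / 2
        ht = 2 * pbar * (1 - pbar)                       # pooled heterozygosity
        hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop
        if ht > 0:
            vals.append((ht - hs) / ht)
    return sum(vals) / len(vals)

# hypothetical frequencies at four SNPs in two population pairs
close = fst([0.1, 0.5, 0.3, 0.8], [0.12, 0.48, 0.33, 0.79])
far = fst([0.1, 0.5, 0.3, 0.8], [0.6, 0.1, 0.8, 0.2])
print(close, far)  # strongly differentiated pairs give much larger Fst
```

Averaging such per-locus statistics over a few thousand SNPs is what makes patterns like the Nilo-Saharan or south-eastern Bantu differentiation detectable with only ~1,000 markers.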