889 resultados para Open Data, Dati Aperti, Open Government Data


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern", are important high-throughput techniques for digital gene expression measurement. As other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space where the summation of the components is constrained. These properties are not present on regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract Background Several mathematical and statistical methods have been proposed in the last few years to analyze microarray data. Most of those methods involve complicated formulas, and software implementations that require advanced computer programming skills. Researchers from other areas may experience difficulties when they attempting to use those methods in their research. Here we present an user-friendly toolbox which allows large-scale gene expression analysis to be carried out by biomedical researchers with limited programming skills. Results Here, we introduce an user-friendly toolbox called GEDI (Gene Expression Data Interpreter), an extensible, open-source, and freely-available tool that we believe will be useful to a wide range of laboratories, and to researchers with no background in Mathematics and Computer Science, allowing them to analyze their own data by applying both classical and advanced approaches developed and recently published by Fujita et al. Conclusion GEDI is an integrated user-friendly viewer that combines the state of the art SVR, DVAR and SVAR algorithms, previously developed by us. It facilitates the application of SVR, DVAR and SVAR, further than the mathematical formulas present in the corresponding publications, and allows one to better understand the results by means of available visualizations. Both running the statistical methods and visualizing the results are carried out within the graphical user interface, rendering these algorithms accessible to the broad community of researchers in Molecular Biology.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract Background Smallpox is a lethal disease that was endemic in many parts of the world until eradicated by massive immunization. Due to its lethality, there are serious concerns about its use as a bioweapon. Here we analyze publicly available microarray data to further understand survival of smallpox infected macaques, using systems biology approaches. Our goal is to improve the knowledge about the progression of this disease. Results We used KEGG pathways annotations to define groups of genes (or modules), and subsequently compared them to macaque survival times. This technique provided additional insights about the host response to this disease, such as increased expression of the cytokines and ECM receptors in the individuals with higher survival times. These results could indicate that these gene groups could influence an effective response from the host to smallpox. Conclusion Macaques with higher survival times clearly express some specific pathways previously unidentified using regular gene-by-gene approaches. Our work also shows how third party analysis of public datasets can be important to support new hypotheses to relevant biological problems.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract Background A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to perform an analysis of gene regulatory interactions using the Boolean network model and time-series data. Actually, the Boolean network is restricted in the sense that only a subset of all possible Boolean functions are considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid the full search approach. The problem is modeled as a Constraint Satisfaction Problem (CSP) and CSP techniques are used to solve it. Results We applied the proposed algorithm in two data sets. First, we used an artificial dataset obtained from a model for the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions The algorithm proposed can be used as a first step for detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by a priori knowledge available.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background The use of the knowledge produced by sciences to promote human health is the main goal of translational medicine. To make it feasible we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however it lacks supporting representation of clinical and socio-demographic information. Results We have implemented an extension of Chado – the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and data integration process; and web interface level, to allow interaction between the user and the system. The clinical module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of head and neck. We implemented the IPTrans tool that is a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it in the Clinical Module of Chado; the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Conclusions Open-source computational solutions currently available for translational science does not have a model to represent biomolecular information and also are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patient’s clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments accomplished from a use case demonstrated that the proposed system meets requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed in http://dcm.ffclrp.usp.br/caib/pg=iptrans webcite.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract Background Once multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since they allow applying data mining in multiple tables directly, thus avoiding expensive joining operations and semantic losses, this work proposes an algorithm with multi-relational approach. Methods Aiming to compare traditional approach performance and multi-relational for mining association rules, this paper discusses an empirical study between PatriciaMine - an traditional algorithm - and its corresponding multi-relational proposed, MR-Radix. Results This work showed advantages of the multi-relational approach in performance over several tables, which avoids the high cost for joining operations from multiple tables and semantic losses. The performance provided by the algorithm MR-Radix shows faster than PatriciaMine, despite handling complex multi-relational patterns. The utilized memory indicates a more conservative growth curve for MR-Radix than PatriciaMine, which shows the increase in demand of frequent items in MR-Radix does not result in a significant growth of utilized memory like in PatriciaMine. Conclusion The comparative study between PatriciaMine and MR-Radix confirmed efficacy of the multi-relational approach in data mining process both in terms of execution time and in relation to memory usage. Besides that, the multi-relational proposed algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract Background The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated to the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heteregeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper describes the integration of information between Digital Library of Historical Cartography and Bibliographical Database (DEDALUS), both of the University of São Paulo (USP), to guarantee open, public access by Internet to the maps in the collection and make them available to users everywhere. This digital library was designed by Historical Cartography Studies Laboratory team (LECH/USP), and provides maps images on the Web, of high resolution, as well as such information on these maps as technical-scientific data (projection, scale, coordinates), printing techniques and material support that have made their circulation and cultural consumption possible. The Digital Library of Historical Cartography is accessible not only to the historical cartography researchers, but also to students and the general public. Beyond being a source of information about maps, the Digital Library of Historical Cartography seeks to be interactive, exchanging information and seeking dialogue with different branches of knowledge

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the increasing production of information from e-government initiatives, there is also the need to transform a large volume of unstructured data into useful information for society. All this information should be easily accessible and made available in a meaningful and effective way in order to achieve semantic interoperability in electronic government services, which is a challenge to be pursued by governments round the world. Our aim is to discuss the context of e-Government Big Data and to present a framework to promote semantic interoperability through automatic generation of ontologies from unstructured information found in the Internet. We propose the use of fuzzy mechanisms to deal with natural language terms and present some related works found in this area. The results achieved in this study are based on the architectural definition and major components and requirements in order to compose the proposed framework. With this, it is possible to take advantage of the large volume of information generated from e-Government initiatives and use it to benefit society.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Degree in Marine Sciences. Faculty of Marine Sciences, University of Las Palmas de Gran Canaria. Institut de Ciències del Mar, Consejo Superior de Investigaciones Científicas

Relevância:

60.00% 60.00%

Publicador:

Resumo:

To understand a city and its urban structure it is necessary to study its history. This is feasible through GIS (Geographical Information Systems) and its by-products on the web. Starting from a cartographic view they allow an initial understanding of, and a comparison between, present and past data together with an easy and intuitive access to database information. The research done led to the creation of a GIS for the city of Bologna. It is based on varied data such as historical map, vector and alphanumeric historical data, etc.. After providing information about GIS we thought of spreading and sharing the collected data on the Web after studying two solutions available on the market: Web Mapping and WebGIS. In this study we discuss the stages, beginning with the development of Historical GIS of Bologna, which led to the making of a WebGIS Open Source (MapServer and Chameleon) and the Web Mapping services (Google Earth, Google Maps and OpenLayers).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Subduction zones are the favorite places to generate tsunamigenic earthquakes, where friction between oceanic and continental plates causes the occurrence of a strong seismicity. The topics and the methodologies discussed in this thesis are focussed to the understanding of the rupture process of the seismic sources of great earthquakes that generate tsunamis. The tsunamigenesis is controlled by several kinematical characteristic of the parent earthquake, as the focal mechanism, the depth of the rupture, the slip distribution along the fault area and by the mechanical properties of the source zone. Each of these factors plays a fundamental role in the tsunami generation. Therefore, inferring the source parameters of tsunamigenic earthquakes is crucial to understand the generation of the consequent tsunami and so to mitigate the risk along the coasts. The typical way to proceed when we want to gather information regarding the source process is to have recourse to the inversion of geophysical data that are available. Tsunami data, moreover, are useful to constrain the portion of the fault area that extends offshore, generally close to the trench that, on the contrary, other kinds of data are not able to constrain. In this thesis I have discussed the rupture process of some recent tsunamigenic events, as inferred by means of an inverse method. I have presented the 2003 Tokachi-Oki (Japan) earthquake (Mw 8.1). In this study the slip distribution on the fault has been inferred by inverting tsunami waveform, GPS, and bottom-pressure data. The joint inversion of tsunami and geodetic data has revealed a much better constrain for the slip distribution on the fault rather than the separate inversions of single datasets. Then we have studied the earthquake occurred on 2007 in southern Sumatra (Mw 8.4). By inverting several tsunami waveforms, both in the near and in the far field, we have determined the slip distribution and the mean rupture velocity along the causative fault. Since the largest patch of slip was concentrated on the deepest part of the fault, this is the likely reason for the small tsunami waves that followed the earthquake, pointing out how much the depth of the rupture plays a crucial role in controlling the tsunamigenesis. Finally, we have presented a new rupture model for the great 2004 Sumatra earthquake (Mw 9.2). We have performed the joint inversion of tsunami waveform, GPS and satellite altimetry data, to infer the slip distribution, the slip direction, and the rupture velocity on the fault. Furthermore, in this work we have presented a novel method to estimate, in a self-consistent way, the average rigidity of the source zone. The estimation of the source zone rigidity is important since it may play a significant role in the tsunami generation and, particularly for slow earthquakes, a low rigidity value is sometimes necessary to explain how a relatively low seismic moment earthquake may generate significant tsunamis; this latter point may be relevant for explaining the mechanics of the tsunami earthquakes, one of the open issues in present day seismology. The investigation of these tsunamigenic earthquakes has underlined the importance to use a joint inversion of different geophysical data to determine the rupture characteristics. The results shown here have important implications for the implementation of new tsunami warning systems – particularly in the near-field – the improvement of the current ones, and furthermore for the planning of the inundation maps for tsunami-hazard assessment along the coastal area.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Il citofluorimetro è uno strumento impiegato in biologia genetica per analizzare dei campioni cellulari: esso, analizza individualmente le cellule contenute in un campione ed estrae, per ciascuna cellula, una serie di proprietà fisiche, feature, che la descrivono. L’obiettivo di questo lavoro è mettere a punto una metodologia integrata che utilizzi tali informazioni modellando, automatizzando ed estendendo alcune procedure che vengono eseguite oggi manualmente dagli esperti del dominio nell’analisi di alcuni parametri dell’eiaculato. Questo richiede lo sviluppo di tecniche biochimiche per la marcatura delle cellule e tecniche informatiche per analizzare il dato. Il primo passo prevede la realizzazione di un classificatore che, sulla base delle feature delle cellule, classifichi e quindi consenta di isolare le cellule di interesse per un particolare esame. Il secondo prevede l'analisi delle cellule di interesse, estraendo delle feature aggregate che possono essere indicatrici di certe patologie. Il requisito è la generazione di un report esplicativo che illustri, nella maniera più opportuna, le conclusioni raggiunte e che possa fungere da sistema di supporto alle decisioni del medico/biologo.