19 resultados para Data access

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, distributed storage systems have been considering techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take into account application behavior to perform data access optimization. This limitation motivated this paper which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. In order to accomplish such a goal, this approach organizes application behaviors as time series and, then, analyzes and classifies those series according to their properties. By knowing properties, the approach selects modeling techniques to represent series and perform predictions, which are, later on, used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm this new approach reduces application execution time in about 50 percent, specially when handling large amounts of data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Access to fluoridated water is a known protective factor against dental caries. In 1974, fluoridation of the public water supply became mandatory by law in Brazil, resulting in improved coverage, especially in more developed regions of the country. Coverage increased across the country as a priority under the national oral health policy. This article systematizes information on the implementation and expansion of fluoridation in Sao Paulo State from 1956 to 2009, using secondary data from technical reports, official documents, and the Information System for Surveillance of Water Quality for Human Consumption (SISAGUA). In 2009, fluoridation covered 546 of 645 counties in Sao Paulo State (84.7%), reaching 85.1% of the total population and 93.5% of the population with access to the public water supply. The results indicate that fluoridation has been consolidated as part of State health policy. However, the challenge remains to implement and maintain fluoridation in 99 counties, benefiting 6.2 million inhabitants that are still excluded from this service.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Amazon basin is a region of constant scientific interest due to its environmental importance and its biodiversity and climate on a global scale. The seasonal variations in water volume are one of the examples of topics studied nowadays. In general, the variations in river levels depend primarily on the climate and physics characteristics of the corresponding basins. The main factor which influences the water level in the Amazon Basin is the intensive rainfall over this region as a consequence of the humidity of the tropical climate. Unfortunately, the Amazon basin is an area with lack of water level information due to difficulties in access for local operations. The purpose of this study is to compare and evaluate the Equivalent Water Height (Ewh) from GRACE (Gravity Recovery And Climate Experiment) mission, to study the connection between water loading and vertical variations of the crust due to the hydrologic. In order to achieve this goal, the Ewh is compared with in-situ information from limnimeter. For the analysis it was computed the correlation coefficients, phase and amplitude of GRACE Ewh solutions and in-situ data, as well as the timing of periods of drought in different parts of the basin. The results indicated that vertical variations of the lithosphere due to water mass loading could reach 7 to 5 cm per year, in the sedimentary and flooded areas of the region, where water level variations can reach 10 to 8 m.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objectives To analyse the perspective of clinical research stakeholders concerning post-trial access to study medication. Methods Questionnaires and informed consents were sent through e-mail to 599 ethics committee (EC) members, 290 clinical investigators (HIV/AIDS and Diabetes) and 53 sponsors in Brazil. Investigators were also asked to submit the questionnaire to their research patients. Two reminders were sent to participants. Results The response rate was 21%, 20% and 45% in EC, investigators and sponsors' groups, respectively. 54 patients answered the questionnaire through their doctors. The least informative item in the consent form was how to obtain the study medication after trial. If a benefit were demonstrated in the study, 60% of research participants and 35% of EC answered that all patients should continue receiving study medication after trial; 43% of investigators believed the medication should be given to participants, and 40% to subjects who participated and benefited from treatment. For 50% of the sponsors, study medication should be assured to participants who had benefited from treatment. The majority of responders answered that medication should be provided free by sponsors; investigators and sponsors believed the medication should be kept until available in the public health sector; EC members said that the patient should keep the benefit; patients answered that benefits should be assured for life. Conclusions Due to the study limitations, the results cannot be generalised; however, the data can contribute to discussion of this complex topic through analysing the views of stakeholders in clinical research in Brazil.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Tuberculosis remains a pubic health challenge. Uncountable efforts are made to control the disease, and patient treatment and accessibility to healthcare can hinder reaching a cure. The objective of this article is to analyze the satisfaction of tuberculosis patients regarding tuberculosis control services. This is an epidemiological, prospective study, using both a quantitative and qualitative approach. Data were collected using a semi-structured questionnaire. Participants included 77 patients. The quantitative data were positively evaluated, and the qualitative data permitted an understanding of the patients' experience regarding their accessibility and treatment. Aspects such as the criteria for performing Directly Observed Treatment and the proximity of the healthcare facility to the patients' residence affected their satisfaction, which implies the need to reorganize healthcare services in order to provide more appropriate care to tuberculosis patients.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Health Department of Sao Paulo, Brazil, has developed a Health Necessities Index (HNI) to identify priority areas for providing health assistance. In 2008, a survey of the status of oral health was conducted. The objective of this ecological study was to analyze the status of oral health in relation to the HNI. The variables, stratified by the age of 5, 12 and 15 years old were: percentage of individuals with difficulty of access to dental care services; DMFT and DMFS; prevalence of the need for tooth extraction and treatment of dental caries. Data were analyzed for the 25 Health Technical Supervision Units (HTS). The Statistical Covariance Test was used as well as the Pearson correlation coefficient and linear regression model. A positive correlation was observed between high scores of the HNI and difficulty of access to services. In the HTS with high scores of HNI a higher incidence of dental caries was observed, a greater need for tooth extractions and low caries-free incidence. In order to improve health conditions of the population it is mandatory to prioritize actions in areas of social deprivation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background With the development of DNA hybridization microarray technologies, nowadays it is possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a pre-requisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical, deserving judicious consideration. Results Here, we considered three commonly used normalization approaches, namely: Loess, Splines and Wavelets, and two non-parametric regression methods, which have yet to be used for normalization, namely, the Kernel smoothing and Support Vector Regression. The results obtained were compared using artificial microarray data and benchmark studies. The results indicate that the Support Vector Regression is the most robust to outliers and that Kernel is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. Conclusion In face of our results, the Support Vector Regression is favored for microarray normalization due to its superiority when compared to the other methods for its robustness in estimating the normalization curve.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background The search for enriched (aka over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for a system-level analysis. This procedure tries to summarize the information focussing on classification designs such as Gene Ontology, KEGG pathways, and so on, instead of focussing on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the former approach has been used to deal with the ontology term enrichment problem. Results BayGO implements a Bayesian approach to search for enriched terms from microarray data. The R source-code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: Linux, which can be easily incorporated into pre-existent pipelines; Windows, to be controlled interactively; and as a web-tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. Conclusion The Bayesian model accounts for the fact that, eventually, not all the genes from a given category are observable in microarray data due to low intensity signal, quality filters, genes that were not spotted and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of working only with the common significance analysis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score taking into account the credibility and the bolstered error values in order to rank the groups of considered genes is proposed. Results obtained using SAGE data from gliomas are presented, thus corroborating the introduced methodology. Conclusion The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. The additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable to identify signature genes that lead to a good separation of the biological states using SAGE and may be adapted for other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of such genes identified by the proposed method may be useful to generate classifiers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern", are important high-throughput techniques for digital gene expression measurement. As other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space where the summation of the components is constrained. These properties are not present on regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background Several mathematical and statistical methods have been proposed in the last few years to analyze microarray data. Most of those methods involve complicated formulas, and software implementations that require advanced computer programming skills. Researchers from other areas may experience difficulties when they attempting to use those methods in their research. Here we present an user-friendly toolbox which allows large-scale gene expression analysis to be carried out by biomedical researchers with limited programming skills. Results Here, we introduce an user-friendly toolbox called GEDI (Gene Expression Data Interpreter), an extensible, open-source, and freely-available tool that we believe will be useful to a wide range of laboratories, and to researchers with no background in Mathematics and Computer Science, allowing them to analyze their own data by applying both classical and advanced approaches developed and recently published by Fujita et al. Conclusion GEDI is an integrated user-friendly viewer that combines the state of the art SVR, DVAR and SVAR algorithms, previously developed by us. It facilitates the application of SVR, DVAR and SVAR, further than the mathematical formulas present in the corresponding publications, and allows one to better understand the results by means of available visualizations. Both running the statistical methods and visualizing the results are carried out within the graphical user interface, rendering these algorithms accessible to the broad community of researchers in Molecular Biology.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background Smallpox is a lethal disease that was endemic in many parts of the world until eradicated by massive immunization. Due to its lethality, there are serious concerns about its use as a bioweapon. Here we analyze publicly available microarray data to further understand survival of smallpox infected macaques, using systems biology approaches. Our goal is to improve the knowledge about the progression of this disease. Results We used KEGG pathways annotations to define groups of genes (or modules), and subsequently compared them to macaque survival times. This technique provided additional insights about the host response to this disease, such as increased expression of the cytokines and ECM receptors in the individuals with higher survival times. These results could indicate that these gene groups could influence an effective response from the host to smallpox. Conclusion Macaques with higher survival times clearly express some specific pathways previously unidentified using regular gene-by-gene approaches. Our work also shows how third party analysis of public datasets can be important to support new hypotheses to relevant biological problems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to perform an analysis of gene regulatory interactions using the Boolean network model and time-series data. Actually, the Boolean network is restricted in the sense that only a subset of all possible Boolean functions are considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid the full search approach. The problem is modeled as a Constraint Satisfaction Problem (CSP) and CSP techniques are used to solve it. Results We applied the proposed algorithm in two data sets. First, we used an artificial dataset obtained from a model for the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions The algorithm proposed can be used as a first step for detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by a priori knowledge available.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background The use of the knowledge produced by sciences to promote human health is the main goal of translational medicine. To make it feasible we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however it lacks supporting representation of clinical and socio-demographic information. Results We have implemented an extension of Chado – the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and data integration process; and web interface level, to allow interaction between the user and the system. The clinical module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of head and neck. We implemented the IPTrans tool that is a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it in the Clinical Module of Chado; the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Conclusions Open-source computational solutions currently available for translational science does not have a model to represent biomolecular information and also are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patient’s clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments accomplished from a use case demonstrated that the proposed system meets requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed in http://dcm.ffclrp.usp.br/caib/pg=iptrans webcite.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.