952 results for BIOINFORMATICS DATABASES
Abstract:
When it comes to information sets in real life, parts of the whole set are often unavailable. This problem can arise for various reasons, which give rise to different patterns. In the literature, this problem is known as Missing Data. It can be handled in various ways: discarding incomplete observations, estimating what the missing values originally were, or simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any interactions exist between Missing Data, Imputation Methods and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where "discrete" means that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The results showed that the missing data pattern strongly influences the outcome produced by a classifier. Also, in some cases, the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time restricting the previous problem to a special kind of dataset: multivariate Time Series. We designed new imputation techniques for this particular domain and combined them with some of the strategies contrasted in the previous chapter of this thesis. The time series were also subjected to processes involving missing data and imputation in order to propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem.
The databases for this problem contained their own naturally occurring missing values, which provides a real-world benchmark for testing the algorithms developed in this thesis.
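The abstract contrasts simple and complex imputation. As a minimal sketch of the simple end only (these are standard baselines, not the thesis's own techniques), the functions below use None to mark a missing value and implement column-mean imputation for tabular data, plus last observation carried forward (LOCF) and linear interpolation for a single time series. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional

# None marks a missing value in a column or time series.
Series = List[Optional[float]]


def mean_impute(values: Series) -> List[float]:
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def locf_impute(values: Series) -> Series:
    """Last observation carried forward; leading gaps stay missing."""
    out: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def linear_impute(values: Series) -> Series:
    """Linearly interpolate interior gaps; leading and trailing gaps stay missing."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


series = [1.0, None, 3.0, None, None, 6.0]
print(mean_impute(series))
print(locf_impute(series))    # [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
print(linear_impute(series))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Mean imputation ignores temporal order, which is why time-series-specific strategies such as LOCF or interpolation are usually compared against it.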
Abstract:
This paper will propose that, rather than sitting on silos of data, historians who utilise quantitative methods should endeavour to make their data accessible through databases, and treat this as a new form of bibliographic entry. Of course, in many instances historical data does not lend itself easily to the creation of such data sets. With this in mind, some of the issues regarding normalising raw historical data will be examined with reference to current work on nineteenth-century Irish trade. These issues encompass (but are not limited to) measurement systems, geographic locations, and potential problems that may arise in attempting to unify disaggregated sources. The paper will discuss the need for a concerted effort by historians to define what is required of digital resources for them to be considered accurate, and to what extent the normalisation requirements of database systems may conflict with the desire for accuracy. Many of the issues that historians may encounter when engaging with databases will be common to all of them, and there would be merit in having defined standards for referencing items such as people, places, locations, and measurements.
Abstract:
This paper presents a distributed hierarchical multiagent architecture for detecting SQL injection attacks against databases. It uses a novel strategy supported by a Case-Based Reasoning mechanism, which gives the classifier agents a strong capacity for learning and adaptation in the face of this type of attack. The architecture combines intrusion detection strategies such as misuse detection and anomaly detection. It has been tested, and the results are presented in this paper.
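To make the Case-Based Reasoning idea concrete, here is a deliberately reduced sketch: labelled past queries are stored (retain), a new query is matched to its most similar stored case (retrieve), and that case's label is adopted (reuse). The lexical features, the Euclidean similarity, the labels "attack"/"legit" and every name below are hypothetical illustrations, not the feature set or agents of the architecture described above; the revise step and the multiagent hierarchy are omitted.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Tuple


def features(query: str) -> Tuple[int, ...]:
    """Hypothetical lexical features of a SQL query string."""
    q = query.lower()
    return (
        q.count("'"),                                 # single quotes
        q.count("--") + q.count("/*"),                # comment markers
        len(re.findall(r"\bunion\b", q)),             # UNION keyword
        len(re.findall(r"\bor\s+\d+\s*=\s*\d+", q)),  # tautologies like OR 1=1
        q.count(";"),                                 # stacked statements
    )


@dataclass
class Case:
    vector: Tuple[int, ...]
    label: str  # "attack" or "legit"


class CaseBase:
    def __init__(self) -> None:
        self.cases: List[Case] = []

    def retain(self, query: str, label: str) -> None:
        """Retain: store a solved case so it can be reused later."""
        self.cases.append(Case(features(query), label))

    def retrieve(self, query: str) -> Case:
        """Retrieve: the stored case closest in feature space (Euclidean)."""
        target = features(query)
        return min(self.cases, key=lambda case: math.dist(target, case.vector))

    def classify(self, query: str) -> str:
        """Reuse: adopt the label of the most similar past case."""
        return self.retrieve(query).label


cb = CaseBase()
cb.retain("SELECT * FROM users WHERE id = 42", "legit")
cb.retain("SELECT name FROM users WHERE name = 'bob'", "legit")
cb.retain("SELECT * FROM users WHERE id = '' OR 1=1 --", "attack")
cb.retain("1 UNION SELECT password FROM users; --", "attack")
print(cb.classify("SELECT * FROM items WHERE sku = 'x' OR 2=2 --"))  # attack
```

Because new solved cases can simply be retained, the case base grows with experience, which is the kind of learning and adaptation the abstract attributes to its CBR mechanism.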
Abstract:
The world of Computational Biology and Bioinformatics presently integrates many different areas of expertise, including computer science and electronic engineering. A major aim in Data Science is the development and tuning of specific computational approaches to interpret the complexity of Biology. Molecular biologists and medical doctors rely heavily on interdisciplinary experts who can understand the biological background and apply algorithms to find optimal solutions to their problems. With this problem-solving orientation, I was involved in two basic research fields: Cancer Genomics and Enzyme Proteomics. What I developed and implemented can therefore be considered a general effort to support data analysis in both Cancer Genomics and Enzyme Proteomics, the latter focusing on enzymes, which catalyse all the biochemical reactions in cells. Specifically, in Cancer Genomics I contributed to the characterization of the intratumoral immune microenvironment in gastrointestinal stromal tumours (GISTs), correlating immune cell population levels with tumour subtypes. I was involved in setting up strategies for the evaluation and standardization of different approaches to fusion transcript detection in sarcomas that can be applied in routine diagnostics. This was part of a coordinated effort of the Sarcoma working group of "Alleanza Contro il Cancro". In Enzyme Proteomics, I generated a derived database collecting all the human proteins and enzymes known to be associated with genetic diseases. I curated the data search in freely available databases such as PDB, UniProt, Humsavar and ClinVar, and I was responsible for searching, updating and handling the information content and for computing statistics. I also developed a web server, BENZ, which allows researchers to annotate an enzyme sequence with the corresponding Enzyme Commission number, the code that fully describes the catalysed reaction.
In addition, I contributed substantially to the characterization of the enzyme-genetic disease association, for a better classification of metabolic genetic diseases.
Abstract:
Mutations in the SLC25A12 gene, which encodes AGC1, cause an ultra-rare genetic disease of the brain, reported as a developmental and epileptic encephalopathy associated with global cerebral hypomyelination. Symptoms of the disease include diffuse hypomyelination, arrested psychomotor development, severe hypotonia and seizures, and are shared with other neurological and developmental disorders. Among the biological components believed to be most affected by AGC1 deficiency are oligodendrocytes, the glial cells responsible for myelination. Recent studies (Poeta et al, 2022) have also shown that altered levels of transcription factors and epigenetic modifications greatly affect proliferation and differentiation in oligodendrocyte precursor cells (OPCs). In this study we explore the transcriptomic landscape of Agc1 in two different model systems: OPCs silenced for Agc1, and iPSCs from human patients differentiated into neural progenitors. Analyses include differential expression, alternative splicing and master regulator analysis. ATAC-seq results on OPCs were integrated with RNA-seq results to assess the activity of each transcription factor (TF) from the accessibility of its putative targets, which, combined with the RNA-seq data, makes it possible to infer whether each TF acts as an activator or a repressor. All the findings for this model were also integrated with early RNA-seq results from iPSCs, looking for commonalities between the two model systems; among these, we find a downregulation of genes encoding SREBP, a transcription factor regulating fatty acid biosynthesis, a key process for myelination, which could explain the hypomyelinated state of patients. We also find that in both systems cells tend to form more neurites and, given their progenitor state, are likely losing their ability to differentiate.
We also report several alterations in the chromatin state of cells lacking Agc1, which confirm the hypothesis that AGC1 deficiency is not a disease restricted to metabolic alterations in the cells, but involves a profound shift in their regulatory state.
Abstract:
Xanthomonas citri subsp. citri (X. citri) is the causative agent of citrus canker, a disease that affects several citrus plants in Brazil and across the world. Although many studies have demonstrated the importance of genes for infection and pathogenesis in this bacterium, there are no data on its phosphate uptake and assimilation pathways. To identify the proteins involved in the phosphate response, we performed a proteomic analysis of X. citri extracts after growth in three culture media with different phosphate concentrations. Using mass spectrometry and bioinformatics analysis, we showed that X. citri conserves orthologues of Pho regulon genes from Escherichia coli, including the two-component system PhoR/PhoB, the ATP-binding cassette (ABC) transporter Pst for phosphate uptake, and the alkaline phosphatase PhoA. Analysis performed under phosphate starvation provided evidence of the relevance of the Pst system for phosphate uptake, as well as of both periplasmic binding proteins, PhoX and PstS, which were produced in high abundance. The results from this study are the first evidence of Pho regulon activation in X. citri and bring new insights for studies related to bacterial metabolism and physiology. Biological significance: Using proteomics and bioinformatics analysis, we showed for the first time that the phytopathogenic bacterium X. citri conserves a set of proteins belonging to the Pho regulon, which are induced during phosphate starvation. The most relevant in terms of conservation and up-regulation were the periplasmic binding proteins PstS and PhoX from the ABC transporter PstSBAC for phosphate, the two-component system composed of PhoR/PhoB, and the alkaline phosphatase PhoA.
Abstract:
Diabetic Retinopathy (DR) is a complication of diabetes that can lead to blindness if not detected early. Automated screening algorithms have the potential to improve the identification of patients who need further medical attention. However, lesion identification must be accurate to be useful in clinical application. The bag-of-visual-words (BoVW) algorithm employs a maximum-margin classifier in a flexible framework that is able to detect the most common DR-related lesions, such as microaneurysms, cotton-wool spots and hard exudates. BoVW makes it possible to bypass pre- and post-processing of the retinographic images, as well as specific ad hoc techniques for identifying each type of lesion. An extensive evaluation of the BoVW model was performed using three large retinograph datasets (DR1, DR2 and Messidor) with different resolutions, collected by different healthcare personnel. The results demonstrate that the BoVW classification approach can identify different lesions within an image without requiring a different algorithm for each lesion, reducing processing time and providing a more flexible diagnostic system. Our BoVW scheme is based on sparse low-level feature detection with a Speeded-Up Robust Features (SURF) local descriptor, and on mid-level features based on semi-soft coding with max pooling. The best BoVW representation for retinal image classification achieved an area under the receiver operating characteristic curve (AUC-ROC) of 97.8% (exudates) and 93.5% (red lesions) under a cross-dataset validation protocol. To assess accuracy in detecting cases that require referral within one year, the sparse extraction technique combined with semi-soft coding and max pooling obtained an AUC of 94.2 ± 2.0%, outperforming current methods.
These results indicate that, for retinal image classification tasks in clinical practice, BoVW equals and, in some instances, surpasses the results obtained using dense detection (widely believed to be the best choice in many vision problems) for the low-level descriptors.
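The mid-level step of the BoVW scheme (semi-soft coding followed by max pooling) can be sketched as follows. This is a minimal sketch assuming one common formulation of semi-soft coding, in which each local descriptor is soft-assigned with Gaussian-kernel weights to only its k nearest codewords; the parameter names k and beta, the helper names and the toy codebook are illustrative assumptions rather than the paper's settings. Codebook learning (typically k-means) and the maximum-margin classifier that consumes the resulting signature are not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def semi_soft_code(x: Vector, codebook: List[Vector], k: int = 2, beta: float = 1.0) -> List[float]:
    """Soft-assign descriptor x to its k nearest codewords; all others get 0."""
    d2 = [sq_dist(x, c) for c in codebook]
    nearest = sorted(range(len(codebook)), key=lambda j: d2[j])[:k]
    d_min = d2[nearest[0]]
    # Gaussian-kernel weights, shifted by the smallest distance to avoid underflow.
    weights = {j: math.exp(-beta * (d2[j] - d_min)) for j in nearest}
    total = sum(weights.values())
    return [weights.get(j, 0.0) / total for j in range(len(codebook))]


def max_pool(codes: List[List[float]]) -> List[float]:
    """Image signature: the largest activation of each codeword over all descriptors."""
    return [max(column) for column in zip(*codes)]


def bovw_signature(descriptors: List[Vector], codebook: List[Vector],
                   k: int = 2, beta: float = 1.0) -> List[float]:
    """Code every local descriptor, then max-pool into one fixed-length vector."""
    return max_pool([semi_soft_code(x, codebook, k, beta) for x in descriptors])


# Toy 2-D "descriptors"; standard SURF descriptors are 64-dimensional.
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(bovw_signature([(0.0, 0.0), (10.0, 0.0)], codebook, k=1))  # [1.0, 1.0, 0.0]
```

Max pooling records whether any descriptor in the image strongly activates a codeword, which suits small, sparse lesions better than averaging activations over the whole image.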
Abstract:
The objective of this study was to review the growth curves for Turner syndrome, evaluate their methodological and statistical quality, and suggest potential growth curves for clinical practice guidelines. The search was carried out in the Medline and Embase databases. Of 1006 references identified, 15 were included. The studies constructed curves for weight, height, weight/height, body mass index, head circumference, height velocity, leg length, and sitting height. Sample sizes ranged from 47 to 1,565 girls (total = 6,273), aged 0 to 24 y and born between 1950 and 2006. The number of measurements ranged from 580 to 9,011 (total = 28,915). Most studies showed strengths such as sample size, exclusion of growth hormone and androgen use, and analysis of confounding variables. However, limitations included growth curves restricted to height, lack of information about selection bias, limited distributional properties, and shortcomings in smoothing. In conclusion, there is a need to construct an international growth reference for girls with Turner syndrome in order to support clinical practice guidelines.
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions opens important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources into a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) the Submission module, which receives raw data derived from Sanger sequencing (e.g. the two-hybrid system); (ii) the Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) the Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) the Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather newly identified interactions, protein and metabolite expression/concentration levels, subcellular localization, computed topological metrics, and GO biological process and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We developed IIS by integrating diverse databases, in response to the need for appropriate tools for the systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
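The Interactome module's output is an XGMML file, an XML graph format that Cytoscape can import. As a minimal sketch of what such a file contains, the function below writes one node per molecule and one edge per interaction using Python's standard ElementTree. It is not the IIS exporter: the function name, the "source (type) target" edge-label convention and the example gene names are assumptions, and real exports usually add further attributes (graphics, node and edge properties).

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

XGMML_NS = "http://www.cs.rpi.edu/XGMML"
ET.register_namespace("", XGMML_NS)  # serialize XGMML as the default namespace


def to_xgmml(label: str, interactions: Iterable[Tuple[str, str, str]]) -> str:
    """Build a minimal undirected XGMML graph from (source, target, type) triples."""
    graph = ET.Element(f"{{{XGMML_NS}}}graph", {"label": label, "directed": "0"})
    ids: Dict[str, str] = {}
    edges = list(interactions)
    # One <node> per distinct molecule, with sequential string ids.
    for source, target, _ in edges:
        for name in (source, target):
            if name not in ids:
                ids[name] = str(len(ids) + 1)
                ET.SubElement(graph, f"{{{XGMML_NS}}}node", {"id": ids[name], "label": name})
    # One <edge> per interaction, referencing the node ids.
    for source, target, kind in edges:
        ET.SubElement(graph, f"{{{XGMML_NS}}}edge", {
            "source": ids[source],
            "target": ids[target],
            "label": f"{source} ({kind}) {target}",
        })
    return ET.tostring(graph, encoding="unicode")


# Hypothetical protein-protein ("pp") interactions.
print(to_xgmml("demo", [("GENE_A", "GENE_B", "pp"), ("GENE_B", "GENE_C", "pp")]))
```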
Abstract:
Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg., native to the Amazon rainforest, is the primary source of natural rubber. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, to validate the SNP markers, we selected sequences identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellite and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection.
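The abstract does not say which tool detected the SSRs. As a minimal sketch, assuming perfect (uninterrupted) repeats and illustrative minimum repeat counts per motif length, the regular expression below finds a k-base motif immediately followed by enough further copies of itself. Motifs that are themselves repeats of a shorter unit (such as ATAT) are skipped, so each repeat is reported once at its shortest period. The thresholds and function names are assumptions, and compound or imperfect SSRs are not handled.

```python
import re
from typing import Dict, List, Tuple

# Illustrative minimum number of repeat units per motif length (1 = mono, ..., 6 = hexa).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit, e.g. 'ATAT'."""
    n = len(motif)
    return not any(n % p == 0 and motif == motif[:p] * (n // p) for p in range(1, n))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


contig = "GG" + "AC" * 7 + "T" + "A" * 10 + "G"
print(find_ssrs(contig))  # [(2, 'AC', 7), (17, 'A', 10)]
```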
Abstract:
New DNA-based predictive tests for physical characteristics and inference of ancestry are highly informative tools that are increasingly used in forensic genetic analysis. Two eye colour prediction models, a Bayesian classifier (Snipper) and a multinomial logistic regression (MLR) system for the IrisPlex assay, have been described for the analysis of unadmixed European populations. Since multiple SNPs in combination contribute in varying degrees to eye colour predictability in Europeans, these predictive tests are likely to perform differently in admixed populations with European co-ancestry than in unadmixed Europeans. In this study we examined 99 individuals from two admixed South American populations, comparing eye colour with ancestry in order to reveal a direct correlation of light eye colour phenotypes with European co-ancestry in admixed individuals. Additionally, six eye colour prediction models, using varying numbers of SNPs and based on Snipper and MLR, were applied to the study populations. Furthermore, patterns of eye colour prediction were inferred for a set of publicly available admixed and globally distributed populations from the HGDP-CEPH panel and 1000 Genomes databases, with special emphasis on admixed American populations similar to those of the study samples.
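To make the MLR approach concrete: in a multinomial logistic model, each non-reference colour category receives a linear score computed from allele counts, and a softmax turns the scores into probabilities that sum to one. The sketch below assumes brown as the reference category, three hypothetical SNPs and invented coefficients; it is not the IrisPlex model and its numbers carry no biological meaning.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical model: intercept and per-SNP coefficients for each non-reference
# category. These numbers are invented and are NOT the published IrisPlex model.
MODEL: Dict[str, Tuple[float, List[float]]] = {
    "blue": (1.0, [-2.0, 0.5, 0.3]),
    "intermediate": (0.2, [-1.0, 0.2, 0.1]),
}
REFERENCE = "brown"  # assumed baseline category (its logit is fixed at 0)


def predict(genotype: List[int]) -> Dict[str, float]:
    """genotype[i] is the count (0, 1 or 2) of the coded allele at SNP i."""
    logits = {
        colour: intercept + sum(beta * x for beta, x in zip(betas, genotype))
        for colour, (intercept, betas) in MODEL.items()
    }
    logits[REFERENCE] = 0.0
    # Softmax, shifted by the largest logit for numerical stability.
    top = max(logits.values())
    weights = {colour: math.exp(z - top) for colour, z in logits.items()}
    total = sum(weights.values())
    return {colour: w / total for colour, w in weights.items()}


probs = predict([2, 0, 1])
print(max(probs, key=probs.get), probs)
```

Because each SNP's contribution is additive on the log-odds scale, allele frequency shifts in admixed populations change the predicted probabilities directly, which is one reason performance may differ from that in unadmixed Europeans.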
Abstract:
Frailty is a syndrome that leads to practical harm in the lives of elders, since it is related to increased risk of dependency, falls, hospitalization, institutionalization, and death. The objective of this systematic review was to identify the socio-demographic, psycho-behavioral, health-related, nutritional, and lifestyle factors associated with frailty in the elderly. A total of 4,183 studies published from 2001 to 2013 were detected in the databases, and 182 complete articles were selected. After a comprehensive reading and application of selection criteria, 35 eligible articles remained for analysis. The main factors associated with frailty were: age, female gender, black race/color, schooling, income, cardiovascular diseases, number of comorbidities/diseases, functional incapacity, poor self-rated health, depressive symptoms, cognitive function, body mass index, smoking, and alcohol use. Knowledge of the complexity of determinants of frailty can assist the formulation of measures for prevention and early intervention, thereby contributing to better quality of life for the elderly.
Abstract:
The overall prevalence of infertility was estimated to be 3.5-16.7% in developing countries and 6.9-9.3% in developed countries. Furthermore, according to reports from some regions of sub-Saharan Africa, the prevalence rate is 30-40%. The consequences of infertility, and how it affects the lives of women in resource-poor settings, particularly in developing countries, have become an important issue in reproductive health. In some societies, the inability to fulfill the desire to have children makes life difficult for the infertile couple. In many regions, infertility is considered a tragedy that affects not only the infertile couple or woman, but the entire family. This position paper reviews the needs of low-income infertile couples, mainly those living in developing countries, regarding access to infertility care, including assisted reproductive technologies (ART), and initiatives to provide ART at low or affordable cost. Information was gathered from the MEDLINE, CENTRAL, POPLINE, EMBASE, LILACS and ICTRP databases using the keywords: infertility, low income, assisted reproductive technologies, affordable cost, low cost. There are few initiatives geared toward implementing ART procedures at low, or at least affordable, cost in low-income populations. Nevertheless, recent studies point to possibilities for new low-cost initiatives that can help millions of couples fulfill their desire to have a biological child. Healthcare professionals and policymakers need to take these new initiatives into account in order to implement ART in resource-constrained settings.
Abstract:
The objective of this study was to analyze the prevalence of diabetes in older people and the control measures adopted. We analyzed data on older diabetic individuals who participated in the Health Surveys conducted in the Municipality of Sao Paulo, SP (ISA-Capital) in 2003 and 2008, both of which were cross-sectional studies. Prevalences and confidence intervals were compared between 2003 and 2008 according to sociodemographic variables. The two databases were combined when the confidence intervals overlapped. The chi-square test (5% significance level) and Pearson's chi-square test with the Rao-Scott correction were performed. Variables whose confidence intervals did not overlap were not tested. The predominant age group was 60-69 years. The majority were women, Caucasian, with an income of more than 0.5 and up to 2.5 times the minimum wage, and with low levels of schooling. The prevalence of diabetes was 17.6% (95%CI 14.9;20.6) in 2003 and 20.1% (95%CI 17.3;23.1) in 2008, indicating growth over this period (p at the limit of significance). The most common measure adopted by the older adults to control diabetes was hypoglycemic agents, followed by diet. Physical activity was infrequent, despite the significant differences observed between the 2003 and 2008 results. The use of public health services to control diabetes was significantly higher among older individuals with lower income and lower levels of education. Diabetes is a complex and challenging disease for patients and health systems. Measures that encourage health promotion practices are necessary, because such practices were less common than the use of hypoglycemic agents. Public health policies should be implemented and aimed mainly at older individuals with low income and schooling levels. These changes are essential to improve the health of older diabetic patients.
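The pooling rule above (combine the 2003 and 2008 databases when the confidence intervals overlap) and the chi-square testing can be sketched as follows. The interval check uses the prevalence CIs reported in the abstract. The chi-square function is a plain, uncorrected Pearson test on a 2x2 table and assumes simple random sampling: it does not implement the Rao-Scott correction the study used for its complex survey design, and the table counts and labels in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two (lower, upper) confidence intervals share any value."""
    return a[0] <= b[1] and b[0] <= a[1]


def pearson_chi2_2x2(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Uncorrected Pearson chi-square for a 2x2 table; returns (statistic, p-value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Prevalence CIs (%) reported in the abstract for 2003 and 2008.
ci_2003 = (14.9, 20.6)
ci_2008 = (17.3, 23.1)
print(intervals_overlap(ci_2003, ci_2008))  # True -> the surveys would be pooled

# Hypothetical counts: rows = lower/higher income, columns = uses / does not use public services.
print(pearson_chi2_2x2([[10, 20], [20, 10]]))
```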
Abstract:
Cyclosporine, a drug with a narrow therapeutic index used in immunosuppression protocols for hematopoietic stem cell transplantation, may cause various adverse reactions, including nephrotoxicity, which has a direct clinical impact on the patient. This study aims to summarize the available evidence in the scientific literature on cyclosporine as a risk factor for the development of nephrotoxicity in patients undergoing hematopoietic stem cell transplantation. A systematic review was conducted using the following electronic databases: PubMed, Web of Science, Embase, Scopus, CINAHL, LILACS, SciELO and Cochrane BVS. The keywords used were: bone marrow transplantation OR stem cell transplantation OR grafting, bone marrow AND cyclosporine OR cyclosporin OR risk factors AND acute kidney injury OR acute kidney injuries OR acute renal failure OR acute renal failures OR nephrotoxicity. The level of scientific evidence of the studies was classified according to the Oxford Centre for Evidence Based Medicine. The final sample comprised 19 studies, most of which (89.5%) had an observational design and evidence level 2B, and reported an incidence of nephrotoxicity above 30%. The available evidence, considered of good quality and appropriate for the analyzed event, indicates that cyclosporine is a risk factor for nephrotoxicity, particularly when combined with amphotericin B or aminoglycosides, agents commonly used in hematopoietic stem cell transplantation recipients.