982 resultados para Biology, Bioinformatics|Computer Science
Resumo:
The world of Computational Biology and Bioinformatics presently integrates many different expertise, including computer science and electronic engineering. A major aim in Data Science is the development and tuning of specific computational approaches to interpret the complexity of Biology. Molecular biologists and medical doctors heavily rely on an interdisciplinary expert capable of understanding the biological background to apply algorithms for finding optimal solutions to their problems. With this problem-solving orientation, I was involved in two basic research fields: Cancer Genomics and Enzyme Proteomics. For this reason, what I developed and implemented can be considered a general effort to help data analysis both in Cancer Genomics and in Enzyme Proteomics, focusing on enzymes which catalyse all the biochemical reactions in cells. Specifically, as to Cancer Genomics I contributed to the characterization of intratumoral immune microenvironment in gastrointestinal stromal tumours (GISTs) correlating immune cell population levels with tumour subtypes. I was involved in the setup of strategies for the evaluation and standardization of different approaches for fusion transcript detection in sarcomas that can be applied in routine diagnostic. This was part of a coordinated effort of the Sarcoma working group of "Alleanza Contro il Cancro". As to Enzyme Proteomics, I generated a derived database collecting all the human proteins and enzymes which are known to be associated to genetic disease. I curated the data search in freely available databases such as PDB, UniProt, Humsavar, Clinvar and I was responsible of searching, updating, and handling the information content, and computing statistics. I also developed a web server, BENZ, which allows researchers to annotate an enzyme sequence with the corresponding Enzyme Commission number, the important feature fully describing the catalysed reaction. More to this, I greatly contributed to the characterization of the enzyme-genetic disease association, for a better classification of the metabolic genetic diseases.
Resumo:
Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
Resumo:
Diabetes mellitus (DM) is a disease that affects a large number of people, and the number of problems associated with the disease has been increasing in the past few decades. These problems include cardiovascular disorders, blindness and the eventual need to amputate limbs. Therefore, the quality of life for people living with DM is less than it is for healthy people. In several cases, metabolic syndrome (MS), which can be considered a disturbance of the lipid metabolism, is associated with DM. In this work, two drugs used to treat DM, pioglitazone and rosiglitazone, were studied using theoretical methods, and their molecular properties were related to the biological activity of these drugs. From the results, it was possible to correlate the properties of each substance-particularly electronic properties-with the biological interactions that are linked to their pharmacological effects. These results suggest that there are future prospects for designing or developing new drugs based on the correlation between theoretical and experimental properties.
Resumo:
A study on the possible sites of oxidation and epoxidation of nortriptyline was performed using electrochemical and quantum chemical methods; these sites are involved in the biological responses (for example, hepatotoxicity) of nortriptyline and other similar antidepressants. Quantum chemical studies and electrochemical experiments demonstrated that the oxidation and epoxidation sites are located on the apolar region of nortriptyline, which will useful for understanding the molecule`s activity. Also, for the determination of the compound in biological fluids or in pharmaceutical formulations, we propose a useful analytical methodology using a graphite-polyurethane composite electrode, which exhibited the best performance when compared with boron-doped diamond or glassy carbon surfaces.
Resumo:
The TCP/IP architecture was consolidated as a standard to the distributed systems. However, there are several researches and discussions about alternatives to the evolution of this architecture and, in this study area, this work presents the Title Model to contribute with the application needs support by the cross layer ontology use and the horizontal addressing, in a next generation Internet. For a practical viewpoint, is showed the network cost reduction for the distributed programming example, in networks with layer 2 connectivity. To prove the title model enhancement, it is presented the network analysis performed for the message passing interface, sending a vector of integers and returning its sum. By this analysis, it is confirmed that the current proposal allows, in this environment, a reduction of 15,23% over the total network traffic, in bytes.
Resumo:
The increasing adoption of information systems in healthcare has led to a scenario where patient information security is more and more being regarded as a critical issue. Allowing patient information to be in jeopardy may lead to irreparable damage, physically, morally, and socially to the patient, potentially shaking the credibility of the healthcare institution. Medical images play a crucial role in such context, given their importance in diagnosis, treatment, and research. Therefore, it is vital to take measures in order to prevent tampering and determine their provenance. This demands adoption of security mechanisms to assure information integrity and authenticity. There are a number of works done in this field, based on two major approaches: use of metadata and use of watermarking. However, there still are limitations for both approaches that must be properly addressed. This paper presents a new method using cryptographic means to improve trustworthiness of medical images, providing a stronger link between the image and the information on its integrity and authenticity, without compromising image quality to the end user. Use of Digital Imaging and Communications in Medicine structures is also an advantage for ease of development and deployment.
Resumo:
Computer viruses are an important risk to computational systems endangering either corporations of all sizes or personal computers used for domestic applications. Here, classical epidemiological models for disease propagation are adapted to computer networks and, by using simple systems identification techniques a model called SAIC (Susceptible, Antidotal, Infectious, Contaminated) is developed. Real data about computer viruses are used to validate the model. (c) 2008 Elsevier Ltd. All rights reserved.
Resumo:
Histamine is an important biogenic amine, which acts with a group of four G-protein coupled receptors (GPCRs), namely H(1) to H(4) (H(1)R - H(4)R) receptors. The actions of histamine at H(4)R are related to immunological and inflammatory processes, particularly in pathophysiology of asthma, and H(4)R ligands having antagonistic properties could be helpful as antiinflammatory agents. In this work, molecular modeling and QSAR studies of a set of 30 compounds, indole and benzimidazole derivatives, as H(4)R antagonists were performed. The QSAR models were built and optimized using a genetic algorithm function and partial least squares regression (WOLF 5.5 program). The best QSAR model constructed with training set (N = 25) presented the following statistical measures: r (2) = 0.76, q (2) = 0.62, LOF = 0.15, and LSE = 0.07, and was validated using the LNO and y-randomization techniques. Four of five compounds of test set were well predicted by the selected QSAR model, which presented an external prediction power of 80%. These findings can be quite useful to aid the designing of new anti-H(4) compounds with improved biological response.
Resumo:
Phospholipases A(2) (PLA(2)) are enzymes commonly found in snake venoms from Viperidae and Elaphidae families, which are major components thereof. Many plants are used in traditional medicine its active agents against various effects induced by snakebite. This article presents the PLA(2) BthTX-I structure prediction based on homology modeling. In addition, we have performed virtual screening in a large database yielding a set of potential bioactive inhibitors. A flexible docking program was used to investigate the interactions between the receptor and the new ligands. We have performed molecular interaction fields (MIFs) calculations with the phospholipase model. Results confirm the important role of Lys49 for binding ligands and suggest three additional residues as well. We have proposed a theoretically nontoxic, drug-like, and potential novel BthTX-I inhibitor. These calculations have been used to guide the design of novel phospholipase inhibitors as potential lead compounds that may be optimized for future treatment of snakebite victims as well as other human diseases in which PLA(2) enzymes are involved.
Resumo:
Time-averaged conformations of (+/-)-1-[3,4-(methylenedioxy)phenyl]-2-methylaminopropane hydrochloride (MDMA, ""ecstasy"") in D(2)O, and of its free base and trifluoroacetate in CDCl(3), were deduced from their (1)H NMR spectra and used to calculate their conformer distribution. Their rotational potential energy surface (PES) was calculated at the RHF/6-31G(d,p), 133LYP/6-31G(d,p), B3LYP/cc-pVDZ and AM1 levels. Solvent effects were evaluated using the polarizable continuum model. The NMR and theoretical studies showed that, in the free base, the N-methyl group and the ring are preferentially trans. This preference is stronger in the salts and corresponds to the X-ray structure of the hydrochloride. However, the energy barriers separating these forms are very low. The X-ray diffraction crystal structures of the anhydrous salt and its monohydrate differed mainly in the trans or cis relationship of the N-methyl group to the a-methyl, although these two forms interconvert freely in solution. (C) 2007 Elsevier Inc. All rights reserved.
Resumo:
The new technologies for Knowledge Discovery from Databases (KDD) and data mining promise to bring new insights into a voluminous growing amount of biological data. KDD technology is complementary to laboratory experimentation and helps speed up biological research. This article contains an introduction to KDD, a review of data mining tools, and their biological applications. We discuss the domain concepts related to biological data and databases, as well as current KDD and data mining developments in biology.
Resumo:
Peptides that induce and recall T-cell responses are called T-cell epitopes. T-cell epitopes may be useful in a subunit vaccine against malaria. Computer models that simulate peptide binding to MHC are useful for selecting candidate T-cell epitopes since they minimize the number of experiments required for their identification. We applied a combination of computational and immunological strategies to select candidate T-cell epitopes. A total of 86 experimental binding assays were performed in three rounds of identification of HLA-All binding peptides from the six preerythrocytic malaria antigens. Thirty-six peptides were experimentally confirmed as binders. We show that the cyclical refinement of the ANN models results in a significant improvement of the efficiency of identifying potential T-cell epitopes. (C) 2001 by Elsevier Science Inc.
Resumo:
In this work, we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) systems and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of the similarity queries. We refer to the association rule-based method to improve CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can either accelerate the process of diagnosing or to strengthen a hypothesis, increasing the probability of a prescribed treatment be successful. Two new algorithms are proposed to support the IDEA method: to pre-process low-level features and to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Resumo:
Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.