970 results for GENE PREDICTION
Abstract:
MapReduce frameworks such as Hadoop are well suited to handling large sets of data which can be processed separately and independently, with canonical applications in information retrieval and sales record analysis. Rapid advances in sequencing technology have ensured an explosion in the availability of genomic data, with a consequent rise in the importance of large-scale comparative genomics, often involving operations and data relationships which deviate from the classical MapReduce structure. This work examines the application of Hadoop to patterns of this nature, using as our focus a well-established workflow for identifying promoters (binding sites for regulatory proteins) across multiple gene regions and organisms, coupled with the unifying step of assembling these results into a consensus sequence. Our approach demonstrates the utility of Hadoop for problems of this nature, showing how the tyranny of the "dominant decomposition" can be at least partially overcome. It also demonstrates how load balance and the granularity of parallelism can be optimized by pre-processing that splits and reorganizes input files, allowing a wide range of related problems to be brought under the same computational umbrella.
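The pattern described above, independent per-region work followed by a unifying consensus step, can be sketched in plain Python without Hadoop. This is a minimal sketch under stated assumptions: the names `map_region`, `reduce_consensus` and `run`, the fixed motif length `K`, and the toy "most frequent k-mer" site finder are all hypothetical stand-ins, not the workflow used in the paper.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

K = 4  # hypothetical motif length, chosen only for illustration


def map_region(record):
    """Map step: emit (gene_id, candidate_site) for one gene region.

    The site finder is a toy stand-in: it returns the most frequent
    k-mer in the region (ties go to the k-mer seen first).
    """
    gene_id, _organism, sequence = record
    kmers = [sequence[i:i + K] for i in range(len(sequence) - K + 1)]
    best, _count = Counter(kmers).most_common(1)[0]
    return gene_id, best


def reduce_consensus(sites):
    """Reduce step: column-wise majority vote over equal-length sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))


def run(records):
    """Group mapped sites by gene, then build one consensus per gene."""
    mapped = sorted(map(map_region, records), key=itemgetter(0))
    return {
        gene: reduce_consensus([site for _, site in group])
        for gene, group in groupby(mapped, key=itemgetter(0))
    }


# Toy input: the same gene region from three hypothetical organisms.
records = [
    ("geneA", "org1", "TATATATAGC"),
    ("geneA", "org2", "GCTATATATA"),
    ("geneA", "org3", "TTTATAAAGG"),
]
consensus = run(records)
```

The map step is keyed on a single gene region from a single organism, while the reduce step needs every site for a gene at once. That grouping requirement is the kind of data relationship that departs from a purely record-by-record decomposition.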
Abstract:
Thraustochytrids have become of considerable industrial and scientific interest in the past decade due to their health benefits. They have been proven to be the principal source in marine and estuarine fish diets, with a high percentage of long-chain (LC) or polyunsaturated fatty acids (PUFA). Therefore, the oil extracted from fish for human consumption is rich in PUFA with high omega-3 fatty acid content. Of all the omega-3 fatty acids, docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA) are considered beneficial essential oils for humans, with a wide range of health benefits. These include brain and neural development in infants, general wellbeing of adults and drug delivery through precursor molecules. They have become one of the most extensively studied organisms for industrial oil preparations as PUFA extraction from fish becomes less profitable. Many forms of these Thraustochytrid oils are being trialled for human consumption all over the world. In Australia, there has been little research performed on these organisms in the past ten years. A few Australian studies have been conducted in the form of comparative studies related to PUFA production within the related genera, but not focussed on their identification or cellular and genomic characterisation. Therefore, the main aim of this study was to investigate the morphological and genetic characteristics of Australian Thraustochytrids in order to aid in their identification and characterisation, as well as to better understand the effect of environmental conditions in the regulation of PUFA production. It was also noted that there was a knowledge gap in the preservation and total genomic DNA extraction of these organisms for the purposes of scientific research. The cryopreservation of these organisms for studies around the world follows existing generic methods. 
However, it is well understood that many of these generic methods not only incur high chemical costs but also use considerable storage space and other resources, all of which can be improved with new or modified approaches. In this context, a simple and inexpensive bead preservation method is described that does not compromise storage shelf life. We also describe, for the first time, the effects of culture age on the successful cryopreservation of Thraustochytrids. It was evident in the literature that DNA and RNA extractions for molecular and genetic studies of Thraustochytrids follow the classical phenol-chloroform extraction methods. It was also observed that modern protocols failed to avoid the use of phenol-chloroform rather than improving preparation and cell disruption. In order to provide DNA extraction of high quantity and quality, a modified protocol has been introduced that employs modern commercial extraction kits and standard laboratory equipment. Thraustochytrids have been shown to be highly conserved in their 18S rDNA gene sequences, which are used as the current standard for identification. It was demonstrated that the 18S rDNA gene sequence limits the discrimination of closely related genera and of members within a genus. Therefore, it was proposed that another profile, such as a randomly amplified polymorphic DNA (RAPD) based profiling system, be tested for use in the characterisation of Thraustochytrids. The RAPD profiles were shown to provide a unique DNA fingerprint for each isolate, and small variations in their genomes could be detected. This method involved the use of a minimum number of standard arbitrary primers, and with an increase in the number of different primers used, very high discrimination between organisms could be achieved. However, the method was not suitable for taxonomic purposes because the results did not correlate with other taxonomic features such as morphology. 
Another knowledge gap was found with respect to Australian Thraustochytrid growth characteristics, in that these had not been recorded and published. In order to rectify this, a record of colony and microscopic features of 12 selected isolates was performed. The results of preliminary studies indicated that further microbiological and biochemical studies are needed for full characterisation of these organisms. This information is of great importance to bio-prospecting of new Thraustochytrids from Australian ecosystems and would allow for their accurate identification, and so permit the prediction of their PUFA capability by comparison with related genera/species. It was well recognized that environmental stress plays a role in the PUFA production and is mainly due to the reactive oxygen species as abiotic stress (Chiou et al., 2001; Okuyama et al., 2008; Shabala et al., 2009; Shabala et al., 2001). In this aspect, this study makes the first attempt towards better understanding of this phenomenon by way of the use of real-time PCR for the detection of environmental effects on the regulation of PUFA production. Three main environmental conditions including temperature, pH and oxygen availability were monitored as stress inducers. In summary, this study provides novel approaches for the preservation and handling of Thraustochytrids, their molecular biological features, taxonomy, characterisation and responses to environmental factors with respect to their oil production enzymes. The information produced from this study will prove to be vital for both industrial and scientific investigations in the future.
Abstract:
Background: Although lentiviral vectors have been widely used for in vitro and in vivo gene therapy research, there have been few studies systematically examining various conditions that may affect the determination of the number of viable vector particles in a vector preparation and the use of Multiplicity of Infection (MOI) as a parameter for the prediction of gene transfer events. Methods: Lentiviral vectors encoding a marker gene were packaged and supernatants concentrated. The number of viable vector particles was determined by in vitro transduction followed by fluorescence microscopy and FACS analyses. Various factors that may affect the transduction process, such as vector inoculum volume, target cell number and type, vector decay, and variable vector-target cell contact and adsorption periods, were studied. MOI between 0 and 32 was assessed on commonly used cell lines as well as a new cell line. Results: We demonstrated that the resulting values of lentiviral vector titre varied with changes of conditions in the transduction process, including inoculum volume of the vector, the type and number of target cells, vector stability and the length of the period of vector adsorption to target cells. Vector inoculum and the number of target cells determine the frequencies of gene transfer events, although not proportionally. Vector exposure time to target cells also influenced transduction results. Varying these parameters resulted in greater than 50-fold differences in the vector titre from the same vector stock. Commonly used cell lines in vector titration were less sensitive to lentiviral vector-mediated gene transfer than a new cell line, FRL 19. Within the MOI range of 0-32 used to transduce four different cell lines, the higher the MOI applied, the higher the efficiency of gene transfer obtained. 
Conclusion: Several variables in the transduction process affected in vitro vector titration and resulted in vastly different values from the same vector stock, thus complicating the use of MOI for predicting gene transfer events. Commonly used target cell lines underestimated vector titre. However, within a certain range of MOI, it is possible that, if strictly controlled conditions are observed in the vector titration process, including the use of a sensitive cell line such as FRL 19 for vector titration, lentivector-mediated gene transfer events could be predicted. © 2004 Zhang et al; licensee BioMed Central Ltd.
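To make the quantities in this abstract concrete, the following sketch computes a functional titre from a transduction readout, the resulting MOI, and an idealised prediction of the fraction of cells transduced. It assumes independent, Poisson-distributed vector hits, which is a common textbook simplification and not a model taken from the paper. The function names and the example numbers are hypothetical.

```python
import math


def titre_tu_per_ml(cells_at_transduction, fraction_positive,
                    inoculum_ml, dilution_factor=1.0):
    """Functional titre in transducing units (TU) per mL.

    Linear estimate; it is only reasonable when the positive fraction is
    low enough that most positive cells carry a single vector copy.
    """
    return (cells_at_transduction * fraction_positive
            * dilution_factor / inoculum_ml)


def moi(titre, inoculum_ml, target_cells):
    """Multiplicity of infection: transducing units applied per target cell."""
    return titre * inoculum_ml / target_cells


def expected_fraction_transduced(moi_value):
    """Idealised Poisson prediction: P(at least one hit) = 1 - exp(-MOI)."""
    return 1.0 - math.exp(-moi_value)


# Hypothetical readout: 1e5 cells, 20% marker-positive, 0.01 mL of a
# 1:100 dilution of the vector stock.
titre = titre_tu_per_ml(1e5, 0.20, 0.01, dilution_factor=100)
applied_moi = moi(titre, inoculum_ml=0.001, target_cells=1e5)
predicted = expected_fraction_transduced(applied_moi)
```

Because the measured titre itself shifts with inoculum volume, cell type and adsorption time, as the abstract reports, the MOI fed into such a prediction is only as reliable as the titration conditions behind it.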
Abstract:
The rapid increase in genome sequence information has necessitated the annotation of their functional elements, particularly those occurring in the non-coding regions, in the genomic context. The promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence, in silico identification of promoters is crucial in order to guide experimental work and to pinpoint the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool, 'PromPredict'. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC), sensitivities of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC), a sensitivity of 100% and a precision of 49% were obtained.
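The idea of flagging regions that are less stable than a free energy threshold can be illustrated with a sliding-window average over dinucleotide steps. This is a minimal sketch and not PromPredict itself: the energy function below is a deliberately crude toy (it depends only on how many G/C bases a step contains), and the window size and threshold are hypothetical values picked for the example.

```python
def toy_step_energy(dinucleotide):
    """Toy stand-in for a nearest-neighbour free energy table (kcal/mol).

    Assumption for illustration only: steps with more G/C bases are more
    stable (more negative). Real tables assign a value per dinucleotide.
    """
    gc = sum(base in "GC" for base in dinucleotide)
    return (-1.0, -1.5, -2.0)[gc]


def window_energies(sequence, window):
    """Average free energy over each run of `window` dinucleotide steps."""
    steps = [toy_step_energy(sequence[i:i + 2])
             for i in range(len(sequence) - 1)]
    return [sum(steps[i:i + window]) / window
            for i in range(len(steps) - window + 1)]


def unstable_windows(sequence, window, threshold):
    """Start positions whose average energy is above the threshold,
    i.e. less stable than the cutoff."""
    return [i for i, energy in enumerate(window_energies(sequence, window))
            if energy > threshold]


# An AT-rich stretch flanked by GC-rich sequence.
seq = "GCGCGCATATATGCGCGC"
hits = unstable_windows(seq, window=4, threshold=-1.2)
```

In keeping with the abstract, a realistic version would choose `threshold` from the GC content of the surrounding genomic sequence instead of using one fixed cutoff.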
Abstract:
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PDIs, which hosts information pertaining to molecular alterations, protein-protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.
Abstract:
Background: A nucleosome is the fundamental repeating unit of the eukaryotic chromosome. It has been shown that the positioning of a majority of nucleosomes is primarily controlled by factors other than the intrinsic preference of the DNA sequence. One of the key questions in this context is the role, if any, that can be played by the variability of nucleosomal DNA structure. Results: In this study, we have addressed this question by analysing the variability at the dinucleotide and trinucleotide as well as longer length scales in a dataset of nucleosome X-ray crystal structures. We observe that the nucleosome structure displays remarkable local level structural versatility within the B-DNA family. The nucleosomal DNA also incorporates a large number of kinks. Conclusions: Based on our results, we propose that the local and global level versatility of B-DNA structure may be a significant factor modulating the formation of nucleosomes in the vicinity of high-plasticity genes, and in varying the probability of binding by regulatory proteins. Hence, these factors should be incorporated in the prediction algorithms and there may not be a unique `template' for predicting putative nucleosome sequences. In addition, the multimodal distribution of dinucleotide parameters for some steps and the presence of a large number of kinks in the nucleosomal DNA structure indicate that the linear elastic model, used by several algorithms to predict the energetic cost of nucleosome formation, may lead to incorrect results.
Abstract:
The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.
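Recall and precision for positional predictions depend on how a prediction is matched to an annotated site. The sketch below shows one possible matching rule, which is an assumption and not the criterion used in the paper: an annotated transcription start site counts as recovered if it falls inside any predicted region, and a predicted region counts as a true positive if it contains at least one annotated site. The function name, the `tolerance` parameter and the coordinates are hypothetical.

```python
def evaluate(predicted_regions, tss_positions, tolerance=0):
    """Return (recall, precision) for predicted (start, end) regions.

    A region 'covers' a site if the site lies within the region after
    widening it by `tolerance` bases on each side.
    """
    def covers(region, tss):
        start, end = region
        return start - tolerance <= tss <= end + tolerance

    recovered = sum(any(covers(r, t) for r in predicted_regions)
                    for t in tss_positions)
    true_predictions = sum(any(covers(r, t) for t in tss_positions)
                           for r in predicted_regions)
    recall = recovered / len(tss_positions) if tss_positions else 0.0
    precision = (true_predictions / len(predicted_regions)
                 if predicted_regions else 0.0)
    return recall, precision


# Toy coordinates: two of three sites are found, two of four
# predictions hit a site.
predictions = [(100, 200), (450, 500), (900, 950), (1300, 1350)]
tss = [150, 480, 2000]
recall, precision = evaluate(predictions, tss)
```

Under such a rule, predictions that fall in introns or other unannotated regions lower precision even if they mark genuine regulatory elements, which is relevant to the abstract's observation about false positives in noncoding regions of primary transcripts.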
Abstract:
Anaplastic astrocytoma (AA; Grade III) and glioblastoma (GBM; Grade IV) are diffusely infiltrating tumors and are called malignant astrocytomas. The treatment regimen and prognosis are distinctly different between anaplastic astrocytoma and glioblastoma patients. Although histopathology based current grading system is well accepted and largely reproducible, intratumoral histologic variations often lead to difficulties in classification of malignant astrocytoma samples. In order to obtain a more robust molecular classifier, we analysed RT-qPCR expression data of 175 differentially regulated genes across astrocytoma using Prediction Analysis of Microarrays (PAM) and found the most discriminatory 16-gene expression signature for the classification of anaplastic astrocytoma and glioblastoma. The 16-gene signature obtained in the training set was validated in the test set with diagnostic accuracy of 89%. Additionally, validation of the 16-gene signature in multiple independent cohorts revealed that the signature predicted anaplastic astrocytoma and glioblastoma samples with accuracy rates of 99%, 88%, and 92% in TCGA, GSE1993 and GSE4422 datasets, respectively. The protein-protein interaction network and pathway analysis suggested that the 16-genes of the signature identified epithelial-mesenchymal transition (EMT) pathway as the most differentially regulated pathway in glioblastoma compared to anaplastic astrocytoma. In addition to identifying 16 gene classification signature, we also demonstrated that genes involved in epithelial-mesenchymal transition may play an important role in distinguishing glioblastoma from anaplastic astrocytoma.
Abstract:
Background: The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of action of these genes. However, GWAS studies often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. Results: We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is based on the fact that two proteins that interact biophysically would be in physical proximity to each other, would possess complementary molecular functions, and would play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores, and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for localization and functions of proteins, respectively, were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k-nearest-neighbor classifier confirmed that our results compared favorably. Conclusions: This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. 
We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in GWAS studies to be associated with diseases, which are of translational interest.
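The evaluation described in this abstract, precision, recall and F-score at a cutoff in a ranked list of predicted terms, can be written in a few lines. This is a minimal sketch: the function name and the toy term labels are hypothetical, and it shows only the metric computation, not the Bayesian inference that produces the ranking.

```python
def precision_recall_f(ranked_terms, true_terms, k):
    """Precision, recall and F-score for the top-k entries of a ranked list.

    The F-score is the harmonic mean of precision and recall.
    """
    top_k = ranked_terms[:k]
    hits = sum(term in true_terms for term in top_k)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy ranked predictions for one protein, best score first.
ranked = ["nucleus", "cytoplasm", "plasma membrane",
          "mitochondrion", "extracellular region"]
truth = {"nucleus", "mitochondrion"}

at_2 = precision_recall_f(ranked, truth, k=2)
at_4 = precision_recall_f(ranked, truth, k=4)
```

Sweeping `k` over a range of thresholds and plotting the resulting values gives the precision versus recall and F-score versus threshold curves that the abstract mentions.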
Abstract:
Body mass index (BMI) is a non-invasive measurement of obesity. It is commonly used for assessing adiposity and for obesity-related risk prediction. Genetic differences between ethnic groups are important factors that contribute to the variation in phenotypic effects. India was inhabited by the first out-of-Africa human population, and contemporary Indian populations are an admixture of two ancestral populations: ancestral north Indians (ANI) and ancestral south Indians (ASI). Although ANI are related to Europeans, ASI are not related to any group outside the Indian subcontinent. Hence, we expect novel genetic loci associated with BMI. In the association analysis, we found eight genic SNPs in the extreme of the distribution (P ≤ 3.75 × 10^-5), of which WWOX has already been reported to be associated with obesity-related traits and was hence excluded from further study. Interestingly, we observed that rs1526538, an intronic SNP of the novel gene THSD7A, was significantly associated with obesity (P = 2.88 × 10^-5, 8.922 × 10^-6 and 2.504 × 10^-9 in the discovery, replication and combined stages, respectively). THSD7A is a neural N-glycoprotein that promotes angiogenesis, and it is well known that angiogenesis modulates obesity, adipose metabolism and insulin sensitivity; hence our result supports such a correlation. This information can be used for drug targeting, early diagnosis of obesity and treatment.
Abstract:
Circadian clocks are 24-h timing devices that phase cellular responses; coordinate growth, physiology, and metabolism; and anticipate the day-night cycle. Here we report sensitivity of the Arabidopsis thaliana circadian oscillator to sucrose, providing evidence that plant metabolism can regulate circadian function. We found that the Arabidopsis circadian system is particularly sensitive to sucrose in the dark. These data suggest that there is a feedback between the molecular components that comprise the circadian oscillator and plant metabolism, with the circadian clock both regulating and being regulated by metabolism. We also used simulations within a three-loop mathematical model of the Arabidopsis circadian oscillator to identify components of the circadian clock sensitive to sucrose. The mathematical studies identified GIGANTEA (GI) as being associated with sucrose sensing. Experimental validation of this prediction demonstrated that GI is required for the full response of the circadian clock to sucrose. We demonstrate that GI acts as part of the sucrose-signaling network and propose this role permits metabolic input into circadian timing in Arabidopsis.
Abstract:
It is estimated that the global prevalence of hepatitis C in the world population is 3%. Little is known about the response to treatment with respect to viral resistance. Some mutations in the 109-amino-acid fragment of NS5B are associated with resistance to interferon (IFN) and ribavirin (RBV). Molecular and clinical studies have identified host- and virus-related factors associated with the response to treatment, such as the gene encoding IL-28B. This study was divided into two phases, whose objectives were to characterise the frequency of mutations that confer resistance in HCV and to assess the relevance of these mutations in patients who were Responders (R) or Non-Responders (NR) to treatment, and to genetically characterise the populations for polymorphisms in the IL-28B SNPs in relation to the prognosis of treatment response. Patient samples were subjected to genotyping and viral load tests. The sequences generated were compared using BLAST and the Los Alamos HCV database. Homologous sequences were aligned and the mutations identified. Based on genotype and viral load, we classified the patients according to their response to therapy. Genomic DNA was isolated from peripheral blood for IL-28B SNP typing. The methodology used was real-time PCR with SNP-specific TaqMan probes. Data analysis was performed in GraphPad Prism using the chi-square test, relative risk (RR), odds ratio (OR) and 95% confidence interval, with a significance level of P < 0.05. In the first phase of this study, a significant rate of treatment-associated mutations was found in the samples studied. The prevalence of mutations associated with resistance to IFN and RBV, as well as to new antiviral drugs, located in the 109-amino-acid fragment of NS5B was examined in 69 treatment-naive infected individuals in Rio de Janeiro, Brazil.
In the second phase, the mutations were clinically relevant. From then on, we sought to observe the differences between better and worse prognosis according to immunogenetics, which showed differentiation between the R and NR groups with respect to the prognosis of therapeutic response. When the differences between the NS5B sequences and the response to treatment were considered, it was found that the R254K mutation was associated with C316N, which could lead to non-response to therapy in genotype 1b. Our data also supported a strong association of IL-28B rs12979860 with a high probability of response to IFN + RBV therapy. Our data show the presence of treatment-naive patients harbouring resistance mutations previously described in the literature. Analysis of the predictive factors of virological response showed that the prediction of good or poor response to treatment, and also of disease progression, depends on an important interaction between viral and host genetics. This is important so that, at the time of diagnostic assessment and therapeutic decision-making, the physician can take appropriate measures to treat each patient individually, regardless of the HCV genotype in question.
Abstract:
An association of the dopamine receptor D4 (DRD4) gene located on chromosome 11p15.5 and attention deficit/hyperactivity disorder (ADHD) has been demonstrated and replicated by multiple investigators. A specific allele [the 7-repeat of a 48-bp variable number of tandem repeats (VNTR) in exon 3] has been proposed as an etiological factor in attentional deficits manifested in some children diagnosed with this disorder. In the current study, we evaluated ADHD subgroups defined by the presence or absence of the 7-repeat allele of the DRD4 gene, using neuropsychological tests with reaction time measures designed to probe attentional networks with neuroanatomical foci in D4-rich brain regions. Despite the same severity of symptoms on parent and teacher ratings for the ADHD subgroups, the average reaction times of the 7-present subgroup showed normal speed and variability of response whereas the average reaction times of the 7-absent subgroup showed the expected abnormalities (slow and variable responses). This was opposite the primary prediction of the study. The 7-present subgroup seemed to be free of some of the neuropsychological abnormalities thought to characterize ADHD.
Abstract:
Background: Cross-species nuclear transfer has been shown to be a potent approach to retain the genetic viability of species near extinction. However, most embryos produced by cross-species nuclear transfer were compromised because they were unable to develop to later stages. Gene expression analysis of cross-species cloned embryos will yield new insights into the regulatory mechanisms involved in cross-species nuclear transfer and embryonic development. Results: A novel gene, K31, was identified as an up-regulated gene in fish cross-subfamily cloned embryos using the SSH approach and the RACE method. The complete K31 cDNA sequence is 1106 base pairs (bp) in length, with a 342 bp open reading frame (ORF) encoding a putative protein of 113 amino acids (aa). Comparative analysis revealed no known homologous gene in zebrafish or other species databases. The K31 protein contains a putative transmembrane helix and five putative phosphorylation sites but no signal peptide. Expression pattern analysis by real-time RT-PCR and whole-mount in situ hybridization (WISH) shows that it has the characteristics of a constitutively expressed gene. A sub-cellular localization assay shows that the K31 protein cannot penetrate the nuclei. Interestingly, over-expression of the K31 gene can cause lethality in epithelioma papulosum cyprinid (EPC) cells in cell culture, which hints at the inefficient reprogramming events that occur in cloned embryos. Conclusion: Taken together, our findings indicate that K31 is a novel gene differentially expressed in fish cross-subfamily cloned embryos and that over-expression of K31 can cause lethality of cultured fish cells. To our knowledge, this is the first report on the determination of novel genes involved in nucleo-cytoplasmic interaction of fish cross-subfamily cloned embryos.
Abstract:
The accurate recognition of cancer subtypes is of great clinical significance. In particular, DNA microarray gene expression technology is applied to diagnosing and recognizing cancer types. This paper proposes a method for recognizing cancer subtypes based on geometrical learning. First, the cancer gene expression profile data were preprocessed and feature genes were selected by a conventional method; then the expression data of the feature genes in the training samples were used to construct convex hulls in the high-dimensional space using the training algorithm of geometrical learning, while the independent test set was tested by the recognition algorithm of geometrical learning. The method was applied to human acute leukemia gene expression data. The accuracy rate reached 100%. The experiments demonstrated its efficiency and feasibility.