964 results for GENOMIC SEQUENCE
Abstract:
Modern computer systems are plagued with stability and security problems: applications lose data, web servers are hacked, and systems crash under heavy load. Many of these problems, or anomalies, arise from rare program behavior caused by attacks or errors. A substantial percentage of web-based attacks are due to buffer overflows, and many methods have been devised to detect and prevent the anomalous situations that buffer overflows create. The current state of the art in anomaly detection is relatively primitive and depends mainly on static code checking to deal with buffer overflow attacks; for protection, Stack Guards and Heap Guards are also widely used. This dissertation proposes an anomaly detection system based on the frequencies of system calls in the system call trace. System call traces, represented as frequency sequences, are profiled using sequence sets. A sequence set is identified by its starting sequence and by the frequencies of specific system calls. The deviation of the current input sequence from the corresponding normal profile, in terms of the frequency pattern of system calls, is computed and expressed as an anomaly score. A simple Bayesian model is used for accurate detection. Experimental results show that system call frequencies represented using sequence sets capture the behavior of programs under normal conditions of usage. This captured behavior allows the system to detect anomalies with a low rate of false positives. Data are presented which show that a Bayesian network on frequency variations responds effectively to induced buffer overflows. It can also help administrators detect deviations in program flow introduced by errors.
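The profiling idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's implementation: traces are lists of system-call names, the "starting sequence" is taken to be the first `PREFIX_LEN` calls, `MONITORED` is a hypothetical choice of calls whose relative frequencies are tracked, and the anomaly score is assumed to be the L1 distance between a trace's frequencies and the mean frequencies of its sequence set. The Bayesian model itself is not reproduced.

```python
from collections import Counter, defaultdict

# Hypothetical choices: which calls to count, and how long the starting sequence is.
MONITORED = ("open", "read", "write", "mmap", "execve")
PREFIX_LEN = 3


def frequency_vector(trace):
    """Relative frequency of each monitored system call in a trace."""
    counts = Counter(trace)
    total = len(trace)
    return tuple(counts[call] / total for call in MONITORED)


def build_profile(normal_traces):
    """Group normal traces by starting sequence and average their frequency vectors."""
    groups = defaultdict(list)
    for trace in normal_traces:
        groups[tuple(trace[:PREFIX_LEN])].append(frequency_vector(trace))
    return {
        prefix: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for prefix, vectors in groups.items()
    }


def anomaly_score(profile, trace):
    """L1 deviation of a trace's call frequencies from the profile of its sequence set."""
    prefix = tuple(trace[:PREFIX_LEN])
    if prefix not in profile:
        return float("inf")  # no normal sequence set starts this way
    current = frequency_vector(trace)
    return sum(abs(c - m) for c, m in zip(current, profile[prefix]))


normal = [
    ["open", "read", "read", "write", "close"],
    ["open", "read", "read", "write", "write", "close"],
]
profile = build_profile(normal)
print(anomaly_score(profile, ["open", "read", "read", "write", "close"]))
print(anomaly_score(profile, ["open", "read", "read", "execve", "execve", "execve"]))
```

A trace dominated by an unexpected call such as `execve` scores much higher than a normal one, and a threshold on the score (to be tuned on held-out normal data) would turn it into a detector.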
Abstract:
This paper discusses our research on developing a generalized and systematic method for anomaly detection. The key ideas are to represent normal program behaviour using system call frequencies and to incorporate probabilistic classification techniques to detect anomalies and intrusions. Using experiments on the sendmail system call data, we demonstrate that concise and accurate classifiers can be constructed to detect anomalies. An overview of the approach that we have implemented is provided.
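The classification step can be illustrated with a multinomial naive Bayes classifier over system-call counts, one common probabilistic technique. The abstract does not say which classifier was used, so this choice, the Laplace smoothing parameter `alpha` and the toy traces below are assumptions. Labelled traces are reduced to per-class call frequencies, and a new trace is assigned to the class with the highest posterior.

```python
import math
from collections import Counter


class CallFrequencyNaiveBayes:
    """Multinomial naive Bayes over system-call counts, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, traces, labels):
        self.vocab = sorted({call for trace in traces for call in trace})
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for cls in self.classes:
            cls_traces = [t for t, y in zip(traces, labels) if y == cls]
            self.log_prior[cls] = math.log(len(cls_traces) / len(traces))
            counts = Counter(call for trace in cls_traces for call in trace)
            # One extra smoothed slot (key None) for calls never seen in training.
            denom = sum(counts.values()) + self.alpha * (len(self.vocab) + 1)
            table = {call: math.log((counts[call] + self.alpha) / denom) for call in self.vocab}
            table[None] = math.log(self.alpha / denom)
            self.log_lik[cls] = table
        return self

    def log_posterior(self, trace, cls):
        """Unnormalised log P(cls | trace), computed from the call counts in the trace."""
        table = self.log_lik[cls]
        return self.log_prior[cls] + sum(
            n * table.get(call, table[None]) for call, n in Counter(trace).items()
        )

    def predict(self, trace):
        return max(self.classes, key=lambda cls: self.log_posterior(trace, cls))


traces = [
    ["open", "read", "read", "write", "close"],
    ["open", "read", "write", "write", "close"],
    ["open", "execve", "socket", "execve", "close"],
]
labels = ["normal", "normal", "intrusion"]
model = CallFrequencyNaiveBayes().fit(traces, labels)
print(model.predict(["read", "write", "read"]))      # -> normal
print(model.predict(["execve", "socket", "execve"]))  # -> intrusion
```

Working in log space avoids underflow on long traces, and the smoothed slot keeps an unseen call from driving a class probability to zero.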
Abstract:
Code clones are portions of source code that are similar to the original program code. The presence of code clones is considered a bad feature of software, because it makes maintenance difficult. Methods for code clone detection have gained immense significance in the last few years, as they play a significant role in engineering applications such as program code analysis, program understanding, plagiarism detection, error detection, code compaction and many similar tasks. Despite all these facts, several features of code clones, if properly utilized, can make the software development process easier. In this work, we point out one such feature, which highlights the relevance of code clones in test sequence identification. Here, program slicing is used in code clone detection. In addition, a classification of code clones is presented, and the benefit of using program slicing in code clone detection is discussed.
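The abstract does not specify the slicing technique, so the following is only a sketch of the simplest case: a backward static slice of straight-line code (no branches or loops), where each statement is modelled as a hypothetical pair `(defined_variable, used_variables)`. The slice keeps exactly the statements that can affect the final value of the chosen variable; in slicing-based clone detection, slices from two code fragments could then be compared, a step not shown here.

```python
def backward_slice(statements, criterion):
    """Backward static slice of straight-line code.

    statements: list of (defined_variable, used_variables) in program order.
    criterion: variable whose final value we slice on.
    Returns the indices of the statements that can affect that value.
    """
    relevant = {criterion}
    in_slice = []
    for index in range(len(statements) - 1, -1, -1):
        defined, used = statements[index]
        if defined in relevant:
            in_slice.append(index)
            relevant.discard(defined)  # this definition kills earlier ones...
            relevant.update(used)      # ...but its operands become relevant
    return sorted(in_slice)


# a = read(); b = read(); s = a + 1; t = b * 2; s = s + a
program = [("a", []), ("b", []), ("s", ["a"]), ("t", ["b"]), ("s", ["s", "a"])]
print(backward_slice(program, "s"))  # -> [0, 2, 4]
print(backward_slice(program, "t"))  # -> [1, 3]
```

Because the statements computing `t` are dropped from the slice on `s`, two fragments that differ only in such unrelated statements would yield the same slice, which is why slicing helps expose non-contiguous clones.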
Abstract:
In this thesis, different techniques for image analysis of high density microarrays have been investigated. Most existing image analysis techniques require prior knowledge of image-specific parameters and direct user intervention for microarray image quantification. The objective of this research work was to develop a fully automated image analysis method capable of accurately quantifying the intensity information from high density microarray images. The method should be robust against the noise and contamination that commonly occur in different stages of microarray development.
Abstract:
DNA sequence representation methods are used to denote a gene structure effectively and help in analyzing similarities and dissimilarities between coding sequences. Many different kinds of representations have been proposed in the literature. They can be broadly classified into numerical, graphical, geometrical and hybrid representation methods. Graphical and geometrical representation methods make DNA structure and function analysis easier, since they give a visual representation of a DNA structure. In numerical methods, numerical values are assigned to a sequence, and digital signal processing methods are used to analyze it. Hybrid approaches to analyzing DNA sequences have also been reported in the literature. This paper reviews the latest developments in DNA sequence representation methods. We also present a taxonomy of the various methods and compare them wherever possible.
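As a concrete instance of the numerical class, two mappings commonly cited in this literature are the binary indicator (Voss) representation, which turns a sequence into four 0/1 sequences, and the EIIP representation, which maps each nucleotide to its electron-ion interaction pseudopotential value. The sketch below uses the EIIP values usually quoted (A 0.1260, C 0.1340, G 0.0806, T 0.1335); it assumes sequences contain only A, C, G and T.

```python
# EIIP values commonly used in the literature for nucleotide mapping.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def binary_indicators(seq):
    """Voss-style mapping: one 0/1 indicator sequence per nucleotide."""
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def eiip_signal(seq):
    """Single numerical sequence built from the EIIP value of each nucleotide."""
    return [EIIP[s] for s in seq.upper()]


u = binary_indicators("ATGC")
print(u["A"], u["T"])     # -> [1, 0, 0, 0] [0, 1, 0, 0]
print(eiip_signal("AG"))  # -> [0.126, 0.0806]
```

The indicator form preserves all information at the cost of four signals, while EIIP compresses the sequence into one signal, which is why both feed naturally into the DSP methods described above.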
Abstract:
Considerable research effort has been devoted to predicting the exon regions of genes. The binary indicator (BI), electron-ion interaction pseudopotential (EIIP) and filter methods are some of these methods. All of them make use of the period-three behavior of exon regions. Although the method suggested in this paper is similar to the methods mentioned above, it introduces a set of sequences, selected by applying a genetic algorithm, for mapping the nucleotides, and this mapping was found to be more promising.
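The period-three behavior is usually measured as spectral power at the frequency 2π/3. A minimal sketch, assuming binary indicator sequences: sum over the four nucleotides of |U_b(2π/3)|², where U_b is the discrete Fourier transform of the indicator sequence for base b, computed inside a sliding window. The window length of 120 and step of 3 are illustrative assumptions, and this is the baseline BI measure, not the genetic-algorithm mapping proposed in the paper.

```python
import cmath


def period3_power(seq):
    """Sum over bases of |U_b(2*pi/3)|**2 for the binary indicator sequences."""
    seq = seq.upper()
    w = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel at the period-3 frequency
    total = 0.0
    for base in "ACGT":
        # w has period 3, so w ** (n % 3) equals w ** n but avoids large exponents.
        u = sum(w ** (n % 3) for n, s in enumerate(seq) if s == base)
        total += abs(u) ** 2
    return total


def sliding_period3(seq, window=120, step=3):
    """Period-3 power in each window; peaks suggest candidate exon regions."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(period3_power("ATG" * 30))  # strongly periodic: close to 2700
print(period3_power("A" * 90))    # no period-3 structure: close to 0
```

A perfectly period-3 toy sequence concentrates all its indicator energy at 2π/3, while a constant sequence has none there; real exons show a weaker but detectable version of the first pattern.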
Abstract:
The primary habitat of Salmonella is the gastrointestinal tract of animals, and it is discharged into water bodies through feces. Aquatic animals act as asymptomatic reservoirs of a wide range of Salmonella serotypes. The inevitable delay in detecting Salmonella contamination and the low sensitivity of conventional methods are a serious issue in the seafood industry. Owing to their indiscriminate use, antibiotics eventually accumulate in the aquatic environment, which provides the antibiotic stress required for the emergence of more and more antibiotic-resistant phenotypes of Salmonella. Several genetic determinants, such as integrons and genomic islands, play a role in the acquisition and reshuffling of antibiotic resistance genes. A large number of virulence determinants are required for Salmonella pathogenicity. The virulence potential of Salmonella is determined, to some extent, by the presence of phages or phage-mediated genes in the bacterial genome. There is much intra-serotype polymorphism in Salmonella, and epidemiological studies rely on the genetic resemblance of the isolated strains. Proper identification of the strain using traditional and molecular techniques is a prerequisite for accurate epidemiological studies (Soto et al., 2000). In this context, a study was undertaken to determine the prevalence of different Salmonella serotypes in seafood and to characterize them.
Abstract:
Lignocellulosic biomass is probably the best alternative resource for biofuel production; it is composed mainly of cellulose, hemicelluloses and lignin. Cellulose is the most abundant of the three, and its conversion to glucose is catalyzed by the enzyme cellulase. Cellulases are a group of enzymes that act synergistically upon cellulose to produce glucose; they comprise endoglucanase, cellobiohydrolase and β-glucosidase. β-glucosidase assumes great importance because it is the rate-limiting enzyme. Endoglucanases (EG) produce nicks in the cellulose polymer, exposing reducing and non-reducing ends; cellobiohydrolases (CBH) act upon the reducing or non-reducing ends to liberate cellobiose units; and β-glucosidases (BGL) cleave the cellobiose to liberate glucose, completing the hydrolysis. β-glucosidases undergo feedback inhibition by their own product, β-glucose, and by their substrate, cellobiose. A few filamentous fungi produce glucose-tolerant β-glucosidases, which overcome this inhibition by tolerating product concentrations up to a particular threshold. The present study targeted a filamentous fungus producing a glucose-tolerant β-glucosidase, which was identified by morphological as well as molecular methods. The fungus showed 99% similarity to an Aspergillus unguis strain, which falls within the Aspergillus nidulans group to which most glucose-tolerant β-glucosidase producers belong. The culture was designated strain number NII 08123 and was deposited in the NII culture collection at CSIR-NIIST. β-glucosidase multiplicity is common in the fungal world, and in A. unguis it was demonstrated using zymogram analysis. A total of five extracellular isoforms were detected in the fungus, and their expression levels varied with the carbon source available in the medium.
Three of these five isoforms were expressed at higher levels, as indicated by increased fluorescence (due to greater MUG breakdown by enzyme action), and were speculated to contribute significantly to the total β-glucosidase activity. These isoforms were named BGL1, BGL3 and BGL5. Among the three, BGL5 was shown to be the glucose-tolerant β-glucosidase, and it was a low molecular weight protein. The major fraction was a high molecular weight protein, but with lower tolerance to glucose. BGL3 was intermediate between the two in both activity and glucose tolerance. The glucose-tolerant β-glucosidase was purified and characterized, and kinetic analysis showed that its glucose inhibition constant (Ki) is 800 mM and that its Km and Vmax were 4.854 mM and 2.946 mol min⁻¹ mg protein⁻¹, respectively. The optimum temperature was 60 °C and the optimum pH was 6.0. The molecular weight of the purified protein was ~10 kDa in both SDS and native PAGE, indicating that the glucose-tolerant BGL is a monomeric protein. The major β-glucosidase, BGL1, had pH and temperature optima of 5.0 and 60 °C, respectively. The apparent molecular weight of the native protein was 240 kDa. Its Vmax and Km were 78.8 mol min⁻¹ mg protein⁻¹ and 0.326 mM, respectively. Degenerate primers were designed for glycosyl hydrolase families 1, 3 and 5, and the BGL genes were amplified from the genomic DNA of Aspergillus unguis. Sequence analysis of the amplicons confirmed the presence of all three genes. An amplicon of ~500 bp was sequenced and matched a GH1 BGL from Aspergillus oryzae. Amplicons produced with the GH3 degenerate primers were sequenced, and the sequences matched GH3-family β-glucosidases from Aspergillus nidulans and Aspergillus aculeatus.
The GH5 degenerate primers also gave amplification, and sequencing results indicated the presence of a GH5-family BGL gene in the Aspergillus unguis genomic DNA. From the partial gene sequencing results, specific as well as degenerate primers were designed for TAIL-PCR. Sequencing of the 1.0 kb amplicon matched an Aspergillus nidulans β-glucosidase gene belonging to the GH1 family; the sequence mainly covered the N-terminal region of the matching peptide. All three BGL proteins, i.e., BGL1, BGL3 and BGL5, were purified by chromatography and electroelution from native PAGE gels and subjected to MALDI-TOF mass spectrometric analysis. The results showed that the BGL1 peptide masses matched β-glucosidase I of Aspergillus flavus, a 92 kDa protein, with 69% protein coverage. The peptide masses of the glucose-tolerant β-glucosidase BGL5 matched the catalytic C-terminal domain of β-glucosidase F from Emericella nidulans, but the protein coverage was very low relative to the size of the Emericella nidulans protein. Relative to the size of BGL5 from Aspergillus unguis, however, the sequence coverage is more than 80%. BGL F is a glycosyl hydrolase family 3 protein. The properties of BGL5 appear unique: it is a GH3 β-glucosidase with a very low molecular weight of ~10 kDa that nevertheless has catalytic activity and glucose tolerance, a combination not yet described for GH β-glucosidases. The occurrence of a fully functional 10 kDa protein with glucose-tolerant BGL activity has major implications, both for understanding structure-function relationships and for applications of BGL enzymes. BGL3 showed similarity to BGL1 of Aspergillus aculeatus, another GH3 β-glucosidase. It may be noted that although PCR detected GH1, GH3 and GH5 β-glucosidases in the fungus, the major isoforms BGL1, BGL3 and BGL5 were all GH3 family enzymes.
This would imply that β-glucosidases belonging to other families may also co-exist in the fungus, and the other minor isoforms detected in zymograms may account for them. In biomass hydrolysis, a BGL enzyme preparation containing the GT-BGL was used to supplement cellulase, and the performance of the blends was compared with that of a cocktail in which commercial β-glucosidase supplemented the biomass-hydrolyzing enzyme preparation. The cocktail supplemented with the A. unguis BGL preparation yielded 555 mg/g sugar in 12 h, compared with only 333 mg/g in the same period for the commercial enzyme preparation, and the maximum sugar yield of 858 mg/g was attained in 36 h by the cocktail containing A. unguis BGL. While the commercial enzyme achieved an almost similar sugar yield in 24 h, there was a rapid drop in sugar concentration after that, probably indicating the conversion of glucose back to di- or oligosaccharides by the transglycosylation activity of the BGL in that preparation. In contrast, the preparation containing the A. unguis enzyme supported peak yields for a longer duration (up to 48 h), which is important for biomass conversion to other products, since the hydrolysate has to undergo certain unit operations before it goes into the next stage, i.e., fermentation in any bioprocess for the production of fuels or chemicals. Most importantly, the Aspergillus unguis BGL preparation gives an approximately 1.6-fold increase in sugar release compared with the commercial BGL within 12 h, and a 2.25-fold increase compared with the control, i.e., cellulase without BGL supplementation. The current study therefore leads to the identification of a potent new isolate producing glucose-tolerant β-glucosidase.
The organism, identified as Aspergillus unguis, falls within the Aspergillus nidulans group to which most GT-BGL producers belong, and detailed studies showed that the glucose-tolerant β-glucosidase is a very low molecular weight protein that probably belongs to glycosyl hydrolase family 3. Inhibition kinetic studies determined its Ki, which is the second highest among the nidulans group of Aspergilli. This has prompted a detailed study of the mechanism of glucose tolerance. The proteomic analyses clearly indicate the presence of a GH3 catalytic domain in the protein. Since the protein is very small yet active and glucose tolerant, it is speculated that it could be an entirely new protein or a modified form of an existing β-glucosidase containing only the catalytic domain. Hydrolysis experiments also qualify this BGL as a suitable candidate for enzyme cocktail development for biomass hydrolysis.
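The reported kinetic constants of the glucose-tolerant BGL (Km 4.854 mM, Ki 800 mM, Vmax 2.946 in the units given above) can be put into a short calculation. The abstract does not state the inhibition mechanism, so this sketch assumes simple competitive inhibition by glucose, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]), purely to illustrate what a Ki of 800 mM implies: under that assumption, at a substrate concentration equal to Km, even 800 mM glucose would leave two thirds of the uninhibited rate.

```python
KM_MM = 4.854  # reported Km of the glucose-tolerant BGL (mM)
KI_MM = 800.0  # reported glucose inhibition constant Ki (mM)
VMAX = 2.946   # reported Vmax, in the units given in the abstract


def rate(substrate_mm, glucose_mm, vmax=VMAX, km=KM_MM, ki=KI_MM):
    """Michaelis-Menten rate with competitive inhibition by glucose (assumed mechanism)."""
    return vmax * substrate_mm / (km * (1.0 + glucose_mm / ki) + substrate_mm)


def residual_activity(substrate_mm, glucose_mm, km=KM_MM, ki=KI_MM):
    """Fraction of the uninhibited rate that remains at a given glucose concentration."""
    return rate(substrate_mm, glucose_mm, 1.0, km, ki) / rate(substrate_mm, 0.0, 1.0, km, ki)


# Residual activity at [S] = Km for increasing glucose concentrations (mM).
for glucose in (0, 100, 400, 800):
    print(glucose, round(residual_activity(KM_MM, glucose), 3))
```

Vmax cancels in the residual-activity ratio, so the comparison depends only on Km, Ki and the concentrations; a different inhibition mechanism would change the formula but not the general point that a large Ki means little loss of activity at high glucose.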
Abstract:
The ground state (J = 0) electronic correlation energy of the 4-electron Be-sequence is calculated in the Multi-Configuration Dirac-Fock approximation for Z = 4-20. The 4 electrons were distributed over the configurations arising from the 1s, 2s, 2p, 3s, 3p and 3d orbitals. Theoretical values obtained here are in good agreement with experimental correlation energies.
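The configuration space can be made concrete by enumeration. This is a minimal sketch under stated simplifications: it uses nonrelativistic orbital labels (in the Dirac-Fock picture each np and nd orbital further splits into j = l ± 1/2 subshells), it keeps only even-parity configurations because the J = 0 ground state 1s² 2s² is even, and it does not check which configurations can actually couple to J = 0. It is not the authors' MCDF configuration list.

```python
from itertools import product

# (orbital label, orbital angular momentum l); each orbital holds at most 2(2l + 1) electrons.
ORBITALS = [("1s", 0), ("2s", 0), ("2p", 1), ("3s", 0), ("3p", 1), ("3d", 2)]


def configurations(n_electrons=4, parity=0):
    """List nonrelativistic configurations with n_electrons and parity (-1)**parity."""
    ranges = [range(min(2 * (2 * l + 1), n_electrons) + 1) for _, l in ORBITALS]
    result = []
    for occupations in product(*ranges):
        if sum(occupations) != n_electrons:
            continue
        # The parity of a configuration is (-1) ** (sum of l over all electrons).
        if sum(q * l for q, (_, l) in zip(occupations, ORBITALS)) % 2 != parity:
            continue
        result.append(" ".join(f"{name}{q}" for q, (name, _) in zip(occupations, ORBITALS) if q))
    return result


configs = configurations()
print(len(configs))
print(configs[:5])
```

Even this small orbital set yields many configurations, which is why MCDF calculations of correlation energy quickly grow in size as orbitals are added.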
Abstract:
The present Thesis looks at the problem of protein folding using Monte Carlo and Langevin simulations. Three topics in protein folding have been studied: 1) the effect of confining potential barriers, 2) the effect of a static external field and 3) the design of amino acid sequences that fold in a short time and have a stable native state (global minimum). Regarding the first topic, we studied the confinement of a small protein of 16 amino acids known as 1NJ0 (PDB code), whose native state has a beta-sheet structure. The confinement of proteins occurs frequently in the cell environment. Some molecules called chaperones, present in the cytoplasm, capture unfolded proteins in their interior and prevent the formation of aggregates and misfolded proteins. This chaperone-mediated confinement mechanism is not yet well understood. In the present work we considered two kinds of potential barriers that try to mimic the confinement induced by a chaperone molecule. The first was a purely repulsive barrier whose only effect is to create a cavity where the protein folds up correctly. The second was a barrier that includes both attractive and repulsive effects. We performed Wang-Landau simulations to calculate the thermodynamic properties of 1NJ0. From the free energy landscape plot we found that, in the bulk (without confinement), 1NJ0 has two intermediate states that are clearly separated from the native and unfolded states. For the purely repulsive barrier, the intermediate states get closer to each other in the free energy landscape plot and eventually collapse into a single intermediate state. The unfolded state becomes more compact, compared with that in the bulk, as the size of the barrier decreases. For an attractive barrier, modifications of the states (native, unfolded and intermediates) are observed, depending on the degree of attraction between the protein and the walls of the barrier.
The strength of the attraction is measured by the parameter $\epsilon$: a purely repulsive barrier is obtained for $\epsilon=0$ and a purely attractive barrier for $\epsilon=1$. The states change only slightly for attraction strengths up to $\epsilon=0.4$. The disappearance of the intermediate states of 1NJ0 is already observed at $\epsilon=0.6$. A strongly attractive barrier ($\epsilon \sim 1.0$) produces a completely denatured state. In the second topic of this Thesis we dealt with the interaction of a protein with an external electric field. We demonstrated by means of computer simulations, specifically using the Wang-Landau algorithm, that the folded, unfolded and intermediate states can be modified by means of a field. An external field can induce several modifications in the thermodynamics of these states: for relatively low field magnitudes ($<2.06 \times 10^8$ V/m) no major changes in the states are observed. However, for magnitudes higher than $6.19 \times 10^8$ V/m a new native state appears, which exhibits a helix-like structure, whereas the original native state is a $\beta$-sheet structure. In the new native state all the dipoles in the backbone structure are aligned parallel to the field. The design of amino acid sequences constitutes the third topic of the present work. We tested the Rate of Convergence criterion proposed by D. Gridnev and M. Garcia ({\it unpublished work}) and applied it to the study of off-lattice models. The Rate of Convergence criterion is used to decide whether a certain sequence will fold up correctly within a relatively short time. Before the present work, the common way to decide whether a sequence was a good or bad folder was either to perform the whole dynamics until the sequence reached its native state (if it existed), or to study the curvature of the potential energy surface. Both approaches have difficulties.
In the first approach, performing the complete dynamics for hundreds of sequences is a rather challenging task because of the CPU time needed. In the second approach, calculating the curvature of the potential energy surface is possible only for very smooth surfaces. The Rate of Convergence criterion seems to avoid the previous difficulties. With this criterion one does not need to perform the complete dynamics to find the good and bad sequences. Also, the criterion does not depend on the kind of force field used and therefore it can be used even for very rugged energy surfaces.
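The Wang-Landau simulations used in the first two parts of this Thesis estimate the density of states g(E) directly. The sketch below is a minimal illustration, not the Thesis's protein model: it applies the standard algorithm (accept a move with probability min(1, g(E_old)/g(E_new)), add ln f to ln g of the current energy after every attempt, halve ln f whenever the energy histogram is flat) to a hypothetical toy system of n independent two-state units with energy equal to the number of excited units, chosen because its exact answer g(E) = C(n, E) is known. The flatness threshold 0.8, the sweep length and the final ln f are illustrative assumptions.

```python
import math
import random


def wang_landau(n_units=10, flatness=0.8, ln_f_final=1e-5, sweep=10_000, seed=0):
    """Wang-Landau estimate of ln g(E) for n independent two-state units,
    with E = number of units in the excited state (exact g(E) = C(n, E))."""
    rng = random.Random(seed)
    state = [0] * n_units
    energy = 0
    ln_g = [0.0] * (n_units + 1)
    hist = [0] * (n_units + 1)
    ln_f = 1.0
    while ln_f > ln_f_final:
        for _ in range(sweep):
            i = rng.randrange(n_units)
            new_energy = energy + (1 if state[i] == 0 else -1)
            # Accept with probability min(1, g(E_old) / g(E_new)).
            if rng.random() < math.exp(min(0.0, ln_g[energy] - ln_g[new_energy])):
                state[i] ^= 1
                energy = new_energy
            ln_g[energy] += ln_f  # update the current level whether or not the move was accepted
            hist[energy] += 1
        if min(hist) > flatness * sum(hist) / len(hist):  # histogram flat: refine f
            hist = [0] * (n_units + 1)
            ln_f /= 2.0
    return [x - ln_g[0] for x in ln_g]  # fix the arbitrary constant via g(0) = 1


estimate = wang_landau()
exact = [math.log(math.comb(10, e)) for e in range(11)]
print(max(abs(a - b) for a, b in zip(estimate, exact)))  # small deviation from the exact ln C(10, E)
```

Once ln g(E) is known, thermodynamic quantities at any temperature follow from the partition function Z(T) = Σ_E g(E) exp(-E/k_BT), which is what makes the method convenient for mapping free energy landscapes from a single run.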
Abstract:
Eukaryotic DNA m5C methyltransferases (MTases) play a major role in many epigenetic regulatory processes, such as genomic imprinting, X-chromosome inactivation, silencing of transposons and gene expression. Members of the two DNA m5C MTase families, Dnmt1 and Dnmt3, are relatively well studied, and many details of their biological functions, biochemical properties and interaction partners are known. In contrast, the biological functions of the highly conserved Dnmt2 family, which appears to have non-canonical dual substrate specificity, remain enigmatic despite the efforts of many researchers. The genome of the social amoeba Dictyostelium encodes a Dnmt2 homolog, DnmA, as its only DNA m5C MTase, which allowed us to study Dnmt2 function in this organism without interference from other enzymes. The dnmA gene can easily be disrupted, but the knock-out clones did not show obvious phenotypes under normal lab conditions, suggesting that the function of DnmA is not vital for the organism. The dnmA gene appears to have a low expression profile during vegetative growth and is only 5-fold upregulated during development. Fluorescence microscopy indicated that DnmA-GFP fusions were distributed between the nucleus and the cytoplasm, with some enrichment in nuclei. Interestingly, the experiments showed specific dynamics of DnmA-GFP distribution during the cell cycle: the proteins colocalized with DNA in interphase and were mainly removed from nuclei during mitosis. DnmA functions as an active DNA m5C MTase in vivo and is responsible for weak but detectable DNA methylation of several regions in the Dictyostelium genome. Nevertheless, gel retardation assays showed only slightly higher affinity of the enzyme for dsDNA than for ssDNA and no specificity towards various sequence contexts, although weak but detectable specificity towards AT-rich sequences was observed. This could be due to the intrinsic curvature of such sequences.
Furthermore, DnmA did not form denaturant-resistant covalent complexes with dsDNA in vitro, although it could form covalent adducts with ssDNA. Low binding and methyltransfer activity in vitro suggest that an additional factor is necessary for DnmA function. Nevertheless, no candidates could be identified in affinity purification experiments with different tagged DnmA fusions. In this respect, it should be noted that tagged DnmA fusion preparations from Dictyostelium showed somewhat higher activity in both covalent adduct formation and methylation assays than DnmA expressed in E. coli. Thus, the presence of co-purified factors cannot be excluded. The low efficiency of complex formation by the recombinant enzyme, and the failure to identify interacting proteins that could be required for DNA methylation in vivo, led to the hypothesis that post-translational modifications could influence target recognition and enzymatic activity. Indeed, sites of phosphorylation, methylation and acetylation were identified within the target recognition domain (TRD) of DnmA by mass spectrometry. For phosphorylation, the combination of MS data and bioinformatic analysis revealed that some of the sites could well be targets for specific kinases in vivo. Preliminary 3D modeling of the DnmA protein based on homology with hDNMT2 showed that several identified phosphorylation sites are located on the surface of the molecule, where they would be accessible to kinases. The presence of modifications almost exclusively within the TRD of DnmA could potentially modulate the mode of its interaction with target nucleic acids. DnmA was able to form denaturant-resistant covalent intermediates with several Dictyostelium tRNAs, using C38 in the anticodon loop as the target. Complex formation did not always correlate with the data from methylation assays and seemed to depend on both the sequence and the structure of the tRNA substrate.
The pattern previously suggested by the Helm group for optimal methyltransferase activity of hDNMT2 appeared to contribute significantly to the formation of covalent adducts, but it was not the only feature of the substrate required for DnmA and hDNMT2 function. Both enzymes required Mg2+ to form covalent complexes, which indicated that the specific structure of the target tRNA was indispensable. The dynamics of covalent adduct accumulation by DnmA differed among tRNAs. Interestingly, the profiles of covalent adduct accumulation for different tRNAs were somewhat similar for the DnmA and hDNMT2 enzymes. According to the proposed catalytic mechanism for DNA m5C MTases, the observed denaturant-resistant complexes correspond to covalent enamine intermediates. The apparent discrepancies between the covalent complex formation and methylation assay data may be explained by alternative pathways of the catalytic mechanism, leading not to methylation but to exchange or demethylation reactions. The reversibility of enamine intermediate formation should also be considered. Curiously, native gel retardation assays showed little or no difference in the binding affinities of DnmA for different RNA substrates, and thus no specificity in the initial enzyme binding. The biological meaning of tRNA methylation, as well as the identification of novel RNA substrates in vivo, should be the aim of further experiments.
Abstract:
Background: The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped. ----- Methods: Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets. ----- Results: Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams. 
----- Conclusions: Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.
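The allele-frequency idea can be sketched for a single biallelic locus. The following is a hypothetical per-locus Bayesian formulation, not necessarily the authors' method: genotypes are coded as 0, 1 or 2 copies of one allele, transmission is Mendelian with no genotyping error, the allele the dam received from her sire follows from her sire's genotype, and the allele from her un-genotyped dam is given prior probability `p`, the population allele frequency. The offspring genotype, together with the offspring's sire, then updates the dam's genotype probabilities.

```python
def dam_genotype_posterior(dam_sire, offspring, offspring_sire, p):
    """Posterior [P(0), P(1), P(2)] of an un-genotyped dam's genotype at one locus.

    Genotypes count copies of the reference allele; p is that allele's frequency.
    """
    def transmit_prob(genotype, allele):
        # Probability that a parent with this genotype transmits the given allele.
        q = genotype / 2.0
        return q if allele == 1 else 1.0 - q

    post = [0.0, 0.0, 0.0]
    for a_pat in (0, 1):      # allele the dam received from her (genotyped) sire
        for a_mat in (0, 1):  # allele from the dam's unknown mother, prior p
            prior = transmit_prob(dam_sire, a_pat) * (p if a_mat == 1 else 1.0 - p)
            g = a_pat + a_mat
            for t_dam in (0, 1):  # allele the dam passed to the offspring
                t_sire = offspring - t_dam
                if t_sire not in (0, 1):
                    continue
                post[g] += prior * transmit_prob(g, t_dam) * transmit_prob(offspring_sire, t_sire)
    total = sum(post)
    if total == 0:
        raise ValueError("genotypes are Mendelian-inconsistent")
    return [x / total for x in post]


print(dam_genotype_posterior(2, 0, 0, 0.3))  # -> [0.0, 1.0, 0.0]
print(dam_genotype_posterior(1, 1, 1, 0.5))  # -> [0.25, 0.5, 0.25]
```

When the posterior puts all its mass on one genotype the locus is unambiguous; otherwise one could take the most probable genotype or the expected dosage P(1) + 2P(2). Linkage between loci, which a haplotype-based method can exploit, is ignored here.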