925 results for SEQUENCE
Abstract:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In order to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
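The combination of string indexing and approximate string matching mentioned in the abstract above can be illustrated with a small seed-and-verify sketch. This is not how BWA works internally (BWA relies on a Burrows-Wheeler-transform index); the k-mer hash index, the function names `build_index` and `map_read`, and the toy reference below are assumptions chosen only for illustration.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted start positions where the read aligns to the reference
    with at most max_mismatches substitutions (no insertions or deletions)."""
    hits = set()
    # Non-overlapping seeds: a read with few mismatches must contain
    # at least one seed that matches the reference exactly.
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)

# Hypothetical toy data, not taken from the study.
REFERENCE = "ACGTTGCAAGGCTTACGATCGGATCCA"
INDEX = build_index(REFERENCE, 4)
exact = map_read("GGCTTACG", REFERENCE, INDEX, 4)    # exact copy of a reference substring
one_sub = map_read("GGCTTACC", REFERENCE, INDEX, 4)  # same read with its last base changed
```

Because each read is mapped independently of the others, a set of reads can in principle be split across processes, which is the property a parallel mapper exploits.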
Abstract:
Variation in hiring procedures occurs within fire service human resource departments. In this study, City 1 and City 2 applicants were required to pass their biophysical assessments prior to being hired as firefighters at the beginning and end of the screening process, respectively. City 1 applicants demonstrated significantly lower resting heart rate (RHR), resting diastolic blood pressure (RDBP), body fat% (BF) and higher z-scores for BF, trunk flexibility (TF) and overall clinical assessment (p<0.05). Regression analysis found that age and conducting the biophysical assessment at the end of the screening process explained poorer biophysical assessment results in BF% (R2=21%), BF z-score (R2=22%), TF z-score (R2=10%) and overall clinical assessment z-score (R2=7%). Each of RHR (OR=1.06, CI=1.01-1.10), RDBP (OR=1.05, CI=1.00-1.11) and BF% (OR=1.20, CI=1.07-1.37) increased the odds of being a City 2 firefighter (p<0.05). Biophysical screening at the end of the hiring process may result in the hiring of a less healthy firefighter.
Abstract:
The complete genome of an Erwinia amylovora bacteriophage, vB_EamM_Ea35-70 (Ea35-70), is 271,084 bp, encodes 318 putative proteins, and contains one tRNA. Comparative analysis with other Myoviridae genomes suggests that Ea35-70 is related to the Phikzlikevirus genus within the family Myoviridae, since 26% of Ea35-70 proteins share homology to proteins in Pseudomonas phage φKZ.
Abstract:
Affiliation: Département de biochimie, Faculté de médecine, Université de Montréal
Abstract:
In the present investigation, an attempt is made to study late Quaternary foraminiferal and pteropod records of the shelf of northern Kerala and to evaluate their potential in paleoceanographic and paleoclimatic reconstruction. The study gives details of sediment cores, general characteristics of foraminifera and pteropod species recorded from the examined samples and their systematic classification, spatial distribution of Recent foraminifera and pteropods and their response to varying bathymetry, nature of substrate, organic matter content in sediment and hydrography across the shelf. An attempt is also made to establish an integrated chronostratigraphy for the examined core sections and to identify microfaunal criteria useful in biostratigraphic division of shallow marine core sections. An attempt is made to infer the factors responsible for the change in microfaunal assemblage. Reconstruction of sea level changes during the last 36,000 years was attempted based on the pteropod record. The study reveals a bathymetric control on the benthic/planktic foraminiferal (BF/PF) and pteropod/planktic foraminiferal (Pt/PF) abundance ratios. The bathymetric distribution pattern of the BF/PF ratio is opposite to that of the Pt/PF ratio, with the former decreasing from the shore across the shelf. The quantitative benthic foraminiferal record in the surficial sediments reveals a positive correlation between diversity and bathymetry. R-mode cluster analysis performed on 30 significant Recent benthic foraminiferal species identifies three major assemblages.
Abstract:
Modern computer systems are plagued with stability and security problems: applications lose data, web servers are hacked, and systems crash under heavy load. Many of these problems or anomalies arise from rare program behavior caused by attacks or errors. A substantial percentage of web-based attacks are due to buffer overflows. Many methods have been devised to detect and prevent anomalous situations that arise from buffer overflows. The current state of the art in anomaly detection systems is relatively primitive and mainly depends on static code checking to take care of buffer overflow attacks. For protection, Stack Guards and Heap Guards are also widely used. This dissertation proposes an anomaly detection system based on the frequencies of system calls in the system call trace. System call traces represented as frequency sequences are profiled using sequence sets. A sequence set is identified by the starting sequence and the frequencies of specific system calls. The deviation of the current input sequence from the corresponding normal profile in the frequency pattern of system calls is computed and expressed as an anomaly score. A simple Bayesian model is used for accurate detection. Experimental results are reported which show that the frequency of system calls, represented using sequence sets, captures the normal behavior of programs under normal conditions of usage. This captured behavior allows the system to detect anomalies with a low rate of false positives. Data are presented which show that a Bayesian network on frequency variations responds effectively to induced buffer overflows. It can also help administrators to detect deviations in program flow introduced by errors.
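The idea of scoring a trace by how far its system call frequencies deviate from a normal profile, as described in the abstract above, can be sketched as follows. This is a deliberately simplified illustration and not the dissertation's method: it omits sequence sets and the Bayesian model, and the function names, the choice of an absolute-difference score, and the toy traces are all assumptions.

```python
from collections import Counter

def frequency_vector(trace, vocabulary):
    """Relative frequency of each system call name in one trace."""
    counts = Counter(trace)
    total = len(trace) or 1  # avoid division by zero on an empty trace
    return [counts[name] / total for name in vocabulary]

def build_profile(normal_traces, vocabulary):
    """Average the frequency vectors of traces recorded under normal use."""
    vectors = [frequency_vector(t, vocabulary) for t in normal_traces]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def anomaly_score(trace, profile, vocabulary):
    """Sum of absolute deviations from the normal profile (0 means identical)."""
    observed = frequency_vector(trace, vocabulary)
    return sum(abs(o - p) for o, p in zip(observed, profile))

# Hypothetical system call traces, not data from the dissertation.
VOCAB = ["open", "read", "write", "close", "execve"]
normal = [
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
]
profile = build_profile(normal, VOCAB)
score_typical = anomaly_score(["open", "read", "read", "close"], profile, VOCAB)
score_suspicious = anomaly_score(["execve", "execve", "write", "write"], profile, VOCAB)
```

In this toy setting the trace dominated by `execve` and `write` receives a much higher score than the typical one; a real detector would still need a threshold or a probabilistic model to turn scores into decisions.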
Abstract:
This paper discusses our research in developing a generalized and systematic method for anomaly detection. The key ideas are to represent normal program behaviour using system call frequencies and to incorporate probabilistic techniques for classification to detect anomalies and intrusions. Using experiments on the sendmail system call data, we demonstrate that concise and accurate classifiers can be constructed to detect anomalies. An overview of the approach that we have implemented is provided.
Abstract:
Code clones are portions of source code which are similar to the original program code. The presence of code clones is considered a bad feature of software because it makes maintenance difficult. Methods for code clone detection have gained immense significance in the last few years as they play a significant role in engineering applications such as analysis of program code, program understanding, plagiarism detection, error detection, code compaction and many more similar tasks. Despite these drawbacks, several features of code clones, if properly utilized, can make the software development process easier. In this work, we point out one such feature, which highlights the relevance of code clones in test sequence identification. Here program slicing is used in code clone detection. In addition, a classification of code clones is presented and the benefit of using program slicing in code clone detection is discussed.
Abstract:
DNA sequence representation methods are used to denote a gene structure effectively and help in the analysis of similarities and dissimilarities of coding sequences. Many different kinds of representations have been proposed in the literature. They can be broadly classified into numerical, graphical, geometrical and hybrid representation methods. DNA structure and function analysis are made easy with graphical and geometrical representation methods since they give a visual representation of a DNA structure. In numerical methods, numerical values are assigned to a sequence and digital signal processing methods are used to analyze the sequence. Hybrid approaches for analyzing DNA sequences are also reported in the literature. This paper reviews the latest developments in DNA sequence representation methods. We also present a taxonomy of the various methods and compare them wherever possible.
Abstract:
Considerable research effort has been devoted to predicting the exon regions of genes. The binary indicator (BI), electron-ion interaction pseudopotential (EIIP), and filter methods are some of the approaches. All these methods make use of the period-three behavior of the exon region. Although the method suggested in this paper is similar to the above-mentioned methods, it introduces a set of sequences for mapping the nucleotides, selected by applying a genetic algorithm, and is found to be more promising.
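The period-three behavior that these methods exploit can be made concrete with the binary indicator mapping named in the abstract above: each nucleotide gets a 0/1 sequence, and the spectral power at frequency 1/3 cycles per base is summed over the four sequences. The sketch below shows only this generic baseline, not the genetic-algorithm-selected mapping proposed in the paper; the function names and the test sequences are assumptions.

```python
import cmath

def binary_indicators(seq):
    """One 0/1 indicator sequence per nucleotide (the BI mapping)."""
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}

def power_at_one_third(signal):
    """Squared magnitude of the Fourier sum at frequency 1/3 cycles per base."""
    total = sum(x * cmath.exp(-2j * cmath.pi * i / 3) for i, x in enumerate(signal))
    return abs(total) ** 2

def period3_power(seq):
    """Total period-three spectral power over the four indicator sequences."""
    return sum(power_at_one_third(ind) for ind in binary_indicators(seq).values())

# Hypothetical sequences: one with a perfect codon-like repeat, one with period four.
strong = period3_power("ATG" * 20)
weak = period3_power("ACGT" * 15)
```

For a sequence whose length is a multiple of three, this quantity coincides with the discrete Fourier transform power at bin N/3. In practice it is usually evaluated in a sliding window along the sequence so that peaks can be associated with candidate exon regions.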
Abstract:
The ground state (J = 0) electronic correlation energy of the 4-electron Be-sequence is calculated in the Multi-Configuration Dirac-Fock approximation for Z = 4-20. The 4 electrons were distributed over the configurations arising from the 1s, 2s, 2p, 3s, 3p and 3d orbitals. Theoretical values obtained here are in good agreement with experimental correlation energies.
Abstract:
This thesis examines the problem of protein folding using Monte Carlo and Langevin simulations. Three topics in protein folding have been studied: 1) the effect of confining potential barriers, 2) the effect of a static external field and 3) the design of amino acid sequences which fold in a short time and which have a stable native state (global minimum). Regarding the first topic, we studied the confinement of a small protein of 16 amino acids known as 1NJ0 (PDB code), which has a beta-sheet structure as its native state. The confinement of proteins occurs frequently in the cell environment. Some molecules called chaperones, present in the cytoplasm, capture unfolded proteins in their interior and prevent the formation of aggregates and misfolded proteins. This mechanism of confinement mediated by chaperones is not yet well understood. In the present work we considered two kinds of potential barriers which try to mimic the confinement induced by a chaperone molecule. The first kind of potential was a purely repulsive barrier whose only effect is to create a cavity where the protein folds up correctly. The second kind of potential was a barrier which includes both attractive and repulsive effects. We performed Wang-Landau simulations to calculate the thermodynamic properties of 1NJ0. From the free energy landscape plot we found that 1NJ0 has two intermediate states in the bulk (without confinement) which are clearly separated from the native and the unfolded states. For the case of the purely repulsive barrier we found that the intermediate states get closer to each other in the free energy landscape plot and eventually collapse into a single intermediate state. The unfolded state becomes more compact, compared to that in the bulk, as the size of the barrier decreases. For an attractive barrier, modifications of the states (native, unfolded and intermediates) are observed depending on the degree of attraction between the protein and the walls of the barrier.
The strength of the attraction is measured by the parameter $\epsilon$. A purely repulsive barrier is obtained for $\epsilon=0$ and a purely attractive barrier for $\epsilon=1$. The states are changed slightly for magnitudes of the attraction up to $\epsilon=0.4$. The disappearance of the intermediate states of 1NJ0 is already observed for $\epsilon=0.6$. A very strongly attractive barrier ($\epsilon \sim 1.0$) produces a completely denatured state. In the second topic of this thesis we dealt with the interaction of a protein with an external electric field. We demonstrated by means of computer simulations, specifically by using the Wang-Landau algorithm, that the folded, unfolded, and intermediate states can be modified by means of a field. We have found that an external field can induce several modifications in the thermodynamics of these states: for relatively low magnitudes of the field ($<2.06 \times 10^8$ V/m) no major changes in the states are observed. However, for magnitudes higher than $6.19 \times 10^8$ V/m one observes the appearance of a new native state which exhibits a helix-like structure. In contrast, the original native state is a $\beta$-sheet structure. In the new native state all the dipoles in the backbone structure are aligned parallel to the field. The design of amino acid sequences constitutes the third topic of the present work. We have tested the Rate of Convergence criterion proposed by D. Gridnev and M. Garcia (unpublished work) and applied it to the study of off-lattice models. The Rate of Convergence criterion is used to decide whether a certain sequence will fold up correctly within a relatively short time. Before the present work, the common way to decide whether a certain sequence was a good or bad folder was to perform the whole dynamics until the sequence reached its native state (if it existed), or to study the curvature of the potential energy surface. Both of these approaches have difficulties.
In the first approach, performing the complete dynamics for hundreds of sequences is a rather challenging task because of the CPU time needed. In the second approach, calculating the curvature of the potential energy surface is possible only for very smooth surfaces. The Rate of Convergence criterion seems to avoid the previous difficulties. With this criterion one does not need to perform the complete dynamics to find the good and bad sequences. Also, the criterion does not depend on the kind of force field used and therefore it can be used even for very rugged energy surfaces.
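The Wang-Landau algorithm used in the first two topics of the abstract above estimates the density of states g(E) by a random walk that is biased toward a flat energy histogram. The sketch below applies it to a toy system of independent two-state spins whose energy is the number of spins in state 1, for which the exact answer is a binomial coefficient. The toy model, the function name `wang_landau`, and all parameter values are assumptions made for illustration; the thesis applies the algorithm to a protein model, which is not reproduced here.

```python
import math
import random

def wang_landau(n_spins=6, ln_f_final=1e-3, flatness=0.8, seed=0):
    """Estimate ln g(E) for n_spins independent two-state spins,
    where E is the number of spins in state 1."""
    rng = random.Random(seed)
    spins = [0] * n_spins
    energy = 0
    ln_g = [0.0] * (n_spins + 1)   # running estimate of ln g(E)
    hist = [0] * (n_spins + 1)     # visits per energy level in this stage
    ln_f = 1.0                     # logarithm of the modification factor
    while ln_f > ln_f_final:
        for _ in range(10000):
            i = rng.randrange(n_spins)
            new_energy = energy + (1 - 2 * spins[i])
            # Accept the flip with probability min(1, g(E) / g(E_new)).
            diff = ln_g[energy] - ln_g[new_energy]
            if rng.random() < math.exp(min(0.0, diff)):
                spins[i] ^= 1
                energy = new_energy
            ln_g[energy] += ln_f
            hist[energy] += 1
        # Once the histogram is flat enough, refine the modification factor.
        if min(hist) > flatness * sum(hist) / len(hist):
            hist = [0] * len(hist)
            ln_f /= 2
    return [value - ln_g[0] for value in ln_g]  # normalise so that ln g(0) = 0

estimate = wang_landau()
exact = [math.log(math.comb(6, e)) for e in range(7)]
```

Comparing `estimate` with `exact` gives a rough check of convergence. Once g(E) is known, thermodynamic quantities such as the free energy can be obtained at any temperature by reweighting with the Boltzmann factor, which is what makes the method attractive for studying folding landscapes.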