936 resultados para automatic bug assignment
Resumo:
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed
Resumo:
Opposite enantiomers exhibit different NMR properties in the presence of an external common chiral element, and a chiral molecule exhibits different NMR properties in the presence of external enantiomeric chiral elements. Automatic prediction of such differences, and comparison with experimental values, leads to the assignment of the absolute configuration. Here two cases are reported, one using a dataset of 80 chiral secondary alcohols esterified with (R)-MTPA and the corresponding 1H NMR chemical shifts and the other with 94 13C NMR chemical shifts of chiral secondary alcohols in two enantiomeric chiral solvents. For the first application, counterpropagation neural networks were trained to predict the sign of the difference between chemical shifts of opposite stereoisomers. The neural networks were trained to process the chirality code of the alcohol as the input, and to give the NMR property as the output. In the second application, similar neural networks were employed, but the property to predict was the difference of chemical shifts in the two enantiomeric solvents. For independent test sets of 20 objects, 100% correct predictions were obtained in both applications concerning the sign of the chemical shifts differences. Additionally, with the second dataset, the difference of chemical shifts in the two enantiomeric solvents was quantitatively predicted, yielding r2 0.936 for the test set between the predicted and experimental values.
Resumo:
This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces(PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. The development of such a representation can be applied in automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave origin to similar results. The use of supervised learning techniques allowed an improvement in the results. They were improved to 77% of correct assignments when an ensemble of ten FFNNs were used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications ranging from the computer validation of classification systems, genome-scale reconstruction (or comparison) of metabolic pathways, to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and for the automatic assignment of EC numbers to reactions still not officially classified. In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level,respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs - EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP / SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways. Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway - a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) allowed to map and perceive chemical similarities between metabolic pathways even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. A remarkable ability of the NNs models to interpolate between distant curves and accurately reproduce potentials to be used in molecular simulations is shown. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task. The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
Resumo:
Affiliation: Centre Robert-Cedergren de l'Université de Montréal en bio-informatique et génomique & Département de biochimie, Université de Montréal
Resumo:
Eigenvalue assignment methods are used widely in the design of control and state-estimation systems. The corresponding eigenvectors can be selected to ensure robustness. For specific applications, eigenstructure assignment can also be applied to achieve more general performance criteria. In this paper a new output feedback design approach using robust eigenstructure assignment to achieve prescribed mode input and output coupling is described. A minimisation technique is developed to improve both the mode coupling and the robustness of the system, whilst allowing the precision of the eigenvalue placement to be relaxed. An application to the design of an automatic flight control system is demonstrated.
Resumo:
Some points of the paper by N.K. Nichols (see ibid., vol.AC-31, p.643-5, 1986), concerning the robust pole assignment of linear multiinput systems, are clarified. It is stressed that the minimization of the condition number of the closed-loop eigenvector matrix does not necessarily lead to robustness of the pole assignment. It is shown why the computational method, which Nichols claims is robust, is in fact numerically unstable with respect to the determination of the gain matrix. In replying, Nichols presents arguments to support the choice of the conditioning of the closed-loop poles as a measure of robustness and to show that the methods of J Kautsky, N. K. Nichols and P. VanDooren (1985) are stable in the sense that they produce accurate solutions to well-conditioned problems.
Resumo:
Coordinate free conditions are given for pole assignment by feedback in linear descriptor (singular) systems which guarantee closed-loop regularity. These conditions are shown to be both necessary and sufficient for assignment of the maximum possible number of finite poles. Transformation to special coordinates are not used and the results provide a robust algorithm for the computation of the required feedback.
Resumo:
A number of computationally reliable direct methods for pole assignment by feedback have recently been developed. These direct procedures do not necessarily produce robust solutions to the problem, however, in the sense that the assigned poles are insensitive to perturbalions in the closed-loop system. This difficulty is illustrated here with results from a recent algorithm presented in this TRANSACTIONS and its causes are examined. A measure of robustness is described, and techniques for testing and improving robustness are indicated.
Resumo:
High Throughput Sequencing capabilities have made the process of assembling a transcriptome easier, whether or not there is a reference genome. But the quality of a transcriptome assembly must be good enough to capture the most comprehensive catalog of transcripts and their variations, and to carry out further experiments on transcriptomics. There is currently no consensus on which of the many sequencing technologies and assembly tools are the most effective. Many non-model organisms lack a reference genome to guide the transcriptome assembly. One question, therefore, is whether or not a reference-based genome assembly gives better results than de novo assembly. The blood-sucking insect Rhodnius prolixus-a vector for Chagas disease-has a reference genome. It is therefore a good model on which to compare reference-based and de novo transcriptome assemblies. In this study, we compared de novo and reference-based genome assembly strategies using three datasets (454, Illumina, 454 combined with Illumina) and various assembly software. We developed criteria to compare the resulting assemblies: the size distribution and number of transcripts, the proportion of potentially chimeric transcripts, how complete the assembly was (completeness evaluated both through CEGMA software and R. prolixus proteome fraction retrieved). Moreover, we looked for the presence of two chemosensory gene families (Odorant-Binding Proteins and Chemosensory Proteins) to validate the assembly quality. The reference-based assemblies after genome annotation were clearly better than those generated using de novo strategies alone. Reference-based strategies revealed new transcripts, including new isoforms unpredicted by automatic genome annotation. However, a combination of both de novo and reference-based strategies gave the best result, and allowed us to assemble fragmented transcripts.
Resumo:
For popular software systems, the number of daily submitted bug reports is high. Triaging these incoming reports is a time consuming task. Part of the bug triage is the assignment of a report to a developer with the appropriate expertise. In this paper, we present an approach to automatically suggest developers who have the appropriate expertise for handling a bug report. We model developer expertise using the vocabulary found in their source code contributions and compare this vocabulary to the vocabulary of bug reports. We evaluate our approach by comparing the suggested experts to the persons who eventually worked on the bug. Using eight years of Eclipse development as a case study, we achieve 33.6\% top-1 precision and 71.0\% top-10 recall.
Resumo:
Anisotropic magnetic susceptibility tensors chi of paramagnetic metal ions are manifested in pseudocontact shifts, residual dipolar couplings, and other paramagnetic observables that present valuable long-range information for structure determinations of protein-ligand complexes. A program was developed for automatic determination of the chi-tensor anisotropy parameters and amide resonance assignments in proteins labeled with paramagnetic metal ions. The program requires knowledge of the three-dimensional structure of the protein, the backbone resonance assignments of the diamagnetic protein, and a pair of 2D N-15-HSQC or 3D HNCO spectra recorded with and without paramagnetic metal ion. It allows the determination of reliable chi-tensor anisotropy parameters from 2D spectra of uniformly N-15-labeled proteins of fairly high molecular weight. Examples are shown for the 185-residue N-terminal domain of the subunit epsilon from E. coli DNA polymerase III in complex with the subunit theta and La3+ in its diamagnetic and Dy3+, Tb3+, and Er3+ in its paramagnetic form.
Resumo:
Knowledge-based radiation treatment is an emerging concept in radiotherapy. It
mainly refers to the technique that can guide or automate treatment planning in
clinic by learning from prior knowledge. Dierent models are developed to realize
it, one of which is proposed by Yuan et al. at Duke for lung IMRT planning. This
model can automatically determine both beam conguration and optimization ob-
jectives with non-coplanar beams based on patient-specic anatomical information.
Although plans automatically generated by this model demonstrate equivalent or
better dosimetric quality compared to clinical approved plans, its validity and gener-
ality are limited due to the empirical assignment to a coecient called angle spread
constraint dened in the beam eciency index used for beam ranking. To eliminate
these limitations, a systematic study on this coecient is needed to acquire evidences
for its optimal value.
To achieve this purpose, eleven lung cancer patients with complex tumor shape
with non-coplanar beams adopted in clinical approved plans were retrospectively
studied in the frame of the automatic lung IMRT treatment algorithm. The primary
and boost plans used in three patients were treated as dierent cases due to the
dierent target size and shape. A total of 14 lung cases, thus, were re-planned using
the knowledge-based automatic lung IMRT planning algorithm by varying angle
spread constraint from 0 to 1 with increment of 0.2. A modied beam angle eciency
index used for navigate the beam selection was adopted. Great eorts were made to assure the quality of plans associated to every angle spread constraint as good
as possible. Important dosimetric parameters for PTV and OARs, quantitatively
re
ecting the plan quality, were extracted from the DVHs and analyzed as a function
of angle spread constraint for each case. Comparisons of these parameters between
clinical plans and model-based plans were evaluated by two-sampled Students t-tests,
and regression analysis on a composite index built on the percentage errors between
dosimetric parameters in the model-based plans and those in the clinical plans as a
function of angle spread constraint was performed.
Results show that model-based plans generally have equivalent or better quality
than clinical approved plans, qualitatively and quantitatively. All dosimetric param-
eters except those for lungs in the automatically generated plans are statistically
better or comparable to those in the clinical plans. On average, more than 15% re-
duction on conformity index and homogeneity index for PTV and V40, V60 for heart
while an 8% and 3% increase on V5, V20 for lungs, respectively, are observed. The
intra-plan comparison among model-based plans demonstrates that plan quality does
not change much with angle spread constraint larger than 0.4. Further examination
on the variation curve of the composite index as a function of angle spread constraint
shows that 0.6 is the optimal value that can result in statistically the best achievable
plans.
Resumo:
Electric vehicles and electronic components inside the vehicle are becoming increasingly important. The software as well starts to have a significant impact on modern high-end cars therefore a careful validation process needs to be implemented with the aim of having a bug free product when it is released. The software complexity increases and thus also the testing phases is more demanding. Test can be troublesome and, in some cases, boring and easy. The intelligence can be moved in test definition and writing rather than on test execution. The aim of this document is to start the definition of an automatic modular testing system capable to execute test cycles on systems that interacts with the CAN networks and with DUT that can be touched with a robotic arm. The document defines a first version of the system, in particular the hardware interface part with the aim of taking logs and execute test in an automated fashion with the test engineer can have a higher focus on the test definition and analysis rather than execution.