906 resultados para alignment-free methods
Resumo:
Second-rank tensor interactions, such as quadrupolar interactions between the spin- 1 deuterium nuclei and the electric field gradients created by chemical bonds, are affected by rapid random molecular motions that modulate the orientation of the molecule with respect to the external magnetic field. In biological and model membrane systems, where a distribution of dynamically averaged anisotropies (quadrupolar splittings, chemical shift anisotropies, etc.) is present and where, in addition, various parts of the sample may undergo a partial magnetic alignment, the numerical analysis of the resulting Nuclear Magnetic Resonance (NMR) spectra is a mathematically ill-posed problem. However, numerical methods (de-Pakeing, Tikhonov regularization) exist that allow for a simultaneous determination of both the anisotropy and orientational distributions. An additional complication arises when relaxation is taken into account. This work presents a method of obtaining the orientation dependence of the relaxation rates that can be used for the analysis of the molecular motions on a broad range of time scales. An arbitrary set of exponential decay rates is described by a three-term truncated Legendre polynomial expansion in the orientation dependence, as appropriate for a second-rank tensor interaction, and a linear approximation to the individual decay rates is made. Thus a severe numerical instability caused by the presence of noise in the experimental data is avoided. At the same time, enough flexibility in the inversion algorithm is retained to achieve a meaningful mapping from raw experimental data to a set of intermediate, model-free
Resumo:
Variations in different types of genomes have been found to be responsible for a large degree of physical diversity such as appearance and susceptibility to disease. Identification of genomic variations is difficult and can be facilitated through computational analysis of DNA sequences. Newly available technologies are able to sequence billions of DNA base pairs relatively quickly. These sequences can be used to identify variations within their specific genome but must be mapped to a reference sequence first. In order to align these sequences to a reference sequence, we require mapping algorithms that make use of approximate string matching and string indexing methods. To date, few mapping algorithms have been tailored to handle the massive amounts of output generated by newly available sequencing technologies. In otrder to handle this large amount of data, we modified the popular mapping software BWA to run in parallel using OpenMPI. Parallel BWA matches the efficiency of multithreaded BWA functions while providing efficient parallelism for BWA functions that do not currently support multithreading. Parallel BWA shows significant wall time speedup in comparison to multithreaded BWA on high-performance computing clusters, and will thus facilitate the analysis of genome sequencing data.
Resumo:
We consider the problem of testing whether the observations X1, ..., Xn of a time series are independent with unspecified (possibly nonidentical) distributions symmetric about a common known median. Various bounds on the distributions of serial correlation coefficients are proposed: exponential bounds, Eaton-type bounds, Chebyshev bounds and Berry-Esséen-Zolotarev bounds. The bounds are exact in finite samples, distribution-free and easy to compute. The performance of the bounds is evaluated and compared with traditional serial dependence tests in a simulation experiment. The procedures proposed are applied to U.S. data on interest rates (commercial paper rate).
Resumo:
Naïvement perçu, le processus d’évolution est une succession d’événements de duplication et de mutations graduelles dans le génome qui mènent à des changements dans les fonctions et les interactions du protéome. La famille des hydrolases de guanosine triphosphate (GTPases) similaire à Ras constitue un bon modèle de travail afin de comprendre ce phénomène fondamental, car cette famille de protéines contient un nombre limité d’éléments qui diffèrent en fonctionnalité et en interactions. Globalement, nous désirons comprendre comment les mutations singulières au niveau des GTPases affectent la morphologie des cellules ainsi que leur degré d’impact sur les populations asynchrones. Mon travail de maîtrise vise à classifier de manière significative différents phénotypes de la levure Saccaromyces cerevisiae via l’analyse de plusieurs critères morphologiques de souches exprimant des GTPases mutées et natives. Notre approche à base de microscopie et d’analyses bioinformatique des images DIC (microscopie d’interférence différentielle de contraste) permet de distinguer les phénotypes propres aux cellules natives et aux mutants. L’emploi de cette méthode a permis une détection automatisée et une caractérisation des phénotypes mutants associés à la sur-expression de GTPases constitutivement actives. Les mutants de GTPases constitutivement actifs Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V ont été analysés avec succès. En effet, l’implémentation de différents algorithmes de partitionnement, permet d’analyser des données qui combinent les mesures morphologiques de population native et mutantes. Nos résultats démontrent que l’algorithme Fuzzy C-Means performe un partitionnement efficace des cellules natives ou mutantes, où les différents types de cellules sont classifiés en fonction de plusieurs facteurs de formes cellulaires obtenus à partir des images DIC. Cette analyse démontre que les mutations Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V induisent respectivement des phénotypes amorphe, allongé, rond et large qui sont représentés par des vecteurs de facteurs de forme distincts. Ces distinctions sont observées avec différentes proportions (morphologie mutante / morphologie native) dans les populations de mutants. Le développement de nouvelles méthodes automatisées d’analyse morphologique des cellules natives et mutantes s’avère extrêmement utile pour l’étude de la famille des GTPases ainsi que des résidus spécifiques qui dictent leurs fonctions et réseau d’interaction. Nous pouvons maintenant envisager de produire des mutants de GTPases qui inversent leur fonction en ciblant des résidus divergents. La substitution fonctionnelle est ensuite détectée au niveau morphologique grâce à notre nouvelle stratégie quantitative. Ce type d’analyse peut également être transposé à d’autres familles de protéines et contribuer de manière significative au domaine de la biologie évolutive.
Resumo:
Dans un premier temps, nous avons modélisé la structure d’une famille d’ARN avec une grammaire de graphes afin d’identifier les séquences qui en font partie. Plusieurs autres méthodes de modélisation ont été développées, telles que des grammaires stochastiques hors-contexte, des modèles de covariance, des profils de structures secondaires et des réseaux de contraintes. Ces méthodes de modélisation se basent sur la structure secondaire classique comparativement à nos grammaires de graphes qui se basent sur les motifs cycliques de nucléotides. Pour exemplifier notre modèle, nous avons utilisé la boucle E du ribosome qui contient le motif Sarcin-Ricin qui a été largement étudié depuis sa découverte par cristallographie aux rayons X au début des années 90. Nous avons construit une grammaire de graphes pour la structure du motif Sarcin-Ricin et avons dérivé toutes les séquences qui peuvent s’y replier. La pertinence biologique de ces séquences a été confirmée par une comparaison des séquences d’un alignement de plus de 800 séquences ribosomiques bactériennes. Cette comparaison a soulevée des alignements alternatifs pour quelques unes des séquences que nous avons supportés par des prédictions de structures secondaires et tertiaires. Les motifs cycliques de nucléotides ont été observés par les membres de notre laboratoire dans l'ARN dont la structure tertiaire a été résolue expérimentalement. Une étude des séquences et des structures tertiaires de chaque cycle composant la structure du Sarcin-Ricin a révélé que l'espace des séquences dépend grandement des interactions entre tous les nucléotides à proximité dans l’espace tridimensionnel, c’est-à-dire pas uniquement entre deux paires de bases adjacentes. Le nombre de séquences générées par la grammaire de graphes est plus petit que ceux des méthodes basées sur la structure secondaire classique. Cela suggère l’importance du contexte pour la relation entre la séquence et la structure, d’où l’utilisation d’une grammaire de graphes contextuelle plus expressive que les grammaires hors-contexte. Les grammaires de graphes que nous avons développées ne tiennent compte que de la structure tertiaire et négligent les interactions de groupes chimiques spécifiques avec des éléments extra-moléculaires, comme d’autres macromolécules ou ligands. Dans un deuxième temps et pour tenir compte de ces interactions, nous avons développé un modèle qui tient compte de la position des groupes chimiques à la surface des structures tertiaires. L’hypothèse étant que les groupes chimiques à des positions conservées dans des séquences prédéterminées actives, qui sont déplacés dans des séquences inactives pour une fonction précise, ont de plus grandes chances d’être impliqués dans des interactions avec des facteurs. En poursuivant avec l’exemple de la boucle E, nous avons cherché les groupes de cette boucle qui pourraient être impliqués dans des interactions avec des facteurs d'élongation. Une fois les groupes identifiés, on peut prédire par modélisation tridimensionnelle les séquences qui positionnent correctement ces groupes dans leurs structures tertiaires. Il existe quelques modèles pour adresser ce problème, telles que des descripteurs de molécules, des matrices d’adjacences de nucléotides et ceux basé sur la thermodynamique. Cependant, tous ces modèles utilisent une représentation trop simplifiée de la structure d’ARN, ce qui limite leur applicabilité. Nous avons appliqué notre modèle sur les structures tertiaires d’un ensemble de variants d’une séquence d’une instance du Sarcin-Ricin d’un ribosome bactérien. L’équipe de Wool à l’université de Chicago a déjà étudié cette instance expérimentalement en testant la viabilité de 12 variants. Ils ont déterminé 4 variants viables et 8 létaux. Nous avons utilisé cet ensemble de 12 séquences pour l’entraînement de notre modèle et nous avons déterminé un ensemble de propriétés essentielles à leur fonction biologique. Pour chaque variant de l’ensemble d’entraînement nous avons construit des modèles de structures tertiaires. Nous avons ensuite mesuré les charges partielles des atomes exposés sur la surface et encodé cette information dans des vecteurs. Nous avons utilisé l’analyse des composantes principales pour transformer les vecteurs en un ensemble de variables non corrélées, qu’on appelle les composantes principales. En utilisant la distance Euclidienne pondérée et l’algorithme du plus proche voisin, nous avons appliqué la technique du « Leave-One-Out Cross-Validation » pour choisir les meilleurs paramètres pour prédire l’activité d’une nouvelle séquence en la faisant correspondre à ces composantes principales. Finalement, nous avons confirmé le pouvoir prédictif du modèle à l’aide d’un nouvel ensemble de 8 variants dont la viabilité à été vérifiée expérimentalement dans notre laboratoire. En conclusion, les grammaires de graphes permettent de modéliser la relation entre la séquence et la structure d’un élément structural d’ARN, comme la boucle E contenant le motif Sarcin-Ricin du ribosome. Les applications vont de la correction à l’aide à l'alignement de séquences jusqu’au design de séquences ayant une structure prédéterminée. Nous avons également développé un modèle pour tenir compte des interactions spécifiques liées à une fonction biologique donnée, soit avec des facteurs environnants. Notre modèle est basé sur la conservation de l'exposition des groupes chimiques qui sont impliqués dans ces interactions. Ce modèle nous a permis de prédire l’activité biologique d’un ensemble de variants de la boucle E du ribosome qui se lie à des facteurs d'élongation.
Resumo:
In the present work, Indigenous polymer coated Tin Free Steel cans were analyzed fortheir suitability for thermal processing and storage of fish and fish products following standard methods. The raw materials used for the development of ready to eat thermally processed fish products were found to be of fresh condition. The values for various biochemical and microbiological parameters of the raw materials were well within the limits. Based on the analysis of commercial sterility, instrumental colour, texture, WB-shear force and sensory parameters, squid masala processed to F0 value of 8 min with a total process time of 38.5 min and cook value of 92 min was chosen as the optimum for squid masala in tin free steel cans while shrimp curry processed to F0 7 min with total process time of 44.0 min and cook value of 91.1 min was found to be ideal and was selected for storage study. Squid masala and shrimp curry thermally processed in indigenous polymer coated TFS cans were found to be acceptable even after one year of storage at room temperaturebased on the analysis of various sensory and biochemical parameters. Analysis of the Commission Internationale d’ Eclirage L*, a* and b* color values showed that the duration of exposure to heat treatment influenced the color parameters: the lightness (L*) and yellowness (b*)decreased, and the redness (a*) significantly increased with the increase in processing time or reduction in processing temperature.Instrumental analysis of texture showed that hardness-1 & 2 decreased with reduction in retort temperature while cohesiveness value did not show any appreciable change with decrease in temperature of processing. Other texture profile parameters like gumminess, springiness and chewiness decreased significantly with increase of processing time. W-B shear force values of mackerel meat processed at 130 °C were significantly higher than those processed at 121.1 and 115 °C. HTST processing of mackerel in brine helped in reducing the process time and improving the quality.The study also indicated that indigenous polymer coated TFS cans with easy openends can be a viable alternative to the conventional tin and aluminium cans. The industry can utilize these cans for processing ready to eat fish and shell fish products for both domestic and export markets. This will help in reviving the canning industry in India.
Resumo:
This thesis presents a detailed account of a cost - effective approach towards enhanced production of alkaline protease at profitable levels using different fermentation designs employing cheap agro-industrial residues. It involves the optimisation of process parameters for the production of a thermostable alkaline protease by Vibrio sp. V26 under solid state, submerged and biphasic fermentations, production of the enzyme using cell immobilisation technology and the application of the crude enzyme on the deproteinisation of crustacean waste.The present investigation suggests an economic move towards Improved production of alkaline protease at gainful altitudes employing different fermentation designs utilising inexpensive agro-industrial residues. Moreover, the use of agro-industrial and other solid waste substrates for fermentation helps to provide a substitute in conserving the already dwindling global energy resources. Another alternative for accomplishing economically feasible production is by the use of immobilisation technique. This method avoids the wasteful expense of continually growing microorganisms. The high protease producing potential of the organism under study ascertains their exploitation in the utilisation and management of wastes. However, strain improvement studies for the production of high yielding variants using mutagens or by gene transfer are required before recommending them to Industries.Industries, all over the world, have made several attempts to exploit the microbial diversity of this planet. For sustainable development, it is essential to discover, develop and defend this natural prosperity. The Industrial development of any country is critically dependent on the intellectual and financial investment in this area. The need of the hour is to harness the beneficial uses of microbes for maximum utilisation of natural resources and technological yields. Owing to the multitude of applications in a variety of industrial sectors, there has always been an increasing demand for novel producers and resources of alkaline proteases as well as for innovative methods of production at a commercial altitude. This investigation forms a humble endeavour towards this perspective and bequeaths hope and inspiration for inventions to follow.
Resumo:
This paper investigates certain methods of training adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English Malayalam SMT, the word to word translation is determined by training the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of the parts of speech tagger has brought around better training results with improved accuracy
Resumo:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set among the sentence pairs of the source and target language before subjecting them for training. This paper deals with certain techniques which can be adopted for improving the alignment model of SMT. Methods to incorporate the parts of speech information into the bilingual corpus has resulted in eliminating many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Presence of Malayalam words with predictable translations has also contributed in reducing the insignificant alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.
Resumo:
The application of nonlinear schemes like dual time stepping as preconditioners in matrix-free Newton-Krylov-solvers is considered and analyzed. We provide a novel formulation of the left preconditioned operator that says it is in fact linear in the matrix-free sense, but changes the Newton scheme. This allows to get some insight in the convergence properties of these schemes which are demonstrated through numerical results.
Resumo:
Even though there have been many studies on the impact of trade liberalisation on labour standards, most of the studies are at national level, and there is a lack of research at industry level. This paper examines the impact of free trade on labour standards in capital- and labour-intensive industries in a developing country. For empirical findings, I take the case of the garment industry, representing labour-intensive industry, and automotive industry, representing capital-intensive industry, in Indonesia in the face of ASEAN Free Trade Area (AFTA). Since the garment industry is a women-dominated industry, while the automotive industry is a men-dominated industry, this paper also employs a feminist perspective. As such, this paper also investigates whether free trade equally affects men and women workers. Besides free trade, other independent variables are also taken into account. Employing quantitative and qualitative methods, empirical evidence shows that there is an indication that free trade has a negative relationship with labour standards in the garment industry, whereas a positive relationships with labour standards in the automotive industry. This implies that free trade might result in decreasing labour standards in labour-intensive industry, while increasing standards in capital-intensive industry. It can also be inferred that free trade unequally affect men and women workers, in that women workers bear the brunt of free trade. The results also show that other internal and external independent variables are indicated to have relationships with labour standards in the garment and automotive industries. Therefore, these variables need to be considered in examining the extent of the impact of free trade on labour standards in labour- and capital-intensive industries.
Resumo:
We consider numerical methods for the compressible time dependent Navier-Stokes equations, discussing the spatial discretization by Finite Volume and Discontinuous Galerkin methods, the time integration by time adaptive implicit Runge-Kutta and Rosenbrock methods and the solution of the appearing nonlinear and linear equations systems by preconditioned Jacobian-Free Newton-Krylov, as well as Multigrid methods. As applications, thermal Fluid structure interaction and other unsteady flow problems are considered. The text is aimed at both mathematicians and engineers.
Resumo:
A new information-theoretic approach is presented for finding the pose of an object in an image. The technique does not require information about the surface properties of the object, besides its shape, and is robust with respect to variations of illumination. In our derivation, few assumptions are made about the nature of the imaging process. As a result the algorithms are quite general and can foreseeably be used in a wide variety of imaging situations. Experiments are presented that demonstrate the approach registering magnetic resonance (MR) images with computed tomography (CT) images, aligning a complex 3D object model to real scenes including clutter and occlusion, tracking a human head in a video sequence and aligning a view-based 2D object model to real images. The method is based on a formulation of the mutual information between the model and the image called EMMA. As applied here the technique is intensity-based, rather than feature-based. It works well in domains where edge or gradient-magnitude based methods have difficulty, yet it is more robust than traditional correlation. Additionally, it has an efficient implementation that is based on stochastic approximation. Finally, we will describe a number of additional real-world applications that can be solved efficiently and reliably using EMMA. EMMA can be used in machine learning to find maximally informative projections of high-dimensional data. EMMA can also be used to detect and correct corruption in magnetic resonance images (MRI).
Resumo:
La optimización y armonización son factores clave para tener un buen desempeño en la industria química. BASF ha desarrollado un proyecto llamada acelerador. El objetivo de este proyecto ha sido la armonización y la integración de los procesos de la cadena de suministro a nivel mundial. El proceso básico de manejo de inventarios se quedó fuera del proyecto y debía ser analizado. El departamento de manejo de inventarios en BASF SE ha estado desarrollando su propia estrategia para la definición de procesos globales de manufactura. En este trabajo se presentará un informe de las fases de la formulación de la estrategia y establecer algunas pautas para la fase de implementación que está teniendo lugar en 2012 y 2013.
Resumo:
In the present paper we discuss and compare two different energy decomposition schemes: Mayer's Hartree-Fock energy decomposition into diatomic and monoatomic contributions [Chem. Phys. Lett. 382, 265 (2003)], and the Ziegler-Rauk dissociation energy decomposition [Inorg. Chem. 18, 1558 (1979)]. The Ziegler-Rauk scheme is based on a separation of a molecule into fragments, while Mayer's scheme can be used in the cases where a fragmentation of the system in clearly separable parts is not possible. In the Mayer scheme, the density of a free atom is deformed to give the one-atom Mulliken density that subsequently interacts to give rise to the diatomic interaction energy. We give a detailed analysis of the diatomic energy contributions in the Mayer scheme and a close look onto the one-atom Mulliken densities. The Mulliken density ρA has a single large maximum around the nuclear position of the atom A, but exhibits slightly negative values in the vicinity of neighboring atoms. The main connecting point between both analysis schemes is the electrostatic energy. Both decomposition schemes utilize the same electrostatic energy expression, but differ in how fragment densities are defined. In the Mayer scheme, the electrostatic component originates from the interaction of the Mulliken densities, while in the Ziegler-Rauk scheme, the undisturbed fragment densities interact. The values of the electrostatic energy resulting from the two schemes differ significantly but typically have the same order of magnitude. Both methods are useful and complementary since Mayer's decomposition focuses on the energy of the finally formed molecule, whereas the Ziegler-Rauk scheme describes the bond formation starting from undeformed fragment densities