997 results for automated modeling
Abstract:
The VLT-FLAMES Tarantula Survey (VFTS) has secured mid-resolution spectra of over 300 O-type stars in the 30 Doradus region of the Large Magellanic Cloud. A homogeneous analysis of such a large sample requires automated techniques, an approach that will also be needed for the upcoming analysis of the Gaia surveys of the Northern and Southern Hemisphere supplementing the Gaia measurements. We point out the importance of Gaia for the study of O stars, summarize the O star science case of VFTS, and present a test of the automated modeling technique using synthetically generated data. This method employs a genetic-algorithm-based optimization technique in combination with FASTWIND model atmospheres. The method is found to be robust and able to recover the main photospheric parameters accurately. Precise wind parameters can be obtained as well; however, as expected, for dwarf stars the rate of acceleration of the flow is poorly constrained.
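The test on synthetic data described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Gaussian line profile is a toy stand-in for a real model atmosphere (it is not FASTWIND), and the parameter names, bounds, and genetic operators are hypothetical choices rather than those used in the study.

```python
import math
import random

# Hypothetical search ranges for the two toy parameters (depth, width).
BOUNDS = [(0.1, 0.9), (0.5, 3.0)]

def forward_model(params, xs):
    """Toy stand-in for a model atmosphere: a Gaussian absorption line."""
    depth, width = params
    return [1.0 - depth * math.exp(-(x / width) ** 2) for x in xs]

def chi2(params, xs, observed):
    """Sum of squared residuals between the model and the observed profile."""
    return sum((m - o) ** 2 for m, o in zip(forward_model(params, xs), observed))

def fit_genetic(xs, observed, pop_size=40, generations=60, seed=0):
    """Minimise chi2 with a simple genetic algorithm (elitism + blend crossover)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best chi2 per generation
    for _ in range(generations):
        pop.sort(key=lambda p: chi2(p, xs, observed))
        history.append(chi2(pop[0], xs, observed))
        parents = pop[: pop_size // 2]   # truncation selection
        children = [pop[0][:]]           # elitism: carry the best individual over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = []
            for (lo, hi), ga, gb in zip(BOUNDS, a, b):
                w = rng.random()
                gene = w * ga + (1.0 - w) * gb            # blend crossover
                gene += rng.gauss(0.0, 0.05 * (hi - lo))  # Gaussian mutation
                child.append(min(hi, max(lo, gene)))      # clip to the bounds
            children.append(child)
        pop = children
    best = min(pop, key=lambda p: chi2(p, xs, observed))
    history.append(chi2(best, xs, observed))
    return best, history

# Synthetic "observation" generated from known parameters, then refitted.
xs = [i * 0.25 - 3.0 for i in range(25)]
observed = forward_model((0.6, 1.5), xs)
best, history = fit_genetic(xs, observed)
```

Because the best individual is always carried over unchanged, the best chi-square value cannot get worse from one generation to the next. Comparing `best` with the input parameters `(0.6, 1.5)` is the same kind of recovery test the abstract describes, on a far simpler model.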
Dinoflagellate Genomic Organization and Phylogenetic Marker Discovery Utilizing Deep Sequencing Data
Abstract:
Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short read sequencing methods, and develops new techniques that utilize deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles, and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates, using this set of markers, produced a phylogeny with good support for all branches. The Suessiales were found to be sister to the Peridiniales. The Prorocentrales formed a monophyletic group with the Dinophysiales that was sister to the Gonyaulacales.
The Gymnodiniales were found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.
Abstract:
In the context of the investigation of the use of automated fingerprint identification systems (AFIS) for the evaluation of fingerprint evidence, the current study presents investigations into the variability of scores from an AFIS when fingermarks from a known donor are compared to fingerprints that are not from the same source. The ultimate goal is to propose a model, based on likelihood ratios (LRs), which allows the evaluation of mark-to-print comparisons. In particular, this model, through its use of AFIS technology, benefits from the possibility of using a large amount of data, as well as from an already built-in proximity measure, the AFIS score. More precisely, the numerator of the LR is obtained from scores issued from comparisons between impressions from the same source and showing the same minutia configuration. The denominator of the LR is obtained by extracting scores from comparisons of the questioned mark with a database of non-matching sources. This paper focuses solely on the assignment of the denominator of the LR. We refer to it by the generic term of between-finger variability. The issues addressed in this paper in relation to between-finger variability are the required sample size, the influence of the finger number and general pattern, as well as that of the number of minutiae included and their configuration on a given finger. Results show that reliable estimation of between-finger variability is feasible with 10,000 scores. These scores should come from the appropriate finger number/general pattern combination as defined by the mark. Furthermore, strategies for obtaining between-finger variability when these elements cannot be conclusively seen on the mark (and its position with respect to other marks for finger number) are presented. These results immediately allow case-by-case estimation of the between-finger variability in an operational setting.
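The score-based likelihood ratio described above can be sketched as a ratio of two density estimates evaluated at the observed score. The Gaussian kernel density estimator, the bandwidth, and all score values below are assumptions made for illustration; the abstract does not say how the densities were modeled.

```python
import math

def kde(scores, bandwidth):
    """Return a Gaussian kernel density estimate built from observed scores."""
    norm = 1.0 / (len(scores) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)

    return density

def likelihood_ratio(score, within_scores, between_scores, bandwidth=10.0):
    """LR = density of the score under same-source / under different-source."""
    numerator = kde(within_scores, bandwidth)(score)      # within-finger variability
    denominator = kde(between_scores, bandwidth)(score)   # between-finger variability
    return numerator / denominator

# Hypothetical AFIS scores: same-source comparisons score high,
# comparisons against non-matching database prints score low.
within = [90.0, 95.0, 100.0, 105.0, 110.0]
between = [10.0, 15.0, 20.0, 25.0, 30.0]

lr_high = likelihood_ratio(100.0, within, between)  # supports same source (LR > 1)
lr_low = likelihood_ratio(20.0, within, between)    # supports different sources (LR < 1)
```

In an operational setting the `between` list would hold the roughly 10,000 database scores the abstract reports as sufficient, not the five toy values used here.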
Abstract:
The importance of efficient supply chain management has increased due to globalization and the blurring of organizational boundaries. Various supply chain management technologies have been identified to drive organizational profitability and financial performance. Organizations have historically concentrated heavily on the flow of goods and services, while less attention has been dedicated to the flow of money. While supply chains are becoming more transparent and automated, new opportunities for financial supply chain management have emerged through information technology solutions and comprehensive financial supply chain management strategies. This research concentrates on the end part of the purchasing process, which is the handling of invoices. Efficient invoice processing can have an impact on an organization's working capital management and thus provide companies with better readiness to face the challenges related to cash management. Using a process mining solution, this research examined the automated invoice handling process of four different organizations. The invoice data was collected from each organization's invoice processing system. The sample included all the invoices the organizations had processed during the year 2012. The main objective was to find out whether e-invoices are faster to process in an automated invoice processing solution than scanned invoices (post entry into the invoice processing solution). Other objectives included looking into the longest lead times between process steps and the impact of manual process steps on cycle time. Processing of invoices from maverick purchases was also examined. Based on the results of the research and previous literature on the subject, suggestions for improving the process were proposed. The results of the research indicate that scanned invoices were processed faster than e-invoices. This is mostly due to the more complex processing of e-invoices.
It should be noted, however, that the manual tasks related to turning a paper invoice into electronic format through scanning are ignored in this research. The transitions with the longest lead times in the invoice handling process included both pre-automated steps and manual steps performed by humans. When the most common manual steps were examined in more detail, it was clear that these steps prolonged the process. Regarding invoices from maverick purchases, the evidence shows that these invoices were slower to process than invoices from purchases conducted through e-procurement systems and from preferred suppliers. Suggestions on how to improve the process included increasing invoice matching, reducing manual steps, and leveraging value-added services such as invoice validation, mobile solutions, and supply chain financing. For companies that have already reaped all the process efficiencies, the next step is to engage in collaborative financial supply chain management strategies that can benefit the whole supply chain.
Abstract:
Parmodel is a web server for automated comparative modeling and evaluation of protein structures. The aim of this tool is to help inexperienced users to perform modeling, assessment, visualization, and optimization of protein models, as well as crystallographers to evaluate structures solved experimentally. It is subdivided into four modules: Parmodel Modeling, Parmodel Assessment, Parmodel Visualization, and Parmodel Optimization. The main module is Parmodel Modeling, which allows several models to be built for the same protein in a reduced time by distributing the modeling processes on a Beowulf cluster. Parmodel automates and integrates the main software packages used in comparative modeling, such as MODELLER, Whatcheck, Procheck, Raster3D, Molscript, and Gromacs. This web server is freely accessible at http://www.biocristalografia.df.ibilce.unesp.br/tools/parmodel. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
PhD Thesis in Bioengineering
Abstract:
Among the largest resources for biological sequence data is the large amount of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting these while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
Abstract:
In the field of fingerprints, the rise of computer tools has made it possible to create powerful automated search algorithms. These algorithms allow, inter alia, the comparison of a fingermark with a fingerprint database and therefore the establishment of a link between the mark and a known source. With the growth of the capacities of these systems and of data storage, as well as increasing collaboration between police services at the international level, the size of these databases increases. The current challenge for the field of fingerprint identification consists of the growth of these databases, which makes it possible to find impressions that are very similar but come from distinct fingers. At the same time, however, these data and these systems allow a description of the variability between different impressions from the same finger and between impressions from different fingers. This statistical description of the within- and between-finger variabilities, computed on the basis of minutiae and their relative positions, can then be utilized in a statistical approach to interpretation. The computation of a likelihood ratio, employing simultaneously the comparison between the mark and the print of the case, the within-variability of the suspect's finger and the between-variability of the mark with respect to a database, can then be based on representative data. Thus, these data allow an evaluation which may be more detailed than that obtained by the application of rules established long before the advent of these large databases or by the specialist's experience. The goal of the present thesis is to evaluate likelihood ratios computed from the scores of an automated fingerprint identification system (AFIS) when the source of the tested and compared marks is known. These ratios must support the hypothesis that is known to be true. Moreover, they should support this hypothesis more and more strongly with the addition of information in the form of additional minutiae.
For the modeling of within- and between-variability, the necessary data were defined and acquired for one finger of a first donor and two fingers of a second donor. The database used for between-variability includes approximately 600000 inked prints. The minimal number of observations necessary for a robust estimation was determined for the two distributions used. Factors which influence these distributions were also analyzed: the number of minutiae included in the configuration and the configuration as such for both distributions, as well as the finger number and the general pattern for between-variability, and the orientation of the minutiae for within-variability. In the present study, the only factor for which no influence has been shown is the orientation of minutiae. The results show that the likelihood ratios resulting from the use of the scores of an AFIS can be used for evaluation. Relatively low rates of likelihood ratios supporting the hypothesis known to be false have been obtained. The maximum rate of likelihood ratios supporting the hypothesis that the two impressions were left by the same finger when the impressions came from different fingers is 5.2%, for a configuration of 6 minutiae. When a 7th and then an 8th minutia are added, this rate falls to 3.2%, then to 0.8%. In parallel, for these same configurations, the likelihood ratios obtained are on average of the order of 100, 1000, and 10000 for 6, 7, and 8 minutiae when the two impressions come from the same finger. These likelihood ratios can therefore be an important aid for decision making. Both positive evolutions linked to the addition of minutiae (a drop in the rates of likelihood ratios which can lead to an erroneous decision and an increase in the value of the likelihood ratio) were observed in a systematic way within the framework of the study.
Approximations based on 3 scores for within-variability and on 10 scores for between-variability were found, and showed satisfactory results.
Abstract:
Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (such as predicting motif occurrence in sequences), and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. Useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
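The decoding task mentioned above, predicting which positions of a sequence belong to a motif, is commonly done with the Viterbi algorithm. The sketch below is a generic illustration and not MAMOT's interface or file format; the two-state model and all of its probabilities are hypothetical.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs and that path's probability."""
    # best[t][s]: probability of the best path that ends in state s at position t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical two-state model: background sequence versus a G-rich motif.
states = ["background", "motif"]
start_p = {"background": 0.6, "motif": 0.4}
trans_p = {
    "background": {"background": 0.7, "motif": 0.3},
    "motif": {"background": 0.4, "motif": 0.6},
}
emit_p = {
    "background": {"A": 0.5, "C": 0.4, "G": 0.1},
    "motif": {"A": 0.1, "C": 0.3, "G": 0.6},
}

path, prob = viterbi("ACG", states, start_p, trans_p, emit_p)
# path is ["background", "background", "motif"]; prob is about 0.01512
```

For sequences of realistic length the products underflow, so a practical implementation would work with log probabilities; plain products are kept here to make the arithmetic easy to follow.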
Abstract:
A method for determining soil hydraulic properties of a weathered tropical soil (Oxisol) using a medium-sized column with undisturbed soil is presented. The method was used to determine fitting parameters of the water retention curve and hydraulic conductivity functions of a soil column in support of a pesticide leaching study. The soil column was extracted from a continuously-used research plot in Central Oahu (Hawaii, USA) and its internal structure was examined by computed tomography. The experiment was based on tension infiltration into the soil column with free outflow at the lower end. Water flow through the soil core was mathematically modeled using a computer code that numerically solves the one-dimensional Richards equation. Measured soil hydraulic parameters were used for direct simulation, and the retention and soil hydraulic parameters were estimated by inverse modeling. The inverse modeling produced very good agreement between model outputs and measured flux and pressure head data for the relatively homogeneous column. The moisture content at a given pressure from the retention curve measured directly in small soil samples was lower than that obtained through parameter optimization based on experiments using a medium-sized undisturbed soil column.
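For reference, the one-dimensional Richards equation mentioned above is commonly written in its mixed form as shown below. The sign convention (vertical coordinate $z$ positive upward) is an assumption here, since the abstract does not state which form the computer code solves.

```latex
\frac{\partial \theta(h)}{\partial t}
  = \frac{\partial}{\partial z}
    \left[ K(h) \left( \frac{\partial h}{\partial z} + 1 \right) \right]
```

Here $\theta$ is the volumetric water content, $h$ the pressure head, $t$ time, and $K(h)$ the unsaturated hydraulic conductivity. Inverse modeling adjusts the parameters of $\theta(h)$ and $K(h)$ until the simulated flux and pressure heads match the measured ones.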
Abstract:
TCRep 3D is an automated systematic approach for TCR-peptide-MHC class I structure prediction, based on homology and ab initio modeling. It has been considerably generalized from former studies to be applicable to large repertoires of TCR. First, the locations of the complementarity-determining regions (CDRs) of the target sequences are automatically identified by a sequence alignment strategy against a database of TCR Vα and Vβ chains. A structure-based alignment ensures automated identification of CDR3 loops. The CDRs are then modeled in the environment of the complex, in an ab initio approach based on a simulated annealing protocol. During this step, dihedral restraints are applied to drive the CDR1 and CDR2 loops towards their canonical conformations, described by Al-Lazikani et al. We developed a new automated algorithm that determines additional restraints to iteratively converge towards TCR conformations making frequent hydrogen bonds with the pMHC. We demonstrated that our approach outperforms popular scoring methods (Anolea, Dope and Modeller) in predicting relevant CDR conformations. Finally, this modeling approach has been successfully applied to experimentally determined sequences of TCR that recognize the NY-ESO-1 cancer testis antigen. This analysis revealed a mechanism of selection of TCR through the presence of a single conserved amino acid in all CDR3β sequences. The important structural modifications predicted in silico and the associated dramatic loss of experimental binding affinity upon mutation of this amino acid show the good correspondence between the predicted structures and their biological activities. To our knowledge, this is the first systematic approach that was developed for large TCR repertoire structural modeling.
Abstract:
The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) and serial analysis of gene expression (SAGE) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
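The quality-controlled extraction step relies on Phred scores, which encode the estimated base-calling error probability as Q = -10 log10(P). A minimal filtering sketch follows; the threshold of 20, the rule that every base must pass, and the read data are assumptions made for illustration, not details taken from the study.

```python
def phred_to_error(q):
    """Convert a Phred quality score into an estimated error probability."""
    return 10.0 ** (-q / 10.0)

def extract_high_quality(reads, min_quality=20):
    """Keep sequences whose every base call meets the minimum Phred score."""
    return [seq for seq, quals in reads if min(quals) >= min_quality]

# Hypothetical reads as (sequence, per-base Phred scores) pairs.
reads = [
    ("TTGGCA", [35, 38, 40, 37, 36, 30]),
    ("TTGGCT", [35, 12, 40, 37, 36, 30]),  # one low-confidence base call
]
kept = extract_high_quality(reads)  # only the first read passes
```

A Phred score of 20 corresponds to an error probability of 1 in 100, and a score of 30 to 1 in 1000.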
Abstract:
High-energy charged particles in the van Allen radiation belts and in solar energetic particle events can damage satellites in orbit, leading to malfunctions and loss of satellite service. Here we describe some recent results from the SPACECAST project on modelling and forecasting the radiation belts, and modelling solar energetic particle events. We describe the SPACECAST forecasting system, which uses physical models that include wave-particle interactions to forecast the electron radiation belts up to 3 h ahead. We show that the forecasts were able to reproduce the >2 MeV electron flux at GOES 13 during the moderate storm of 7-8 October 2012, and the period following a fast solar wind stream on 25-26 October 2012, to within a factor of 5 or so. At lower energies of 10 to a few 100 keV we show that the electron flux at geostationary orbit depends sensitively on the high-energy tail of the source distribution near 10 RE on the nightside of the Earth, and that the source is best represented by a kappa distribution. We present a new model of whistler mode chorus determined from multiple satellite measurements which shows that the effects of wave-particle interactions beyond geostationary orbit are likely to be very significant. We also present radial diffusion coefficients calculated from satellite data at geostationary orbit which vary with Kp by over four orders of magnitude. We describe a new automated method to determine the position at the shock that is magnetically connected to the Earth for modelling solar energetic particle events and which takes into account entropy, and predict the form of the mean free path in the foreshock, and particle injection efficiency at the shock from analytical theory which can be tested in simulations.
Abstract:
This study aimed to verify the differences in radiation intensity as a function of distinct relief exposure surfaces and to quantify these effects on the leaf area index (LAI) and other variables expressing eucalyptus forest productivity for simulations in a process-based growth model. The study was carried out at two contrasting edaphoclimatic locations in the Rio Doce basin in Minas Gerais, Brazil. Two stands with 32-year-old plantations were used, allocating fixed plots in locations with northern and southern exposure surfaces. The meteorological data were obtained from two automated weather stations located near the study sites. Solar radiation was corrected for terrain inclination and exposure surface, since it is measured on a horizontal plane, perpendicular to the local vertical. The LAI values collected in the field were used. For the comparative simulations of productivity variation, the mechanistic 3PG model was used, considering the relief exposure surfaces. It was verified that during most of the year the southern surfaces showed lower availability of incident solar radiation, resulting in losses of up to 66% compared to the same surface considered as a plane, probably related to their geographical location and steeper slope. Higher values were obtained for the plantings located on the northern surface for the variables LAI, volume, and mean annual wood increment, with this tendency being repeated in the 3PG model simulations.
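The correction of horizontally measured radiation for slope and exposure can be sketched with the standard incidence-angle geometry for a tilted surface. This is an illustration under stated assumptions: it treats only the direct-beam component, measures azimuths clockwise from north, and uses hypothetical input values; the study's actual correction procedure is not given in the abstract.

```python
import math

def cos_incidence(zenith_deg, sun_azimuth_deg, slope_deg, aspect_deg):
    """Cosine of the angle between the sun's rays and the normal to a tilted surface."""
    z = math.radians(zenith_deg)
    beta = math.radians(slope_deg)
    delta_az = math.radians(sun_azimuth_deg - aspect_deg)
    return (math.cos(beta) * math.cos(z)
            + math.sin(beta) * math.sin(z) * math.cos(delta_az))

def beam_on_slope(beam_horizontal, zenith_deg, sun_azimuth_deg, slope_deg, aspect_deg):
    """Rescale direct-beam radiation measured on a horizontal plane to a slope.

    Valid only while the sun is above the horizon (zenith angle below 90 degrees).
    """
    cos_i = max(0.0, cos_incidence(zenith_deg, sun_azimuth_deg, slope_deg, aspect_deg))
    return beam_horizontal * cos_i / math.cos(math.radians(zenith_deg))

# Hypothetical case in the Southern Hemisphere: sun due north at a 40-degree zenith angle.
north_facing = beam_on_slope(500.0, 40.0, 0.0, 20.0, 0.0)    # slope faces the sun
south_facing = beam_on_slope(500.0, 40.0, 0.0, 20.0, 180.0)  # slope faces away
```

With the sun to the north, the north-facing slope receives more beam radiation than the horizontal plane and the south-facing slope receives less, which matches the direction of the effect reported for the southern exposure surfaces.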