897 results for DNA Sequence, Hidden Markov Model, Bayesian Model, Sensitivity Analysis, Markov Chain Monte Carlo
Credit risk contributions under the Vasicek one-factor model: a fast wavelet expansion approximation
Abstract:
Measuring the contribution of individual transactions to the total risk of a credit portfolio is a major issue for financial institutions. VaR Contributions (VaRC) and Expected Shortfall Contributions (ESC) have become two popular ways of quantifying these risks. However, the usual Monte Carlo (MC) approach is known to be very time-consuming for computing these risk contributions. In this paper we consider the Wavelet Approximation (WA) method for Value at Risk (VaR) computation presented in [Mas10] in order to calculate the Expected Shortfall (ES) and the risk contributions under the Vasicek one-factor model framework. We decompose the VaR and the ES as a sum of sensitivities representing the marginal impact on the total portfolio risk. Moreover, we present technical improvements to the Wavelet Approximation (WA) that considerably reduce the computational effort of the approximation while at the same time increasing its accuracy.
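For context, the sketch below (Python) illustrates the plain Monte Carlo benchmark that such approximations aim to speed up: simulating defaults under the Vasicek one-factor model, computing portfolio VaR and ES, and allocating them to obligors as VaR and ES contributions. The portfolio parameters, the unit LGD, and the tail window around the VaR level are hypothetical choices made for illustration; the wavelet approximation itself is not reproduced here.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical portfolio: exposures, default probabilities, asset correlation.
n_obligors, n_sims, alpha, rho = 50, 200_000, 0.99, 0.15
exposure = rng.uniform(0.5, 2.0, n_obligors)
pd = rng.uniform(0.005, 0.05, n_obligors)
threshold = norm.ppf(pd)

# Vasicek one-factor model: obligor i defaults when sqrt(rho)*Z + sqrt(1-rho)*eps_i < threshold_i.
Z = rng.standard_normal((n_sims, 1))
eps = rng.standard_normal((n_sims, n_obligors))
defaults = np.sqrt(rho) * Z + np.sqrt(1 - rho) * eps < threshold
losses_i = defaults * exposure          # per-obligor losses (LGD = 1 for simplicity)
loss = losses_i.sum(axis=1)             # total portfolio loss in each scenario

var = np.quantile(loss, alpha)          # Value at Risk at level alpha
tail = loss >= var
es = loss[tail].mean()                  # Expected Shortfall

# ES contributions: expected obligor loss conditional on the tail event (sums to ES).
esc = losses_i[tail].mean(axis=0)
# VaR contributions: expected obligor loss conditional on losses near the VaR level.
near_var = np.abs(loss - var) <= 0.01 * var
varc = losses_i[near_var].mean(axis=0)

print(f"VaR = {var:.3f} (sum of VaRC = {varc.sum():.3f})")
print(f"ES  = {es:.3f} (sum of ESC  = {esc.sum():.3f})")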
Abstract:
In occupational exposure assessment of airborne contaminants, exposure levels can be estimated through repeated measurements of the pollutant concentration in air, through expert judgment, or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model and a non-parametric, neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g., mean exposure or exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
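A minimal Python sketch, under simplifying assumptions, of the level-1 update step: Monte Carlo prior samples of the long-term geometric mean and geometric standard deviation (standing in here for the weighted level-2 model outputs) are reweighted by the lognormal likelihood of new measurements, and an exceedance probability against an occupational exposure limit is read off the posterior. All numerical values, the stand-in priors, and the importance-weighting shortcut are illustrative assumptions, not the paper's implementation.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Stand-in for the level-2 model outputs: Monte Carlo prior samples of the worker's
# long-term geometric mean (GM) and geometric standard deviation (GSD).
n = 100_000
log_gm = rng.normal(np.log(0.5), 0.8, n)              # prior on ln(GM), in mg/m3
sigma = rng.uniform(np.log(1.5), np.log(3.5), n)      # prior on ln(GSD)

# Subsequent measurements of the worker's exposure (hypothetical values, mg/m3).
measurements = np.array([0.8, 1.4, 0.6])

# Bayesian update by importance weighting with the lognormal likelihood of the data.
loglik = sum(norm.logpdf(np.log(x), loc=log_gm, scale=sigma) for x in measurements)
w = np.exp(loglik - loglik.max())
w /= w.sum()

# Posterior summaries and a simple exceedance comparison against an OEL of 2 mg/m3.
oel = 2.0
exceedance = 1.0 - norm.cdf((np.log(oel) - log_gm) / sigma)
post_gm = np.exp(np.average(log_gm, weights=w))
post_exceedance = np.average(exceedance, weights=w)
print(f"posterior GM ~ {post_gm:.2f} mg/m3, P(exposure > OEL) ~ {post_exceedance:.3f}")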
Abstract:
DNA sequence variation has been associated with quantitative changes in molecular phenotypes such as gene expression, but its impact on chromatin states is poorly characterized. To understand the interplay between chromatin and genetic control of gene regulation, we quantified allelic variability in transcription factor binding, histone modifications, and gene expression within humans. We found abundant allelic specificity in chromatin and extensive local, short-range, and long-range allelic coordination among the studied molecular phenotypes. We observed genetic influence on most of these phenotypes, with histone modifications exhibiting strong context-dependent behavior. Our results implicate transcription factors as primary mediators of sequence-specific regulation of gene expression programs, with histone modifications frequently reflecting the primary regulatory event.
Abstract:
The genetic characterization of unbalanced mixed stains remains an important area where improvement is imperative. In fact, with current methods for DNA analysis (Polymerase Chain Reaction with the SGM Plus™ multiplex kit), it is generally not possible to obtain a conventional autosomal DNA profile of the minor contributor if the ratio between the two contributors in a mixture is smaller than 1:10. This is a consequence of the fact that the major contributor's profile 'masks' that of the minor contributor. Besides known remedies to this problem, such as Y-STR analysis, a new compound genetic marker that consists of a Deletion/Insertion Polymorphism (DIP), linked to a Short Tandem Repeat (STR) polymorphism, has recently been developed and proposed elsewhere in the literature [1]. The present paper reports on the derivation of an approach for the probabilistic evaluation of DIP-STR profiling results obtained from unbalanced DNA mixtures. The procedure is based on object-oriented Bayesian networks (OOBNs) and uses the likelihood ratio as an expression of the probative value. OOBNs are retained in this paper because they allow one to provide a clear description of the genotypic configuration observed for the mixed stain as well as for the various potential contributors (e.g., victim and suspect). These models also allow one to depict the assumed relevance relationships and perform the necessary probabilistic computations.
Abstract:
Structural variation is variation in the structure of DNA regions that affects DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variants and their association with human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation for transposable elements. We conclude with future directions for the study of structural variation in mouse genomes that will increase our understanding of the molecular architecture and functional consequences of structural variation.
Abstract:
Purpose: We previously reported a premature termination mutation in SLC16A12 that leads to dominant juvenile cataract and renal glucosuria. To assess the mutation rate and genotype-phenotype correlations of SLC16A12 in juvenile or age-related forms of cataract, we performed a mutation screen in cataract patients. Methods: Clinical data of approximately 660 patients were collected, and genomic DNA was isolated and analyzed. Exons 3 to 8 of SLC16A12, including flanking intron sequences, were PCR amplified and their DNA sequence was determined. Selected mutations were tested by cell culture assays, in silico analysis, and RT-PCR. Results: We found sequence alterations at a rate of approximately 1 in 75 patients. None of them was found in 360 control alleles. The alterations affect splice sites and regulatory regions, but most mutations caused an amino acid substitution. The majority of the coding region mutations map to transmembrane domains. One mutation was located in the 5'UTR and affects the translational efficiency of SLC16A12. In addition, we identified a cataract-predisposing SNP in the non-coding region that causes allele-specific splicing of the 5'UTR region. Conclusions: Altered translational efficiency of the solute carrier SLC16A12 and its allele-specific splicing strongly support a model in which challenged homeostasis causes various forms of cataract. In addition, the pathogenic nature of the sequence alterations reported here is supported by the lack of known sequence variations within the coding region of SLC16A12. Given the relatively high mutation rate, we suggest including SLC16A12 in diagnostic cataract screening. More generally, our data recommend the assessment of regulatory sequences for diagnostic purposes.
Abstract:
Little is known about the relation between genome organization and gene expression in Leishmania. Bioinformatic analysis can be used to predict genes and find homologies with known proteins. A model was proposed in which genes are organized into large clusters and transcribed from only one strand, in the form of large polycistronic primary transcripts. To verify the validity of this model, we studied gene expression at the transcriptional, post-transcriptional, and translational levels in a unique 34 kb locus located on chr27 and represented by cosmid L979. Sequence analysis revealed 115 ORFs on either DNA strand. Using computer programs developed for Leishmania genes, only nine of these ORFs, all localized on the same strand, were predicted to code for proteins, some of which show homologies with known proteins. Additionally, one pseudogene was identified. We verified the biological relevance of these predictions: mRNAs from nine predicted genes and proteins from seven were detected. Nuclear run-on analyses confirmed that the top strand is transcribed by RNA polymerase II and suggested that there is no polymerase entry site. Low levels of transcription were detected in regions of the bottom strand, and stable transcripts were identified for four ORFs on this strand that were not predicted to be protein-coding. In conclusion, the transcriptional organization of the Leishmania genome is complex, raising the possibility that computer predictions may not be comprehensive.
Abstract:
In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest-scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest-scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. The quadratic algorithms rely on the fact that, while scanning the set of predicted exons, the highest-scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest-scoring genes ending at each compatible preceding exon. The algorithm presented here relies on the simple fact that this highest-scoring preceding gene can be stored and updated, which requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Instead, the definition of valid gene structures is externally specified in the so-called Gene Model, which simply states which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem; in particular, it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
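As an illustration of the scan described above, the Python sketch below assembles the highest-scoring chain of nonoverlapping exons by walking through the exons ordered by acceptor and by donor position while maintaining a running best score. Frame compatibility, strands, and the external Gene Model are omitted for brevity, and the exon coordinates and scores are made-up examples; the sorting step is included only to keep the snippet self-contained, the scan itself being linear.

from dataclasses import dataclass

@dataclass
class Exon:
    acceptor: int   # start position in the query sequence
    donor: int      # end position in the query sequence
    score: float

def best_gene_score(exons):
    by_acceptor = sorted(exons, key=lambda e: e.acceptor)
    by_donor = sorted(exons, key=lambda e: e.donor)
    best_ending = {}        # best chain score ending at a given exon (keyed by object id)
    running_best = 0.0      # best chain score among exons already passed by the donor pointer
    j = 0
    for exon in by_acceptor:
        # Close every exon whose donor lies upstream of this acceptor; after sorting,
        # each exon is closed exactly once, so the two pointers advance linearly.
        while j < len(by_donor) and by_donor[j].donor < exon.acceptor:
            running_best = max(running_best, best_ending[id(by_donor[j])])
            j += 1
        best_ending[id(exon)] = running_best + exon.score
    return max(best_ending.values(), default=0.0)

exons = [Exon(10, 80, 3.0), Exon(60, 140, 4.0), Exon(90, 150, 2.5), Exon(160, 200, 1.5)]
print(best_gene_score(exons))   # 7.0: the chain of exons at 10-80, 90-150, 160-200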
Abstract:
PURPOSE: In the radiopharmaceutical therapy approach to the fight against cancer, in particular when it comes to translating laboratory results to the clinical setting, modeling has served as an invaluable tool for guidance and for understanding the processes operating at the cellular level and how these relate to macroscopic observables. Tumor control probability (TCP) is the dosimetric end point quantity of choice which relates to experimental and clinical data: it requires knowledge of individual cellular absorbed doses, since it depends on the assessment of the treatment's ability to kill each and every cell. Macroscopic tumors, seen in both clinical and experimental studies, contain too many cells to be modeled individually in Monte Carlo simulation; yet, in particular for low ratios of decays to cells, a cell-based model that does not smooth away the statistical considerations associated with low activity is a necessity. The authors present here an adaptation of the simple sphere-based model from which cellular-level dosimetry for macroscopic tumors, and end point quantities such as TCP, may be extrapolated more reliably. METHODS: Ten homogeneous spheres representing tumors of different sizes were constructed in GEANT4. The radionuclide 131I was allowed to decay at random positions for each model size and for seven different ratios of the number of decays to the number of cells, N_r: 1000, 500, 200, 100, 50, 20, and 10 decays per cell. The deposited energy was collected in radial bins and divided by the bin mass to obtain the average bin absorbed dose. To simulate a cellular model, the number of cells present in each bin was calculated and an absorbed dose was attributed to each cell equal to the bin average absorbed dose with a randomly determined adjustment based on a Gaussian probability distribution whose width equals the statistical uncertainty consistent with the ratio of decays to cells, i.e., N_r^(-1/2). From dose volume histograms, the surviving fraction of cells, the equivalent uniform dose (EUD), and the TCP for the different scenarios were calculated. Comparably sized spherical models containing individual spherical cells (15 μm diameter) in hexagonal lattices were constructed, and Monte Carlo simulations were executed for all the same scenarios. The dosimetric quantities were calculated and compared to the adjusted simple sphere model results. The model was then applied to the Bortezomib-induced enzyme-targeted radiotherapy (BETR) strategy of targeting Epstein-Barr virus (EBV)-expressing cancers. RESULTS: The TCP values were comparable to within 2% between the adjusted simple sphere and full cellular models. Additionally, models were generated for a nonuniform distribution of activity, and results were compared between the adjusted spherical and cellular models with similar comparability. The TCP values extrapolated to macroscopic tumors were consistent with the experimental observations for BETR-treated 1 g EBV-expressing lymphoma tumors in mice. CONCLUSIONS: The adjusted spherical model presented here provides more accurate TCP values than simple spheres, on par with full cellular Monte Carlo simulations, while maintaining the simplicity of the simple sphere model. This model provides a basis for complementing and understanding laboratory and clinical results pertaining to radiopharmaceutical therapy.
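A minimal Python sketch of the per-cell dose adjustment and the resulting Poisson TCP under simplifying assumptions: each cell in a radial bin receives the bin-average dose perturbed by a Gaussian of relative width N_r^(-1/2), survival follows a single-hit exponential with a hypothetical radiosensitivity, and TCP = exp(-expected number of surviving cells). The bin doses, cell counts, and radiosensitivity below are placeholder values, not the paper's GEANT4 results.

import numpy as np

rng = np.random.default_rng(2)

alpha = 0.4                                    # Gy^-1, hypothetical radiosensitivity
decays_per_cell = 10                           # N_r
bin_dose = np.array([50.0, 45.0, 40.0, 32.0])  # Gy, average absorbed dose per radial bin
cells_per_bin = np.array([1_000, 8_000, 27_000, 64_000])

expected_survivors = 0.0
for d, n_cells in zip(bin_dose, cells_per_bin):
    # Per-cell dose = bin average plus a Gaussian adjustment of relative width N_r^(-1/2).
    cell_dose = d * (1.0 + rng.normal(0.0, decays_per_cell ** -0.5, n_cells))
    cell_dose = np.clip(cell_dose, 0.0, None)
    expected_survivors += np.exp(-alpha * cell_dose).sum()   # single-hit survival per cell

tcp = np.exp(-expected_survivors)              # Poisson tumor control probability
print(f"expected surviving cells = {expected_survivors:.2f}, TCP = {tcp:.3f}")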
Abstract:
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
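A minimal Python sketch of the maximal-discrepancy penalty for a toy class of axis-aligned decision stumps, using the label-flipping identity mentioned in the abstract: flipping the labels of one half of the training set and running empirical risk minimization on the modified data yields the penalty as 1 - 2 x (minimal flipped-data error). The synthetic data, the stump class, and the equal-halves split are illustrative choices, not the paper's setting.

import numpy as np

rng = np.random.default_rng(3)

def stump_predictions(X):
    """Predictions of every axis-aligned stump sign(s * (x_j - t)) on X."""
    preds = []
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                preds.append(np.where(s * (X[:, j] - t) > 0, 1, -1))
    return np.array(preds)                 # shape (n_hypotheses, n_samples)

def erm_error(preds, y):
    """Smallest empirical 0-1 error over the enumerated hypothesis class."""
    return (preds != y).mean(axis=1).min()

# Synthetic binary classification data with labels in {-1, +1}.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1, -1)
preds = stump_predictions(X)

# Flip the labels of the first half; ERM on the flipped data yields the penalty.
y_flipped = y.copy()
y_flipped[: len(y) // 2] *= -1
max_discrepancy = 1.0 - 2.0 * erm_error(preds, y_flipped)

train_error = erm_error(preds, y)
print(f"ERM training error = {train_error:.3f}, maximal discrepancy penalty = {max_discrepancy:.3f}")
print(f"penalized error estimate = {train_error + max_discrepancy:.3f}")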
Abstract:
We report a Spanish family with autosomal-dominant non-neuropathic hereditary amyloidosis with a unique hepatic presentation and death from liver failure, usually by the sixth decade. The disease is caused by a previously unreported deletion/insertion mutation in exon 4 of the apolipoprotein AI (apoAI) gene encoding loss of residues 60-71 of normal mature apoAI and insertion at that position of two new residues, ValThr. Affected individuals are heterozygous for this mutation and have both normal apoAI and variant molecules bearing one extra positive charge, as predicted from the DNA sequence. The amyloid fibrils are composed exclusively of NH2-terminal fragments of the variant, ending mainly at positions corresponding to residues 83 and 92 in the mature wild-type sequence. Amyloid fibrils derived from the other three known amyloidogenic apoAI variants are also composed of similar NH2-terminal fragments. All known amyloidogenic apoAI variants carry one extra positive charge in this region, suggesting that it may be responsible for their enhanced amyloidogenicity. In addition to causing a new phenotype, this is the first deletion mutation to be described in association with hereditary amyloidosis and it significantly extends the value of the apoAI model for investigation of molecular mechanisms of amyloid fibrillogenesis.
Abstract:
In a recent paper [Phys. Rev. B 50, 3477 (1994)], P. Fratzl and O. Penrose present the results of the Monte Carlo simulation of the spinodal decomposition problem (phase separation) using the vacancy dynamics mechanism. They observe that the t^(1/3) growth regime is reached faster than when using the standard Kawasaki dynamics. In this Comment we provide a simple explanation for the phenomenon based on the role of interface diffusion, which they claim is irrelevant for the observed behavior.
Abstract:
Gel electrophoresis can be used to separate nicked circular DNA molecules of equal length but forming different knot types. At low electric fields, complex knots drift faster than simpler knots. However, at high electric fields the opposite is the case and simpler knots migrate faster than more complex knots. Using Monte Carlo simulations, we investigate the reasons for this reversal of the relative order of electrophoretic mobility of DNA molecules forming different knot types. We observe that at high electric fields the simulated knotted molecules tend to hang over the gel fibres and must pass over a substantial energy barrier to slip over the impeding gel fibre. At low electric fields, the interactions of drifting molecules with the gel fibres are weak and there are no significant energy barriers that oppose the detachment of knotted molecules from transverse gel fibres.
Abstract:
A new arena for the dynamics of spacetime is proposed, in which the basic quantum variable is the two-point distance on a metric space. The scaling dimension (that is, the Kolmogorov capacity) in the neighborhood of each point then defines in a natural way a local concept of dimension. We study our model in the region of parameter space in which the resulting spacetime is not too different from a smooth manifold.
Abstract:
We have analyzed a two-dimensional lattice-gas model of cylindrical molecules which can exhibit four possible orientations. The Hamiltonian of the model contains positional and orientational energy interaction terms. The ground state of the model has been investigated on the basis of Karl's theorem. Monte Carlo simulation results have confirmed the predicted ground state. The model is able to reproduce, with appropriate values of the Hamiltonian parameters, both a smectic-nematic-like transition and a nematic-isotropic-like transition. We have also analyzed the phase diagram of the system by mean-field techniques and Monte Carlo simulations. Mean-field calculations agree qualitatively well with Monte Carlo results but overestimate transition temperatures.
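A minimal Metropolis Monte Carlo sketch, in Python, of a two-dimensional lattice gas with four molecular orientations. The specific Hamiltonian used here, an isotropic nearest-neighbour attraction eps plus an extra coupling J between like-oriented neighbours, is a hypothetical stand-in for the paper's positional and orientational interaction terms, and all parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(4)

L, T = 24, 1.0                 # lattice size and temperature (k_B = 1)
eps, J = 1.0, 0.5              # hypothetical positional and orientational couplings
density, n_steps = 0.5, 200 * L * L
occ = (rng.random((L, L)) < density).astype(int)   # occupation numbers
ori = rng.integers(0, 4, (L, L))                   # four possible orientations

def site_energy(i, j):
    """Interaction energy of the molecule at (i, j) with its four neighbours."""
    e = 0.0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = (i + di) % L, (j + dj) % L
        if occ[i, j] and occ[ni, nj]:
            e -= eps + J * (ori[i, j] == ori[ni, nj])
    return e

for _ in range(n_steps):
    i, j = rng.integers(0, L, 2)
    if not occ[i, j]:
        continue
    e_old = site_energy(i, j)
    if rng.random() < 0.5:                         # trial move: rotate the molecule
        old_ori, ori[i, j] = ori[i, j], rng.integers(0, 4)
        if rng.random() >= np.exp(-(site_energy(i, j) - e_old) / T):
            ori[i, j] = old_ori                    # Metropolis rejection
    else:                                          # trial move: hop to a neighbouring empty site
        di, dj = ((1, 0), (-1, 0), (0, 1), (0, -1))[rng.integers(0, 4)]
        ni, nj = (i + di) % L, (j + dj) % L
        if occ[ni, nj]:
            continue
        occ[i, j], occ[ni, nj] = 0, 1
        ori[ni, nj] = ori[i, j]
        if rng.random() >= np.exp(-(site_energy(ni, nj) - e_old) / T):
            occ[i, j], occ[ni, nj] = 1, 0          # Metropolis rejection: undo the hop

# Crude orientational order measure: fraction of occupied neighbour pairs with equal orientation.
pairs = same = 0
for di, dj in ((1, 0), (0, 1)):
    bond = (occ == 1) & (np.roll(occ, (-di, -dj), (0, 1)) == 1)
    pairs += bond.sum()
    same += (bond & (ori == np.roll(ori, (-di, -dj), (0, 1)))).sum()
print(f"coverage = {occ.mean():.2f}, like-oriented bond fraction = {same / max(pairs, 1):.2f}")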