3 resultados para Covariance matrix estimation
em National Center for Biotechnology Information - NCBI
Resumo:
Patterns in sequences of amino acid hydrophobic free energies predict secondary structures in proteins. In protein folding, matches in hydrophobic free energy statistical wavelengths appear to contribute to selective aggregation of secondary structures in “hydrophobic zippers.” In a similar setting, the use of Fourier analysis to characterize the dominant statistical wavelengths of peptide ligands’ and receptor proteins’ hydrophobic modes to predict such matches has been limited by the aliasing and end effects of short peptide lengths, as well as the broad-band, mode multiplicity of many of their frequency (power) spectra. In addition, the sequence locations of the matching modes are lost in this transformation. We make new use of three techniques to address these difficulties: (i) eigenfunction construction from the linear decomposition of the lagged covariance matrices of the ligands and receptors as hydrophobic free energy sequences; (ii) maximum entropy, complex poles power spectra, which select the dominant modes of the hydrophobic free energy sequences or their eigenfunctions; and (iii) discrete, best bases, trigonometric wavelet transformations, which confirm the dominant spectral frequencies of the eigenfunctions and locate them as (absolute valued) moduli in the peptide or receptor sequence. The leading eigenfunction of the covariance matrix of a transmembrane receptor sequence locates the same transmembrane segments seen in n-block-averaged hydropathy plots while leaving the remaining hydrophobic modes unsmoothed and available for further analyses as secondary eigenfunctions. In these receptor eigenfunctions, we find a set of statistical wavelength matches between peptide ligands and their G-protein and tyrosine kinase coupled receptors, ranging across examples from 13.10 amino acids in acid fibroblast growth factor to 2.18 residues in corticotropin releasing factor. We find that the wavelet-located receptor modes in the extracellular loops are compatible with studies of receptor chimeric exchanges and point mutations. A nonbinding corticotropin-releasing factor receptor mutant is shown to have lost the signatory mode common to the normal receptor and its ligand. Hydrophobic free energy eigenfunctions and their transformations offer new quantitative physical homologies in database searches for peptide-receptor matches.
Resumo:
As additivity is a very useful property for a distance measure, a general additive distance is proposed under the stationary time-reversible (SR) model of nucleotide substitution or, more generally, under the stationary, time-reversible, and rate variable (SRV) model, which allows rate variation among nucleotide sites. A method for estimating the mean distance and the sampling variance is developed. In addition, a method is developed for estimating the variance-covariance matrix of distances, which is useful for the statistical test of phylogenies and molecular clocks. Computer simulation shows (i) if the sequences are longer than, say, 1000 bp, the SR method is preferable to simpler methods; (ii) the SR method is robust against deviations from time-reversibility; (iii) when the rate varies among sites, the SRV method is much better than the SR method because the distance is seriously underestimated by the SR method; and (iv) our method for estimating the sampling variance is accurate for sequences longer than 500 bp. Finally, a test is constructed for testing whether DNA evolution follows a general Markovian model.
Resumo:
A maximum likelihood estimator based on the coalescent for unequal migration rates and different subpopulation sizes is developed. The method uses a Markov chain Monte Carlo approach to investigate possible genealogies with branch lengths and with migration events. Properties of the new method are shown by using simulated data from a four-population n-island model and a source–sink population model. Our estimation method as coded in migrate is tested against genetree; both programs deliver a very similar likelihood surface. The algorithm converges to the estimates fairly quickly, even when the Markov chain is started from unfavorable parameters. The method was used to estimate gene flow in the Nile valley by using mtDNA data from three human populations.