2 resultados para hierarchical clustering techniques

em Illinois Digital Environment for Access to Learning and Scholarship Repository


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Microsecond long Molecular Dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all atom MD simulations of protein folding are now becoming popular. To distill features of such large folding trajectories, we must develop methods that can both compress trajectory data to enable visualization, and that can yield themselves to further analysis, such as the finding of collective coordinates and reduction of the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering used in MD trajectory analysis suffers from various serious drawbacks, namely, (i) it is not data driven, (ii) it is unstable to noise and change in cutoff parameters, and (iii) since it does not take into account interrelationships amongst data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques may be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside, while still preserving their salient features. PCA has often been used for dimension reduction and while it is computationally inexpensive, being a linear method, it does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear, and also provides a clear insight into the structural processes underlying MD trajectories. I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding and six norleucine mutant (NLE) folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I make comparisons between nMDS, PCA and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features like early formation of secondary structure, no commonalities between trajectories were found. There were no units of residues or atoms found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3 was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in all trajectories, indicating that it may not be a common transition in villin folding. The trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation in all trajectories. Our analysis on the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition. All the other trajectories get trapped in hydrophobically collapsed states. The NLE residues were found to be buried deeply into the core, compared to the corresponding lysines in the villin headpiece, thereby making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that the NLE may not be a fast folder as experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones like GroEL act to undo the tight hydrophobic core of proteins, after most secondary structure elements have been formed, so that global rearrangement is easier. I conclude by presenting facts about chaperone-protein complexes and propose further directions for the study of protein folding.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A fundamental step in understanding the effects of irradiation on metallic uranium and uranium dioxide ceramic fuels, or any material, must start with the nature of radiation damage on the atomic level. The atomic damage displacement results in a multitude of defects that influence the fuel performance. Nuclear reactions are coupled, in that changing one variable will alter others through feedback. In the field of fuel performance modeling, these difficulties are addressed through the use of empirical models rather than models based on first principles. Empirical models can be used as a predictive code through the careful manipulation of input variables for the limited circumstances that are closely tied to the data used to create the model. While empirical models are efficient and give acceptable results, these results are only applicable within the range of the existing data. This narrow window prevents modeling changes in operating conditions that would invalidate the model as the new operating conditions would not be within the calibration data set. This work is part of a larger effort to correct for this modeling deficiency. Uranium dioxide and metallic uranium fuels are analyzed through a kinetic Monte Carlo code (kMC) as part of an overall effort to generate a stochastic and predictive fuel code. The kMC investigations include sensitivity analysis of point defect concentrations, thermal gradients implemented through a temperature variation mesh-grid, and migration energy values. In this work, fission damage is primarily represented through defects on the oxygen anion sublattice. Results were also compared between the various models. Past studies of kMC point defect migration have not adequately addressed non-standard migration events such as clustering and dissociation of vacancies. As such, the General Utility Lattice Program (GULP) code was utilized to generate new migration energies so that additional non-migration events could be included into kMC code in the future for more comprehensive studies. Defect energies were calculated to generate barrier heights for single vacancy migration, clustering and dissociation of two vacancies, and vacancy migration while under the influence of both an additional oxygen and uranium vacancy.