2 results for Analysis of principal component
in Illinois Digital Environment for Access to Learning and Scholarship Repository
Abstract:
Microsecond-long molecular dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all-atom MD simulations of protein folding are now becoming popular. To distill features of such large folding trajectories, we must develop methods that both compress trajectory data to enable visualization and lend themselves to further analysis, such as finding collective coordinates and reducing the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering as used in MD trajectory analysis suffers from several serious drawbacks: (i) it is not data driven, (ii) it is unstable to noise and to changes in cutoff parameters, and (iii) since it does not take into account interrelationships among data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques may be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside while still preserving their salient features. PCA has often been used for dimension reduction; while it is computationally inexpensive, it is a linear method and does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear and also provides clear insight into the structural processes underlying MD trajectories.
I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding trajectories and six norleucine mutant (NLE) folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I compare nMDS, PCA and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features, such as early formation of secondary structure, no commonalities between trajectories were found. No units of residues or atoms were found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3, was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in each trajectory, indicating that it may not be a common transition in villin folding. All trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation. Our analysis of the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition. All the other trajectories became trapped in hydrophobically collapsed states. The NLE residues were found to be buried more deeply in the core than the corresponding lysines in the villin headpiece, thereby making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that the NLE mutant may not be the fast folder that experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones like GroEL act to undo the tight hydrophobic core of proteins after most secondary structure elements have been formed, so that global rearrangement is easier.
I conclude by presenting facts about chaperone-protein complexes and proposing further directions for the study of protein folding.
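The abstract's remark that PCA, being linear, compresses trajectory data poorly can be illustrated with a small sketch. This is not the thesis's nMDS method or its data: the helix below is a hypothetical stand-in for a trajectory whose frames depend on a single underlying parameter, and the function name `pca_explained_variance` is an assumption made for illustration.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                    # center every coordinate
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, in descending order
    var = s ** 2                               # proportional to per-component variance
    return var / var.sum()

# Hypothetical "trajectory": 500 frames lying on a helix in 3-D.
# The frames are generated by one parameter t, so the data are intrinsically 1-D.
t = np.linspace(0.0, 4.0 * np.pi, 500)
frames = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])

ratios = pca_explained_variance(frames)
print(ratios)
```

Although a single parameter generates every frame, the leading principal component captures well under all of the variance, so a linear projection needs all three components to represent the curve. A nonlinear embedding is, in principle, free to unroll such a curve into fewer dimensions.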
Abstract:
This thesis presents an analysis of the largest catalog to date of infrared spectra of massive young stellar objects (YSOs) in the Large Magellanic Cloud. As evidenced by their very different spectral features, the luminous objects span a range of evolutionary states, from those most embedded in their natal molecular material to those that have dissipated and ionized their surroundings to form compact HII regions and photodissociation regions. We quantify the contributions of the various spectral features using the statistical method of principal component analysis. Using this analysis, we classify the YSO spectra into several distinct groups based upon their dominant spectral features: silicate absorption (S Group), silicate absorption and fine-structure line emission (SE), polycyclic aromatic hydrocarbon (PAH) emission (P Group), PAH and fine-structure line emission (PE), and only fine-structure line emission (E). Based upon the relative numbers of sources in each category, we are able to estimate the amount of time massive YSOs spend in each evolutionary stage. We find that approximately 50% of the sources have ionic fine-structure lines, indicating that a compact HII region forms about halfway through the YSO lifetime probed in our study. Of the 277 YSOs for which we collected spectra, 41 have ice absorption features, indicating that they are surrounded by cold, ice-bearing dust particles. We have decomposed the shape of the ice features to probe the composition and thermal history of the ice. We find that most of the CO2 ice is embedded in a polar ice matrix that has been thermally processed by the embedded YSO. The amount of thermal processing may be correlated with the luminosity of the YSO. Using the Australia Telescope Compact Array, we imaged the dense gas around a subsample of our sources in the HII complexes N44, N105, N113, and N159 using HCO+ and HCN as dense gas tracers.
We find that the molecular material in star-forming environments is highly clumpy, with clumps that range from subparsec to ~2 parsecs in size and have masses between 10^2 and 10^4 solar masses. We find that there are varying levels of star formation in the clumps, with the lower-mass clumps tending to be without massive YSOs. These YSO-less clumps could represent either an earlier evolutionary stage of the more massive YSO-bearing clumps or clumps that will never form a massive star. Clumps with massive YSOs at their centers have larger masses than those with massive YSOs at their edges, and we suggest that the difference is evolutionary: clumps with YSOs at their edges are more advanced than those with YSOs at their centers. Clumps with YSOs at their edges may have had a significant fraction of their mass disrupted or destroyed by the forming massive star. We find that the strength of the silicate absorption feature seen in YSO IR spectra is well correlated with the on-source HCO+ and HCN flux densities, such that the strength of the feature is indicative of the embeddedness of the YSO. We estimate that ~40% of the entire spectral sample has strong silicate absorption features, implying that the YSOs are embedded in circumstellar material for about 40% of the time probed in our study.
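The step from relative source counts to time spent in a stage can be sketched as simple arithmetic. The sketch below uses only the two counts stated in the abstract (277 spectra, 41 with ice features); the function name `sample_fraction` is hypothetical, and reading a sample fraction as a fraction of the probed lifetime assumes a roughly steady rate of YSO formation, which the abstract implies but does not spell out.

```python
def sample_fraction(n_with_feature, n_total):
    """Fraction of the sample that shows a given spectral feature."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_with_feature / n_total

# Counts taken from the abstract: 41 of 277 YSO spectra show ice absorption.
ice_fraction = sample_fraction(41, 277)
print(f"{ice_fraction:.1%}")
```

Under the steady-rate assumption, the same ratio applied to a category count gives the share of the probed lifetime spent in that category, which is how the ~50% and ~40% figures above are interpreted.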