10 results for high dimensional growing self organizing map with randomness
in CaltechTHESIS
Abstract:
There is a growing interest in taking advantage of possible patterns and structures in data so as to extract the desired information and overcome the curse of dimensionality. In a wide range of applications, including computer vision, machine learning, medical imaging, and social networks, the signal that gives rise to the observations can be modeled to be approximately sparse and exploiting this fact can be very beneficial. This has led to an immense interest in the problem of efficiently reconstructing a sparse signal from limited linear observations. More recently, low-rank approximation techniques have become prominent tools to approach problems arising in machine learning, system identification and quantum tomography.
In sparse and low-rank estimation problems, the challenge is the inherent intractability of the objective function, and one needs efficient methods to capture the low-dimensionality of these models. Convex optimization is often a promising tool to attack such problems. An intractable problem with a combinatorial objective can often be "relaxed" to obtain a tractable but almost as powerful convex optimization problem. This dissertation studies convex optimization techniques that can take advantage of low-dimensional representations of the underlying high-dimensional data. We provide provable guarantees that ensure that the proposed algorithms will succeed under reasonable conditions, and answer questions of the following flavor:
- For a given number of measurements, can we reliably estimate the true signal?
- If so, how good is the reconstruction as a function of the model parameters?
More specifically:
- i) Focusing on linear inverse problems, we generalize the classical error bounds known for the least-squares technique to the lasso formulation, which incorporates the signal model.
- ii) We show that intuitive convex approaches do not perform as well as expected when it comes to signals that have multiple low-dimensional structures simultaneously.
- iii) Finally, we propose convex relaxations for the graph clustering problem and give sharp performance guarantees for a family of graphs arising from the so-called stochastic block model.
We pay particular attention to the following aspects. For i) and ii), we aim to provide a general geometric framework, in which the results on sparse and low-rank estimation can be obtained as special cases. For i) and iii), we investigate the precise performance characterization, which yields the right constants in our bounds and the true dependence between the problem parameters.
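For context, the lasso formulation referenced in i) has the standard form below (a generic statement, not notation taken from the thesis; here y denotes the linear observations, A the measurement matrix, and λ ≥ 0 a regularization parameter that encodes the sparsity prior):

```latex
\hat{x} \;=\; \arg\min_{x}\; \tfrac{1}{2}\,\lVert y - A x \rVert_2^2 \;+\; \lambda\,\lVert x \rVert_1
```

The ℓ1 term serves as the convex surrogate for the combinatorial sparsity objective mentioned above.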
Abstract:
While some of the deepest results in nature are those that give explicit bounds between important physical quantities, some of the most intriguing and celebrated of such bounds come from fields where there is still a great deal of disagreement and confusion regarding even the most fundamental aspects of the theories. For example, in quantum mechanics, there is still no complete consensus as to whether the limitations associated with Heisenberg's Uncertainty Principle derive from an inherent randomness in physics, or rather from limitations in the measurement process itself, resulting from phenomena like back action. Likewise, the second law of thermodynamics makes a statement regarding the increase in entropy of closed systems, yet the theory itself has neither a universally accepted definition of equilibrium nor an adequate explanation of how a system with underlying microscopically Hamiltonian (reversible) dynamics settles into a fixed distribution.
Motivated by these physical theories, and perhaps their inconsistencies, in this thesis we use dynamical systems theory to investigate how the very simplest of systems, even with no physical constraints, are characterized by bounds that give limits to the ability to make measurements on them. Using an existing interpretation, we start by examining how dissipative systems can be viewed as high-dimensional lossless systems, and how taking this view necessarily implies the existence of a noise process that results from the uncertainty in the initial system state. This fluctuation-dissipation result plays a central role in a measurement model that we examine, in particular describing how noise is inevitably injected into a system during a measurement, noise that can be viewed as originating either from the randomness of the many degrees of freedom of the measurement device, or of the environment. This noise constitutes one component of measurement back action, and ultimately imposes limits on measurement uncertainty. Depending on the assumptions we make about active devices, and their limitations, this back action can be offset to varying degrees via control. It turns out that using active devices to reduce measurement back action leads to estimation problems that have non-zero uncertainty lower bounds, the most interesting of which arise when the observed system is lossless. One such lower bound, a main contribution of this work, can be viewed as a classical version of a Heisenberg uncertainty relation between the system's position and momentum. We finally also revisit the murky question of how macroscopic dissipation appears from lossless dynamics, and propose alternative approaches for framing the question using existing systematic methods of model reduction.
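For reference, the quantum-mechanical bound alluded to above is the standard position-momentum uncertainty relation (quoted here in its textbook form; the classical analogue derived in the thesis is not stated in this abstract):

```latex
\sigma_x\,\sigma_p \;\ge\; \frac{\hbar}{2}
```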
Abstract:
In response to infection or tissue dysfunction, immune cells develop into highly heterogeneous repertoires with diverse functions. Capturing the full spectrum of these functions requires analysis of large numbers of effector molecules from single cells. However, currently only 3-5 functional proteins can be measured from single cells. We developed a single cell functional proteomics approach that integrates a microchip platform with multiplex cell purification. This approach can quantitate 20 proteins from >5,000 phenotypically pure single cells simultaneously. With a 1-million fold miniaturization, the system can detect down to ~100 molecules and requires only ~10^4 cells. Single cell functional proteomic analysis finds broad applications in basic, translational and clinical studies. In the three studies conducted, it yielded critical insights for understanding clinical cancer immunotherapy, the mechanism of inflammatory bowel disease (IBD) and hematopoietic stem cell (HSC) biology.
To study phenotypically defined cell populations, single cell barcode microchips were coupled with upstream multiplex cell purification based on up to 11 parameters. Statistical algorithms were developed to process and model the high dimensional readouts. This analysis can evaluate rare cell populations and is versatile across cell types and proteins. (1) We conducted an immune monitoring study of a phase 2 cancer cellular immunotherapy clinical trial that used T-cell receptor (TCR) transgenic T cells as major therapeutics to treat metastatic melanoma. We evaluated the functional proteome of 4 antigen-specific, phenotypically defined T cell populations from peripheral blood of 3 patients across 8 time points. (2) Natural killer (NK) cells can play a protective role in chronic inflammation, and their surface receptor – killer immunoglobulin-like receptor (KIR) – has been identified as a risk factor for IBD. We compared the functional behavior of NK cells that had differential KIR expression. These NK cells were retrieved from the blood of 12 patients with different genetic backgrounds. (3) HSCs are the progenitors of immune cells and are thought to have no immediate functional capacity against pathogens. However, recent studies identified expression of Toll-like receptors (TLRs) on HSCs. We studied the functional capacity of HSCs upon TLR activation. The comparison of HSCs from wild-type mice against those from genetic knock-out mouse models elucidates the responding signaling pathway.
In all three cases, we observed profound functional heterogeneity within phenotypically defined cells. Polyfunctional cells that conduct multiple functions also produce those proteins in large amounts, and they dominate the immune response. In the cancer immunotherapy study, the strong cytotoxic and antitumor functions of the transgenic TCR T cells contributed to a ~30% tumor reduction immediately after the therapy. However, this infused immune response disappeared within 2-3 weeks. Later on, some patients gained a second antitumor response, consisting of the emergence of endogenous antitumor cytotoxic T cells exhibiting multiple antitumor functions. These patients showed more effective long-term tumor control. In the IBD mechanism study, we noticed that, compared with others, NK cells expressing the KIR2DL3 receptor secreted a large array of effector proteins, such as TNF-α, CCLs and CXCLs. The functions of these cells regulated disease-contributing cells and protected host tissues. Their existence correlated with IBD disease susceptibility. In the HSC study, the HSCs exhibited functional capacity by producing TNF-α, IL-6 and GM-CSF. TLR stimulation activated NF-κB signaling in HSCs. The single cell functional proteome contains rich information that is independent of the genome and transcriptome. In all three cases, functional proteomic evaluation uncovered critical biological insights that would not have been resolved otherwise. The integrated single cell functional proteomic analysis constructed a detailed kinetic picture of the immune response that took place during the clinical cancer immunotherapy. It revealed concrete functional evidence that connected genetics to IBD disease susceptibility. Further, it provided predictors that correlated with clinical responses and pathogenic outcomes.
Abstract:
This thesis is motivated by safety-critical applications involving autonomous air, ground, and space vehicles carrying out complex tasks in uncertain and adversarial environments. We use temporal logic as a language to formally specify complex tasks and system properties. Temporal logic specifications generalize the classical notions of stability and reachability that are studied in the control and hybrid systems communities. Given a system model and a formal task specification, the goal is to automatically synthesize a control policy for the system that ensures that the system satisfies the specification. This thesis presents novel control policy synthesis algorithms for optimal and robust control of dynamical systems with temporal logic specifications. Furthermore, it introduces algorithms that are efficient and extend to high-dimensional dynamical systems.
The first contribution of this thesis is the generalization of a classical linear temporal logic (LTL) control synthesis approach to optimal and robust control. We show how we can extend automata-based synthesis techniques for discrete abstractions of dynamical systems to create optimal and robust controllers that are guaranteed to satisfy an LTL specification. Such optimal and robust controllers can be computed at little extra computational cost compared to computing a feasible controller.
The second contribution of this thesis addresses the scalability of control synthesis with LTL specifications. A major limitation of the standard automaton-based approach for control with LTL specifications is that the automaton might be doubly-exponential in the size of the LTL specification. We introduce a fragment of LTL for which one can compute feasible control policies in time polynomial in the size of the system and specification. Additionally, we show how to compute optimal control policies for a variety of cost functions, and identify interesting cases when this can be done in polynomial time. These techniques are particularly relevant for online control, as one can guarantee that a feasible solution can be found quickly, and then iteratively improve on the quality as time permits.
The final contribution of this thesis is a set of algorithms for computing feasible trajectories for high-dimensional, nonlinear systems with LTL specifications. These algorithms avoid a potentially computationally-expensive process of computing a discrete abstraction, and instead compute directly on the system's continuous state space. The first method uses an automaton representing the specification to directly encode a series of constrained-reachability subproblems, which can be solved in a modular fashion by using standard techniques. The second method encodes an LTL formula as mixed-integer linear programming constraints on the dynamical system. We demonstrate these approaches with numerical experiments on temporal logic motion planning problems with high-dimensional (10+ states) continuous systems.
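To illustrate the mixed-integer encoding mentioned above (a generic sketch of the standard big-M construction, not necessarily the exact formulation used in the thesis): suppose an atomic proposition π holds whenever the state lies in a polytope {x : Hx ≤ h}, and let the binary variable z_t indicate that π is enforced at time t. Then

```latex
z_t = 1 \;\Rightarrow\; H x_t \le h
\qquad\text{is encoded as}\qquad
H x_t \;\le\; h + M\,(1 - z_t)\,\mathbf{1},
```

where M is a sufficiently large constant and \mathbf{1} is a vector of ones; temporal operators over a finite horizon then become linear constraints on the binaries, e.g. "eventually π" becomes \sum_{t=0}^{T} z_t \ge 1.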
Abstract:
Optical Coherence Tomography (OCT) is a popular, rapidly growing imaging technique with an increasing number of bio-medical applications due to its noninvasive nature. However, there are three major challenges in understanding and improving an OCT system: (1) Obtaining an OCT image is not easy. It either takes a real medical experiment or requires days of computer simulation. Without much data, it is difficult to study the physical processes underlying OCT imaging of different objects simply because there aren't many imaged objects. (2) Interpretation of an OCT image is also hard. This challenge is more profound than it appears. For instance, it would require a trained expert to tell from an OCT image of human skin whether there is a lesion or not. This is expensive in its own right, but even the expert cannot be sure about the exact size of the lesion or the width of the various skin layers. The take-away message is that analyzing an OCT image even at a high level usually requires a trained expert, and pixel-level interpretation is simply unrealistic. The reason is simple: we have OCT images but not their underlying ground-truth structure, so there is nothing to learn from. (3) The imaging depth of OCT is very limited (millimeter or sub-millimeter in human tissue). While OCT utilizes infrared light for illumination to stay noninvasive, the downside is that photons at such long wavelengths can only penetrate a limited depth into the tissue before getting back-scattered. To image a particular region of a tissue, photons first need to reach that region. As a result, OCT signals from deeper regions of the tissue are both weak (since few photons reach there) and distorted (due to multiple scattering of the contributing photons). This fact alone makes OCT images very hard to interpret.
This thesis addresses the above challenges by successfully developing an advanced Monte Carlo simulation platform which is 10000 times faster than the state-of-the-art simulator in the literature, bringing down the simulation time from 360 hours to a single minute. This powerful simulation tool not only enables us to efficiently generate as many OCT images of objects with arbitrary structure and shape as we want on a common desktop computer, but it also provides us with the underlying ground truth of the simulated images at the same time, because we specify it at the beginning of the simulation. This is one of the key contributions of this thesis. What allows us to build such a powerful simulation tool includes a thorough understanding of the signal formation process, a clever implementation of the importance sampling/photon splitting procedure, efficient use of a voxel-based mesh system in determining photon-mesh interception, and parallel computation of the different A-scans that constitute a full OCT image, among other programming and mathematical tricks, which will be explained in detail later in the thesis.
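As a brief note on the importance sampling step mentioned above (a generic sketch, not the thesis's specific biasing scheme): if a photon's scattering angle θ is drawn from a biased density q(θ) that favors detector-bound directions rather than from the physical phase function p(θ), the estimate remains unbiased provided the photon weight w is updated at each scattering event as

```latex
w \;\leftarrow\; w \cdot \frac{p(\theta)}{q(\theta)}
```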
Next we aim at the inverse problem: given an OCT image, predict/reconstruct its ground-truth structure at the pixel level. By solving this problem we would be able to interpret an OCT image completely and precisely without help from a trained expert. It turns out that we can do much better. For simple structures we are able to reconstruct the ground truth of an OCT image more than 98% correctly, and for more complicated structures (e.g., a multi-layered brain structure) we are looking at 93%. We achieved this through extensive use of Machine Learning. The success of the Monte Carlo simulation already puts us in a great position by providing us with a great deal of data (effectively unlimited), in the form of (image, truth) pairs. Through a transformation of the high-dimensional response variable, we convert the learning task into a multi-output multi-class classification problem and a multi-output regression problem. We then build a hierarchical architecture of machine learning models (a committee of experts) and train different parts of the architecture with specifically designed data sets. In prediction, an unseen OCT image first goes through a classification model to determine its structure (e.g., the number and the types of layers present in the image); then the image is handed to a regression model that is trained specifically for that particular structure to predict the lengths of the different layers and, by doing so, reconstruct the ground truth of the image. We also demonstrate that ideas from Deep Learning can be useful to further improve the performance.
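The classify-then-regress pipeline described above can be sketched as follows (a minimal illustration using scikit-learn and synthetic stand-in data; the model choices, feature shapes, and names are assumptions for illustration, not the thesis's actual architecture or code):

```python
# Sketch of a classify-then-regress "committee of experts": a classifier picks the
# structure type of an unseen image, then a structure-specific regressor predicts
# the layer lengths. Data here are random placeholders for (image, truth) pairs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n_samples, n_features, n_layers = 500, 64, 3
X = rng.normal(size=(n_samples, n_features))        # stand-in image features
structure = rng.integers(0, 2, size=n_samples)      # e.g. two structure types
layer_lengths = rng.uniform(0.1, 1.0, size=(n_samples, n_layers))  # stand-in truths

# Stage 1: classifier that determines which structure an image has.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, structure)

# Stage 2: one regressor ("expert") per structure, trained only on its own subset.
experts = {
    s: RandomForestRegressor(n_estimators=100, random_state=0).fit(
        X[structure == s], layer_lengths[structure == s]
    )
    for s in np.unique(structure)
}

def predict(image_features):
    """Route an unseen image through the classifier, then the matching expert."""
    s = int(clf.predict(image_features.reshape(1, -1))[0])
    return s, experts[s].predict(image_features.reshape(1, -1))[0]

print(predict(X[0]))
```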
It is worth pointing out that solving the inverse problem automatically improves the imaging depth, since the lower half of an OCT image (i.e., greater depth) could previously hardly be seen but now becomes fully resolved. Interestingly, although the OCT signals constituting the lower half of the image are weak, messy, and uninterpretable to human eyes, they still carry enough information that, when fed into a well-trained machine learning model, yields precisely the true structure of the object being imaged. This is just another case where Artificial Intelligence (AI) outperforms humans. To the best of the author's knowledge, this thesis is not only a success but also the first attempt to reconstruct an OCT image at the pixel level. Even attempting this kind of task would require fully annotated OCT images, and a lot of them (hundreds or even thousands). This is clearly impossible without a powerful simulation tool like the one developed in this thesis.
Abstract:
A general framework for multi-criteria optimal design is presented which is well-suited for automated design of structural systems. A systematic computer-aided optimal design decision process is developed which allows the designer to rapidly evaluate and improve a proposed design by taking into account the major factors of interest related to different aspects such as design, construction, and operation.
The proposed optimal design process requires the selection of the most promising choice of design parameters taken from a large design space, based on an evaluation using specified criteria. The design parameters specify a particular design, and so they relate to member sizes, structural configuration, etc. The evaluation of the design uses performance parameters which may include structural response parameters, risks due to uncertain loads and modeling errors, construction and operating costs, etc. Preference functions are used to implement the design criteria in a "soft" form. These preference functions give a measure of the degree of satisfaction of each design criterion. The overall evaluation measure for a design is built up from the individual measures for each criterion through a preference combination rule. The goal of the optimal design process is to obtain a design that has the highest overall evaluation measure - an optimization problem.
Genetic algorithms are stochastic optimization methods that are based on evolutionary theory. They provide the exploration power needed to search high-dimensional design spaces for these optimal solutions. Two special genetic algorithms, hGA and vGA, are presented here for continuous and discrete optimization problems, respectively.
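A minimal sketch of the idea, combining per-criterion preference functions into an overall evaluation measure and maximizing it with a generic real-coded genetic algorithm (the preference functions, combination rule, and GA operators below are illustrative placeholders, not the hGA/vGA of the thesis):

```python
# Toy genetic algorithm maximizing an overall evaluation measure built as the
# product of two made-up preference functions over the design parameters x.
import numpy as np

rng = np.random.default_rng(1)

def preference_cost(x):       # degree of satisfaction of a hypothetical "low cost" criterion
    return np.exp(-np.sum(x ** 2))

def preference_stiffness(x):  # degree of satisfaction of a hypothetical "stiffness" criterion
    return 1.0 / (1.0 + np.sum((x - 1.0) ** 2))

def overall_measure(x):       # one possible preference combination rule: the product
    return preference_cost(x) * preference_stiffness(x)

def genetic_algorithm(n_params=4, pop_size=40, generations=200):
    pop = rng.uniform(-2.0, 2.0, size=(pop_size, n_params))
    for _ in range(generations):
        fitness = np.array([overall_measure(ind) for ind in pop])
        # Tournament selection: each parent is the fitter of two random individuals.
        a, b = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((fitness[a] > fitness[b])[:, None], pop[a], pop[b])
        # Uniform crossover with a shifted copy of the mating pool, then Gaussian mutation.
        mask = rng.random((pop_size, n_params)) < 0.5
        pop = np.where(mask, parents, np.roll(parents, 1, axis=0))
        pop += rng.normal(scale=0.1, size=pop.shape)
    return pop[np.argmax([overall_measure(ind) for ind in pop])]

print(genetic_algorithm())  # design parameters with the highest overall measure found
```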
The methodology is demonstrated with several examples involving the design of truss and frame systems. These examples are solved by using the proposed hGA and vGA.
Abstract:
The intensities and relative abundances of galactic cosmic ray protons and antiprotons have been measured with the Isotope Matter Antimatter Experiment (IMAX), a balloon-borne magnet spectrometer. The IMAX payload had a successful flight from Lynn Lake, Manitoba, Canada on July 16, 1992. Particles detected by IMAX were identified by mass and charge via the Cherenkov-Rigidity and TOF-Rigidity techniques, with measured rms mass resolution ≤0.2 amu for Z=1 particles.
Cosmic ray antiprotons are of interest because they can be produced by the interactions of high energy protons and heavier nuclei with the interstellar medium as well as by more exotic sources. Previous cosmic ray antiproton experiments have reported an excess of antiprotons over that expected solely from cosmic ray interactions.
Analysis of the flight data has yielded 124405 protons and 3 antiprotons in the energy range 0.19-0.97 GeV at the instrument, 140617 protons and 8 antiprotons in the energy range 0.97-2.58 GeV, and 22524 protons and 5 antiprotons in the energy range 2.58-3.08 GeV. These measurements are a statistical improvement over previous antiproton measurements, and they demonstrate improved separation of antiprotons from the more abundant fluxes of protons, electrons, and other cosmic ray species.
When these results are corrected for instrumental and atmospheric background and losses, the ratios at the top of the atmosphere are p̄/p = 3.21(+3.49, -1.97)x10^(-5) in the energy range 0.25-1.00 GeV, p̄/p = 5.38(+3.48, -2.45)x10^(-5) in the energy range 1.00-2.61 GeV, and p̄/p = 2.05(+1.79, -1.15)x10^(-4) in the energy range 2.61-3.11 GeV. The corresponding antiproton intensities, also corrected to the top of the atmosphere, are 2.3(+2.5, -1.4)x10^(-2) (m^2 s sr GeV)^(-1), 2.1(+1.4, -1.0)x10^(-2) (m^2 s sr GeV)^(-1), and 4.3(+3.7, -2.4)x10^(-2) (m^2 s sr GeV)^(-1) for the same energy ranges.
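As a rough illustrative check (not a calculation from the thesis): the raw antiproton-to-proton ratios implied by the event counts quoted above, before any instrumental or atmospheric corrections and using the at-instrument energy ranges, are of the same order as the corrected top-of-atmosphere ratios:

```python
# Back-of-the-envelope raw ratios from the quoted IMAX event counts.
counts = {  # energy range at the instrument (GeV): (antiprotons, protons)
    "0.19-0.97": (3, 124405),
    "0.97-2.58": (8, 140617),
    "2.58-3.08": (5, 22524),
}
for erange, (n_pbar, n_p) in counts.items():
    print(f"{erange} GeV: raw pbar/p = {n_pbar / n_p:.2e}")
# -> roughly 2.4e-05, 5.7e-05, 2.2e-04, comparable to the corrected ratios above.
```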
The IMAX antiproton fluxes and antiproton/proton ratios are compared with recent Standard Leaky Box Model (SLBM) calculations of the cosmic ray antiproton abundance. According to this model, cosmic ray antiprotons are secondary cosmic rays arising solely from the interaction of high energy cosmic rays with the interstellar medium. The effects of solar modulation of protons and antiprotons are also calculated, showing that the antiproton/proton ratio can vary by as much as an order of magnitude over the solar cycle. When solar modulation is taken into account, the IMAX antiproton measurements are found to be consistent with the most recent calculations of the SLBM. No evidence is found in the IMAX data for excess antiprotons arising from the decay of galactic dark matter, which had been suggested as an interpretation of earlier measurements. Furthermore, the consistency of the current results with the SLBM calculations suggests that the mean antiproton lifetime is at least as large as the cosmic ray storage time in the galaxy (~10^7 yr, based on measurements of cosmic ray ^(10)Be). Recent measurements by two other experiments are consistent with this interpretation of the IMAX antiproton results.
Abstract:
Noise measurements from 140°K to 350°K ambient temperature and between 10 kHz and 22 MHz performed on a double injection silicon diode as a function of operating point indicate that the high frequency noise depends linearly on the ambient temperature T and on the differential conductance g measured at the same frequency. The noise is represented quantitatively by ⟨i^2⟩ = α·4kTgΔf. A new interpretation demands Nyquist noise with α ≡ 1 in these devices at high frequencies. This is in accord with an equivalent circuit derived for the double injection process. The effects of diode geometry on the static I-V characteristic as well as on the ac properties are illustrated. Investigation of the temperature dependence of double injection yields measurements of the temperature variation of the common high-level lifetime τ (τ ∝ T^2), the hole conductivity mobility µ_p (µ_p ∝ T^(-2.18)) and the electron conductivity mobility µ_n (µ_n ∝ T^(-1.75)).
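For a sense of scale (an illustrative calculation with assumed operating values, not numbers from the thesis): taking α = 1, T = 300 K, g = 1 mS, and Δf = 1 MHz, the predicted rms Nyquist noise current is

```latex
\sqrt{\langle i^2 \rangle}
  = \sqrt{4kTg\,\Delta f}
  = \sqrt{4\,(1.38\times10^{-23}\,\mathrm{J/K})(300\,\mathrm{K})(10^{-3}\,\mathrm{S})(10^{6}\,\mathrm{Hz})}
  \approx 4\ \mathrm{nA}.
```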
Abstract:
DNA charge transport (CT) involves the efficient transfer of electrons or electron holes through the DNA π-stack over long molecular distances of at least 100 base-pairs. Despite this shallow distance dependence, DNA CT is sensitive to mismatches or lesions that disrupt π-stacking and is critically dependent on proper electronic coupling of the donor and acceptor moieties into the base stack. Favorable DNA CT is very rapid, occurring on the picosecond timescale. Because of this speed, electron holes equilibrate along the DNA π-stack, forming a characteristic pattern of DNA damage at low oxidation potential guanine multiplets. Furthermore, DNA CT may be used in a biological context. DNA processing enzymes with 4Fe4S clusters can perform DNA-mediated electron transfer (ET) self-exchange reactions with other 4Fe4S cluster proteins, even if the proteins are quite dissimilar, as long as the DNA-bound [4Fe4S]3+/2+ redox potentials are conserved. This mechanism would allow low copy number DNA repair proteins to find their lesions efficiently within the cell. DNA CT may also be used biologically for the long-range, selective activation of redox-active transcription factors. Within this work, we pursue other proteins that may utilize DNA CT within the cell and further elucidate aspects of the DNA-mediated ET self-exchange reaction of 4Fe4S cluster proteins.
Dps proteins, bacterial mini-ferritins that protect DNA from oxidative stress, are implicated in the survival and virulence of pathogenic bacteria. One aspect of their protection involves ferroxidase activity, whereby ferrous iron is bound and oxidized selectively by hydrogen peroxide, thereby preventing formation of damaging hydroxyl radicals via Fenton chemistry. Understanding the specific mechanism by which Dps proteins protect the bacterial genome could inform the development of new antibiotics. We investigate whether DNA-binding E. coli Dps can utilize DNA CT to protect the genome from a distance. An intercalating ruthenium photooxidant was employed to generate oxidative DNA damage via the flash-quench technique, which localizes to a low potential guanine triplet. We find that Dps loaded with ferrous iron, in contrast to Apo-Dps and ferric iron-loaded Dps which lack available reducing equivalents, significantly attenuates the yield of oxidative DNA damage at the guanine triplet. These data demonstrate that ferrous iron-loaded Dps is selectively oxidized to fill guanine radical holes, thereby restoring the integrity of the DNA. Luminescence studies indicate no direct interaction between the ruthenium photooxidant and Dps, supporting the DNA-mediated oxidation of ferrous iron-loaded Dps. Thus DNA CT may be a mechanism by which Dps efficiently protects the genome of pathogenic bacteria from a distance.
Further work focused on spectroscopic characterization of the DNA-mediated oxidation of ferrous iron-loaded Dps. X-band EPR was used to monitor the oxidation of DNA-bound Dps after DNA photooxidation via the flash-quench technique. Upon irradiation with poly(dGdC)2, a signal arises with g = 4.3, consistent with the formation of mononuclear high-spin Fe(III) sites of low symmetry, the expected oxidation product of Dps with one iron bound at each ferroxidase site. When poly(dGdC)2 is substituted with poly(dAdT)2, the yield of Dps oxidation is decreased significantly, indicating that guanine radicals facilitate Dps oxidation. The more favorable oxidation of Dps by guanine radicals supports the feasibility of a long-distance protection mechanism via DNA CT where Dps is oxidized to fill guanine radical holes in the bacterial genome produced by reactive oxygen species.
We have also explored possible electron transfer intermediates in the DNA-mediated oxidation of ferrous iron-loaded Dps. Dps proteins contain a conserved tryptophan residue in close proximity to the ferroxidase site (W52 in E. coli Dps). In comparison to WT Dps, in EPR studies of the oxidation of ferrous iron-loaded Dps following DNA photooxidation, W52Y and W52A mutants were deficient in forming the characteristic EPR signal at g = 4.3, with a larger deficiency for W52A compared to W52Y. In addition to EPR, we also probed the role of W52 Dps in cells using a hydrogen peroxide survival assay. Bacteria containing W52Y Dps survived the hydrogen peroxide challenge more similarly to those containing WT Dps, whereas cells with W52A Dps died off as quickly as cells without Dps. Overall, these results suggest the possibility of W52 as a CT hopping intermediate.
DNA-modified electrodes have become an essential tool for the study of the redox chemistry of DNA processing enzymes with 4Fe4S clusters. In many cases, it is necessary to investigate different complex samples and substrates in parallel in order to elucidate this chemistry. Therefore, we optimized and characterized a multiplexed electrochemical platform with the 4Fe4S cluster base excision repair glycosylase Endonuclease III (EndoIII). Closely packed DNA films, where the protein has limited surface accessibility, produce EndoIII electrochemical signals sensitive to an intervening mismatch, indicating a DNA-mediated process. Multiplexed analysis allowed more robust characterization of the CT-deficient Y82A EndoIII mutant, as well as comparison of a new family of mutations altering the electrostatics surrounding the 4Fe4S cluster in an effort to shift the reduction potential of the cluster. While little change in the DNA-bound midpoint potential was found for this family of mutants, likely indicating the dominant effect of DNA-binding on establishing the protein redox potential, significant variations in the efficiency of DNA-mediated electron transfer were apparent. On the basis of the stability of these proteins, examined by circular dichroism, we proposed that the electron transfer pathway in EndoIII can be perturbed not only by the removal of aromatic residues but also through changes in solvation near the cluster.
While the 4Fe4S cluster of EndoIII is relatively insensitive to oxidation and reduction in solution, we have found that upon DNA binding, the reduction potential of the [4Fe4S]3+/2+ couple shifts negatively by approximately 200 mV, bringing this couple into a physiologically relevant range. This shift was demonstrated using electrochemistry experiments in the presence and absence of DNA, but those studies do not provide direct molecular evidence for the species being observed. Sulfur K-edge X-ray absorption spectroscopy (XAS) can be used to probe directly the covalency of iron-sulfur clusters, which is correlated with their reduction potential. We have shown that the Fe-S covalency of the 4Fe4S cluster of EndoIII increases upon DNA binding, stabilizing the oxidized [4Fe4S]3+ cluster, consistent with a negative shift in reduction potential. The 7% increase in Fe-S covalency corresponds to an approximately 150 mV shift, remarkably similar to the DNA electrochemistry results. Therefore we have obtained direct molecular evidence for the shift in the 4Fe4S reduction potential of EndoIII upon DNA binding, supporting the feasibility of our model whereby these proteins can utilize DNA CT to cooperate in order to efficiently find DNA lesions inside cells.
In conclusion, in this work we have explored the biological applications of DNA CT. We discovered that the DNA-binding bacterial ferritin Dps can protect the bacterial genome from a distance via DNA CT, perhaps contributing to pathogen survival and virulence. Furthermore, we optimized a multiplexed electrochemical platform for the study of the redox chemistry of DNA-bound 4Fe4S cluster proteins. Finally, we have used sulfur K-edge XAS to obtain direct molecular evidence for the negative shift in 4Fe4S cluster reduction potential of EndoIII upon DNA binding. These studies contribute to the understanding of DNA-mediated protein oxidation within cells.
Abstract:
I. The 3.7 Å Crystal Structure of Horse Heart Ferricytochrome C.
The crystal structure of horse heart ferricytochrome c has been determined to a resolution of 3.7 Å using the multiple isomorphous replacement technique. Two isomorphous derivatives were used in the analysis, leading to a map with a mean figure of merit of 0.458. The quality of the resulting map was extremely high, even though the derivative data did not appear to be of high quality.
Although it was impossible to fit the known amino acid sequence to the calculated structure in an unambiguous way, many important features of the molecule could still be determined from the 3.7 Å electron density map. Among these was the fact that cytochrome c contains little or no α-helix. The polypeptide chain appears to be wound about the heme group in such a way as to form a loosely packed hydrophobic core in the molecule.
The heme group is located in a cleft on the molecule with one edge exposed to the solvent. The fifth coordinating ligand is His 18 and the sixth coordinating ligand is probably neither His 26 nor His 33.
The high resolution analysis of cytochrome c is now in progress and should be completed within the next year.
II. The Application of the Karle-Hauptman Tangent Formula to Protein Phasing.
The Karle-Hauptman tangent formula has been shown to be applicable to the refinement of previously determined protein phases. Tests were made with both the cytochrome c data from Part I and a theoretical structure based on the myoglobin molecule. The refinement process was found to be highly dependent upon the manner in which the tangent formula was applied. Iterative procedures did not work well, at least at low resolution.
The tangent formula worked very well in selecting the true phase from the two possible phase choices resulting from a single isomorphous replacement phase analysis. The only restriction on this application is that the heavy atoms form a non-centric cluster in the unit cell.
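For reference, the tangent formula in its standard form (quoted from the general direct-methods literature rather than from the thesis) estimates the phase φ_h of a reflection h from the normalized structure-factor magnitudes |E| and the phases of reflection pairs k and h-k:

```latex
\tan\varphi_{\mathbf{h}}
  \;\approx\;
  \frac{\sum_{\mathbf{k}} \lvert E_{\mathbf{k}}\,E_{\mathbf{h}-\mathbf{k}} \rvert
        \sin\!\left(\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}}\right)}
       {\sum_{\mathbf{k}} \lvert E_{\mathbf{k}}\,E_{\mathbf{h}-\mathbf{k}} \rvert
        \cos\!\left(\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}}\right)}
```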
Pages 156 through 284 in this Thesis consist of previously published papers relating to the above two sections. References to these papers can be found on page 155.