10 resultados para datasets
em CaltechTHESIS
Resumo:
The concept of seismogenic asperities and aseismic barriers has become a useful paradigm within which to understand the seismogenic behavior of major faults. Since asperities and barriers can be thought of as defining the potential rupture area of large megathrust earthquakes, it is thus important to identify their respective spatial extents, constrain their temporal longevity, and to develop a physical understanding for their behavior. Space geodesy is making critical contributions to the identification of slip asperities and barriers but progress in many geographical regions depends on improving the accuracy and precision of the basic measurements. This thesis begins with technical developments aimed at improving satellite radar interferometric measurements of ground deformation whereby we introduce an empirical correction algorithm for unwanted effects due to interferometric path delays that are due to spatially and temporally variable radar wave propagation speeds in the atmosphere. In chapter 2, I combine geodetic datasets with complementary spatio-temporal resolutions to improve our understanding of the spatial distribution of crustal deformation sources and their associated temporal evolution – here we use observations from Long Valley Caldera (California) as our test bed. In the third chapter I apply the tools developed in the first two chapters to analyze postseismic deformation associated with the 2010 Mw=8.8 Maule (Chile) earthquake. The result delimits patches where afterslip occurs, explores their relationship to coseismic rupture, quantifies frictional properties associated with inferred patches of afterslip, and discusses the relationship of asperities and barriers to long-term topography. The final chapter investigates interseismic deformation of the eastern Makran subduction zone by using satellite radar interferometry only, and demonstrates that with state-of-art techniques it is possible to quantify tectonic signals with small amplitude and long wavelength. Portions of the eastern Makran for which we estimate low fault coupling correspond to areas where bathymetric features on the downgoing plate are presently subducting, whereas the region of the 1945 M=8.1 earthquake appears to be more highly coupled.
Resumo:
This thesis describes the active structures of Myanmar and its surrounding regions, and the earthquake geology of the major active structures. Such investigation is needed urgently for this rapidly developing country that has suffered from destructive earthquakes in its long history. To archive a better understanding of the regional active tectonics and the seismic potential in the future, we utilized a global digital elevation model and optical satellite imagery to describe geomorphologic evidence for the principal neotectonic features of the western half of the Southeast Asia mainland. Our investigation shows three distinct active structural systems that accommodate the oblique convergence between the Indian plate and Southeast Asia and the extrusion of Asian territory around the eastern syntaxis of the Himalayan mountain range. Each of these active deformation belts can be further separated into several neotectonic domains, in which structures show distinctive active behaviors from one to another.
In order to better understand the behaviors of active structures, we focused on the active characteristics of the right-lateral Sagaing fault and the oblique subducting northern Sunda megathrust in the second part of this thesis. The detailed geomorphic investigations along these two major plate-interface faults revealed the recent slip behavior of these structures, and plausible recurrence intervals of major seismic events. We also documented the ground deformation of the 2011 Tarlay earthquake in remote eastern Myanmar from remote sensing datasets and post-earthquake field investigations. The field observation and the remote sensing measurements of surface ruptures of the Tarlay earthquake are the first study of this kind in the Myanmar region.
Resumo:
Seismic reflection methods have been extensively used to probe the Earth's crust and suggest the nature of its formative processes. The analysis of multi-offset seismic reflection data extends the technique from a reconnaissance method to a powerful scientific tool that can be applied to test specific hypotheses. The treatment of reflections at multiple offsets becomes tractable if the assumptions of high-frequency rays are valid for the problem being considered. Their validity can be tested by applying the methods of analysis to full wave synthetics.
Three studies illustrate the application of these principles to investigations of the nature of the crust in southern California. A survey shot by the COCORP consortium in 1977 across the San Andreas fault near Parkfield revealed events in the record sections whose arrival time decreased with offset. The reflectors generating these events are imaged using a multi-offset three-dimensional Kirchhoff migration. Migrations of full wave acoustic synthetics having the same limitations in geometric coverage as the field survey demonstrate the utility of this back projection process for imaging. The migrated depth sections show the locations of the major physical boundaries of the San Andreas fault zone. The zone is bounded on the southwest by a near-vertical fault juxtaposing a Tertiary sedimentary section against uplifted crystalline rocks of the fault zone block. On the northeast, the fault zone is bounded by a fault dipping into the San Andreas, which includes slices of serpentinized ultramafics, intersecting it at 3 km depth. These interpretations can be made despite complications introduced by lateral heterogeneities.
In 1985 the Calcrust consortium designed a survey in the eastern Mojave desert to image structures in both the shallow and the deep crust. Preliminary field experiments showed that the major geophysical acquisition problem to be solved was the poor penetration of seismic energy through a low-velocity surface layer. Its effects could be mitigated through special acquisition and processing techniques. Data obtained from industry showed that quality data could be obtained from areas having a deeper, older sedimentary cover, causing a re-definition of the geologic objectives. Long offset stationary arrays were designed to provide reversed, wider angle coverage of the deep crust over parts of the survey. The preliminary field tests and constant monitoring of data quality and parameter adjustment allowed 108 km of excellent crustal data to be obtained.
This dataset, along with two others from the central and western Mojave, was used to constrain rock properties and the physical condition of the crust. The multi-offset analysis proceeded in two steps. First, an increase in reflection peak frequency with offset is indicative of a thinly layered reflector. The thickness and velocity contrast of the layering can be calculated from the spectral dispersion, to discriminate between structures resulting from broad scale or local effects. Second, the amplitude effects at different offsets of P-P scattering from weak elastic heterogeneities indicate whether the signs of the changes in density, rigidity, and Lame's parameter at the reflector agree or are opposed. The effects of reflection generation and propagation in a heterogeneous, anisotropic crust were contained by the design of the experiment and the simplicity of the observed amplitude and frequency trends. Multi-offset spectra and amplitude trend stacks of the three Mojave Desert datasets suggest that the most reflective structures in the middle crust are strong Poisson's ratio (σ) contrasts. Porous zones or the juxtaposition of units of mutually distant origin are indicated. Heterogeneities in σ increase towards the top of a basal crustal zone at ~22 km depth. The transition to the basal zone and to the mantle include increases in σ. The Moho itself includes ~400 m layering having a velocity higher than that of the uppermost mantle. The Moho maintains the same configuration across the Mojave despite 5 km of crustal thinning near the Colorado River. This indicates that Miocene extension there either thinned just the basal zone, or that the basal zone developed regionally after the extensional event.
Resumo:
The epidemic of HIV/AIDS in the United States is constantly changing and evolving, starting from patient zero to now an estimated 650,000 to 900,000 Americans infected. The nature and course of HIV changed dramatically with the introduction of antiretrovirals. This discourse examines many different facets of HIV from the beginning where there wasn't any treatment for HIV until the present era of highly active antiretroviral therapy (HAART). By utilizing statistical analysis of clinical data, this paper examines where we were, where we are and projections as to where treatment of HIV/AIDS is headed.
Chapter Two describes the datasets that were used for the analyses. The primary database utilized was collected by myself from an outpatient HIV clinic. The data included dates from 1984 until the present. The second database was from the Multicenter AIDS Cohort Study (MACS) public dataset. The data from the MACS cover the time between 1984 and October 1992. Comparisons are made between both datasets.
Chapter Three discusses where we were. Before the first anti-HIV drugs (called antiretrovirals) were approved, there was no treatment to slow the progression of HIV. The first generation of antiretrovirals, reverse transcriptase inhibitors such as AZT (zidovudine), DDI (didanosine), DDC (zalcitabine), and D4T (stavudine) provided the first treatment for HIV. The first clinical trials showed that these antiretrovirals had a significant impact on increasing patient survival. The trials also showed that patients on these drugs had increased CD4+ T cell counts. Chapter Three examines the distributions of CD4 T cell counts. The results show that the estimated distributions of CD4 T cell counts are distinctly non-Gaussian. Thus distributional assumptions regarding CD4 T cell counts must be taken, into account when performing analyses with this marker. The results also show the estimated CD4 T cell distributions for each disease stage: asymptomatic, symptomatic and AIDS are non-Gaussian. Interestingly, the distribution of CD4 T cell counts for the asymptomatic period is significantly below that of the CD4 T cell distribution for the uninfected population suggesting that even in patients with no outward symptoms of HIV infection, there exists high levels of immunosuppression.
Chapter Four discusses where we are at present. HIV quickly grew resistant to reverse transcriptase inhibitors which were given sequentially as mono or dual therapy. As resistance grew, the positive effects of the reverse transcriptase inhibitors on CD4 T cell counts and survival dissipated. As the old era faded a new era characterized by a new class of drugs and new technology changed the way that we treat HIV-infected patients. Viral load assays were able to quantify the levels of HIV RNA in the blood. By quantifying the viral load, one now had a faster, more direct way to test antiretroviral regimen efficacy. Protease inhibitors, which attacked a different region of HIV than reverse transcriptase inhibitors, when used in combination with other antiretroviral agents were found to dramatically and significantly reduce the HIV RNA levels in the blood. Patients also experienced significant increases in CD4 T cell counts. For the first time in the epidemic, there was hope. It was hypothesized that with HAART, viral levels could be kept so low that the immune system as measured by CD4 T cell counts would be able to recover. If these viral levels could be kept low enough, it would be possible for the immune system to eradicate the virus. The hypothesis of immune reconstitution, that is bringing CD4 T cell counts up to levels seen in uninfected patients, is tested in Chapter Four. It was found that for these patients, there was not enough of a CD4 T cell increase to be consistent with the hypothesis of immune reconstitution.
In Chapter Five, the effectiveness of long-term HAART is analyzed. Survival analysis was conducted on 213 patients on long-term HAART. The primary endpoint was presence of an AIDS defining illness. A high level of clinical failure, or progression to an endpoint, was found.
Chapter Six yields insights into where we are going. New technology such as viral genotypic testing, that looks at the genetic structure of HIV and determines where mutations have occurred, has shown that HIV is capable of producing resistance mutations that confer multiple drug resistance. This section looks at resistance issues and speculates, ceterus parabis, where the state of HIV is going. This section first addresses viral genotype and the correlates of viral load and disease progression. A second analysis looks at patients who have failed their primary attempts at HAART and subsequent salvage therapy. It was found that salvage regimens, efforts to control viral replication through the administration of different combinations of antiretrovirals, were not effective in 90 percent of the population in controlling viral replication. Thus, primary attempts at therapy offer the best change of viral suppression and delay of disease progression. Documentation of transmission of drug-resistant virus suggests that the public health crisis of HIV is far from over. Drug resistant HIV can sustain the epidemic and hamper our efforts to treat HIV infection. The data presented suggest that the decrease in the morbidity and mortality due to HIV/AIDS is transient. Deaths due to HIV will increase and public health officials must prepare for this eventuality unless new treatments become available. These results also underscore the importance of the vaccine effort.
The final chapter looks at the economic issues related to HIV. The direct and indirect costs of treating HIV/AIDS are very high. For the first time in the epidemic, there exists treatment that can actually slow disease progression. The direct costs for HAART are estimated. It is estimated that the direct lifetime costs for treating each HIV infected patient with HAART is between $353,000 to $598,000 depending on how long HAART prolongs life. If one looks at the incremental cost per year of life saved it is only $101,000. This is comparable with the incremental costs per year of life saved from coronary artery bypass surgery.
Policy makers need to be aware that although HAART can delay disease progression, it is not a cure and HIV is not over. The results presented here suggest that the decreases in the morbidity and mortality due to HIV are transient. Policymakers need to be prepared for the eventual increase in AIDS incidence and mortality. Costs associated with HIV/AIDS are also projected to increase. The cost savings seen recently have been from the dramatic decreases in the incidence of AIDS defining opportunistic infections. As patients who have been on HAART the longest start to progress to AIDS, policymakers and insurance companies will find that the cost of treating HIV/AIDS will increase.
Resumo:
The main focus of this thesis is the use of high-throughput sequencing technologies in functional genomics (in particular in the form of ChIP-seq, chromatin immunoprecipitation coupled with sequencing, and RNA-seq) and the study of the structure and regulation of transcriptomes. Some parts of it are of a more methodological nature while others describe the application of these functional genomic tools to address various biological problems. A significant part of the research presented here was conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project.
The first part of the thesis focuses on the structure and diversity of the human transcriptome. Chapter 1 contains an analysis of the diversity of the human polyadenylated transcriptome based on RNA-seq data generated for the ENCODE Project. Chapter 2 presents a simulation-based examination of the performance of some of the most popular computational tools used to assemble and quantify transcriptomes. Chapter 3 includes a study of variation in gene expression, alternative splicing and allelic expression bias on the single-cell level and on a genome-wide scale in human lymphoblastoid cells; it also brings forward a number of critical to the practice of single-cell RNA-seq measurements methodological considerations.
The second part presents several studies applying functional genomic tools to the study of the regulatory biology of organellar genomes, primarily in mammals but also in plants. Chapter 5 contains an analysis of the occupancy of the human mitochondrial genome by TFAM, an important structural and regulatory protein in mitochondria, using ChIP-seq. In Chapter 6, the mitochondrial DNA occupancy of the TFB2M transcriptional regulator, the MTERF termination factor, and the mitochondrial RNA and DNA polymerases is characterized. Chapter 7 consists of an investigation into the curious phenomenon of the physical association of nuclear transcription factors with mitochondrial DNA, based on the diverse collections of transcription factor ChIP-seq datasets generated by the ENCODE, mouseENCODE and modENCODE consortia. In Chapter 8 this line of research is further extended to existing publicly available ChIP-seq datasets in plants and their mitochondrial and plastid genomes.
The third part is dedicated to the analytical and experimental practice of ChIP-seq. As part of the ENCODE Project, a set of metrics for assessing the quality of ChIP-seq experiments was developed, and the results of this activity are presented in Chapter 9. These metrics were later used to carry out a global analysis of ChIP-seq quality in the published literature (Chapter 10). In Chapter 11, the development and initial application of an automated robotic ChIP-seq (in which these metrics also played a major role) is presented.
The fourth part presents the results of some additional projects the author has been involved in, including the study of the role of the Piwi protein in the transcriptional regulation of transposon expression in Drosophila (Chapter 12), and the use of single-cell RNA-seq to characterize the heterogeneity of gene expression during cellular reprogramming (Chapter 13).
The last part of the thesis provides a review of the results of the ENCODE Project and the interpretation of the complexity of the biochemical activity exhibited by mammalian genomes that they have revealed (Chapters 15 and 16), an overview of the expected in the near future technical developments and their impact on the field of functional genomics (Chapter 14), and a discussion of some so far insufficiently explored research areas, the future study of which will, in the opinion of the author, provide deep insights into many fundamental but not yet completely answered questions about the transcriptional biology of eukaryotes and its regulation.
Resumo:
This thesis addresses a series of topics related to the question of how people find the foreground objects from complex scenes. With both computer vision modeling, as well as psychophysical analyses, we explore the computational principles for low- and mid-level vision.
We first explore the computational methods of generating saliency maps from images and image sequences. We propose an extremely fast algorithm called Image Signature that detects the locations in the image that attract human eye gazes. With a series of experimental validations based on human behavioral data collected from various psychophysical experiments, we conclude that the Image Signature and its spatial-temporal extension, the Phase Discrepancy, are among the most accurate algorithms for saliency detection under various conditions.
In the second part, we bridge the gap between fixation prediction and salient object segmentation with two efforts. First, we propose a new dataset that contains both fixation and object segmentation information. By simultaneously presenting the two types of human data in the same dataset, we are able to analyze their intrinsic connection, as well as understanding the drawbacks of today’s “standard” but inappropriately labeled salient object segmentation dataset. Second, we also propose an algorithm of salient object segmentation. Based on our novel discoveries on the connections of fixation data and salient object segmentation data, our model significantly outperforms all existing models on all 3 datasets with large margins.
In the third part of the thesis, we discuss topics around the human factors of boundary analysis. Closely related to salient object segmentation, boundary analysis focuses on delimiting the local contours of an object. We identify the potential pitfalls of algorithm evaluation for the problem of boundary detection. Our analysis indicates that today’s popular boundary detection datasets contain significant level of noise, which may severely influence the benchmarking results. To give further insights on the labeling process, we propose a model to characterize the principles of the human factors during the labeling process.
The analyses reported in this thesis offer new perspectives to a series of interrelating issues in low- and mid-level vision. It gives warning signs to some of today’s “standard” procedures, while proposing new directions to encourage future research.
Resumo:
The Madden-Julian Oscillation (MJO) is a pattern of intense rainfall and associated planetary-scale circulations in the tropical atmosphere, with a recurrence interval of 30-90 days. Although the MJO was first discovered 40 years ago, it is still a challenge to simulate the MJO in general circulation models (GCMs), and even with simple models it is difficult to agree on the basic mechanisms. This deficiency is mainly due to our poor understanding of moist convection—deep cumulus clouds and thunderstorms, which occur at scales that are smaller than the resolution elements of the GCMs. Moist convection is the most important mechanism for transporting energy from the ocean to the atmosphere. Success in simulating the MJO will improve our understanding of moist convection and thereby improve weather and climate forecasting.
We address this fundamental subject by analyzing observational datasets, constructing a hierarchy of numerical models, and developing theories. Parameters of the models are taken from observation, and the simulated MJO fits the data without further adjustments. The major findings include: 1) the MJO may be an ensemble of convection events linked together by small-scale high-frequency inertia-gravity waves; 2) the eastward propagation of the MJO is determined by the difference between the eastward and westward phase speeds of the waves; 3) the planetary scale of the MJO is the length over which temperature anomalies can be effectively smoothed by gravity waves; 4) the strength of the MJO increases with the typical strength of convection, which increases in a warming climate; 5) the horizontal scale of the MJO increases with the spatial frequency of convection; and 6) triggered convection, where potential energy accumulates until a threshold is reached, is important in simulating the MJO. Our findings challenge previous paradigms, which consider the MJO as a large-scale mode, and point to ways for improving the climate models.
Resumo:
Mitochondria contain a 16.6 kb circular genome encoding 13 proteins as well as mitochondrial tRNAs and rRNAs. Copies of the genome are organized into nucleoids containing both DNA and proteins, including the machinery required for mtDNA replication and transcription. Although mtDNA integrity is essential for cellular and organismal viability, regulation of proliferation of the mitochondrial genome is poorly understood. To elucidate the mechanisms behind this, we chose to study the interplay between mtDNA copy number and the proteins involved in mitochondrial fusion, another required function in cells. Strikingly, we found that mouse embryonic fibroblasts lacking fusion also had a mtDNA copy number deficit. To understand this phenomenon further, we analyzed the binding of mitochondrial transcription factor A, whose role in transcription, replication, and packaging of the genome is well-established and crucial for cellular maintenance. Using ChIP-seq, we were able to detect largely uniform, non-specific binding across the genome, with no occupancy in the known specific binding sites in the regulatory region. We did detect a single binding site directly upstream of a known origin of replication, suggesting that TFAM may play a direct role in replication. Finally, although TFAM has been previously shown to localize to the nuclear genome, we found no evidence for such binding sites in our system.
To further understand the regulation of mtDNA by other proteins, we analyzed publicly available ChIP-seq datasets from ENCODE, modENCODE, and mouseENCODE for evidence of nuclear transcription factor binding to the mitochondrial genome. We identified eight human transcription factors and three mouse transcription factors that demonstrated binding events with the classical strand asymmetrical morphology of classical binding sites. ChIP-seq is a powerful tool for understanding the interactions between proteins and the mitochondrial genome, and future studies promise to further the understanding of how mtDNA is regulated within the nucleoid.
Resumo:
The objective of this thesis is to develop a framework to conduct velocity resolved - scalar modeled (VR-SM) simulations, which will enable accurate simulations at higher Reynolds and Schmidt (Sc) numbers than are currently feasible. The framework established will serve as a first step to enable future simulation studies for practical applications. To achieve this goal, in-depth analyses of the physical, numerical, and modeling aspects related to Sc>>1 are presented, specifically when modeling in the viscous-convective subrange. Transport characteristics are scrutinized by examining scalar-velocity Fourier mode interactions in Direct Numerical Simulation (DNS) datasets and suggest that scalar modes in the viscous-convective subrange do not directly affect large-scale transport for high Sc. Further observations confirm that discretization errors inherent in numerical schemes can be sufficiently large to wipe out any meaningful contribution from subfilter models. This provides strong incentive to develop more effective numerical schemes to support high Sc simulations. To lower numerical dissipation while maintaining physically and mathematically appropriate scalar bounds during the convection step, a novel method of enforcing bounds is formulated, specifically for use with cubic Hermite polynomials. Boundedness of the scalar being transported is effected by applying derivative limiting techniques, and physically plausible single sub-cell extrema are allowed to exist to help minimize numerical dissipation. The proposed bounding algorithm results in significant performance gain in DNS of turbulent mixing layers and of homogeneous isotropic turbulence. Next, the combined physical/mathematical behavior of the subfilter scalar-flux vector is analyzed in homogeneous isotropic turbulence, by examining vector orientation in the strain-rate eigenframe. The results indicate no discernible dependence on the modeled scalar field, and lead to the identification of the tensor-diffusivity model as a good representation of the subfilter flux. Velocity resolved - scalar modeled simulations of homogeneous isotropic turbulence are conducted to confirm the behavior theorized in these a priori analyses, and suggest that the tensor-diffusivity model is ideal for use in the viscous-convective subrange. Simulations of a turbulent mixing layer are also discussed, with the partial objective of analyzing Schmidt number dependence of a variety of scalar statistics. Large-scale statistics are confirmed to be relatively independent of the Schmidt number for Sc>>1, which is explained by the dominance of subfilter dissipation over resolved molecular dissipation in the simulations. Overall, the VR-SM framework presented is quite effective in predicting large-scale transport characteristics of high Schmidt number scalars, however, it is determined that prediction of subfilter quantities would entail additional modeling intended specifically for this purpose. The VR-SM simulations presented in this thesis provide us with the opportunity to overlap with experimental studies, while at the same time creating an assortment of baseline datasets for future validation of LES models, thereby satisfying the objectives outlined for this work.
Resumo:
In the first part of the thesis we explore three fundamental questions that arise naturally when we conceive a machine learning scenario where the training and test distributions can differ. Contrary to conventional wisdom, we show that in fact mismatched training and test distribution can yield better out-of-sample performance. This optimal performance can be obtained by training with the dual distribution. This optimal training distribution depends on the test distribution set by the problem, but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, as well as how to approximate it in a practical scenario. Benefits of using this distribution are exemplified in both synthetic and real data sets.
In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the use of weights regarding its effect on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm that determines if, for a given set of weights, the out-of-sample performance will improve or not in a practical setting. This is necessary as the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples.
Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance in a large real dataset such as the Netflix dataset. Their computational complexity is the main reason for their advantage over previous algorithms proposed in the covariate shift literature.
In the second part of the thesis we apply Machine Learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, and we develop a system that allows analyzing behavior in videos of animals, with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), allows detecting movemes, actions, and stories from time series describing the position of animals in videos. The method summarizes the data, as well as it provides biologists with a mathematical tool to test new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate groups of animals, for example, according to their genetic line.