Abstract:
Seismic reflection methods have been extensively used to probe the Earth's crust and suggest the nature of its formative processes. The analysis of multi-offset seismic reflection data extends the technique from a reconnaissance method to a powerful scientific tool that can be applied to test specific hypotheses. The treatment of reflections at multiple offsets becomes tractable if the assumptions of high-frequency rays are valid for the problem being considered. Their validity can be tested by applying the methods of analysis to full wave synthetics.
Three studies illustrate the application of these principles to investigations of the nature of the crust in southern California. A survey shot by the COCORP consortium in 1977 across the San Andreas fault near Parkfield revealed events in the record sections whose arrival time decreased with offset. The reflectors generating these events are imaged using a multi-offset three-dimensional Kirchhoff migration. Migrations of full wave acoustic synthetics having the same limitations in geometric coverage as the field survey demonstrate the utility of this back-projection process for imaging. The migrated depth sections show the locations of the major physical boundaries of the San Andreas fault zone. The zone is bounded on the southwest by a near-vertical fault juxtaposing a Tertiary sedimentary section against uplifted crystalline rocks of the fault zone block. On the northeast, the fault zone is bounded by a fault that dips into the San Andreas and intersects it at 3 km depth; this boundary includes slices of serpentinized ultramafics. These interpretations can be made despite complications introduced by lateral heterogeneities.
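The imaging step described above is, at its core, a back projection: each recorded amplitude is spread over all image points whose source-to-point-to-receiver travel time matches the recording time. The following is a minimal constant-velocity 2D diffraction-stack sketch of that idea; function names, geometry, and parameters are illustrative, and it is not the thesis's multi-offset three-dimensional implementation.

```python
import numpy as np

def kirchhoff_migrate(traces, src_x, rec_x, dt, velocity, image_x, image_z):
    """Constant-velocity 2D diffraction-stack (Kirchhoff-style) migration.

    traces : (n_traces, n_samples) array of recorded amplitudes
    src_x, rec_x : source and receiver x-positions for each trace (m)
    dt : sample interval (s); velocity : assumed constant medium velocity (m/s)
    image_x, image_z : 1D arrays defining the image grid (m)
    """
    n_traces, n_samples = traces.shape
    image = np.zeros((len(image_z), len(image_x)))
    for iz, z in enumerate(image_z):
        for ix, x in enumerate(image_x):
            # Two-way time: source -> image point -> receiver
            t = (np.hypot(x - src_x, z) + np.hypot(x - rec_x, z)) / velocity
            isamp = np.round(t / dt).astype(int)
            valid = isamp < n_samples
            # Sum (back-project) the recorded amplitudes along the diffraction curve
            image[iz, ix] = traces[np.arange(n_traces)[valid], isamp[valid]].sum()
    return image

if __name__ == "__main__":
    # Purely illustrative geometry: 48 traces at a fixed 100 m offset
    traces = np.zeros((48, 1000))
    src_x = np.linspace(0.0, 4700.0, 48)
    rec_x = src_x + 100.0
    img = kirchhoff_migrate(traces, src_x, rec_x, dt=0.004, velocity=3000.0,
                            image_x=np.linspace(0, 5000, 101),
                            image_z=np.linspace(100, 5000, 100))
```

In the field problem, the constant-velocity travel times would be replaced by travel-time tables computed for the estimated velocity model, and the image grid would be three-dimensional.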
In 1985 the Calcrust consortium designed a survey in the eastern Mojave desert to image structures in both the shallow and the deep crust. Preliminary field experiments showed that the major geophysical acquisition problem to be solved was the poor penetration of seismic energy through a low-velocity surface layer. Its effects could be mitigated through special acquisition and processing techniques. Seismic data obtained from industry showed that high-quality records could be acquired in areas having a deeper, older sedimentary cover, prompting a redefinition of the geologic objectives. Long-offset stationary arrays were designed to provide reversed, wider-angle coverage of the deep crust over parts of the survey. The preliminary field tests, together with constant monitoring of data quality and adjustment of acquisition parameters, allowed 108 km of excellent crustal data to be obtained.
This dataset, along with two others from the central and western Mojave, was used to constrain rock properties and the physical condition of the crust. The multi-offset analysis proceeded in two steps. First, an increase in reflection peak frequency with offset is indicative of a thinly layered reflector. The thickness and velocity contrast of the layering can be calculated from the spectral dispersion, allowing discrimination between structures resulting from broad-scale and local effects. Second, the amplitude effects at different offsets of P-P scattering from weak elastic heterogeneities indicate whether the signs of the changes in density, rigidity, and Lamé's parameter at the reflector agree or are opposed. The effects of reflection generation and propagation in a heterogeneous, anisotropic crust were contained by the design of the experiment and the simplicity of the observed amplitude and frequency trends. Multi-offset spectra and amplitude trend stacks of the three Mojave Desert datasets suggest that the most reflective structures in the middle crust are strong Poisson's ratio (σ) contrasts. Porous zones or the juxtaposition of units of mutually distant origin are indicated. Heterogeneities in σ increase towards the top of a basal crustal zone at ~22 km depth. The transitions to the basal zone and to the mantle both include increases in σ. The Moho itself includes ~400 m of layering having a velocity higher than that of the uppermost mantle. The Moho maintains the same configuration across the Mojave despite 5 km of crustal thinning near the Colorado River. This indicates that Miocene extension there either thinned just the basal zone, or that the basal zone developed regionally after the extensional event.
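The link between layering and spectral peak can be illustrated with the textbook thin-bed interference relation: two opposite-polarity reflections separated by a two-way time ΔT have an amplitude spectrum proportional to |sin(π f ΔT)|, which peaks near f = 1/(2ΔT). The sketch below is illustrative only and is not the spectral-dispersion inversion used in the thesis; it simply shows how a smaller apparent ΔT pushes the spectral peak to higher frequency.

```python
import numpy as np

def thin_bed_spectrum(freqs, delta_t, r=0.1):
    """Amplitude spectrum of two opposite-polarity reflections separated by
    two-way time delta_t (Widess-style thin-bed interference)."""
    return 2.0 * abs(r) * np.abs(np.sin(np.pi * freqs * delta_t))

freqs = np.linspace(1.0, 60.0, 500)            # Hz
for delta_t in (0.020, 0.014, 0.010):          # apparent two-way thickness (s)
    spec = thin_bed_spectrum(freqs, delta_t)
    f_peak = freqs[np.argmax(spec)]
    # Peak sits near 1 / (2 * delta_t): thinner apparent layers peak at higher frequency
    print(f"delta_t = {delta_t*1e3:4.0f} ms -> spectral peak near {f_peak:5.1f} Hz")
```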
Abstract:
The epidemic of HIV/AIDS in the United States is constantly changing and evolving, from patient zero to an estimated 650,000 to 900,000 Americans infected today. The nature and course of HIV changed dramatically with the introduction of antiretrovirals. This discourse examines many different facets of HIV, from the early years, when there was no treatment, to the present era of highly active antiretroviral therapy (HAART). Using statistical analysis of clinical data, this paper examines where we were, where we are, and projections as to where treatment of HIV/AIDS is headed.
Chapter Two describes the datasets that were used for the analyses. The primary database was collected by me from an outpatient HIV clinic and spans 1984 to the present. The second database is the Multicenter AIDS Cohort Study (MACS) public dataset, which covers the period from 1984 to October 1992. Comparisons are made between the two datasets.
Chapter Three discusses where we were. Before the first anti-HIV drugs (called antiretrovirals) were approved, there was no treatment to slow the progression of HIV. The first generation of antiretrovirals, reverse transcriptase inhibitors such as AZT (zidovudine), DDI (didanosine), DDC (zalcitabine), and D4T (stavudine), provided the first treatment for HIV. The first clinical trials showed that these antiretrovirals had a significant impact on increasing patient survival. The trials also showed that patients on these drugs had increased CD4+ T cell counts. Chapter Three examines the distributions of CD4 T cell counts. The results show that the estimated distributions of CD4 T cell counts are distinctly non-Gaussian. Thus distributional assumptions regarding CD4 T cell counts must be taken into account when performing analyses with this marker. The results also show that the estimated CD4 T cell distributions for each disease stage (asymptomatic, symptomatic, and AIDS) are non-Gaussian. Interestingly, the distribution of CD4 T cell counts for the asymptomatic period is significantly below the CD4 T cell distribution for the uninfected population, suggesting that even in patients with no outward symptoms of HIV infection there exist high levels of immunosuppression.
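Whether a Gaussian assumption is tenable for a marker such as CD4 counts can be checked directly before model fitting. Below is a minimal sketch using synthetic, right-skewed counts as a stand-in for the clinic and MACS data; the distributional shape is assumed purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical CD4+ T cell counts (cells/uL); real clinic or MACS data would go here.
# Counts are bounded below by zero and right-skewed, so a log-normal draw is a
# plausible stand-in for illustration only.
cd4 = rng.lognormal(mean=5.8, sigma=0.6, size=500)

stat, p = stats.normaltest(cd4)          # D'Agostino-Pearson normality test
print(f"normality test: stat={stat:.1f}, p={p:.2e}")
if p < 0.05:
    print("reject the Gaussian assumption; use rank-based or transformed analyses")

# A common remedy: repeat the analysis on log-transformed counts
stat_log, p_log = stats.normaltest(np.log(cd4))
print(f"after log transform: stat={stat_log:.1f}, p={p_log:.2e}")
```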
Chapter Four discusses where we are at present. HIV quickly grew resistant to reverse transcriptase inhibitors, which were given sequentially as mono or dual therapy. As resistance grew, the positive effects of the reverse transcriptase inhibitors on CD4 T cell counts and survival dissipated. As the old era faded, a new era, characterized by a new class of drugs and new technology, changed the way we treat HIV-infected patients. Viral load assays were able to quantify the levels of HIV RNA in the blood. By quantifying the viral load, one now had a faster, more direct way to test the efficacy of an antiretroviral regimen. Protease inhibitors, which attack a different region of HIV than reverse transcriptase inhibitors, were found, when used in combination with other antiretroviral agents, to dramatically and significantly reduce the HIV RNA levels in the blood. Patients also experienced significant increases in CD4 T cell counts. For the first time in the epidemic, there was hope. It was hypothesized that with HAART, viral levels could be kept so low that the immune system, as measured by CD4 T cell counts, would be able to recover. If these viral levels could be kept low enough, it would be possible for the immune system to eradicate the virus. The hypothesis of immune reconstitution, that is, bringing CD4 T cell counts up to levels seen in uninfected patients, is tested in Chapter Four. It was found that for these patients there was not enough of a CD4 T cell increase to be consistent with the hypothesis of immune reconstitution.
In Chapter Five, the effectiveness of long-term HAART is analyzed. Survival analysis was conducted on 213 patients on long-term HAART. The primary endpoint was presence of an AIDS defining illness. A high level of clinical failure, or progression to an endpoint, was found.
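The survival analysis described here is naturally summarized by a Kaplan-Meier curve of time to an AIDS-defining illness. A minimal sketch using the lifelines package, with simulated durations standing in for the 213-patient clinic records (all numbers below are hypothetical):

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(1)

# Hypothetical long-term HAART cohort: time (months) to an AIDS-defining illness,
# with administrative censoring at the end of each patient's follow-up.
n = 213
time_to_event = rng.exponential(scale=60.0, size=n)
follow_up = rng.uniform(12.0, 72.0, size=n)
durations = np.minimum(time_to_event, follow_up)
observed = (time_to_event <= follow_up).astype(int)   # 1 = clinical failure observed

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed, label="long-term HAART")
print(kmf.survival_function_.tail())
print("median time to AIDS-defining illness (months):", kmf.median_survival_time_)
```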
Chapter Six yields insights into where we are going. New technology such as viral genotypic testing, which looks at the genetic structure of HIV and determines where mutations have occurred, has shown that HIV is capable of producing resistance mutations that confer multiple drug resistance. This section looks at resistance issues and speculates, ceteris paribus, on where the state of HIV is going. The section first addresses viral genotype and the correlates of viral load and disease progression. A second analysis looks at patients who have failed their primary attempts at HAART and subsequent salvage therapy. It was found that salvage regimens, efforts to control viral replication through the administration of different combinations of antiretrovirals, were not effective in controlling viral replication in 90 percent of the population. Thus, primary attempts at therapy offer the best chance of viral suppression and delay of disease progression. Documentation of transmission of drug-resistant virus suggests that the public health crisis of HIV is far from over. Drug-resistant HIV can sustain the epidemic and hamper our efforts to treat HIV infection. The data presented suggest that the decrease in the morbidity and mortality due to HIV/AIDS is transient. Deaths due to HIV will increase, and public health officials must prepare for this eventuality unless new treatments become available. These results also underscore the importance of the vaccine effort.
The final chapter looks at the economic issues related to HIV. The direct and indirect costs of treating HIV/AIDS are very high. For the first time in the epidemic, there exists treatment that can actually slow disease progression. The direct costs for HAART are estimated. The direct lifetime costs for treating each HIV-infected patient with HAART are estimated to be between $353,000 and $598,000, depending on how long HAART prolongs life. The incremental cost per year of life saved, however, is only $101,000, which is comparable to the incremental cost per year of life saved by coronary artery bypass surgery.
Policymakers need to be aware that although HAART can delay disease progression, it is not a cure and the HIV epidemic is not over. The results presented here suggest that the decreases in the morbidity and mortality due to HIV are transient. Policymakers need to be prepared for the eventual increase in AIDS incidence and mortality. Costs associated with HIV/AIDS are also projected to increase. The cost savings seen recently have come from the dramatic decreases in the incidence of AIDS-defining opportunistic infections. As the patients who have been on HAART the longest begin to progress to AIDS, policymakers and insurance companies will find that the cost of treating HIV/AIDS will increase.
Abstract:
The main focus of this thesis is the use of high-throughput sequencing technologies in functional genomics (in particular in the form of ChIP-seq, chromatin immunoprecipitation coupled with sequencing, and RNA-seq) and the study of the structure and regulation of transcriptomes. Some parts of it are of a more methodological nature while others describe the application of these functional genomic tools to address various biological problems. A significant part of the research presented here was conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project.
The first part of the thesis focuses on the structure and diversity of the human transcriptome. Chapter 1 contains an analysis of the diversity of the human polyadenylated transcriptome based on RNA-seq data generated for the ENCODE Project. Chapter 2 presents a simulation-based examination of the performance of some of the most popular computational tools used to assemble and quantify transcriptomes. Chapter 3 includes a study of variation in gene expression, alternative splicing, and allelic expression bias at the single-cell level and on a genome-wide scale in human lymphoblastoid cells; it also brings forward a number of methodological considerations critical to the practice of single-cell RNA-seq measurements.
The second part presents several studies applying functional genomic tools to the study of the regulatory biology of organellar genomes, primarily in mammals but also in plants. Chapter 5 contains an analysis of the occupancy of the human mitochondrial genome by TFAM, an important structural and regulatory protein in mitochondria, using ChIP-seq. In Chapter 6, the mitochondrial DNA occupancy of the TFB2M transcriptional regulator, the MTERF termination factor, and the mitochondrial RNA and DNA polymerases is characterized. Chapter 7 consists of an investigation into the curious phenomenon of the physical association of nuclear transcription factors with mitochondrial DNA, based on the diverse collections of transcription factor ChIP-seq datasets generated by the ENCODE, mouseENCODE and modENCODE consortia. In Chapter 8 this line of research is further extended to existing publicly available ChIP-seq datasets in plants and their mitochondrial and plastid genomes.
The third part is dedicated to the analytical and experimental practice of ChIP-seq. As part of the ENCODE Project, a set of metrics for assessing the quality of ChIP-seq experiments was developed, and the results of this activity are presented in Chapter 9. These metrics were later used to carry out a global analysis of ChIP-seq quality in the published literature (Chapter 10). In Chapter 11, the development and initial application of an automated robotic ChIP-seq (in which these metrics also played a major role) is presented.
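ChIP-seq quality metrics of the kind referred to here typically quantify signal enrichment; one widely used example is the fraction of reads falling inside called peak regions (FRiP). The following is a minimal pure-Python sketch of that idea with toy reads and peaks; it is not the consortium's pipeline, and the thresholds one would apply to the score are not shown.

```python
import bisect

def frip(read_positions, peaks):
    """Fraction of Reads in Peaks (FRiP)-style score.

    read_positions : iterable of (chrom, pos) for mapped read 5' ends
    peaks          : dict chrom -> sorted list of (start, end) peak intervals
    """
    starts = {c: [s for s, _ in ivs] for c, ivs in peaks.items()}
    in_peaks = total = 0
    for chrom, pos in read_positions:
        total += 1
        ivs = peaks.get(chrom)
        if not ivs:
            continue
        i = bisect.bisect_right(starts[chrom], pos) - 1
        if i >= 0 and ivs[i][0] <= pos < ivs[i][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0

# Toy example: 4 of 5 reads fall inside called peaks -> FRiP = 0.8
reads = [("chr1", 105), ("chr1", 450), ("chr1", 460), ("chr2", 10), ("chr2", 999)]
peaks = {"chr1": [(100, 200), (400, 500)], "chr2": [(950, 1050)]}
print("FRiP =", frip(reads, peaks))
```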
The fourth part presents the results of some additional projects the author has been involved in, including the study of the role of the Piwi protein in the transcriptional regulation of transposon expression in Drosophila (Chapter 12), and the use of single-cell RNA-seq to characterize the heterogeneity of gene expression during cellular reprogramming (Chapter 13).
The last part of the thesis provides a review of the results of the ENCODE Project and the interpretation of the complexity of the biochemical activity exhibited by mammalian genomes that they have revealed (Chapters 15 and 16), an overview of technical developments expected in the near future and their impact on the field of functional genomics (Chapter 14), and a discussion of some so far insufficiently explored research areas whose future study will, in the opinion of the author, provide deep insights into many fundamental but not yet completely answered questions about the transcriptional biology of eukaryotes and its regulation.
Abstract:
This thesis addresses a series of topics related to the question of how people find foreground objects in complex scenes. Using both computer vision modeling and psychophysical analyses, we explore the computational principles of low- and mid-level vision.
We first explore computational methods of generating saliency maps from images and image sequences. We propose an extremely fast algorithm called the Image Signature that detects the locations in an image that attract human eye gaze. Through a series of validations against human behavioral data collected in various psychophysical experiments, we conclude that the Image Signature and its spatio-temporal extension, the Phase Discrepancy, are among the most accurate algorithms for saliency detection under various conditions.
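As published, the Image Signature of an image is the sign of its DCT, and the saliency map is a smoothed, squared inverse DCT of that signature. A minimal grayscale sketch of that construction follows; the single-channel treatment and smoothing width are illustrative choices, not the authors' reference code.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter

def image_signature_saliency(img, sigma=4.0):
    """Grayscale Image-Signature-style saliency map.

    img : 2D float array (a single channel, already resized to e.g. 64x48).
    The signature is sign(DCT(img)); the saliency map is the smoothed, squared
    inverse DCT of that signature.
    """
    signature = np.sign(dctn(img, norm="ortho"))
    recon = idctn(signature, norm="ortho")
    sal = gaussian_filter(recon ** 2, sigma=sigma)
    return sal / (sal.max() + 1e-12)

# Toy usage: a bright blob on a noisy background should light up in the map
rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.05, size=(48, 64))
img[20:30, 40:52] += 1.0
sal = image_signature_saliency(img)
print("most salient pixel:", np.unravel_index(np.argmax(sal), sal.shape))
```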
In the second part, we bridge the gap between fixation prediction and salient object segmentation with two efforts. First, we propose a new dataset that contains both fixation and object segmentation information. By simultaneously presenting the two types of human data in the same dataset, we are able to analyze their intrinsic connection, as well as to understand the drawbacks of today's "standard" but inappropriately labeled salient object segmentation dataset. Second, we also propose an algorithm for salient object segmentation. Based on our novel discoveries about the connections between fixation data and salient object segmentation data, our model outperforms all existing models on all three datasets by large margins.
In the third part of the thesis, we discuss topics around the human factors of boundary analysis. Closely related to salient object segmentation, boundary analysis focuses on delimiting the local contours of an object. We identify potential pitfalls in algorithm evaluation for the problem of boundary detection. Our analysis indicates that today's popular boundary detection datasets contain a significant level of noise, which may severely influence benchmarking results. To give further insight, we propose a model that characterizes the human factors at work during the labeling process.
The analyses reported in this thesis offer new perspectives on a series of interrelated issues in low- and mid-level vision. They raise warning signs about some of today's "standard" procedures, while proposing new directions to encourage future research.
Abstract:
The Madden-Julian Oscillation (MJO) is a pattern of intense rainfall and associated planetary-scale circulations in the tropical atmosphere, with a recurrence interval of 30-90 days. Although the MJO was first discovered 40 years ago, it is still a challenge to simulate the MJO in general circulation models (GCMs), and even with simple models it is difficult to agree on the basic mechanisms. This deficiency is mainly due to our poor understanding of moist convection—deep cumulus clouds and thunderstorms, which occur at scales that are smaller than the resolution elements of the GCMs. Moist convection is the most important mechanism for transporting energy from the ocean to the atmosphere. Success in simulating the MJO will improve our understanding of moist convection and thereby improve weather and climate forecasting.
We address this fundamental subject by analyzing observational datasets, constructing a hierarchy of numerical models, and developing theories. Parameters of the models are taken from observation, and the simulated MJO fits the data without further adjustments. The major findings include: 1) the MJO may be an ensemble of convection events linked together by small-scale high-frequency inertia-gravity waves; 2) the eastward propagation of the MJO is determined by the difference between the eastward and westward phase speeds of the waves; 3) the planetary scale of the MJO is the length over which temperature anomalies can be effectively smoothed by gravity waves; 4) the strength of the MJO increases with the typical strength of convection, which increases in a warming climate; 5) the horizontal scale of the MJO increases with the spatial frequency of convection; and 6) triggered convection, where potential energy accumulates until a threshold is reached, is important in simulating the MJO. Our findings challenge previous paradigms, which consider the MJO as a large-scale mode, and point to ways for improving the climate models.
Abstract:
Mitochondria contain a 16.6 kb circular genome encoding 13 proteins as well as mitochondrial tRNAs and rRNAs. Copies of the genome are organized into nucleoids containing both DNA and proteins, including the machinery required for mtDNA replication and transcription. Although mtDNA integrity is essential for cellular and organismal viability, regulation of the proliferation of the mitochondrial genome is poorly understood. To elucidate the mechanisms behind this, we chose to study the interplay between mtDNA copy number and the proteins involved in mitochondrial fusion, another required function in cells. Strikingly, we found that mouse embryonic fibroblasts lacking fusion also had an mtDNA copy number deficit. To understand this phenomenon further, we analyzed the binding of mitochondrial transcription factor A (TFAM), whose role in transcription, replication, and packaging of the genome is well established and crucial for cellular maintenance. Using ChIP-seq, we were able to detect largely uniform, non-specific binding across the genome, with no occupancy at the known specific binding sites in the regulatory region. We did detect a single binding site directly upstream of a known origin of replication, suggesting that TFAM may play a direct role in replication. Finally, although TFAM has previously been shown to localize to the nuclear genome, we found no evidence for such binding sites in our system.
To further understand the regulation of mtDNA by other proteins, we analyzed publicly available ChIP-seq datasets from ENCODE, modENCODE, and mouseENCODE for evidence of nuclear transcription factor binding to the mitochondrial genome. We identified eight human transcription factors and three mouse transcription factors that demonstrated binding events with the strand-asymmetric morphology of classical binding sites. ChIP-seq is a powerful tool for understanding the interactions between proteins and the mitochondrial genome, and future studies promise to further our understanding of how mtDNA is regulated within the nucleoid.
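The strand-asymmetric morphology mentioned above refers to forward-strand and reverse-strand read pileups that are offset from each other by roughly the immunoprecipitated fragment length. A minimal numpy sketch of checking for that signature in a candidate region (toy coverage profiles, not ENCODE data):

```python
import numpy as np

def strand_shift(fwd_cov, rev_cov, max_shift=300):
    """Return the shift (bp) maximizing the correlation between forward- and
    reverse-strand coverage; a genuine binding site shows a clear peak near the
    fragment length, while uniform, non-specific signal does not."""
    best_shift, best_corr = 0, -np.inf
    f = fwd_cov - fwd_cov.mean()
    r = rev_cov - rev_cov.mean()
    for s in range(1, max_shift):
        c = np.dot(f[:-s], r[s:]) / (np.linalg.norm(f[:-s]) * np.linalg.norm(r[s:]) + 1e-12)
        if c > best_corr:
            best_shift, best_corr = s, c
    return best_shift, best_corr

# Toy region: reverse-strand pileup displaced ~150 bp downstream of the forward one
pos = np.arange(2000)
fwd = np.exp(-0.5 * ((pos - 900) / 40.0) ** 2)
rev = np.exp(-0.5 * ((pos - 1050) / 40.0) ** 2)
shift, corr = strand_shift(fwd, rev)
print(f"estimated fragment-length shift ~{shift} bp (corr {corr:.2f})")
```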
Abstract:
The objective of this thesis is to develop a framework to conduct velocity-resolved, scalar-modeled (VR-SM) simulations, which will enable accurate simulations at higher Reynolds and Schmidt (Sc) numbers than are currently feasible. The framework established will serve as a first step to enable future simulation studies for practical applications. To achieve this goal, in-depth analyses of the physical, numerical, and modeling aspects related to Sc>>1 are presented, specifically when modeling in the viscous-convective subrange. Transport characteristics are scrutinized by examining scalar-velocity Fourier mode interactions in Direct Numerical Simulation (DNS) datasets and suggest that scalar modes in the viscous-convective subrange do not directly affect large-scale transport for high Sc. Further observations confirm that discretization errors inherent in numerical schemes can be sufficiently large to wipe out any meaningful contribution from subfilter models. This provides strong incentive to develop more effective numerical schemes to support high-Sc simulations. To lower numerical dissipation while maintaining physically and mathematically appropriate scalar bounds during the convection step, a novel method of enforcing bounds is formulated, specifically for use with cubic Hermite polynomials. Boundedness of the scalar being transported is enforced by applying derivative-limiting techniques, and physically plausible single sub-cell extrema are allowed to exist to help minimize numerical dissipation. The proposed bounding algorithm results in significant performance gains in DNS of turbulent mixing layers and of homogeneous isotropic turbulence. Next, the combined physical/mathematical behavior of the subfilter scalar-flux vector is analyzed in homogeneous isotropic turbulence by examining vector orientation in the strain-rate eigenframe. The results indicate no discernible dependence on the modeled scalar field and lead to the identification of the tensor-diffusivity model as a good representation of the subfilter flux. VR-SM simulations of homogeneous isotropic turbulence are conducted to confirm the behavior theorized in these a priori analyses, and suggest that the tensor-diffusivity model is ideal for use in the viscous-convective subrange. Simulations of a turbulent mixing layer are also discussed, with the partial objective of analyzing the Schmidt number dependence of a variety of scalar statistics. Large-scale statistics are confirmed to be relatively independent of the Schmidt number for Sc>>1, which is explained by the dominance of subfilter dissipation over resolved molecular dissipation in the simulations. Overall, the VR-SM framework presented is quite effective in predicting large-scale transport characteristics of high-Schmidt-number scalars; however, it is determined that prediction of subfilter quantities would entail additional modeling intended specifically for this purpose. The VR-SM simulations presented in this thesis provide us with the opportunity to overlap with experimental studies, while at the same time creating an assortment of baseline datasets for future validation of LES models, thereby satisfying the objectives outlined for this work.
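The bounding idea, in general form, is to limit the derivatives supplied to a cubic Hermite interpolant so that the reconstruction cannot overshoot the surrounding data. The sketch below uses a Fritsch-Carlson-style harmonic-mean limiter as a stand-in; it illustrates the principle only and is not the thesis's algorithm, which additionally admits physically plausible single sub-cell extrema.

```python
import numpy as np

def limited_hermite(x, y, xq):
    """Cubic Hermite interpolation with Fritsch-Carlson-style derivative limiting,
    so the interpolant does not overshoot the data (a simple stand-in for the
    bounded-convection idea, not the thesis's exact scheme)."""
    h = np.diff(x)
    delta = np.diff(y) / h                       # secant slopes
    prod = delta[:-1] * delta[1:]
    ssum = delta[:-1] + delta[1:]
    d = np.zeros_like(y)
    # Harmonic mean of adjacent slopes where they share a sign, zero at extrema
    d[1:-1] = np.where(prod > 0, 2.0 * prod / np.where(ssum == 0.0, 1.0, ssum), 0.0)
    d[0], d[-1] = delta[0], delta[-1]
    i = np.clip(np.searchsorted(x, xq) - 1, 0, len(x) - 2)
    t = (xq - x[i]) / h[i]
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y[i] + h10 * h[i] * d[i] + h01 * y[i + 1] + h11 * h[i] * d[i + 1]

# A step-like scalar profile: limited derivatives avoid the over/undershoot that
# unlimited cubic interpolation would introduce near the jump.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
xq = np.linspace(0.0, 4.0, 41)
vals = limited_hermite(x, y, xq)
print("min/max of interpolant:", vals.min(), vals.max())   # stays within [0, 1]
```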
Abstract:
In the first part of the thesis we explore three fundamental questions that arise naturally when we conceive a machine learning scenario where the training and test distributions can differ. Contrary to conventional wisdom, we show that mismatched training and test distributions can in fact yield better out-of-sample performance. This optimal performance can be obtained by training with the dual distribution, which depends on the test distribution set by the problem but not on the target function that we want to learn. We show how to obtain this distribution in both discrete and continuous input spaces, as well as how to approximate it in a practical scenario. The benefits of using this distribution are exemplified in both synthetic and real data sets.
In order to apply the dual distribution in the supervised learning scenario where the training data set is fixed, it is necessary to use weights to make the sample appear as if it came from the dual distribution. We explore the negative effect that weighting a sample can have. The theoretical decomposition of the effect of weighting on the out-of-sample error is easy to understand but not actionable in practice, as the quantities involved cannot be computed. Hence, we propose the Targeted Weighting algorithm, which determines, for a given set of weights, whether the out-of-sample performance will improve or not in a practical setting. This is necessary because the setting assumes there are no labeled points distributed according to the test distribution, only unlabeled samples.
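To make the role of such weights concrete, the generic covariate-shift recipe estimates the density ratio between the desired distribution and the training distribution and uses it as a per-example weight. The sketch below uses the common logistic-regression discrimination trick on toy one-dimensional data; it is a generic illustration, not the dual-distribution construction or the Targeted Weighting algorithm of the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical 1-D covariate shift: training inputs centered at 0, test inputs at 1.
X_train = rng.normal(0.0, 1.0, size=(500, 1))
X_test = rng.normal(1.0, 1.0, size=(500, 1))

# Discriminate train vs test; the odds ratio estimates p_test(x) / p_train(x),
# which is the importance weight that makes the training sample "look like" test.
X = np.vstack([X_train, X_test])
z = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
clf = LogisticRegression().fit(X, z)
p = clf.predict_proba(X_train)[:, 1]
weights = p / (1.0 - p)
weights *= len(weights) / weights.sum()        # normalize to mean 1

print("weight range:", weights.min().round(2), "-", weights.max().round(2))
# These weights would then be passed to a weighted learner (e.g. via sample_weight=...)
```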
Finally, we propose a new class of matching algorithms that can be used to match the training set to a desired distribution, such as the dual distribution (or the test distribution). These algorithms can be applied to very large datasets, and we show how they lead to improved performance on a large real dataset such as the Netflix dataset. Their low computational complexity is their main advantage over previous algorithms proposed in the covariate shift literature.
In the second part of the thesis we apply machine learning to the problem of behavior recognition. We develop a specific behavior classifier to study fly aggression, as well as a system that allows behavior in videos of animals to be analyzed with minimal supervision. The system, which we call CUBA (Caltech Unsupervised Behavior Analysis), detects movemes, actions, and stories from time series describing the positions of animals in videos. The method summarizes the data and provides biologists with a mathematical tool to test new hypotheses. Other benefits of CUBA include finding classifiers for specific behaviors without the need for annotation, as well as providing means to discriminate groups of animals, for example according to their genetic line.
Abstract:
Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used to detect novel viral pathogens and to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered; instead, several distinct contigs derived from a single entity are obtained, some of which have no sequence homology to any known proteins. De novo assembly of single viruses from a metagenome is challenging, not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-mer frequency ranking, and coverage profile binning to detect and obtain viral target genomes from metagenomes. All methods were tested on 454-generated sequencing datasets containing three recently described RNA viruses with relatively large genomes that were divergent from previously known viruses of the families Rhabdoviridae and Coronaviridae. Depending on the specific characteristics of the target virus and the metagenomic community, different assembly and in silico gap closure strategies were successful in obtaining near-complete viral genomes.
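Of the strategies listed, k-mer frequency ranking rests on the observation that fragments from the same genome share a characteristic oligonucleotide composition. Below is a minimal sketch comparing tetranucleotide profiles of toy contigs; real binning would also use canonical (reverse-complement-collapsed) k-mers and coverage information.

```python
from itertools import product
from collections import Counter
import numpy as np

def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector for a contig (toy version; no
    reverse-complement canonicalization)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vec = np.array([counts.get(km, 0) for km in kmers], dtype=float)
    return vec / max(vec.sum(), 1.0)

def profile_distance(a, b):
    return np.linalg.norm(kmer_profile(a) - kmer_profile(b))

# Toy contigs: two AT-rich fragments (same hypothetical virus) vs one GC-rich one
c1 = "ATATTTAAATATTAATTTATATAAATTTATAT" * 20
c2 = "TTATATAAATATTTATAATATATTTAAATTAT" * 20
c3 = "GCGGCCGCGGGCCGCGCCGGCGCGGCCGCGGC" * 20
print("d(c1,c2) =", round(profile_distance(c1, c2), 3))
print("d(c1,c3) =", round(profile_distance(c1, c3), 3))  # expect a larger distance
```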
Abstract:
The Inter-American Tropical Tuna Commission (IATTC) staff has been sampling the size distributions of tunas in the eastern Pacific Ocean (EPO) since 1954, and the species composition of the catches since 2000. The IATTC staff use the data from the species composition samples, in conjunction with observer and/or logbook data and unloading data from the canneries, to estimate the total annual catches of yellowfin (Thunnus albacares), skipjack (Katsuwonus pelamis), and bigeye (Thunnus obesus) tunas. These sample data are collected based on a stratified sampling design. I propose an update of the stratification of the EPO into more homogeneous areas in order to reduce the variance in the estimates of the total annual catches and to incorporate the geographical shifts resulting from the expansion of the floating-object fishery during the 1990s. The sampling model used by the IATTC is a stratified two-stage (cluster) random sampling design with first-stage units varying (unequal) in size. The strata are month, area, and set type. Wells, the first cluster stage, are selected to be sampled only if all of the fish were caught in the same month, same area, and same set type. Fish, the second cluster stage, are sampled for lengths and, independently, for species composition of the catch. The EPO is divided into 13 sampling areas, which were defined in 1968 based on the catch distributions of yellowfin and skipjack tunas. This area stratification does not reflect the multi-species, multi-set-type fishery of today. In order to define more homogeneous areas, I used agglomerative cluster analysis to look for groupings in the size data and the catch and effort data for 2000–2006. I plotted the results from both datasets against the IATTC Sampling Areas and then created new areas. I also used the results of the cluster analysis to update the substitution scheme for strata with catch but no sample. I then calculated the total annual catch (and its variance) by species, stratifying the data into the new Proposed Sampling Areas, and compared the results to those reported by the IATTC. Results showed that re-stratifying the areas produced smaller variances of the catch estimates for some species in some years, but the differences were not significant.
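The area re-definition step described here is, in essence, agglomerative (hierarchical) clustering of per-area features followed by cutting the resulting tree. A minimal sketch with hypothetical data standing in for the IATTC size and catch/effort records (feature choices and cluster count are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)

# Hypothetical feature matrix: rows = area cells, columns = species composition
# proportions (yellowfin, skipjack, bigeye) plus mean fish length.
n_cells = 60
features = np.column_stack([
    rng.dirichlet([4, 3, 1], size=n_cells),   # species proportions
    rng.normal(80.0, 10.0, size=n_cells),     # mean length (cm)
])
features = (features - features.mean(0)) / features.std(0)   # standardize

Z = linkage(features, method="ward")              # agglomerative clustering
areas = fcluster(Z, t=5, criterion="maxclust")    # cut the tree into 5 candidate areas
print("cells per proposed area:", np.bincount(areas)[1:])
```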
Abstract:
Lonchophylla bokermanni Sazima, Vizotto & Taddei, 1978 is a medium-sized nectarivorous bat species endemic to southeastern Brazil. Little is known about its biology and geographic distribution, and for this reason it is classified by the International Union for Conservation of Nature (IUCN) as Data Deficient. It is, however, on the Brazilian list of threatened fauna, where it is considered Vulnerable because of its restricted distribution, its small and isolated populations, and the rapid destruction of its habitats. One of the most important gaps in knowledge about L. bokermanni is its pattern of geographic distribution. The species has a disjunct distribution, with an inland form restricted to the surroundings of its type locality and a more widely distributed form between the Serra do Mar and the coast. The coastal form may correspond to an as yet undescribed species, since it has shorter forearms and some cranial measurements that differ from those of the inland form. In this dissertation I seek to generate the minimum quantitative data needed to determine the conservation status of L. bokermanni under IUCN criteria. Given the taxonomic uncertainties, whenever possible the analyses were performed with three datasets: i) all occurrence records, assuming they represent a single species; ii) only data for the inland form, assuming they represent L. bokermanni; and iii) only data for the coastal form, assuming they represent a new species. The first chapter identifies priority areas for searching for new populations of L. bokermanni. These areas have the climatic and altitudinal conditions typical of the species, retain their forest cover, have few bat inventories, and lie outside the species' known range. The chapter also presents the results of field searches for new populations of the species in three of these priority areas, south of the known distribution. In the second chapter, the detection and occupancy probabilities of Lonchophylla bokermanni are modeled at regional and local scales, using environmental and methodological covariates that may explain the patterns found. The degree of uncertainty in the species' known distribution is evaluated, and the minimum effort needed to be confident in the species' absence at a locality is estimated. In the third chapter, the information presented in the previous chapters is used to determine the conservation status of L. bokermanni (under the IUCN Extent of Occurrence criterion), and to discuss the current state of knowledge about the species and the consequences of possible taxonomic changes for its conservation status.
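The occupancy-detection modeling mentioned in the second chapter separates the probability that a site is occupied (psi) from the probability of detecting the species on a visit given occupancy (p). A minimal single-season sketch with constant psi and p and hypothetical detection histories (the dissertation fits richer models with environmental and methodological covariates):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Hypothetical detection histories: rows = sites, columns = repeat visits
# (1 = species detected, 0 = not detected).
Y = np.array([
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
])

def neg_log_lik(theta, Y):
    psi, p = expit(theta)                    # occupancy and detection probabilities
    k = Y.shape[1]
    det = Y.sum(axis=1)
    # Sites with >=1 detection are certainly occupied; all-zero sites may be
    # occupied but undetected, or truly unoccupied.
    lik = np.where(det > 0,
                   psi * p**det * (1 - p)**(k - det),
                   psi * (1 - p)**k + (1 - psi))
    return -np.log(lik).sum()

res = minimize(neg_log_lik, x0=np.zeros(2), args=(Y,), method="Nelder-Mead")
psi_hat, p_hat = expit(res.x)
print(f"estimated occupancy psi = {psi_hat:.2f}, detection p = {p_hat:.2f}")
# The chance of missing the species at an occupied site after k visits is
# (1 - p)**k, which is what drives the minimum effort needed to trust an absence.
```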
Abstract:
Missing data are a common problem in epidemiological studies and, depending on how they arise, estimates of the parameters of interest may be biased. The literature points to several techniques for dealing with the issue, and multiple imputation has received particular attention in recent years. This dissertation presents the results of applying multiple imputation in the context of the Pró-Saúde Study, a longitudinal study among technical-administrative staff of a university in Rio de Janeiro. In the first study, after simulating the occurrence of missing data, the participants' color/race variable was imputed and a previously established survival analysis model was applied, with self-reported history of uterine fibroids as the outcome. The procedure was replicated 100 times to determine the distribution of the coefficients and standard errors of the estimates for the variable of interest. Despite the cross-sectional nature of the data used here (baseline information from the Pró-Saúde Study, collected in 1999 and 2001), the participants' follow-up history was reconstructed from their reports, creating a situation in which the Cox proportional hazards model could be used. In the scenarios evaluated, imputation produced satisfactory results, including in the performance assessment carried out. The technique performed well when the missing-data mechanism was MAR (Missing At Random) and the non-response rate was 10%. When the data were imputed and the estimates obtained from the 10 generated datasets (m=10) were combined, the bias of the estimates was 0.0011 for the black category and 0.0015 for the brown (parda) category, corroborating the efficiency of imputation in this scenario. Other configurations showed similar results. In the second article, a tutorial is developed for applying multiple imputation in epidemiological studies, which should make the technique easier to use for Brazilian researchers not yet familiar with the procedure. The basic steps and decisions needed to impute a dataset are presented, and one of the scenarios used in the first study is given as an example of applying the technique. All analyses were conducted in the statistical program R, version 2.15, and the scripts used are presented at the end of the text.
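The analyses in the dissertation are done in R; as a language-neutral illustration of the same workflow (generate m imputed datasets, fit the analysis model on each, and pool the estimates with Rubin's rules), here is a minimal Python sketch with toy data standing in for the Pró-Saúde variables.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy data: outcome depends on x1; x1 has ~10% of values missing (MAR-like setup).
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=1.0, size=n)
X = np.column_stack([x1, x2])
X[rng.random(n) < 0.10, 0] = np.nan          # 10% non-response in x1

m = 10
coefs, variances = [], []
for seed in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    Xi = imputer.fit_transform(X)            # one imputed dataset
    fit = LinearRegression().fit(Xi[:, :1], y)   # simple analysis model on x1
    resid = y - fit.predict(Xi[:, :1])
    xc = Xi[:, 0] - Xi[:, 0].mean()
    var_b = resid.var(ddof=2) / (xc @ xc)    # classical variance of the slope
    coefs.append(fit.coef_[0])
    variances.append(var_b)

# Rubin's rules: combine within- and between-imputation variance
q_bar = np.mean(coefs)
u_bar = np.mean(variances)
b = np.var(coefs, ddof=1)
total_var = u_bar + (1 + 1 / m) * b
print(f"pooled slope = {q_bar:.3f} +/- {np.sqrt(total_var):.3f}")
```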
Abstract:
Lymphangioleiomyomatosis (LAM) is a rare lung-metastasizing neoplasm caused by the proliferation of smooth muscle-like cells that commonly carry loss-of-function mutations in either the tuberous sclerosis complex 1 or 2 (TSC1 or TSC2) genes. While allosteric inhibition of the mechanistic target of rapamycin (mTOR) has shown substantial clinical benefit, complementary therapies are required to improve response and/or to treat specific patients. However, there is a lack of LAM biomarkers that could potentially be used to monitor the disease and to develop other targeted therapies. We hypothesized that the mediators of cancer metastasis to lung, particularly in breast cancer, also play a relevant role in LAM. Analyses across independent breast cancer datasets revealed associations between low TSC1/2 expression, altered mTOR complex 1 (mTORC1) pathway signaling, and metastasis to lung. Subsequently, immunohistochemical analyses of 23 LAM lesions revealed positivity in all cases for the lung metastasis mediators fascin 1 (FSCN1) and inhibitor of DNA binding 1 (ID1). Moreover, assessment of breast cancer stem or luminal progenitor cell biomarkers showed positivity in most LAM tissue for the aldehyde dehydrogenase 1 (ALDH1), integrin-beta 3 (ITGB3/CD61), and/or the sex-determining region Y-box 9 (SOX9) proteins. The immunohistochemical analyses also provided evidence of heterogeneity between and within LAM cases. The analysis of Tsc2-deficient cells revealed relative over-expression of FSCN1 and ID1; however, Tsc2-deficient cells did not show higher sensitivity to ID1-based cancer inhibitors. Collectively, the results of this study reveal novel LAM biomarkers linked to breast cancer metastasis to lung and to cell stemness, which in turn might guide the assessment of additional or complementary therapeutic opportunities for LAM.
Abstract:
Elucidating the intricate relationship between brain structure and function, both in healthy and pathological conditions, is a key challenge for modern neuroscience. Recent progress in neuroimaging has helped advance our understanding of this important issue, with diffusion imaging providing information about structural connectivity (SC) and functional magnetic resonance imaging shedding light on resting-state functional connectivity (rsFC). Here, we adopt a systems approach, relying on modular hierarchical clustering, to jointly study SC and rsFC datasets gathered independently from healthy human subjects. Our novel approach allows us to find a common skeleton shared by structure and function from which a new, optimal brain partition can be extracted. We describe the emerging common structure-function modules (SFMs) in detail and compare them with commonly employed anatomical or functional parcellations. Our results underline the strong correspondence between brain structure and resting-state dynamics, as well as the emerging coherent organization of the human brain.
Abstract:
We review the appropriateness of using SNIa observations to detect potential signatures of anisotropic expansion in the Universe. We focus on the Union2 and SNLS3 SNIa datasets and use the hemispherical comparison method to detect possible anisotropic features. Unlike some previous works where non-diagonal elements of the covariance matrix were neglected, we use the full covariance matrix of the SNIa data, thus obtaining more realistic, not underestimated, errors. As a matter of fact, the significance of previously claimed detections of a preferred direction in the Union2 dataset completely disappears once we include the effects of using the full covariance matrix. Moreover, we also find that such a preferred direction is aligned with the orthogonal direction of the SDSS observational plane, which clearly indicates that the SDSS subsample of the Union2 dataset introduces a significant bias, making the detected preferred direction unphysical. We thus find that current SNIa surveys are inappropriate for testing anisotropic features due to their highly non-homogeneous angular distribution in the sky. In addition, after removal of the most inhomogeneous sub-samples, the number of SNIa is too low. Finally, we take advantage of the particular distribution of the SNLS SNIa sub-sample in the SNLS3 dataset, in which the observations were taken along four different directions. We fit each direction independently and find consistent results at the 1 sigma level. Although the likelihoods peak at relatively different values of Omega_m, the low number of data points along each direction gives rise to large errors, so that the likelihoods are sufficiently broad as to overlap within 1 sigma. (C) 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
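The role of the full covariance matrix in such fits can be made explicit: the goodness of fit on each hemisphere is chi^2 = r^T C^{-1} r, where r are the distance-modulus residuals and C is the block of the full (non-diagonal) covariance for the supernovae in that hemisphere. A minimal sketch with a toy catalogue standing in for Union2/SNLS3 (directions, residuals, and covariance are all simulated):

```python
import numpy as np

def chi2_full_cov(residuals, cov):
    """chi^2 = r^T C^{-1} r using the full (non-diagonal) covariance matrix."""
    return residuals @ np.linalg.solve(cov, residuals)

rng = np.random.default_rng(3)

# Toy "catalogue": unit direction vectors and distance-modulus residuals for n SNe,
# with a correlated (non-diagonal) covariance matrix.
n = 200
directions = rng.normal(size=(n, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
idx = np.arange(n)
cov = 0.02 * np.eye(n) + 0.005 * np.exp(-np.abs(np.subtract.outer(idx, idx)) / 20.0)
residuals = rng.multivariate_normal(np.zeros(n), cov)

# Hemispherical comparison about a trial axis: evaluate each hemisphere with the
# corresponding block of the FULL covariance, not just its diagonal.
axis = np.array([0.0, 0.0, 1.0])
north = directions @ axis > 0
for name, mask in (("north", north), ("south", ~north)):
    chi2 = chi2_full_cov(residuals[mask], cov[np.ix_(mask, mask)])
    print(f"{name}: chi2/dof = {chi2 / mask.sum():.2f}")
```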