935 resultados para Complex data
                                
Resumo:
Genome-wide association studies have been instrumental in identifying genetic variants associated with complex traits such as human disease or gene expression phenotypes. It has been proposed that extending existing analysis methods by considering interactions between pairs of loci may uncover additional genetic effects. However, the large number of possible two-marker tests presents significant computational and statistical challenges. Although several strategies to detect epistasis effects have been proposed and tested for specific phenotypes, so far there has been no systematic attempt to compare their performance using real data. We made use of thousands of gene expression traits from linkage and eQTL studies, to compare the performance of different strategies. We found that using information from marginal associations between markers and phenotypes to detect epistatic effects yielded a lower false discovery rate (FDR) than a strategy solely using biological annotation in yeast, whereas results from human data were inconclusive. For future studies whose aim is to discover epistatic effects, we recommend incorporating information about marginal associations between SNPs and phenotypes instead of relying solely on biological annotation. Improved methods to discover epistatic effects will result in a more complete understanding of complex genetic effects.
                                
Resumo:
Abstract : The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the necessary activities in keeping a cell alive. The DNA, on the other hand, stores the information on how to produce the different proteins in the genome. Regulating gene transcription is the first important step that can thus affect the life of a cell, modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This Thesis presents new computational methodologies to study gene regulation. In addition we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factors binding sites on the DNA from genome-wide surveys like chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the localization about the binding of TFs to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1. C-MYC and SP1 bind preferentially at promoters and when SP1 binds next to C-NIYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the binding sites conservation across mammals, by the permissive underlying chromatin states 'it represents an important control mechanism involved in cellular proliferation, thereby involved in cancer. Secondly, we identify the characteristics of TF estrogen receptor alpha (hERa) target genes and we study the influence of hERa in regulating transcription. hERa, upon hormone estrogen signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarce experimental data about the binding sites of other TFs that may interact with hERa, we conduct in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of hERa partners, TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and also on studies about the response of cells to estrogen. We confirm that hERa binding sites are distributed anywhere on the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in presence of estrogen. Some binding sites of the second group also show presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression. Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem analyzing time-series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models where we impose sparseness on the connectivity of the regulatory network. We extend this method enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator, or a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-genes/TFs regulatory network obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological dataset, have enabled us to better understand gene regulation in humans.
                                
Resumo:
It is well established that interactions between CD4(+) T cells and major histocompatibility complex class II (MHCII) positive antigen-presenting cells (APCs) of hematopoietic origin play key roles in both the maintenance of tolerance and the initiation and development of autoimmune and inflammatory disorders. In sharp contrast, despite nearly three decades of intensive research, the functional relevance of MHCII expression by non-hematopoietic tissue-resident cells has remained obscure. The widespread assumption that MHCII expression by non-hematopoietic APCs has an impact on autoimmune and inflammatory diseases has in most instances neither been confirmed nor excluded by indisputable in vivo data. Here we review and put into perspective conflicting in vitro and in vivo results on the putative impact of MHCII expression by non-hematopoietic APCs-in both target organs and secondary lymphoid tissues-on the initiation and development of representative autoimmune and inflammatory disorders. Emphasis will be placed on the lacunar status of our knowledge in this field. We also discuss new mouse models-developed on the basis of our understanding of the molecular mechanisms that regulate MHCII expression-that constitute valuable tools for filling the severe gaps in our knowledge on the functions of non-hematopoietic APCs in inflammatory conditions.
                                
Resumo:
Synaptic transmission depends critically on the Sec1p/Munc18 protein Munc18-1, but it is unclear whether Munc18-1 primarily operates as a integral part of the fusion machinery or has a more upstream role in fusion complex assembly. Here, we show that point mutations in Munc18-1 that interfere with binding to the free Syntaxin1a N-terminus and strongly impair binding to assembled SNARE complexes all support normal docking, priming and fusion of synaptic vesicles, and normal synaptic plasticity in munc18-1 null mutant neurons. These data support a prevailing role of Munc18-1 before/during SNARE-complex assembly, while its continued association to assembled SNARE complexes is dispensable for synaptic transmission.
                                
Resumo:
The identification of endogenously produced antigenic peptides presented by MHC class I molecules has opened the way to peptide-based strategies for CTL induction in vivo. Here we demonstrate that the induction in vivo of CTL directed against naturally processed antigens can be triggered by injection of syngeneic cells expressing covalent major histocompatibility complex class I-peptide complexes. In the model system used, the induction of HLA-Cw3 specific cytotoxic T lymphocytes (CTL) in mice by cell surface-associated, covalent H-2Kd (Kd)-Cw3 peptide complexes was investigated. The Kd-restricted Cw3 peptide 170-179 (RYLKNGKETL), which mimics the major natural epitope recognized by Cw3-specific CTL in H-2d mice, was converted to a photoreactive derivative by replacing Arg-170 with N-beta-(4-azidosalicyloyl)-L-2,3-diaminopropionic acid. This peptide derivative was equivalent to the parental Cw3 peptide in terms of binding to Kd molecules and recognition by Cw3-specific CTL clones and could be cross-linked efficiently and selectively to Kd molecules on the surface of Con A-stimulated spleen cells from H-2d mice. Photocross-linking prevented the rapid dissociation of Kd-peptide derivative complexes that takes place under physiological conditions. Cultures of spleen cells or peritoneal exudate cells from mice inoculated i.p. with peptide-pulsed and photocross-linked cells developed a strong CTL response following antigenic stimulation in vitro. The cultured cells efficiently lysed not only target cells sensitized with the Cw3 170-179 peptide but also target cells transfected with the Cw3 gene. Moreover, their TCR preferentially expressed V beta 10 and J alpha pHDS58 segments as well as conserved junctional sequences, as has been observed previously in Cw3-specific CTL responses. In contrast, no Cw3-specific CTL response could be obtained in cultures derived from mice injected with Con A-stimulated spleen cells pulsed with the peptide derivative without photocross-linking.
                                
Resumo:
To study the major histocompatibility complex class II I-E dependence of mouse mammary tumor virus (MMTV) superantigens, we constructed hybrids between the I-E-dependent MMTV(GR) and the I-E-independent mtv-7 superantigens and tested them in vivo. Our results suggest that, although the C-terminal third mediates I-A interaction, additional binding sites are located elsewhere in the superantigen.
                                
Resumo:
The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes.
                                
Resumo:
Application of semi-distributed hydrological models to large, heterogeneous watersheds deals with several problems. On one hand, the spatial and temporal variability in catchment features should be adequately represented in the model parameterization, while maintaining the model complexity in an acceptable level to take advantage of state-of-the-art calibration techniques. On the other hand, model complexity enhances uncertainty in adjusted model parameter values, therefore increasing uncertainty in the water routing across the watershed. This is critical for water quality applications, where not only streamflow, but also a reliable estimation of the surface versus subsurface contributions to the runoff is needed. In this study, we show how a regularized inversion procedure combined with a multiobjective function calibration strategy successfully solves the parameterization of a complex application of a water quality-oriented hydrological model. The final value of several optimized parameters showed significant and consistentdifferences across geological and landscape features. Although the number of optimized parameters was significantly increased by the spatial and temporal discretization of adjustable parameters, the uncertainty in water routing results remained at reasonable values. In addition, a stepwise numerical analysis showed that the effects on calibration performance due to inclusion of different data types in the objective function could be inextricably linked. Thus caution should be taken when adding or removing data from an aggregated objective function.
                                
Resumo:
The present research deals with an important public health threat, which is the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important at first to define the influence of each factor. In particular, it was important to define the influence of geology as being closely associated to indoor radon. This association was indeed observed for the Swiss data but not probed to be the sole determinant for the spatial modeling. The statistical analysis of data, both at univariate and multivariate level, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving windows methods. The use of Quan-tité Morisita Index (QMI) as a procedure to evaluate data clustering in function of the radon level was proposed. The existing methods of declustering were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-to-down resolution approach, from regional to local lev¬els in order to find the appropriate scales for modeling. In this sense, data partition was optimized in order to cope with stationary conditions of geostatistical models. Common methods of spatial modeling such as Κ Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied for a par-ticular dataset. A bottom to top method complexity approach was adopted and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation was tested with the purpose of reducing noise at local scale (the CVMF). At the end of the chapter, a series of test for data consistency and methods robustness were performed. This lead to conclude about the importance of data splitting and the limitation of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multigaussian models and helped take the indoor radon pollution data uncertainty into consideration. The catego-rization transform was presented as a solution for extreme values modeling through clas-sification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. The sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for data classification hardening. Within the classification methods, probabilistic neural networks (PNN) show to be better adapted for modeling of high threshold categorization and for automation. Support vector machines (SVM) on the contrary performed well under balanced category conditions. In general, it was concluded that a particular prediction or estimation method is not better under all conditions of scale and neighborhood definitions. Simulations should be the basis, while other methods can provide complementary information to accomplish an efficient indoor radon decision making.
                                
Resumo:
Regulator of G-protein signalling (RGS) proteins negatively regulate heterotrimeric G-protein signalling through their conserved RGS domains. RGS domains act as GTPase-activating proteins, accelerating the GTP hydrolysis rate of the activated form of Gα-subunits. Although omnipresent in eukaryotes, RGS proteins have not been adequately analysed in non-mammalian organisms. The Drosophila melanogaster Gαo-subunit and the RGS domain of its interacting partner CG5036 have been overproduced and purified; the crystallization of the complex of the two proteins using PEG 4000 as a crystallizing agent and preliminary X-ray crystallographic analysis are reported. Diffraction data were collected to 2.0 Å resolution using a synchrotron-radiation source.
                                
Resumo:
In order to contribute to the debate about southern glacial refugia used by temperate species and more northern refugia used by boreal or cold-temperate species, we examined the phylogeography of a widespread snake species (Vipera berus) inhabiting Europe up to the Arctic Circle. The analysis of the mitochondrial DNA (mtDNA) sequence variation in 1043 bp of the cytochrome b gene and in 918 bp of the noncoding control region was performed with phylogenetic approaches. Our results suggest that both the duplicated control region and cytochrome b evolve at a similar rate in this species. Phylogenetic analysis showed that V. berus is divided into three major mitochondrial lineages, probably resulting from an Italian, a Balkan and a Northern (from France to Russia) refugial area in Eastern Europe, near the Carpathian Mountains. In addition, the Northern clade presents an important substructure, suggesting two sequential colonization events in Europe. First, the continent was colonized from the three main refugial areas mentioned above during the Lower-Mid Pleistocene. Second, recolonization of most of Europe most likely originated from several refugia located outside of the Mediterranean peninsulas (Carpathian region, east of the Carpathians, France and possibly Hungary) during the Mid-Late Pleistocene, while populations within the Italian and Balkan Peninsulas fluctuated only slightly in distribution range, with larger lowland populations during glacial times and with refugial mountain populations during interglacials, as in the present time. The phylogeographical structure revealed in our study suggests complex recolonization dynamics of the European continent by V. berus, characterized by latitudinal as well as altitudinal range shifts, driven by both climatic changes and competition with related species.
                                
Resumo:
The mature TCR is composed of a clonotypic heterodimer (alpha beta or gamma delta) associated with the invariant CD3 components (gamma, delta, epsilon and zeta). There is now considerable evidence that more immature forms of the TCR-CD3 complex (consisting of either CD3 alone or CD3 associated with a heterodimer of TCR beta and pre-T alpha) can be expressed at the cell surface on early thymocytes. These pre-TCR complexes are believed to be necessary for the ordered progression of early T cell development. We have analyzed in detail the expression of both the pre-TCR and CD3 complex at various stages of adult thymus development. Our data indicate that all CD3 components are already expressed at the mRNA level by the earliest identifiable (CD4lo) thymic precursor. In contrast, genes encoding the pre-TCR complex (pre-T alpha and fully rearranged TCR beta) are first expressed at the CD44loCD25+CD4-CD8- stage. Detectable surface expression of both CD3 and TCR beta are delayed relative to expression of the corresponding genes, suggesting the existence of other (as yet unidentified) components of the pre-TCR complex.
                                
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
                                
Resumo:
Due to the advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increases every day and ask for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves among others to extended climatological analyses, for the assimilation of data into numerical weather prediction models, for preparing inputs to hydrological models and for real time monitoring and short-term forecasting of weather.In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from the statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping of meteorological variables in complex orography.With the advent of high resolution digital elevation models, the field of spatial prediction met new horizons. In fact, by exploiting image processing tools along with physical heuristics, an incredible number of terrain features which account for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes.Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space making the use of classical geostatistics tangled.The challenges which are explored during the thesis are manifolds. First, the complexity of models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions.The resulting maps of average wind speeds find applications within renewable resources assessment and opens a route to decrease the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.
                                
Resumo:
PURPOSE: The aim of this study was to develop models based on kernel regression and probability estimation in order to predict and map IRC in Switzerland by taking into account all of the following: architectural factors, spatial relationships between the measurements, as well as geological information. METHODS: We looked at about 240,000 IRC measurements carried out in about 150,000 houses. As predictor variables we included: building type, foundation type, year of construction, detector type, geographical coordinates, altitude, temperature and lithology into the kernel estimation models. We developed predictive maps as well as a map of the local probability to exceed 300 Bq/m(3). Additionally, we developed a map of a confidence index in order to estimate the reliability of the probability map. RESULTS: Our models were able to explain 28% of the variations of IRC data. All variables added information to the model. The model estimation revealed a bandwidth for each variable, making it possible to characterize the influence of each variable on the IRC estimation. Furthermore, we assessed the mapping characteristics of kernel estimation overall as well as by municipality. Overall, our model reproduces spatial IRC patterns which were already obtained earlier. On the municipal level, we could show that our model accounts well for IRC trends within municipal boundaries. Finally, we found that different building characteristics result in different IRC maps. Maps corresponding to detached houses with concrete foundations indicate systematically smaller IRC than maps corresponding to farms with earth foundation. CONCLUSIONS: IRC mapping based on kernel estimation is a powerful tool to predict and analyze IRC on a large-scale as well as on a local level. This approach enables to develop tailor-made maps for different architectural elements and measurement conditions and to account at the same time for geological information and spatial relations between IRC measurements.
 
                    