972 results for computational study


Relevance:

30.00%

Publisher:

Abstract:

The electron hole transfer (HT) properties of DNA are substantially affected by thermal fluctuations of the π-stack structure. Depending on the mutual position of neighboring nucleobases, the electronic coupling V may change by several orders of magnitude. In the present paper, we report the results of systematic QM/molecular dynamics (MD) calculations of the electronic couplings and on-site energies for hole transfer. Based on 15 ns MD trajectories for several DNA oligomers, we calculate the average squared couplings ⟨V²⟩ and the energies of base-pair triplets XG⁺Y and XA⁺Y, where X, Y = G, A, T, and C. For each of the 32 systems, 15 000 conformations separated by 1 ps are considered. The three-state generalized Mulliken-Hush method is used to derive electronic couplings for HT between neighboring base pairs. The adiabatic energies and dipole moment matrix elements are computed with the INDO/S method. We compare the rms values of V with the couplings estimated for the idealized B-DNA structure and show that in several important cases the couplings calculated for the idealized B-DNA structure are considerably underestimated. The rms values of the intrastrand couplings G-G, A-A, G-A, and A-G are found to be similar, ∼0.07 eV, while the interstrand couplings are quite different. The energies of the hole states G⁺ and A⁺ in the stack depend on the nature of the neighboring pairs. The XG⁺Y states are 0.5 eV more stable than the XA⁺Y states. The thermal fluctuations of the DNA structure facilitate the HT process from guanine to adenine. The tabulated couplings and on-site energies can be used as reference parameters in theoretical and computational studies of HT processes in DNA.
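The rms coupling discussed above is the square root of ⟨V²⟩ averaged over MD snapshots. The sketch below illustrates that averaging. As a simplification it uses the standard two-state generalized Mulliken-Hush expression, V = μ₁₂ΔE / sqrt((μ₁₁ − μ₂₂)² + 4μ₁₂²), rather than the three-state GMH scheme used in the paper. The function names and the snapshot values are hypothetical; in practice ΔE and the dipole matrix elements would come from a quantum-chemical calculation such as INDO/S on each snapshot.

```python
import math

def gmh_two_state_coupling(delta_e, mu11, mu22, mu12):
    """Two-state GMH coupling: V = mu12 * dE / sqrt((mu11 - mu22)^2 + 4 mu12^2).

    delta_e: adiabatic energy gap (eV); mu11, mu22: adiabatic-state dipole
    moments projected on the transfer direction; mu12: transition dipole.
    Dipole units cancel, so V is returned in the units of delta_e.
    """
    dmu = mu11 - mu22
    return mu12 * delta_e / math.sqrt(dmu ** 2 + 4.0 * mu12 ** 2)

def rms_coupling(couplings):
    """Root-mean-square coupling sqrt(<V^2>) over a set of snapshots."""
    return math.sqrt(sum(v * v for v in couplings) / len(couplings))

# Hypothetical snapshots: (delta_e, mu11, mu22, mu12), not data from the paper
snapshots = [(0.30, 3.0, -3.0, 0.2), (0.25, 3.2, -3.1, 0.05), (0.40, 2.8, -2.9, 0.4)]
couplings = [gmh_two_state_coupling(*s) for s in snapshots]
v_rms = rms_coupling(couplings)
```

Because V can change sign and vary by orders of magnitude from one snapshot to the next, the trajectory rms can differ strongly from V evaluated at a single idealized geometry. This is the comparison the abstract draws with idealized B-DNA.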

Relevance:

30.00%

Publisher:

Abstract:

Objective: The importance of hemodynamics in the etiopathogenesis of intracranial aneurysms (IAs) is widely accepted. Computational fluid dynamics (CFD) is increasingly being used for hemodynamic predictions. However, along with the continuing development and validation of these tools, it is imperative to collect the opinion of clinicians. Methods: A workshop on CFD was conducted during the European Society of Minimally Invasive Neurological Therapy (ESMINT) Teaching Course in Lisbon, Portugal. Thirty-six delegates, mostly clinicians, performed a supervised CFD analysis of an IA using the @neuFuse software developed within the European project @neurIST. Feedback on the workshop was collected and analyzed. Performance was scored on a 4-point scale and compared with the performance of experts. Results: Current dilemmas in the management of unruptured IAs were the most important motivating factor for attending the workshop, and the majority of participants showed interest in participating in a multicentric trial. The participants achieved an average score of 2.52 (range 0–4), which was 63% (range 0–100%) of an expert user's score. Conclusions: Although participants showed a manifest interest in CFD, there was a clear lack of awareness concerning the role of hemodynamics in the etiopathogenesis of IAs and the use of CFD in this context. More effort is therefore required to improve clinicians' understanding of the subject.

Relevance:

30.00%

Publisher:

Abstract:

In the last few years, there has been a growing focus on faster computational methods to support clinicians in planning stenting procedures. This study investigates the possibility of introducing computational approximations in modelling stent deployment in aneurysmatic cerebral vessels, in order to achieve simulations compatible with the constraints of real clinical workflows. The release of a self-expandable stent in a simplified aneurysmatic vessel was modelled in four different initial positions. Six progressively simplified modelling approaches (based on the Finite Element method and on Fast Virtual Stenting, FVS) were used. In terms of accuracy, the final configuration of the stent is more affected by neglecting the mechanical properties of the materials (FVS) than by adopting 1D instead of 3D stent models. Nevertheless, the differences observed are acceptable compared with those obtained by considering different initial stent positions. Regarding computational costs, simulations involving 1D stent features are the only ones feasible in a clinical context.

Relevance:

30.00%

Publisher:

Abstract:

OBJECTIVES: The reconstruction of the right ventricular outflow tract (RVOT) with valved conduits remains a challenge. The reoperation rate at 5 years can be as high as 25% and depends on age, type of conduit, conduit diameter and principal heart malformation. The aim of this study is to provide a bench model with computational fluid dynamics to analyse the haemodynamics of the RVOT, the pulmonary artery, its bifurcation, and the left and right pulmonary arteries, which may in future serve as a tool for analysis and prediction of outcome following RVOT reconstruction. METHODS: Pressure, flow and diameter at the RVOT, pulmonary artery, bifurcation of the pulmonary artery, and left and right pulmonary arteries were measured in five normal pigs with a mean weight of 24.6 ± 0.89 kg. The data obtained were used for a 3D computational fluid dynamics simulation of flow conditions, focusing on the pressure, flow and shear stress profile from the pulmonary trunk to the level of the left and right pulmonary arteries. RESULTS: Three steady inlet flow profiles were obtained at 0.2, 0.29 and 0.36 m/s, corresponding to flow rates of 1.5, 2.0 and 2.5 l/min at the RVOT. The flow velocity profile was constant from the RVOT down to the bifurcation and decreased at the left and right pulmonary arteries. For all three inlet velocity profiles, areas of low shear stress and low velocity were detected along the left wall of the pulmonary artery, at the pulmonary artery bifurcation and at the ostia of both pulmonary arteries. CONCLUSIONS: This real-time computational fluid dynamics model provides a realistic picture of fluid dynamics in the pulmonary tract area. Low shear stress areas correspond to a turbulent flow profile, which is a predictive factor for the development of vessel wall arteriosclerosis. We believe that this bench model may be a useful tool for further evaluation of RVOT pathology following surgical reconstruction.
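As a consistency check on the inlet conditions above, mean velocity and volumetric flow are related by Q = v·A. The sketch below back-calculates the circular cross-section diameter implied by each velocity/flow pair. Under further assumptions that the abstract does not state (fully developed laminar Poiseuille flow in a straight rigid tube, and a blood viscosity of 3.5 mPa·s), it also estimates wall shear stress as τ_w = 4μQ/(πR³). This is an idealized hand calculation, not the 3D CFD model of the study, and the function names are hypothetical.

```python
import math

def implied_diameter_m(flow_l_per_min, mean_velocity_m_s):
    """Diameter of a circular section carrying flow Q at mean velocity v (A = Q / v)."""
    q = flow_l_per_min / 1000.0 / 60.0  # l/min -> m^3/s
    area = q / mean_velocity_m_s
    return math.sqrt(4.0 * area / math.pi)

def poiseuille_wall_shear_pa(flow_l_per_min, diameter_m, viscosity_pa_s=0.0035):
    """Wall shear stress for fully developed laminar flow: tau_w = 4 mu Q / (pi R^3).

    The default viscosity (3.5 mPa s) is an assumed typical value for blood.
    """
    q = flow_l_per_min / 1000.0 / 60.0
    r = diameter_m / 2.0
    return 4.0 * viscosity_pa_s * q / (math.pi * r ** 3)

# Velocity/flow pairs reported in the abstract
for flow, vel in [(1.5, 0.2), (2.0, 0.29), (2.5, 0.36)]:
    d = implied_diameter_m(flow, vel)
    print(f"{flow} l/min at {vel} m/s -> D = {d * 1000:.1f} mm, "
          f"tau_w ~ {poiseuille_wall_shear_pa(flow, d):.2f} Pa")
```

All three reported pairs imply a diameter of roughly 12 to 13 mm, so the inlet conditions are mutually consistent with a single RVOT cross-section.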

Relevance:

30.00%

Publisher:

Abstract:

The activation of the specific immune response against tumor cells is based on the recognition, by CD8+ cytotoxic T lymphocytes (CTL), of antigenic peptides (p) presented at the cell surface by the class I major histocompatibility complex (MHC). The ability of the so-called T-cell receptors (TCR) to discriminate between self and non-self peptides constitutes the most important specific control mechanism against infected cells. The TCR/pMHC interaction has received much attention in cancer therapy since the design of the adoptive transfer approach, in which T lymphocytes showing a strong response against tumor cells are extracted from the patient, expanded in vitro, and reinfused after immunodepletion, possibly leading to cancer regression. In the last decade, major progress has been achieved through the introduction of engineered lymphocytes. Meanwhile, understanding the molecular aspects of the TCR/pMHC interaction has become essential to guide in vitro and in vivo studies. In 1996, the determination of the first structure of a TCR/pMHC complex by X-ray crystallography revealed the molecular basis of the interaction. Since then, molecular modeling techniques have taken advantage of crystal structures to study the conformational space of the complex and to understand the specificity of the recognition of the pMHC by the TCR. At the same time, experimental techniques for determining the sequences of TCRs that bind to a pMHC complex have been used intensively, leading to the collection of large repertoires of TCR sequences specific for a given pMHC. There is a growing need for computational approaches capable of predicting the molecular interactions that occur upon TCR/pMHC binding without relying on the time-consuming resolution of a crystal structure. This work presents new approaches to analyze the molecular principles that govern the recognition of the pMHC by the TCR and the subsequent activation of the T cell.
We first introduce TCRep 3D, a new method based on homology and ab initio modeling for modeling and studying the structural properties of TCR repertoires. We discuss the methodology in detail and demonstrate that it outperforms state-of-the-art modeling methods in predicting relevant TCR conformations. Two successful applications of TCRep 3D that supported experimental studies on TCR repertoires are presented. Second, we present a rigid-body study of TCR/pMHC complexes that gives insight into how the TCR approaches the pMHC. We show that the binding mode of the TCR is correctly described by long-distance interactions. Finally, the last section is dedicated to a detailed analysis of an experimental hydrogen exchange study, which suggests that some regions of the constant domain of the TCR undergo conformational changes upon binding to the pMHC. We propose a hypothesis for the structural signaling of TCR molecules leading to the activation of the T cell. It is based on the analysis of correlated motions in the TCR/pMHC structure. - The activation of the specific immune response directed against tumor cells is based on the recognition, by cytotoxic T lymphocytes (CTL), of an antigenic peptide (p) presented at the cell surface by the class I major histocompatibility complex (MHC). The ability of T-cell receptors (TCR) to distinguish endogenous peptides from foreign peptides constitutes the most important control mechanism directed against infected cells. The interaction between the TCR and the pMHC has received much attention in cancer therapy since the design of the adoptive transfer method: lymphocytes capable of a strong response against tumor cells are extracted from the patient, expanded in vitro, and reintroduced after immunosuppression. This can result in cancer regression.
Over the last ten years, important progress has been made through the introduction of genetically engineered lymphocytes. In parallel, understanding the TCR/pMHC complex at the molecular level has become essential to support in vitro and in vivo studies. In 1996, the first structure of the TCR/pMHC complex, obtained by X-ray crystallography, revealed the molecular basis of the interaction. Since then, molecular modeling techniques have exploited experimental structures to understand the specificity of pMHC recognition by the TCR. At the same time, new experimental techniques for determining the sequences of TCRs specific to a given pMHC have been widely used. Large TCR repertoires have thus become available, and it is more necessary than ever to develop computational approaches capable of predicting the molecular interactions that occur when the TCR binds the pMHC, without systematically depending on the resolution of a crystal structure. This thesis presents a new approach for analyzing the molecular principles governing the recognition of the pMHC by the TCR and the resulting activation of the lymphocyte. First, we present TCRep 3D, a new method based on homology and ab initio modeling, for studying the structural properties of TCR repertoires. The procedure is discussed in detail and compared with standard approaches. We thus show that TCRep 3D performs best at predicting relevant TCR conformations. Two applications to experimental studies of TCR repertoires are then presented. In the second part of this work, we present a study of TCR/pMHC complexes that gives interesting insight into the mechanism by which the TCR approaches the pMHC.
Finally, the last section focuses on the detailed analysis of an experimental study based on deuterium/hydrogen exchange, whose results reveal that certain key regions of the constant domain of the TCR undergo a conformational change upon binding to the pMHC. We propose a hypothesis for the structural signaling of TCRs leading to the activation of the lymphocyte. It is based on the analysis of correlated motions observed in the TCR/pMHC structure.
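The signaling hypothesis above rests on correlated motions. One standard way to quantify such motions is the dynamic cross-correlation matrix, C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩). It is shown here as a generic sketch, not as the thesis's own procedure, which the abstract does not detail. The sketch assumes the frames (from a simulation or a structural ensemble) have already been superimposed on a common reference. Values near +1 mean that two atoms move together, and values near −1 mean that they move in opposite directions. The function name is hypothetical.

```python
import math

def dynamic_cross_correlation(frames):
    """Dynamic cross-correlation matrix for pre-aligned coordinate frames.

    frames: list of frames, each a list of (x, y, z) tuples, one per atom,
    already superimposed on a common reference (alignment is not done here).
    Returns C with C[i][j] = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>).
    """
    n_frames, n_atoms = len(frames), len(frames[0])
    mean = [[sum(f[a][k] for f in frames) / n_frames for k in range(3)]
            for a in range(n_atoms)]
    # displacement of every atom from its mean position, per frame
    disp = [[[f[a][k] - mean[a][k] for k in range(3)] for a in range(n_atoms)]
            for f in frames]

    def avg_dot(i, j):
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    var = [avg_dot(i, i) for i in range(n_atoms)]
    # atoms that never move get a correlation of 0 instead of dividing by zero
    return [[avg_dot(i, j) / math.sqrt(var[i] * var[j]) if var[i] > 0 and var[j] > 0 else 0.0
             for j in range(n_atoms)] for i in range(n_atoms)]

# Toy example: atoms 0 and 1 move together along x, atom 2 moves the opposite way
frames = [[(0, 0, 0), (5, 0, 0), (0, 0, 0)],
          [(1, 0, 0), (6, 0, 0), (-1, 0, 0)]]
C = dynamic_cross_correlation(frames)
```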

Relevance:

30.00%

Publisher:

Abstract:

A haplotype is an m-long binary vector. The XOR-genotype of two haplotypes is the m-vector of their coordinate-wise XOR. We study the following problem: given a set of XOR-genotypes, reconstruct their haplotypes so that the set of resulting haplotypes can be mapped onto a perfect phylogeny (PP) tree. The question is motivated by studying population evolution in human genetics and is a variant of the perfect phylogeny haplotyping problem, which has received intensive attention recently. Unlike the latter problem, in which the input is "full" genotypes, here we assume less informative input, which may therefore be more economical to obtain experimentally. Building on ideas of Gusfield, we show how to solve the problem in polynomial time by a reduction to the graph realization problem. The actual haplotypes are not uniquely determined by the tree they map onto, and the tree itself may or may not be unique. We show that tree uniqueness implies uniquely determined haplotypes, up to inherent degrees of freedom, and give a sufficient condition for uniqueness. To actually determine the haplotypes given the tree, additional information is necessary. We show that two or three full genotypes suffice to reconstruct all the haplotypes, and we present a linear-time algorithm for identifying those genotypes.
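The definitions above can be made concrete. The sketch below computes an XOR-genotype, in which a 1 marks a heterozygous site and a 0 a homozygous one. It also checks whether a set of binary haplotypes admits a perfect phylogeny, using the classical four-gamete test: no pair of sites may show all four combinations 00, 01, 10 and 11. That form of the test applies when the ancestral state is unknown. The sketch only verifies a candidate solution. It is not the polynomial-time reconstruction via graph realization described in the abstract, and the function names are hypothetical.

```python
from itertools import combinations

def xor_genotype(h1, h2):
    """Coordinate-wise XOR of two equal-length binary haplotypes."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length m")
    return [a ^ b for a, b in zip(h1, h2)]

def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test: True iff no two sites show all of 00, 01, 10, 11."""
    m = len(haplotypes[0])
    for i, j in combinations(range(m), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True

# A candidate resolution of a set of XOR-genotypes is acceptable only if the
# resulting haplotypes pass the four-gamete test.
g = xor_genotype([0, 1, 1, 0], [1, 1, 0, 0])
```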

Relevance:

30.00%

Publisher:

Abstract:

We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of semi-naive Bayesian classifiers. Altogether, 16 model selection and weighting schemes, 58 benchmark data sets, and various statistical tests are employed. This paper's main contributions are threefold. First, it formally presents each scheme's definition, rationale, and time complexity, and can therefore serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers a bias-variance analysis of each scheme's classification error. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms that have an immediate and significant impact on real-world applications. Another important feature of our study is the use of a variety of statistical tests to evaluate multiple learning methods across multiple data sets.
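A SPODE makes one attribute x_i the "superparent" of all other attributes. A linear combination of SPODEs therefore estimates P(y, x) ≈ Σ_i w_i P(y, x_i) Π_{j≠i} P(x_j | y, x_i). The sketch below uses Laplace-smoothed frequency estimates and a user-supplied weight vector. With uniform weights it reduces to the well-known AODE classifier. The 16 selection and weighting schemes compared in the paper are different ways of choosing the weights (setting a weight to zero amounts to model selection), and none of them is reproduced here. The class and parameter names are hypothetical.

```python
from collections import Counter

class SpodeEnsemble:
    """Linear combination of SPODEs over categorical attributes (illustrative sketch)."""

    def __init__(self, weights=None, alpha=1.0):
        self.weights = weights  # one non-negative weight per superparent; None = uniform (AODE)
        self.alpha = alpha      # Laplace smoothing constant

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n = len(y)
        self.n_attr = len(X[0])
        self.n_values = [len({row[a] for row in X}) for a in range(self.n_attr)]
        self.pair = Counter()    # counts of (y, i, x_i)
        self.triple = Counter()  # counts of (y, i, x_i, j, x_j)
        for row, c in zip(X, y):
            for i in range(self.n_attr):
                self.pair[(c, i, row[i])] += 1
                for j in range(self.n_attr):
                    if j != i:
                        self.triple[(c, i, row[i], j, row[j])] += 1
        return self

    def _spode(self, i, c, row):
        """Smoothed estimate of P(y=c, x_i) * prod_{j != i} P(x_j | y=c, x_i)."""
        a = self.alpha
        n_ci = self.pair[(c, i, row[i])]
        p = (n_ci + a) / (self.n + a * len(self.classes) * self.n_values[i])
        for j in range(self.n_attr):
            if j != i:
                n_cij = self.triple[(c, i, row[i], j, row[j])]
                p *= (n_cij + a) / (n_ci + a * self.n_values[j])
        return p

    def predict(self, row):
        w = self.weights if self.weights is not None else [1.0] * self.n_attr
        scores = {c: sum(w[i] * self._spode(i, c, row) for i in range(self.n_attr))
                  for c in self.classes}
        return max(scores, key=scores.get)

# Toy data in which the class equals the first attribute
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
model = SpodeEnsemble().fit(X, y)
```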

Relevance:

30.00%

Publisher:

Abstract:

For the last two decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the widely used supermatrix approach). In this study, seven of the most commonly used supertree methods are investigated using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees to the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree, and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test whether the performance levels were affected by the heuristic searches rather than by the algorithms themselves. Based on our results, two main groups of supertree methods were identified. On the one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria. On the other hand, the average consensus, split fit, and most similar supertree methods showed poorer performance, or at least did not behave in the same way as the total evidence tree. Results for the super distance matrix, the most recent approach tested here, were promising, with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set size and missing data.
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and in increased computing power to handle large data sets. The latter would prove particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
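MRP, one of the best-performing methods in this comparison, first recodes every input tree as binary characters and then analyses the resulting matrix with parsimony. The sketch below builds such a matrix with the standard (Baum/Ragan) coding. For each informative clade of an input tree, taxa inside the clade score 1, other taxa in that tree score 0, and taxa absent from that tree score "?". For simplicity, trees are given as sets of clades. The parsimony search itself is not shown, nor are alternative codings such as the Purvis scheme mentioned above. The function name and input layout are hypothetical.

```python
def mrp_matrix(trees, all_taxa):
    """Build an MRP matrix with standard (Baum/Ragan) coding.

    trees: list of (taxa_in_tree, clades), where clades are sets of taxa
    representing the internal clades of that input tree.
    Returns {taxon: character string}, one character per informative clade.
    """
    columns = []
    for taxa, clades in trees:
        for clade in clades:
            # skip trivial clades, which carry no grouping information
            if len(clade) < 2 or set(clade) == set(taxa):
                continue
            columns.append({t: ("?" if t not in taxa else "1" if t in clade else "0")
                            for t in all_taxa})
    return {t: "".join(col[t] for col in columns) for t in all_taxa}

# Two hypothetical input trees with partially overlapping taxon sets
trees = [({"A", "B", "C", "D"}, [{"A", "B"}]),
         ({"A", "C", "E"}, [{"A", "C"}])]
matrix = mrp_matrix(trees, ["A", "B", "C", "D", "E"])
```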

Relevance:

30.00%

Publisher:

Abstract:

The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the activities necessary to keep a cell alive. DNA, on the other hand, stores in the genome the information on how to produce the different proteins. Regulating gene transcription is the first important step, and it can therefore affect the life of a cell and modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to disease. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, which is characterized by uncontrolled cellular proliferation, invasion of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs that interact and influence each other's activity. This thesis presents new computational methodologies to study gene regulation. In addition, we present applications of our methods to the understanding of cancer-related regulatory programs. Understanding transcriptional regulation is a major challenge. We address this difficult question by combining computational approaches with large collections of heterogeneous experimental data. Specifically, we design signal processing tools to recover transcription factor binding sites on DNA from genome-wide surveys such as chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the locations of TF binding to explain the expression levels of regulated genes. In this way, we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1.
C-MYC and SP1 bind preferentially at promoters, and when SP1 binds next to C-MYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected in the conservation of their binding sites across mammals and in permissive underlying chromatin states. It represents an important control mechanism involved in cellular proliferation, and thereby in cancer. Secondly, we identify the characteristics of the target genes of the TF estrogen receptor alpha (hERa) and study the influence of hERa in regulating transcription. Upon estrogen signaling, hERa binds to DNA to regulate the transcription of its targets in concert with its co-factors. To overcome the scarcity of experimental data on the binding sites of other TFs that may interact with hERa, we conduct an in silico analysis of the sequences underlying the ChIP sites, using the position weight matrices (PWMs) of the hERa partners FOXA1 and SP1. We combine ChIP-chip and ChIP paired-end diTag (ChIP-PET) data on hERa binding to DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and in studies of the response of cells to estrogen. We confirm that hERa binding sites are distributed throughout the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and a high occurrence of SP1 motifs, in particular near estrogen-responsive genes. The second group shows strong binding of hERa and a significant correlation between the number of binding sites along a gene and the strength of gene induction in the presence of estrogen. Some binding sites of the second group also show the presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression.
Our work supports the model in which hERa activates gene expression from distal binding sites by interacting with promoter-bound TFs such as SP1. hERa has been associated with survival rates of breast cancer patients, although explanatory models are still incomplete; this result is therefore important for understanding how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem by analyzing time series of biological measurements, such as mRNA levels or protein concentrations. Our approach uses well-established penalized linear regression models in which we impose sparseness on the connectivity of the regulatory network. We extend this method by enforcing the coherence of the regulatory dependencies: a TF must behave consistently as either an activator or a repressor on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method was tested on the DREAM2 challenge of reconstructing a five-gene/TF regulatory network and obtained the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to handle large biological datasets, have enabled us to better understand gene regulation in humans.
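The sign-coherence constraint described above can be added to an ordinary lasso solver. The sketch below is a generic coordinate-descent lasso minimizing (1/2n)‖y − Xb‖² + λ‖b‖₁, in which each regulator j carries a sign s_j: +1 for an activator, −1 for a repressor, 0 for unconstrained. Each coordinate update is the exact minimizer of the one-dimensional penalized problem restricted to the allowed half-line. The sketch assumes the signs are fixed in advance and shared across all target genes, which is one simple way to obtain coherence; the abstract does not say how the thesis chooses or optimizes the signs. For network inference, one such regression would be fitted per target gene, for example with the levels of candidate regulators at the previous time point as predictors. The function name and inputs are hypothetical.

```python
import math

def sign_constrained_lasso(X, y, signs, lam, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 with sign constraints on b.

    X: list of n rows with p predictors (e.g. regulator levels); y: n responses;
    signs[j]: +1 forces b_j >= 0 (activator), -1 forces b_j <= 0 (repressor),
    0 leaves b_j unconstrained.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the current b (b starts at zero)
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            # correlation of predictor j with the partial residual (b_j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            if signs[j] > 0:
                new = max(0.0, rho - lam) / z[j]
            elif signs[j] < 0:
                new = min(0.0, rho + lam) / z[j]
            else:  # ordinary soft-thresholding
                new = math.copysign(max(0.0, abs(rho) - lam), rho) / z[j]
            # keep the residual consistent with the updated coefficient
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b

# Toy target driven positively by regulator 0 only
X = [[1, 1], [2, 0], [3, 1], [4, 0]]
y = [2, 4, 6, 8]
```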

Relevance:

30.00%

Publisher:

Abstract:

MicroRNAs (miRs) are involved in the pathogenesis of several neoplasms; however, there are no data on their expression patterns and possible roles in adrenocortical tumors. Our objective was to study adrenocortical tumors by an integrative bioinformatics analysis involving miR and transcriptomic profiling, pathway analysis, and a novel, tissue-specific miR target prediction approach. Thirty-six tissue samples, including normal adrenocortical tissues, benign adenomas, and adrenocortical carcinomas (ACC), were studied by simultaneous miR and mRNA profiling. Novel data-processing software was used to identify all predicted miR-mRNA interactions retrieved from PicTar, TargetScan, and miRBase. Tissue-specific target prediction was achieved by filtering out mRNAs with undetectable expression and searching for mRNA targets whose expression alterations are inverse to those of their regulatory miRs. Target sets and significant microarray data were subjected to Ingenuity Pathway Analysis. Six miRs with significantly different expression were found. miR-184 and miR-503 showed significantly higher, whereas miR-511 and miR-214 showed significantly lower, expression in ACCs than in the other groups. Expression of miR-210 was significantly lower in cortisol-secreting adenomas than in ACCs. By calculating the difference between the delta cycle threshold values dCT(miR-511) and dCT(miR-503), ACCs could be distinguished from benign adenomas with high sensitivity and specificity. Pathway analysis revealed the possible involvement of G2/M checkpoint damage in ACC pathogenesis. To our knowledge, this is the first report describing miR expression patterns and pathway analysis in sporadic adrenocortical tumors. miR biomarkers may be helpful for the diagnosis of adrenocortical malignancy. This tissue-specific target prediction approach may also be applicable to other tumors.
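The tissue-specific filtering step described above can be sketched as follows. Start from the union of predicted miR-mRNA pairs (in the study, from PicTar, TargetScan, and miRBase), drop mRNAs whose expression is undetectable, and keep pairs whose expression changes go in opposite directions, such as an up-regulated miR with a down-regulated mRNA. The inputs are hypothetical: log fold changes (e.g. ACC versus normal tissue) stored in dictionaries, and a set of detectable mRNAs. The gene names are placeholders. The abstract gives no thresholds, so any nonzero change in opposite directions counts here.

```python
def tissue_specific_targets(predicted_pairs, mir_logfc, mrna_logfc, detected_mrnas):
    """Filter predicted (miR, mRNA) pairs to those plausible in the studied tissue."""
    kept = []
    for mir, mrna in sorted(set(predicted_pairs)):  # union of all prediction sources
        if mrna not in detected_mrnas:
            continue  # undetectable mRNA: cannot be a functional target here
        fc_mir, fc_mrna = mir_logfc.get(mir), mrna_logfc.get(mrna)
        if fc_mir is None or fc_mrna is None:
            continue  # no expression data for one partner
        if fc_mir * fc_mrna < 0:  # inverse expression alteration
            kept.append((mir, mrna))
    return kept

# Hypothetical predictions and fold changes (gene names are placeholders)
pairs = [("miR-503", "GENE_A"), ("miR-503", "GENE_B"),
         ("miR-511", "GENE_C"), ("miR-511", "GENE_A")]
mir_fc = {"miR-503": 2.0, "miR-511": -1.5}
mrna_fc = {"GENE_A": -1.0, "GENE_B": 0.8, "GENE_C": 1.2}
targets = tissue_specific_targets(pairs, mir_fc, mrna_fc, {"GENE_A", "GENE_B"})
```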

Relevance:

30.00%

Publisher:

Abstract:

This paper demonstrates a novel distributed architecture to facilitate the acquisition of Language Resources. We build a factory that automates the stages involved in the acquisition, production, updating and maintenance of these resources. The factory is designed as a platform where functionalities are deployed as web services, which can be combined into complex acquisition chains using workflows. We present a case study that acquires a Translation Memory for a given pair of languages and a given domain, using web services for crawling, sentence alignment and conversion to TMX.
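The last stage of the case study (conversion to TMX) and the idea of chaining stages into a workflow can be sketched locally. The sketch assumes that the crawling and sentence-alignment stages have already produced aligned sentence pairs. The header values (creationtool and so on) and the function names are placeholders. The real platform exposes these stages as web services, not as Python functions.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the standard xml:lang attribute
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def to_tmx(pairs, src_lang, tgt_lang):
    """Convert aligned (source, target) sentence pairs into a TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",  # placeholder values
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext"})
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

def run_chain(data, *stages):
    """Run stages in order, feeding each stage's output to the next (a minimal workflow)."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical output of the alignment stage
pairs = [("Hello.", "Hola."), ("Thank you.", "Gracias.")]
tmx_xml = run_chain(pairs, lambda p: to_tmx(p, "en", "es"))
```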

Relevance:

30.00%

Publisher:

Abstract:

Acid-sensing ion channels (ASICs) are key receptors for extracellular protons. These neuronal non-voltage-gated Na⁺ channels are involved in learning, the expression of fear, neurodegeneration after ischemia, and pain sensation. We have applied a systematic approach to identify potential pH sensors in ASIC1a and to elucidate the mechanisms by which pH variations govern ASIC gating. We first calculated the pKa values of all extracellular His, Glu, and Asp residues with a Poisson-Boltzmann continuum approach, based on the three-dimensional structure of ASIC, to identify candidate pH-sensing residues. The role of these residues was then assessed by site-directed mutagenesis and chemical modification, combined with functional analysis. The localization of putative pH-sensing residues suggests that pH changes control ASIC gating through the protonation/deprotonation of many residues per subunit in different channel domains. Analysis of the function of residues in the palm domain close to the central vertical axis of the channel allowed us to predict conformational changes of this region during gating. Our study provides a basis for understanding the intrinsic pH dependence of ASICs and describes an approach that can also be applied to investigate the mechanisms of pH dependence in other proteins.
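Once pKa values are available, the Henderson-Hasselbalch relation gives the protonated fraction of a titratable site at a given pH: f = 1 / (1 + 10^(pH − pKa)). In the study, the pKa values came from Poisson-Boltzmann continuum calculations, which are not reproduced here. Residues whose protonation changes substantially over the pH range that activates the channel are natural pH-sensor candidates. The sketch treats sites as independent, which is a simplification. The residue labels, pKa values, pH window and threshold in the example are hypothetical placeholders.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of an independent titratable site carrying the proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def candidate_sensors(pkas, ph_high, ph_low, min_change=0.2):
    """Residues whose protonated fraction rises by at least min_change when pH drops."""
    hits = []
    for residue, pka in pkas.items():
        change = fraction_protonated(ph_low, pka) - fraction_protonated(ph_high, pka)
        if change >= min_change:
            hits.append(residue)
    return sorted(hits)

# Hypothetical residues and pKa values, not results from the study
pkas = {"His-A": 6.8, "Glu-B": 4.0, "Asp-C": 6.2, "Asp-D": 2.5}
sensors = candidate_sensors(pkas, ph_high=7.4, ph_low=6.0)
```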

Relevance:

30.00%

Publisher:

Abstract:

Although the genomes of any two human individuals are more than 99.99% identical at the sequence level, some structural variation can be observed. Differences between genomes include single nucleotide polymorphisms (SNPs), inversions and copy number changes (gain or loss of DNA). The latter can range from submicroscopic events (CNVs, at least 1 kb in size) to complete chromosomal aneuploidies. Small copy number variations often have no (lethal) consequences for the cell, but a few have been associated with disease susceptibility and phenotypic variation. Larger rearrangements (e.g. gain of a complete chromosome) are frequently associated with more severe consequences for health, such as genomic disorders and cancer. High-throughput technologies such as DNA microarrays enable the genome-wide detection of CNVs. Since the initial catalogue of CNVs in the human genome in 2006, there has been tremendous interest in CNVs in both population and medical genetics. Understanding CNV patterns within and between human populations is essential to elucidate their possible contribution to disease. But genome analysis is a challenging task: the technology evolves rapidly, creating a need for novel, efficient and robust analytical tools, which must be compared with existing ones. Also, while the link between CNVs and disease has been established, the relative contribution of CNVs is not fully understood, and predisposition to disease from CNVs in the general population has not yet been investigated. During my PhD thesis, I worked on several aspects related to CNVs. As I will report in chapter 3, I was interested in computational methods to detect CNVs in the general population. I had access to the CoLaus dataset, a population-based study with more than 6,000 participants from the Lausanne area. All these individuals were analysed on SNP arrays, and extensive clinical information was available.
My work explored existing CNV detection methods, and I developed a variety of metrics to compare their performance. Since these methods were not producing entirely satisfactory results, I implemented my own method, which outperformed two existing methods. I also devised strategies to combine CNVs from different individuals into CNV regions. I was also interested in the clinical impact of CNVs in common disease (chapter 4). Through an international collaboration led by the Centre Hospitalier Universitaire Vaudois (CHUV) and Imperial College London, I was involved as a main data analyst in the investigation of a rare deletion at chromosome 16p11 detected in obese patients. Specifically, we compared 8,456 obese patients and 11,856 individuals from the general population and found that the deletion accounted for 0.7% of morbid obesity cases and was absent in healthy non-obese controls. This highlights the importance of rare variants with strong impact and provides new insights into the design of clinical studies to identify the missing heritability in common disease. Furthermore, I was interested in the detection of somatic copy number alterations (SCNAs) and their consequences in cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involved other groups from the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva. The focus of my work was to identify genes with altered expression levels within SCNAs in seven metastatic melanoma cell lines, using CGH and SNP arrays, RNA-seq, and karyotyping. Very few SCNA genes were shared by even two melanoma samples, making it difficult to draw any conclusions at the individual gene level. To overcome this limitation, I used a network-guided analysis to determine whether any pathways, defined by amplified or deleted genes, were common among the samples.
Six of the melanoma samples were potentially altered in four pathways, and five samples harboured copy-number and expression changes in components of six pathways. In total, this approach identified 28 pathways. Validation with two large external melanoma datasets confirmed all but three of the detected pathways and demonstrated the utility of network-guided approaches for analysing both large and small datasets.

Summary

Although the genomes of two individuals are more than 99.99% similar, structural differences can be observed. These differences include single nucleotide polymorphisms, inversions and copy number changes (gain or loss of DNA). The latter range from small, so-called submicroscopic events (at least 1 kb in size), called CNVs (copy number variants), to larger events that can affect entire chromosomes. Small variations are generally without consequence for the cell; however, some have been implicated in predisposition to certain diseases and in phenotypic variation in the general population. Larger rearrangements (for example, an extra copy of a chromosome, commonly called trisomy) have more serious health repercussions, as in certain genomic syndromes and in cancer. High-throughput technologies such as DNA microarrays allow CNVs to be detected across the whole human genome. The 2006 map of CNVs in the human genome generated strong interest in population and medical genetics. Detecting differences within and between populations is key to elucidating the possible contribution of CNVs to disease. However, genome analysis remains a difficult task: the technology evolves very rapidly, creating new needs for developing tools, improving existing ones and comparing the different methods.
Moreover, although the link between CNVs and disease has been established, their precise contribution is not yet understood. Likewise, studies of disease predisposition due to CNVs detected in the general population have not yet been carried out.

During my doctorate, I focused on three main areas related to CNVs. In chapter 3, I describe my work on methods for analysing DNA microarrays. I had access to data from the CoLaus project, a study of the population of Lausanne. In this study, the genomes of more than 6,000 individuals were analysed with SNP arrays, and extensive clinical information was collected. During my work, I used and compared several CNV detection methods. Since the results were not entirely satisfactory, I implemented my own method, which performs better than two of the three other methods used. I was also interested in strategies for combining CNVs from different individuals into regions.

I was also interested in the clinical impact of CNVs in common genetic diseases (chapter 4). This project was made possible by a close collaboration with the Centre Hospitalier Universitaire Vaudois (CHUV) and Imperial College in London. In this project, I was one of the main analysts and worked on the clinical impact of a rare deletion on chromosome 16p11 present in patients with obesity. In this multidisciplinary collaboration, we compared 8,456 patients with obesity and 11,856 individuals from the general population. We found that the deletion was involved in 0.7% of morbid obesity cases and was absent in healthy (non-obese) controls. Our study illustrates the importance of rare CNVs, which can have a very large clinical impact.
Furthermore, this opens up an alternative to association studies for improving our understanding of the aetiology of common genetic diseases.

In addition, I worked on the detection of somatic copy number alterations (SCNAs) and their consequences for cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involving the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva. I focused on identifying genes that are affected by SCNAs and show over- or under-expression in cell lines derived from metastatic melanomas. The data were generated with DNA arrays (CGH and SNP) and high-throughput transcriptome sequencing. My research showed that few genes are recurrent across melanomas, which makes the results difficult to interpret. To get around these limitations, I used a network analysis to determine whether signalling networks enriched in amplified or lost genes were common to the different samples. Indeed, among the 28 networks detected, four are potentially deregulated in six melanomas, and six additional networks are affected in five melanomas. Validation of these results with two large public datasets confirmed all of these networks except three. This demonstrates the usefulness of this approach for analysing both small and large datasets.

Summary for the general public

The advent of molecular biology, particularly over the last ten years, has revolutionised research in medical genetics. Thanks to the availability of the human reference genome from 2001, new technologies such as DNA microarrays have appeared, making it possible to study the genome as a whole at a so-called submicroscopic resolution that was previously unattainable with traditional cytogenetic techniques.
One of the most important examples is the study of structural variation in the genome, in particular of gene copy number. Since the identification of trisomy 21 by Professor Jérôme Lejeune in 1959, it has been established that the gain of an extra chromosome can cause genetic syndromes with serious consequences for the patient's health. Similar observations have been made in oncology on cancer cells, which frequently accumulate copy number aberrations (such as the loss or gain of one or more chromosomes). From 2004 onwards, several research groups catalogued copy number changes in individuals from the general population (that is, without visible clinical symptoms). In 2006, Dr. Richard Redon established the first map of copy number variation in the general population. These discoveries showed that variations in the genome are frequent and that most of them are benign, that is, without clinical consequences for the individual's health. This sparked great interest both in understanding natural variation between individuals and in better understanding genetic predisposition to certain diseases.

During my thesis, I developed new computational tools for the analysis of DNA microarrays in order to map these variations at the genome scale. I used these tools to establish the variations present in the Swiss population, and then devoted myself to the study of factors that may explain predisposition to diseases such as obesity. This study, carried out in collaboration with the Centre Hospitalier Universitaire Vaudois, led to the identification of a deletion on chromosome 16 that explains 0.7% of morbid obesity cases. This study has several implications.
First, it makes prenatal diagnosis possible, to determine the predisposition of unborn children to obesity. Second, this locus involves about twenty genes, which makes it possible to formulate new working hypotheses and to guide research, improving our understanding of the disease and raising the hope of discovering a new treatment. Finally, our study provides an alternative to genetic association studies, which have so far met with only mixed success.

In the last part of my thesis, I focused on the analysis of copy number aberrations in cancer. I chose to study melanoma, a form of skin cancer. Melanoma is a very aggressive tumour: it is responsible for 80% of skin cancer deaths and is often resistant to the treatments used in oncology (chemotherapy, radiotherapy). Within a collaboration between the Ludwig Institute for Cancer Research, the Swiss Institute of Bioinformatics, the CHUV and the Universities of Lausanne and Geneva, we sequenced the exome (the genes) and the transcriptome (gene expression) of seven metastatic melanomas, and performed copy number analyses using DNA microarrays and karyotyping. My work led to the development of new analysis methods adapted to cancer, established the list of cell signalling networks recurrently affected in melanoma, and identified two potential therapeutic targets previously overlooked in skin cancers.