129 resultados para computational modelling
Resumo:
This paper investigates the use of ensemble of predictors in order to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging shows good performance, and proves to be more computationally efficient than training a single SVR model while reducing error. Boosting, however, does not improve results on this specific problem.
Resumo:
Computational anatomy with magnetic resonance imaging (MRI) is well established as a noninvasive biomarker of Alzheimer's disease (AD); however, there is less certainty about its dependency on the staging of AD. We use classical group analyses and automated machine learning classification of standard structural MRI scans to investigate AD diagnostic accuracy from the preclinical phase to clinical dementia. Longitudinal data from the Alzheimer's Disease Neuroimaging Initiative were stratified into 4 groups according to the clinical status-(1) AD patients; (2) mild cognitive impairment (MCI) converters; (3) MCI nonconverters; and (4) healthy controls-and submitted to a support vector machine. The obtained classifier was significantly above the chance level (62%) for detecting AD already 4 years before conversion from MCI. Voxel-based univariate tests confirmed the plausibility of our findings detecting a distributed network of hippocampal-temporoparietal atrophy in AD patients. We also identified a subgroup of control subjects with brain structure and cognitive changes highly similar to those observed in AD. Our results indicate that computational anatomy can detect AD substantially earlier than suggested by current models. The demonstrated differential spatial pattern of atrophy between correctly and incorrectly classified AD patients challenges the assumption of a uniform pathophysiological process underlying clinically identified AD.
Resumo:
Acid-sensing ion channels (ASICs) are key receptors for extracellular protons. These neuronal nonvoltage-gated Na(+) channels are involved in learning, the expression of fear, neurodegeneration after ischemia, and pain sensation. We have applied a systematic approach to identify potential pH sensors in ASIC1a and to elucidate the mechanisms by which pH variations govern ASIC gating. We first calculated the pK(a) value of all extracellular His, Glu, and Asp residues using a Poisson-Boltzmann continuum approach, based on the ASIC three-dimensional structure, to identify candidate pH-sensing residues. The role of these residues was then assessed by site-directed mutagenesis and chemical modification, combined with functional analysis. The localization of putative pH-sensing residues suggests that pH changes control ASIC gating by protonation/deprotonation of many residues per subunit in different channel domains. Analysis of the function of residues in the palm domain close to the central vertical axis of the channel allowed for prediction of conformational changes of this region during gating. Our study provides a basis for the intrinsic ASIC pH dependence and describes an approach that can also be applied to the investigation of the mechanisms of the pH dependence of other proteins.
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
Background: Bone health is a concern when treating early stage breast cancer patients with adjuvant aromatase inhibitors. Early detection of patients (pts) at risk of osteoporosis and fractures may be helpful for starting preventive therapies and selecting the most appropriate endocrine therapy schedule. We present statistical models describing the evolution of lumbar and hip bone mineral density (BMD) in pts treated with tamoxifen (T), letrozole (L) and sequences of T and L. Methods: Available dual-energy x-ray absorptiometry exams (DXA) of pts treated in trial BIG 1-98 were retrospectively collected from Swiss centers. Treatment arms: A) T for 5 years, B) L for 5 years, C) 2 years of T followed by 3 years of L and, D) 2 years of L followed by 3 years of T. Pts without DXA were used as a control for detecting selection biases. Patients randomized to arm A were subsequently allowed an unplanned switch from T to L. Allowing for variations between DXA machines and centres, two repeated measures models, using a covariance structure that allow for different times between DXA, were used to estimate changes in hip and lumbar BMD (g/cm2) from trial randomization. Prospectively defined covariates, considered as fixed effects in the multivariable models in an intention to treat analysis, at the time of trial randomization were: age, height, weight, hysterectomy, race, known osteoporosis, tobacco use, prior bone fracture, prior hormone replacement therapy (HRT), bisphosphonate use and previous neo-/adjuvant chemotherapy (ChT). Similarly, the T-scores for lumbar and hip BMD measurements were modeled using a per-protocol approach (allowing for treatment switch in arm A), specifically studying the effect of each therapy upon T-score percentage. Results: A total of 247 out of 546 pts had between 1 and 5 DXA; a total of 576 DXA were collected. Number of DXA measurements per arm were; arm A 133, B 137, C 141 and D 135. The median follow-up time was 5.8 years. Significant factors positively correlated with lumbar and hip BMD in the multivariate analysis were weight, previous HRT use, neo-/adjuvant ChT, hysterectomy and height. Significant negatively correlated factors in the models were osteoporosis, treatment arm (B/C/D vs. A), time since endocrine therapy start, age and smoking (current vs. never).Modeling the T-score percentage, differences from T to L were -4.199% (p = 0.036) and -4.907% (p = 0.025) for the hip and lumbar measurements respectively, before any treatment switch occurred. Conclusions: Our statistical models describe the lumbar and hip BMD evolution for pts treated with L and/or T. The results of both localisations confirm that, contrary to expectation, the sequential schedules do not seem less detrimental for the BMD than L monotherapy. The estimated difference in BMD T-score percent is at least 4% from T to L.
Resumo:
A haplotype is an m-long binary vector. The XOR-genotype of two haplotypes is the m-vector of their coordinate-wise XOR. We study the following problem: Given a set of XOR-genotypes, reconstruct their haplotypes so that the set of resulting haplotypes can be mapped onto a perfect phylogeny (PP) tree. The question is motivated by studying population evolution in human genetics and is a variant of the PP haplotyping problem that has received intensive attention recently. Unlike the latter problem, in which the input is '' full '' genotypes, here, we assume less informative input and so may be more economical to obtain experimentally. Building on ideas of Gusfield, we show how to solve the problem in polynomial time by a reduction to the graph realization problem. The actual haplotypes are not uniquely determined by the tree they map onto and the tree itself may or may not be unique. We show that tree uniqueness implies uniquely determined haplotypes, up to inherent degrees of freedom, and give a sufficient condition for the uniqueness. To actually determine the haplotypes given the tree, additional information is necessary. We show that two or three full genotypes suffice to reconstruct all the haplotypes and present a linear algorithm for identifying those genotypes.
Resumo:
Knowledge about spatial biodiversity patterns is a basic criterion for reserve network design. Although herbarium collections hold large quantities of information, the data are often scattered and cannot supply complete spatial coverage. Alternatively, herbarium data can be used to fit species distribution models and their predictions can be used to provide complete spatial coverage and derive species richness maps. Here, we build on previous effort to propose an improved compositionalist framework for using species distribution models to better inform conservation management. We illustrate the approach with models fitted with six different methods and combined using an ensemble approach for 408 plant species in a tropical and megadiverse country (Ecuador). As a complementary view to the traditional richness hotspots methodology, consisting of a simple stacking of species distribution maps, the compositionalist modelling approach used here combines separate predictions for different pools of species to identify areas of alternative suitability for conservation. Our results show that the compositionalist approach better captures the established protected areas than the traditional richness hotspots strategies and allows the identification of areas in Ecuador that would optimally complement the current protection network. Further studies should aim at refining the approach with more groups and additional species information.
Resumo:
Sustainable resource use is one of the most important environmental issues of our times. It is closely related to discussions on the 'peaking' of various natural resources serving as energy sources, agricultural nutrients, or metals indispensable in high-technology applications. Although the peaking theory remains controversial, it is commonly recognized that a more sustainable use of resources would alleviate negative environmental impacts related to resource use. In this thesis, sustainable resource use is analysed from a practical standpoint, through several different case studies. Four of these case studies relate to resource metabolism in the Canton of Geneva in Switzerland: the aim was to model the evolution of chosen resource stocks and flows in the coming decades. The studied resources were copper (a bulk metal), phosphorus (a vital agricultural nutrient), and wood (a renewable resource). In addition, the case of lithium (a critical metal) was analysed briefly in a qualitative manner and in an electric mobility perspective. In addition to the Geneva case studies, this thesis includes a case study on the sustainability of space life support systems. Space life support systems are systems whose aim is to provide the crew of a spacecraft with the necessary metabolic consumables over the course of a mission. Sustainability was again analysed from a resource use perspective. In this case study, the functioning of two different types of life support systems, ARES and BIORAT, were evaluated and compared; these systems represent, respectively, physico-chemical and biological life support systems. Space life support systems could in fact be used as a kind of 'laboratory of sustainability' given that they represent closed and relatively simple systems compared to complex and open terrestrial systems such as the Canton of Geneva. The chosen analysis method used in the Geneva case studies was dynamic material flow analysis: dynamic material flow models were constructed for the resources copper, phosphorus, and wood. Besides a baseline scenario, various alternative scenarios (notably involving increased recycling) were also examined. In the case of space life support systems, the methodology of material flow analysis was also employed, but as the data available on the dynamic behaviour of the systems was insufficient, only static simulations could be performed. The results of the case studies in the Canton of Geneva show the following: were resource use to follow population growth, resource consumption would be multiplied by nearly 1.2 by 2030 and by 1.5 by 2080. A complete transition to electric mobility would be expected to only slightly (+5%) increase the copper consumption per capita while the lithium demand in cars would increase 350 fold. For example, phosphorus imports could be decreased by recycling sewage sludge or human urine; however, the health and environmental impacts of these options have yet to be studied. Increasing the wood production in the Canton would not significantly decrease the dependence on wood imports as the Canton's production represents only 5% of total consumption. In the comparison of space life support systems ARES and BIORAT, BIORAT outperforms ARES in resource use but not in energy use. However, as the systems are dimensioned very differently, it remains questionable whether they can be compared outright. In conclusion, the use of dynamic material flow analysis can provide useful information for policy makers and strategic decision-making; however, uncertainty in reference data greatly influences the precision of the results. Space life support systems constitute an extreme case of resource-using systems; nevertheless, it is not clear how their example could be of immediate use to terrestrial systems.
Resumo:
AbstractAlthough the genomes from any two human individuals are more than 99.99% identical at the sequence level, some structural variation can be observed. Differences between genomes include single nucleotide polymorphism (SNP), inversion and copy number changes (gain or loss of DNA). The latter can range from submicroscopic events (CNVs, at least 1kb in size) to complete chromosomal aneuploidies. Small copy number variations have often no (lethal) consequences to the cell, but a few were associated to disease susceptibility and phenotypic variations. Larger re-arrangements (i.e. complete chromosome gain) are frequently associated with more severe consequences on health such as genomic disorders and cancer. High-throughput technologies like DNA microarrays enable the detection of CNVs in a genome-wide fashion. Since the initial catalogue of CNVs in the human genome in 2006, there has been tremendous interest in CNVs both in the context of population and medical genetics. Understanding CNV patterns within and between human populations is essential to elucidate their possible contribution to disease. But genome analysis is a challenging task; the technology evolves rapidly creating needs for novel, efficient and robust analytical tools which need to be compared with existing ones. Also, while the link between CNV and disease has been established, the relative CNV contribution is not fully understood and the predisposition to disease from CNVs of the general population has not been yet investigated.During my PhD thesis, I worked on several aspects related to CNVs. As l will report in chapter 3, ! was interested in computational methods to detect CNVs from the general population. I had access to the CoLaus dataset, a population-based study with more than 6,000 participants from the Lausanne area. All these individuals were analysed on SNP arrays and extensive clinical information were available. My work explored existing CNV detection methods and I developed a variety of metrics to compare their performance. Since these methods were not producing entirely satisfactory results, I implemented my own method which outperformed two existing methods. I also devised strategies to combine CNVs from different individuals into CNV regions.I was also interested in the clinical impact of CNVs in common disease (chapter 4). Through an international collaboration led by the Centre Hospitalier Universitaire Vaudois (CHUV) and the Imperial College London I was involved as a main data analyst in the investigation of a rare deletion at chromosome 16p11 detected in obese patients. Specifically, we compared 8,456 obese patients and 11,856 individuals from the general population and we found that the deletion was accounting for 0.7% of the morbid obesity cases and was absent in healthy non- obese controls. This highlights the importance of rare variants with strong impact and provides new insights in the design of clinical studies to identify the missing heritability in common disease.Furthermore, I was interested in the detection of somatic copy number alterations (SCNA) and their consequences in cancer (chapter 5). This project was a collaboration initiated by the Ludwig Institute for Cancer Research and involved other groups from the Swiss Institute of Bioinformatics, the CHUV and Universities of Lausanne and Geneva. The focus of my work was to identify genes with altered expression levels within somatic copy number alterations (SCNA) in seven metastatic melanoma ceil lines, using CGH and SNP arrays, RNA-seq, and karyotyping. Very few SCNA genes were shared by even two melanoma samples making it difficult to draw any conclusions at the individual gene level. To overcome this limitation, I used a network-guided analysis to determine whether any pathways, defined by amplified or deleted genes, were common among the samples. Six of the melanoma samples were potentially altered in four pathways and five samples harboured copy-number and expression changes in components of six pathways. In total, this approach identified 28 pathways. Validation with two external, large melanoma datasets confirmed all but three of the detected pathways and demonstrated the utility of network-guided approaches for both large and small datasets analysis.RésuméBien que le génome de deux individus soit similaire à plus de 99.99%, des différences de structure peuvent être observées. Ces différences incluent les polymorphismes simples de nucléotides, les inversions et les changements en nombre de copies (gain ou perte d'ADN). Ces derniers varient de petits événements dits sous-microscopiques (moins de 1kb en taille), appelés CNVs (copy number variants) jusqu'à des événements plus large pouvant affecter des chromosomes entiers. Les petites variations sont généralement sans conséquence pour la cellule, toutefois certaines ont été impliquées dans la prédisposition à certaines maladies, et à des variations phénotypiques dans la population générale. Les réarrangements plus grands (par exemple, une copie additionnelle d'un chromosome appelée communément trisomie) ont des répercutions plus grave pour la santé, comme par exemple dans certains syndromes génomiques et dans le cancer. Les technologies à haut-débit telle les puces à ADN permettent la détection de CNVs à l'échelle du génome humain. La cartographie en 2006 des CNV du génome humain, a suscité un fort intérêt en génétique des populations et en génétique médicale. La détection de différences au sein et entre plusieurs populations est un élément clef pour élucider la contribution possible des CNVs dans les maladies. Toutefois l'analyse du génome reste une tâche difficile, la technologie évolue très rapidement créant de nouveaux besoins pour le développement d'outils, l'amélioration des précédents, et la comparaison des différentes méthodes. De plus, si le lien entre CNV et maladie a été établit, leur contribution précise n'est pas encore comprise. De même que les études sur la prédisposition aux maladies par des CNVs détectés dans la population générale n'ont pas encore été réalisées.Pendant mon doctorat, je me suis concentré sur trois axes principaux ayant attrait aux CNV. Dans le chapitre 3, je détaille mes travaux sur les méthodes d'analyses des puces à ADN. J'ai eu accès aux données du projet CoLaus, une étude de la population de Lausanne. Dans cette étude, le génome de plus de 6000 individus a été analysé avec des puces SNP et de nombreuses informations cliniques ont été récoltées. Pendant mes travaux, j'ai utilisé et comparé plusieurs méthodes de détection des CNVs. Les résultats n'étant pas complètement satisfaisant, j'ai implémenté ma propre méthode qui donne de meilleures performances que deux des trois autres méthodes utilisées. Je me suis aussi intéressé aux stratégies pour combiner les CNVs de différents individus en régions.Je me suis aussi intéressé à l'impact clinique des CNVs dans le cas des maladies génétiques communes (chapitre 4). Ce projet fut possible grâce à une étroite collaboration avec le Centre Hospitalier Universitaire Vaudois (CHUV) et l'Impérial College à Londres. Dans ce projet, j'ai été l'un des analystes principaux et j'ai travaillé sur l'impact clinique d'une délétion rare du chromosome 16p11 présente chez des patients atteints d'obésité. Dans cette collaboration multidisciplinaire, nous avons comparés 8'456 patients atteint d'obésité et 11 '856 individus de la population générale. Nous avons trouvés que la délétion était impliquée dans 0.7% des cas d'obésité morbide et était absente chez les contrôles sains (non-atteint d'obésité). Notre étude illustre l'importance des CNVs rares qui peuvent avoir un impact clinique très important. De plus, ceci permet d'envisager une alternative aux études d'associations pour améliorer notre compréhension de l'étiologie des maladies génétiques communes.Egalement, j'ai travaillé sur la détection d'altérations somatiques en nombres de copies (SCNA) et de leurs conséquences pour le cancer (chapitre 5). Ce projet fut une collaboration initiée par l'Institut Ludwig de Recherche contre le Cancer et impliquant l'Institut Suisse de Bioinformatique, le CHUV et les Universités de Lausanne et Genève. Je me suis concentré sur l'identification de gènes affectés par des SCNAs et avec une sur- ou sous-expression dans des lignées cellulaires dérivées de mélanomes métastatiques. Les données utilisées ont été générées par des puces ADN (CGH et SNP) et du séquençage à haut débit du transcriptome. Mes recherches ont montrées que peu de gènes sont récurrents entre les mélanomes, ce qui rend difficile l'interprétation des résultats. Pour contourner ces limitations, j'ai utilisé une analyse de réseaux pour définir si des réseaux de signalisations enrichis en gènes amplifiés ou perdus, étaient communs aux différents échantillons. En fait, parmi les 28 réseaux détectés, quatre réseaux sont potentiellement dérégulés chez six mélanomes, et six réseaux supplémentaires sont affectés chez cinq mélanomes. La validation de ces résultats avec deux larges jeux de données publiques, a confirmée tous ces réseaux sauf trois. Ceci démontre l'utilité de cette approche pour l'analyse de petits et de larges jeux de données.Résumé grand publicL'avènement de la biologie moléculaire, en particulier ces dix dernières années, a révolutionné la recherche en génétique médicale. Grâce à la disponibilité du génome humain de référence dès 2001, de nouvelles technologies telles que les puces à ADN sont apparues et ont permis d'étudier le génome dans son ensemble avec une résolution dite sous-microscopique jusque-là impossible par les techniques traditionnelles de cytogénétique. Un des exemples les plus importants est l'étude des variations structurales du génome, en particulier l'étude du nombre de copies des gènes. Il était établi dès 1959 avec l'identification de la trisomie 21 par le professeur Jérôme Lejeune que le gain d'un chromosome supplémentaire était à l'origine de syndrome génétique avec des répercussions graves pour la santé du patient. Ces observations ont également été réalisées en oncologie sur les cellules cancéreuses qui accumulent fréquemment des aberrations en nombre de copies (telles que la perte ou le gain d'un ou plusieurs chromosomes). Dès 2004, plusieurs groupes de recherches ont répertorié des changements en nombre de copies dans des individus provenant de la population générale (c'est-à-dire sans symptômes cliniques visibles). En 2006, le Dr. Richard Redon a établi la première carte de variation en nombre de copies dans la population générale. Ces découvertes ont démontrées que les variations dans le génome était fréquentes et que la plupart d'entre elles étaient bénignes, c'est-à-dire sans conséquence clinique pour la santé de l'individu. Ceci a suscité un très grand intérêt pour comprendre les variations naturelles entre individus mais aussi pour mieux appréhender la prédisposition génétique à certaines maladies.Lors de ma thèse, j'ai développé de nouveaux outils informatiques pour l'analyse de puces à ADN dans le but de cartographier ces variations à l'échelle génomique. J'ai utilisé ces outils pour établir les variations dans la population suisse et je me suis consacré par la suite à l'étude de facteurs pouvant expliquer la prédisposition aux maladies telles que l'obésité. Cette étude en collaboration avec le Centre Hospitalier Universitaire Vaudois a permis l'identification d'une délétion sur le chromosome 16 expliquant 0.7% des cas d'obésité morbide. Cette étude a plusieurs répercussions. Tout d'abord elle permet d'effectuer le diagnostique chez les enfants à naître afin de déterminer leur prédisposition à l'obésité. Ensuite ce locus implique une vingtaine de gènes. Ceci permet de formuler de nouvelles hypothèses de travail et d'orienter la recherche afin d'améliorer notre compréhension de la maladie et l'espoir de découvrir un nouveau traitement Enfin notre étude fournit une alternative aux études d'association génétique qui n'ont eu jusqu'à présent qu'un succès mitigé.Dans la dernière partie de ma thèse, je me suis intéressé à l'analyse des aberrations en nombre de copies dans le cancer. Mon choix s'est porté sur l'étude de mélanomes, impliqués dans le cancer de la peau. Le mélanome est une tumeur très agressive, elle est responsable de 80% des décès des cancers de la peau et est souvent résistante aux traitements utilisés en oncologie (chimiothérapie, radiothérapie). Dans le cadre d'une collaboration entre l'Institut Ludwig de Recherche contre le Cancer, l'Institut Suisse de Bioinformatique, le CHUV et les universités de Lausanne et Genève, nous avons séquencés l'exome (les gènes) et le transcriptome (l'expression des gènes) de sept mélanomes métastatiques, effectués des analyses du nombre de copies par des puces à ADN et des caryotypes. Mes travaux ont permis le développement de nouvelles méthodes d'analyses adaptées au cancer, d'établir la liste des réseaux de signalisation cellulaire affectés de façon récurrente chez le mélanome et d'identifier deux cibles thérapeutiques potentielles jusqu'alors ignorées dans les cancers de la peau.
Resumo:
We present a novel numerical algorithm for the simulation of seismic wave propagation in porous media, which is particularly suitable for the accurate modelling of surface wave-type phenomena. The differential equations of motion are based on Biot's theory of poro-elasticity and solved with a pseudospectral approach using Fourier and Chebyshev methods to compute the spatial derivatives along the horizontal and vertical directions, respectively. The time solver is a splitting algorithm that accounts for the stiffness of the differential equations. Due to the Chebyshev operator the grid spacing in the vertical direction is non-uniform and characterized by a denser spatial sampling in the vicinity of interfaces, which allows for a numerically stable and accurate evaluation of higher order surface wave modes. We stretch the grid in the vertical direction to increase the minimum grid spacing and reduce the computational cost. The free-surface boundary conditions are implemented with a characteristics approach, where the characteristic variables are evaluated at zero viscosity. The same procedure is used to model seismic wave propagation at the interface between a fluid and porous medium. In this case, each medium is represented by a different grid and the two grids are combined through a domain-decomposition method. This wavefield decomposition method accounts for the discontinuity of variables and is crucial for an accurate interface treatment. We simulate seismic wave propagation with open-pore and sealed-pore boundary conditions and verify the validity and accuracy of the algorithm by comparing the numerical simulations to analytical solutions based on zero viscosity obtained with the Cagniard-de Hoop method. Finally, we illustrate the suitability of our algorithm for more complex models of porous media involving viscous pore fluids and strongly heterogeneous distributions of the elastic and hydraulic material properties.
Resumo:
Using numerical simulations of pairs of long polymeric chains confined in microscopic cylinders, we investigate consequences of double-strand DNA breaks occurring in independent topological domains, such as these constituting bacterial chromosomes. Our simulations show a transition between segregated and mixed state upon linearization of one of the modelled topological domains. Our results explain how chromosomal organization into topological domains can fulfil two opposite conditions: (i) effectively repulse various loops from each other thus promoting chromosome separation and (ii) permit local DNA intermingling when one or more loops are broken and need to be repaired in a process that requires homology search between broken ends and their homologous sequences in closely positioned sister chromatid.