195 results for Statistical Genetics
Abstract:
Summary: During the evolutionary diversification of organisms, similar ecological constraints led to the recurrent appearance of the same traits (phenotypes) in distant lineages, a phenomenon called convergence. In most cases, the genetic origins of the convergent traits remain unknown, but recent studies traced the convergent phenotypes to recurrent alterations of the same gene or, in a few cases, to identical genetic changes. However, these cases remain anecdotal and there is a need for a study system that evolved several times independently and whose genetic determinism is well resolved and straightforward, such as C4 photosynthesis. This adaptation to warm environments, possibly driven by past atmospheric CO2 decreases, consists of a CO2-concentrating pump, created by numerous morphological and biochemical novelties. All genes encoding C4 enzymes already existed in C3 ancestors, and are thought to have been recruited through gene duplication followed by neo-functionalization, to acquire the cell-specific expression pattern and altered kinetic properties that characterize C4-specific enzymes. These predictions have so far been tested only in species-poor and ecologically marginal C4 dicots. The monocots, and especially the grass family (Poaceae), the most important C4 family in terms of species number, ecological dominance and economic importance, have been largely under-considered as suitable study systems. This thesis aimed to understand the evolution of the C4 trait in grasses at a molecular level and to use the genetics of C4 photosynthesis to infer the evolutionary history of the C4 phenotype and its driving selective pressures. A molecular phylogeny of grasses and affiliated monocots identified 17 to 18 independent acquisitions of the C4 pathway in the grass family.
A relaxed molecular clock was used to date these events and the first C4 evolution was estimated in the Chloridoideae subfamily, between 32 and 25 million years ago, at a period when atmospheric CO2 abruptly declined. Likelihood models showed that after the CO2 decline the probability of evolving the C4 pathway strongly increased, confirming low CO2 as a likely driver of C4 photosynthesis evolution. In order to depict the genetic changes linked to the numerous C4 origins, genes encoding phosphoenolpyruvate carboxylase (PEPC), the key enzyme responsible for the initial fixation of atmospheric CO2 in the C4 pathway, were isolated from a large sample of C3 and C4 grasses. Phylogenetic analyses were used to reconstruct the evolutionary history of the PEPC multigene family and showed that the evolution of C4-specific PEPC had been driven by positive selection on 21 codons simultaneously in up to eight C4 lineages. These selective pressures led to numerous convergent genetic changes in many different C4 clades, highlighting the repeatability of some evolutionary processes, even at the molecular level. PEPC C4-adaptive changes were traced and used to show multiple appearances of the C4 pathway in clades where species tree inferences were unable to differentiate multiple C4 appearances and a single appearance followed by C4 to C3 reversion. Further investigations of genes involved in some of the C4 subtypes only (genes encoding decarboxylating enzymes NADP-malic enzyme and phosphoenolpyruvate carboxykinase) showed that these C4 enzymes also evolved through strong positive selection and underwent parallel genetic changes during the different C4 origins.
The adaptive changes on these subtype-specific C4 genes were used to retrace the history of the C4 subtype phenotypes, which revealed that the evolution of C4-PEPC and C4-decarboxylating enzymes was in several cases disconnected, emphasizing the multiplicity of the C4 trait and the gradual acquisition of the features that create the CO2 pump. Finally, phylogenetic analyses of a gene encoding Rubisco (the enzyme responsible for the fixation of CO2 into organic compounds in all photosynthetic organisms) showed that C4 evolution switched the selective pressures on this gene. Five codons were recurrently mutated to adapt the enzyme kinetics to the high CO2 concentrations of C4 photosynthetic cells. This knowledge could be used to introgress C4-like Rubisco in C3 crops, which could lead to an increased yield under the predicted future high-CO2 atmosphere. Overall, the phylogenetic framework adopted during this thesis demonstrated the widespread occurrence of genetic convergence on C4-related enzymes. The genetic traces of C4 photosynthesis evolution made it possible to reconstruct events that happened during the last 30 million years and proved the usefulness of studying genes directly responsible for phenotype variations when inferring the evolutionary history of a given trait. Résumé: During the evolutionary diversification of organisms, similar ecological pressures led to the recurrent appearance of certain traits (phenotypes) in distant lineages, a phenomenon called convergent evolution. In most cases, the genetic origin of convergent traits remains unknown, but recent studies have shown that they were due in some cases to repeated changes in the same gene or, in rare cases, to identical genetic changes. Nevertheless, these cases remain anecdotal, and there is a real need for a study system that has evolved independently many times and whose genetic determinism is clearly identified.
C4 photosynthesis meets these criteria. This adaptation to warm environments, whose evolution may have been promoted by past decreases in atmospheric CO2 concentration, consists of numerous morphological and biochemical novelties that create a CO2 pump. All genes encoding C4 enzymes were already present in the C3 ancestors. Their recruitment for C4 photosynthesis is thought to have occurred through gene duplications followed by neo-functionalization, giving them the cell-specific expression and the kinetic properties that characterize C4 enzymes. These predictions have so far been tested only in C4 families that contain few species and play a marginal ecological role. The grasses (Poaceae), which are the most important C4 family in terms of species number, ecological dominance and economic importance, have always been considered a poorly suited study system and have been the subject of few evolutionary investigations. The aim of this thesis was to understand the evolution of C4 photosynthesis in grasses at the genetic level and to use genes to infer the evolution of the C4 phenotype and the selective pressures responsible for its evolution. A molecular phylogeny of the grass family and related monocots identified 17 to 18 independent acquisitions of C4 photosynthesis in grasses. Using a relaxed molecular clock method, these events were dated and the first C4 appearance was estimated in the subfamily Chloridoideae, 32 to 25 million years ago, at a period when atmospheric CO2 concentrations declined abruptly.
Maximum likelihood models showed that, following the CO2 decline, the probability of evolving C4 photosynthesis strongly increased, confirming that low CO2 concentration is a potential cause of the evolution of C4 photosynthesis. To identify the genetic mechanisms responsible for the repeated evolution of C4 photosynthesis, a segment of the genes encoding phosphoenolpyruvate carboxylase (PEPC), the enzyme responsible for the initial fixation of atmospheric CO2 in C4 plants, was sequenced in about a hundred C3 and C4 grasses. Phylogenetic analyses made it possible to reconstruct the evolutionary history of the PEPC multigene family and showed that the evolution of PEPC specific to C4 photosynthesis was caused by positive selection acting on 21 codons, simultaneously in eight different C4 lineages. This positive selection led to a large number of convergent genetic changes in many different clades, illustrating the repeatability of some evolutionary phenomena even at the genetic level. The C4-related changes in PEPC were used to confirm independent evolutions of the C4 phenotype in clades where the species tree could not distinguish independent appearances from a single appearance followed by a reversion from C4 to C3. By considering genes encoding proteins involved only in certain C4 subtypes (two decarboxylases, NADP-malic enzyme and phosphoenolpyruvate carboxykinase), further studies showed that these C4 enzymes had also evolved under strong positive selection and undergone parallel genetic changes during the different origins of C4 photosynthesis.
The adaptive changes in these genes, which are linked only to certain C4 subtypes, were used to retrace the history of the C4 subtype phenotypes, which revealed that the characters forming the C4 trait evolved, in some cases, in a disconnected manner. This underlines the multiplicity of the C4 trait and the gradual acquisition of the components of the CO2 pump that is C4 photosynthesis. Finally, phylogenetic analyses of the genes encoding Rubisco (the enzyme responsible for fixing CO2 into organic carbon in all photosynthetic organisms) showed that the evolution of C4 photosynthesis changed the selective pressures on this gene. Five codons were repeatedly mutated to adapt the kinetic properties of Rubisco to the high CO2 concentrations found in the photosynthetic cells of C4 plants. Overall, the phylogenetic approach adopted during this doctoral thesis demonstrated frequent genetic convergence in the enzymes linked to C4 photosynthesis. The genetic traces of the evolution of C4 photosynthesis made it possible to reconstruct events that occurred during the last 30 million years and proved the usefulness of studying genes directly responsible for phenotypic variation to infer the evolutionary history of a given trait.
Abstract:
The ATP-binding cassette (ABC) family of proteins comprises a group of membrane transporters involved in the transport of a wide variety of compounds, such as xenobiotics, vitamins, lipids, amino acids, and carbohydrates. Determining their regional expression patterns along the intestinal tract will further characterize their transport functions in the gut. The mRNA expression levels of murine ABC transporters in the duodenum, jejunum, ileum, and colon were examined using the Affymetrix MuU74v2 GeneChip set. Eight ABC transporters (Abcb2, Abcb3, Abcb9, Abcc3, Abcc6, Abcd1, Abcg5, and Abcg8) displayed significant differential gene expression along the intestinal tract, as determined by two statistical models (a global error assessment model and a classic ANOVA, both with P < 0.01). Concordance with semiquantitative real-time PCR was high. Analyzing the promoters of the differentially expressed ABC transporters did not identify common transcriptional motifs between family members or with other genes; however, the expression profile for Abcb9 was highly correlated with that of fibulin-1, and both genes share a common complex promoter model involving the NF-kappaB, zinc binding protein factor (ZBPF), GC-box factors SP1/GC (SP1F), and early growth response factor (EGRF) transcription binding motifs. The cellular location of another of the differentially expressed ABC transporters, Abcc3, was examined by immunohistochemistry. Staining revealed that the protein is consistently expressed in the basolateral compartment of enterocytes along the anterior-posterior axis of the intestine. Furthermore, the intensity of the staining pattern is concordant with the expression profile. This agrees with previous findings in which the mRNA, protein, and transport function of Abcc3 were increased in the rat distal intestine.
These data reveal regional differences in gene expression profiles along the intestinal tract and demonstrate that a complete understanding of intestinal ABC transporter function can only be achieved by examining the physiologically distinct regions of the gut.
Abstract:
Pearson correlation coefficients were applied for the objective comparison of 30 black gel pen inks analysed by laser desorption ionization mass spectrometry (LDI-MS). The mass spectra were obtained for ink lines directly on paper using positive and negative ion modes at several laser intensities. This methodology has the advantage of taking into account the reproducibility of the results as well as the variability between spectra of different pens. A differentiation threshold could thus be selected in order to avoid the risk of false differentiation. Combining results from the positive and negative modes yielded a discriminating power of up to 85%, which was better than that obtained previously with other optical comparison methodologies. The technique also made it possible to discriminate between pens from the same brand.
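The comparison described in this abstract can be sketched as follows. This is a minimal illustration, assuming each spectrum has already been reduced to a vector of intensities on a common m/z grid. The function names and the rule used to pick the threshold (the lowest correlation observed between replicate spectra of the same pen) are hypothetical choices made for the example, not the authors' published procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("spectra must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant spectrum has no defined correlation")
    return sxy / sqrt(sxx * syy)


def threshold_from_replicates(replicates):
    """Assumed rule: lowest correlation among replicate spectra of one pen.

    Two inks are only called different when they correlate worse than
    replicates of the same ink do, which limits false differentiation.
    """
    return min(pearson(a, b) for a, b in combinations(replicates, 2))


def is_differentiated(spectrum_a, spectrum_b, threshold):
    """True when the two spectra correlate below the chosen threshold."""
    return pearson(spectrum_a, spectrum_b) < threshold
```

In this sketch the discriminating power would be the fraction of pen pairs for which `is_differentiated` returns true.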
Abstract:
Detecting local differences between groups of connectomes is a great challenge in neuroimaging because of the large number of tests that have to be performed and their impact on multiplicity correction. Any available information should be exploited to increase the power of detecting true between-group effects. We present an adaptive strategy that exploits the data structure and the prior information concerning positive dependence between nodes and connections, without relying on strong assumptions. As a first step, we decompose the brain network, i.e., the connectome, into subnetworks and apply a screening at the subnetwork level. The subnetworks are defined either according to prior knowledge or by applying a data-driven algorithm. Given the results of the screening step, a filtering is performed to seek real differences at the node/connection level. The proposed strategy could be used to strongly control either the family-wise error rate or the false discovery rate. We show by means of different simulations the benefit of the proposed strategy, and we present a real application comparing the connectomes of preschool children and adolescents.
Abstract:
Accurate determination of subpopulation sizes in bimodal populations remains problematic, yet it represents a powerful way by which cellular heterogeneity under different environmental conditions can be compared. So far, most studies have relied on qualitative descriptions of population distribution patterns, on population-independent descriptors, or on arbitrary placement of thresholds distinguishing biological ON from OFF states. We found that all these methods fall short of accurately describing small population sizes in bimodal populations. Here we propose a simple, statistics-based method for the analysis of small subpopulation sizes for use in the free software environment R and test this method on real as well as simulated data. Four so-called population splitting methods were designed with different algorithms that can estimate subpopulation sizes from bimodal populations. All four methods proved more precise than previously used methods when analyzing subpopulation sizes of transfer-competent cells arising in populations of the bacterium Pseudomonas knackmussii B13. The methods' resolving powers were further explored by bootstrapping and simulations. Two of the methods were not severely limited by the proportions of subpopulations they could estimate correctly, but the two others only allowed accurate subpopulation quantification when this amounted to less than 25% of the total population. In contrast, only one method was still sufficiently accurate with subpopulations smaller than 1% of the total population. This study proposes a number of rational approximations to quantifying small subpopulations and offers an easy-to-use protocol for their implementation in the open source statistical software environment R.
Abstract:
The consequences of variable rates of clonal reproduction on the population genetics of neutral markers are explored in diploid organisms within a subdivided population (island model). We use both analytical and stochastic simulation approaches. High rates of clonal reproduction will positively affect heterozygosity. As a consequence, nearly twice as many alleles per locus can be maintained, and population differentiation, estimated as the F(ST) value, is strongly decreased in purely clonal populations as compared to purely sexual ones. With increasing clonal reproduction, effective population size first slowly increases and then points toward extreme values when the reproductive system tends toward strict clonality. This reflects the fact that polymorphism is protected within individuals due to fixed heterozygosity. In contrast, genotypic diversity smoothly decreases with increasing rates of clonal reproduction. Asexual populations thus maintain higher genetic diversity at each single locus but a lower number of different genotypes. For all quantities investigated except genotypic diversities (both at individual loci and over multiple loci), mixed clonal/sexual reproduction is nearly indistinguishable from strict sexual reproduction as long as the proportion of clonal reproduction is not strongly predominant.
Abstract:
In recent years there has been an explosive growth in the development of adaptive and data-driven methods. One of the efficient data-driven approaches is based on statistical learning theory (SLT) (Vapnik 1998). The theory is based on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we are trying not only to reduce the training error (to fit the available data with a model), but also to reduce the complexity of the model and to reduce the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present SLT is still under intensive development and SVM find new areas of application (www.kernel-machines.org). SVM develop robust and nonlinear data models with excellent generalisation abilities, which is very important both for monitoring and forecasting. SVM are extremely good when the input space is high dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. This opens a way to sampling optimization, estimation of noise in data, quantification of data redundancy, etc. A presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).
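For reference, the trade-off between training error and model complexity mentioned above is usually written, for the soft-margin SVM classifier, as the optimisation problem below. This is the standard textbook formulation rather than a formula taken from the abstract. The symbols are introduced here for illustration: training pairs (x_i, y_i) with labels y_i in {-1, +1}, a feature map φ with kernel K, slack variables ξ_i, a regularisation constant C, and dual coefficients α_i.

```latex
% Primal problem: the first term penalises model complexity (margin width),
% the second penalises training errors through the slack variables.
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .

% Resulting decision function: only points with \alpha_i > 0 (the support
% vectors) contribute to the decision boundary.
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(x_i, x) + b \right),
\qquad K(x_i, x) = \phi(x_i)^{\top} \phi(x) .
```

Because the sum in the decision function runs effectively over the support vectors only, the remaining training points could be removed without changing the boundary, which is the property the abstract links to sampling optimization and data redundancy.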
Abstract:
The discovery of genes implicated in familial forms of Parkinson's disease (PD) has provided new insights into the molecular events leading to neurodegeneration. Clinically, patients with genetically determined PD can be difficult to distinguish from those with sporadic PD. Monogenic causes include autosomal dominantly (SNCA, LRRK2, VPS35, EIF4G1) as well as recessively (PARK2, PINK1, DJ-1) inherited mutations. Additional recessive forms of parkinsonism present with atypical signs, including very early disease onset, dystonia, dementia and pyramidal signs. New techniques in the search for phenotype-associated genes (next-generation sequencing, genome-wide association studies) have expanded the spectrum of both monogenic PD and variants that alter the risk of developing PD. Examples of risk genes include the two lysosomal enzyme coding genes GBA and SMPD1, which are associated with a 5-fold and 9-fold increased risk of PD, respectively. It is hoped that further knowledge of the genetic makeup of PD will allow the design of treatments that alter the course of the disease.
Abstract:
OBJECTIVE: To set up an international cohort of patients with suspected Behçet's disease (BD). The cohort is aimed at defining an algorithm for definition of the disease in children. METHODS: International experts have defined the inclusion criteria as follows: recurrent oral aphthosis (ROA) plus one of the following: genital ulceration, erythema nodosum, folliculitis, pustulous/acneiform lesions, positive pathergy test, uveitis, venous/arterial thrombosis and family history of BD. Onset of disease is <16 years, disease duration is ≤3 years, future follow-up duration is ≥4 years and informed consent is obtained. The expert committee has classified the included patients into: definite paediatric BD (PED-BD), probable PED-BD and no PED-BD. Statistical analysis is performed to compare the three groups of patients. Centres document their patients in a single database. RESULTS: As of January 2010, 110 patients (56 males/54 females) have been included. Mean age at first symptom: 8.1 years (median 8.2 years). At inclusion, 38% had only one symptom associated with ROA, 31% had two and 31% had three or more symptoms. A total of 106 first evaluations have been done. Seventeen patients underwent the first-year evaluation, and 36 had no new symptoms, 12 had one and 9 had two. Experts have examined 48 files and classified 30 as definite and 18 as probable. Twenty-six patients classified as definite fulfilled the International Study Group criteria. Seventeen patients classified as probable did not meet the international criteria. CONCLUSION: The expert committee has classified the majority of patients in the BD group although they presented with few symptoms independently of BD classification criteria.
Abstract:
The aim of this research was to evaluate how fingerprint analysts would incorporate information from newly developed tools into their decision-making processes. Specifically, we assessed effects using the following: (1) a quality tool to aid in the assessment of the clarity of the friction ridge details, (2) a statistical tool to provide likelihood ratios representing the strength of the corresponding features between compared fingerprints, and (3) consensus information from a group of trained fingerprint experts. The measured variables for the effect on examiner performance were the accuracy and reproducibility of the conclusions against the ground truth (including the impact on error rates) and the analyst accuracy and variation for feature selection and comparison. The results showed that participants using the consensus information from other fingerprint experts demonstrated more consistency and accuracy in minutiae selection. They also demonstrated higher accuracy, sensitivity, and specificity in the decisions reported. The quality tool also affected minutiae selection (which, in turn, had limited influence on the reported decisions); the statistical tool did not appear to influence the reported decisions.
Abstract:
Analysis of variance is commonly used in morphometry in order to ascertain differences in parameters between several populations. Failure to detect significant differences between populations (type II error) may be due to suboptimal sampling and lead to erroneous conclusions; the concept of statistical power allows one to avoid such failures by means of adequate sampling. Several examples are given in the morphometry of the nervous system, showing the use of the power of a hierarchical analysis of variance test for the choice of appropriate sample and subsample sizes. In the first case chosen, neuronal densities in the human visual cortex, we find the number of observations to be of little effect. For dendritic spine densities in the visual cortex of mice and humans, the effect is somewhat larger. A substantial effect is shown in our last example, dendritic segmental lengths in monkey lateral geniculate nucleus. It is in the nature of the hierarchical model that sample size is always more important than subsample size. The relative weight to be attributed to subsample size thus depends on the relative magnitude of the between-observations variance compared to the between-individuals variance.
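The closing statement about sample versus subsample size can be illustrated with the standard variance formula for a balanced two-level nested design, in which a population mean estimated from n individuals with m observations each has variance σ²_ind/n + σ²_obs/(n·m). The sketch below is only a worked illustration of that formula. The function name and the numerical variances are hypothetical, and it does not reproduce the power calculations of the paper.

```python
def variance_of_population_mean(var_between_individuals,
                                var_between_observations,
                                n_individuals,
                                n_obs_per_individual):
    """Variance of a population mean in a balanced two-level nested design.

    The between-individuals component shrinks only with the number of
    individuals (sample size); the between-observations component shrinks
    with the total number of observations.
    """
    if n_individuals < 1 or n_obs_per_individual < 1:
        raise ValueError("both sizes must be at least 1")
    return (var_between_individuals / n_individuals
            + var_between_observations / (n_individuals * n_obs_per_individual))


# Hypothetical variance components chosen for illustration only.
baseline = variance_of_population_mean(4.0, 1.0, 10, 5)
more_individuals = variance_of_population_mean(4.0, 1.0, 20, 5)
more_observations = variance_of_population_mean(4.0, 1.0, 10, 10)
print(baseline, more_individuals, more_observations)
```

Doubling the number of individuals halves the whole variance, whereas doubling the observations per individual only reduces the second term, so the gain from subsampling is small whenever the between-individuals variance dominates.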
Abstract:
Although age-dependent effects on blood pressure (BP) have been reported, they have not been systematically investigated in large-scale genome-wide association studies (GWASs). We leveraged the infrastructure of three well-established consortia (CHARGE, GBPgen, and ICBP) and a nonstandard approach (age stratification and metaregression) to conduct a genome-wide search of common variants with age-dependent effects on systolic (SBP), diastolic (DBP), mean arterial (MAP), and pulse (PP) pressure. In a two-staged design using 99,241 individuals of European ancestry, we identified 20 genome-wide significant (p ≤ 5 × 10^-8) loci by using joint tests of the SNP main effect and SNP-age interaction. Nine of the significant loci demonstrated nominal evidence of age-dependent effects on BP by tests of the interactions alone. Index SNPs in the EHBP1L1 (DBP and MAP), CASZ1 (SBP and MAP), and GOSR2 (PP) loci exhibited the largest age interactions, with opposite directions of effect in the young versus the old. The changes in the genetic effects over time were small but nonnegligible (up to 1.58 mm Hg over 60 years). The EHBP1L1 locus was discovered through gene-age interactions only in whites but had DBP main effects replicated (p = 8.3 × 10^-4) in 8,682 Asians from Singapore, indicating potential interethnic heterogeneity. A secondary analysis revealed 22 loci with evidence of age-specific effects (e.g., only in 20 to 29-year-olds). Age can be used to select samples with larger genetic effect sizes and more homogeneous phenotypes, which may increase statistical power. Age-dependent effects identified through novel statistical approaches can provide insight into the biology and temporal regulation underlying BP associations.
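One way to picture the age stratification and metaregression approach is the sketch below. It regresses stratum-specific SNP effect estimates on the mean age of each stratum using inverse-variance weights, then computes a 2-degree-of-freedom joint Wald test (intercept and age slope) and a 1-degree-of-freedom test of the slope alone. It is a simplified fixed-effects illustration under stated assumptions: the function name, the input layout and the example numbers are hypothetical, and the consortia's actual models may differ.

```python
from math import erfc, exp, sqrt


def age_metaregression(ages, betas, ses):
    """Weighted regression of per-stratum SNP effects on stratum mean age.

    ages  : mean age of each age stratum
    betas : SNP effect estimate in each stratum
    ses   : standard error of each estimate
    """
    w = [1.0 / s ** 2 for s in ses]                       # inverse-variance weights
    s0 = sum(w)
    s1 = sum(wi * a for wi, a in zip(w, ages))
    s2 = sum(wi * a * a for wi, a in zip(w, ages))
    t0 = sum(wi * b for wi, b in zip(w, betas))
    t1 = sum(wi * a * b for wi, a, b in zip(w, ages, betas))
    det = s0 * s2 - s1 * s1
    if det <= 0:
        raise ValueError("need at least two strata with distinct ages")
    intercept = (s2 * t0 - s1 * t1) / det                 # SNP effect at age 0
    slope = (s0 * t1 - s1 * t0) / det                     # SNP-age interaction
    se_slope = sqrt(s0 / det)
    # Joint Wald statistic beta' (X'WX) beta; chi-square with 2 df,
    # whose upper tail probability is exp(-q / 2).
    q = s0 * intercept ** 2 + 2 * s1 * intercept * slope + s2 * slope ** 2
    p_joint = exp(-q / 2.0)
    # Two-sided normal test of the interaction term alone.
    p_interaction = erfc(abs(slope / se_slope) / sqrt(2.0))
    return {"intercept": intercept, "slope": slope,
            "p_joint": p_joint, "p_interaction": p_interaction}


# Hypothetical strata showing opposite directions of effect in young and old.
result = age_metaregression([30, 50, 70], [0.5, 0.0, -0.5], [0.1, 0.1, 0.1])
print(result)
```

In this made-up example the effects in the youngest and oldest strata cancel on average, so a pooled main-effect test would miss the signal while the joint and interaction tests detect it.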
Abstract:
BACKGROUND: As part of EUROCAT's surveillance of congenital anomalies in Europe, a statistical monitoring system has been developed to detect recent clusters or long-term (10 year) time trends. The purpose of this article is to describe the system for the identification and investigation of 10-year time trends, conceived as a "screening" tool ultimately leading to the identification of trends which may be due to changing teratogenic factors. METHODS: The EUROCAT database consists of all cases of congenital anomalies including livebirths, fetal deaths from 20 weeks gestational age, and terminations of pregnancy for fetal anomaly. Monitoring of 10-year trends is performed for each registry for each of 96 non-independent EUROCAT congenital anomaly subgroups, while Pan-Europe analysis combines data from all registries. The monitoring results are reviewed, prioritized according to a prioritization strategy, and communicated to registries for investigation. Twenty-one registries covering over 4 million births, from 1999 to 2008, were included in monitoring in 2010. CONCLUSIONS: Significant increasing trends were detected for abdominal wall anomalies, gastroschisis, hypospadias, Trisomy 18 and renal dysplasia in the Pan-Europe analysis while 68 increasing trends were identified in individual registries. A decreasing trend was detected in over one-third of anomaly subgroups in the Pan-Europe analysis, and 16.9% of individual registry tests. Registry preliminary investigations indicated that many trends are due to changes in data quality, ascertainment, screening, or diagnostic methods. Some trends are inevitably chance phenomena related to multiple testing, while others seem to represent real and continuing change needing further investigation and response by regional/national public health authorities.