931 results for large data sets
Abstract:
In this study we present a global distribution pattern and budget of the minimum flux of particulate organic carbon to the sea floor (J POC α). The estimates are based on regionally specific correlations between the diffusive oxygen flux across the sediment-water interface, the total organic carbon content in surface sediments, and the oxygen concentration in bottom waters. For this, we modified the principal equation of Cai and Reimers [1995] into a basic Monod reaction rate, applied within 11 regions where in situ measurements of diffusive oxygen uptake exist. By applying the resulting transfer functions to other regions with similar sedimentary conditions and interpolating over area, we calculated a minimum global budget of particulate organic carbon that actually reaches the sea floor of ~0.5 Gt C yr⁻¹ (>1000 m water depth), of which approximately 0.002-0.12 Gt C yr⁻¹ is buried in the sediments (0.01-0.4% of surface primary production). Although our global budget is in good agreement with previous studies, we found conspicuous differences among the distribution patterns of primary production, calculations based on particle-trap collections of the POC flux, and J POC α of this study. These deviations, located especially in the southeastern and southwestern Atlantic Ocean, the Greenland and Norwegian Seas, and the entire equatorial Pacific Ocean, strongly indicate a considerable influence of lateral particle transport on the vertical link between surface waters and underlying sediments. This observation is supported by sediment trap data. Furthermore, local differences in the availability and quality of the organic matter, as well as different transport mechanisms through the water column, are discussed.
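The modified Cai and Reimers [1995] equation itself is not reproduced in this record; as a hedged sketch, a Monod-type rate dependence of the benthic oxygen flux on bottom-water oxygen, of the kind the abstract describes, takes the generic form below (the symbols J_max and K_M are illustrative, not the study's fitted parameters):

```latex
J_{\mathrm{O_2}} \;=\; J_{\max}\,\frac{[\mathrm{O_2}]_{\mathrm{BW}}}{K_M + [\mathrm{O_2}]_{\mathrm{BW}}}
```

where [O2]_BW is the bottom-water oxygen concentration, J_max the saturation flux, and K_M the half-saturation constant; the study's regional transfer functions additionally couple this flux to the organic carbon content of the surface sediments.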
Abstract:
During the SINOPS project, a state-of-the-art simulation of the marine silicon cycle is attempted, employing a biogeochemical ocean general circulation model (BOGCM) through three particular time steps relevant to global (paleo-)climate. In order to tune the model optimally, results of the simulations are compared to a comprehensive data set of 'real' observations. SINOPS' scientific data management ensures that the data structure remains homogeneous throughout the project. The practical work routine comprises systematic progression from data acquisition through preparation, processing, quality checking and archiving, up to the presentation of the data to the scientific community. Meta-information and analytical data are mapped by an n-dimensional catalogue in order to itemize the analytical value and to serve as an unambiguous identifier. In practice, data management is carried out by means of the online-accessible information system PANGAEA, which offers a tool set comprising a data warehouse, Geographic Information System (GIS), 2-D plots, cross-section plots, etc., and whose multidimensional data model promotes scientific data mining. Besides the scientific and technical aspects, this alliance between the scientific project team and the data management crew integrates the participants and fosters mutual respect and appreciation.
Abstract:
Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and the relationships among them. Often, the clusters found are to be understood in the context of associated categorical, numerical or textual metadata given for the data elements. While often not part of the clustering process, such metadata play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow the analyst to relate (or, loosely speaking, correlate) clusters with metadata or other properties of the underlying cluster data. Manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises, and fully interactive search for potentially useful or interesting cluster-to-metadata relationships can be a cumbersome and lengthy process. To remedy this problem, we propose a novel approach for guiding users in discovering interesting relationships between clusters and associated metadata; its goal is to guide the analyst through the potentially huge search space. In this work we focus on metadata of categorical type, which can be summarized for a cluster in the form of a histogram. Starting from a given visual cluster representation, we compute measures of interestingness defined on the distribution of metadata categories for the clusters. These measures are used to automatically score and rank the clusters for potential interestingness with regard to the distribution of categorical metadata, and identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing an encompassing, yet extensible, set of interestingness scores for categorical metadata, which can also be extended to numerical metadata. Appropriate visual representations are provided for showing the visual correlations as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are automatically identified, supporting the analyst in discovering interesting and visually understandable relationships.
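The paper's concrete scoring functions are not given in this abstract; as a hedged sketch of one plausible interestingness measure of the kind described, the following scores a cluster's categorical histogram by one minus its normalized Shannon entropy, so that skewed distributions (one dominant category) rank as more interesting. All names and data are illustrative.

```python
import math
from collections import Counter

def interestingness(categories):
    """Score a cluster's categorical metadata: 1 - normalized Shannon
    entropy of the category histogram. 1.0 = one dominant category
    (potentially interesting), 0.0 = uniform (uninteresting)."""
    hist = Counter(categories)
    n = sum(hist.values())
    k = len(hist)
    if k < 2:
        return 1.0  # a single category is maximally skewed
    entropy = -sum((c / n) * math.log(c / n, k) for c in hist.values())
    return 1.0 - entropy

# Rank clusters by score, most skewed histogram first.
clusters = {"c1": ["ice", "ice", "ice", "soil"], "c2": ["ice", "soil", "rock"]}
ranking = sorted(clusters, key=lambda c: interestingness(clusters[c]), reverse=True)
print(ranking)  # ['c1', 'c2']
```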
Abstract:
Multi-frequency eddy current measurements are employed to estimate the pressure tube (PT) to calandria tube (CT) gap in CANDU fuel channels, a critical inspection activity required to ensure the fitness for service of fuel channels. In this thesis, a comprehensive characterization of eddy current gap data is laid out in order to extract further information on fuel channel condition and to identify generalized applications for multi-frequency eddy current data. A surface profiling technique, generalizable to multiple probe and conductive material configurations, has been developed. This technique has allowed for the identification of various pressure tube artefacts, has been independently validated (using ultrasonic measurements), and has been deployed and commissioned at Ontario Power Generation. Dodd and Deeds solutions to the electromagnetic boundary value problem associated with the PT-to-CT gap probe configuration were experimentally validated for amplitude response to changes in gap. Using the validated Dodd and Deeds solutions, principal components analysis (PCA) has been employed to identify independence and redundancies in multi-frequency eddy current data, allowing for an enhanced visualization of factors affecting gap measurement. Results of the PCA of simulation data are consistent with the skin depth equation and are validated against PCA of physical experiments. Finally, compressed data acquisition has been realized, allowing faster data acquisition for multi-frequency eddy current systems with hardware limitations; this is generalizable to other applications where real-time acquisition of large data sets is prohibitive.
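As a hedged sketch of the PCA step described above (not the thesis code), the block below applies PCA via the singular value decomposition to simulated multi-frequency channel data; a dominant first component signals redundancy between channels. Names and data are illustrative.

```python
import numpy as np

def pca(X):
    """PCA via SVD: rows are measurements, columns are eddy current
    channels (e.g., in-phase/quadrature at each excitation frequency)."""
    Xc = X - X.mean(axis=0)                  # center each channel
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (len(X) - 1)                # variance along each component
    return Xc @ Vt.T, Vt, var / var.sum()

# Two highly correlated channels plus noise: one component dominates,
# indicating the channels carry largely redundant information.
rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
X = np.column_stack([f1, 0.95 * f1 + 0.05 * rng.normal(size=200)])
scores, components, explained = pca(X)
print(explained)  # ~[0.999, 0.001]
```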
Abstract:
Here, we describe gene expression compositional assignment (GECA), a powerful yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains, including bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.
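GECA's exact statistic is not given in this abstract; as a hedged sketch in the same compositional spirit, the block below compares expression profiles over a gene list using the centred log-ratio (Aitchison) distance, which is invariant to overall scaling and therefore needs no between-dataset normalisation. All names and values are illustrative.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a (positive) composition."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Distance between two expression profiles restricted to a gene list;
    multiplying either profile by a constant leaves it unchanged."""
    return np.linalg.norm(clr(x) - clr(y))

# Profiles over a 4-gene signature measured on two platforms with
# different overall scales; the distance ignores the scale factor.
a = np.array([120.0, 30.0, 15.0, 60.0])
b = 7.3 * np.array([118.0, 33.0, 14.0, 58.0])  # rescaled, nearly the same composition
print(aitchison_distance(a, b))  # small: the compositions agree
```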
Abstract:
Abstract and Summary of Thesis: Background: Individuals with Major Mental Illness (MMI, such as schizophrenia and bipolar disorder) experience increased rates of physical health comorbidity compared to the general population. They also experience inequalities in access to certain aspects of healthcare, which ultimately leads to premature mortality. Studies detailing patterns of physical health comorbidity are limited by their definitions of comorbidity, by a single-disease approach to comorbidity, and by the study of heterogeneous groups. To date, the investigation of possible sources of the healthcare inequalities experienced by individuals with MMI has been relatively limited. Moreover, studies detailing the extent of premature mortality in individuals with MMI vary both in the measure of premature mortality reported and in the age of the cohort investigated, limiting their generalisability to the wider population. Local and national data can therefore be used to describe patterns of physical health comorbidity, to investigate possible reasons for health inequalities, and to describe mortality rates; these findings will extend existing work in this area. Aims and Objectives: To review the relevant literature regarding patterns of physical health comorbidity, evidence for inequalities in physical healthcare, and evidence for premature mortality in individuals with MMI. To examine the rates of physical health comorbidity in a large primary care database and to assess evidence for inequalities in access to healthcare using both routine primary care prescribing data and incentivised national Quality and Outcome Framework (QOF) data. Finally, to examine the rates of premature mortality in a local context, with a particular focus on cause of death across the lifespan and the effects of International Classification of Disease version 10 (ICD-10) diagnosis and socioeconomic status on rates and cause of death. Methods: A narrative review of the literature on patterns of physical health comorbidity, evidence for inequalities in physical healthcare, and premature mortality in MMI was undertaken. Rates of physical health comorbidity and multimorbidity in schizophrenia and bipolar disorder were examined using a large primary care dataset (the Scottish Programme for Improving Clinical Effectiveness in Primary Care, SPICE). Possible inequalities in access to healthcare were investigated by comparing patterns of prescribing in individuals with MMI and comorbid physical health conditions with prescribing rates in individuals with the same physical health conditions but without MMI, using SPICE data. Potential inequalities in access to health promotion advice (in the form of smoking cessation) and in prescribing of nicotine replacement therapy (NRT) were also investigated using SPICE data. Possible inequalities in access to incentivised primary healthcare were investigated using national QOF data. Finally, a pre-existing case register (the Glasgow Psychosis Clinical Information System, PsyCIS) was linked to Scottish mortality data (available from the Scottish Government website) to investigate rates and primary causes of death in individuals with MMI. Rate and primary cause of death were compared to those of the local population, and the impacts of age, socioeconomic status and ICD-10 diagnosis (schizophrenia vs. bipolar disorder) were investigated.
Results: Analysis of the SPICE data found that sixteen of the thirty-two common physical comorbidities assessed occurred significantly more frequently in individuals with schizophrenia; in individuals with bipolar disorder, fourteen occurred more frequently. The most prevalent chronic physical health conditions in individuals with schizophrenia and bipolar disorder were: viral hepatitis (odds ratio (OR) 3.99, 95% confidence interval (CI) 2.82-5.64 and OR 5.90, 95% CI 3.16-11.03 respectively), constipation (OR 3.24, 95% CI 3.01-3.49 and OR 2.84, 95% CI 2.47-3.26 respectively) and Parkinson’s disease (OR 3.07, 95% CI 2.43-3.89 and OR 2.52, 95% CI 1.60-3.97 respectively). Both groups had significantly increased rates of multimorbidity compared to controls: in the schizophrenia group the OR for two comorbidities was 1.37, 95% CI 1.29-1.45, and in the bipolar disorder group the OR was 1.34, 95% CI 1.20-1.49. In the studies investigating inequalities in access to healthcare there was evidence of under-recording of cardiovascular-related conditions: for example, in individuals with schizophrenia the OR for atrial fibrillation (AF) was 0.62, 95% CI 0.52-0.73; for hypertension 0.71, 95% CI 0.67-0.76; for coronary heart disease (CHD) 0.76, 95% CI 0.69-0.83; and for peripheral vascular disease (PVD) 0.83, 95% CI 0.72-0.97. Similarly, in individuals with bipolar disorder the OR for AF was 0.56, 95% CI 0.41-0.78; for hypertension 0.69, 95% CI 0.62-0.77; and for CHD 0.77, 95% CI 0.66-0.91. There was also evidence of less intensive prescribing for individuals with schizophrenia and bipolar disorder who had comorbid hypertension and CHD compared to individuals with hypertension and CHD who did not have schizophrenia or bipolar disorder. Statins were prescribed significantly less frequently for individuals with schizophrenia and CHD than for individuals with CHD without MMI (OR 0.67, 95% CI 0.56-0.80). Rates of prescribing of two or more antihypertensives were lower in individuals with CHD and schizophrenia and with CHD and bipolar disorder compared to individuals with CHD without MMI (OR 0.66, 95% CI 0.56-0.78 and OR 0.55, 95% CI 0.46-0.67, respectively). Smoking was more common in individuals with MMI compared to individuals without MMI (OR 2.53, 95% CI 2.44-2.63) and was particularly increased in men (OR 2.83, 95% CI 2.68-2.98). Rates of ex-smoking and non-smoking were lower in individuals with MMI (OR 0.79, 95% CI 0.75-0.83 and OR 0.50, 95% CI 0.48-0.52 respectively). However, recorded rates of smoking cessation advice in smokers with MMI were significantly lower than in smokers with diabetes (88.7% vs. 98.0%, p<0.001), smokers with CHD (88.9% vs. 98.7%, p<0.001) and smokers with hypertension (88.3% vs. 98.5%, p<0.001) without MMI. The odds of NRT prescription were also significantly lower in smokers with MMI without diabetes compared to smokers with diabetes without MMI (OR 0.75, 95% CI 0.69-0.81). Similar findings were found for smokers with MMI without CHD compared to smokers with CHD without MMI (OR 0.34, 95% CI 0.31-0.38) and smokers with MMI without hypertension compared to smokers with hypertension without MMI (OR 0.71, 95% CI 0.66-0.76). At a national level, payment and population achievement rates for the recording of body mass index (BMI) in MMI were significantly lower than for BMI recording in diabetes for the whole of the UK combined: payment rate 92.7% (interquartile range (IQR) 89.3-95.8) vs.
95.5% (IQR 93.3-97.2), p<0.001, and population achievement rate 84.0% (IQR 76.3-90.0) vs. 92.5% (IQR 89.7-94.9), p<0.001; and for each country individually: for example, in Scotland the payment rate was 94.0% (IQR 91.4-97.2) vs. 96.3% (IQR 94.3-97.8), p<0.001. The exception rate was significantly higher for the recording of BMI in MMI than for BMI recording in diabetes for the UK combined (7.4% (IQR 3.3-15.9) vs. 2.3% (IQR 0.9-4.7), p<0.001) and for each country individually: for example, in Scotland the exception rate in MMI was 11.8% (IQR 5.4-19.3) compared to 3.5% (IQR 1.9-6.1) in diabetes. Similar findings were found for blood pressure (BP) recording: across the whole of the UK, payment and population achievement rates for BP recording in MMI were also significantly reduced compared to those for the recording of BP in chronic kidney disease (CKD): payment rate 94.1% (IQR 90.9-97.1) vs. 97.8% (IQR 96.3-98.9), p<0.001, and population achievement rate 87.0% (IQR 81.3-91.7) vs. 97.1% (IQR 95.5-98.4), p<0.001. Exception rates again were significantly higher for the recording of BP in MMI compared to CKD (6.4% (IQR 3.0-13.1) vs. 0.3% (IQR 0.0-1.0), p<0.001). There was also evidence of differences in rates of recording of BMI and BP in MMI across the UK: BMI and BP recording in MMI were significantly lower in Scotland compared to England (BMI: -1.5%, 99% CI -2.7 to -0.3%, p<0.001; BP: -1.8%, 99% CI -2.7 to -0.9%, p<0.001), while rates of BMI and BP recording in diabetes and CKD were similar in Scotland compared to England (BMI: -0.5, 99% CI -1.0 to 0.05, p=0.004; BP: 0.02, 99% CI -0.2 to 0.3, p=0.797). Data from the PsyCIS cohort showed an increase in standardised mortality ratios (SMR) across the lifespan for individuals with MMI compared to the local Glasgow and wider Scottish populations (Glasgow SMR 1.8, 95% CI 1.6-2.0; Scotland SMR 2.7, 95% CI 2.4-3.1). Increasing socioeconomic deprivation was associated with an increased overall rate of death in MMI (350.3 deaths/10,000 population/5 years in the least deprived quintile compared to 794.6 deaths/10,000 population/5 years in the most deprived quintile). No significant difference in rate of death for individuals with schizophrenia compared with bipolar disorder was found (6.3% vs. 4.9%, p=0.086), but primary cause of death varied, with higher rates of suicide in individuals with bipolar disorder (22.4% vs. 11.7%, p=0.04). Discussion: Local and national datasets can be used for epidemiological study to inform local practice and to complement existing national and international studies. While the strengths of this thesis include the large data sets used, and therefore their likely representativeness of the wider population, some limitations, largely associated with the use of secondary data sources, are acknowledged. While this thesis has confirmed evidence of increased physical health comorbidity and multimorbidity in individuals with MMI, it is likely that these findings represent significant under-reporting, and likely under-recognition, of physical health comorbidity in this population. This is likely due to a combination of patient, health professional and healthcare system factors and requires further investigation. Moreover, evidence of inequality in access to healthcare in terms of physical health promotion (namely smoking cessation advice), recording of physical health indices (BMI and BP), prescribing of medications for the treatment of physical illness, and prescribing of NRT has been found at a national level.
While significant premature mortality in individuals with MMI within a Scottish setting has been confirmed, more work is required to further detail and investigate the impact of socioeconomic deprivation on cause and rate of death in this population. It is clear that further education and training are required for all healthcare staff to improve the recognition, diagnosis and treatment of physical health problems in this population, with the aim of addressing the significant premature mortality that is seen. Conclusions: Future work lies in the challenge of designing strategies to reduce health inequalities and to narrow the gap in premature mortality reported in individuals with MMI. Models of care that allow a much more integrated approach to diagnosing, monitoring and treating both the physical and the mental health of individuals with MMI, particularly in areas of social and economic deprivation, may be helpful. Strategies to engage this “hard to reach” population also need to be developed. While greater integration of psychiatric services with primary care and with specialist medical services is clearly vital, the evidence on how best to achieve this is limited. As the National Health Service (NHS) undergoes major reform, attention needs to be paid to designing better ways to address the current disconnect between primary and secondary care. This should help to improve physical, psychological and social outcomes for individuals with MMI.
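As a hedged aside, the odds ratios and 95% confidence intervals quoted throughout the results above are conventionally computed from 2×2 contingency tables; a minimal sketch of the standard Woolf (log-OR Wald) method, with hypothetical counts, follows.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and Wald 95% CI from a 2x2 table:
         exposed, condition: a      exposed, no condition: b
       unexposed, condition: c    unexposed, no condition: d"""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: a comorbidity in an MMI group vs. controls.
print(odds_ratio_ci(120, 880, 300, 8700))  # OR ~3.95 with its 95% CI
```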
Dinoflagellate Genomic Organization and Phylogenetic Marker Discovery Utilizing Deep Sequencing Data
Abstract:
Dinoflagellates possess large genomes in which most genes are present in many copies, which has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short-read sequencing methods, and develops new techniques that utilize deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene alcohol dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone ran counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups, as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles and placed all unknown taxa on the tree with strong bootstrap support. The pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates using this set of markers produced a phylogeny with good support for all branches. The Suessiales were found to be sister to the Peridiniales. The Prorocentrales formed a monophyletic group with the Dinophysiales that was sister to the Gonyaulacales. The Gymnodiniales were found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.
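The abstract does not detail how the pipeline identifies orthologues; as a hedged illustration of one standard strategy such pipelines often build on, the sketch below pairs genes from two taxa as reciprocal best hits over a table of pairwise similarity scores. The score table and gene names are hypothetical.

```python
def reciprocal_best_hits(scores):
    """Orthologue candidates as reciprocal best hits (RBH): gene a in
    taxon A is paired with gene b in taxon B iff b is a's best-scoring
    hit and a is b's best-scoring hit.
    `scores` maps (gene_in_A, gene_in_B) -> similarity score."""
    best_ab, best_ba = {}, {}
    for (a, b), s in scores.items():
        if s > best_ab.get(a, (None, -1.0))[1]:
            best_ab[a] = (b, s)
        if s > best_ba.get(b, (None, -1.0))[1]:
            best_ba[b] = (a, s)
    return [(a, b) for a, (b, _) in best_ab.items() if best_ba[b][0] == a]

# Hypothetical alignment scores between genes of two taxa.
scores = {("adhA", "adh1"): 95.0, ("adhA", "adh2"): 60.0,
          ("pcnA", "pcn1"): 88.0, ("pcnB", "pcn1"): 70.0}
print(reciprocal_best_hits(scores))  # [('adhA', 'adh1'), ('pcnA', 'pcn1')]
```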
Abstract:
Imaging spectroscopy (IS) is a promising tool for studying soil properties over large spatial domains. Going from point to image spectrometry is not only a journey from micro to macro scales, but also a long stage in which problems such as low signal-to-noise levels, atmospheric contamination, large data sets, the BRDF effect, and more are often encountered. In this paper we provide an up-to-date overview of case studies that have used IS technology for soil science applications. Besides a brief discussion of the advantages and disadvantages of IS for studying soils, the following cases are comprehensively discussed: soil degradation (salinity, erosion, and deposition), soil mapping and classification, soil genesis and formation, soil contamination, soil water content, and soil swelling. We review these case studies and suggest that the IS data be provided to end-users as real reflectance, not as raw data, and with better signal-to-noise ratios than presently exist. This is because converting raw data into reflectance is a complicated stage that requires experience, knowledge, and specific infrastructure not available to many users, whereas quantitative spectral models require good-quality data. These limitations serve as a barrier that impedes potential end-users and inhibits researchers from trying this technique for their needs. The paper ends with a general call to the soil science audience to extend the utilization of the IS technique, and it provides some ideas on how to propel this technology forward to enable its widespread adoption and achieve a breakthrough in the field of soil science and remote sensing. (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.
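The paper's tree sampler itself is not reproduced in this abstract; as a hedged sketch of the simulated annealing framework it is used within, the block below shows a generic Metropolis-style annealing loop over tree space, assuming caller-supplied `parsimony_score` and `random_neighbor` functions (e.g., a random NNI rearrangement); these helpers are assumptions, not the paper's method.

```python
import math, random

def anneal(tree, parsimony_score, random_neighbor,
           t0=10.0, cooling=0.999, steps=100_000):
    """Search for a maximum parsimony tree by simulated annealing.
    Assumes parsimony_score(tree) -> int (lower is better) and
    random_neighbor(tree) -> tree (a random rearrangement)."""
    best = cur = tree
    best_s = cur_s = parsimony_score(tree)
    t = t0
    for _ in range(steps):
        cand = random_neighbor(cur)
        cand_s = parsimony_score(cand)
        # Metropolis rule: always accept improvements; accept worse
        # trees with probability exp(-delta/t) to escape local optima.
        if cand_s <= cur_s or random.random() < math.exp((cur_s - cand_s) / t):
            cur, cur_s = cand, cand_s
            if cur_s < best_s:
                best, best_s = cur, cur_s
        t *= cooling  # geometric cooling schedule
    return best, best_s
```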
Abstract:
A method to estimate an extreme quantile that requires no distributional assumptions is presented. The approach is based on transformed kernel estimation of the cumulative distribution function (cdf). The proposed method consists of a double-transformation kernel estimation. We derive optimal bandwidth selection methods that have a direct expression for the smoothing parameter, and the bandwidth can adapt to the given quantile level. The procedure is useful for large data sets and improves quantile estimation compared to other methods for heavy-tailed distributions. Implementation is straightforward, and R programs are available.
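The authors' R programs are not reproduced here; as a hedged Python sketch of the general idea of transformed kernel estimation of the cdf, the block below uses a single log transformation and a rule-of-thumb bandwidth in place of the paper's double transformation and optimal bandwidth, then inverts the smoothed cdf numerically for an extreme quantile.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def tk_quantile(data, p):
    """Transformed kernel estimate of the p-quantile: log-transform the
    (positive, heavy-tailed) data, kernel-smooth the cdf with a Gaussian
    kernel, invert numerically, and back-transform."""
    y = np.log(data)                                  # transformation step
    h = 1.06 * y.std() * len(y) ** -0.2               # rule-of-thumb bandwidth
    cdf = lambda t: norm.cdf((t - y) / h).mean()      # kernel estimate of F(t)
    t = brentq(lambda t: cdf(t) - p, y.min() - 10, y.max() + 10)
    return np.exp(t)                                  # back-transform

rng = np.random.default_rng(1)
sample = rng.pareto(2.0, size=5000) + 1.0             # heavy-tailed sample
print(tk_quantile(sample, 0.999))                     # extreme quantile estimate
```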
Abstract:
Asymptomatic Plasmodium infection carriers represent a major threat to malaria control worldwide, as they are silent natural reservoirs and do not seek medical care. There are no standard criteria for asymptomatic Plasmodium infection; therefore, its diagnosis relies on the presence of the parasite during a specific period of symptomless infection. The antiparasitic immune response can result in a reduced Plasmodium sp. load and control of disease manifestations, which leads to asymptomatic infection. Both the innate and adaptive immune responses seem to play major roles in asymptomatic Plasmodium infection; regulatory T-cell activity (through the production of interleukin-10 and transforming growth factor-β) and B-cells (with a broad antibody response) both play prominent roles. Furthermore, molecules involved in the haem detoxification pathway (such as haptoglobin and haem oxygenase-1) and in iron metabolism (ferritin and activated c-Jun N-terminal kinase) have emerged in recent years as potential biomarkers and are thus helping to unravel the immune response underlying asymptomatic Plasmodium infection. The acquisition of large data sets and the use of robust statistical tools, including network analysis, together with well-designed malaria studies, will likely help elucidate the immune mechanisms responsible for asymptomatic infection.
Abstract:
Our understanding of the distribution of worldwide human genomic diversity has greatly increased over recent years thanks to the availability of large data sets derived from short tandem repeats (STRs), insertion-deletion polymorphisms (indels) and single nucleotide polymorphisms (SNPs). A concern, however, is that the current picture of worldwide human genomic diversity may be inaccurate because of biases in the selection process of genetic markers (so-called 'ascertainment bias'). To evaluate this problem, we first compared the distribution of genomic diversity between these three types of genetic markers in the populations of the HGDP-CEPH panel for evidence of bias or incongruities. In a second step, using a very relaxed set of criteria to prevent the intrusion of bias, we developed a new set of unbiased STR markers and compared the results against those from available panels. Contrary to recent claims, our results show that the STR markers suffer from no discernible bias and can thus be used as a baseline reference for human genetic diversity and population differentiation. The bias on SNPs is moderate compared to that on the set of indels analysed, which we recommend should be avoided for work describing the distribution of human genetic diversity or making inferences about human settlement history.
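As a hedged illustration of the kind of per-locus diversity statistic on which such marker comparisons commonly rest, the sketch below computes unbiased expected heterozygosity from allele counts; the allele data are hypothetical, and the statistic is a standard population-genetic formula, not necessarily the one used in the paper.

```python
from collections import Counter

def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity for one locus:
    H = (n / (n - 1)) * (1 - sum_i p_i^2), with n sampled allele copies."""
    counts = Counter(alleles)
    n = sum(counts.values())
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return h * n / (n - 1)

# Hypothetical STR allele calls (repeat counts) at one locus.
locus = [12, 12, 13, 14, 14, 14, 15, 12, 13, 15]
print(round(expected_heterozygosity(locus), 3))  # ~0.822
```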
Abstract:
In recent years, numerous studies have highlighted the toxic effects of organic micropollutants on the species of our lakes and rivers. However, most of these studies have focused on the toxicity of individual substances, whereas organisms are exposed every day to thousands of substances in mixture, and the effects of these cocktails are not negligible. This doctoral thesis therefore examined models for predicting the environmental risk of such cocktails for the aquatic environment. The main objective was to assess the ecological risk of the mixtures of chemical substances measured in Lake Geneva, but also to take a critical look at the methodologies used and to propose adaptations for a better estimation of the risk. In the first part of this work, the risk of mixtures of pesticides and pharmaceuticals for the Rhône and for Lake Geneva was established using approaches envisioned notably in European legislation. These are screening approaches, i.e. they allow a general evaluation of the risk of mixtures. Such an approach highlights the most problematic substances, i.e. those contributing most to the toxicity of the mixture; in our case, essentially four pesticides. The study also shows that all substances, even in minute traces, contribute to the effect of the mixture. This finding has implications for environmental management: it implies that all sources of pollutants must be reduced, not only the most problematic ones. But the proposed approach also presents an important conceptual bias, which makes its use questionable beyond screening and would require an adaptation of the safety factors employed. The second part of the study focused on the use of mixture models in environmental risk calculation. Mixture models were developed and validated species by species, not for evaluation of the ecosystem as a whole; their use should therefore proceed species by species, which is rarely done owing to the lack of available ecotoxicological data. The aim was therefore to compare, using randomly generated values, the risk calculated by a rigorous species-by-species method with the risk calculated in the classical way, where the models are applied to the whole community without accounting for inter-species variation. The results are similar in the majority of cases, which validates the traditionally used approach; however, this work identified certain cases where the classical application can lead to an under- or overestimation of the risk. Finally, the last part of this thesis examined the influence that cocktails of micropollutants may have had on communities in situ. A two-step approach was adopted. First, the toxicity of fourteen herbicides detected in Lake Geneva was determined; over the period studied, from 2004 to 2009, this herbicide toxicity decreased from 4% of species affected to less than 1%. The question was then whether this decrease in toxicity had an impact on the development of certain species within the algal community. Statistical analysis made it possible to isolate other factors that could influence the flora, such as water temperature or the presence of phosphates, and thus to identify which species turned out to have been influenced, positively or negatively, by the decrease in toxicity in the lake over time. Interestingly, some of them had already shown similar behaviour in mesocosm studies. In conclusion, this work shows that robust models exist to predict the risk of mixtures of micropollutants for aquatic species, and that they can be used to explain the role of substances in ecosystem functioning; these models nevertheless have limits and underlying assumptions that must be considered when they are applied.
For several years now, scientists as well as society at large have been concerned about the risks that organic micropollutants may pose to the aquatic environment. Indeed, numerous studies have shown the toxic effects these substances may induce on organisms living in our lakes and rivers, especially when they are exposed to acute or chronic concentrations. However, most studies have focused on the toxicity of single compounds, i.e. compounds considered individually. The same holds for the current European regulatory procedures for the environmental risk assessment of these substances. But aquatic organisms are typically exposed every day to thousands of organic compounds simultaneously, and the toxic effects resulting from these "cocktails" cannot be neglected. The ecological risk assessment of mixtures of such compounds therefore has to be addressed by scientists in the most reliable and appropriate way. In the first part of this thesis, the procedures currently envisioned for aquatic mixture risk assessment in European legislation are described. These methodologies are based on the mixture model of concentration addition and on the use of predicted no effect concentrations (PNEC) or effect concentrations (EC50) with assessment factors. These principal approaches were applied to two specific case studies, Lake Geneva and the River Rhône in Switzerland, and the outcomes of these applications are discussed. These first-level assessments showed that the mixture risks for the studied cases rapidly exceeded the critical value. This exceedance is generally due to two or three main substances. The proposed procedures therefore allow the identification of the most problematic substances, for which management measures, such as a reduction of their input into the aquatic environment, should be envisioned. However, it was also shown that the risk levels associated with mixtures of compounds are not negligible even without considering these main substances. Indeed, it is the sum of the substances that is problematic, which is more challenging in terms of risk management. Moreover, a lack of reliability in the procedures was highlighted, which can lead to contradictory results in terms of risk. This result is linked to the inconsistency of the assessment factors applied in the different methods. In the second part of the thesis, the reliability of more advanced procedures to predict the mixture effect on communities in the aquatic system was investigated.
These established methodologies combine the models of concentration addition (CA) or response addition (RA) with species sensitivity distribution (SSD) curves. Indeed, mixture effect predictions have been shown to be consistent only when the mixture models are applied to a single species, and not to several species aggregated simultaneously into SSDs. Hence, a more stringent procedure for mixture risk assessment is proposed: first apply the CA or RA models to each species separately and, in a second step, combine the results to build an SSD for the mixture. Unfortunately, this methodology is not applicable in most cases, because it requires large data sets that are usually not available. Therefore, the differences between the two methodologies were studied with artificially generated datasets, in order to characterize the robustness of the traditional approach of applying the models directly to species sensitivity distributions. The results showed that using CA directly on SSDs can lead to underestimations of the mixture concentration affecting 5% or 50% of species, especially when the substances present a large standard deviation in their species sensitivity distribution. The application of RA can lead to over- or underestimations, depending mainly on the slope of the dose-response curves of the individual species. The potential underestimation with RA becomes important when the ratio between the EC50 and the EC10 of the dose-response curves of the species composing the SSD is smaller than 100. However, considering common real cases of ecotoxicity data, the mixture risk calculated by applying the mixture models directly to SSDs remains consistent and would, if anything, slightly overestimate the risk. These results can be used as a theoretical validation of the currently applied methodology. Nevertheless, when assessing the risk of mixtures with this classical methodology, one has to keep this source of error in mind, especially when SSDs present a distribution of the data outside the range determined in this study. Finally, in the last part of this thesis, we confronted the mixture effect predictions with biological changes observed in the environment. In this study, long-term monitoring of a great European lake, Lake Geneva, provided the opportunity to assess to what extent the predicted toxicity of herbicide mixtures explains changes in the composition of the phytoplankton community, alongside other classical limnological parameters such as nutrients. To reach this goal, the mixture toxicity of 14 herbicides regularly detected in the lake was calculated over several years, using the concentration addition and response addition models. A decreasing temporal gradient of toxicity was observed from 2004 to 2009. Redundancy analysis and partial redundancy analysis showed that this gradient explains a significant portion of the variation in phytoplankton community composition, even after the effect of all other co-variables has been removed. Moreover, some of the species revealed to have been influenced, positively or negatively, by the decrease of toxicity in the lake over time showed similar behaviour in mesocosm studies. It can be concluded that herbicide mixture toxicity is one of the key parameters explaining phytoplankton changes in Lake Geneva. To conclude, different methods exist to predict the risk of mixtures in ecosystems, but their reliability varies depending on the underlying hypotheses.
One should therefore carefully consider these hypotheses, as well as the limits of the approaches, before using the results for environmental risk management.
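As a hedged sketch of the two mixture models named above, concentration addition (CA) and response addition (RA), and not of the thesis code itself, the block below predicts a mixture effect from per-substance log-logistic dose-response curves: CA finds the effect level at which the toxic units sum to one, while RA combines per-substance effects as statistically independent events. All curve parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

def effect(c, ec50, slope):
    """Log-logistic dose-response: fraction of a species affected at c."""
    return 1.0 / (1.0 + (ec50 / c) ** slope) if c > 0 else 0.0

def response_addition(concs, ec50s, slopes):
    """RA: effects combine as independent events, E = 1 - prod(1 - E_i)."""
    effs = [effect(c, e, s) for c, e, s in zip(concs, ec50s, slopes)]
    return 1.0 - np.prod([1.0 - e for e in effs])

def concentration_addition(concs, ec50s, slopes):
    """CA: solve for the effect level x at which sum_i c_i / EC_x,i = 1,
    with EC_x,i obtained by inverting each log-logistic curve."""
    def toxic_units(x):
        ecx = [e * (x / (1.0 - x)) ** (1.0 / s) for e, s in zip(ec50s, slopes)]
        return sum(c / e for c, e in zip(concs, ecx)) - 1.0
    return brentq(toxic_units, 1e-9, 1.0 - 1e-9)

# Hypothetical three-herbicide mixture (concentrations in ug/L).
concs, ec50s, slopes = [0.2, 0.5, 0.1], [2.0, 5.0, 1.0], [1.5, 2.0, 1.2]
print(concentration_addition(concs, ec50s, slopes))  # CA-predicted effect
print(response_addition(concs, ec50s, slopes))       # RA-predicted effect
```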