979 results for Asymptotic Mean Squared Errors


Relevance: 100.00%

Abstract:

Complete Basis Set and Gaussian-n methods were combined with CPCM continuum solvation methods to calculate pKa values for six carboxylic acids. An experimental value of −264.61 kcal/mol for the free energy of solvation of H+, ΔGs(H+), was combined with a value for Ggas(H+) of −6.28 kcal/mol to calculate pKa values with Cycle 1. The Complete Basis Set gas-phase methods used to calculate gas-phase free energies are very accurate, with mean unsigned errors of 0.3 kcal/mol and standard deviations of 0.4 kcal/mol. The CPCM solvation calculations used to calculate condensed-phase free energies are slightly less accurate than the gas-phase models, and the best method has a mean unsigned error and standard deviation of 0.4 and 0.5 kcal/mol, respectively. The use of Cycle 1 and the Complete Basis Set models combined with the CPCM solvation methods yielded pKa values accurate to less than half a pKa unit.
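
The pKa follows from the aqueous deprotonation free energy as pKa = ΔG_aq/(RT ln 10), with ΔG_aq assembled from the gas-phase and solvation terms of the cycle. Below is a minimal sketch of that arithmetic, assuming Cycle 1 is the direct deprotonation HA(aq) → H+(aq) + A−(aq); only the two proton terms come from the abstract, and the acid-specific inputs in the example are hypothetical placeholders.

```python
import math

R = 1.987204e-3     # gas constant, kcal/(mol*K)
T = 298.15          # temperature, K

# Proton values quoted in the abstract (kcal/mol)
G_GAS_H = -6.28     # G_gas(H+)
DGS_H   = -264.61   # free energy of solvation of H+, dGs(H+)

def pka_cycle1(g_gas_ha, g_gas_a, dgs_ha, dgs_a):
    """pKa of HA(aq) -> H+(aq) + A-(aq) from gas-phase free energies of HA and A-
    and CPCM solvation free energies of HA and A- (all in kcal/mol)."""
    dg_gas = g_gas_a + G_GAS_H - g_gas_ha      # gas-phase deprotonation free energy
    ddg_solv = dgs_a + DGS_H - dgs_ha          # change in solvation free energy
    dg_aq = dg_gas + ddg_solv                  # aqueous deprotonation free energy
    return dg_aq / (R * T * math.log(10))

# Hypothetical illustrative inputs, not values from the study
print(round(pka_cycle1(g_gas_ha=0.0, g_gas_a=345.0, dgs_ha=-7.0, dgs_a=-77.0), 2))
```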

Relevance: 100.00%

Abstract:

Endurance athletes have an increased risk of developing atrial fibrillation (AF) at 40 to 50 years of age. Signal-averaged P-wave analysis has been used for identifying patients at risk for AF. We evaluated the impact of lifetime training hours on signal-averaged P-wave duration and modifying factors. Nonelite men athletes scheduled to participate in the 2010 Grand Prix of Bern, a 10-mile race, were invited. Four hundred ninety-two marathon and nonmarathon runners applied for participation, 70 were randomly selected, and 60 entered the final analysis. Subjects were stratified according to their lifetime training hours (average endurance and strength training hours per week × 52 × training years) into low (<1,500 hours), medium (1,500 to 4,500 hours), and high (>4,500 hours) training groups. Mean age was 42 ± 7 years. From the low to the high training group, signal-averaged P-wave duration increased from 131 ± 6 to 142 ± 13 ms (p = 0.026), and left atrial volume increased from 24.8 ± 4.6 to 33.1 ± 6.2 ml/m² (p = 0.001). Parasympathetic tone, expressed as the root mean square of successive differences of normal-to-normal intervals (RMSSD), increased from 34 ± 13 to 47 ± 16 ms (p = 0.002), and premature atrial contractions increased from 6.1 ± 7.4 to 10.8 ± 7.7 per 24 hours (p = 0.026). Left ventricular mass increased from 100.7 ± 9.0 to 117.1 ± 18.2 g/m² (p = 0.002). Left ventricular systolic and diastolic function and blood pressure at rest were normal in all athletes and showed no differences among training groups. Four athletes (6.7%) had a history of paroxysmal AF: 1 in the medium training group and 3 in the high training group (p = 0.252). In conclusion, in nonelite men athletes lifetime training hours are associated with prolongation of signal-averaged P-wave duration and an increase in left atrial volume. The altered left atrial substrate may facilitate the occurrence of AF. Increased vagal tone and atrial ectopy may serve as modifying and triggering factors.
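
The two derived quantities used for stratification and for vagal tone are simple formulas quoted in the abstract; a short sketch under those definitions (the example numbers are hypothetical, not subject data):

```python
import numpy as np

def lifetime_training_hours(avg_weekly_hours, training_years):
    """Lifetime training hours: average weekly training hours x 52 x training years."""
    return avg_weekly_hours * 52 * training_years

def training_group(hours):
    """Stratify into the low/medium/high groups used in the study."""
    if hours < 1500:
        return "low"
    return "medium" if hours <= 4500 else "high"

def rmssd(nn_intervals_ms):
    """Root mean square of successive differences of normal-to-normal intervals (ms)."""
    diffs = np.diff(np.asarray(nn_intervals_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

hours = lifetime_training_hours(avg_weekly_hours=8, training_years=12)   # 4,992 hours
print(training_group(hours))                                             # 'high'
print(round(rmssd([812, 845, 830, 870, 855, 900]), 1))                   # ms
```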

Relevance: 100.00%

Abstract:

Functional Magnetic Resonance Imaging (fMRI) is a non-invasive technique which is commonly used to quantify changes in blood oxygenation and flow coupled to neuronal activation. One of the primary goals of fMRI studies is to identify localized brain regions where neuronal activation levels vary between groups. Single-voxel t-tests have been commonly used to determine whether activation related to the protocol differs across groups. Due to the generally limited number of subjects within each study, accurate estimation of variance at each voxel is difficult. Thus, combining information across voxels in the statistical analysis of fMRI data is desirable in order to improve efficiency. Here we construct a hierarchical model and apply an Empirical Bayes framework to the analysis of group fMRI data, employing techniques used in high-throughput genomic studies. The key idea is to shrink residual variances by combining information across voxels, and subsequently to construct an improved test statistic in lieu of the classical t-statistic. This hierarchical model results in a shrinkage of voxel-wise residual sample variances towards a common value. The shrunken estimator for voxel-specific variance components in the group analyses outperforms the classical residual error estimator in terms of mean squared error. Moreover, the shrunken test statistic decreases the false positive rate when testing differences in brain contrast maps across a wide range of simulation studies. This methodology was also applied to experimental data regarding a cognitive activation task.
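
The core step is shrinking each voxel's residual variance toward a common value before forming the test statistic. The sketch below shows a moderated two-sample t-statistic in that spirit; the prior degrees of freedom and the use of the median variance as the common target are placeholder assumptions, not the paper's empirical-Bayes estimates.

```python
import numpy as np

def moderated_t(group1, group2, d0=4.0, s0_sq=None):
    """Moderated two-sample t-statistics per voxel.

    group1, group2: arrays of shape (n_subjects, n_voxels) of contrast values.
    d0, s0_sq: prior degrees of freedom and prior (common) variance; here taken
    as given, with s0_sq defaulting to the median voxel variance."""
    n1, n2 = group1.shape[0], group2.shape[0]
    d = n1 + n2 - 2
    mean_diff = group1.mean(axis=0) - group2.mean(axis=0)
    # pooled residual variance per voxel
    s_sq = (group1.var(axis=0, ddof=1) * (n1 - 1) +
            group2.var(axis=0, ddof=1) * (n2 - 1)) / d
    if s0_sq is None:
        s0_sq = np.median(s_sq)                       # crude stand-in for the EB prior
    # shrink voxel-wise variances toward the common value
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    return mean_diff / np.sqrt(s_tilde_sq * (1.0 / n1 + 1.0 / n2))

rng = np.random.default_rng(0)
t_mod = moderated_t(rng.normal(size=(12, 5000)), rng.normal(size=(14, 5000)))
print(t_mod.shape)
```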

Relevance: 100.00%

Abstract:

Atmospheric turbulence near the ground severely limits the quality of imagery acquired over long horizontal paths. In defense, surveillance, and border security applications, there is interest in deploying man-portable, embedded systems incorporating image reconstruction methods to compensate for turbulence effects. While many image reconstruction methods have been proposed, their suitability for use in man-portable embedded systems is uncertain. To be effective, these systems must operate over significant variations in turbulence conditions while subject to other variations due to operation by novice users. Systems that meet these requirements and are otherwise designed to be immune to the factors that cause variation in performance are considered robust. In addition to robustness in design, the portable nature of these systems implies a preference for systems with a minimum level of computational complexity. Speckle imaging methods have recently been proposed as being well suited for use in man-portable horizontal imagers. In this work, the robustness of speckle imaging methods is established by identifying a subset of design parameters that provide immunity to the expected variations in operating conditions while minimizing the computation time necessary for image recovery. Design parameters are selected by parametric evaluation of system performance as factors external to the system are varied. The precise control necessary for such an evaluation is made possible using image sets of turbulence-degraded imagery developed using a novel technique for simulating anisoplanatic image formation over long horizontal paths. System performance is statistically evaluated over multiple reconstructions using the Mean Squared Error (MSE) to evaluate reconstruction quality. In addition to more general design parameters, the relative performance of the bispectrum and Knox-Thompson phase recovery methods is also compared. As an outcome of this work it can be concluded that speckle-imaging techniques are robust to the variation in turbulence conditions and user-controlled parameters expected when operating during the day over long horizontal paths. Speckle imaging systems that incorporate 15 or more image frames and 4 estimates of the object phase per reconstruction provide up to 45% reduction in MSE and 68% reduction in the deviation. In addition, the Knox-Thompson phase recovery method is shown to produce images in half the time required by the bispectrum. The quality of images reconstructed using the Knox-Thompson and bispectrum methods is also found to be nearly identical. Finally, it is shown that certain blind image quality metrics can be used in place of the MSE to evaluate quality in field scenarios. Using blind metrics rather than depending on user estimates allows for reconstruction quality that differs from the minimum MSE by as little as 1%, significantly reducing the deviation in performance due to user action.
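
Reconstruction quality in this evaluation is scored with the MSE against a reference frame; a minimal sketch of that metric follows (the images are synthetic placeholders, whereas the study averages MSE over many reconstructions and turbulence conditions):

```python
import numpy as np

def mse(reference, reconstruction):
    """Mean squared error between a reference image and a reconstruction."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    return float(np.mean((ref - rec) ** 2))

# Hypothetical example: compare two reconstructions against one pristine frame
rng = np.random.default_rng(1)
truth = rng.random((256, 256))
coarse = truth + rng.normal(scale=0.05, size=truth.shape)
fine = truth + rng.normal(scale=0.02, size=truth.shape)
print(f"MSE reduction: {100 * (1 - mse(truth, fine) / mse(truth, coarse)):.0f}%")
```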

Relevance: 100.00%

Abstract:

The capability to detect combustion in a diesel engine has the potential of being an important control feature to meet increasingly stringent emission regulations, develop alternative combustion strategies, and use biofuels. In this dissertation, block-mounted accelerometers were investigated as potential feedback sensors for detecting combustion characteristics in a high-speed, high-pressure common rail (HPCR), 1.9L diesel engine. Accelerometers were positioned in multiple placements and orientations on the engine, and engine testing was conducted under motored, single-injection, and pilot-main injection conditions. Engine tests were conducted at varying injection timings, engine loads, and engine speeds to observe the resulting time and frequency domain changes of the cylinder pressure and accelerometer signals. The frequency content of the cylinder pressure based signals and the accelerometer signals between 0.5 kHz and 6 kHz indicated a strong correlation, with coherence values of nearly 1. The accelerometers were used to produce estimated combustion signals using the Frequency Response Functions (FRF) measured from the frequency domain characteristics of the cylinder pressure signals and the response of the accelerometers attached to the engine block. When compared to the actual combustion signals, the estimated combustion signals produced from the accelerometer response had Root Mean Square Errors (RMSE) between 7% and 25% of the actual signal's peak value. Weighting the FRFs from multiple test conditions along their frequency axis with the coherent output power reduced the median RMSE of the estimated combustion signals and the 95th percentile of RMSE produced from each test condition. The RMSEs of the magnitude-based combustion metrics, including peak cylinder pressure, maximum pressure gradient (MPG), peak ROHR, and work, estimated from the combustion signals produced by the accelerometer responses were between 15% and 50% of their actual values. The MPG measured from the estimated pressure gradient shared a direct relationship with the actual MPG. The location-based combustion metrics, such as the location of peak values and burn durations, were capable of RMSE measurements as low as 0.9°. Overall, the accelerometer-based combustion sensing system was capable of detecting combustion and providing feedback regarding the in-cylinder combustion process.
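
The estimation chain rests on a frequency response function between cylinder pressure and block acceleration, with coherence used for weighting and RMSE reported as a percentage of the actual signal's peak. The sketch below shows one way an H1 FRF, coherence, and that error measure might be computed with SciPy; the synthetic signals and the specific estimator are illustrative assumptions, not the dissertation's processing chain.

```python
import numpy as np
from scipy import signal

def frf_and_coherence(pressure, accel, fs, nperseg=1024):
    """H1 frequency response function from cylinder pressure to block acceleration,
    plus the ordinary coherence between the two signals."""
    _, p_xx = signal.welch(pressure, fs=fs, nperseg=nperseg)
    f, p_xy = signal.csd(pressure, accel, fs=fs, nperseg=nperseg)
    _, coh = signal.coherence(pressure, accel, fs=fs, nperseg=nperseg)
    return f, p_xy / p_xx, coh

def rmse_percent_of_peak(actual, estimate):
    """RMSE of an estimated combustion signal as a percentage of the actual peak."""
    actual = np.asarray(actual, dtype=float)
    err = np.sqrt(np.mean((actual - np.asarray(estimate, dtype=float)) ** 2))
    return 100.0 * err / np.max(np.abs(actual))

# Hypothetical synthetic signals just to exercise the functions
fs = 50_000
t = np.arange(0, 1.0, 1 / fs)
pressure = np.sin(2 * np.pi * 1200 * t)
accel = 0.3 * np.sin(2 * np.pi * 1200 * t + 0.4) \
        + 0.05 * np.random.default_rng(2).normal(size=t.size)
f, frf, coh = frf_and_coherence(pressure, accel, fs)
print(round(float(coh.max()), 2), round(rmse_percent_of_peak(pressure, pressure * 0.9), 1))
```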

Relevance: 100.00%

Abstract:

We present a vertically resolved zonal mean monthly mean global ozone data set spanning the period 1901 to 2007, called HISTOZ.1.0. It is based on a new approach that combines information from an ensemble of chemistry climate model (CCM) simulations with historical total column ozone information. The CCM simulations incorporate important external drivers of stratospheric chemistry and dynamics (in particular solar and volcanic effects, greenhouse gases and ozone depleting substances, sea surface temperatures, and the quasi-biennial oscillation). The historical total column ozone observations include ground-based measurements from the 1920s onward and satellite observations from 1970 to 1976. An off-line data assimilation approach is used to combine model simulations, observations, and information on the observation error. The period starting in 1979 was used for validation with existing ozone data sets and therefore only ground-based measurements were assimilated. Results demonstrate considerable skill from the CCM simulations alone. Assimilating observations provides additional skill for total column ozone. With respect to the vertical ozone distribution, assimilating observations increases on average the correlation with a reference data set, but does not decrease the mean squared error. Analyses of HISTOZ.1.0 with respect to the effects of El Niño–Southern Oscillation (ENSO) and of the 11 yr solar cycle on stratospheric ozone from 1934 to 1979 qualitatively confirm previous studies that focussed on the post-1979 period. The ENSO signature exhibits a much clearer imprint of a change in strength of the Brewer–Dobson circulation compared to the post-1979 period. The imprint of the 11 yr solar cycle is slightly weaker in the earlier period. Furthermore, the total column ozone increase from the 1950s to around 1970 at northern mid-latitudes is briefly discussed. Indications for contributions of a tropospheric ozone increase, greenhouse gases, and changes in atmospheric circulation are found. Finally, the paper points at several possible future improvements of HISTOZ.1.0.
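
The off-line assimilation step amounts to nudging the model background toward each total column ozone observation in proportion to the background spread and the observation error. A toy single-observation update in that spirit is sketched below, assuming the observation operator is a plain vertical sum; it is an illustration, not the HISTOZ scheme.

```python
import numpy as np

def assimilate_total_column(ensemble, obs, obs_error_var):
    """Toy update of a vertically resolved ozone state from one total column observation.

    ensemble: (n_members, n_levels) ozone profiles from the CCM (background).
    obs: observed total column; obs_error_var: its error variance.
    The observation operator H is a simple vertical sum."""
    x_b = ensemble.mean(axis=0)                               # background state
    anomalies = ensemble - x_b
    hx = ensemble.sum(axis=1)                                 # H applied to each member
    # background covariance projected onto observation space: B H^T and H B H^T
    bht = anomalies.T @ (hx - hx.mean()) / (ensemble.shape[0] - 1)
    hbht = np.var(hx, ddof=1)
    gain = bht / (hbht + obs_error_var)                       # Kalman-type gain
    return x_b + gain * (obs - x_b.sum())                     # analysis state

rng = np.random.default_rng(3)
ens = 300.0 / 20 + rng.normal(scale=1.0, size=(30, 20))       # 20 levels, ~300 DU total
analysis = assimilate_total_column(ens, obs=310.0, obs_error_var=9.0)
print(round(float(analysis.sum()), 1))                        # pulled toward the observation
```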

Relevance: 100.00%

Abstract:

A lack of quantitative high resolution paleoclimate data from the Southern Hemisphere limits the ability to examine current trends within the context of long-term natural climate variability. This study presents a temperature reconstruction for southern Tasmania based on analyses of a sediment core from Duckhole Lake (43.365°S, 146.875°E). The relationship between non-destructive whole core scanning reflectance spectroscopy measurements in the visible spectrum (380–730 nm) and the instrumental temperature record (AD 1911–2000) was used to develop a calibration-in-time reflectance spectroscopy-based temperature model. Results showed that a trough in reflectance from 650 to 700 nm, which represents chlorophyll and its derivatives, was significantly correlated to annual mean temperature. A calibration model was developed (R = 0.56, p_auto < 0.05, root mean squared error of prediction (RMSEP) = 0.21°C, five-year filtered data, calibration period AD 1911–2000) and applied down-core to reconstruct annual mean temperatures in southern Tasmania over the last c. 950 years. This indicated that temperatures were initially cool c. AD 1050, but steadily increased until the late AD 1100s. After a brief cool period in the AD 1200s, temperatures again increased. Temperatures steadily decreased during the AD 1600s and remained relatively stable until the start of the 20th century when they rapidly decreased, before increasing from the AD 1960s onwards. Comparisons with high resolution temperature records from western Tasmania, New Zealand and South America revealed some similarities, but also highlighted differences in temperature variability across the mid-latitudes of the Southern Hemisphere. These are likely due to a combination of factors including the spatial variability in climate between and within regions, and differences between records that document seasonal (i.e. warm season/late summer) versus annual temperature variability. This highlights the need for further records from the mid-latitudes of the Southern Hemisphere in order to constrain past natural spatial and seasonal/annual temperature variability in the region, and to accurately identify and attribute changes to natural variability and/or anthropogenic activities.
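
The calibration-in-time model is essentially a regression of instrumental temperature on the reflectance trough index, judged by its prediction error. Below is a minimal sketch of such a calibration with a leave-one-out RMSEP; the data are hypothetical stand-ins for the filtered 1911–2000 series, and the published model may differ in detail.

```python
import numpy as np

def loo_rmsep(x, y):
    """Leave-one-out RMSEP for a simple linear calibration y ~ x
    (e.g. reflectance trough index vs. annual mean temperature)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        preds[i] = slope * x[i] + intercept
    return float(np.sqrt(np.mean((preds - y) ** 2)))

def reconstruct_downcore(x_downcore, x_calib, y_calib):
    """Apply the calibration fitted on the instrumental period down-core."""
    slope, intercept = np.polyfit(x_calib, y_calib, 1)
    return slope * np.asarray(x_downcore, float) + intercept

# Hypothetical calibration series standing in for the five-year filtered data
rng = np.random.default_rng(4)
trough = rng.normal(size=90)
temp = 11.0 + 0.4 * trough + rng.normal(scale=0.2, size=90)
print(round(loo_rmsep(trough, temp), 2))      # °C
```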

Relevance: 100.00%

Abstract:

High-resolution, well-calibrated records of lake sediments are critically important for quantitative climate reconstructions, but they remain a methodological and analytical challenge. While several comprehensive paleotemperature reconstructions have been developed across Europe, only a few quantitative high-resolution studies exist for precipitation. Here we present a calibration and verification study of lithoclastic sediment proxies from proglacial Lake Oeschinen (46°30′N, 7°44′E, 1,580 m a.s.l., north–west Swiss Alps) that are sensitive to rainfall for the period AD 1901–2008. We collected two sediment cores, one in 2007 and another in 2011. The sediments are characterized by two facies: (A) mm-laminated clastic varves and (B) turbidites. The annual character of the laminae couplets was confirmed by radiometric dating (²¹⁰Pb, ¹³⁷Cs) and independent flood-layer chronomarkers. Individual varves consist of a dark sand-size spring-summer layer enriched in siliciclastic minerals and a lighter clay-size calcite-rich winter layer. Three subtypes of varves are distinguished: Type I with a 1–1.5 mm fining-upward sequence; Type II with a distinct fine-sand base up to 3 mm thick; and Type III containing multiple internal microlaminae caused by individual summer rainstorm deposits. Delta-fan surface samples and sediment trap data fingerprint different sediment source areas and transport processes from the watershed and confirm the instant response of sediment flux to rainfall and erosion. Based on a highly accurate, precise and reproducible chronology, we demonstrate that sediment accumulation (varve thickness) is a quantitative predictor for cumulative boreal alpine spring (May–June) and spring/summer (May–August) rainfall (r_MJ = 0.71, r_MJJA = 0.60, p < 0.01). Bootstrap-based verification of the calibration model reveals a root mean squared error of prediction (RMSEP_MJ = 32.7 mm, RMSEP_MJJA = 57.8 mm) which is on the order of 10–13 % of mean MJ and MJJA cumulative precipitation, respectively. These results highlight the potential of the Lake Oeschinen sediments for high-resolution reconstructions of past rainfall conditions in the northern Swiss Alps, central and eastern France and south-west Germany.
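
The bootstrap-based verification can be mimicked by repeatedly refitting the varve-thickness/rainfall calibration on resampled years and pooling the out-of-bag prediction errors into an RMSEP. A sketch under those assumptions follows; the calibration data and the linear model form are hypothetical placeholders.

```python
import numpy as np

def bootstrap_rmsep(thickness, precip, n_boot=1000, seed=0):
    """Bootstrap RMSEP for a linear varve-thickness -> cumulative rainfall calibration.

    Each iteration fits on a resampled calibration set and predicts the years
    left out of that resample (out-of-bag), pooling the errors."""
    x, y = np.asarray(thickness, float), np.asarray(precip, float)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(x))
    errors = []
    for _ in range(n_boot):
        boot = rng.choice(idx, size=len(idx), replace=True)
        oob = np.setdiff1d(idx, boot)
        if oob.size == 0:
            continue
        slope, intercept = np.polyfit(x[boot], y[boot], 1)
        errors.append(slope * x[oob] + intercept - y[oob])
    errors = np.concatenate(errors)
    return float(np.sqrt(np.mean(errors ** 2)))

# Hypothetical calibration data: one varve thickness and MJ rainfall value per year
rng = np.random.default_rng(5)
thickness = rng.gamma(shape=4.0, scale=0.5, size=108)                       # mm
precip_mj = 200 + 60 * (thickness - thickness.mean()) + rng.normal(0, 25, 108)
print(round(bootstrap_rmsep(thickness, precip_mj), 1))                      # mm
```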

Relevance: 100.00%

Abstract:

The use of group-randomized trials is particularly widespread in the evaluation of health care, educational, and screening strategies. Group-randomized trials represent a subset of a larger class of designs often labeled nested, hierarchical, or multilevel and are characterized by the randomization of intact social units or groups, rather than individuals. The application of random effects models to group-randomized trials requires the specification of fixed and random components of the model. The underlying assumption is usually that these random components are normally distributed. This research is intended to determine if the Type I error rate and power are affected when the assumption of normality for the random component representing the group effect is violated. In this study, simulated data are used to examine the Type I error rate, power, bias, and mean squared error of the estimates of the fixed effect and the observed intraclass correlation coefficient (ICC) when the random component representing the group effect possesses distributions with non-normal characteristics, such as heavy tails or severe skewness. The simulated data are generated with various characteristics (e.g. number of schools per condition, number of students per school, and several within-school ICCs) observed in most small, school-based, group-randomized trials. The analysis is carried out using SAS PROC MIXED, Version 6.12, with random effects specified in a random statement and restricted maximum likelihood (REML) estimation specified. The results from the non-normally distributed data are compared to the results obtained from the analysis of data with similar design characteristics but normally distributed random effects. The results suggest that the violation of the normality assumption for the group component by a skewed or heavy-tailed distribution does not appear to influence the estimation of the fixed effect, Type I error, and power. Negative biases were detected when estimating the sample ICC and dramatically increased in magnitude as the true ICC increased. These biases were not as pronounced when the true ICC was within the range observed in most group-randomized trials (i.e. 0.00 to 0.05). The normally distributed group effect also resulted in biased ICC estimates when the true ICC was greater than 0.05. However, this may be a result of higher correlation within the data.
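
The simulation design can be mirrored with any mixed-model library: draw a skewed (or normal) group-level random effect with the variance implied by the target ICC, generate member outcomes under a null treatment effect, fit the random-effects model with REML, and tally rejections over many replicates. Below is a sketch using Python's statsmodels in place of SAS PROC MIXED; the gamma shape and design sizes are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulate_grt(n_schools=10, n_students=50, icc=0.05, skewed=True, seed=0):
    """One simulated group-randomized trial with no true condition effect.

    The school-level random effect is drawn from a centered gamma (skewed) or a
    normal distribution, both with the between-group variance implied by the ICC
    (residual variance fixed at 1)."""
    rng = np.random.default_rng(seed)
    var_between = icc / (1 - icc)
    if skewed:
        shape = 2.0
        scale = np.sqrt(var_between / shape)
        u = rng.gamma(shape, scale, n_schools) - shape * scale   # mean-centered
    else:
        u = rng.normal(0.0, np.sqrt(var_between), n_schools)
    rows = []
    for j in range(n_schools):
        condition = j % 2                                        # alternate schools between arms
        for v in u[j] + rng.normal(size=n_students):
            rows.append({"school": j, "condition": condition, "y": v})
    return pd.DataFrame(rows)

df = simulate_grt()
fit = smf.mixedlm("y ~ condition", df, groups=df["school"]).fit(reml=True)
print(round(fit.pvalues["condition"], 3))   # repeat over many seeds to estimate Type I error
```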

Relevance: 100.00%

Abstract:

Purpose The aim was to test the impact of body mass index (BMI) and gender on infectious complications after polytrauma. Methods A total of 651 patients were included in this retrospective study, with an Injury Severity Score (ISS) ≥ 16 and age ≥ 16 years. The sample was subdivided into three groups: BMI < 25 kg/m², BMI 25–30 kg/m², and BMI > 30 kg/m², and a female and a male group. Infectious complications were observed for 31 days after admission. Data are given as mean ± standard errors of the means. Analysis of variance, Kruskal–Wallis test, χ² tests, and Pearson’s correlation were used for the analyses and the significance level was set at P < 0.05. Results The overall infection rates were 31.0 % in the BMI < 25 kg/m² group, 29.0 % in the BMI 25–30 kg/m² group, and 24.5 % in the BMI > 30 kg/m² group (P = 0.519). The female patients developed significantly fewer infectious complications than the male patients (26.8 vs. 73.2 %; P < 0.001). The incidence of death was significantly decreased according to the BMI group (8.8 vs. 7.2 vs. 1.5 %; P < 0.0001) and the female population had a significantly lower mortality rate (4.1 vs. 13.4 %; P < 0.0001). Pearson’s correlations between the Abbreviated Injury Scale (AIS) score and the corresponding infectious foci were not significant. Conclusion Higher BMI seems to be protective against polytrauma-associated death but not polytrauma-associated infections, and female gender protects against both polytrauma-associated infections and death. Understanding gender-specific immunomodulation could improve the outcome of polytrauma patients.

Relevance: 100.00%

Abstract:

In this thesis, we develop an adaptive framework for Monte Carlo rendering, and more specifically for Monte Carlo Path Tracing (MCPT) and its derivatives. MCPT is attractive because it can handle a wide variety of light transport effects, such as depth of field, motion blur, indirect illumination, participating media, and others, in an elegant and unified framework. However, MCPT is a sampling-based approach, and is only guaranteed to converge in the limit, as the sampling rate grows to infinity. At finite sampling rates, MCPT renderings are often plagued by noise artifacts that can be visually distracting. The adaptive framework developed in this thesis leverages two core strategies to address noise artifacts in renderings: adaptive sampling and adaptive reconstruction. Adaptive sampling consists of increasing the sampling rate on a per-pixel basis, to ensure that each pixel value is below a predefined error threshold. Adaptive reconstruction leverages the available samples on a per-pixel basis, in an attempt to achieve an optimal trade-off between minimizing the residual noise artifacts and preserving the edges in the image. In our framework, we greedily minimize the relative Mean Squared Error (rMSE) of the rendering by iterating over sampling and reconstruction steps. Given an initial set of samples, the reconstruction step aims at producing the rendering with the lowest rMSE on a per-pixel basis, and the next sampling step then further reduces the rMSE by distributing additional samples according to the magnitude of the residual rMSE of the reconstruction. This iterative approach tightly couples the adaptive sampling and adaptive reconstruction strategies, by ensuring that we densely sample only those regions of the image where adaptive reconstruction cannot properly resolve the noise. In a first implementation of our framework, we demonstrate the usefulness of our greedy error minimization using a simple reconstruction scheme leveraging a filterbank of isotropic Gaussian filters. In a second implementation, we integrate a powerful edge-aware filter that can adapt to the anisotropy of the image. Finally, in a third implementation, we leverage auxiliary feature buffers that encode scene information (such as surface normals, position, or texture) to improve the robustness of the reconstruction in the presence of strong noise.
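
The greedy loop alternates a reconstruction step that picks, per pixel, the filterbank member with the lowest estimated error and a sampling step that spends the next slice of the budget where that estimated error remains largest. A toy 1-D sketch of that structure follows; the bias and variance proxies in the error estimate are crude placeholders, not the thesis's estimators.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

NOISE_STD = 0.5                # per-sample noise of the toy renderer (assumed known)

def sample(truth, new_counts, sums, counts, rng):
    """Accumulate `new_counts[i]` fresh noisy samples into per-pixel running sums."""
    for i, k in enumerate(new_counts):
        if k > 0:
            sums[i] += rng.normal(truth[i], NOISE_STD, k).sum()
            counts[i] += k
    return sums, counts

def reconstruct(pixel_mean, counts, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """Per pixel, pick the Gaussian filter with the lowest estimated error:
    a (filtered - raw mean)^2 bias proxy plus a crude variance proxy."""
    best_img = pixel_mean.copy()
    best_err = np.full(pixel_mean.shape, np.inf)
    for s in sigmas:
        img = gaussian_filter1d(pixel_mean, s) if s > 0 else pixel_mean
        err = (img - pixel_mean) ** 2 + NOISE_STD ** 2 / counts / (1.0 + s)
        better = err < best_err
        best_img[better], best_err[better] = img[better], err[better]
    return best_img, best_err

rng = np.random.default_rng(6)
truth = np.sin(np.linspace(0, 4 * np.pi, 256))
sums, counts = sample(truth, np.full(256, 4), np.zeros(256), np.zeros(256, dtype=int), rng)
for _ in range(4):                                           # greedy iterations
    image, err = reconstruct(sums / counts, counts)
    budget = (512 * err / err.sum()).astype(int)             # sample where error is largest
    sums, counts = sample(truth, budget, sums, counts, rng)
image, _ = reconstruct(sums / counts, counts)
print(round(float(np.mean((image - truth) ** 2)), 5))        # true MSE of the toy result
```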

Relevance: 100.00%

Abstract:

Correct predictions of future blood glucose levels in individuals with Type 1 Diabetes (T1D) can be used to provide early warning of upcoming hypo-/hyperglycemic events and thus to improve the patient's safety. To increase prediction accuracy and efficiency, various approaches have been proposed which combine multiple predictors to produce superior results compared to single predictors. Three methods for model fusion are presented and comparatively assessed. Data from 23 T1D subjects under sensor-augmented pump (SAP) therapy were used in two adaptive data-driven models (an autoregressive model with output correction - cARX, and a recurrent neural network - RNN). Data fusion techniques based on i) Dempster-Shafer Evidential Theory (DST), ii) Genetic Algorithms (GA), and iii) Genetic Programming (GP) were used to merge the complementary performances of the prediction models. The fused output is used in a warning algorithm to issue alarms of upcoming hypo-/hyperglycemic events. The fusion schemes showed improved performance with lower root mean square errors, lower time lags, and higher correlation. In the warning algorithm, median daily false alarms (DFA) of 0.25% and 100% correct alarms (CA) were obtained for both event types. The detection times (DT) before occurrence of events were 13.0 and 12.1 min, respectively, for hypo- and hyperglycemic events. Compared to the cARX and RNN models, and a linear fusion of the two, the proposed fusion schemes represent a significant improvement.
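
The linear fusion baseline mentioned at the end can be written down directly: fit least-squares weights for the two predictor outputs on a training window and apply them afterwards. The sketch below uses hypothetical glucose traces standing in for the cARX and RNN outputs; the DST/GA/GP fusion schemes themselves are not reproduced here.

```python
import numpy as np

def fit_linear_fusion(pred_a, pred_b, reference):
    """Least-squares weights (plus bias) for linearly fusing two glucose predictors."""
    X = np.column_stack([pred_a, pred_b, np.ones_like(pred_a)])
    weights, *_ = np.linalg.lstsq(X, reference, rcond=None)
    return weights

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Hypothetical CGM-like traces (mg/dL) standing in for the cARX and RNN predictions
rng = np.random.default_rng(7)
truth = 120 + 40 * np.sin(np.linspace(0, 6 * np.pi, 500))
carx = truth + rng.normal(0, 12, truth.size)
rnn = truth + rng.normal(5, 9, truth.size)
w = fit_linear_fusion(carx[:300], rnn[:300], truth[:300])       # fit on the first window
fused = w[0] * carx[300:] + w[1] * rnn[300:] + w[2]
print(round(rmse(truth[300:], carx[300:]), 1),
      round(rmse(truth[300:], rnn[300:]), 1),
      round(rmse(truth[300:], fused), 1))
```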

Relevance: 100.00%

Abstract:

Chrysophyte cysts are recognized as powerful proxies of cold-season temperatures. In this paper we use the relationship between chrysophyte assemblages and the number of days below 4 °C (DB4 °C) in the epilimnion of a lake in northern Poland to develop a transfer function and to reconstruct winter severity in Poland for the last millennium. DB4 °C is a climate variable related to the length of the winter. Multivariate ordination techniques were used to study the distribution of chrysophytes from sediment traps of 37 lowland lakes distributed along a variety of environmental and climatic gradients in northern Poland. Of all the environmental variables measured, stepwise variable selection and individual Redundancy analyses (RDA) identified DB4 °C as the most important variable for chrysophytes, explaining a portion of variance independent of variables related to water chemistry (conductivity, chlorides, K, sulfates), which were also important. A quantitative transfer function was created to estimate DB4 °C from sedimentary assemblages using partial least squares regression (PLS). The two-component model (PLS-2) had a cross-validated coefficient of determination of R²_cross = 0.58, with a root mean squared error of prediction (RMSEP, based on leave-one-out) of 3.41 days. The resulting transfer function was applied to an annually varved sediment core from Lake Żabińskie, providing a new sub-decadal quantitative reconstruction of DB4 °C with high chronological accuracy for the period AD 1000–2010. During Medieval Times (AD 1180–1440) winters were generally shorter (warmer), except for a decade with very long and severe winters around AD 1260–1270 (following the AD 1258 volcanic eruption). The 16th and 17th centuries and the beginning of the 19th century experienced very long, severe winters. Comparison with other European cold-season reconstructions and atmospheric indices for this region indicates that a large part of the winter variability (reconstructed DB4 °C) is due to the interplay between the oscillations of the zonal flow controlled by the North Atlantic Oscillation (NAO) and the influence of continental anticyclonic systems (Siberian High, East Atlantic/Western Russia pattern). Differences with other European records are attributed to geographic climatological differences between Poland and Western Europe (Low Countries, Alps). The striking correspondence between the combined volcanic and solar forcing and the DB4 °C reconstruction prior to the 20th century suggests that winter climate in Poland responds mostly to natural forced variability (volcanic and solar), and that the influence of unforced variability is low.
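
A PLS-2 transfer function of this kind can be assembled from standard tools: fit a two-component PLS of DB4 °C on the assemblage matrix and score it by leave-one-out cross-validation. The sketch below uses scikit-learn with hypothetical assemblage data; it is plain PLS on raw abundances, without the screening and data handling of the published model.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def pls_transfer_function(assemblages, db4c, n_components=2):
    """Two-component PLS calibration of DB4°C on assemblage data, with
    leave-one-out RMSEP and cross-validated R² as performance measures.

    assemblages: (n_lakes, n_taxa) relative abundances; db4c: days below 4 °C."""
    model = PLSRegression(n_components=n_components).fit(assemblages, db4c)
    loo_pred = cross_val_predict(model, assemblages, db4c, cv=LeaveOneOut()).ravel()
    rmsep = float(np.sqrt(np.mean((loo_pred - db4c) ** 2)))
    r2_cross = float(np.corrcoef(loo_pred, db4c)[0, 1] ** 2)
    return model, rmsep, r2_cross

# Hypothetical assemblage data standing in for the 37 training lakes
rng = np.random.default_rng(8)
taxa = rng.dirichlet(np.ones(25), size=37)                       # relative abundances
db4c = 60 + 25 * taxa[:, 0] - 18 * taxa[:, 1] + rng.normal(0, 3, 37)
model, rmsep, r2 = pls_transfer_function(taxa, db4c)
print(round(rmsep, 2), round(r2, 2))
```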

Relevance: 100.00%

Abstract:

PURPOSE: To investigate if image registration of diffusion tensor imaging (DTI) allows omitting respiratory triggering for both transplanted and native kidneys. MATERIALS AND METHODS: Nine kidney transplant recipients and eight healthy volunteers underwent renal DTI on a 3T scanner with and without respiratory triggering. DTI images were registered using a multimodal nonrigid registration algorithm. The apparent diffusion coefficient (ADC), the contribution of perfusion (F_P), and the fractional anisotropy (FA) were determined. Relative root mean square errors (RMSE) of the fitting and the standard deviations of the derived parameters within the regions of interest (SD_ROI) were evaluated as quality criteria. RESULTS: Registration significantly reduced RMSE in all DTI-derived parameters of triggered and nontriggered measurements in cortex and medulla of both transplanted and native kidneys (P < 0.05 for all). In addition, SD_ROI values were lower with registration for all 16 parameters in transplanted kidneys (14 of 16 SD_ROI values were significantly reduced, P < 0.04) and for 15 of 16 parameters in native kidneys (9 of 16 SD_ROI values were significantly reduced, P < 0.05). Comparing triggered versus nontriggered DTI in transplanted kidneys revealed no significant difference for RMSE (P > 0.14) and for SD_ROI (P > 0.13) of all parameters. In contrast, in native kidneys relative RMSE values from triggered scans were significantly lower than those from nontriggered scans (P < 0.02), while SD_ROI was slightly higher in triggered compared to nontriggered measurements in 15 out of 16 comparisons (significantly for two, P < 0.05). CONCLUSION: Registration improves the quality of DTI in native and transplanted kidneys. Diffusion parameters in renal allografts can be measured without respiratory triggering. In native kidneys, respiratory triggering appears advantageous. J. Magn. Reson. Imaging 2016.

Relevance: 100.00%

Abstract:

Surface sediments from 68 small lakes in the Alps and 9 well-dated sediment core samples that cover a gradient of total phosphorus (TP) concentrations of 6 to 520 μg TP l⁻¹ were studied for diatom, chrysophyte cyst, cladocera, and chironomid assemblages. Inference models for mean circulation log10 TP were developed for diatoms, chironomids, and benthic cladocera using weighted-averaging partial least squares. After screening for outliers, the final transfer functions have coefficients of determination (r², as assessed by cross-validation) of 0.79 (diatoms), 0.68 (chironomids), and 0.49 (benthic cladocera). Planktonic cladocera and chrysophytes show very weak relationships to TP and no TP inference models were developed for these biota. Diatoms showed the best relationship with TP, whereas the other biota all have large secondary gradients, suggesting that variables other than TP have a strong influence on their composition and abundance. Comparison with other diatom–TP inference models shows that our model has high predictive power and a low root mean squared error of prediction, as assessed by cross-validation.
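
Weighted averaging is the building block of the WA-PLS models used here: each taxon's optimum is the abundance-weighted mean of log10 TP across the training lakes, and an inferred value for a sample is the abundance-weighted mean of the optima of the taxa present, followed by deshrinking. The sketch below shows plain WA with inverse deshrinking (WA-PLS adds further components); the training data are hypothetical.

```python
import numpy as np

def wa_fit(abundances, log_tp):
    """Weighted-averaging (WA) optima: abundance-weighted means of log10 TP."""
    y = np.asarray(abundances, float)                     # (n_lakes, n_taxa)
    x = np.asarray(log_tp, float)                         # (n_lakes,)
    optima = (y * x[:, None]).sum(axis=0) / y.sum(axis=0)
    # inverse deshrinking: regress observed log10 TP on the raw WA estimates
    raw = (y * optima).sum(axis=1) / y.sum(axis=1)
    slope, intercept = np.polyfit(raw, x, 1)
    return optima, (intercept, slope)

def wa_predict(abundances, optima, deshrink):
    y = np.asarray(abundances, float)
    raw = (y * optima).sum(axis=1) / y.sum(axis=1)
    intercept, slope = deshrink
    return intercept + slope * raw

# Hypothetical training set: 68 lakes x 30 diatom taxa (relative abundances)
rng = np.random.default_rng(9)
y = rng.dirichlet(np.ones(30), size=68)
x = np.log10(rng.uniform(6, 520, size=68))
optima, deshrink = wa_fit(y, x)
pred = wa_predict(y, optima, deshrink)
print(round(float(np.sqrt(np.mean((pred - x) ** 2))), 3))   # apparent RMSE, not cross-validated
```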