884 resultados para Prediction error method


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genetic anticipation is defined as a decrease in age of onset or increase in severity as the disorder is transmitted through subsequent generations. Anticipation has been noted in the literature for over a century. Recently, anticipation in several diseases including Huntington's Disease, Myotonic Dystrophy and Fragile X Syndrome were shown to be caused by expansion of triplet repeats. Anticipation effects have also been observed in numerous mental disorders (e.g. Schizophrenia, Bipolar Disorder), cancers (Li-Fraumeni Syndrome, Leukemia) and other complex diseases. ^ Several statistical methods have been applied to determine whether anticipation is a true phenomenon in a particular disorder, including standard statistical tests and newly developed affected parent/affected child pair methods. These methods have been shown to be inappropriate for assessing anticipation for a variety of reasons, including familial correlation and low power. Therefore, we have developed family-based likelihood modeling approaches to model the underlying transmission of the disease gene and penetrance function and hence detect anticipation. These methods can be applied in extended families, thus improving the power to detect anticipation compared with existing methods based only upon parents and children. The first method we have proposed is based on the regressive logistic hazard model. This approach models anticipation by a generational covariate. The second method allows alleles to mutate as they are transmitted from parents to offspring and is appropriate for modeling the known triplet repeat diseases in which the disease alleles can become more deleterious as they are transmitted across generations. ^ To evaluate the new methods, we performed extensive simulation studies for data simulated under different conditions to evaluate the effectiveness of the algorithms to detect genetic anticipation. Results from analysis by the first method yielded empirical power greater than 87% based on the 5% type I error critical value identified in each simulation depending on the method of data generation and current age criteria. Analysis by the second method was not possible due to the current formulation of the software. The application of this method to Huntington's Disease and Li-Fraumeni Syndrome data sets revealed evidence for a generation effect in both cases. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It has been hypothesized that results from the short term bioassays will ultimately provide information that will be useful for human health hazard assessment. Although toxicologic test systems have become increasingly refined, to date, no investigator has been able to provide qualitative or quantitative methods which would support the use of short term tests in this capacity.^ Historically, the validity of the short term tests have been assessed using the framework of the epidemiologic/medical screens. In this context, the results of the carcinogen (long term) bioassay is generally used as the standard. However, this approach is widely recognized as being biased and, because it employs qualitative data, cannot be used in the setting of priorities. In contrast, the goal of this research was to address the problem of evaluating the utility of the short term tests for hazard assessment using an alternative method of investigation.^ Chemical carcinogens were selected from the list of carcinogens published by the International Agency for Research on Carcinogens (IARC). Tumorigenicity and mutagenicity data on fifty-two chemicals were obtained from the Registry of Toxic Effects of Chemical Substances (RTECS) and were analyzed using a relative potency approach. The relative potency framework allows for the standardization of data "relative" to a reference compound. To avoid any bias associated with the choice of the reference compound, fourteen different compounds were used.^ The data were evaluated in a format which allowed for a comparison of the ranking of the mutagenic relative potencies of the compounds (as estimated using short term data) vs. the ranking of the tumorigenic relative potencies (as estimated from the chronic bioassays). The results were statistically significant (p $<$.05) for data standardized to thirteen of the fourteen reference compounds. Although this was a preliminary investigation, it offers evidence that the short term test systems may be of utility in ranking the hazards represented by chemicals which may be human carcinogens. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this proposed study, we extended the study of Chan et al. (2008) on recovering latent slope in a simple regression model to that in a multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with the measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces estimator that is efficient in the multiple regression model, especially when the measurement error variance of surrogate variable is large.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the biomedical studies, the general data structures have been the matched (paired) and unmatched designs. Recently, many researchers are interested in Meta-Analysis to obtain a better understanding from several clinical data of a medical treatment. The hybrid design, which is combined two data structures, may create the fundamental question for statistical methods and the challenges for statistical inferences. The applied methods are depending on the underlying distribution. If the outcomes are normally distributed, we would use the classic paired and two independent sample T-tests on the matched and unmatched cases. If not, we can apply Wilcoxon signed rank and rank sum test on each case. ^ To assess an overall treatment effect on a hybrid design, we can apply the inverse variance weight method used in Meta-Analysis. On the nonparametric case, we can use a test statistic which is combined on two Wilcoxon test statistics. However, these two test statistics are not in same scale. We propose the Hybrid Test Statistic based on the Hodges-Lehmann estimates of the treatment effects, which are medians in the same scale.^ To compare the proposed method, we use the classic meta-analysis T-test statistic on the combined the estimates of the treatment effects from two T-test statistics. Theoretically, the efficiency of two unbiased estimators of a parameter is the ratio of their variances. With the concept of Asymptotic Relative Efficiency (ARE) developed by Pitman, we show ARE of the hybrid test statistic relative to classic meta-analysis T-test statistic using the Hodges-Lemann estimators associated with two test statistics.^ From several simulation studies, we calculate the empirical type I error rate and power of the test statistics. The proposed statistic would provide effective tool to evaluate and understand the treatment effect in various public health studies as well as clinical trials.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Geostrophic surface velocities can be derived from the gradients of the mean dynamic topography-the difference between the mean sea surface and the geoid. Therefore, independently observed mean dynamic topography data are valuable input parameters and constraints for ocean circulation models. For a successful fit to observational dynamic topography data, not only the mean dynamic topography on the particular ocean model grid is required, but also information about its inverse covariance matrix. The calculation of the mean dynamic topography from satellite-based gravity field models and altimetric sea surface height measurements, however, is not straightforward. For this purpose, we previously developed an integrated approach to combining these two different observation groups in a consistent way without using the common filter approaches (Becker et al. in J Geodyn 59(60):99-110, 2012, doi:10.1016/j.jog.2011.07.0069; Becker in Konsistente Kombination von Schwerefeld, Altimetrie und hydrographischen Daten zur Modellierung der dynamischen Ozeantopographie, 2012, http://nbn-resolving.de/nbn:de:hbz:5n-29199). Within this combination method, the full spectral range of the observations is considered. Further, it allows the direct determination of the normal equations (i.e., the inverse of the error covariance matrix) of the mean dynamic topography on arbitrary grids, which is one of the requirements for ocean data assimilation. In this paper, we report progress through selection and improved processing of altimetric data sets. We focus on the preprocessing steps of along-track altimetry data from Jason-1 and Envisat to obtain a mean sea surface profile. During this procedure, a rigorous variance propagation is accomplished, so that, for the first time, the full covariance matrix of the mean sea surface is available. The combination of the mean profile and a combined GRACE/GOCE gravity field model yields a mean dynamic topography model for the North Atlantic Ocean that is characterized by a defined set of assumptions. We show that including the geodetically derived mean dynamic topography with the full error structure in a 3D stationary inverse ocean model improves modeled oceanographic features over previous estimates.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Uncertainty information for global leaf area index (LAI) products is important for global modeling studies but usually difficult to systematically obtain at a global scale. Here, we present a new method that cross-validates existing global LAI products and produces consistent uncertainty information. The method is based on a triple collocation error model (TCEM) that assumes errors among LAI products are not correlated. Global monthly absolute and relative uncertainties, in 0.05° spatial resolutions, were generated for MODIS, CYCLOPES, and GLOBCARBON LAI products, with reasonable agreement in terms of spatial patterns and biome types. CYCLOPES shows the lowest absolute and relative uncertainties, followed by GLOBCARBON and MODIS. Grasses, crops, shrubs, and savannas usually have lower uncertainties than forests in association with the relatively larger forest LAI. With their densely vegetated canopies, tropical regions exhibit the highest absolute uncertainties but the lowest relative uncertainties, the latter of which tend to increase with higher latitudes. The estimated uncertainties of CYCLOPES generally meet the quality requirements (± 0.5) proposed by the Global Climate Observing System (GCOS), whereas for MODIS and GLOBCARBON only non-forest biome types have met the requirement. Nevertheless, none of the products seems to be within a relative uncertainty requirements of 20%. Further independent validation and comparative studies are expected to provide a fair assessment of uncertainties derived from TCEM. Overall, the proposed TCEM is straightforward and could be automated for the systematic processing of real time remote sensing observations to provide theoretical uncertainty information for a wider range of land products.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A quick new method is described for the quantification of absolute nannofossil proportions in deep-sea sediments. This method (SMS) is the combination of Spiking a sample with Microbeads and Spraying it on a cover slide. It is suitable for scanning electron microscope (SEM) analyses and for light microscope (LM) analyses. Repeated preparation and counting of the same sample (30 times) revealed a standard deviation of 10.5%. The application of tracer microbeads with different diameters and densities revealed no statistically significant differences between counts. The SMS-method yielded coccolith numbers that are statistically not significantly different from values obtained from the filtration-method. However, coccolith counts obtained by the random settling method are three times higher than the values obtained by the SMS- and the filtration-method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

State-of-the-art process-based models have shown to be applicable to the simulation and prediction of coastal morphodynamics. On annual to decadal temporal scales, these models may show limitations in reproducing complex natural morphological evolution patterns, such as the movement of bars and tidal channels, e.g. the observed decadal migration of the Medem Channel in the Elbe Estuary, German Bight. Here a morphodynamic model is shown to simulate the hydrodynamics and sediment budgets of the domain to some extent, but fails to adequately reproduce the pronounced channel migration, due to the insufficient implementation of bank erosion processes. In order to allow for long-term simulations of the domain, a nudging method has been introduced to update the model-predicted bathymetries with observations. The model-predicted bathymetry is nudged towards true states in annual time steps. Sensitivity analysis of a user-defined correlation length scale, for the definition of the background error covariance matrix during the nudging procedure, suggests that the optimal error correlation length is similar to the grid cell size, here 80-90 m. Additionally, spatially heterogeneous correlation lengths produce more realistic channel depths than do spatially homogeneous correlation lengths. Consecutive application of the nudging method compensates for the (stand-alone) model prediction errors and corrects the channel migration pattern, with a Brier skill score of 0.78. The proposed nudging method in this study serves as an analytical approach to update model predictions towards a predefined 'true' state for the spatiotemporal interpolation of incomplete morphological data in long-term simulations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Three methodologies to assess As bioaccessibility were evaluated using playgroundsoil collected from 16 playgrounds in Madrid, Spain: two (Simplified Bioaccessibility Extraction Test: SBET, and hydrochloric acid-extraction: HCl) assess gastric-only bioaccessibility and the third (Physiologically Based Extraction Test: PBET) evaluates mouth–gastric–intestinal bioaccessibility. Aqua regia-extractable (pseudo total) As contents, which are routinely employed in riskassessments, were used as the reference to establish the following percentages of bioaccessibility: SBET – 63.1; HCl – 51.8; PBET – 41.6, the highest values associated with the gastric-only extractions. For Madridplaygroundsoils – characterised by a very uniform, weakly alkaline pH, and low Fe oxide and organic matter contents – the statistical analysis of the results indicates that, in contrast with other studies, the highest percentage of As in the samples was bound to carbonates and/or present as calcium arsenate. As opposed to the As bound to Fe oxides, this As is readily released in the gastric environment as the carbonate matrix is decomposed and calcium arsenate is dissolved, but some of it is subsequently sequestered in unavailable forms as the pH is raised to 5.5 to mimic intestinal conditions. The HCl extraction can be used as a simple and reliable (i.e. low residual standard error) proxy for the more expensive, time consuming, and error-prone PBET methodology. The HCl method would essentially halve the estimate of carcinogenic risk for children playing in Madridplaygroundsoils, providing a more representative value of associated risk than the pseudo-total concentrations used at present

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Determining as accurate as possible spent nuclear fuel isotopic content is gaining importance due to its safety and economic implications. Since nowadays higher burn ups are achievable through increasing initial enrichments, more efficient burn up strategies within the reactor cores and the extension of the irradiation periods, establishing and improving computation methodologies is mandatory in order to carry out reliable criticality and isotopic prediction calculations. Several codes (WIMSD5, SERPENT 1.1.7, SCALE 6.0, MONTEBURNS 2.0 and MCNP-ACAB) and methodologies are tested here and compared to consolidated benchmarks (OECD/NEA pin cell moderated with light water) with the purpose of validating them and reviewing the state of the isotopic prediction capabilities. These preliminary comparisons will suggest what can be generally expected of these codes when applied to real problems. In the present paper, SCALE 6.0 and MONTEBURNS 2.0 are used to model the same reported geometries, material compositions and burn up history of the Spanish Van de llós II reactor cycles 7-11 and to reproduce measured isotopies after irradiation and decay times. We analyze comparisons between measurements and each code results for several grades of geometrical modelization detail, using different libraries and cross-section treatment methodologies. The power and flux normalization method implemented in MONTEBURNS 2.0 is discussed and a new normalization strategy is developed to deal with the selected and similar problems, further options are included to reproduce temperature distributions of the materials within the fuel assemblies and it is introduced a new code to automate series of simulations and manage material information between them. In order to have a realistic confidence level in the prediction of spent fuel isotopic content, we have estimated uncertainties using our MCNP-ACAB system. This depletion code, which combines the neutron transport code MCNP and the inventory code ACAB, propagates the uncertainties in the nuclide inventory assessing the potential impact of uncertainties in the basic nuclear data: cross-section, decay data and fission yields

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A new and effective method for reduction of truncation errors in partial spherical near-field (SNF) measurements is proposed. The method is useful when measuring electrically large antennas, where the measurement time with the classical SNF technique is prohibitively long and an acquisition over the whole spherical surface is not practical. Therefore, to reduce the data acquisition time, partial sphere measurement is usually made, taking samples over a portion of the spherical surface in the direction of the main beam. But in this case, the radiation pattern is not known outside the measured angular sector as well as a truncation error is present in the calculated far-field pattern within this sector. The method is based on the Gerchberg-Papoulis algorithm used to extrapolate functions and it is able to extend the valid region of the calculated far-field pattern up to the whole forward hemisphere. To verify the effectiveness of the method, several examples are presented using both simulated and measured truncated near-field data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Salamanca has been considered among the most polluted cities in Mexico. The vehicular park, the industry and the emissions produced by agriculture, as well as orography and climatic characteristics have propitiated the increment in pollutant concentration of Particulate Matter less than 10 μg/m3 in diameter (PM10). In this work, a Multilayer Perceptron Neural Network has been used to make the prediction of an hour ahead of pollutant concentration. A database used to train the Neural Network corresponds to historical time series of meteorological variables (wind speed, wind direction, temperature and relative humidity) and air pollutant concentrations of PM10. Before the prediction, Fuzzy c-Means clustering algorithm have been implemented in order to find relationship among pollutant and meteorological variables. These relationship help us to get additional information that will be used for predicting. Our experiments with the proposed system show the importance of this set of meteorological variables on the prediction of PM10 pollutant concentrations and the neural network efficiency. The performance estimation is determined using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results shown that the information obtained in the clustering step allows a prediction of an hour ahead, with data from past 2 hours