40 results for Cross-validation
Abstract:
Background: More accurate coronary heart disease (CHD) prediction, specifically in middle-aged men, is needed to reduce the burden of disease more effectively. We hypothesised that a multilocus genetic risk score could refine CHD prediction beyond classic risk scores and obtain more precise risk estimates using a prospective cohort design.
Methods: Using data from nine prospective European cohorts, including 26,221 men, we selected in a case-cohort setting 4,818 men who were healthy at baseline, and used Cox proportional hazards models to examine associations between CHD and risk scores based on genetic variants representing 13 genomic regions. Over follow-up (range: 5-18 years), 1,736 incident CHD events occurred. Genetic risk scores were validated in men with at least 10 years of follow-up (632 cases, 1361 non-cases). Genetic risk score 1 (GRS1) combined 11 SNPs and two haplotypes, with effect estimates from previous genome-wide association studies. GRS2 combined 11 SNPs plus 4 SNPs from the haplotypes, with coefficients estimated from these prospective cohorts using 10-fold cross-validation. Scores were added to a model adjusted for the classic risk factors comprising the Framingham risk score, and 10-year risks were derived.
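In its simplest form, a weighted genetic risk score of the kind described above is a sum over variants of the risk-allele count (0, 1 or 2) times a per-variant weight. The sketch below assumes that form. The variant names and weights are hypothetical, and it reproduces neither the haplotype coding nor the 10-fold cross-validated estimation of the GRS2 coefficients.

```python
def genetic_risk_score(genotypes, weights):
    """Weighted genetic risk score (assumed additive form).

    genotypes: dict mapping variant ID -> risk-allele count (0, 1 or 2)
    weights:   dict mapping variant ID -> per-allele weight (e.g. a log hazard ratio)
    """
    score = 0.0
    for variant, beta in weights.items():
        count = genotypes[variant]
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: risk-allele count must be 0, 1 or 2")
        score += beta * count
    return score


# Hypothetical variants and weights, for illustration only
weights = {"snp_a": 0.10, "snp_b": 0.25, "snp_c": -0.05}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(genetic_risk_score(person, weights))  # 0.45
```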
Results: Both scores improved net reclassification (NRI) over the Framingham score (7.5%, p = 0.017 for GRS1; 6.5%, p = 0.044 for GRS2), but GRS2 also improved discrimination (c-index improvement 1.11%, p = 0.048). Subgroup analysis in men aged 50-59 (436 cases, 603 non-cases) showed improved net reclassification for GRS1 (13.8%) and GRS2 (12.5%). Net reclassification improvement remained significant for both scores when family history of CHD was added to the baseline model for this subgroup, indicating improved prediction of early-onset CHD events.
Conclusions: Genetic risk scores add precision to risk estimates for CHD and improve prediction beyond classic risk factors, particularly for middle-aged men.
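Categorical NRI, as reported in the CHD study above, adds the net proportion of events moved to a higher risk category to the net proportion of non-events moved to a lower one. Here is a minimal sketch. The risk-category cut-offs (10% and 20% 10-year risk) are an assumption, since the abstract does not state which categories were used, and the risks are hypothetical.

```python
from bisect import bisect_right


def risk_category(risk, cutoffs=(0.10, 0.20)):
    """Map a 10-year risk to a category index (0 = lowest).
    The cut-offs are an assumption, not taken from the study."""
    return bisect_right(cutoffs, risk)


def categorical_nri(old_risks, new_risks, events, cutoffs=(0.10, 0.20)):
    """Net reclassification improvement of a new model over an old one:
    [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]
    """
    up_e = down_e = up_ne = down_ne = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for old, new, event in zip(old_risks, new_risks, events):
        old_cat = risk_category(old, cutoffs)
        new_cat = risk_category(new, cutoffs)
        if event:
            up_e += new_cat > old_cat
            down_e += new_cat < old_cat
        else:
            up_ne += new_cat > old_cat
            down_ne += new_cat < old_cat
    return (up_e - down_e) / n_events + (down_ne - up_ne) / n_nonevents


# Hypothetical 10-year risks from a baseline and an extended model
old = [0.08, 0.15, 0.12, 0.25, 0.15, 0.05, 0.18, 0.09]
new = [0.12, 0.22, 0.11, 0.18, 0.08, 0.04, 0.19, 0.11]
events = [1, 1, 1, 1, 0, 0, 0, 0]
print(categorical_nri(old, new, events))  # 0.25
```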
Abstract:
Model selection between competing models is a key consideration in the discovery of prognostic multigene signatures. The use of appropriate statistical performance measures, as well as verification of the biological significance of the signatures, is imperative to maximise the chance of external validation of the generated signatures. Current approaches in time-to-event studies often use only a single measure of performance in model selection, such as logrank test p-values, or dichotomise the follow-up times at some phase of the study to facilitate signature discovery. In this study we improve the prognostic signature discovery process by applying the multivariate partial Cox model combined with the concordance index, hazard ratio of predictions, independence from available clinical covariates and biological enrichment as measures of signature performance. The proposed framework was applied to discover prognostic multigene signatures from early breast cancer data. The partial Cox model, combined with the multiple performance measures, was used both to guide the selection of the optimal panel of prognostic genes and to predict risk within cross-validation, without dichotomising the follow-up times at any stage. The signatures were successfully externally validated in independent breast cancer datasets, yielding a hazard ratio of 2.55 [1.44, 4.51] for the top-ranking signature.
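Harrell's concordance index, one of the performance measures named above, is the fraction of usable patient pairs in which the patient who fails earlier was assigned the higher predicted risk. This plain-Python sketch is not the authors' implementation. Pairs with tied follow-up times are simply skipped, which is one of several conventions, and the data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    A pair is comparable when the subject with the shorter follow-up time had
    an observed event; it is concordant when that subject also has the higher
    predicted risk. Ties in predicted risk count as half-concordant; pairs with
    tied follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the subject with the shorter time
        if times[i] == times[j] or not events[i]:
            continue  # tied times, or shorter time is censored: not comparable
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up times (years), event flags (1 = event) and predicted risks
times = [2.0, 5.0, 3.0, 8.0]
events = [1, 0, 1, 1]
risks = [0.9, 0.3, 0.4, 0.6]
print(concordance_index(times, events, risks))  # 0.8
```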
Abstract:
Background: The increasing prevalence of bovine tuberculosis (bTB) in the UK and the limitations of the currently available diagnostic and control methods require the development of complementary approaches to assist in the sustainable control of the disease. One potential approach is the identification of animals that are genetically more resistant to bTB, to enable breeding of animals with enhanced resistance. This paper focuses on prediction of resistance to bTB. We explore estimation of direct genomic estimated breeding values (DGVs) for bTB resistance in UK dairy cattle, using dense SNP chip data, and test these genomic predictions for situations when disease phenotypes are not available on selection candidates. Methodology/Principal Findings: We estimated DGVs using genomic best linear unbiased prediction methodology, and assessed their predictive accuracies with a cross-validation procedure and receiver operating characteristic (ROC) curves. Furthermore, these results were compared with theoretical expectations for prediction accuracy and area under the ROC curve (AUC). The dataset comprised 1151 Holstein-Friesian cows (bTB cases or controls). All individuals (592 cases and 559 controls) were genotyped for 727,252 loci (Illumina Bead Chip). The estimated observed heritability of bTB resistance was 0.23±0.06 (0.34 on the liability scale), and five-fold cross-validation, replicated six times, provided a prediction accuracy of 0.33 (95% C.I.: 0.26, 0.40). ROC curves, and the resulting AUC, gave a probability of 0.58, averaged across six replicates, of correctly classifying cows as diseased or as healthy based on SNP chip genotype alone using these data. Conclusions/Significance: These results provide a first step in investigating the feasibility of genomic selection for bTB resistance using SNP data. Specifically, they demonstrate that genomic selection is possible, even in populations with no pedigree data and on animals lacking bTB phenotypes.
However, a larger training population will be required to improve prediction accuracies. © 2014 Tsairidou et al.
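The AUC in the bTB study above can be read directly as the probability it quotes: the chance that a randomly chosen case receives a higher predicted value than a randomly chosen control, with ties counted as one half. The sketch below computes it that way from hypothetical scores. It does not implement GBLUP or the replicated five-fold scheme, and it assumes scores are oriented so that higher means more likely to be a case.

```python
def auc_pairwise(scores, labels):
    """Area under the ROC curve computed as P(score of a random case >
    score of a random control), counting ties as one half.

    scores: predicted values, assumed oriented so that higher = more likely a case
    labels: 1 for cases, 0 for controls
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predictions for two cases and two controls
print(auc_pairwise([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```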
Abstract:
The use of handheld near infrared (NIR) instrumentation, as a tool for rapid analysis, has the potential to be used widely in the animal feed sector. A comparison was made between handheld NIR and benchtop instruments for the proximate analysis of poultry feed, using off-the-shelf calibration models and statistical analysis. Additionally, melamine-adulterated soya bean products were used to develop qualitative and quantitative calibration models from the NIR spectral data, yielding excellent calibration models and prediction statistics. For the quantitative approach, the coefficients of determination (R2) were 0.94-0.99, and the corresponding root mean square errors of calibration and prediction were 0.081-0.215% and 0.095-0.288%, respectively. In addition, cross-validation was used to further validate the models, giving root mean square errors of cross-validation of 0.101-0.212%. Furthermore, by adopting a qualitative approach with the spectral data and applying Principal Component Analysis, it was possible to discriminate between adulterated and pure samples.
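The root mean square error of cross-validation (RMSECV) is obtained by predicting each sample from a model fitted without it. As a hedged illustration, this sketch uses a one-variable least-squares line as a stand-in for the multivariate (typically PLS) calibration models used with NIR spectra, and leave-one-out as the cross-validation scheme. The abstract states neither choice, and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmsecv_loo(x, y):
    """Root mean square error of leave-one-out cross-validation: each sample
    is predicted by a model fitted to all the other samples."""
    sq_errors = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        sq_errors.append((y[i] - (a + b * x[i])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical predictor values (e.g. a pre-processed absorbance) and reference values (%)
x = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
y = [0.52, 0.98, 1.55, 1.97, 2.61, 2.95]
print(round(rmsecv_loo(x, y), 3))
```

For ordinary least squares the same quantity can be computed without refitting, since each leave-one-out residual equals the ordinary residual divided by one minus that sample's leverage; the explicit loop is kept here because it mirrors the procedure for any model.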
Abstract:
The melting of high-latitude permafrost peatlands is a major concern due to a potential positive feedback on global climate change. We examine the ecology of testate amoebae in permafrost peatlands, based on sites in Sweden (~200 km north of the Arctic Circle). Multivariate statistical analysis confirms that water-table depth and moisture content are the dominant controls on the distribution of testate amoebae, corroborating the results from studies in mid-latitude peatlands. We present a new testate amoeba-based water table transfer function and thoroughly test it for the effects of spatial autocorrelation, clustered sampling design and uneven sampling gradients. We find that the transfer function has good predictive power; the best-performing model is based on tolerance-downweighted weighted averaging with inverse deshrinking (performance statistics with leave-one-out cross-validation: R2 = 0.87, RMSEP = 5.25 cm). The new transfer function was applied to a short core from Stordalen mire, and reveals a major shift in peatland ecohydrology coincident with the onset of the Little Ice Age (c. AD 1400). We also applied the model to an independent contemporary dataset from Stordalen and find that it outperforms predictions based on other published transfer functions. The new transfer function will enable palaeohydrological reconstruction from permafrost peatlands in Northern Europe, thereby permitting greatly improved understanding of the long-term ecohydrological dynamics of these important carbon stores as well as their responses to recent climate change.
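Weighted averaging (WA) estimates each taxon's water-table optimum as the abundance-weighted mean of the sites where it occurs. It then infers a site's water table as the abundance-weighted mean of its taxa's optima. Averaging shrinks estimates toward the centre of the gradient, so a deshrinking regression is applied afterwards. This sketch assumes plain WA without the tolerance downweighting used in the best model above. It implements inverse deshrinking as a regression of observed values on the initial estimates and reports a leave-one-out RMSEP. All data are hypothetical.

```python
import math


def wa_estimate(site, optima):
    """Initial WA estimate: abundance-weighted mean of the optima of the taxa
    present at the site (taxa without an optimum are ignored)."""
    pairs = [(y, u) for y, u in zip(site, optima) if y > 0 and u is not None]
    return sum(y * u for y, u in pairs) / sum(y for y, _ in pairs)


def wa_fit(abund, env):
    """Weighted averaging with inverse deshrinking.

    abund: one list of taxon abundances per training site
    env:   observed water-table depth (cm) per training site
    Returns (optima, a, b) with final estimate = a + b * initial estimate.
    """
    optima = []
    for k in range(len(abund[0])):
        total = sum(site[k] for site in abund)
        # a taxon absent from every training site gets no optimum
        optima.append(sum(site[k] * x for site, x in zip(abund, env)) / total
                      if total > 0 else None)
    initial = [wa_estimate(site, optima) for site in abund]
    # inverse deshrinking: regress observed values on the initial estimates
    n = len(env)
    mean_i, mean_x = sum(initial) / n, sum(env) / n
    b = (sum((i - mean_i) * (x - mean_x) for i, x in zip(initial, env))
         / sum((i - mean_i) ** 2 for i in initial))
    return optima, mean_x - b * mean_i, b


def wa_predict(site, model):
    """Deshrunk WA prediction for one site."""
    optima, a, b = model
    return a + b * wa_estimate(site, optima)


def loo_rmsep(abund, env):
    """Leave-one-out RMSEP: each site is predicted from a model built without it."""
    sq_errors = []
    for i in range(len(env)):
        model = wa_fit(abund[:i] + abund[i + 1:], env[:i] + env[i + 1:])
        sq_errors.append((env[i] - wa_predict(abund[i], model)) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical counts of three taxa at five sites along a water-table gradient
abund = [[10, 0, 0], [6, 4, 0], [2, 6, 2], [0, 4, 6], [0, 0, 10]]
env = [5.0, 10.0, 15.0, 20.0, 25.0]
print(round(loo_rmsep(abund, env), 2))
```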
Abstract:
Tropical peatlands represent globally important carbon sinks with a unique biodiversity and are currently threatened by climate change and human activities. It is now imperative that proxy methods are developed to understand the ecohydrological dynamics of these systems and to test peatland development models. Testate amoebae have been used as environmental indicators in ecological and palaeoecological studies of peatlands, primarily in ombrotrophic Sphagnum-dominated peatlands in the mid- and high-latitudes. We present the first ecological analysis of testate amoebae in a tropical peatland, a nutrient-poor domed bog in western (Peruvian) Amazonia. Litter samples were collected from different hydrological microforms (hummock to pool) along a transect from the edge to the interior of the peatland. We recorded 47 taxa from 21 genera. The most common taxa are Cryptodifflugia oviformis, Euglypha rotunda type, Phryganella acropodia, Pseudodifflugia fulva type and Trinema lineare. One species found only in the southern hemisphere, Argynnia spicata, is present. Arcella spp., Centropyxis aculeata and Lesquereusia spiralis are indicators of pools containing standing water. Canonical correspondence analysis and non-metric multidimensional scaling illustrate that water table depth is a significant control on the distribution of testate amoebae, similar to the results from mid- and high-latitude peatlands. A transfer function model for water table based on weighted averaging partial least-squares (WAPLS) regression is presented and performs well under cross-validation (r2 (apparent) = 0.76, RMSE = 4.29; r2 (jackknife) = 0.68, RMSEP = 5.18). The transfer function was applied to a 1-m peat core, and sample-specific reconstruction errors were generated using bootstrapping. The reconstruction generally suggests near-surface water tables over the last 3,000 years, with a shift to drier conditions at c. cal. 1218-1273 AD.
Abstract:
The analysis of policy-based party competition will not make serious progress beyond the constraints of (a) the unitary actor assumption and (b) a static approach to analyzing party competition between elections until a method is available for deriving reliable and valid time-series estimates of the policy positions of large numbers of political actors. Retrospective estimation of these positions in past party systems will require a method for estimating policy positions from political texts.
Previous hand-coding content analysis schemes deal with policy emphasis rather than policy positions. We propose a new hand-coding scheme for policy positions, together with a new English-language computer coding scheme that is compatible with it. We apply both schemes to party manifestos from Britain and Ireland in 1992 and 1997 and cross-validate the resulting estimates with those derived from quite independent expert surveys and with previous manifesto analyses.
There is a high degree of cross-validation between coding methods, including computer coding. This implies that it is indeed possible to use computer-coded content analysis to derive reliable and valid estimates of policy positions from political texts. This will allow vast volumes of text to be coded, including texts generated by individuals and other internal party actors, allowing the empirical elaboration of dynamic rather than static models of party competition that move beyond the unitary actor assumption.
Abstract:
Artificial neural network (ANN) methods are used to predict forest characteristics. The data source is the Southeast Alaska (SEAK) Grid Inventory, a ground survey compiled by the USDA Forest Service at several thousand sites. The main objective of this article is to predict characteristics at unsurveyed locations between grid sites. A secondary objective is to evaluate the relative performance of different ANNs. Data from the grid sites are used to train six ANNs: multilayer perceptron, fuzzy ARTMAP, probabilistic, generalized regression, radial basis function, and learning vector quantization. A classification and regression tree method is used for comparison. Topographic variables are used to construct models: latitude and longitude coordinates, elevation, slope, and aspect. The models classify three forest characteristics: crown closure, species land cover, and tree size/structure. Models are constructed using n-fold cross-validation. Predictive accuracy is calculated using a method that accounts for the influence of misclassification as well as measuring correct classifications. The probabilistic and generalized regression networks are found to be the most accurate. The predictions of the ANN models are compared with a classification of the Tongass national forest in southeast Alaska based on the interpretation of satellite imagery and are found to be of similar accuracy.
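A probabilistic neural network is essentially a Parzen-window classifier. Each training point contributes a Gaussian kernel, and a new site is assigned to the class with the highest average kernel response. This sketch pairs such a classifier with n-fold cross-validation. The kernel width `sigma`, the index-based fold assignment, the class labels and the use of plain accuracy (rather than the misclassification-weighted measure described above) are all assumptions.

```python
import math
from collections import defaultdict


def pnn_classify(train_x, train_y, x, sigma=1.0):
    """Probabilistic neural network (Parzen-window) classifier: average a
    Gaussian kernel over each class's training points and return the class
    with the largest average response."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        totals[yi] += math.exp(-d2 / (2.0 * sigma ** 2))
        counts[yi] += 1
    return max(totals, key=lambda c: totals[c] / counts[c])


def n_fold_accuracy(xs, ys, n_folds=5, sigma=1.0):
    """n-fold cross-validation: each fold is held out once and classified by a
    PNN built from the remaining folds; returns the overall accuracy.
    Folds are assigned by index (i % n_folds), so data should be shuffled first."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        test = [i for i in range(len(xs)) if i % n_folds == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for i in test:
            correct += pnn_classify(tx, ty, xs[i], sigma) == ys[i]
    return correct / len(xs)


# Hypothetical sites described by two scaled topographic variables
xs = [(0.0, 0.0), (3.0, 3.0), (0.2, 0.1), (3.1, 2.9), (0.1, 0.3),
      (2.8, 3.2), (0.3, 0.2), (3.2, 3.1), (0.0, 0.2), (2.9, 2.8)]
ys = ["open", "closed"] * 5
print(n_fold_accuracy(xs, ys, n_folds=5, sigma=0.5))  # 1.0
```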
Abstract:
We present a novel method for the light-curve characterization of Pan-STARRS1 Medium Deep Survey (PS1 MDS) extragalactic sources into stochastic variables (SVs) and burst-like (BL) transients, using multi-band image-differencing time-series data. We select detections in difference images associated with galaxy hosts using a star/galaxy catalog extracted from the deep PS1 MDS stacked images, and adopt a maximum a posteriori formulation to model their difference-flux time-series in four Pan-STARRS1 photometric bands gP1, rP1, iP1, and zP1. We use three deterministic light-curve models to fit BL transients, namely a Gaussian, a Gamma distribution, and an analytic supernova (SN) model, and one stochastic light-curve model, the Ornstein-Uhlenbeck process, to fit variability that is characteristic of active galactic nuclei (AGNs). We assess the quality of fit of the models band-wise and source-wise, using their estimated leave-one-out cross-validation likelihoods and corrected Akaike information criteria. We then apply a K-means clustering algorithm on these statistics to determine the source classification in each band. The final source classification is derived as a combination of the individual filter classifications, resulting in two measures of classification quality, from the averages across the photometric filters of (1) the classifications determined from the closest K-means cluster centers, and (2) the square distances from the clustering centers in the K-means clustering spaces. For a verification set of AGNs and SNe, we show that SV and BL occupy distinct regions in the plane constituted by these measures. We use our clustering method to characterize 4361 extragalactic image difference detected sources, in the first 2.5 yr of the PS1 MDS, into 1529 BL and 2262 SV, with a purity of 95.00% for AGNs and 90.97% for SNe based on our verification sets.
We combine our light-curve classifications with their nuclear or off-nuclear host galaxy offsets, to define a robust photometric sample of 1233 AGNs and 812 SNe. With these two samples, we characterize their variability and host galaxy properties, and identify simple photometric priors that would enable their real-time identification in future wide-field synoptic surveys.
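The corrected Akaike information criterion used in the PS1 MDS study above penalises the maximised log-likelihood by the number of fitted parameters k, with a small-sample correction: AICc = -2 ln L + 2k + 2k(k + 1)/(n - k - 1). The sketch compares two candidate fits with hypothetical log-likelihoods and parameter counts. It does not fit the Gaussian, Gamma, SN or Ornstein-Uhlenbeck models themselves.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion (lower is better):
    AICc = -2 ln L + 2k + 2k(k + 1) / (n - k - 1)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


# Hypothetical maximised log-likelihoods and parameter counts for one band (20 epochs)
fits = {"gaussian": (-50.0, 4), "ornstein_uhlenbeck": (-49.0, 3)}
scores = {name: aicc(ll, k, 20) for name, (ll, k) in fits.items()}
print(scores)
print(min(scores, key=scores.get))  # ornstein_uhlenbeck
```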
Abstract:
One of the major challenges in systems biology is to understand the complex responses of a biological system to external perturbations or internal signalling depending on its biological conditions. Genome-wide transcriptomic profiling of cellular systems under various chemical perturbations can reveal certain features of the chemicals through their transcriptomic expression profiles. The insights obtained may help to establish connections between human diseases, associated genes and therapeutic drugs. The main objective of this study was to systematically analyse cellular gene expression data under various drug treatments to elucidate drug-feature-specific transcriptomic signatures. We first extracted drug-related information (drug features) from the textual descriptions of DrugBank entries using text-mining techniques. A novel statistical method employing orthogonal least squares learning was proposed to obtain drug-feature-specific signatures by integrating gene expression with DrugBank data. To obtain robust signatures from noisy input datasets, a stringent ensemble approach was applied that combines three techniques: resampling, leave-one-out cross-validation, and aggregation. The validation experiments showed that the proposed method is capable of extracting biologically meaningful drug-feature-specific gene expression signatures. Regulatory network analysis also showed that most signature genes are connected with common hub genes, and Gene Ontology analysis further showed that these common hub genes are related to general drug metabolism. Each set of genes has relatively few interactions with other sets, indicating the modular nature of each signature and its drug-feature specificity. Based on Gene Ontology analysis, we also found that each set of drug-feature (DF)-specific genes was indeed enriched in biological processes related to the drug feature.
The results of these experiments demonstrate the potential of the method for predicting certain features of new drugs from their transcriptomic profiles, providing a useful methodological framework and a valuable resource for drug development and characterization.
Abstract:
In this paper, a novel and effective lip-based biometric identification approach with the Discrete Hidden Markov Model Kernel (DHMMK) is developed. Lips are described by shape features (both geometrical and sequential) on two different grid layouts: rectangular and polar. These features are then modeled by a DHMMK and learnt by a support vector machine classifier. Our experiments use ten-fold cross-validation on three different datasets: the GPDS-ULPGC Face Dataset, the PIE Face Dataset and the RaFD Face Dataset. Results show that our approach achieves average classification accuracies of 99.8%, 97.13%, and 98.10%, respectively, on these three datasets, using only two training images per class. Our comparative studies further show that the DHMMK achieved a 53% improvement over the baseline HMM approach. The comparative ROC curves also confirm the efficacy of the proposed lip-contour-based biometrics learned by DHMMK. We also show that linear and RBF SVMs perform comparably within the DHMMK framework.
Abstract:
Quantitative structure-property relationship (QSPR) models were first established for the hydrophobic substituent constant (πX) using theoretical descriptors derived solely from electrostatic potentials (ESPs) at the substituent atoms. The descriptors introduced are found to be related to the hydrogen-bond basicity, hydrogen-bond acidity, cavity, or dipolarity/polarizability terms of the linear solvation energy relationship, which gives the models good interpretability. The predictive capabilities of the models were also verified by rigorous Monte Carlo cross-validation. Then, eight groups of meta- or para-disubstituted benzenes and one group of substituted pyridines were investigated, and QSPR models for the individual systems were obtained with the ESP-derived descriptors. Additionally, two QSPR models were established for Rekker's fragment constants (foct), a secondary-treatment quantity that reflects the average contribution of a fragment to logP. It has been demonstrated that descriptors derived from ESPs at the fragments can be used to quantitatively express the relationship between fragment structures and their hydrophobic properties, regardless of the attached parent structure or the valence state. Finally, the relations between the Hammett σ constant and ESP quantities were explored. The results imply that σ and π, which are essential in classic QSAR and represent different types of contributions to biological activity, are also complementary in terms of interaction sites.
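Monte Carlo cross-validation, as used for the QSPR models above, repeatedly draws a random training/test split, refits the model on the training part and pools the held-out prediction errors. This generic sketch uses a one-descriptor least-squares model as a stand-in for the multi-descriptor QSPR models. The split fraction, number of repetitions and data are assumptions.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b * mean_x, b


def monte_carlo_cv(x, y, n_splits=200, test_frac=0.2, seed=0):
    """Monte Carlo cross-validation: repeatedly draw a random train/test split,
    fit on the training part and pool the squared errors of all held-out
    predictions. Returns the root mean squared error of prediction."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_test = max(1, round(test_frac * len(x)))
    sq_errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_errors.extend((y[i] - (a + b * x[i])) ** 2 for i in test)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


# Hypothetical descriptor values and hydrophobic constants for 10 substituents
x = [0.1, 0.3, 0.4, 0.6, 0.8, 1.0, 1.1, 1.3, 1.5, 1.7]
y = [-0.2, 0.1, 0.3, 0.5, 0.6, 0.9, 1.0, 1.3, 1.4, 1.7]
print(round(monte_carlo_cv(x, y), 3))
```

Unlike k-fold splitting, a sample may appear in several test sets or in none, so more repetitions give a more stable estimate.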
Novel Metabolite Biomarkers of Huntington's Disease As Detected by High-Resolution Mass Spectrometry
Abstract:
Huntington's disease (HD) is a fatal autosomal-dominant neurodegenerative disorder that affects approximately 3-10 people per 100 000 in the Western world. The median age of onset is 40 years, with death typically following 15-20 years later. In this study, we biochemically profiled post-mortem frontal lobe and striatum from HD sufferers (n = 14) and compared their profiles with controls (n = 14). LC-LTQ-Orbitrap-MS detected a total of 5579 and 5880 features for frontal lobe and striatum, respectively. An ROC curve combining two spectral features from frontal lobe had an AUC value of 0.916 (0.794 to 1.000) and following statistical cross-validation had an 83% predictive accuracy for HD. Similarly, two striatum biomarkers gave an ROC AUC of 0.935 (0.806 to 1.000) and after statistical cross-validation predicted HD with 91.8% accuracy. A range of metabolite disturbances were evident including but-2-enoic acid and uric acid, which were altered in both frontal lobe and striatum. A total of seven biochemical pathways (three in frontal lobe and four in striatum) were significantly altered as a result of HD. This study highlights the utility of high-resolution metabolomics for the study of HD. Further characterization of the brain metabolome could lead to the identification of new biomarkers and novel treatment strategies for HD.
Abstract:
Urothelial cancer (UC) is highly recurrent and can progress from a non-invasive (NMIUC) to a more aggressive muscle-invasive (MIUC) subtype that invades the muscle tissue layer of the bladder. We present a proof-of-principle study showing that network-based features of gene pairs can be used to improve classifier performance and the functional analysis of urothelial cancer gene expression data. In the first step of our procedure, each individual sample of a UC gene expression dataset is augmented with gene pair expression ratios defined by a given network structure. In the second step, an elastic net feature selection procedure for network-based signatures is applied to discriminate between NMIUC and MIUC samples. We performed repeated random subsampling cross-validation in three independent datasets. The network signatures were characterized by a functional enrichment analysis and studied for the enrichment of known cancer genes. We observed that network-based gene signatures from meta-collections of protein-protein interaction (PPI) databases such as CPDB, and from the PPI databases HPRD and BioGrid, improved classification performance compared with single-gene-based signatures. The network-based signatures derived from PPI databases showed a prominent enrichment of cancer genes (e.g., TP53, TRIM27 and HNRNPA2B1). We provide a novel integrative approach to large-scale gene expression analysis for the identification and development of novel diagnostic targets in bladder cancer. Further, our method made it possible to link cancer gene associations to network-based expression signatures that are not observed in gene-based expression signatures.
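The first step of the urothelial cancer procedure above augments each sample with expression ratios for gene pairs that are connected in a network. Here is a minimal sketch. It assumes log2 ratios of linear-scale expression values with a small pseudo-count `eps`, since the abstract does not specify the ratio transform. The gene names and edge list are hypothetical, and the elastic net step is not shown.

```python
import math


def network_pair_features(sample, edges, eps=1e-9):
    """Augment one sample's expression profile with log2 expression ratios for
    gene pairs that are connected in a given network.

    sample: dict gene -> expression value (linear scale, non-negative)
    edges:  iterable of (gene_a, gene_b) network edges
    Edges involving a gene that is not measured in the sample are skipped.
    """
    features = dict(sample)  # keep the single-gene features as well
    for gene_a, gene_b in edges:
        if gene_a in sample and gene_b in sample:
            ratio = (sample[gene_a] + eps) / (sample[gene_b] + eps)
            features[f"{gene_a}/{gene_b}"] = math.log2(ratio)
    return features


# Hypothetical expression values and network edges
sample = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 4.0}
edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_X")]
print(network_pair_features(sample, edges))
```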
Abstract:
To predict the compressive strength of geopolymers prepared from alumina-silica natural products, based on the effect of Al2O3/SiO2, Na2O/Al2O3, Na2O/H2O, and Na/[Na+K], more than 50 data points were gathered from the literature. The data were used to train and test a multilayer artificial neural network (ANN): a multilayer feedforward network was designed with the chemical compositions of the alumina silicate and alkali activators as inputs and compressive strength as output. Feedforward networks with various numbers of hidden layers and neurons were tested to select the optimum network architecture. The resulting three-layer neural network model, which used a feedforward backpropagation architecture, demonstrated its ability to learn the given input/output patterns. The cross-validation data were used to show the validity and high prediction accuracy of the network. This led to the optimum chemical composition, from which the best paste can be made from activated alumina-silica natural products using alkaline hydroxide and alkaline silicate. The research results are in agreement with the mechanism of geopolymerization.
Read More: http://ascelibrary.org/doi/abs/10.1061/(ASCE)MT.1943-5533.0000829
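A three-layer feedforward network (input layer, one hidden layer, output layer) trained by backpropagation, as in the geopolymer study above, can be sketched in plain Python. The tanh hidden units, linear output, learning rate, epoch count and toy data below are assumptions, not the architecture reported in the study. In practice the four oxide ratios would be scaled before training.

```python
import math
import random


def train_mlp(data, n_hidden=5, lr=0.05, epochs=2000, seed=1):
    """Three-layer feedforward network (inputs, one tanh hidden layer, linear
    output) trained by per-sample backpropagation on squared error.
    Returns a function that maps an input vector to a prediction."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = [0.0]  # output bias, kept in a one-element list and updated in place

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2[0]

    for _ in range(epochs):
        for x, target in data:
            h, y = forward(x)
            err = y - target  # derivative of 0.5 * err**2 w.r.t. the output
            for j in range(n_hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2[0] -= lr * err

    return lambda x: forward(x)[1]


# Hypothetical scaled inputs (two ratios) and a smooth target response
data = [((a, b), 0.5 * a - 0.3 * b)
        for a in (-1.0, -0.5, 0.0, 0.5, 1.0) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]
model = train_mlp(data)
mse = sum((model(x) - t) ** 2 for x, t in data) / len(data)
print(round(mse, 5))
```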