917 results for Cross-validation


Relevance:

60.00%

Publisher:

Abstract:

Many problems in early vision are ill posed. Edge detection is a typical example. This paper applies regularization techniques to the problem of edge detection. We derive an optimal filter for edge detection with a size controlled by the regularization parameter $\lambda$ and compare it to the Gaussian filter. A formula relating the signal-to-noise ratio to the parameter $\lambda$ is derived from regularization analysis for the case of small values of $\lambda$. We also discuss the method of Generalized Cross Validation for obtaining the optimal filter scale. Finally, we use our framework to explain two perceptual phenomena: coarsely quantized images become recognizable by either blurring or adding noise.
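
As a concrete illustration of the Generalized Cross Validation idea mentioned above, the sketch below selects a Tikhonov regularization parameter by minimizing the GCV score on synthetic data. This is a generic minimal sketch, not the paper's edge-detection filter; the function name and data are invented for illustration.

```python
# Minimal sketch: choosing a Tikhonov regularization parameter by
# Generalized Cross Validation (GCV) on synthetic data. This is generic
# smoothing code, not the paper's optimal edge-detection filter.
import numpy as np

def gcv_score(A, y, lam):
    """GCV(lam) = n * ||y - H y||^2 / trace(I - H)^2, with H the hat matrix."""
    n, p = A.shape
    # Hat matrix H = A (A^T A + lam I)^-1 A^T maps y to the regularized fit.
    H = A @ np.linalg.solve(A.T @ A + lam * np.eye(p), A.T)
    residual = y - H @ y
    return n * (residual @ residual) / np.trace(np.eye(n) - H) ** 2

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
y = A @ rng.normal(size=20) + 0.5 * rng.normal(size=100)
lams = np.logspace(-4, 2, 25)
best = min(lams, key=lambda lam: gcv_score(A, y, lam))
print(f"GCV-selected lambda: {best:.4g}")
```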

Relevance:

60.00%

Publisher:

Abstract:

BACKGROUND: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to their predictions. RESULTS: We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with a computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing. CONCLUSION: A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., nearest-neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is consistent across GO-term depths. Additional in silico validation on a collection of annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction, one that exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.
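
The 'true-path' rule above says that an annotation at a term implies annotation at all its ancestors. The authors' classifier satisfies this by construction, but the sketch below shows what the naive post-processing alternative looks like: pushing each term's score up to every ancestor. The toy GO terms and scores are invented for illustration.

```python
# Hypothetical post-processing sketch (the paper's classifier needs no such
# step): enforce 'true-path' consistency by making every ancestor's score
# at least as large as its descendants'.
def enforce_true_path(parents, scores):
    """parents: term -> list of parent terms; scores: term -> raw score."""
    consistent = dict(scores)

    def push_up(term):
        for p in parents.get(term, []):
            if consistent[p] < consistent[term]:
                consistent[p] = consistent[term]
                push_up(p)  # propagate further toward the root

    for term in scores:
        push_up(term)
    return consistent

# Invented toy hierarchy: leaf -> mid -> root
parents = {"GO:leaf": ["GO:mid"], "GO:mid": ["GO:root"], "GO:root": []}
scores = {"GO:root": 0.4, "GO:mid": 0.2, "GO:leaf": 0.9}
print(enforce_true_path(parents, scores))
# root and mid are raised to 0.9, so predictions obey the hierarchy
```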

Relevance:

60.00%

Publisher:

Abstract:

Spotting patterns of interest in an input signal is a very useful task in many different fields, including medicine, bioinformatics, economics, speech recognition and computer vision. Example instances of this problem include spotting an object of interest in an image (e.g., a tumor), a pattern of interest in a time-varying signal (e.g., audio analysis), or an object of interest moving in a specific way (e.g., a human's body gesture). Traditional spotting methods, which are based on Dynamic Time Warping or hidden Markov models, use some variant of dynamic programming to register the pattern and the input while accounting for temporal variation between them. At the same time, those methods often suffer from several shortcomings: they may give meaningless solutions when input observations are unreliable or ambiguous, they require a high-complexity search across the whole input signal, and they may give incorrect solutions if some patterns appear as smaller parts within other patterns. In this thesis, we develop a framework that addresses these three problems, and evaluate the framework's performance in spotting and recognizing hand gestures in video. The first contribution is a spatiotemporal matching algorithm that extends the dynamic programming formulation to accommodate multiple candidate hand detections in every video frame. The algorithm finds the best alignment between the gesture model and the input, and simultaneously locates the best candidate hand detection in every frame. This allows a gesture to be recognized even when the hand location is highly ambiguous. The second contribution is a pruning method that uses model-specific classifiers to reject dynamic programming hypotheses with a poor match between the input and model. Pruning improves the efficiency of the spatiotemporal matching algorithm, and in some cases may improve the recognition accuracy. The pruning classifiers are learned from training data, and cross-validation is used to reduce the chance of overpruning. The third contribution is a subgesture reasoning process that models the fact that some gesture models can falsely match parts of other, longer gestures. By integrating subgesture reasoning, the spotting algorithm can avoid the premature detection of a subgesture when the longer gesture is actually being performed. Subgesture relations between pairs of gestures are automatically learned from training data. The performance of the approach is evaluated on two challenging video datasets: hand-signed digits gestured by users wearing short-sleeved shirts in front of a cluttered background, and American Sign Language (ASL) utterances gestured by native ASL signers. The experiments demonstrate that the proposed method is more accurate and efficient than competing approaches. The proposed approach can be generally applied to alignment or search problems with multiple input observations that use dynamic programming to find a solution.
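
For readers unfamiliar with the dynamic-programming formulation being extended, here is a minimal classic DTW cost computation on 1-D sequences; the thesis's first contribution would, roughly speaking, replace the single per-frame observation with a minimum over multiple candidate hand detections. The sequences below are invented.

```python
# Minimal sketch of the standard dynamic-programming alignment (classic DTW)
# that the thesis extends; the extension takes the min over multiple candidate
# detections per input frame instead of a single observation.
import numpy as np

def dtw_cost(model, query):
    """Cumulative alignment cost between two 1-D sequences."""
    m, n = len(model), len(query)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(model[i - 1] - query[j - 1])  # local match cost
            # Extend the cheapest of insertion, deletion, or match
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]

print(dtw_cost([0, 1, 2, 3], [0, 0, 1, 2, 2, 3]))  # 0.0: perfect warp
```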

Relevance:

60.00%

Publisher:

Abstract:

As more diagnostic testing options become available to physicians, it becomes more difficult to combine the various types of medical information in order to optimize the overall diagnosis. To improve diagnostic performance, here we introduce an approach to optimize a decision-fusion technique to combine heterogeneous information, such as from different modalities, feature categories, or institutions. For classifier comparison we used two performance metrics: the area under the receiver operating characteristic (ROC) curve (AUC) and the normalized partial area under the curve (pAUC). This study used four classifiers: linear discriminant analysis (LDA), an artificial neural network (ANN), and two variants of our decision-fusion technique, AUC-optimized (DF-A) and pAUC-optimized (DF-P) decision fusion. We applied each of these classifiers with 100-fold cross-validation to two heterogeneous breast cancer data sets: one of mass lesion features and a much more challenging one of microcalcification lesion features. For the calcification data set, DF-A outperformed the other classifiers in terms of AUC (p < 0.02) and achieved AUC=0.85 +/- 0.01. DF-P surpassed the other classifiers in terms of pAUC (p < 0.01) and reached pAUC=0.38 +/- 0.02. For the mass data set, DF-A outperformed both the ANN and the LDA (p < 0.04) and achieved AUC=0.94 +/- 0.01. Although for this data set there were no statistically significant differences among the classifiers' pAUC values (pAUC=0.57 +/- 0.07 to 0.67 +/- 0.05, p > 0.10), DF-P did significantly improve specificity versus the LDA at both 98% and 100% sensitivity (p < 0.04). In conclusion, decision fusion directly optimized clinically significant performance measures, such as AUC and pAUC, and sometimes outperformed two well-known machine-learning techniques when applied to two different breast cancer data sets.
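
A quick illustration of the two performance metrics with scikit-learn: `roc_auc_score` computes the AUC, and its `max_fpr` argument gives a standardized partial AUC over a low-false-positive-rate range. Note the study's pAUC is defined over a high-sensitivity region, so this is an analogous computation rather than the paper's exact definition; the labels and scores below are synthetic.

```python
# Illustrative sketch: full AUC, plus a partial AUC restricted to low
# false-positive rates via max_fpr (analogous to, not identical with,
# the high-sensitivity pAUC used in the study).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)                   # synthetic labels
y_score = y_true * 0.8 + rng.normal(0, 0.5, size=500)   # synthetic classifier output

print("AUC :", roc_auc_score(y_true, y_score))
print("pAUC:", roc_auc_score(y_true, y_score, max_fpr=0.1))
```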

Relevance:

60.00%

Publisher:

Abstract:

BACKGROUND: Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, which are naturally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. RESULTS: We have developed a CUDA-based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. CONCLUSIONS: permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
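
To make "embarrassingly parallel" concrete, here is a plain NumPy sketch of permutation resampling for a per-gene two-sample statistic; each permutation is independent of the others, which is what permGPU exploits on the GPU. This is generic illustration code, not the permGPU API, and the expression matrix is simulated.

```python
# Generic CPU sketch of the kind of permutation resampling permGPU
# accelerates: permute group labels, recompute a statistic per gene.
import numpy as np

rng = np.random.default_rng(2)
expr = rng.normal(size=(1000, 40))          # 1000 genes x 40 samples (simulated)
labels = np.array([0] * 20 + [1] * 20)

def mean_diff(expr, labels):
    return expr[:, labels == 1].mean(axis=1) - expr[:, labels == 0].mean(axis=1)

observed = mean_diff(expr, labels)
n_perm = 500
exceed = np.zeros(expr.shape[0])
for _ in range(n_perm):                     # each iteration is independent,
    perm = rng.permutation(labels)          # hence embarrassingly parallel
    exceed += np.abs(mean_diff(expr, perm)) >= np.abs(observed)
pvals = (exceed + 1) / (n_perm + 1)
print("smallest permutation p-value:", pvals.min())
```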

Relevance:

60.00%

Publisher:

Abstract:

Motivated by recent findings in the field of consumer science, this paper evaluates the causal effect of debit cards on household consumption using population-based data from the Italian Survey on Household Income and Wealth (SHIW). Within the Rubin Causal Model, we focus on the estimand of the population average treatment effect for the treated (PATT). We consider three existing estimators, based on regression, mixed matching and regression, and propensity score weighting, and propose a new doubly robust estimator. A semiparametric specification based on power series is adopted for the potential outcomes and the propensity score. Cross-validation is used to select the order of the power series. We conduct a simulation study to compare the performance of the estimators. The key assumptions, overlap and unconfoundedness, are systematically assessed and validated in the application. Our empirical results suggest statistically significant positive effects of debit cards on monthly household spending in Italy.
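
One ingredient of the procedure, selecting the order of a power-series specification by cross-validation, can be sketched with scikit-learn as below. The data are synthetic; the paper applies this step to potential-outcome and propensity-score models on the SHIW data.

```python
# Hedged sketch: choose the order of a polynomial (power-series)
# specification by 5-fold cross-validation on synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=(300, 1))
y = 1 + 2 * x[:, 0] - 0.5 * x[:, 0] ** 3 + rng.normal(0, 0.3, size=300)

def cv_mse(order):
    model = make_pipeline(PolynomialFeatures(order), LinearRegression())
    return -cross_val_score(model, x, y, cv=5,
                            scoring="neg_mean_squared_error").mean()

best_order = min(range(1, 8), key=cv_mse)
print("CV-selected polynomial order:", best_order)
```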

Relevance:

60.00%

Publisher:

Abstract:

This paper presents an approach for detecting local damage in large-scale frame structures by utilizing regularization methods for ill-posed problems. A direct relationship between the change in stiffness caused by local damage and the measured modal data for the damaged structure is developed, based on the perturbation method for structural dynamic systems. Thus, the measured incomplete modal data can be adopted directly in damage identification without requiring model reduction techniques, and common regularization methods can be effectively employed to solve the resulting equations. Damage indicators are chosen to reflect both the location and severity of local damage in individual components of frame structures, such as in brace members and at beam-column joints. The Truncated Singular Value Decomposition solution incorporating the Generalized Cross Validation method is introduced to evaluate the damage indicators for cases where realistic errors exist in the modal data measurements. Results for a 16-story building model structure show that structural damage can be correctly identified at a detailed level using only limited, noisy modal data measured for the damaged structure.
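
A minimal sketch of the Truncated Singular Value Decomposition solution with the truncation level chosen by Generalized Cross Validation, run on a synthetic ill-conditioned system rather than the paper's structural model:

```python
# Sketch: TSVD regularization with the truncation level k selected by GCV,
# on a synthetic ill-posed system (a Hilbert matrix), not structural data.
import numpy as np

rng = np.random.default_rng(4)
n = 12
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)  # Hilbert matrix
x_true = np.ones(n)
b = A @ x_true + 1e-10 * rng.normal(size=n)   # noisy right-hand side

U, s, Vt = np.linalg.svd(A)
coef = U.T @ b

def tsvd_solution(k):
    """Keep only the k largest singular values in the inverse."""
    return Vt[:k].T @ (coef[:k] / s[:k])

def gcv(k):
    residual = b - A @ tsvd_solution(k)
    return (residual @ residual) / (n - k) ** 2

k_best = min(range(1, n), key=gcv)
print("GCV-selected truncation level:", k_best)
```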

Relevance:

60.00%

Publisher:

Abstract:

Current knowledge about the spread of pathogens in aquatic environments is scarce, probably because bacteria, viruses, algae and their toxins tend to occur at low concentrations in water, making them very difficult to measure directly. The purpose of this study was the development and validation of tools to detect pathogens in freshwater systems close to an urban area. In order to evaluate anthropogenic impacts on water microbiological quality, a phylogenetic microarray was developed in the context of the EU project µAQUA to detect numerous pathogens simultaneously, and was applied to samples from two locations on the Tiber River, upstream and downstream of Rome. Human enteric viruses were also targeted. Fifty liters of water were collected and concentrated using a hollow-fiber ultrafiltration approach. The resulting concentrate was further size-fractionated through a series of filters of decreasing pore size. RNA was extracted from pooled filters and hybridized to the newly designed microarray to detect pathogenic bacteria, protozoa and toxic cyanobacteria. Diatoms, as indicators of water quality status, were also included in the microarray. The microarray results gave positive signals for bacteria, diatoms, cyanobacteria and protozoa. Cross-validation of the microarray was performed using standard microbiological methods for the bacteria. The presence of human enteric viruses transmitted via the oral-fecal route was detected using qPCR. Significant concentrations of Salmonella, Clostridium, Campylobacter and Staphylococcus, as well as hepatitis E virus (HEV), noroviruses GI (NoGI) and GII (NoGII) and human adenovirus 41 (ADV41), were found at the Mezzocammino site, whereas lower concentrations of other bacteria, and only ADV41, were recovered at the Castel Giubileo site. This study revealed that the pollution level in the Tiber River was considerably higher downstream than upstream of Rome, and that the downstream location was contaminated by emerging and re-emerging pathogens.

Relevance:

60.00%

Publisher:

Abstract:

Aim: Ecological niche modelling can provide valuable insight into species' environmental preferences and aid the identification of key habitats for populations of conservation concern. Here, we integrate biologging, satellite remote-sensing and ensemble ecological niche models (EENMs) to identify predictable foraging habitats for a globally important population of the grey-headed albatross (GHA) Thalassarche chrysostoma. Location: Bird Island, South Georgia; Southern Atlantic Ocean. Methods: GPS and geolocation-immersion loggers were used to track at-sea movements and activity patterns of GHA over two breeding seasons (n = 55; brood-guard). Immersion frequency (landings per 10-min interval) was used to define foraging events. EENM combining Generalized Additive Models (GAM), MaxEnt, Random Forest (RF) and Boosted Regression Trees (BRT) identified the biophysical conditions characterizing the locations of foraging events, using time-matched oceanographic predictors (Sea Surface Temperature, SST; chlorophyll a, chl-a; thermal front frequency, TFreq; depth). Model performance was assessed through iterative cross-validation and extrapolative performance through cross-validation among years. Results: Predictable foraging habitats identified by EENM spanned neritic (<500 m), shelf break and oceanic waters, coinciding with a set of persistent biophysical conditions characterized by particular thermal ranges (3–8 °C, 12–13 °C), elevated primary productivity (chl-a > 0.5 mg m−3) and frequent manifestation of mesoscale thermal fronts. Our results confirm previous indications that GHA exploit enhanced foraging opportunities associated with frontal systems and objectively identify the Antarctic Polar Frontal Zone (APFZ) as a region of high foraging habitat suitability. Moreover, at the spatial and temporal scales investigated here, the performance of multi-model ensembles was superior to that of single-algorithm models, and cross-validation among years indicated reasonable extrapolative performance. Main conclusions: EENM techniques are useful for integrating the predictions of several single-algorithm models, reducing potential bias and increasing confidence in predictions. Our analysis highlights the value of EENM for use with movement data in identifying at-sea habitats of wide-ranging marine predators, with clear implications for conservation and management.
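
The ensemble step can be sketched generically: fit several single-algorithm models and average their predicted habitat suitabilities. The models below are scikit-learn stand-ins (GAM and MaxEnt are not part of scikit-learn), and the data are synthetic, so this illustrates the idea rather than the authors' workflow.

```python
# Simplified sketch of the ensemble idea behind EENM: average the predicted
# suitability from several single-algorithm models (stand-in learners here).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 4))   # e.g. SST, chl-a, front frequency, depth
y = (X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.5, 400) > 1).astype(int)

models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(n_estimators=200, random_state=0),
          GradientBoostingClassifier(random_state=0)]
for m in models:
    m.fit(X, y)

X_new = rng.normal(size=(5, 4))
# Ensemble suitability = unweighted mean of per-model probabilities
suitability = np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)
print(suitability)
```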

Relevance:

60.00%

Publisher:

Abstract:

In this paper, NOx emissions modelling for real-time operation and control of a 200 MWe coal-fired power generation plant is studied. Three model types are compared. The first is a grey-box model developed from the fundamentals governing the NOx formation mechanisms together with a system identification technique. Then a linear AutoRegressive with eXogenous inputs (ARX) model and a non-linear ARX (NARX) model are built. Plant operation data are used for modelling and validation. Model cross-validation tests show that the developed grey-box model consistently produces better overall long-term prediction performance than the other two models.
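
For reference, a linear ARX model can be identified by ordinary least squares over lagged inputs and outputs; the sketch below does this on a simulated second-order system (the paper's grey-box and NARX models are not reproduced here).

```python
# Minimal sketch of linear ARX identification by least squares on
# synthetic data, not the plant data used in the paper.
import numpy as np

rng = np.random.default_rng(6)
N = 500
u = rng.normal(size=N)
y = np.zeros(N)
for t in range(2, N):   # simulate a 2nd-order system as ground truth
    y[t] = 1.2 * y[t-1] - 0.5 * y[t-2] + 0.8 * u[t-1] + 0.05 * rng.normal()

# Regression matrix of lagged outputs and inputs: y(t) = theta^T phi(t)
Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
print("estimated [a1, a2, b1]:", theta)   # should be near [1.2, -0.5, 0.8]
```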

Relevance:

60.00%

Publisher:

Abstract:

This is the first paper to introduce a nonlinearity test for principal component models. The methodology involves dividing the data space into disjoint regions that are analysed by principal component analysis based on the cross-validation principle. Several toy examples have been successfully analysed, and the nonlinearity test has subsequently been applied to data from an internal combustion engine.
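
The core idea can be sketched roughly as follows: fit PCA in one disjoint region of the data space and compare within-region and cross-region reconstruction errors; for linear data the errors should be comparable, while a large gap signals nonlinearity. This illustrates the principle, not the paper's exact test statistic.

```python
# Rough sketch: region-wise PCA reconstruction errors as a nonlinearity check.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
t = rng.uniform(-2, 2, size=1000)
X = np.column_stack([t, t ** 2]) + 0.05 * rng.normal(size=(1000, 2))  # curved data

left, right = X[t < 0], X[t >= 0]      # two disjoint regions of the data space
pca = PCA(n_components=1).fit(left)    # linear model fitted in one region

def recon_error(pca, data):
    recon = pca.inverse_transform(pca.transform(data))
    return np.mean(np.sum((data - recon) ** 2, axis=1))

print("within-region error:", recon_error(pca, left))
print("cross-region error :", recon_error(pca, right))  # much larger => nonlinearity
```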

Relevance:

60.00%

Publisher:

Abstract:

A comparative molecular field analysis (CoMFA) of alkanoic acid 3-oxo-cyclohex-1-enyl ester and 2-acylcyclohexane-1,3-dione derivatives of 4-hydroxyphenylpyruvate dioxygenase (HPPD) inhibitors has been performed to determine the factors required for the activity of these compounds. The substrate's conformation, abstracted from dynamic modeling of the enzyme-substrate complex, was used to build the initial structures of the inhibitors. Satisfactory results were obtained after an all-space searching procedure, performing a leave-one-out (LOO) cross-validation study with cross-validated q² and conventional r² values of 0.779 and 0.989, respectively. The results provide tools for predicting the affinity of related compounds, and for guiding the design and synthesis of new HPPD ligands with predetermined affinities.
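
The leave-one-out statistic reported above can be computed generically as q² = 1 − PRESS/SS, where PRESS is the sum of squared leave-one-out prediction errors. The sketch below uses PLS regression (typical for CoMFA) on synthetic descriptors, since the actual CoMFA field data are not available here.

```python
# Generic sketch of the cross-validated q^2 statistic via leave-one-out,
# using PLS regression on synthetic descriptors (not real CoMFA fields).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(8)
X = rng.normal(size=(30, 10))                       # synthetic descriptor matrix
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=30)

press = 0.0
for train, test in LeaveOneOut().split(X):
    model = PLSRegression(n_components=3).fit(X[train], y[train])
    press += ((y[test] - model.predict(X[test]).ravel()) ** 2).item()
q2 = 1 - press / np.sum((y - y.mean()) ** 2)        # q^2 = 1 - PRESS/SS
print(f"LOO q^2 = {q2:.3f}")
```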

Relevance:

60.00%

Publisher:

Abstract:

The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameters models with universal approximation capabilities has been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameters models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best generalisation performance from observational data alone. The important concepts used to achieve good model generalisation in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and model selection criteria based on cross-validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means of identifying kernel models based on the structural risk minimisation principle. Developments in convex-optimisation-based model construction algorithms, including support vector regression, are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
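
One reviewed theme, selecting a minimal linear-in-the-parameters model by cross-validation, can be sketched as greedy forward selection over a dictionary of candidate basis functions. This is generic illustration code under those assumptions, not a specific algorithm from the article.

```python
# Sketch: greedy forward selection of basis functions for a
# linear-in-the-parameters model, scored by 5-fold cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
x = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.normal(size=200)

# Candidate dictionary of basis functions (monomials, for illustration)
candidates = {f"x^{k}": x[:, 0] ** k for k in range(1, 10)}

selected, best_score = [], -np.inf
while True:
    scored = []
    for name in candidates:
        if name in selected:
            continue
        cols = np.column_stack([candidates[n] for n in selected + [name]])
        scored.append((cross_val_score(LinearRegression(), cols, y, cv=5).mean(), name))
    if not scored:
        break
    score, name = max(scored)
    if score <= best_score:         # stop when the CV score no longer improves
        break
    selected.append(name)
    best_score = score
print("selected terms:", selected, "CV R^2:", round(best_score, 3))
```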

Relevance:

60.00%

Publisher:

Abstract:

Ground-penetrating radar (GPR) is a rapid geophysical technique that we have used to assess four illegal waste burial sites in Northern Ireland. GPR allowed informed positioning of the slower but more accurate electrical resistivity imaging (ERI) surveys. In conductive waste, GPR signal loss can be used to map the areal extent of the waste, allowing ERI survey lines to be positioned. In less conductive waste, the geometry of the burial can be ascertained from GPR alone, allowing rapid assessment. In both circumstances, the conjunctive use of GPR and ERI is considered best practice for cross-validating results and enhancing data interpretation.