955 results for "prediction accuracy"


Relevance: 60.00%

Abstract:

Graduate program in Genetics and Animal Breeding - FCAV

Relevance: 60.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 60.00%

Abstract:

Computer-based methods have supported sport science for several decades. With steady advances in technology, sports practice has in recent years also increasingly benefited from their use. Mathematical and computational models and algorithms are used to optimize performance in both team and individual sports. In this thesis, the metamodel PerPot, developed by Prof. Perl in 2000, is adapted to endurance running. The changes concern both the internal model structure and the way the model parameters are determined. So that the model can be used in sports practice, a calibration test was developed with which the specific model parameters are fitted individually to each athlete. With the adapted model it is possible to reproduce the corresponding heart-rate curves from given speed profiles. With the model tuned to the athlete, runs can then be simulated by entering speed profiles. These simulations can be used in practice to optimize training and competition. Training can be controlled optimally by determining a simulation-based individual anaerobic threshold heart rate. The statistical evaluation of the PerPot threshold shows significant agreement with the invasively determined lactate thresholds commonly used in sports practice. Competitions can be supported by determining an optimal speed profile through various simulation-based optimization methods. With the newest method, the athlete even receives up-to-date predictions during the competition, based on the speed and heart-rate data measured during the race. The race target times optimized with PerPot show high prediction accuracy compared with the target times the athletes actually achieved.

Relevance: 60.00%

Abstract:

It has long been known that trypanosomes regulate mitochondrial biogenesis during the life cycle of the parasite; however, the mitochondrial protein inventory (MitoCarta) and its regulation remain unknown. We present a novel computational method for genome-wide prediction of mitochondrial proteins using a support vector machine-based classifier with approximately 90% prediction accuracy. Using this method, we predicted the mitochondrial localization of 468 proteins with high confidence and have experimentally verified the localization of a subset of these proteins. We then applied a recently developed parallel sequencing technology to determine the expression profiles and splicing patterns of a total of 1065 predicted MitoCarta transcripts during the development of the parasite, and showed that 435 of the transcripts significantly changed expression while 630 remained unchanged in all three life stages analyzed. Furthermore, we identified 298 alternative splicing events, a small subset of which could lead to dual localization of the corresponding proteins.
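The reported ~90% figure is a plain hit rate on labelled hold-out proteins. A minimal sketch of that computation (the labels below are invented for illustration, not the study's data):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical hold-out labels: 1 = mitochondrial, 0 = non-mitochondrial.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]  # one miss out of ten
print(accuracy(y_true, y_pred))  # 0.9
```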

Relevance: 60.00%

Abstract:

Soil degradation is a major problem in the agriculturally dominated country of Tajikistan, which makes it necessary to determine and monitor the state of soils. For this purpose a soil spectral library was established, as it enables the determination of soil properties with relatively low cost and effort. A total of 1465 soil samples were collected from three 10x10 km test sites in western Tajikistan. The diffuse reflectance of the samples was measured with an ASD FieldSpec PRO FR in the spectral range from 380 to 2500 nm in the laboratory. 166 samples were finally selected based on their spectral information and analysed for total C and N, organic C, pH, CaCO₃, extractable P, exchangeable Ca, Mg and K, and the clay, silt and sand fractions. Multiple linear regression was used to set up the models. Two thirds of the chemically analysed samples were used to calibrate the models; one third was used for hold-out validation. Very good prediction accuracy was obtained for total C (R² = 0.76, RMSEP = 4.36 g kg⁻¹), total N (R² = 0.83, RMSEP = 0.30 g kg⁻¹) and organic C (R² = 0.81, RMSEP = 3.30 g kg⁻¹), and good accuracy for pH (R² = 0.61, RMSEP = 0.157) and CaCO₃ (R² = 0.72, RMSEP = 4.63 %). No models could be developed for extractable P, exchangeable Ca, Mg and K, or the clay, silt and sand fractions. It can be concluded that the spectral library approach has high potential to substitute standard laboratory methods where rapid and inexpensive analysis is required.
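The hold-out statistics above (R² and RMSEP) can be computed directly from reference and predicted values. A minimal sketch with invented numbers, not the study's data:

```python
import math

def rmsep(y_ref, y_pred):
    """Root-mean-square error of prediction on the hold-out set."""
    n = len(y_ref)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / n)

def r_squared(y_ref, y_pred):
    """Coefficient of determination of predicted vs. reference values."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1 - ss_res / ss_tot

# Hypothetical hold-out set: lab-reference vs. predicted total C (g kg^-1).
ref  = [10.0, 20.0, 30.0, 40.0, 50.0]
pred = [12.0, 18.0, 31.0, 43.0, 48.0]
print(round(rmsep(ref, pred), 3), round(r_squared(ref, pred), 3))  # 2.098 0.978
```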

Relevance: 60.00%

Abstract:

This study investigates predictors of outcome in a secondary analysis of dropout and completer data from a randomized controlled effectiveness trial comparing CBTp to a wait-list group (Lincoln et al., 2012). Eighty patients with DSM-IV psychotic disorders seeking outpatient treatment were included. Predictors were assessed at baseline. Symptom outcome was assessed at post-treatment and at one-year follow-up. The predictor × group interactions indicate that a longer duration of disorder predicted less improvement in negative symptoms in the CBTp group but not in the wait-list group, whereas jumping-to-conclusions was associated with poorer outcome only in the wait-list group. There were no CBTp-specific predictors of improvement in positive symptoms. However, in the combined sample (immediate CBTp + delayed CBTp groups), baseline variables predicted significant amounts of positive and negative symptom variance at post-therapy and one-year follow-up after controlling for pre-treatment symptoms. Lack of insight and low social functioning were the main predictors of dropout, contributing to a prediction accuracy of 87%. The findings indicate that higher baseline symptom severity, poorer functioning, neurocognitive deficits, reasoning biases and comorbidity pose no barrier to improvement during CBTp. However, in line with previous predictor research, the findings imply that patients need to receive treatment earlier.

Relevance: 60.00%

Abstract:

We present an independent calibration model for the determination of biogenic silica (BSi) in sediments, developed from analysis of synthetic sediment mixtures and application of Fourier transform infrared spectroscopy (FTIRS) and partial least squares regression (PLSR) modeling. In contrast to current FTIRS applications for quantifying BSi, this new calibration is independent of conventional wet-chemical techniques and their associated measurement uncertainties. This approach also removes the need to develop internal calibrations between the two methods for individual sediment records. For the independent calibration, we produced six series of synthetic sediment mixtures using two purified diatom extracts: one extract was mixed with quartz sand, calcite, 60/40 quartz/calcite and two different natural sediments, and a second extract was mixed with one of the natural sediments. A total of 306 samples (51 per series) yielded BSi contents ranging from 0 to 100 %. The resulting PLSR calibration model between the FTIR spectral information and the defined BSi concentration of the synthetic sediment mixtures exhibits a strong cross-validated correlation (R²cv = 0.97) and a low root-mean-square error of cross-validation (RMSECV = 4.7 %). Application of the independent calibration to natural lacustrine and marine sediments yields robust BSi reconstructions. At present, the synthetic mixtures do not include the variation in organic matter that occurs in natural samples, which may explain the somewhat lower prediction accuracy of the calibration model for organic-rich samples.
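Cross-validated statistics such as RMSECV come from refitting the calibration with each sample left out in turn and pooling the held-out errors. A minimal leave-one-out sketch, using a single-predictor least-squares line in place of the full PLSR model and invented absorbance/BSi pairs:

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def rmsecv(xs, ys):
    """Leave-one-out cross-validation: refit without each sample,
    predict the held-out sample, pool the squared errors."""
    sq_errs = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sq_errs.append((ys[i] - (a + b * xs[i])) ** 2)
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# Hypothetical single-band absorbance vs. BSi content (%).
absorbance = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
bsi        = [5.0, 20.0, 38.0, 55.0, 72.0, 88.0]
print(round(rmsecv(absorbance, bsi), 2))  # cross-validated error in % BSi
```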

Relevance: 60.00%

Abstract:

Correct predictions of future blood glucose levels in individuals with Type 1 Diabetes (T1D) can be used to provide early warning of upcoming hypo-/hyperglycemic events and thus to improve the patient's safety. To increase prediction accuracy and efficiency, various approaches have been proposed that combine multiple predictors to produce superior results compared to single predictors. Three methods for model fusion are presented and comparatively assessed. Data from 23 T1D subjects under sensor-augmented pump (SAP) therapy were used in two adaptive data-driven models (an autoregressive model with output correction, cARX, and a recurrent neural network, RNN). Data fusion techniques based on i) Dempster-Shafer Evidential Theory (DST), ii) Genetic Algorithms (GA), and iii) Genetic Programming (GP) were used to merge the complementary performances of the prediction models. The fused output is used in a warning algorithm to issue alarms of upcoming hypo-/hyperglycemic events. The fusion schemes showed improved performance, with lower root mean square errors, lower time lags, and higher correlation. In the warning algorithm, median daily false alarms (DFA) of 0.25% and 100% correct alarms (CA) were obtained for both event types. The detection times (DT) before the occurrence of events were 13.0 and 12.1 min for hypo- and hyperglycemic events, respectively. Compared to the cARX and RNN models, and to a linear fusion of the two, the proposed fusion schemes represent a significant improvement.
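As a baseline for the fusion idea, the "linear fusion of the two" mentioned above can be sketched as a grid search for the convex weight that minimises RMSE; the DST/GA/GP schemes in the paper are considerably more elaborate. The traces below are invented, not patient data:

```python
import math

def rmse(y, yhat):
    """Root-mean-square error between a target and a prediction series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def best_linear_fusion(y, pred1, pred2, steps=100):
    """Grid-search the weight w of the convex combination
    w*pred1 + (1-w)*pred2 that minimises RMSE against y."""
    best_w, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        fused = [w * a + (1 - w) * b for a, b in zip(pred1, pred2)]
        err = rmse(y, fused)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Hypothetical 5-point glucose trace (mg/dL) and two model outputs;
# 'carx' and 'rnn' merely stand in for the paper's cARX and RNN predictions.
y    = [100, 110, 125, 140, 150]
carx = [ 98, 112, 120, 138, 155]
rnn  = [104, 107, 128, 144, 148]
w, err = best_linear_fusion(y, carx, rnn)
print(err <= min(rmse(y, carx), rmse(y, rnn)))  # fusion is never worse: True
```

Because the grid includes w = 0 and w = 1, the fused error can never exceed that of either single model on the fitting data.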

Relevance: 60.00%

Abstract:

A robust and reliable risk assessment procedure for hydrologic hazards must pay particular attention to the role of transported woody material during flash floods or debris flows. At present, woody material transport phenomena are not systematically considered in the procedures for elaborating hazard maps. The consequence is a risk of losing prediction accuracy and of underestimating hazard impacts. Transported woody material frequently interferes with the sediment regulation capacity of open check dams; moreover, when obstruction phenomena occur at critical cross-sections of the stream, inundations can be triggered. The paper presents a procedure for determining the relative propensity of mountain streams to entrain and deliver recruited woody material on the basis of empirical indicators. The procedure provided the basis for the elaboration of a hazard index map for all torrent catchments of the Autonomous Province of Bolzano/Bozen. The plausibility of the results has been thoroughly checked by a backward-oriented analysis of natural hazard events documented since 1998 at the Department of Hydraulic Engineering of the aforementioned Alpine province. The procedure provides hints for considering the effects induced by woody material transport during the elaboration of hazard zone maps.

Relevance: 60.00%

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and yields false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, may not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in large data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we took two steps to achieve this goal. First, we selected 1000 SNPs with an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck (sIB) method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis (LDA) in terms of classification performance. Finally, we performed chi-square tests to examine the relationship between each SNP and disease from another point of view.

In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through LDA is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets (one SNP, two SNPs, or three-SNP subsets based on the best 100 composite two-SNP combinations) can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always improve the performance of SNP subsets. Although sequential forward floating selection can be applied to avoid the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from exploring more complex subset states.

Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used on imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that sIB, a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in this study.

From our results, the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be at least 0.4. Conversely, the highest test accuracy of sIB for diagnosing disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.

A further genome-wide association analysis through chi-square tests shows that no significant SNPs were detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. In the WTCCC data, only two significant SNPs associated with CAD were detected. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy, complete descriptions of the classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
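The HMSS criterion used throughout is the harmonic mean of sensitivity and specificity computed from a confusion matrix; unlike raw accuracy, it cannot be inflated by always predicting the majority class. A minimal sketch with invented counts, not the thesis data:

```python
def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity (HMSS).
    Penalises a classifier that ignores the minority class
    in imbalanced case/control data."""
    sens = tp / (tp + fn)   # true-positive rate among cases
    spec = tn / (tn + fp)   # true-negative rate among controls
    return 2 * sens * spec / (sens + spec)

# Hypothetical confusion counts: 40 cases (30 detected),
# 160 controls (128 correctly ruled out).
print(round(hmss(tp=30, fn=10, tn=128, fp=32), 4))  # 0.7742
```

Note that a classifier labelling everything "control" in this 40/160 split would score 0.8 accuracy but an HMSS of 0, since sensitivity is zero.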

Relevance: 60.00%

Abstract:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, models have been developed from published exposure databases and their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects, or to exposure levels assigned by an industrial hygienist, so that quantitative exposure estimates can be obtained.

In an effort to improve the prediction accuracy and generalizability of these models, and considering that the limitations encountered in previous studies might stem from limitations in the applicability of traditional statistical methods and concepts, this study proposed and explored the use of data analysis methods derived from computer science, predominantly machine learning approaches.

The goal of this study was to develop a set of models using decision tree/ensemble and neural network methods to predict occupational exposure outcomes from literature-derived databases, and to compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride, and trichloroethylene were used.

When compared to regression estimates, the results showed better accuracy for decision tree/ensemble techniques in the categorical case, while neural networks were better for estimating continuous exposure values. Overrepresentation of classes and overfitting were the main causes of poor neural network performance and accuracy. Estimates from literature-based databases obtained with machine learning techniques might provide an advantage when applied to methodologies that combine 'expert inputs' with current exposure measurements, such as the Bayesian Decision Analysis tool. The use of machine learning techniques to estimate exposures from literature-based exposure databases more accurately might represent a starting point toward independence from expert judgment.
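The simplest member of the decision tree family discussed above is a one-split stump; the study itself used full trees, ensembles and neural networks, so this is only an illustration of the splitting idea. All values below are invented:

```python
def fit_stump(x, labels):
    """One-split decision stump: pick the threshold on a single exposure
    determinant that maximises training accuracy for a binary
    exposure rating (0 = low, 1 = high)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(x)):
        pred = [1 if v >= t else 0 for v in x]
        acc = sum(p == l for p, l in zip(pred, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Hypothetical determinant values (e.g. a task-duration score) vs. rating.
x      = [0.2, 0.4, 0.5, 0.9, 1.1, 1.4, 1.8, 2.0]
rating = [0,   0,   0,   0,   1,   1,   1,   1]
print(fit_stump(x, rating))  # (1.1, 1.0): the split at 1.1 separates perfectly
```

A real tree learner recurses on each side of such a split and scores candidates with an impurity measure rather than raw accuracy.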

Relevance: 60.00%

Abstract:

Secchi depth is a measure of water transparency. In the Baltic Sea region, Secchi depth maps are used to assess eutrophication and as input for habitat models. Due to their spatial and temporal coverage, satellite data would be the most suitable data source for such maps. But the Baltic Sea's optical properties are so different from the open ocean that globally calibrated standard models suffer from large errors. Regional predictive models that take the Baltic Sea's special optical properties into account are thus needed. This paper tests how accurately generalized linear models (GLMs) and generalized additive models (GAMs) with MODIS/Aqua and auxiliary data as inputs can predict Secchi depth at a regional scale. It uses cross-validation to test the prediction accuracy of hundreds of GAMs and GLMs with up to 5 input variables. A GAM with 3 input variables (chlorophyll a, remote sensing reflectance at 678 nm, and long-term mean salinity) made the most accurate predictions. Tested against field observations not used for model selection and calibration, the best model's mean absolute error (MAE) for daily predictions was 1.07 m (22%), more than 50% lower than for other publicly available Baltic Sea Secchi depth maps. The MAE for predicting monthly averages was 0.86 m (15%). Thus, the proposed model selection process was able to find a regional model with good prediction accuracy. It could be useful to find predictive models for environmental variables other than Secchi depth, using data from other satellite sensors, and for other regions where non-standard remote sensing models are needed for prediction and mapping. Annual and monthly mean Secchi depth maps for 2003-2012 come with this paper as Supplementary materials.
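The MAE figures quoted above are straightforward to reproduce once observed and predicted Secchi depths are in hand; the percentage presumably expresses the MAE relative to the mean observed depth. A minimal sketch with invented depths, not the paper's validation data:

```python
def mae(obs, pred):
    """Mean absolute error, in the units of the observations (m here)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

# Hypothetical observed vs. predicted Secchi depths (m).
obs  = [4.0, 5.5, 6.0, 7.5, 8.0]
pred = [4.5, 5.0, 6.8, 7.0, 8.2]
err = mae(obs, pred)
rel = err / (sum(obs) / len(obs))   # error relative to the mean observation
print(round(err, 2), round(rel, 3))  # 0.5 0.081
```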

Relevance: 60.00%

Abstract:

Ubiquitous computing software needs to be autonomous, so that essential decisions such as how to configure its particular execution are self-determined. Moreover, data mining serves an important role for ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource-aware and context-aware manner, since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states, as well as parameter settings, on the data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm's execution, and we concentrate on a decision tree classifier. We also define a taxonomy of data mining quality so that the trade-off between prediction accuracy and classification specificity of each behavior model, each of which classifies by a different abstraction of quality, is scored for model selection. Behavior model constituents and class label transformations are formally defined, and experimental validation of the proposed approach is performed.

Relevance: 60.00%

Abstract:

So far, the majority of reports on on-line measurement have considered soil properties with direct spectral responses in near infrared spectroscopy (NIRS). This work reports results of on-line measurement of soil properties with indirect spectral responses, e.g. pH, cation exchange capacity (CEC), exchangeable calcium (Caex) and exchangeable magnesium (Mgex), in one field in Bedfordshire in the UK. The on-line sensor consisted of a subsoiler coupled with an AgroSpec mobile, fibre-type, visible and near infrared (vis–NIR) spectrophotometer (tec5 Technology for Spectroscopy, Germany) with a measurement range of 305–2200 nm, acquiring soil spectra in diffuse reflectance mode. General calibration models for the studied soil properties were developed with partial least squares regression (PLSR) with leave-one-out cross-validation, using spectra measured under non-mobile laboratory conditions of 160 soil samples collected from different fields on four farms in Europe, namely in the Czech Republic, Denmark, the Netherlands and the UK. A group of 25 samples independent of the calibration set was used as a validation set. Higher accuracy was obtained for laboratory scanning than for on-line scanning of the 25 independent samples. The prediction accuracy for the laboratory and on-line measurements was classified as excellent/very good for pH (RPD = 2.69 and 2.14 and r² = 0.86 and 0.78, respectively), and moderately good for CEC (RPD = 1.77 and 1.61 and r² = 0.68 and 0.62, respectively) and Mgex (RPD = 1.72 and 1.49 and r² = 0.66 and 0.67, respectively). For Caex, very good accuracy was calculated for the laboratory method (RPD = 2.19 and r² = 0.86), compared to the poor accuracy of the on-line method (RPD = 1.30 and r² = 0.61).
The ability to collect a large number of data points per field (about 12,800 points per 21 ha) and to analyse simultaneously several soil properties without direct spectral response in the NIR range, at relatively high operational speed and with appreciable accuracy, encourages the recommendation of the on-line measurement system for site-specific fertilisation.
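The RPD values quoted above are commonly computed as the ratio of the standard deviation of the reference values to the prediction error (RMSEP); a higher ratio means the model resolves more of the natural variation in the property. A minimal sketch with invented pH values, not the study's samples:

```python
import math

def rpd(y_ref, y_pred):
    """Ratio of performance to deviation: SD of the reference values
    divided by the root-mean-square error of prediction."""
    n = len(y_ref)
    mean = sum(y_ref) / n
    sd = math.sqrt(sum((y - mean) ** 2 for y in y_ref) / (n - 1))
    rmsep = math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / n)
    return sd / rmsep

# Hypothetical lab-reference vs. vis-NIR-predicted pH values.
ref  = [5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
pred = [5.8, 5.7, 6.9, 6.7, 7.8, 7.7]
print(round(rpd(ref, pred), 2))  # 2.93
```

By the banding used in the abstract, an RPD near 2.9 would fall in the "excellent/very good" range, while values below about 1.4 (like the on-line Caex result) indicate poor predictive ability.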

Relevance: 60.00%

Abstract:

In the current uncertain context affecting both the world economy and the energy sector, with the rapid increase in the prices of oil and gas and the very unstable political situation affecting some of the largest raw material producers, there is a need to develop efficient and powerful quantitative tools for modelling and forecasting fossil fuel prices, CO2 emission allowance prices and electricity prices. This will improve decision making for all the agents involved in energy issues. Although there are papers focused on modelling fossil fuel prices, CO2 prices and electricity prices, the literature contains few attempts to consider all of them together. This paper focuses on building a multivariate model for the aforementioned prices and comparing its results with those of univariate models in terms of prediction accuracy (univariate and multivariate models are compared over a large span of days, all in the first 4 months of 2011), as well as on extracting common features in the volatilities of all these relevant prices. The common features in volatility are extracted by means of a conditionally heteroskedastic dynamic factor model, which overcomes the curse-of-dimensionality problem that commonly arises when estimating multivariate GARCH models. Additionally, the common volatility factors obtained are useful for improving forecasting intervals and have a clear economic interpretation. Moreover, the results obtained and the methodology proposed can be useful as a starting point for risk management or portfolio optimization under uncertainty in the current context of energy markets.