978 results for Prediction algorithms
Abstract:
A new parameter-estimation algorithm that minimises the cross-validated prediction error of linear-in-the-parameters models is proposed, based on stacked regression and an evolutionary algorithm. Using a criterion called the mean dispersion error (MDE), it is first shown that cross-validation is crucial for prediction with linear-in-the-parameters models. Stacked regression, which can be regarded as a sophisticated form of cross-validation, is then combined with an evolutionary algorithm to produce the new parameter-estimation algorithm, which preserves the parsimony of a concise model structure determined by the forward orthogonal least-squares (OLS) algorithm. PRESS prediction errors are used for cross-validation, and the sunspot and Canadian lynx time series are used to demonstrate the new algorithms.
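As a minimal, hedged illustration of the PRESS (predicted residual sum of squares) criterion mentioned above, the sketch below computes leave-one-out prediction errors for a generic linear-in-the-parameters model via the hat-matrix shortcut; the design matrix, coefficients and noise level are placeholder assumptions, not the paper's setup.

```python
import numpy as np

def press_errors(X, y):
    """Leave-one-out (PRESS) prediction errors for a linear-in-the-parameters
    model y = X @ theta, computed from the hat-matrix leverages."""
    H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix X (X'X)^{-1} X'
    residuals = y - H @ y                    # ordinary least-squares residuals
    leverages = np.diag(H)
    return residuals / (1.0 - leverages)     # PRESS residuals e_i / (1 - h_ii)

# toy example with a random regressor matrix (placeholder data)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
press = np.sum(press_errors(X, y) ** 2)      # PRESS statistic
print(press)
```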
Abstract:
We describe a model-data fusion (MDF) inter-comparison project (REFLEX), which compared various algorithms for estimating carbon (C) model parameters consistent with both measured carbon fluxes and states and a simple C model. Participants were provided with the model and with both synthetic net ecosystem exchange (NEE) of CO2 and leaf area index (LAI) data, generated from the model with added noise, and observed NEE and LAI data from two eddy covariance sites. Participants endeavoured to estimate model parameters and states consistent with the model for all cases over the two years for which data were provided, and generate predictions for one additional year without observations. Nine participants contributed results using Metropolis algorithms, Kalman filters and a genetic algorithm. For the synthetic data case, parameter estimates compared well with the true values. The results of the analyses indicated that parameters linked directly to gross primary production (GPP) and ecosystem respiration, such as those related to foliage allocation and turnover, or temperature sensitivity of heterotrophic respiration, were best constrained and characterised. Poorly estimated parameters were those related to the allocation to and turnover of fine root/wood pools. Estimates of confidence intervals varied among algorithms, but several algorithms successfully located the true values of annual fluxes from synthetic experiments within relatively narrow 90% confidence intervals, achieving >80% success rate and mean NEE confidence intervals <110 gC m−2 year−1 for the synthetic case. Annual C flux estimates generated by participants generally agreed with gap-filling approaches using half-hourly data. The estimation of ecosystem respiration and GPP through MDF agreed well with outputs from partitioning studies using half-hourly data. Confidence limits on annual NEE increased by an average of 88% in the prediction year compared to the previous year, when data were available. Confidence intervals on annual NEE increased by 30% when observed data were used instead of synthetic data, reflecting and quantifying the addition of model error. Finally, our analyses indicated that incorporating additional constraints, using data on C pools (wood, soil and fine roots) would help to reduce uncertainties for model parameters poorly served by eddy covariance data.
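Several participants used Metropolis algorithms for parameter estimation; as a hedged sketch of that idea (not the REFLEX setup), the code below runs a random-walk Metropolis sampler against a toy "NEE-like" model with an assumed Gaussian likelihood and flat priors — the model form, step size and noise level are illustrative assumptions.

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis: propose a Gaussian step and accept it with
    probability min(1, posterior ratio)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal(size=theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.array(chain)

# illustrative "model": NEE = a * driver - b, observed with Gaussian noise
driver = np.linspace(0.0, 10.0, 100)
obs = 0.8 * driver - 2.0 + np.random.default_rng(1).normal(0.0, 0.3, 100)

def log_post(theta):
    a, b = theta
    resid = obs - (a * driver - b)
    return -0.5 * np.sum((resid / 0.3) ** 2)   # flat priors assumed

chain = metropolis(log_post, [1.0, 0.0])
print(chain[2500:].mean(axis=0))               # posterior means after burn-in
```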
Abstract:
Data assimilation algorithms are a crucial part of operational systems in numerical weather prediction, hydrology and climate science, but they are also important for dynamical reconstruction in medical applications and for quality control in manufacturing processes. Usually, a variety of diverse measurement data are employed to determine the state of the atmosphere or of a wider system including land and oceans. Modern data assimilation systems increasingly use remote sensing data, in particular radiances measured by satellites, radar data and integrated water vapor measurements from GPS/GNSS signals. The inversion of some of these measurements is ill-posed in the classical sense, i.e. the inverse of the operator H that maps the state onto the data is unbounded. In this case, the use of such data can lead to significant instabilities in data assimilation algorithms. The goal of this work is to provide a rigorous mathematical analysis of the instability of well-known data assimilation methods. Here, we restrict our attention to particular linear systems in which the instability can be analyzed explicitly. We investigate three-dimensional variational assimilation and four-dimensional variational assimilation. A theory for the instability is developed using the classical theory of ill-posed problems in a Banach space framework. Further, we demonstrate by numerical examples that instabilities can and do occur, including an example from dynamic magnetic tomography.
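For orientation, the three-dimensional variational (3D-Var) analysis studied here minimises the standard quadratic cost J(x) = (x − x_b)ᵀB⁻¹(x − x_b) + (y − Hx)ᵀR⁻¹(y − Hx); a minimal sketch of its closed-form minimiser on a tiny placeholder system follows. The instability discussed in the abstract would arise when H behaves like a discretised operator with unbounded inverse, which this toy example does not attempt to reproduce.

```python
import numpy as np

def var3d_analysis(xb, B, H, y, R):
    """Closed-form minimiser of the 3D-Var cost function:
    x_a = x_b + B H^T (H B H^T + R)^{-1} (y - H x_b)."""
    innovation = y - H @ xb
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain matrix
    return xb + K @ innovation

# tiny placeholder system (all matrices are illustrative)
xb = np.array([1.0, 0.0, -1.0])                     # background state
B = np.eye(3)                                       # background error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])                     # observation operator
y = np.array([1.2, -0.8])                           # observations
R = 0.1 * np.eye(2)                                 # observation error covariance
print(var3d_analysis(xb, B, H, y, R))
```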
Abstract:
Most operational Sea Surface Temperature (SST) products derived from satellite infrared radiometry use multi-spectral algorithms. They show, in general, reasonable performance, with root mean square (RMS) residuals around 0.5 K when validated against buoy measurements, but they have limitations, in particular a component of the retrieval error that stems from such algorithms' limited ability to cope with the full variability of atmospheric absorption and emission. We propose to use forecast atmospheric profiles and a radiative transfer model to simulate the algorithmic errors of multi-spectral algorithms. In the practical case of SST derived from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) onboard Meteosat Second Generation (MSG), we demonstrate that the simulated algorithmic errors explain a significant component of the actual errors observed for the non-linear (NL) split-window algorithm in operational use at the Centre de Météorologie Spatiale (CMS). The simulated errors, used as correction terms, significantly reduce the regional biases of the NL algorithm as well as the standard deviation of the differences with drifting buoy measurements. The availability of atmospheric profiles associated with observed satellite-buoy differences allows us to analyze the origins of the main algorithmic errors observed in the SEVIRI field of view: a negative bias in the inter-tropical zone and a mid-latitude positive bias. We show how these errors are explained by the sensitivity of observed brightness temperatures to the vertical distribution of water vapour, propagated through the SST retrieval algorithm.
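Non-linear split-window retrievals of the kind discussed here typically take the generic form SST = a0 + a1·T11 + a2·(T11 − T12)·SST_guess + a3·(T11 − T12)·(sec θ − 1). The sketch below implements that generic form; the coefficients and inputs are purely illustrative assumptions, not the operational CMS/SEVIRI values.

```python
import numpy as np

def nl_split_window_sst(t11, t12, sst_guess, sat_zenith_deg, coeffs):
    """Generic non-linear split-window SST retrieval from 11 and 12 micron
    brightness temperatures (K); coefficients are regression-derived."""
    a0, a1, a2, a3 = coeffs
    dt = t11 - t12                                      # split-window difference
    sec_term = 1.0 / np.cos(np.radians(sat_zenith_deg)) - 1.0
    return a0 + a1 * t11 + a2 * dt * sst_guess + a3 * dt * sec_term

# illustrative call (coefficients, guess field and inputs are placeholders)
print(nl_split_window_sst(t11=292.5, t12=291.0, sst_guess=20.0,
                          sat_zenith_deg=40.0,
                          coeffs=(-270.0, 1.0, 0.08, 0.8)))
```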
Abstract:
Historic geomagnetic activity observations have been used to reveal centennial variations in the open solar flux and the near-Earth heliospheric conditions (the interplanetary magnetic field and the solar wind speed). The various methods are in very good agreement for the past 135 years when there were sufficient reliable magnetic observatories in operation to eliminate problems due to site-specific errors and calibration drifts. This review underlines the physical principles that allow these reconstructions to be made, as well as the details of the various algorithms employed and the results obtained. Discussion is included of: the importance of the averaging timescale; the key differences between “range” and “interdiurnal variability” geomagnetic data; the need to distinguish source field sector structure from heliospherically-imposed field structure; the importance of ensuring that regressions used are statistically robust; and uncertainty analysis. The reconstructions are exceedingly useful as they provide calibration between the in-situ spacecraft measurements from the past five decades and the millennial records of heliospheric behaviour deduced from measured abundances of cosmogenic radionuclides found in terrestrial reservoirs. Continuity of open solar flux, using sunspot number to quantify the emergence rate, is the basis of a number of models that have been very successful in reproducing the variation derived from geomagnetic activity. These models allow us to extend the reconstructions back to before the development of the magnetometer and to cover the Maunder minimum. Allied to the radionuclide data, the models are revealing much about how the Sun and heliosphere behaved outside of grand solar maxima and are providing a means of predicting how solar activity is likely to evolve now that the recent grand maximum (that had prevailed throughout the space age) has come to an end.
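The continuity models mentioned above balance an emergence rate of open solar flux, parameterised via sunspot number, against a loss term. A minimal, hedged sketch of such a flux-balance integration is shown below; the emergence law, loss timescale and initial flux are entirely illustrative assumptions, not the published model parameters.

```python
import numpy as np

def integrate_open_flux(sunspot_number, dt_years=1.0, f0=4e14,
                        emergence_per_spot=2e12, loss_timescale=3.0):
    """Integrate dF/dt = S(R) - F/tau, with the source S taken proportional
    to the sunspot number R (illustrative parameterisation, flux in Wb)."""
    flux = [f0]
    for r in sunspot_number:
        source = emergence_per_spot * r
        flux.append(flux[-1] + dt_years * (source - flux[-1] / loss_timescale))
    return np.array(flux[1:])

# toy solar-cycle-like sunspot series (placeholder)
years = np.arange(0, 44)
r = 80.0 * (1.0 + np.sin(2 * np.pi * years / 11.0)) / 2.0
print(integrate_open_flux(r)[:5])
```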
Abstract:
In this work, genetic-algorithm concepts, together with a rotamer library for protein side chains and an implicit solvation potential, are used to optimize the tertiary structure of peptides. We start from the known PDB structure of the backbone, which is kept fixed while the side chains are allowed to adopt the conformations present in the rotamer library; the library used is backbone-independent. The structure of Mastoparan-X was predicted using force fields of growing complexity: we began with a field whose only interaction was Lennard-Jones, then added the Coulomb term, and finally accounted for solvation effects through a term proportional to the solvent-accessible area. Good results were obtained with the potential that includes the solvation term and the rotamer library. Hence, the algorithm presented here (called YODA) can be a useful tool for the structure-prediction problem.
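A hedged sketch of the genetic-algorithm idea (not the YODA implementation): individuals are vectors of rotamer indices, the fitness is a placeholder per-residue energy standing in for the Lennard-Jones + Coulomb + solvent-accessible-area terms, and selection, crossover and mutation proceed in the usual way. Population size, mutation rate and the energy table are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_RESIDUES, N_ROTAMERS, POP, GENS = 20, 8, 60, 200

# random per-residue rotamer "energies" as a stand-in for a real force field
energy_table = rng.normal(size=(N_RESIDUES, N_ROTAMERS))

def energy(rotamers):
    """Placeholder energy of a rotamer assignment (lower is better)."""
    return energy_table[np.arange(N_RESIDUES), rotamers].sum()

pop = rng.integers(0, N_ROTAMERS, size=(POP, N_RESIDUES))
for _ in range(GENS):
    fitness = np.array([energy(ind) for ind in pop])
    parents = pop[np.argsort(fitness)[:POP // 2]]   # keep the fittest half
    children = parents.copy()
    cut = rng.integers(1, N_RESIDUES, size=len(children))
    for i, c in enumerate(cut):                      # one-point crossover
        children[i, c:] = parents[(i + 1) % len(parents), c:]
    mutate = rng.random(children.shape) < 0.02       # point mutations
    children[mutate] = rng.integers(0, N_ROTAMERS, size=mutate.sum())
    pop = np.vstack([parents, children])

best = pop[np.argmin([energy(ind) for ind in pop])]
print(energy(best))
```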
Abstract:
Predictability is related to the uncertainty in the outcome of future events during the evolution of the state of a system. Cluster weighted modeling (CWM) is interpreted as a tool for detecting such uncertainty and is applied to spatially distributed systems. As such, a simple prediction algorithm in conjunction with CWM forms a powerful set of methods for relating predictability and dimension.
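A minimal sketch of cluster-weighted prediction under assumed, already-fitted cluster parameters: the prediction is a responsibility-weighted combination of local linear models, and the weighted spread of those local predictions serves as the kind of uncertainty measure the abstract relates to predictability. All parameter values below are illustrative assumptions.

```python
import numpy as np

def cwm_predict(x, weights, means, variances, slopes, intercepts):
    """Cluster-weighted prediction: mix local linear models f_m(x) with
    responsibilities p(m | x) from per-cluster Gaussians over x."""
    like = weights * np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(variances)
    resp = like / like.sum()                          # p(m | x)
    local = slopes * x + intercepts                   # local predictions f_m(x)
    mean = np.sum(resp * local)
    spread = np.sqrt(np.sum(resp * (local - mean) ** 2))  # predictive uncertainty
    return mean, spread

# two illustrative clusters (all parameters are assumptions)
print(cwm_predict(x=0.3,
                  weights=np.array([0.5, 0.5]),
                  means=np.array([0.0, 1.0]),
                  variances=np.array([0.2, 0.2]),
                  slopes=np.array([2.0, -1.0]),
                  intercepts=np.array([0.0, 1.0])))
```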
Abstract:
This article presents an application of a method for the simultaneous spectrophotometric determination of divalent copper, manganese and zinc ions to the analysis of a multivitamin/multimineral pharmaceutical. The method uses 4-(2-pyridylazo)resorcinol (PAR), multivariate calibration and variable-selection techniques, and was optimized using the successive projections algorithm (SPA) and the genetic algorithm (GA) to choose the most informative wavelengths for the analysis. With these techniques it was possible to build multiple linear regression calibration models (MLR-SPA and MLR-GA). The results were compared with principal component regression (PCR) and partial least squares (PLS) models. The root mean square error of prediction (RMSEP) shows that the models perform similarly when predicting the concentrations of the three analytes in the pharmaceutical. However, the MLR models are simpler, since they require a much smaller number of wavelengths, and are easier to interpret than those based on latent variables.
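For reference, the figure of merit used to compare these models is the root mean square error of prediction (RMSEP). A minimal sketch of its computation for an MLR model restricted to a few selected wavelengths follows; the spectra, concentrations and wavelength indices are placeholder assumptions, not the paper's data or the SPA/GA selections.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an external validation set."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# illustrative MLR on a handful of selected wavelengths (placeholder data)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(40, 200))                  # absorbance matrix
conc = spectra[:, [10, 55, 120]] @ np.array([0.5, 1.2, -0.3]) \
       + 0.01 * rng.normal(size=40)
selected = [10, 55, 120]                              # e.g. wavelengths chosen by SPA/GA
X_cal, X_val = spectra[:30, selected], spectra[30:, selected]
y_cal, y_val = conc[:30], conc[30:]
beta, *_ = np.linalg.lstsq(np.c_[np.ones(30), X_cal], y_cal, rcond=None)
y_pred = np.c_[np.ones(10), X_val] @ beta
print(rmsep(y_val, y_pred))
```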
Abstract:
This paper presents a survey of evolutionary algorithms designed for decision-tree induction. In this context, most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy that covers both works that evolve decision trees and works that design decision-tree components by means of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of the paper, we address some important issues and open questions that can be the subject of future research.
Abstract:
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA associated with Mycobacterium tuberculosis. This problem arises in rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that can be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design-related applications, especially considering that decision trees are simple to understand, interpret, and validate. Several decision-tree induction algorithms are available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision-tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree induction algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate to a flexible receptor.
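Not the automatically designed algorithm described above, but a hedged baseline illustration of decision-tree induction on docking-style data: a scikit-learn tree learner trained on placeholder conformation features, with the induced rules printed for inspection (comprehensibility being the point the abstract emphasises). The feature names and labels are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# placeholder features for docking conformations (e.g. distances, contact counts)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)   # synthetic "favourable binding" label

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["dist_residue_a", "dist_residue_b",
                                       "hbond_count", "torsion_strain"]))
```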
Abstract:
Owing to the growing interest in social networks, link prediction has received significant attention. Link prediction is mostly based on graph-based features, with some recent approaches focusing on domain semantics. We propose algorithms for link prediction that use a probabilistic ontology to enhance the analysis of the domain and to handle the unavoidable uncertainty in the task (the ontology is specified in the probabilistic description logic crALC). The scalability of the approach is investigated through a combination of semantic assumptions and graph-based features. We empirically evaluate our proposal and compare it with standard solutions in the literature.
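The graph-based features mentioned above are standard neighbourhood scores; a hedged sketch of common-neighbours, Jaccard and Adamic–Adar scoring on a toy adjacency dictionary follows. The probabilistic-ontology (crALC) component of the proposal is not reproduced here, and the graph is illustrative.

```python
import math

def link_scores(adj, u, v):
    """Classical graph-based link-prediction scores for a candidate pair (u, v)."""
    cn = adj[u] & adj[v]                                       # common neighbours
    union = adj[u] | adj[v]
    jaccard = len(cn) / len(union) if union else 0.0
    adamic_adar = sum(1.0 / math.log(len(adj[w])) for w in cn if len(adj[w]) > 1)
    return {"common": len(cn), "jaccard": jaccard, "adamic_adar": adamic_adar}

# toy co-authorship-style graph (illustrative)
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"a", "c"},
    "e": {"c"},
}
print(link_scores(adj, "b", "d"))
```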
Abstract:
This work is structured as follows: In Section 1 we discuss the clinical problem of heart failure. In particular, we present the phenomenon known as ventricular mechanical dyssynchrony: its impact on cardiac function, the therapy for its treatment and the methods for its quantification. Specifically, we describe the conductance catheter and its use for the measurement of dyssynchrony. At the end of Section 1, we propose a new set of indexes to quantify dyssynchrony, which are studied and validated thereafter. In Section 2 we describe the studies carried out in this work: we report the experimental protocols and we present and discuss the results obtained. Finally, we report the overall conclusions drawn from this work and try to envisage future work and possible clinical applications of our results. Ancillary studies carried out during this work, mainly to investigate several aspects of cardiac resynchronization therapy (CRT), are mentioned in the Appendix. -------- Ventricular mechanical dyssynchrony already plays a regulating role in normal physiology but is especially important in pathological conditions, such as hypertrophy, ischemia, infarction, or heart failure (Chapters 1-2). Several prospective randomized controlled trials have supported the clinical efficacy and safety of cardiac resynchronization therapy (CRT) in patients with moderate or severe heart failure and ventricular dyssynchrony. CRT resynchronizes ventricular contraction by simultaneous pacing of both the left and right ventricle (biventricular pacing) (Chapter 1). The conductance catheter method has been used extensively to assess global systolic and diastolic ventricular function and, more recently, the ability of this instrument to pick up multiple segmental volume signals has been used to quantify mechanical ventricular dyssynchrony. Specifically, novel indexes based on volume signals acquired with the conductance catheter were introduced to quantify dyssynchrony (Chapters 3-4). The present work aimed to describe the characteristics of the conductance-volume signals, to investigate the performance of the indexes of ventricular dyssynchrony described in the literature, and to introduce and validate improved dyssynchrony indexes. Moreover, using the conductance catheter method and the new indexes, the clinical problem of ventricular pacing site optimization was addressed and the measurement protocol to adopt for hemodynamic tests on cardiac pacing was investigated. In accordance with the aims of the work, in addition to the classical time-domain parameters, a new set of indexes was extracted, based on a coherent averaging procedure and on spectral and cross-spectral analysis (Chapter 4). Our analyses were carried out on patients with indications for electrophysiologic study or device implantation (Chapter 5). For the first time, besides patients with heart failure, indexes of mechanical dyssynchrony based on the conductance catheter were extracted and studied in a population of patients with preserved ventricular function, providing information on the normal range of such values. By performing a frequency-domain analysis and by applying an optimized coherent averaging procedure (Chapter 6.a), we were able to describe some characteristics of the conductance-volume signals (Chapter 6.b). We unmasked the presence of considerable beat-to-beat variations in dyssynchrony, which seemed more frequent in patients with ventricular dysfunction and appeared to play a role in discriminating patients.
These non-recurrent mechanical ventricular non-uniformities are probably the expression of the substantial beat-to-beat hemodynamic variations often associated with heart failure and due to cardiopulmonary interaction and conduction disturbances. We investigated how the coherent averaging procedure may affect or refine the conductance-based indexes; in addition, we proposed and tested a new set of indexes that quantify the non-periodic components of the volume signals. Using the new set of indexes, we studied the acute effects of CRT and of right ventricular pacing in patients with heart failure and patients with preserved ventricular function. In the overall population we observed a correlation between the hemodynamic changes induced by pacing and the indexes of dyssynchrony, which may have practical implications for hemodynamically guided device implantation. The optimal ventricular pacing site for patients with conventional indications for pacing remains controversial, and the majority of these patients do not meet current clinical indications for CRT pacing. Thus, we carried out an analysis to compare the impact of several ventricular pacing sites on global and regional ventricular function and dyssynchrony (Chapter 6.c). We observed that right ventricular pacing worsens cardiac function in patients with and without ventricular dysfunction unless the pacing site is optimized. CRT preserves left ventricular function in patients with normal ejection fraction and improves function in patients with poor ejection fraction, despite no clinical indication for CRT. Moreover, the analysis of the results obtained using the new indexes of regional dyssynchrony suggests that the pacing site may influence overall global ventricular function depending on its relative effects on regional function and synchrony. Another clinical problem investigated in this work is the optimal right ventricular lead location for CRT (Chapter 6.d). As in the previous analysis, using novel parameters describing local synchrony and efficiency, we tested and confirmed the hypothesis that biventricular pacing with alternative right ventricular pacing sites produces acute improvement of ventricular systolic function and improves mechanical synchrony when compared to standard right ventricular pacing. Although no specific right ventricular location was shown to be superior during CRT, the right ventricular pacing site that produced the optimal acute hemodynamic response varied between patients. Acute hemodynamic effects of cardiac pacing are conventionally evaluated after stabilization periods, whose duration varies considerably across cardiac pacing studies. With an ad hoc protocol (Chapter 6.e) and indexes of mechanical dyssynchrony derived from the conductance catheter, we demonstrated that the use of stabilization periods during the evaluation of cardiac pacing may mask early changes in systolic and diastolic intra-ventricular dyssynchrony. In fact, at the onset of ventricular pacing, the main dyssynchrony and ventricular performance changes occur within a 10 s time span, initiated by the changes in ventricular mechanical dyssynchrony induced by aberrant conduction and followed by a partial or even complete recovery. It has already been demonstrated in normal animals that ventricular mechanical dyssynchrony may act as a physiologic modulator of cardiac performance together with heart rate, contractile state, preload and afterload.
The present observation, which shows the compensatory mechanism of mechanical dyssynchrony, suggests that ventricular dyssynchrony may be regarded as an intrinsic cardiac property, with baseline dyssynchrony at an increased level in heart failure patients. To make an independent system for cardiac output estimation available, in order to confirm the results obtained with the conductance-volume method, we developed and validated a novel technique to apply the Modelflow method (a method that derives an aortic flow waveform from arterial pressure by simulation of a non-linear three-element aortic input impedance model, Wesseling et al. 1993) to the left ventricular pressure signal, instead of the arterial pressure used in the classical approach (Chapter 7). The results confirmed that in patients without valve abnormalities undergoing conductance catheter evaluations, continuous monitoring of cardiac output using the intra-ventricular pressure signal is reliable. Thus, cardiac output can be monitored quantitatively and continuously with a simple and low-cost method. During this work, additional studies were carried out to investigate several areas of uncertainty in CRT. The results of these studies are briefly presented in the Appendix: the long-term survival of patients treated with CRT in clinical practice, the effects of CRT in patients with mild symptoms of heart failure and in very old patients, limited thoracotomy as a second-choice alternative to transvenous implantation for CRT delivery, the evolution and prognostic significance of the diastolic filling pattern in CRT, the selection of candidates for CRT with echocardiographic criteria, and the prediction of response to the therapy.
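As a hedged sketch of the coherent-averaging idea applied to beat-by-beat volume signals (placeholder synthetic data, a fixed beat length assumed for simplicity): segments aligned to beat onsets are averaged, and each beat's deviation from the coherent average gives a handle on the non-periodic, beat-to-beat component discussed above. This is an illustration of the generic technique, not the thesis's index definitions.

```python
import numpy as np

def coherent_average(signal, beat_onsets, beat_len):
    """Average beats of a volume-like signal aligned to their onsets; also
    return each beat's RMS deviation from the coherent average."""
    segs = np.array([signal[s:s + beat_len] for s in beat_onsets
                     if s + beat_len <= len(signal)])
    template = segs.mean(axis=0)                     # coherent average
    deviation = np.sqrt(np.mean((segs - template) ** 2, axis=1))
    return template, deviation

# synthetic quasi-periodic signal with beat-to-beat amplitude variability
fs, beat_len, n_beats = 200, 160, 30                 # illustrative numbers
t = np.arange(beat_len) / fs
rng = np.random.default_rng(0)
beats = [np.sin(2 * np.pi * 1.25 * t) * (1 + 0.1 * rng.normal())
         for _ in range(n_beats)]
signal = np.concatenate(beats)
onsets = np.arange(0, n_beats * beat_len, beat_len)
template, dev = coherent_average(signal, onsets, beat_len)
print(dev.mean())                                    # mean beat-to-beat deviation
```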
Abstract:
The first part of my thesis presents an overview of the different approaches used in the past two decades in the attempt to forecast epileptic seizures on the basis of intracranial and scalp EEG. Past research has revealed some value of linear and nonlinear algorithms in detecting EEG features that change over the different phases of the epileptic cycle. However, their exact value for seizure prediction, in terms of sensitivity and specificity, is still debated and remains to be evaluated. In particular, the monitored EEG features may fluctuate with the vigilance state and lead to false alarms. Recently, such a dependency on vigilance states has been reported for some seizure prediction methods, suggesting a reduced reliability. An additional factor limiting the application and validation of most seizure-prediction techniques is their computational load. For the first time, the reliability of permutation entropy (PE) was verified for seizure prediction on scalp EEG data, while controlling for its dependency on different vigilance states. PE was recently introduced as an extremely fast and robust complexity measure for chaotic time series and is thus suitable for online application even in portable systems. The capability of PE to distinguish between preictal and interictal states was demonstrated using Receiver Operating Characteristic (ROC) analysis. Correlation analysis was used to assess the dependency of PE on vigilance states. Scalp EEG data from two patients with right temporal lobe epilepsy (RTLE) and from one patient with right frontal lobe epilepsy were analysed. The last patient was included only in the correlation analysis, since no datasets including seizures were available for him. The ROC analysis showed good separability of interictal and preictal phases for both RTLE patients, suggesting that PE could be sensitive to EEG modifications, not visible on visual inspection, that might occur well in advance of the EEG and clinical onset of seizures. However, the simultaneous assessment of the changes in vigilance showed that: a) all seizures occurred in association with a transition of vigilance states; b) PE was sensitive in detecting different vigilance states, independently of seizure occurrence. Given the limitations of the datasets, these results cannot rule out the capability of PE to detect preictal states. However, the good separability between pre- and interictal phases might depend exclusively on the coincidence of epileptic seizure onset with a transition from a state of low vigilance to a state of increased vigilance. The dependency of PE on vigilance state is an original finding, not previously reported in the literature, and suggests the possibility of classifying vigilance states by means of PE in an automatic and objective way. The second part of my thesis describes a novel behavioral task based on motor imagery skills, first introduced by Bruzzo et al. (2007), designed to study mental simulation of biological and non-biological movement in paranoid schizophrenics (PS). Immediately after the presentation of a real movement, participants had to imagine or re-enact the very same movement. By key release and key press, respectively, participants indicated when they started and ended the mental simulation or the re-enactment, making it feasible to measure the duration of the simulated or re-enacted movements.
The proportional error between the duration of the re-enacted/simulated movement and that of the template movement was compared across conditions, as well as between PS and healthy subjects. Results revealed a double dissociation between the mechanisms of mental simulation involved in biological and non-biological movement simulation. PS made large errors when simulating biological movements, while being more accurate than healthy subjects when simulating non-biological movements. Healthy subjects showed the opposite pattern, making errors during simulation of non-biological movements but being most accurate during simulation of biological movements. However, the good timing precision during re-enactment of the movements in all conditions and in both groups of participants suggests that perception, memory and attention, as well as motor control processes, were not affected. Based on a long history of literature reporting the existence of psychotic episodes in epileptic patients, a longitudinal study using a slightly modified behavioral paradigm was carried out with two RTLE patients, one patient with idiopathic generalized epilepsy and one patient with extratemporal lobe epilepsy. The results provide strong evidence for the possibility of predicting upcoming seizures in RTLE patients behaviorally. The last part of the thesis validates a behavioural strategy based on neurobiofeedback training to voluntarily control seizures and to reduce their frequency. Three epileptic patients were included in this study. The biofeedback was based on the monitoring of slow cortical potentials (SCPs) extracted online from scalp EEG. Patients were trained to produce positive shifts of SCPs. After a training phase, patients were monitored for 6 months in order to validate the ability of the learned strategy to reduce seizure frequency. Two of the three refractory epileptic patients recruited for this study showed improvements in self-management and a reduction of ictal episodes, even six months after the last training session.
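A minimal sketch of the permutation entropy measure used in the first part of the thesis (Bandt–Pompe ordinal-pattern entropy): the embedding dimension and lag below are illustrative, and no claim is made about the exact preprocessing applied to the scalp EEG in the work itself.

```python
import numpy as np
from math import factorial
from collections import Counter

def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Bandt-Pompe permutation entropy: Shannon entropy of the ordinal
    patterns of length `order` extracted from the series with lag `delay`."""
    x = np.asarray(x)
    n = len(x) - (order - 1) * delay
    patterns = Counter(
        tuple(np.argsort(x[i:i + order * delay:delay])) for i in range(n)
    )
    p = np.array(list(patterns.values()), dtype=float) / n
    h = -np.sum(p * np.log2(p))
    return h / np.log2(factorial(order)) if normalize else h

# illustrative comparison: white noise vs. a regular oscillation
rng = np.random.default_rng(0)
print(permutation_entropy(rng.normal(size=2000)))             # close to 1
print(permutation_entropy(np.sin(np.linspace(0, 60, 2000))))  # much lower
```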
Abstract:
In many application domains data can be naturally represented as graphs. When the application of analytical solutions to a given problem is unfeasible, machine learning techniques can be a viable way to solve it. Classical machine learning techniques are defined for data represented in vectorial form; recently, some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational-complexity and the predictive-performance point of view. Kernel methods make it possible to avoid an explicit mapping to a vectorial form by relying on kernel functions, which, informally, are functions that compute a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some source. This thesis makes three main contributions. The first is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs); we analyzed two kernels from this family, achieving state-of-the-art results, from both the computational and the classification point of view, on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible; moreover, we defined a principled way to manage memory. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information, but existing methods that consider it have prohibitively high computational complexity. We propose to apply kernel methods to this domain, obtaining state-of-the-art results.
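Not the DAG-based kernels defined in the thesis, but a hedged illustration of the kernel-method idea for graphs: a toy vertex-label histogram kernel, i.e. the inner product of label-count feature maps, which avoids any explicit pairwise vertex matching. The graphs below are described only by their vertex labels and are purely illustrative.

```python
from collections import Counter

def label_histogram_kernel(labels_g1, labels_g2):
    """Toy graph kernel: inner product of vertex-label count histograms
    (a stand-in for more expressive kernels such as DAG/subtree kernels)."""
    h1, h2 = Counter(labels_g1), Counter(labels_g2)
    return sum(h1[label] * h2[label] for label in h1.keys() & h2.keys())

# two small graphs summarised by their vertex labels (illustrative)
g1 = ["C", "C", "N", "O", "C"]
g2 = ["C", "N", "N", "O"]
print(label_histogram_kernel(g1, g2))   # 3*1 (C) + 1*2 (N) + 1*1 (O) = 6
```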
Abstract:
Prediction of the glycemic profile is an important task both for the early recognition of hypoglycemia and for the enhancement of control algorithms that optimize the insulin infusion rate. Adaptive models for glucose prediction and hypoglycemia recognition, based on statistical and artificial intelligence techniques, are presented.
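A hedged sketch of one such adaptive statistical model: an autoregressive predictor of the glucose series whose coefficients are updated by recursive least squares with a forgetting factor. The model order, forgetting factor and glucose-like data are illustrative assumptions, not the models presented in the work.

```python
import numpy as np

def rls_ar_predict(series, order=3, lam=0.99):
    """One-step-ahead AR prediction with recursive least squares
    (forgetting factor lam); returns the sequence of predictions."""
    theta = np.zeros(order)
    P = 1e3 * np.eye(order)
    preds = []
    for t in range(order, len(series)):
        phi = series[t - order:t][::-1]              # most recent samples first
        preds.append(phi @ theta)                    # predict before updating
        err = series[t] - phi @ theta
        k = P @ phi / (lam + phi @ P @ phi)          # RLS gain
        theta = theta + k * err
        P = (P - np.outer(k, phi @ P)) / lam
    return np.array(preds)

# illustrative glucose-like series (mg/dL): smooth trend plus sensor noise
rng = np.random.default_rng(0)
t = np.arange(300)
glucose = 110 + 30 * np.sin(2 * np.pi * t / 150) + rng.normal(0, 2, len(t))
pred = rls_ar_predict(glucose)
print(np.sqrt(np.mean((pred[50:] - glucose[53:]) ** 2)))   # RMSE of 1-step predictions
```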