838 results for: Accelerated failure time model. Correlated data. Imputation. Residuals analysis


Relevance:

100.00%

Publisher:

Abstract:

Retrospective clinical datasets are often characterized by a relatively small sample size and a large amount of missing data. A common way of handling the missingness is to discard patients with missing covariates from the analysis, which reduces the sample size even further. Alternatively, if the mechanism that generated the missing data allows it, incomplete records can be imputed on the basis of the observed data, preserving the sample size and enabling complete-data methods to be applied afterwards. Moreover, imputation methodologies may depend on the particular purpose of the analysis and may achieve better results by exploiting specific characteristics of the domain. We study the treatment of missing data in the context of survival tree analysis for estimating a prognostic patient stratification. Survival tree methods usually address this problem with surrogate splits, that is, splitting rules that use other variables to yield results similar to the original ones. Our methodology instead models the dependencies among the clinical variables with a Bayesian network, which is then used to impute the missing data, allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation-maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the resulting imputation improved the stratification estimated by the survival tree, especially compared with using surrogate splits.
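As a toy illustration of the imputation step (not the paper's structural-EM method), the Python sketch below fills in a missing binary covariate by sampling from its conditional distribution given an observed parent variable, which is the basic operation a learned Bayesian network performs at each node. The variable names and the 20% missingness rate are hypothetical.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "stage":  rng.integers(1, 4, size=200),                 # observed covariate
    "smoker": rng.integers(0, 2, size=200).astype(float),   # covariate to be masked
})
df.loc[rng.random(200) < 0.2, "smoker"] = np.nan            # inject 20% missingness

# Conditional probability table P(smoker = 1 | stage), estimated from complete rows.
cpt = df.dropna().groupby("stage")["smoker"].mean()

# Impute each missing entry by sampling from its conditional distribution.
missing = df["smoker"].isna()
p = df.loc[missing, "stage"].map(cpt).to_numpy()
df.loc[missing, "smoker"] = (rng.random(p.size) < p).astype(float)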

Relevance:

100.00%

Publisher:

Abstract:

In the age of e-business, many companies are faced with massive data sets that must be analysed to gain a competitive edge. These data sets are often incomplete and frequently of poor quality. Although statistical analysis can be used to pre-process them, this technique has its own limitations. In this paper we present a system, and its underlying model, that can be used to test the integrity of existing data and pre-process it into cleaner data sets to be mined. LH5 is a rule-based system capable of self-learning, and it is illustrated using a medical data set.
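Since the LH5 rule language itself is not described in the abstract, the sketch below only illustrates the general idea of rule-based integrity testing: each record is checked against a list of named predicates and flagged with the rules it violates. The rules and field names are hypothetical.

import pandas as pd

rules = [
    ("age in plausible range",  lambda r: 0 <= r["age"] <= 120),
    ("systolic >= diastolic",   lambda r: r["sys_bp"] >= r["dia_bp"]),
    ("weight recorded",         lambda r: pd.notna(r["weight"])),
]

def integrity_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the records that violate at least one rule, with the rule names."""
    violated = df.apply(lambda r: [name for name, ok in rules if not ok(r)], axis=1)
    return df.assign(violations=violated)[violated.str.len() > 0]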

Relevance:

100.00%

Publisher:

Abstract:

Assimilation of temperature observations into an ocean model near the equator often results in a dynamically unbalanced state with unrealistic overturning circulations. The way in which these circulations arise from systematic errors in the model or its forcing is discussed. A scheme is proposed, based on the theory of state augmentation, which uses the departures of the model state from the observations to update slowly evolving bias fields. Results are summarized from an experiment applying this bias correction scheme to an ocean general circulation model. They show that the method produces more balanced analyses and a better fit to the temperature observations.
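A minimal numerical sketch of the state-augmentation idea (an assumed form for illustration, not the paper's exact scheme): the bias field is carried as an extra state variable that is relaxed slowly toward the model-minus-observation departures, and observations are assimilated against the bias-corrected state.

import numpy as np

def update_bias(bias, model_T, obs_T, gamma=0.05):
    """Relax the slowly evolving bias estimate toward the current departure;
    the small gain gamma is an assumed value."""
    return (1.0 - gamma) * bias + gamma * (model_T - obs_T)

def corrected_analysis(model_T, obs_T, bias, k=0.3):
    """Assimilate after removing the estimated bias (simple scalar nudging
    stands in for the full analysis operator)."""
    debiased = model_T - bias
    return debiased + k * (obs_T - debiased)

# Toy usage on a 1-D temperature profile with a persistent warm model drift.
obs   = np.full(10, 20.0)
model = obs + 0.8
bias  = np.zeros(10)
for _ in range(50):
    bias  = update_bias(bias, model, obs)
    model = corrected_analysis(model, obs, bias) + 0.8  # forecast re-acquires the drift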

Relevance:

100.00%

Publisher:

Abstract:

A system for continuous data assimilation is presented and discussed. To simulate the dynamical development, a channel version of a balanced barotropic model is used, and geopotential (height) data are assimilated into the model's computations as they become available. In the first experiment the updating is performed every 24, 12 and 6 hours with a given network. The stations are distributed at random in 4 groups in order to simulate 4 areas with different station densities. Optimum interpolation is performed on the difference between the forecast and the valid observations. The RMS error of the analyses decreases with time, and it is smaller the more frequently the updating is performed. Updating every 6 hours yields an analysis error smaller than the RMS error of the observations. In a second experiment the updating is performed with data from a moving satellite with a side-scan capability of about 15°. If the satellite data are analysed at every time step before being introduced into the system, the analysis error drops below the RMS error of the observations after only 24 hours, and the result is on the whole better than updating from a fixed network. If the satellite data are introduced without any modification, the analysis error decreases much more slowly, and it takes about 4 days to reach a result comparable to that obtained when the data are analysed first.
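For reference, a minimal sketch of the standard optimum-interpolation update applied between forecasts; the background covariance B, observation operator H and observation-error covariance R below are illustrative toys, not the experiment's actual statistics.

import numpy as np

def oi_update(xb, y, H, B, R):
    """Optimum interpolation: analysis = xb + K (y - H xb),
    with gain K = B H^T (H B H^T + R)^(-1)."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return xb + K @ (y - H @ xb)

# Toy usage: 5 grid points of geopotential height, stations at grid points 1 and 3.
xb = np.array([5500.0, 5520.0, 5540.0, 5530.0, 5510.0])    # forecast (background)
H  = np.zeros((2, 5)); H[0, 1] = H[1, 3] = 1.0             # observation operator
B  = 100.0 * np.exp(-np.abs(np.subtract.outer(np.arange(5), np.arange(5))) / 2.0)
R  = 25.0 * np.eye(2)                                      # observation-error covariance
xa = oi_update(xb, np.array([5532.0, 5518.0]), H, B, R)    # analysis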

Relevance:

100.00%

Publisher:

Abstract:

An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data poses unique challenges for data mining algorithms: concept drift, the need to analyse the data on the fly because the streams are unbounded, and the need for scalable algorithms to cope with potentially high data throughput. Fast, real-time classification algorithms that adapt to concept drift exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary built from Micro-Clusters. It is very fast and adaptive to concept drift whilst retaining the parallel properties of the underlying KNN classifier, and it is competitive with existing data stream classifiers in terms of accuracy and speed.
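A minimal sketch of the core idea: nearest-centroid classification over micro-clusters maintained as incremental statistical summaries (running linear sum and count). The real MC-NN's error counters, cluster splitting and parallel execution are omitted, and this is not the authors' code.

import numpy as np

class MicroCluster:
    """Statistical summary of a group of examples: linear sum, count, class label."""
    def __init__(self, x, label):
        self.ls = np.asarray(x, dtype=float)
        self.n = 1
        self.label = label
    def centroid(self):
        return self.ls / self.n
    def absorb(self, x):
        self.ls += np.asarray(x, dtype=float)
        self.n += 1

def train_one(clusters, x, y):
    """Insert example (x, y) into the nearest micro-cluster of its own class,
    creating a new cluster if that class has none yet."""
    same_class = [c for c in clusters if c.label == y]
    if not same_class:
        clusters.append(MicroCluster(x, y))
        return
    nearest = min(same_class, key=lambda c: np.linalg.norm(c.centroid() - x))
    nearest.absorb(x)

def predict(clusters, x):
    """Classify x with the label of the globally nearest micro-cluster centroid."""
    return min(clusters, key=lambda c: np.linalg.norm(c.centroid() - x)).label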

Relevance:

100.00%

Publisher:

Abstract:

Quantifying the impact of tillage practices on soil carbon losses depends on the ability to describe the temporal variability of soil CO2 emission after tillage. It has been suggested that the large amounts of CO2 emitted after soil tillage may serve as an indicator of long-term changes in soil carbon stocks. In this work we present a two-part model, based on soil temperature and soil moisture and including an exponential term that decays with time, which is efficient in fitting the intermediate emissions following two tillage treatments: disk plow followed by a pass with a leveling harrow (conventional), and trailed chisel plow followed by a pass with a clod-breaking roller (reduced). Post-tillage emissions are described using nonlinear estimation, with a coefficient of determination (R²) as high as 0.98 after reduced tillage. The results indicate that predictions of soil CO2 emission after tillage should include an exponential term decaying with time after tillage.
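A minimal sketch of fitting an emission model with an exponential time-decay term by nonlinear least squares. The functional form below (linear in soil temperature and moisture, multiplied by exp(-kt)) and all the numbers are illustrative assumptions; the paper's exact two-part model is not reproduced here.

import numpy as np
from scipy.optimize import curve_fit

def flux(X, a, b, c, k):
    """CO2 flux as an assumed function of days after tillage (t), soil
    temperature (Ts) and soil moisture (M), with exponential decay in time."""
    t, Ts, M = X
    return (a + b * Ts + c * M) * np.exp(-k * t)

# Hypothetical measurements.
t   = np.array([1, 2, 3, 5, 7, 10, 14], dtype=float)       # days after tillage
Ts  = np.array([24, 25, 23, 22, 24, 23, 22], dtype=float)  # soil temperature, deg C
M   = np.array([18, 17, 17, 16, 15, 15, 14], dtype=float)  # soil moisture, %
co2 = np.array([9.0, 7.1, 6.0, 4.4, 3.5, 2.6, 1.9])        # flux, arbitrary units

popt, _ = curve_fit(flux, (t, Ts, M), co2, p0=[1.0, 0.1, 0.1, 0.2])
resid = co2 - flux((t, Ts, M), *popt)
r2 = 1.0 - np.sum(resid**2) / np.sum((co2 - co2.mean())**2)  # coefficient of determination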

Relevance:

100.00%

Publisher:

Abstract:

In this work we study accelerated failure-time generalized gamma regression models within a unified approach. The models simultaneously estimate the effects of covariates on the acceleration or deceleration of the time to a given event and on the surviving fraction. The method is implemented in the free statistical software R. Finally, the model is applied to a real dataset on the time until disease recurrence in patients diagnosed with breast cancer.
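A minimal sketch of the likelihood machinery for a generalized gamma AFT model with right censoring, written in Python for illustration (the paper's unified model, including the surviving fraction, is implemented in R and is not reproduced here). The data are simulated and all names are hypothetical.

import numpy as np
from scipy import optimize, stats

def negloglik(params, t, event, X):
    """Censored negative log-likelihood: failures contribute log f(t),
    censored observations contribute log S(t)."""
    loga, logc, *beta = params
    a, c = np.exp(loga), np.exp(logc)          # keep shape parameters positive
    scale = np.exp(X @ np.asarray(beta))       # covariates accelerate/decelerate time
    dist = stats.gengamma(a, c, scale=scale)
    return -np.sum(np.where(event == 1, dist.logpdf(t), dist.logsf(t)))

# Simulated data: intercept plus one binary covariate, roughly 20% censoring.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.integers(0, 2, size=200)])
t = stats.gengamma(2.0, 1.0, scale=np.exp(1.0 + 0.5 * X[:, 1])).rvs(random_state=rng)
event = (rng.random(200) > 0.2).astype(int)    # 1 = failure observed, 0 = censored

res = optimize.minimize(negloglik, x0=[0.5, 0.0, 1.0, 0.0],
                        args=(t, event, X), method="Nelder-Mead")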

Relevance:

100.00%

Publisher:

Abstract:

The behavior of uniformly accelerated detectors in the Minkowski and Rindler vacua is analyzed when the detector is coupled to a scalar field during a finite amount of time T. We point out that the logarithmic ultraviolet divergences reported in the literature are due to the instantaneous switching of the detector. We explicitly show this by considering a detector switched on and off continuously. The usual Planckian spectrum for the excitation probability is recovered in the limit T → ∞.
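For context, in the standard Unruh-DeWitt formalism the finite-time excitation probability can be written with an explicit switching function \chi (this standard textbook form is given for reference; the paper's notation may differ):

P(E) \propto \int d\tau \int d\tau' \, \chi(\tau)\, \chi(\tau')\, e^{-iE(\tau - \tau')}\, W\big(x(\tau), x(\tau')\big),

where W is the Wightman two-point function of the scalar field along the detector trajectory. A sharp window \chi(\tau) = \Theta(\tau)\,\Theta(T - \tau) corresponds to instantaneous switching and produces the logarithmic ultraviolet divergence, while a smooth \chi of effective duration T yields a finite probability; as T \to \infty, the excitation rate for uniform acceleration a recovers the Planckian factor (e^{2\pi E/a} - 1)^{-1}.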

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we propose nonlinear elliptical models for correlated data with heteroscedastic and/or autoregressive structures. Our aim is to extend the models proposed by Russo et al. [22] by considering a more sophisticated scale structure to deal with variations in data dispersion and/or a possible autocorrelation among measurements taken throughout the same experimental unit. Moreover, to avoid the possible influence of outlying observations or to take into account the non-normal symmetric tails of the data, we assume elliptical contours for the joint distribution of random effects and errors, which allows us to attribute different weights to the observations. We propose an iterative algorithm to obtain the maximum-likelihood estimates for the parameters and derive the local influence curvatures for some specific perturbation schemes. The motivation for this work comes from a pharmacokinetic indomethacin data set, which was analysed previously by Bocheng and Xuping [1] under normality.
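A minimal sketch of the reweighting idea that elliptical error laws induce (illustrative, not the authors' algorithm, and ignoring the random effects and autoregressive scale structure): under Student-t errors with nu degrees of freedom, the EM weights shrink for observations with large standardized residuals, which is what downweights outliers.

import numpy as np

def irls_t(X, y, nu=4.0, n_iter=25):
    """Iteratively reweighted least squares for a linear model with
    Student-t errors; returns the robust coefficient estimates."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # ordinary LS start
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r**2 / sigma2)        # outliers get small weights
        sigma2 = np.mean(w * r**2)                   # EM-style scale update
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)   # weighted normal equations
    return beta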

Relevance:

100.00%

Publisher:

Abstract:

CONTEXT: It is uncertain whether intensified heart failure therapy guided by N-terminal brain natriuretic peptide (BNP) is superior to symptom-guided therapy. OBJECTIVE: To compare 18-month outcomes of N-terminal BNP-guided vs symptom-guided heart failure therapy. DESIGN, SETTING, AND PATIENTS: Randomized controlled multicenter Trial of Intensified vs Standard Medical Therapy in Elderly Patients With Congestive Heart Failure (TIME-CHF) of 499 patients aged 60 years or older with systolic heart failure (ejection fraction ≤ 45%), New York Heart Association (NYHA) class II or greater, prior hospitalization for heart failure within 1 year, and an N-terminal BNP level of 2 or more times the upper limit of normal. Follow-up was 18 months, and the study was conducted at 15 outpatient centers in Switzerland and Germany between January 2003 and June 2008. INTERVENTION: Uptitration of guideline-based treatments to reduce symptoms to NYHA class II or less (symptom-guided therapy), or to reduce both the BNP level to 2 times or less the upper limit of normal and symptoms to NYHA class II or less (BNP-guided therapy). MAIN OUTCOME MEASURES: Primary outcomes were 18-month survival free of all-cause hospitalizations and quality of life as assessed by structured validated questionnaires. RESULTS: N-terminal BNP-guided and symptom-guided therapy resulted in similar rates of survival free of all-cause hospitalizations (41% vs 40%, respectively; hazard ratio [HR], 0.91 [95% CI, 0.72-1.14]; P = .39). Patients' quality-of-life metrics improved over the 18 months of follow-up, but these improvements were similar under the N-terminal BNP-guided and symptom-guided strategies. Compared with the symptom-guided group, survival free of hospitalization for heart failure, a secondary end point, was higher in the N-terminal BNP-guided group (72% vs 62%; HR, 0.68 [95% CI, 0.50-0.92]; P = .01). N-terminal BNP-guided therapy improved outcomes in patients aged 60 to 75 years but not in those aged 75 years or older (P < .02 for interaction). CONCLUSION: Heart failure therapy guided by N-terminal BNP did not improve overall clinical outcomes or quality of life compared with symptom-guided treatment. TRIAL REGISTRATION: isrctn.org Identifier: ISRCTN43596477.