978 results for Prediction algorithms
Abstract:
Due to advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increase every day and call for novel, efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves, among other purposes, extended climatological analyses, the assimilation of data into numerical weather prediction models, the preparation of inputs to hydrological models, and real-time monitoring and short-term forecasting of weather. In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for the mapping of meteorological variables in complex orography. With the advent of high-resolution digital elevation models, the field of spatial prediction has met new horizons. By exploiting image processing tools along with physical heuristics, a large number of terrain features accounting for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold-air pools are known to be correlated with particular terrain forms, e.g. 
convex/concave surfaces and upwind sides of mountain slopes. Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is, the wind speed as measured at weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low-dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space, making the use of classical geostatistics cumbersome. The challenges explored during the thesis are manifold. First, the complexity of the models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple-kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions. The resulting maps of average wind speed find applications in renewable resource assessment and open a route to decreasing the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.
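The regression setting can be illustrated with a much simpler nonparametric kernel method than SVM: a Nadaraya-Watson smoother with a Gaussian (RBF) kernel over terrain features, sketched below. All station coordinates, features and wind speeds are invented for illustration; only the kernel form is shared with the SVM family used in the thesis.

```python
import math

def rbf_weight(x, xi, sigma):
    """Gaussian (RBF) kernel weight between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def kernel_predict(x, X, y, sigma=1.0):
    """Nadaraya-Watson kernel regression: a weighted average of observed
    values, with weights decaying with distance in feature space."""
    w = [rbf_weight(x, xi, sigma) for xi in X]
    total = sum(w)
    if total == 0.0:
        return sum(y) / len(y)  # fall back to the global mean
    return sum(wi * yi for wi, yi in zip(w, y)) / total

# Hypothetical stations: features = (easting_km, northing_km, slope_deg, curvature)
X = [(0.0, 0.0, 5.0, -0.2), (1.0, 0.5, 20.0, 0.4), (2.0, 1.0, 35.0, 0.1)]
y = [3.2, 5.8, 7.1]  # mean wind speed in m/s (made-up values)

print(kernel_predict((1.0, 0.5, 20.0, 0.4), X, y, sigma=0.3))
```

With a small bandwidth the prediction at a training location reproduces the observed value there; larger bandwidths smooth across neighbouring stations.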
Abstract:
The objectives of this study were to develop a computerized method to screen for potentially avoidable hospital readmissions using routinely collected data, and a prediction model to adjust rates for case mix. We studied hospital information system data for a random sample of 3,474 inpatients discharged alive in 1997 from a university hospital, and the medical records of those (1,115) readmitted within 1 year. The gold standard was set on the basis of the hospital data and medical records: all readmissions were classified as foreseen readmissions, unforeseen readmissions for a new affection, or unforeseen readmissions for a previously known affection. The latter category was submitted to a systematic medical record review to identify the main cause of readmission. Potentially avoidable readmissions were defined as the subgroup of unforeseen readmissions for a previously known affection occurring within an appropriate interval, set to maximize the chance of detecting avoidable readmissions. The computerized screening algorithm was strictly based on routine statistics: diagnosis and procedure coding and admission mode. The prediction was based on a Poisson regression model. There were 454 (13.1%) unforeseen readmissions for a previously known affection within 1 year. Fifty-nine readmissions (1.7%) were judged avoidable, most of them occurring within 1 month, which was the interval used to define potentially avoidable readmissions (n = 174, 5.0%). The intra-sample sensitivity and specificity of the screening algorithm both reached approximately 96%. A higher risk of potentially avoidable readmission was associated with previous hospitalizations, a high comorbidity index, and a long length of stay; a lower risk was associated with surgery and delivery. The model offers satisfactory predictive performance and good medical plausibility. The proposed measure could be used as an indicator of inpatient care outcome. 
However, the instrument should be validated using other sets of data from various hospitals.
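As a sketch of how the reported intra-sample sensitivity and specificity (both about 96%) would be computed, the following compares a screening algorithm's flags against the gold-standard classification; the toy cohort below is invented.

```python
def screen_performance(flagged, gold):
    """Sensitivity and specificity of a screening algorithm against a
    gold standard; both inputs are parallel lists of booleans."""
    tp = sum(1 for f, g in zip(flagged, gold) if f and g)
    fn = sum(1 for f, g in zip(flagged, gold) if not f and g)
    tn = sum(1 for f, g in zip(flagged, gold) if not f and not g)
    fp = sum(1 for f, g in zip(flagged, gold) if f and not g)
    return tp / (tp + fn), tn / (tn + fp)

# Toy cohort: 10 patients, 4 true potentially avoidable readmissions.
gold    = [True, True, True, True, False, False, False, False, False, False]
flagged = [True, True, True, False, False, False, False, False, True, False]
sens, spec = screen_performance(flagged, gold)
print(sens, spec)
```

On this toy data the screen catches 3 of 4 true cases (sensitivity 0.75) and wrongly flags 1 of 6 non-cases (specificity 5/6).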
Abstract:
Background: Many complex systems can be represented and analysed as networks. The recent availability of large-scale datasets has made it possible to elucidate some of the organisational principles and rules that govern their function, robustness and evolution. However, one of the main limitations in using protein-protein interactions for function prediction is the availability of interaction data, especially for Mollicutes. If we could harness predicted interactions, such as those from a Protein-Protein Association Network (PPAN), by combining several protein-protein network function-inference methods with semantic similarity calculations, the use of protein-protein interactions for functional inference in these species would become potentially more useful. Results: In this work we show that using PPAN data combined with other approximations, such as functional module detection, orthology exploitation methods and Gene Ontology (GO)-based information measures, helps to predict protein function in Mycoplasma genitalium. Conclusions: To our knowledge, the proposed method is the first that combines functional module detection among species, exploiting an orthology procedure and using information-theory-based GO semantic similarity in PPAN of the Mycoplasma species. An evaluation shows a higher recall than previously reported methods that focused on only one organism network.
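A minimal sketch of the information-theoretic GO semantic similarity such methods rely on, here in the Resnik form (the information content of the most informative common ancestor), on a toy ontology. The terms and annotation probabilities below are invented; the real method operates on the full Gene Ontology.

```python
import math

# Hypothetical miniature ontology: term -> list of parents (a DAG rooted at "root").
parents = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "glycolysis": ["metabolism"],
    "ion_transport": ["transport"],
}

# Made-up annotation frequencies used to derive information content,
# IC(t) = -log p(t); rarer terms carry more information.
prob = {"root": 1.0, "metabolism": 0.5, "transport": 0.4,
        "glycolysis": 0.1, "ion_transport": 0.05}

def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(prob[t]) for t in common)

print(resnik("glycolysis", "metabolism"))    # shares "metabolism"
print(resnik("glycolysis", "ion_transport"))  # only "root" in common
```

Terms that meet only at the root have similarity 0, since the root annotates everything and carries no information.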
Abstract:
This paper proposes a very fast method for blindly approximating a nonlinear mapping which transforms a sum of random variables. The estimation is surprisingly good even when the basic assumption is not satisfied. We use the method to provide a good initialization for inverting post-nonlinear mixtures and Wiener systems. Experiments show that the algorithm's speed is strongly improved and its asymptotic performance preserved, at a very low extra computational cost.
Abstract:
In this paper, we present a comprehensive study of different Independent Component Analysis (ICA) algorithms for the calculation of coherency and sharpness of electroencephalogram (EEG) signals, in order to investigate the possibility of early detection of Alzheimer's disease (AD). We found that ICA algorithms can help with artifact rejection and noise reduction, improving the discriminative power of features in high-frequency bands (especially in the high-alpha and beta ranges). In addition to comparing different ICA algorithms, the optimum number of selected components is investigated, in order to support decision processes in future work.
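None of the compared ICA algorithms is available in the Python standard library, but the core idea (rotate whitened mixtures until the outputs are maximally non-Gaussian) can be sketched on toy, non-EEG data. The kurtosis contrast and the 30-degree mixing angle below are illustrative choices, not the paper's algorithms; mixing with a pure rotation keeps unit-variance sources white, so the whitening step of a full ICA can be skipped here.

```python
import math, random

random.seed(0)
N = 20000
# Two independent sub-Gaussian sources (uniform, zero mean, unit variance).
s1 = [random.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(N)]
s2 = [random.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(N)]

# Mix with a pure rotation by 30 degrees.
th = math.radians(30)
x1 = [math.cos(th) * a - math.sin(th) * b for a, b in zip(s1, s2)]
x2 = [math.sin(th) * a + math.cos(th) * b for a, b in zip(s1, s2)]

def excess_kurtosis(v):
    n = len(v)
    m = sum(v) / n
    var = sum((t - m) ** 2 for t in v) / n
    return sum((t - m) ** 4 for t in v) / (n * var ** 2) - 3.0

def nongaussianity(theta):
    """Unmix with the inverse rotation and sum |excess kurtosis| of outputs."""
    c, s = math.cos(theta), math.sin(theta)
    u1 = [c * a + s * b for a, b in zip(x1, x2)]
    u2 = [-s * a + c * b for a, b in zip(x1, x2)]
    return abs(excess_kurtosis(u1)) + abs(excess_kurtosis(u2))

# Grid search over unmixing angles: the maximum should sit near 30 degrees
# (independence is only identifiable up to permutation/sign, i.e. modulo 90).
best = max(range(90), key=lambda d: nongaussianity(math.radians(d)))
print(best)
```

Mixtures of independent sources are closer to Gaussian than the sources themselves, so maximizing a non-Gaussianity contrast recovers the unmixing rotation; FastICA and related algorithms use the same principle with more robust contrasts.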
Abstract:
In this paper we present a quantitative comparison of different independent component analysis (ICA) algorithms in order to investigate their potential use in preprocessing electroencephalogram (EEG) data (e.g., for noise reduction and feature extraction) for early detection of Alzheimer's disease (AD), or for discriminating between AD (or mild cognitive impairment, MCI) patients and age-matched control subjects.
Abstract:
BACKGROUND AND PURPOSE: Several prognostic scores have been developed to predict the risk of symptomatic intracranial hemorrhage (sICH) after ischemic stroke thrombolysis. We compared the performance of these scores in a multicenter cohort. METHODS: We merged prospectively collected data on consecutive ischemic stroke patients who received intravenous thrombolysis in 7 stroke centers. We identified and evaluated 6 scores that can provide an estimate of the risk of sICH in hyperacute settings: MSS (Multicenter Stroke Survey); HAT (Hemorrhage After Thrombolysis); SEDAN (blood sugar, early infarct signs, [hyper]dense cerebral artery sign, age, NIH Stroke Scale); GRASPS (glucose at presentation, race [Asian], age, sex [male], systolic blood pressure at presentation, and severity of stroke at presentation [NIH Stroke Scale]); SITS (Safe Implementation of Thrombolysis in Stroke); and the SPAN (stroke prognostication using age and NIH Stroke Scale)-100 positive index. We included only patients with available variables for all scores. We calculated the area under the receiver operating characteristic curve (AUC-ROC) and also performed logistic regression and the Hosmer-Lemeshow test. RESULTS: The final cohort comprised 3012 eligible patients, of whom 221 (7.3%) had sICH per the National Institute of Neurological Disorders and Stroke criteria, 141 (4.7%) per the European Cooperative Acute Stroke Study II criteria, and 86 (2.9%) per the Safe Implementation of Thrombolysis in Stroke criteria. The performance of the scores, assessed with AUC-ROC for predicting European Cooperative Acute Stroke Study II sICH, was: MSS, 0.63 (95% confidence interval, 0.58-0.68); HAT, 0.65 (0.60-0.70); SEDAN, 0.70 (0.66-0.73); GRASPS, 0.67 (0.62-0.72); SITS, 0.64 (0.59-0.69); and SPAN-100 positive index, 0.56 (0.50-0.61). SEDAN had significantly higher AUC-ROC values compared with all other scores, except for GRASPS, where the difference was nonsignificant. SPAN-100 performed significantly worse than the other scores. 
The discriminative ranking of the scores was the same for the National Institute of Neurological Disorders and Stroke and the Safe Implementation of Thrombolysis in Stroke definitions, with SEDAN performing best, GRASPS second, and SPAN-100 worst. CONCLUSIONS: SPAN-100 had the worst predictive power and SEDAN consistently the highest. However, none of the scores performed better than moderately.
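AUC-ROC figures like the ones quoted above can be computed with the Mann-Whitney identity: the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counting one half. A minimal sketch on invented risk-score values:

```python
def auc_roc(scores_pos, scores_neg):
    """AUC by the Mann-Whitney identity: P(pos > neg) + 0.5 * P(tie)."""
    wins = ties = 0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

# Hypothetical score points for patients with and without sICH.
sich    = [5, 4, 4, 3]
no_sich = [3, 2, 2, 1, 1, 0]
print(auc_roc(sich, no_sich))  # high discrimination on this toy data
```

The quadratic pairwise loop is fine for a sketch; production code would use a rank-based O(n log n) computation.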
Abstract:
Asthma prevalence in children and adolescents in Spain is 10-17%. It is the most common chronic illness during childhood. Prevalence has been increasing over the last 40 years and there is considerable evidence that, among other factors, continued exposure to cigarette smoke contributes to asthma in children. No statistical or simulation model exists to forecast the evolution of childhood asthma in Europe. Such a model needs to incorporate the main risk factors that can be managed by medical authorities, such as tobacco (OR = 1.44), to establish how they affect the present generation of children. A simulation model for childhood asthma using conditional probability and discrete event simulation was developed and validated by simulating a realistic scenario. The parameters used for the model (input data) were those found in the literature, especially those related to the prevalence of smoking in Spain. We also used data from a panel of experts from the Hospital del Mar (Barcelona) related to the actual evolution and phenotypes of asthma. The results obtained from the simulation established a threshold of a 15-20% smoking population for a reduction in the prevalence of asthma. This is still far from the current level in Spain, where 24% of people smoke. We conclude that more effort must be made to combat smoking and other childhood asthma risk factors in order to significantly reduce the number of cases. Once completed, this simulation methodology can realistically be used to forecast the evolution of childhood asthma as a function of variation in different risk factors.
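The authors' discrete-event model is not reproduced here, but the way an exposure odds ratio of 1.44 links smoking prevalence to asthma prevalence can be sketched with a plain conditional-probability calculation. The 12% baseline risk among unexposed children is an assumed figure chosen to land in the reported 10-17% range, not a value from the study:

```python
def asthma_prevalence(p_smoke, p_base=0.12, odds_ratio=1.44):
    """Population asthma prevalence when a fraction p_smoke of children is
    exposed to cigarette smoke. p_base is the assumed risk among the
    unexposed (hypothetical); the OR converts to the exposed-group risk
    via the odds scale."""
    odds_exposed = odds_ratio * p_base / (1.0 - p_base)
    p_exposed = odds_exposed / (1.0 + odds_exposed)
    return p_smoke * p_exposed + (1.0 - p_smoke) * p_base

# Spain today (about 24% smokers) versus the 15-20% threshold band.
print(asthma_prevalence(0.24))
print(asthma_prevalence(0.15))
```

Lowering the smoking fraction shifts children from the higher exposed-group risk to the baseline risk, which is the mechanism behind the threshold the simulation identifies.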
Abstract:
BACKGROUND: Active screening by mobile teams is considered the best method for detecting human African trypanosomiasis (HAT) caused by Trypanosoma brucei gambiense, but the current funding context in many post-conflict countries limits this approach. As an alternative, non-specialist health care workers (HCWs) in peripheral health facilities could be trained to identify, based on their symptoms, potential cases who need testing. We explored the predictive value of syndromic referral algorithms for identifying symptomatic cases of HAT among a treatment-seeking population in Nimule, South Sudan. METHODOLOGY/PRINCIPAL FINDINGS: Symptom data from 462 patients (27 cases) presenting for a HAT test via passive screening over a 7-month period were collected to construct and evaluate over 14,000 four-item syndromic algorithms considered simple enough to be used by peripheral HCWs. For comparison, algorithms developed in other settings were also tested on our data, and a panel of expert HAT clinicians was asked to make referral decisions based on the symptom dataset. The best-performing algorithms consisted of three core symptoms (sleep problems, neurological problems and weight loss), with or without a history of oedema, cervical adenopathy or proximity to livestock. They had a sensitivity of 88.9-92.6%, a negative predictive value of up to 98.8%, and a positive predictive value in this context of 8.4-8.7%. In terms of sensitivity, these out-performed the more complex algorithms identified in other studies, as well as the expert panel. The best-performing algorithm is predicted to identify about 9 in 10 treatment-seeking HAT cases, though only about 1 in 10 patients referred would test positive. CONCLUSIONS/SIGNIFICANCE: In the absence of regular active screening, improving referrals of HAT patients through other means is essential. Systematic use of syndromic algorithms by peripheral HCWs has the potential to increase case detection and would increase their participation in HAT programmes. 
The algorithms proposed here, though promising, should be validated elsewhere.
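The combination of high sensitivity and NPV with a PPV of only about 8.5% follows from the low prevalence of HAT among those presenting (27/462). A sketch via Bayes' rule; the specificity value below is an assumption, since the abstract does not quote one:

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from test characteristics and prevalence (Bayes' rule)."""
    tp = sens * prevalence
    fp = (1.0 - spec) * (1.0 - prevalence)
    fn = (1.0 - sens) * prevalence
    tn = spec * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

prev = 27 / 462  # HAT cases among those presenting for a test
sens = 0.90      # within the reported 88.9-92.6% range
spec = 0.40      # assumed; not quoted in the abstract
ppv, npv = predictive_values(sens, spec, prev)
print(round(ppv, 3), round(npv, 3))  # PPV stays low despite high sensitivity
```

At under 6% prevalence, even a sensitive screen refers many non-cases, which is why only about 1 in 10 referred patients would test positive.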
Abstract:
The major objective of this research project was to use thermal analysis techniques in conjunction with X-ray analysis methods to identify and explain chemical reactions that promote aggregate-related deterioration in Portland cement concrete. Twenty-two different carbonate aggregate samples were subjected to a chemical testing scheme that included:
• bulk chemistry (major, minor and selected trace elements)
• bulk mineralogy (minor phases concentrated by acid extraction)
• solid solution in the major carbonate phases
• crystallite size determinations for the major carbonate phases
• a salt treatment study to evaluate the impact of deicer salts
Test results from these studies were then compared to information obtained using thermogravimetric analysis techniques. Since many of the limestones and dolomites used in the study had extensive field service records, it was possible to correlate many of the variables with service life. The results indicate that thermogravimetric analysis can play an important role in categorizing carbonate aggregates; with modern automated thermal analysis systems it should be possible to use such methods on a quality-control basis. Strong correlations were found between several of the monitored variables, and several exhibited significant correlations to concrete service life. When the full data set was used (n = 18), the significant correlations to service life (α = 5% level) can be summarized as follows:
• correlation coefficient r = -0.73 for premature TG loss versus service life
• correlation coefficient r = 0.74 for relative crystallite size versus service life
• correlation coefficient r = 0.53 for ASTM C666 durability factor versus service life
• correlation coefficient r = -0.52 for acid-insoluble residue versus service life
Separating the carbonate aggregates into their mineralogical categories (i.e., calcites and dolomites) tended to increase the correlation coefficients for some specific variables (r sometimes approached 0.90); however, the reliability of such correlations was questionable because of the small number of samples in this study.
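The service-life correlations reported above are ordinary Pearson coefficients. For reference, a minimal computation on invented aggregate data chosen to mimic the negative association between premature TG loss and service life:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: premature TG loss (%) versus service life (years);
# the values are invented, not the study's measurements.
tg_loss = [0.5, 1.0, 1.8, 2.4, 3.1, 4.0]
service = [60, 55, 48, 40, 30, 22]
print(round(pearson_r(tg_loss, service), 2))
```

With only 18 samples, coefficients of this kind also need a significance test against the α = 5% level, as the abstract notes.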
Abstract:
The objective of this work was to determine the viability equation constants for cottonseed and to detect the occurrence and release of hardseededness. Three seed lots of the Brazilian cultivars IAC-19 and IAC-20 were tested, using 12 moisture content levels ranging from 2.2 to 21.7% and three storage temperatures: 40, 50 and 65°C. Each seed moisture content level was reached from the initial value (around 8.8%) either by rehydration in a closed container or by drying in desiccators containing silica gel, both at 20°C. Twelve seed subsamples for each moisture content/temperature treatment were sealed in laminated aluminium-foil packets and stored in incubators at those temperatures until complete survival curves were obtained. Seed equilibrium relative humidity was recorded. Hardseededness was detected at moisture content levels below 6%, and its release was achieved either naturally, during the storage period, or artificially through seed coat removal. The viability equation quantified the response of seed longevity to the storage environment well, with K_E = 9.240, C_W = 5.190, C_H = 0.03965 and C_Q = 0.000426. The lower limit estimated for application of this equation at 65°C was 3.6% moisture content.
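The quoted constants match the usual form of the improved seed viability equation, log10(sigma) = K_E - C_W*log10(m) - C_H*t - C_Q*t^2, where sigma is the longevity parameter of the survival curve (days), m the moisture content (%) and t the storage temperature (°C). Assuming that standard form, a small sketch:

```python
import math

# Fitted constants for cottonseed, as reported in the abstract.
KE, CW, CH, CQ = 9.240, 5.190, 0.03965, 0.000426

def sigma_days(m, t):
    """Longevity parameter sigma (days) from the improved viability
    equation: log10(sigma) = KE - CW*log10(m) - CH*t - CQ*t**2,
    with m = moisture content (%) and t = temperature (degrees C)."""
    return 10.0 ** (KE - CW * math.log10(m) - CH * t - CQ * t * t)

# Longevity falls steeply with either moisture content or temperature.
print(round(sigma_days(5.0, 40.0)))   # drier, cooler storage
print(round(sigma_days(10.0, 50.0)))  # wetter, hotter storage
```

The abstract's 3.6% moisture lower limit at 65°C marks where this log-linear moisture response stops applying, so the function should not be extrapolated below it.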