Biblioteca Digital

313 resultados para prediction accuracy

em Queensland University of Technology - ePrints Archive

Near real-time prediction of Zenith Tropospheric Delays for position estimation over long-baseline CORS network

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents the preliminary results in establishing a strategy for predicting Zenith Tropospheric Delay (ZTD) and relative ZTD (rZTD) between Continuous Operating Reference Stations (CORS) in near real-time. It is anticipated that the predicted ZTD or rZTD can assist the network-based Real-Time Kinematic (RTK) performance over long inter-station distances, ultimately, enabling a cost effective method of delivering precise positioning services to sparsely populated regional areas, such as Queensland. This research firstly investigates two ZTD solutions: 1) the post-processed IGS ZTD solution and 2) the near Real-Time ZTD solution. The near Real-Time solution is obtained through the GNSS processing software package (Bernese) that has been deployed for this project. The predictability of the near Real-Time Bernese solution is analyzed and compared to the post-processed IGS solution where it acts as the benchmark solution. The predictability analyses were conducted with various prediction time of 15, 30, 45, and 60 minutes to determine the error with respect to timeliness. The predictability of ZTD and relative ZTD is determined (or characterized) by using the previously estimated ZTD as the predicted ZTD of current epoch. This research has shown that both the ZTD and relative ZTD predicted errors are random in nature; the STD grows from a few millimeters to sub-centimeters while the predicted delay interval ranges from 15 to 60 minutes. Additionally, the RZTD predictability shows very little dependency on the length of tested baselines of up to 1000 kilometers. Finally, the comparison of near Real-Time Bernese solution with IGS solution has shown a slight degradation in the prediction accuracy. The less accurate NRT solution has an STD error of 1cm within the delay of 50 minutes. However, some larger errors of up to 10cm are observed.

Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Background The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of Prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. Results In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-Ray diffraction method using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance, the prediction accuracy increased from 62.8% with single sequence to 69.8% and Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, if coupled with the predicted secondary structure information by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, 9% and 0.17 higher than the accuracy achieved based on the singe sequence information, respectively. Conclusion A new method has been developed to predict the proline cis/trans isomerization in proteins based on support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of SVM approach in this study reinforced that SVM is a powerful tool in predicting proline cis/trans isomerization in proteins and biological sequence analysis.

Prediction of transcriptional regulatory interactions in bacteria : a comparative genomics approach

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E.coli ��70 promoters, found support at the 0.1 significance level for our hypothesis | that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to �70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E.coli transcription, we discovered a number of potentially useful features { some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption, where promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E.coli �70 promoters returned a p-value of 0.072, which at 0.1 significance level suggested support for our (alternative) hypothesis; albeit this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data will become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in `moderately'-conserved transcription factor binding sites as represented by our E.coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific difierences, especially between pathogenic and non-pathogenic strains. Such difierences were made clear through interactive visualisations using the TRNDifi software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as `regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogentic trees convey information regarding changes in gene repertoire, which we might regard being analogous to `hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to `software'. In this context, we explored the `pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks, is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the `core-regulatory-set', and interactions found only in a subset of genomes explored as the `sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species level difierences are seen at the sub-regulatory-set level; for example the known virulence factors, YbtA and PchR were found in Y.pestis and P.aerguinosa respectively, but were not present in both E.coli and B.subtilis. Such factors and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogenic specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.

Frequency-motivated observer design for the prediction of parametric roll resonance control applications in marine systems

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This work deals with estimators for predicting when parametric roll resonance is going to occur in surface vessels. The roll angle of the vessel is modeled as a second-order linear oscillatory system with unknown parameters. Several algorithms are used to estimate the parameters and eigenvalues of the system based on data gathered experimentally on a 1:45 scale model of a tanker. Based on the estimated eigenvalues, the system predicts whether or not parametric roll occurred. A prediction accuracy of 100% is achieved for regular waves, and up to 87.5% for irregular waves.

Wavelet based approaches and high resolution methods for complex process models

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Many industrial processes and systems can be modelled mathematically by a set of Partial Differential Equations (PDEs). Finding a solution to such a PDF model is essential for system design, simulation, and process control purpose. However, major difficulties appear when solving PDEs with singularity. Traditional numerical methods, such as finite difference, finite element, and polynomial based orthogonal collocation, not only have limitations to fully capture the process dynamics but also demand enormous computation power due to the large number of elements or mesh points for accommodation of sharp variations. To tackle this challenging problem, wavelet based approaches and high resolution methods have been recently developed with successful applications to a fixedbed adsorption column model. Our investigation has shown that recent advances in wavelet based approaches and high resolution methods have the potential to be adopted for solving more complicated dynamic system models. This chapter will highlight the successful applications of these new methods in solving complex models of simulated-moving-bed (SMB) chromatographic processes. A SMB process is a distributed parameter system and can be mathematically described by a set of partial/ordinary differential equations and algebraic equations. These equations are highly coupled; experience wave propagations with steep front, and require significant numerical effort to solve. To demonstrate the numerical computing power of the wavelet based approaches and high resolution methods, a single column chromatographic process modelled by a Transport-Dispersive-Equilibrium linear model is investigated first. Numerical solutions from the upwind-1 finite difference, wavelet-collocation, and high resolution methods are evaluated by quantitative comparisons with the analytical solution for a range of Peclet numbers. After that, the advantages of the wavelet based approaches and high resolution methods are further demonstrated through applications to a dynamic SMB model for an enantiomers separation process. This research has revealed that for a PDE system with a low Peclet number, all existing numerical methods work well, but the upwind finite difference method consumes the most time for the same degree of accuracy of the numerical solution. The high resolution method provides an accurate numerical solution for a PDE system with a medium Peclet number. The wavelet collocation method is capable of catching up steep changes in the solution, and thus can be used for solving PDE models with high singularity. For the complex SMB system models under consideration, both the wavelet based approaches and high resolution methods are good candidates in terms of computation demand and prediction accuracy on the steep front. The high resolution methods have shown better stability in achieving steady state in the specific case studied in this Chapter.

Freeway traffic estimation in Beijing based on particle filter

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Short-term traffic flow data is characterized by rapid and dramatic fluctuations. It reflects the nature of the frequent congestion in the lane, which shows a strong nonlinear feature. Traffic state estimation based on the data gained by electronic sensors is critical for much intelligent traffic management and the traffic control. In this paper, a solution to freeway traffic estimation in Beijing is proposed using a particle filter, based on macroscopic traffic flow model, which estimates both traffic density and speed.Particle filter is a nonlinear prediction method, which has obvious advantages for traffic flows prediction. However, with the increase of sampling period, the volatility of the traffic state curve will be much dramatic. Therefore, the prediction accuracy will be affected and difficulty of forecasting is raised. In this paper, particle filter model is applied to estimate the short-term traffic flow. Numerical study is conducted based on the Beijing freeway data with the sampling period of 2 min. The relatively high accuracy of the results indicates the superiority of the proposed model.

Machine prognostics based on health state probability estimation

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The ability to accurately predict the remaining useful life of machine components is critical for machine continuous operation and can also improve productivity and enhance system’s safety. In condition-based maintenance (CBM), maintenance is performed based on information collected through condition monitoring and assessment of the machine health. Effective diagnostics and prognostics are important aspects of CBM for maintenance engineers to schedule a repair and to acquire replacement components before the components actually fail. Although a variety of prognostic methodologies have been reported recently, their application in industry is still relatively new and mostly focused on the prediction of specific component degradations. Furthermore, they required significant and sufficient number of fault indicators to accurately prognose the component faults. Hence, sufficient usage of health indicators in prognostics for the effective interpretation of machine degradation process is still required. Major challenges for accurate longterm prediction of remaining useful life (RUL) still remain to be addressed. Therefore, continuous development and improvement of a machine health management system and accurate long-term prediction of machine remnant life is required in real industry application. This thesis presents an integrated diagnostics and prognostics framework based on health state probability estimation for accurate and long-term prediction of machine remnant life. In the proposed model, prior empirical (historical) knowledge is embedded in the integrated diagnostics and prognostics system for classification of impending faults in machine system and accurate probability estimation of discrete degradation stages (health states). The methodology assumes that machine degradation consists of a series of degraded states (health states) which effectively represent the dynamic and stochastic process of machine failure. The estimation of discrete health state probability for the prediction of machine remnant life is performed using the ability of classification algorithms. To employ the appropriate classifier for health state probability estimation in the proposed model, comparative intelligent diagnostic tests were conducted using five different classifiers applied to the progressive fault data of three different faults in a high pressure liquefied natural gas (HP-LNG) pump. As a result of this comparison study, SVMs were employed in heath state probability estimation for the prediction of machine failure in this research. The proposed prognostic methodology has been successfully tested and validated using a number of case studies from simulation tests to real industry applications. The results from two actual failure case studies using simulations and experiments indicate that accurate estimation of health states is achievable and the proposed method provides accurate long-term prediction of machine remnant life. In addition, the results of experimental tests show that the proposed model has the capability of providing early warning of abnormal machine operating conditions by identifying the transitional states of machine fault conditions. Finally, the proposed prognostic model is validated through two industrial case studies. The optimal number of health states which can minimise the model training error without significant decrease of prediction accuracy was also examined through several health states of bearing failure. The results were very encouraging and show that the proposed prognostic model based on health state probability estimation has the potential to be used as a generic and scalable asset health estimation tool in industrial machinery.

Predicting residue-wise contact orders in proteins by support vector regression

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Background The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.

Acceptability-based QoE models for mobile video

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Quality of experience (QoE) measures the overall perceived quality of mobile video delivery from subjective user experience and objective system performance. Current QoE computing models have two main limitations: 1) insufficient consideration of the factors influencing QoE, and; 2) limited studies on QoE models for acceptability prediction. In this paper, a set of novel acceptability-based QoE models, denoted as A-QoE, is proposed based on the results of comprehensive user studies on subjective quality acceptance assessments. The models are able to predict users’ acceptability and pleasantness in various mobile video usage scenarios. Statistical regression analysis has been used to build the models with a group of influencing factors as independent predictors, including encoding parameters and bitrate, video content characteristics, and mobile device display resolution. The performance of the proposed A-QoE models has been compared with three well-known objective Video Quality Assessment metrics: PSNR, SSIM and VQM. The proposed A-QoE models have high prediction accuracy and usage flexibility. Future user-centred mobile video delivery systems can benefit from applying the proposed QoE-based management to optimize video coding and quality delivery decisions.

Physiotherapists have accurate expectations of their patients’ future health-related quality of life after first assessment in a subacute rehabilitation setting

Relevância:

60.00% 60.00%

Publicador:

Resumo:

BACKGROUND Expectations held by health professionals and their patients are likely to affect treatment choices in subacute inpatient rehabilitation settings for older adults. There is a scarcity of empirical evidence evaluating whether health professionals expectations of the quality of their patients' future health states are accurate. METHODS A prospective longitudinal cohort investigation was implemented to examine agreement (kappa coefficients, exact agreement, limits-of-agreement, and intraclass-correlation coefficients) between physiotherapists' (n = 23) prediction of patients' discharge health-related quality of life (reported on the EQ-5D-3L) and the actual health-related quality of life self-reported by patients (n = 272) at their discharge assessment (using the EQ-5D-3L). The mini-mental state examination was used as an indicator of patients' cognitive ability. RESULTS Overall, 232 (85%) patients had all assessment data completed and were included in analysis. Kappa coefficients (exact agreement) ranged between 0.37-0.57 (58%-83%) across EQ-5D-3L domains in the lower cognition group and 0.53-0.68 (81%-85%) in the better cognition group. CONCLUSIONS Physiotherapists in this subacute rehabilitation setting predicted their patients' discharge health-related quality of life with substantial accuracy. Physiotherapists are likely able to provide their patients with sound information regarding potential recovery and health-related quality of life on discharge. The prediction accuracy was higher among patients with better cognition than patients with poorer cognition.

Lateralization of temporal lobe epilepsy based on resting-state functional magnetic resonance imaging and machine learning

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Lateralization of temporal lobe epilepsy (TLE) is critical for successful outcome of surgery to relieve seizures. TLE affects brain regions beyond the temporal lobes and has been associated with aberrant brain networks, based on evidence from functional magnetic resonance imaging. We present here a machine learning-based method for determining the laterality of TLE, using features extracted from resting-state functional connectivity of the brain. A comprehensive feature space was constructed to include network properties within local brain regions, between brain regions, and across the whole network. Feature selection was performed based on random forest and a support vector machine was employed to train a linear model to predict the laterality of TLE on unseen patients. A leave-one-patient-out cross validation was carried out on 12 patients and a prediction accuracy of 83% was achieved. The importance of selected features was analyzed to demonstrate the contribution of resting-state connectivity attributes at voxel, region, and network levels to TLE lateralization.

Clinical Accuracy of the MedGem Indirect Calorimeter for Measuring Resting Energy Expenditure in Cancer Patients

Relevância:

30.00% 30.00%

Publicador:

Resumo:

OBJECTIVE: To compare, in patients with cancer and in healthy subjects, measured resting energy expenditure (REE) from traditional indirect calorimetry to a new portable device (MedGem) and predicted REE. DESIGN: Cross-sectional clinical validation study. SETTING: Private radiation oncology centre, Brisbane, Australia. SUBJECTS: Cancer patients (n = 18) and healthy subjects (n = 17) aged 37-86 y, with body mass indices ranging from 18 to 42 kg/m(2). INTERVENTIONS: Oxygen consumption (VO(2)) and REE were measured by VMax229 (VM) and MedGem (MG) indirect calorimeters in random order after a 12-h fast and 30-min rest. REE was also calculated from the MG without adjustment for nitrogen excretion (MGN) and estimated from Harris-Benedict prediction equations. Data were analysed using the Bland and Altman approach, based on a clinically acceptable difference between methods of 5%. RESULTS: The mean bias (MGN-VM) was 10% and limits of agreement were -42 to 21% for cancer patients; mean bias -5% with limits of -45 to 35% for healthy subjects. Less than half of the cancer patients (n = 7, 46.7%) and only a third (n = 5, 33.3%) of healthy subjects had measured REE by MGN within clinically acceptable limits of VM. Predicted REE showed a mean bias (HB-VM) of -5% for cancer patients and 4% for healthy subjects, with limits of agreement of -30 to 20% and -27 to 34%, respectively. CONCLUSIONS: Limits of agreement for the MG and Harris Benedict equations compared to traditional indirect calorimetry were similar but wide, indicating poor clinical accuracy for determining the REE of individual cancer patients and healthy subjects.

Application of data mining techniques in the prediction of coronary artery disease : use of anaesthesia time-series and patient risk factor data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority class under-sampling and Kappa statistic together with misclassification rate and area under the ROC curve (AUC) are used for evaluation of models generated using different prediction algorithms. The performance based on models derived from feature reduced datasets reveal the filter method, Cfs subset evaluation, to be most consistently effective although Consistency derived subsets tended to slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic as well by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62 with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to 9.8 to 10.16 with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-cross validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09. The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_ G) perform the least well with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28 while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The effect of the pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.

Best practices in prediction for decision-making : lessons from the atmospheric and earth sciences

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Predictions that result from scientific research hold great appeal for decision-makers who are grappling with complex and controversial environmental issues, by promising to enhance their ability to determine a need for and outcomes of alternative decisions. A problem exists in that decision-makers and scientists in the public and private sectors solicit, produce, and use such predictions with little understanding of their accuracy or utility, and often without systematic evaluation or mechanisms of accountability. In order to contribute to a more effective role for ecological science in support of decision-making, this paper discusses three ``best practices'' for quantitative ecosystem modeling and prediction gleaned from research on modeling, prediction, and decision-making in the atmospheric and earth sciences. The lessons are distilled from a series of case studies and placed into the specific context of examples from ecological science.

Incorporating weather into regionwide safety planning prediction models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Predicting safety on roadways is standard practice for road safety professionals and has a corresponding extensive literature. The majority of safety prediction models are estimated using roadway segment and intersection (microscale) data, while more recently efforts have been undertaken to predict safety at the planning level (macroscale). Safety prediction models typically include roadway, operations, and exposure variables—factors known to affect safety in fundamental ways. Environmental variables, in particular variables attempting to capture the effect of rain on road safety, are difficult to obtain and have rarely been considered. In the few cases weather variables have been included, historical averages rather than actual weather conditions during which crashes are observed have been used. Without the inclusion of weather related variables researchers have had difficulty explaining regional differences in the safety performance of various entities (e.g. intersections, road segments, highways, etc.) As part of the NCHRP 8-44 research effort, researchers developed PLANSAFE, or planning level safety prediction models. These models make use of socio-economic, demographic, and roadway variables for predicting planning level safety. Accounting for regional differences - similar to the experience for microscale safety models - has been problematic during the development of planning level safety prediction models. More specifically, without weather related variables there is an insufficient set of variables for explaining safety differences across regions and states. Furthermore, omitted variable bias resulting from excluding these important variables may adversely impact the coefficients of included variables, thus contributing to difficulty in model interpretation and accuracy. This paper summarizes the results of an effort to include weather related variables, particularly various measures of rainfall, into accident frequency prediction and the prediction of the frequency of fatal and/or injury degree of severity crash models. The purpose of the study was to determine whether these variables do in fact improve overall goodness of fit of the models, whether these variables may explain some or all of observed regional differences, and identifying the estimated effects of rainfall on safety. The models are based on Traffic Analysis Zone level datasets from Michigan, and Pima and Maricopa Counties in Arizona. Numerous rain-related variables were found to be statistically significant, selected rain related variables improved the overall goodness of fit, and inclusion of these variables reduced the portion of the model explained by the constant in the base models without weather variables. Rain tends to diminish safety, as expected, in fairly complex ways, depending on rain frequency and intensity.

«
1
2
3
4
5
6
7
8
...
20
21
»