946 resultados para Over-dispersion, Crash prediction, Bayesian method, Intersection safety
Resumo:
On August 3, 2009, a road safety audit was initiated for the intersection of IA 1 and County Road F-67 in Johnson County, Iowa. Due to the high volume of traffic accessing the cheese producing plant (Twin County Dairy, Inc.), a grocery store east of the intersection, and a large Amish community with horse-drawn wagons and carriages frequently sharing the roads with motorized vehicles, this intersection has developed a crash history that concerns the Iowa Department of Transportation (Iowa DOT), Iowa State Patrol, and local agencies. Considering this, Johnson County and the Iowa DOT requested that a road safety audit be conducted to address the safety concerns and recommend possible mitigation strategies.
Resumo:
We present the most comprehensive comparison to date of the predictive benefit of genetics in addition to currently used clinical variables, using genotype data for 33 single-nucleotide polymorphisms (SNPs) in 1,547 Caucasian men from the placebo arm of the REduction by DUtasteride of prostate Cancer Events (REDUCE®) trial. Moreover, we conducted a detailed comparison of three techniques for incorporating genetics into clinical risk prediction. The first method was a standard logistic regression model, which included separate terms for the clinical covariates and for each of the genetic markers. This approach ignores a substantial amount of external information concerning effect sizes for these Genome Wide Association Study (GWAS)-replicated SNPs. The second and third methods investigated two possible approaches to incorporating meta-analysed external SNP effect estimates - one via a weighted PCa 'risk' score based solely on the meta analysis estimates, and the other incorporating both the current and prior data via informative priors in a Bayesian logistic regression model. All methods demonstrated a slight improvement in predictive performance upon incorporation of genetics. The two methods that incorporated external information showed the greatest receiver-operating-characteristic AUCs increase from 0.61 to 0.64. The value of our methods comparison is likely to lie in observations of performance similarities, rather than difference, between three approaches of very different resource requirements. The two methods that included external information performed best, but only marginally despite substantial differences in complexity.
Resumo:
Sensible and latent heat fluxes are often calculated from bulk transfer equations combined with the energy balance. For spatial estimates of these fluxes, a combination of remotely sensed and standard meteorological data from weather stations is used. The success of this approach depends on the accuracy of the input data and on the accuracy of two variables in particular: aerodynamic and surface conductance. This paper presents a Bayesian approach to improve estimates of sensible and latent heat fluxes by using a priori estimates of aerodynamic and surface conductance alongside remote measurements of surface temperature. The method is validated for time series of half-hourly measurements in a fully grown maize field, a vineyard and a forest. It is shown that the Bayesian approach yields more accurate estimates of sensible and latent heat flux than traditional methods.
Resumo:
We have developed a new Bayesian approach to retrieve oceanic rain rate from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI), with an emphasis on typhoon cases in the West Pacific. Retrieved rain rates are validated with measurements of rain gauges located on Japanese islands. To demonstrate improvement, retrievals are also compared with those from the TRMM/Precipitation Radar (PR), the Goddard Profiling Algorithm (GPROF), and a multi-channel linear regression statistical method (MLRS). We have found that qualitatively, all methods retrieved similar horizontal distributions in terms of locations of eyes and rain bands of typhoons. Quantitatively, our new Bayesian retrievals have the best linearity and the smallest root mean square (RMS) error against rain gauge data for 16 typhoon overpasses in 2004. The correlation coefficient and RMS of our retrievals are 0.95 and ~2 mm hr-1, respectively. In particular, at heavy rain rates, our Bayesian retrievals outperform those retrieved from GPROF and MLRS. Overall, the new Bayesian approach accurately retrieves surface rain rate for typhoon cases. Accurate rain rate estimates from this method can be assimilated in models to improve forecast and prevent potential damages in Taiwan during typhoon seasons.
Resumo:
Numerical Weather Prediction (NWP) fields are used to assist the detection of cloud in satellite imagery. Simulated observations based on NWP are used within a framework based on Bayes' theorem to calculate a physically-based probability of each pixel with an imaged scene being clear or cloudy. Different thresholds can be set on the probabilities to create application-specific cloud-masks. Here, this is done over both land and ocean using night-time (infrared) imagery. We use a validation dataset of difficult cloud detection targets for the Spinning Enhanced Visible and Infrared Imager (SEVIRI) achieving true skill scores of 87% and 48% for ocean and land, respectively using the Bayesian technique, compared to 74% and 39%, respectively for the threshold-based techniques associated with the validation dataset.
Resumo:
An analytical approach for spin stabilized attitude propagation is presented, considering the coupled effect of the aerodynamic torque and the gravity gradient torque. A spherical coordination system fixed in the satellite is used to locate the satellite spin axis in relation to the terrestrial equatorial system. The spin axis direction is specified by its right ascension and the declination angles and the equation of motion are described by these two angles and the magnitude of the spin velocity. An analytical averaging method is applied to obtain the mean torques over an orbital period. To compute the average components of both aerodynamic torque and the gravity gradient torque in the satellite body frame reference system, an average time in the fast varying orbit element, the mean anomaly, is utilized. Afterwards, the inclusion of such torques on the rotational motion differential equations of spin stabilized satellites yields conditions to derive an analytical solution. The pointing deviation evolution, that is, the deviation between the actual spin axis and the computed spin axis, is also availed. In order to validate the analytical approach, the theory developed has been applied for spin stabilized Brazilian satellite SCD1, which are quite appropriated for verification and comparison of the data generated and processed by the Satellite Control Center of the Brazil National Research Institute (INPE). Numerical simulations performed with data of Brazilian Satellite SCD1 show the period that the analytical solution can be used to the attitude propagation, within the dispersion range of the attitude determination system performance of Satellite Control Center of the Brazilian Research Institute.
A Phase Space Box-counting based Method for Arrhythmia Prediction from Electrocardiogram Time Series
Resumo:
Arrhythmia is one kind of cardiovascular diseases that give rise to the number of deaths and potentially yields immedicable danger. Arrhythmia is a life threatening condition originating from disorganized propagation of electrical signals in heart resulting in desynchronization among different chambers of the heart. Fundamentally, the synchronization process means that the phase relationship of electrical activities between the chambers remains coherent, maintaining a constant phase difference over time. If desynchronization occurs due to arrhythmia, the coherent phase relationship breaks down resulting in chaotic rhythm affecting the regular pumping mechanism of heart. This phenomenon was explored by using the phase space reconstruction technique which is a standard analysis technique of time series data generated from nonlinear dynamical system. In this project a novel index is presented for predicting the onset of ventricular arrhythmias. Analysis of continuously captured long-term ECG data recordings was conducted up to the onset of arrhythmia by the phase space reconstruction method, obtaining 2-dimensional images, analysed by the box counting method. The method was tested using the ECG data set of three different kinds including normal (NR), Ventricular Tachycardia (VT), Ventricular Fibrillation (VF), extracted from the Physionet ECG database. Statistical measures like mean (μ), standard deviation (σ) and coefficient of variation (σ/μ) for the box-counting in phase space diagrams are derived for a sliding window of 10 beats of ECG signal. From the results of these statistical analyses, a threshold was derived as an upper bound of Coefficient of Variation (CV) for box-counting of ECG phase portraits which is capable of reliably predicting the impeding arrhythmia long before its actual occurrence. As future work of research, it was planned to validate this prediction tool over a wider population of patients affected by different kind of arrhythmia, like atrial fibrillation, bundle and brunch block, and set different thresholds for them, in order to confirm its clinical applicability.
Resumo:
The chemotherapeutic drug 5-fluorouracil (5-FU) is widely used for treating solid tumors. Response to 5-FU treatment is variable with 10-30% of patients experiencing serious toxicity partly explained by reduced activity of dihydropyrimidine dehydrogenase (DPD). DPD converts endogenous uracil (U) into 5,6-dihydrouracil (UH(2) ), and analogously, 5-FU into 5-fluoro-5,6-dihydrouracil (5-FUH(2) ). Combined quantification of U and UH(2) with 5-FU and 5-FUH(2) may provide a pre-therapeutic assessment of DPD activity and further guide drug dosing during therapy. Here, we report the development of a liquid chromatography-tandem mass spectrometry assay for simultaneous quantification of U, UH(2) , 5-FU and 5-FUH(2) in human plasma. Samples were prepared by liquid-liquid extraction with 10:1 ethyl acetate-2-propanol (v/v). The evaporated samples were reconstituted in 0.1% formic acid and 10 μL aliquots were injected into the HPLC system. Analyte separation was achieved on an Atlantis dC(18) column with a mobile phase consisting of 1.0 mm ammonium acetate, 0.5 mm formic acid and 3.3% methanol. Positively ionized analytes were detected by multiple reaction monitoring. The analytical response was linear in the range 0.01-10 μm for U, 0.1-10 μm for UH(2) , 0.1-75 μm for 5-FU and 0.75-75 μm for 5-FUH(2) , covering the expected concentration ranges in plasma. The method was validated following the FDA guidelines and applied to clinical samples obtained from ten 5-FU-treated colorectal cancer patients. The present method merges the analysis of 5-FU pharmacokinetics and DPD activity into a single assay representing a valuable tool to improve the efficacy and safety of 5-FU-based chemotherapy.
Resumo:
When conducting a randomized comparative clinical trial, ethical, scientific or economic considerations often motivate the use of interim decision rules after successive groups of patients have been treated. These decisions may pertain to the comparative efficacy or safety of the treatments under study, cost considerations, the desire to accelerate the drug evaluation process, or the likelihood of therapeutic benefit for future patients. At the time of each interim decision, an important question is whether patient enrollment should continue or be terminated; either due to a high probability that one treatment is superior to the other, or a low probability that the experimental treatment will ultimately prove to be superior. The use of frequentist group sequential decision rules has become routine in the conduct of phase III clinical trials. In this dissertation, we will present a new Bayesian decision-theoretic approach to the problem of designing a randomized group sequential clinical trial, focusing on two-arm trials with time-to-failure outcomes. Forward simulation is used to obtain optimal decision boundaries for each of a set of possible models. At each interim analysis, we use Bayesian model selection to adaptively choose the model having the largest posterior probability of being correct, and we then make the interim decision based on the boundaries that are optimal under the chosen model. We provide a simulation study to compare this method, which we call Bayesian Doubly Optimal Group Sequential (BDOGS), to corresponding frequentist designs using either O'Brien-Fleming (OF) or Pocock boundaries, as obtained from EaSt 2000. Our simulation results show that, over a wide variety of different cases, BDOGS either performs at least as well as both OF and Pocock, or on average provides a much smaller trial. ^
Resumo:
In this paper, vehicle-track interaction for a new slab track design, conceived to reduce noise and vibration levels has been analyzed, assessing the derailment risk for trains running on curved track when encountering a broken rail. Two different types of rail fastening systems with different elasticities have been analysed and compared. Numerical methods were used in order to simulate the dynamic behaviour of the train-track interaction. Multibody system (MBS) modelling techniques were combined with techniques based on the finite element method (FEM). MBS modelling was used for modelling the vehicle and FEM for simulating the elastic track. The simulation model was validated by comparing simulated results to experimental data obtained in field testing. During the simulations various safety indices, characteristic of derailment risk, were analysed. The simulations realised at the maximum running velocity of 110 km/h showed a similar behaviour for several track types. When reducing the running speed, the safety indices worsened for both cases. Although the worst behaviour was observed for the track with a greater elasticity, in none of the simulations did a derailment occur when running over the broken rail.
Resumo:
Hoy en día, con la evolución continua y rápida de las tecnologías de la información y los dispositivos de computación, se recogen y almacenan continuamente grandes volúmenes de datos en distintos dominios y a través de diversas aplicaciones del mundo real. La extracción de conocimiento útil de una cantidad tan enorme de datos no se puede realizar habitualmente de forma manual, y requiere el uso de técnicas adecuadas de aprendizaje automático y de minería de datos. La clasificación es una de las técnicas más importantes que ha sido aplicada con éxito a varias áreas. En general, la clasificación se compone de dos pasos principales: en primer lugar, aprender un modelo de clasificación o clasificador a partir de un conjunto de datos de entrenamiento, y en segundo lugar, clasificar las nuevas instancias de datos utilizando el clasificador aprendido. La clasificación es supervisada cuando todas las etiquetas están presentes en los datos de entrenamiento (es decir, datos completamente etiquetados), semi-supervisada cuando sólo algunas etiquetas son conocidas (es decir, datos parcialmente etiquetados), y no supervisada cuando todas las etiquetas están ausentes en los datos de entrenamiento (es decir, datos no etiquetados). Además, aparte de esta taxonomía, el problema de clasificación se puede categorizar en unidimensional o multidimensional en función del número de variables clase, una o más, respectivamente; o también puede ser categorizado en estacionario o cambiante con el tiempo en función de las características de los datos y de la tasa de cambio subyacente. A lo largo de esta tesis, tratamos el problema de clasificación desde tres perspectivas diferentes, a saber, clasificación supervisada multidimensional estacionaria, clasificación semisupervisada unidimensional cambiante con el tiempo, y clasificación supervisada multidimensional cambiante con el tiempo. Para llevar a cabo esta tarea, hemos usado básicamente los clasificadores Bayesianos como modelos. La primera contribución, dirigiéndose al problema de clasificación supervisada multidimensional estacionaria, se compone de dos nuevos métodos de aprendizaje de clasificadores Bayesianos multidimensionales a partir de datos estacionarios. Los métodos se proponen desde dos puntos de vista diferentes. El primer método, denominado CB-MBC, se basa en una estrategia de envoltura de selección de variables que es voraz y hacia delante, mientras que el segundo, denominado MB-MBC, es una estrategia de filtrado de variables con una aproximación basada en restricciones y en el manto de Markov. Ambos métodos han sido aplicados a dos problemas reales importantes, a saber, la predicción de los inhibidores de la transcriptasa inversa y de la proteasa para el problema de infección por el virus de la inmunodeficiencia humana tipo 1 (HIV-1), y la predicción del European Quality of Life-5 Dimensions (EQ-5D) a partir de los cuestionarios de la enfermedad de Parkinson con 39 ítems (PDQ-39). El estudio experimental incluye comparaciones de CB-MBC y MB-MBC con los métodos del estado del arte de la clasificación multidimensional, así como con métodos comúnmente utilizados para resolver el problema de predicción de la enfermedad de Parkinson, a saber, la regresión logística multinomial, mínimos cuadrados ordinarios, y mínimas desviaciones absolutas censuradas. En ambas aplicaciones, los resultados han sido prometedores con respecto a la precisión de la clasificación, así como en relación al análisis de las estructuras gráficas que identifican interacciones conocidas y novedosas entre las variables. La segunda contribución, referida al problema de clasificación semi-supervisada unidimensional cambiante con el tiempo, consiste en un método nuevo (CPL-DS) para clasificar flujos de datos parcialmente etiquetados. Los flujos de datos difieren de los conjuntos de datos estacionarios en su proceso de generación muy rápido y en su aspecto de cambio de concepto. Es decir, los conceptos aprendidos y/o la distribución subyacente están probablemente cambiando y evolucionando en el tiempo, lo que hace que el modelo de clasificación actual sea obsoleto y deba ser actualizado. CPL-DS utiliza la divergencia de Kullback-Leibler y el método de bootstrapping para cuantificar y detectar tres tipos posibles de cambio: en las predictoras, en la a posteriori de la clase o en ambas. Después, si se detecta cualquier cambio, un nuevo modelo de clasificación se aprende usando el algoritmo EM; si no, el modelo de clasificación actual se mantiene sin modificaciones. CPL-DS es general, ya que puede ser aplicado a varios modelos de clasificación. Usando dos modelos diferentes, el clasificador naive Bayes y la regresión logística, CPL-DS se ha probado con flujos de datos sintéticos y también se ha aplicado al problema real de la detección de código malware, en el cual los nuevos ficheros recibidos deben ser continuamente clasificados en malware o goodware. Los resultados experimentales muestran que nuestro método es efectivo para la detección de diferentes tipos de cambio a partir de los flujos de datos parcialmente etiquetados y también tiene una buena precisión de la clasificación. Finalmente, la tercera contribución, sobre el problema de clasificación supervisada multidimensional cambiante con el tiempo, consiste en dos métodos adaptativos, a saber, Locally Adpative-MB-MBC (LA-MB-MBC) y Globally Adpative-MB-MBC (GA-MB-MBC). Ambos métodos monitorizan el cambio de concepto a lo largo del tiempo utilizando la log-verosimilitud media como métrica y el test de Page-Hinkley. Luego, si se detecta un cambio de concepto, LA-MB-MBC adapta el actual clasificador Bayesiano multidimensional localmente alrededor de cada nodo cambiado, mientras que GA-MB-MBC aprende un nuevo clasificador Bayesiano multidimensional. El estudio experimental realizado usando flujos de datos sintéticos multidimensionales indica los méritos de los métodos adaptativos propuestos. ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from an available training data, and secondly, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when the whole class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when the whole class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or can be also categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Through this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised unidimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both considered case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from the stationary data sets by their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out-of-date requiring to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general as it can be applied to several classification models. Using two different models, namely, naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where the new received files should be continuously classified into malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having a good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. Experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
Resumo:
Mode of access: Internet.
Resumo:
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads in to a more general discussion of Gaussian processes in section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.
Resumo:
Regression problems are concerned with predicting the values of one or more continuous quantities, given the values of a number of input variables. For virtually every application of regression, however, it is also important to have an indication of the uncertainty in the predictions. Such uncertainties are expressed in terms of the error bars, which specify the standard deviation of the distribution of predictions about the mean. Accurate estimate of error bars is of practical importance especially when safety and reliability is an issue. The Bayesian view of regression leads naturally to two contributions to the error bars. The first arises from the intrinsic noise on the target data, while the second comes from the uncertainty in the values of the model parameters which manifests itself in the finite width of the posterior distribution over the space of these parameters. The Hessian matrix which involves the second derivatives of the error function with respect to the weights is needed for implementing the Bayesian formalism in general and estimating the error bars in particular. A study of different methods for evaluating this matrix is given with special emphasis on the outer product approximation method. The contribution of the uncertainty in model parameters to the error bars is a finite data size effect, which becomes negligible as the number of data points in the training set increases. A study of this contribution is given in relation to the distribution of data in input space. It is shown that the addition of data points to the training set can only reduce the local magnitude of the error bars or leave it unchanged. Using the asymptotic limit of an infinite data set, it is shown that the error bars have an approximate relation to the density of data in input space.
Resumo:
The ERS-1 Satellite was launched in July 1991 by the European Space Agency into a polar orbit at about 800 km, carrying a C-band scatterometer. A scatterometer measures the amount of backscatter microwave radiation reflected by small ripples on the ocean surface induced by sea-surface winds, and so provides instantaneous snap-shots of wind flow over large areas of the ocean surface, known as wind fields. Inherent in the physics of the observation process is an ambiguity in wind direction; the scatterometer cannot distinguish if the wind is blowing toward or away from the sensor device. This ambiguity implies that there is a one-to-many mapping between scatterometer data and wind direction. Current operational methods for wind field retrieval are based on the retrieval of wind vectors from satellite scatterometer data, followed by a disambiguation and filtering process that is reliant on numerical weather prediction models. The wind vectors are retrieved by the local inversion of a forward model, mapping scatterometer observations to wind vectors, and minimising a cost function in scatterometer measurement space. This thesis applies a pragmatic Bayesian solution to the problem. The likelihood is a combination of conditional probability distributions for the local wind vectors given the scatterometer data. The prior distribution is a vector Gaussian process that provides the geophysical consistency for the wind field. The wind vectors are retrieved directly from the scatterometer data by using mixture density networks, a principled method to model multi-modal conditional probability density functions. The complexity of the mapping and the structure of the conditional probability density function are investigated. A hybrid mixture density network, that incorporates the knowledge that the conditional probability distribution of the observation process is predominantly bi-modal, is developed. The optimal model, which generalises across a swathe of scatterometer readings, is better on key performance measures than the current operational model. Wind field retrieval is approached from three perspectives. The first is a non-autonomous method that confirms the validity of the model by retrieving the correct wind field 99% of the time from a test set of 575 wind fields. The second technique takes the maximum a posteriori probability wind field retrieved from the posterior distribution as the prediction. For the third technique, Markov Chain Monte Carlo (MCMC) techniques were employed to estimate the mass associated with significant modes of the posterior distribution, and make predictions based on the mode with the greatest mass associated with it. General methods for sampling from multi-modal distributions were benchmarked against a specific MCMC transition kernel designed for this problem. It was shown that the general methods were unsuitable for this application due to computational expense. On a test set of 100 wind fields the MAP estimate correctly retrieved 72 wind fields, whilst the sampling method correctly retrieved 73 wind fields.