146 resultados para Process control - Statistical methods
Resumo:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests commonly are incorrectly applied, misinterpreted, and ignored—by novices and expert researchers alike. On initial glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing is first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
Resumo:
With daily commercial and social activity in cities, regulation of train service in mass rapid transit railways is necessary to maintain service and passenger flow. Dwell-time adjustment at stations is one commonly used approach to regulation of train service, but its control space is very limited. Coasting control is a viable means of meeting the specific run-time in an inter-station run. The current practice is to start coasting at a fixed distance from the departed station. Hence, it is only optimal with respect to a nominal operational condition of the train schedule, but not the current service demand. The advantage of coasting can only be fully secured when coasting points are determined in real-time. However, identifying the necessary starting point(s) for coasting under the constraints of current service conditions is no simple task as train movement is governed by a large number of factors. The feasibility and performance of classical and heuristic searching measures in locating coasting point(s) is studied with the aid of a single train simulator, according to specified inter-station run times.
Resumo:
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states—perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriate model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed—and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros
Resumo:
The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the timevarying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations. This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed, acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-invariables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals. These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term by term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first-stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.
Resumo:
Train delay is one of the most important indexes to evaluate the service quality of the railway. Because of the interactions of movement among trains, a delayed train may conflict with trains scheduled on other lines at junction area. Train that loses conflict may be forced to stop or slow down because of restrictive signals, which consequently leads to the loss of run-time and probably enlarges more delays. This paper proposes a time-saving train control method to recover delays as soon as possible. In the proposed method, golden section search is adopted to identify the optimal train speed at the expected time of restrictive signal aspect upgrades, which enables the train to depart from the conflicting area as soon as possible. A heuristic method is then developed to attain the advisory train speed profile assisting drivers in train control. Simulation study indicates that the proposed method enables the train to recover delays as soon as possible in case of disturbances at railway junctions, in comparison with the traditional maximum traction strategy and the green wave strategy.
Resumo:
Networked control systems (NCSs) offer many advantages over conventional control; however, they also demonstrate challenging problems such as network-induced delay and packet losses. This paper proposes an approach of predictive compensation for simultaneous network-induced delays and packet losses. Different from the majority of existing NCS control methods, the proposed approach addresses co-design of both network and controller. It also alleviates the requirements of precise process models and full understanding of NCS network dynamics. For a series of possible sensor-to-actuator delays, the controller computes a series of corresponding redundant control values. Then, it sends out those control values in a single packet to the actuator. Once receiving the control packet, the actuator measures the actual sensor-to-actuator delay and computes the control signals from the control packet. When packet dropout occurs, the actuator utilizes past control packets to generate an appropriate control signal. The effectiveness of the approach is demonstrated through examples.
Resumo:
Aims: This paper describes the development of a risk adjustment (RA) model predictive of individual lesion treatment failure in percutaneous coronary interventions (PCI) for use in a quality monitoring and improvement program. Methods and results: Prospectively collected data for 3972 consecutive revascularisation procedures (5601 lesions) performed between January 2003 and September 2011 were studied. Data on procedures to September 2009 (n = 3100) were used to identify factors predictive of lesion treatment failure. Factors identified included lesion risk class (p < 0.001), occlusion type (p < 0.001), patient age (p = 0.001), vessel system (p < 0.04), vessel diameter (p < 0.001), unstable angina (p = 0.003) and presence of major cardiac risk factors (p = 0.01). A Bayesian RA model was built using these factors with predictive performance of the model tested on the remaining procedures (area under the receiver operating curve: 0.765, Hosmer–Lemeshow p value: 0.11). Cumulative sum, exponentially weighted moving average and funnel plots were constructed using the RA model and subjectively evaluated. Conclusion: A RA model was developed and applied to SPC monitoring for lesion failure in a PCI database. If linked to appropriate quality improvement governance response protocols, SPC using this RA tool might improve quality control and risk management by identifying variation in performance based on a comparison of observed and expected outcomes.
Resumo:
Service processes such as financial advice, booking a business trip or conducting a consulting project have emerged as units of analysis of high interest for the business process and service management communities in practice and academia. While the transactional nature of production processes is relatively well understood and deployed, the less predictable and highly interactive nature of service processes still lacks in many areas appropriate methodological grounding. This paper proposes a framework of a process laboratory as a new IT artefact in order to facilitate the holistic analysis and simulation of such service processes. Using financial services as an example, it will be shown how such a process laboratory can be used to reduce the complexity of service process analysis and facilitate operational service process control.
Resumo:
Business processes are prone to continuous and unexpected changes. Process workers may start executing a process differently in order to adjust to changes in workload, season, guidelines or regulations for example. Early detection of business process changes based on their event logs – also known as business process drift detection – enables analysts to identify and act upon changes that may otherwise affect process performance. Previous methods for business process drift detection are based on an exploration of a potentially large feature space and in some cases they require users to manually identify the specific features that characterize the drift. Depending on the explored feature set, these methods may miss certain types of changes. This paper proposes a fully automated and statistically grounded method for detecting process drift. The core idea is to perform statistical tests over the distributions of runs observed in two consecutive time windows. By adaptively sizing the window, the method strikes a trade-off between classification accuracy and drift detection delay. A validation on synthetic and real-life logs shows that the method accurately detects typical change patterns and scales up to the extent it is applicable for online drift detection.
Resumo:
Fluorinated surfactant-based aqueous film-forming foams (AFFFs) are made up of per- and polyfluorinated alkyl substances (PFAS) and are used to extinguish fires involving highly flammable liquids. The use of perfluorooctanesulfonic acid (PFOS) and other perfluoroalkyl acids (PFAAs) in some AFFF formulations has been linked to substantial environmental contamination. Recent studies have identified a large number of novel and infrequently reported fluorinated surfactants in different AFFF formulations. In this study, a strategy based on a case-control approach using quadrupole time-of-flight tandem mass spectrometry (QTOF-MS/MS) and advanced statistical methods has been used to extract and identify known and unknown PFAS in human serum associated with AFFF-exposed firefighters. Two target sulfonic acids [PFOS and perfluorohexanesulfonic acid (PFHxS)], three non-target acids [perfluoropentanesulfonic acid (PFPeS), perfluoroheptanesulfonic acid (PFHpS), and perfluorononanesulfonic acid (PFNS)], and four unknown sulfonic acids (Cl-PFOS, ketone-PFOS, ether-PFHxS, and Cl-PFHxS) were exclusively or significantly more frequently detected at higher levels in firefighters compared to controls. The application of this strategy has allowed for identification of previously unreported fluorinated chemicals in a timely and cost-efficient way.
Resumo:
Migraine is a painful disorder for which the etiology remains obscure. Diagnosis is largely based on International Headache Society criteria. However, no feature occurs in all patients who meet these criteria, and no single symptom is required for diagnosis. Consequently, this definition may not accurately reflect the phenotypic heterogeneity or genetic basis of the disorder. Such phenotypic uncertainty is typical for complex genetic disorders and has encouraged interest in multivariate statistical methods for classifying disease phenotypes. We applied three popular statistical phenotyping methods—latent class analysis, grade of membership and grade of membership “fuzzy” clustering (Fanny)—to migraine symptom data, and compared heritability and genome-wide linkage results obtained using each approach. Our results demonstrate that different methodologies produce different clustering structures and non-negligible differences in subsequent analyses. We therefore urge caution in the use of any single approach and suggest that multiple phenotyping methods be used.
Resumo:
The ISSCT Process Section workshop held in Réunion 20–23 October 2008 was attended by 51 delegates from 10 countries. The theme was Green cane impact on sugar processing. The workshop provided a valuable and timely opportunity to review and discuss the impact on factory operations and performance from a green cane supply that could include significant levels of trash. It was particularly relevant to those mills that were considering options to boost their biomass intake for increased co-generation capacity. Several of the speakers related their experiences with processing ‘whole of crop’ cane supplies through the factory. Speakers detailed the problems and increased losses that were incurred when processing cane with high trash levels. The consensus of the delegates was that the best scenario would involve a cane-cleaning plant at the factory so that only clean cane would be processed through the factory. The forum recommended that more research was required to address the issues of increased impurities in the process streams associated with high trash levels. Site visits to the two factories and a cane-delivery station were arranged as part of the workshop.
Resumo:
Advances in symptom management strategies through a better understanding of cancer symptom clusters depend on the identification of symptom clusters that are valid and reliable. The purpose of this exploratory research was to investigate alternative analytical approaches to identify symptom clusters for patients with cancer, using readily accessible statistical methods, and to justify which methods of identification may be appropriate for this context. Three studies were undertaken: (1) a systematic review of the literature, to identify analytical methods commonly used for symptom cluster identification for cancer patients; (2) a secondary data analysis to identify symptom clusters and compare alternative methods, as a guide to best practice approaches in cross-sectional studies; and (3) a secondary data analysis to investigate the stability of symptom clusters over time. The systematic literature review identified, in 10 years prior to March 2007, 13 cross-sectional studies implementing multivariate methods to identify cancer related symptom clusters. The methods commonly used to group symptoms were exploratory factor analysis, hierarchical cluster analysis and principal components analysis. Common factor analysis methods were recommended as the best practice cross-sectional methods for cancer symptom cluster identification. A comparison of alternative common factor analysis methods was conducted, in a secondary analysis of a sample of 219 ambulatory cancer patients with mixed diagnoses, assessed within one month of commencing chemotherapy treatment. Principal axis factoring, unweighted least squares and image factor analysis identified five consistent symptom clusters, based on patient self-reported distress ratings of 42 physical symptoms. Extraction of an additional cluster was necessary when using alpha factor analysis to determine clinically relevant symptom clusters. The recommended approaches for symptom cluster identification using nonmultivariate normal data were: principal axis factoring or unweighted least squares for factor extraction, followed by oblique rotation; and use of the scree plot and Minimum Average Partial procedure to determine the number of factors. In contrast to other studies which typically interpret pattern coefficients alone, in these studies symptom clusters were determined on the basis of structure coefficients. This approach was adopted for the stability of the results as structure coefficients are correlations between factors and symptoms unaffected by the correlations between factors. Symptoms could be associated with multiple clusters as a foundation for investigating potential interventions. The stability of these five symptom clusters was investigated in separate common factor analyses, 6 and 12 months after chemotherapy commenced. Five qualitatively consistent symptom clusters were identified over time (Musculoskeletal-discomforts/lethargy, Oral-discomforts, Gastrointestinaldiscomforts, Vasomotor-symptoms, Gastrointestinal-toxicities), but at 12 months two additional clusters were determined (Lethargy and Gastrointestinal/digestive symptoms). Future studies should include physical, psychological, and cognitive symptoms. Further investigation of the identified symptom clusters is required for validation, to examine causality, and potentially to suggest interventions for symptom management. Future studies should use longitudinal analyses to investigate change in symptom clusters, the influence of patient related factors, and the impact on outcomes (e.g., daily functioning) over time.