868 resultados para multivariate data analysis
Resumo:
Many initiatives to improve Business processes are emerging. The essential roles and contributions of Business Analyst (BA) and Business Process Management (BPM) professionals to such initiatives have been recognized in literature and practice. The roles and responsibilities of a BA or BPM practitioner typically require different skill-sets; however these differences are often vague. This vagueness creates much confusion in practice and academia. While both the BA and BPM communities have made attempts to describe their domains through capability defining empirical research and developments of Bodies of knowledge, there has not yet been any attempt to identify the commonality of skills required and points of uniqueness between the two professions. This study aims to address this gap and presents the findings of a detailed content mapping exercise (using NVivo as a qualitative data analysis tool) of the International Institution of Business Analysis (IIBA®) Guide to the Business Analysis Body of Knowledge (BABOK® Guide) against core BPM competency and capability frameworks.
Resumo:
Optimal design for generalized linear models has primarily focused on univariate data. Often experiments are performed that have multiple dependent responses described by regression type models, and it is of interest and of value to design the experiment for all these responses. This requires a multivariate distribution underlying a pre-chosen model for the data. Here, we consider the design of experiments for bivariate binary data which are dependent. We explore Copula functions which provide a rich and flexible class of structures to derive joint distributions for bivariate binary data. We present methods for deriving optimal experimental designs for dependent bivariate binary data using Copulas, and demonstrate that, by including the dependence between responses in the design process, more efficient parameter estimates are obtained than by the usual practice of simply designing for a single variable only. Further, we investigate the robustness of designs with respect to initial parameter estimates and Copula function, and also show the performance of compound criteria within this bivariate binary setting.
Resumo:
This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.
Resumo:
This paper provides fundamental understanding for the use of cumulative plots for travel time estimation on signalized urban networks. Analytical modeling is performed to generate cumulative plots based on the availability of data: a) Case-D, for detector data only; b) Case-DS, for detector data and signal timings; and c) Case-DSS, for detector data, signal timings and saturation flow rate. The empirical study and sensitivity analysis based on simulation experiments have observed the consistency in performance for Case-DS and Case-DSS, whereas, for Case-D the performance is inconsistent. Case-D is sensitive to detection interval and signal timings within the interval. When detection interval is integral multiple of signal cycle then it has low accuracy and low reliability. Whereas, for detection interval around 1.5 times signal cycle both accuracy and reliability are high.
Resumo:
Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by limitations of human coding of large data sets, and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.
Resumo:
In this paper we present a sequential Monte Carlo algorithm for Bayesian sequential experimental design applied to generalised non-linear models for discrete data. The approach is computationally convenient in that the information of newly observed data can be incorporated through a simple re-weighting step. We also consider a flexible parametric model for the stimulus-response relationship together with a newly developed hybrid design utility that can produce more robust estimates of the target stimulus in the presence of substantial model and parameter uncertainty. The algorithm is applied to hypothetical clinical trial or bioassay scenarios. In the discussion, potential generalisations of the algorithm are suggested to possibly extend its applicability to a wide variety of scenarios
Resumo:
Modern technology now has the ability to generate large datasets over space and time. Such data typically exhibit high autocorrelations over all dimensions. The field trial data motivating the methods of this paper were collected to examine the behaviour of traditional cropping and to determine a cropping system which could maximise water use for grain production while minimising leakage below the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18 columns, in the lattice framework of an agricultural field. Bayesian conditional autoregressive (CAR) models are used to account for local site correlations. Conditional autoregressive models have not been widely used in analyses of agricultural data. This paper serves to illustrate the usefulness of these models in this field, along with the ease of implementation in WinBUGS, a freely available software package. The innovation is the fitting of separate conditional autoregressive models for each depth layer, the ‘layered CAR model’, while simultaneously estimating depth profile functions for each site treatment. Modelling interest also lay in how best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, and treated depth, the basis for the regression model, as measured with error, while fitting CAR neighbourhood models by depth layer. It is hierarchical, with separate onditional autoregressive spatial variance components at each depth, and the fixed terms which involve an errors-in-measurement model treat depth errors as interval-censored measurement error. The Bayesian framework permits transparent specification and easy comparison of the various complex models compared.
Resumo:
Serving as a powerful tool for extracting localized variations in non-stationary signals, applications of wavelet transforms (WTs) in traffic engineering have been introduced; however, lacking in some important theoretical fundamentals. In particular, there is little guidance provided on selecting an appropriate WT across potential transport applications. This research described in this paper contributes uniquely to the literature by first describing a numerical experiment to demonstrate the shortcomings of commonly-used data processing techniques in traffic engineering (i.e., averaging, moving averaging, second-order difference, oblique cumulative curve, and short-time Fourier transform). It then mathematically describes WT’s ability to detect singularities in traffic data. Next, selecting a suitable WT for a particular research topic in traffic engineering is discussed in detail by objectively and quantitatively comparing candidate wavelets’ performances using a numerical experiment. Finally, based on several case studies using both loop detector data and vehicle trajectories, it is shown that selecting a suitable wavelet largely depends on the specific research topic, and that the Mexican hat wavelet generally gives a satisfactory performance in detecting singularities in traffic and vehicular data.
Resumo:
Monitoring environmental health is becoming increasingly important as human activity and climate change place greater pressure on global biodiversity. Acoustic sensors provide the ability to collect data passively, objectively and continuously across large areas for extended periods. While these factors make acoustic sensors attractive as autonomous data collectors, there are significant issues associated with large-scale data manipulation and analysis. We present our current research into techniques for analysing large volumes of acoustic data efficiently. We provide an overview of a novel online acoustic environmental workbench and discuss a number of approaches to scaling analysis of acoustic data; online collaboration, manual, automatic and human-in-the loop analysis.
Resumo:
The traffic conflict technique (TCT) is a powerful technique applied in road traffic safety assessment as a surrogate of the traditional accident data analysis. It has subdued the conceptual and implemental weaknesses of the accident statistics. Although this technique has been applied effectively in road traffic, it has not been practised well in marine traffic even though this traffic system has some distinct advantages in terms of having a monitoring system. This monitoring system can provide navigational information as well as other geometric information of the ships for a larger study area over a longer time period. However, for implementing the TCT in the marine traffic system, it should be examined critically to suit the complex nature of the traffic system. This paper examines the suitability of the TCT to be applied to marine traffic and proposes a framework for a follow up comprehensive conflict study.
Resumo:
Objective The aim of this study was to demonstrate the potential of near-infrared (NIR) spectroscopy for categorizing cartilage degeneration induced in animal models. Method Three models of osteoarthritic degeneration were induced in laboratory rats via one of the following methods: (i) menisectomy (MSX); (ii) anterior cruciate ligament transaction (ACLT); and (iii) intra-articular injection of mono-ido-acetete (1 mg) (MIA), in the right knee joint, with 12 rats per model group. After 8 weeks, the animals were sacrificed and tibial knee joints were collected. A custom-made nearinfrared (NIR) probe of diameter 5 mm was placed on the cartilage surface and spectral data were acquired from each specimen in the wavenumber range 4 000 – 12 500 cm−1. Following spectral data acquisition, the specimens were fixed and Safranin–O staining was performed to assess disease severity based on the Mankin scoring system. Using multivariate statistical analysis based on principal component analysis and partial least squares regression, the spectral data were then related to the Mankinscores of the samples tested. Results Mild to severe degenerative cartilage changes were observed in the subject animals. The ACLT models showed mild cartilage degeneration, MSX models moderate, and MIA severe cartilage degenerative changes both morphologically and histologically. Our result demonstrate that NIR spectroscopic information is capable of separating the cartilage samples into different groups relative to the severity of degeneration, with NIR correlating significantly with their Mankinscore (R2 = 88.85%). Conclusion We conclude that NIR is a viable tool for evaluating articularcartilage health and physical properties such as change in thickness with degeneration.
Resumo:
China continues to face great challenges in meeting the health needs of its large population. The challenges are not just lack of resources, but also how to use existing resources more efficiently, more effectively, and more equitably. Now a major unaddressed challenge facing China is how to reform an inefficient, poorly organized health care delivery system. The objective of this study is to analyze the role of private health care provision in China and discuss the implications of increasing private-sector development for improving health system performance. This study is based on an extensive literature review, the purpose of which was to identify, summarize, and evaluate ideas and information on private health care provision in China. In addition, the study uses secondary data analysis and the results of previous study by the authors to highlight the current situation of private health care provision in one province of China. This study found that government-owned hospitals form the backbone of the health care system and also account for most health care service provision. However, even though the public health care system is constantly trying to adapt to population needs and improve its performance, there are many problems in the system, such as limited access, low efficiency, poor quality, cost inflation, and low patient satisfaction. Currently, private hospitals are relatively rare, and private health care as an important component of the health care system in China has received little policy attention. It is argued that policymakers in China should recognize the role of private health care provision for health system performance, and then define and achieve an appropriate role for private health care provision in helping to respond to the many challenges facing the health system in present-day China.