278 results for INTEGRATIVE DATA-ANALYSIS


Relevance: 90.00%

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom-studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it has become habitual and involves minimal intention or decision making. Key variables investigated are the activity initialisation timestamp and cell tower location, as well as the activity type and usage quantity (e.g., a voice call with duration in seconds); the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs), which are fitted using the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting, so that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which captures how each of them uses the products/services spatially in their daily lives; this essentially reflects their likely lifestyle and occupational traits.
Other significant research contributions include fitting GMMs using VB to circular data (i.e., the temporal usage behaviour) and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.
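The automatic determination of the number of GMM components under VB can be sketched with scikit-learn's BayesianGaussianMixture, a stock variational-Bayes mixture fitter standing in for the thesis's split-based VB-GMM algorithm; the two-cluster synthetic data and the 0.05 weight threshold are illustrative assumptions, not the thesis's setup:

```python
# VB-GMM sketch: over-specify the number of components and let the
# variational Dirichlet-process prior prune the surplus ones.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic "spiky" spatial data: two tight clusters standing in for
# frequently visited cell-tower locations.
X = np.vstack([
    rng.normal([0.0, 0.0], 0.1, size=(200, 2)),
    rng.normal([5.0, 5.0], 0.1, size=(200, 2)),
])

# Deliberately over-specify n_components; a small concentration prior
# drives the weights of unneeded components toward zero.
vb_gmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01,
    random_state=0,
).fit(X)

# Components with non-negligible weight approximate the effective count.
effective_k = int(np.sum(vb_gmm.weights_ > 0.05))
```

Counting the non-negligible weights recovers an effective number of clusters without fixing it in advance, which is the behaviour the component-splitting extension above aims to make robust and efficient.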

Relevance: 90.00%

Abstract:

This paper provides a fundamental understanding of the use of cumulative plots for travel time estimation on signalized urban networks. Analytical modeling is performed to generate cumulative plots based on the availability of data: a) Case-D, for detector data only; b) Case-DS, for detector data and signal timings; and c) Case-DSS, for detector data, signal timings and saturation flow rate. An empirical study and a simulation-based sensitivity analysis show consistent performance for Case-DS and Case-DSS, whereas the performance of Case-D is inconsistent. Case-D is sensitive to the detection interval and to the signal timings within that interval: when the detection interval is an integral multiple of the signal cycle, both accuracy and reliability are low, whereas for a detection interval of around 1.5 times the signal cycle, both are high.
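Reading a travel time off cumulative plots can be sketched as the horizontal distance between the upstream and downstream cumulative count curves at a given vehicle count; the detector counts below are invented, and the sketch assumes FIFO and drift-free counts rather than the paper's Case-D/DS/DSS analytical models:

```python
# Travel time from cumulative vehicle-count plots at two detectors.
import numpy as np

# Hypothetical cumulative counts sampled every 10 s at each detector.
t = np.arange(0, 120, 10.0)  # seconds
upstream = np.array([0, 4, 9, 15, 22, 30, 38, 45, 51, 56, 60, 63], float)
downstream = np.array([0, 0, 3, 8, 14, 21, 29, 37, 44, 50, 55, 59], float)

def travel_time(n, t, up, down):
    """Horizontal gap between the two cumulative curves at vehicle count n."""
    t_in = np.interp(n, up, t)    # time vehicle n passes the upstream detector
    t_out = np.interp(n, down, t) # time vehicle n passes the downstream detector
    return t_out - t_in

tt = travel_time(20.0, t, upstream, downstream)  # seconds for vehicle 20
```

Under FIFO, the n-th vehicle counted upstream is the n-th counted downstream, so the horizontal offset between the curves is exactly that vehicle's travel time.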

Relevance: 90.00%

Abstract:

Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by limitations of human coding of large data sets, and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.
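The Latent Semantic Analysis step can be sketched with scikit-learn as TF-IDF weighting followed by truncated SVD; the four toy documents stand in for journal abstracts and are not drawn from the paper's 25-year corpus:

```python
# Minimal LSA sketch: project documents into a low-dimensional concept space.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "data mining reveals patterns in large data sets",
    "mining large data sets for hidden patterns",
    "accounting standards shape financial reporting",
    "financial reporting under new accounting standards",
]

tfidf = TfidfVectorizer().fit_transform(docs)       # documents x terms
lsa = TruncatedSVD(n_components=2, random_state=0)  # keep 2 latent concepts
doc_topic = lsa.fit_transform(tfidf)                # documents x concepts

def cos(a, b):
    """Cosine similarity between two concept-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents about the same topic land close together in concept space.
same = cos(doc_topic[0], doc_topic[1])
diff = cos(doc_topic[0], doc_topic[2])
```

Tracking these concept-space coordinates per year is one simple way to visualize how topics converge or diverge over time, as the paper describes.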

Relevance: 90.00%

Abstract:

In this paper we present a sequential Monte Carlo algorithm for Bayesian sequential experimental design applied to generalised non-linear models for discrete data. The approach is computationally convenient in that the information from newly observed data can be incorporated through a simple re-weighting step. We also consider a flexible parametric model for the stimulus-response relationship, together with a newly developed hybrid design utility that can produce more robust estimates of the target stimulus in the presence of substantial model and parameter uncertainty. The algorithm is applied to hypothetical clinical trial or bioassay scenarios. In the discussion, potential generalisations of the algorithm are suggested that could extend its applicability to a wider variety of scenarios.
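The re-weighting step for incorporating a new observation can be sketched generically for a Bernoulli outcome model; the particle count, prior, and observation sequence are invented, and this is a textbook importance re-weighting rather than the paper's design-utility algorithm:

```python
# Sequential Monte Carlo re-weighting: each new binary observation updates
# the particle weights by the likelihood, without refitting from scratch.
import numpy as np

rng = np.random.default_rng(1)
particles = rng.uniform(0.0, 1.0, size=5000)  # prior draws of success prob p
weights = np.full(particles.size, 1.0 / particles.size)

def update(weights, particles, y):
    """Incorporate one Bernoulli observation y via importance re-weighting."""
    like = particles if y == 1 else 1.0 - particles
    w = weights * like
    return w / w.sum()

# Observe 8 successes and 2 failures; the weighted posterior mean should
# approach the Beta(9, 3) mean of 0.75 under the uniform prior.
for y in [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]:
    weights = update(weights, particles, y)

post_mean = float(np.sum(weights * particles))
```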

Relevance: 90.00%

Abstract:

Modern technology can now generate large datasets over space and time. Such data typically exhibit high autocorrelation over all dimensions. The field trial data motivating the methods of this paper were collected to examine the behaviour of traditional cropping and to determine a cropping system that could maximise water use for grain production while minimising leakage below the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18 columns, in the lattice framework of an agricultural field. Bayesian conditional autoregressive (CAR) models are used to account for local site correlations. Conditional autoregressive models have not been widely used in analyses of agricultural data. This paper serves to illustrate the usefulness of these models in this field, along with the ease of implementation in WinBUGS, a freely available software package. The innovation is the fitting of separate conditional autoregressive models for each depth layer, the ‘layered CAR model’, while simultaneously estimating depth profile functions for each site treatment. Modelling interest also lay in how best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, treated depth (the basis for the regression model) as measured with error, and fitted CAR neighbourhood models by depth layer. It is hierarchical, with separate conditional autoregressive spatial variance components at each depth, and its fixed terms involve an errors-in-measurement model that treats depth errors as interval-censored measurement error. The Bayesian framework permits transparent specification and easy comparison of the various complex models considered.
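The neighbourhood structure behind one depth layer's CAR model can be sketched in NumPy as a proper-CAR precision matrix over the paper's 3 x 18 lattice; the spatial parameter rho and precision tau are assumed values, and this is an illustration of the structure rather than the WinBUGS implementation:

```python
# Proper-CAR precision matrix for a 3 x 18 lattice with rook neighbours.
import numpy as np

rows, cols = 3, 18
n = rows * cols

def idx(r, c):
    return r * cols + c

W = np.zeros((n, n))  # adjacency (neighbourhood) matrix
for r in range(rows):
    for c in range(cols):
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                W[idx(r, c), idx(rr, cc)] = 1.0

D = np.diag(W.sum(axis=1))  # number of neighbours at each site
rho, tau = 0.9, 1.0         # assumed spatial-dependence and precision values
Q = tau * (D - rho * W)     # proper-CAR precision matrix

# With |rho| < 1 on a connected lattice, Q is symmetric positive definite,
# so it defines a valid Gaussian Markov random field for the layer.
min_eig = float(np.linalg.eigvalsh(Q).min())
```

In the layered model described above, a separate precision matrix of this form (with its own variance component) would apply at each of the 15 depths.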

Relevance: 90.00%

Abstract:

Wavelet transforms (WTs) serve as a powerful tool for extracting localized variations in non-stationary signals, and applications of WTs in traffic engineering have been introduced; however, some important theoretical fundamentals are still lacking. In particular, there is little guidance on selecting an appropriate WT across potential transport applications. The research described in this paper contributes uniquely to the literature by first describing a numerical experiment that demonstrates the shortcomings of commonly used data processing techniques in traffic engineering (i.e., averaging, moving averaging, second-order differencing, oblique cumulative curves, and the short-time Fourier transform). It then mathematically describes the WT’s ability to detect singularities in traffic data. Next, the selection of a suitable WT for a particular research topic in traffic engineering is discussed in detail by objectively and quantitatively comparing candidate wavelets’ performance in a numerical experiment. Finally, based on several case studies using both loop detector data and vehicle trajectories, it is shown that selecting a suitable wavelet largely depends on the specific research topic, and that the Mexican hat wavelet generally gives satisfactory performance in detecting singularities in traffic and vehicular data.
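Singularity detection with a Mexican hat wavelet can be sketched directly in NumPy; the step-change speed series, wavelet width, and window length below are assumptions for illustration, not the paper's case-study data:

```python
# Detect an abrupt change in a traffic signal by convolving it with a
# Mexican hat (Ricker) wavelet and locating the largest coefficient.
import numpy as np

def ricker(points, a):
    """Mexican hat wavelet of width a, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1 - (t / a) ** 2) * np.exp(-(t ** 2) / (2 * a ** 2))

# Hypothetical detector speed series with an abrupt drop (the "singularity")
# at sample 200, standing in for the onset of congestion.
rng = np.random.default_rng(2)
signal = np.concatenate([np.full(200, 60.0), np.full(200, 20.0)])
signal += rng.normal(0.0, 1.0, signal.size)

# Subtracting the mean limits zero-padding artifacts at the boundaries.
coeffs = np.convolve(signal - signal.mean(), ricker(61, 8.0), mode="same")
breakpoint_idx = int(np.argmax(np.abs(coeffs)))  # lands near the drop
```

Because the Mexican hat is the second derivative of a Gaussian, its response to a step is the first derivative of a Gaussian, whose extrema straddle the discontinuity by roughly the wavelet width; the detected index therefore sits within a few samples of the true change point.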

Relevance: 90.00%

Abstract:

Knowledge has been recognised as a powerful yet intangible asset, which is difficult to manage. This is especially true in a project environment, where there is the potential to repeat mistakes rather than learn from previous experiences. The literature in the project management field has recognised the importance of knowledge sharing (KS) within and between projects. However, studies in that field focus primarily on KS mechanisms, including lessons learned (LL) and post-project reviews, as the source of knowledge for future projects, and only some preliminary research has been carried out on the roles of project management offices (PMOs) and organisational culture (OC) in KS. This study set out to investigate KS behaviours in an inter-project context, with a particular emphasis on the role of trust, OC and a range of knowledge sharing mechanisms (KSM) in achieving successful inter-project knowledge sharing (I-PKS). An extensive literature search resulted in the development of an I-PKS Framework, which defined the scope of the research and shaped its initial design. The literature review indicated that existing research relating to the three factors of OC, trust and KSM remains inadequate in its ability to fully explain the role of these contextual factors. In particular, the literature review identified these areas of interest: (1) the conflicting answers to some of the major questions related to KSM, (2) the limited empirical research on the role of different trust dimensions, (3) limited empirical evidence of the role of OC in KS, and (4) the insufficient research on KS in an inter-project context. The resulting Framework comprised the three main factors of OC, trust and KSM, demonstrating a more integrated view of KS in the inter-project context. Accordingly, the aim of this research was to examine the relationships between these three factors and KS by investigating behaviours related to KS from the project managers’ (PMs’) perspective.
In order to achieve this aim, the research sought to answer the following research questions:
1. How does organisational culture influence inter-project knowledge sharing?
2. How does the existence of three forms of trust (ability, benevolence and integrity) influence inter-project knowledge sharing?
3. How can different knowledge sharing mechanisms (relational, project management tools and processes, and technology) improve inter-project knowledge sharing behaviours?
4. How do the relationships between the three factors of organisational culture, trust and knowledge sharing mechanisms improve inter-project knowledge sharing? (a) What are the relationships between the factors? (b) What is the best fit for given cases to ensure more effective inter-project knowledge sharing?
Using multiple case studies, this research was designed to build propositions emerging from cross-case data analysis. The four cases were chosen on the basis of theoretical sampling. All cases were large project-based organisations (PBOs) with a strong matrix-type structure, as per the typology proposed by the Project Management Body of Knowledge (PMBoK) (2008). Data were collected from the project management departments of the respective organisations. A range of analytical techniques was used to deal with the data, including pattern-matching logic and explanation-building analysis, complemented by the use of NVivo for data coding and management. Propositions generated at the end of the analyses were further compared with the extant literature, and practical implications based on the data and literature were suggested in order to improve I-PKS. Findings from this research conclude that OC, trust and KSM contribute to inter-project knowledge sharing, and suggest the existence of relationships between these factors.
In view of that, this research identified the relationships between different trust dimensions, suggesting that integrity trust reinforces the relationship between ability trust and knowledge sharing. Furthermore, this research demonstrated that characteristics of culture and trust interact to reinforce preferences for mechanisms of knowledge sharing. This means that cultures exhibiting characteristics of the Clan type are more likely to result in trusting relationships, and hence are more likely to use organic sources of knowledge for both tacit and explicit knowledge exchange. In contrast, cultures that are empirically driven and based on control, efficiency and measures (characteristics of the Hierarchy and Market types) display a tendency to develop trust primarily in the ability of non-organic sources, and therefore use these sources to share mainly explicit knowledge. This thesis contributes to the project management literature by providing a more integrative view of I-PKS, bringing the factors of OC, trust and KSM into the picture. A further contribution relates to the use of collaborative tools as a substitute for static LL databases and as a facilitator of tacit KS between geographically dispersed projects. This research adds to the literature on OC by providing rich empirical evidence of the relationships between OC and the willingness to share knowledge, and by providing empirical evidence that OC has an effect on trust; in doing so, this research extends the theoretical propositions outlined by previous research. This study also extends the research on trust by identifying the relationships between different trust dimensions, suggesting that integrity trust reinforces the relationship between ability trust and KS. Finally, this research provides some directions for future studies.

Relevance: 90.00%

Abstract:

Monitoring environmental health is becoming increasingly important as human activity and climate change place greater pressure on global biodiversity. Acoustic sensors provide the ability to collect data passively, objectively and continuously across large areas for extended periods. While these factors make acoustic sensors attractive as autonomous data collectors, there are significant issues associated with large-scale data manipulation and analysis. We present our current research into techniques for analysing large volumes of acoustic data efficiently. We provide an overview of a novel online acoustic environmental workbench and discuss a number of approaches to scaling the analysis of acoustic data: online collaboration, and manual, automatic and human-in-the-loop analysis.

Relevance: 90.00%

Abstract:

The traffic conflict technique (TCT) is a powerful technique applied in road traffic safety assessment as a surrogate for traditional accident data analysis, overcoming the conceptual and implementation weaknesses of accident statistics. Although this technique has been applied effectively to road traffic, it has not been widely practised in marine traffic, even though that traffic system has a distinct advantage: a monitoring system that can provide navigational information, as well as other geometric information about ships, for a larger study area over a longer time period. However, before the TCT can be implemented in the marine traffic system, it must be examined critically to suit the complex nature of that system. This paper examines the suitability of the TCT for application to marine traffic and proposes a framework for a follow-up comprehensive conflict study.

Relevance: 90.00%

Abstract:

China continues to face great challenges in meeting the health needs of its large population. The challenges lie not just in a lack of resources, but also in how to use existing resources more efficiently, more effectively and more equitably. A major unaddressed challenge now facing China is how to reform an inefficient, poorly organized health care delivery system. The objective of this study is to analyze the role of private health care provision in China and discuss the implications of increasing private-sector development for improving health system performance. This study is based on an extensive literature review, the purpose of which was to identify, summarize and evaluate ideas and information on private health care provision in China. In addition, the study uses secondary data analysis and the results of a previous study by the authors to highlight the current situation of private health care provision in one province of China. This study found that government-owned hospitals form the backbone of the health care system and account for most health care service provision. However, even though the public health care system is constantly trying to adapt to population needs and improve its performance, there are many problems in the system, such as limited access, low efficiency, poor quality, cost inflation and low patient satisfaction. Currently, private hospitals are relatively rare, and private health care, although an important component of the health care system in China, has received little policy attention. It is argued that policymakers in China should recognize the role of private health care provision in health system performance, and then define and achieve an appropriate role for private health care provision in helping to respond to the many challenges facing the health system in present-day China.

Relevance: 90.00%

Abstract:

Background: National physical activity (PA) data suggest that there is a considerable difference in the physical activity levels of US and Australian adults. Although different surveys (Active Australia and BRFSS) are used, the questions are similar; different protocols, however, are used to estimate “activity” from the data collected. The primary aim of this study was to assess whether the two approaches to the management of PA data could explain some of the difference in prevalence estimates derived from the two national surveys. Methods: Secondary data analysis of the most recent Active Australia (AA) survey (N = 2987). Results: 15% of the sample was defined as “active” using the Australian criteria but as “inactive” using the BRFSS protocol, even though their weekly energy expenditure was commensurate with meeting current guidelines. Younger respondents (age < 45 y) were more likely to be “misclassified” under the BRFSS criteria. Conclusions: The prevalence of activity in Australia and the US appears to be more similar than previously thought.

Relevance: 90.00%

Abstract:

Open the sports or business section of your daily newspaper, and you are immediately bombarded with an array of graphs, tables, diagrams, and statistical reports that require interpretation. Across all walks of life, the need to understand statistics is fundamental. Given that our youngsters’ future world will be increasingly data laden, scaffolding their statistical understanding and reasoning is imperative, from the early grades on. The National Council of Teachers of Mathematics (NCTM) continues to emphasize the importance of early statistical learning; data analysis and probability was the Council’s professional development “Focus of the Year” for 2007–2008. We need such a focus, especially given the results of the statistics items from the 2003 NAEP. As Shaughnessy (2007) noted, students’ performance was weak on more complex items involving interpretation or application of information in graphs and tables. Furthermore, little or no gain was made between the 2000 NAEP and the 2003 NAEP studies. One approach I have taken to promote young children’s statistical reasoning is through data modeling. Having implemented in grades 3–9 a number of model-eliciting activities involving working with data (e.g., English 2010), I observed how competently children could create their own mathematical ideas and representations—before being instructed how to do so. I thus wished to introduce data-modeling activities to younger children, confident that they would likewise generate their own mathematics. I recently implemented data-modeling activities in a cohort of three first-grade classrooms of six-year-olds. I report on some of the children’s responses and discuss the components of data modeling the children engaged in.

Relevance: 90.00%

Abstract:

This work identifies the limitations of n-way data analysis techniques in multidimensional stream data, such as Internet chat room communications data, and establishes a link between data collection and the performance of these techniques. Its contributions are twofold. First, it extends data analysis to multiple dimensions by constructing n-way data arrays known as high order tensors. Chat room tensors are generated by a simulator which collects and models actual communication data. The accuracy of the model is determined by the Kolmogorov-Smirnov goodness-of-fit test, which compares the simulation data with the observed (real) data. Second, a detailed computational comparison is performed to test several data analysis techniques, including SVD [1], and multi-way techniques including Tucker1, Tucker3 [2], and Parafac [3].
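The construction of a chat room tensor and the mode-n unfolding that underlies Tucker/Parafac-style decompositions can be sketched as follows; the dimensions and the Poisson message counts are invented for illustration:

```python
# Build a small 3-way (users x keywords x time) tensor and unfold it.
import numpy as np

rng = np.random.default_rng(3)
users, keywords, windows = 4, 6, 5
# Hypothetical message counts per user, keyword, and time window.
T = rng.poisson(2.0, size=(users, keywords, windows)).astype(float)

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

U1 = unfold(T, 0)  # users x (keywords * windows) matrix
# The rank of the mode-n unfolding bounds the corresponding Tucker mode rank.
rank_1 = int(np.linalg.matrix_rank(U1))
```

Applying SVD to each mode's unfolding is the basic building block of the Tucker-family analyses compared in the paper.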

Relevance: 90.00%

Abstract:

This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling, and (ii) n-way data analysis techniques in multidimensional stream data, such as Internet chatroom communications. Its contributions are threefold. First, we use the Kolmogorov-Smirnov goodness-of-fit test to show that statistical differences between real data obtained by collective sampling in the time dimension from multiple servers and that obtained from a single server are insignificant. Second, we show using the real data that collective data analysis of 3-way data arrays (users x keywords x time), known as high order tensors, is more efficient than centralized algorithms with respect to both space and computational cost. Furthermore, we show that this gain is obtained without loss of accuracy. Third, we examine the sensitivity of collective construction and analysis of high order data tensors to the choice of server selection and sampling window size. We construct 4-way tensors (users x keywords x time x servers) and analyze them to show the impact of server and window size selections on the results.
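The single-server versus collectively sampled comparison can be sketched with SciPy's two-sample Kolmogorov-Smirnov test; the exponential inter-arrival samples below are synthetic stand-ins for the real chatroom data:

```python
# Two-sample KS test: is the pooled multi-server sample statistically
# distinguishable from a single-server sample of the same process?
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
single_server = rng.exponential(scale=1.0, size=1000)
# Pooled sample from four "servers" drawing from the same process,
# mimicking collective sampling in the time dimension.
pooled = np.concatenate([rng.exponential(1.0, 250) for _ in range(4)])

stat, p_value = ks_2samp(single_server, pooled)
# A large p-value (e.g., > 0.05) means we fail to reject the hypothesis
# that the two samples come from the same distribution.
```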