Biblioteca Digital

169 resultados para Failure time data analysis

Methods and preliminary results for a data linkage project to determine long-term survival after intensive care unit admission

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aims: To describe a local data linkage project to match hospital data with the Australian Institute of Health and Welfare (AIHW) National Death Index (NDI) to assess longterm outcomes of intensive care unit patients. Methods: Data were obtained from hospital intensive care and cardiac surgery databases on all patients aged 18 years and over admitted to either of two intensive care units at a tertiary-referral hospital between 1 January 1994 and 31 December 2005. Date of death was obtained from the AIHW NDI by probabilistic software matching, in addition to manual checking through hospital databases and other sources. Survival was calculated from time of ICU admission, with a censoring date of 14 February 2007. Data for patients with multiple hospital admissions requiring intensive care were analysed only from the first admission. Summary and descriptive statistics were used for preliminary data analysis. Kaplan-Meier survival analysis was used to analyse factors determining long-term survival. Results: During the study period, 21 415 unique patients had 22 552 hospital admissions that included an ICU admission; 19 058 surgical procedures were performed with a total of 20 092 ICU admissions. There were 4936 deaths. Median follow-up was 6.2 years, totalling 134 203 patient years. The casemix was predominantly cardiac surgery (80%), followed by cardiac medical (6%), and other medical (4%). The unadjusted survival at 1, 5 and 10 years was 97%, 84% and 70%, respectively. The 1-year survival ranged from 97% for cardiac surgery to 36% for cardiac arrest. An APACHE II score was available for 16 877 patients. In those discharged alive from hospital, the 1, 5 and 10-year survival varied with discharge location. Conclusions: ICU-based linkage projects are feasible to determine long-term outcomes of ICU patients

Micro-family-owned business : transitory survival mechanism or long term wealth creator?

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Principal Topic: Entrepreneurship is key to employment, innovation and growth (Acs & Mueller, 2008), and as such, has been the subject of tremendous research in both the economic and management literatures since Solow (1957), Schumpeter (1934, 1943), and Penrose (1959). The presence of entrepreneurs in the economy is a key factor in the success or failure of countries to grow (Audretsch and Thurik, 2001; Dejardin, 2001). Further studies focus on the conditions of existence of entrepreneurship, influential factors invoked are historical, cultural, social, institutional, or purely economic (North, 1997; Thurik 1996 & 1999). Of particular interest, beyond the reasons behind the existence of entrepreneurship, are entrepreneurial survival and good ''performance'' factors. Using cross-country firm data analysis, La Porta & Schleifer (2008) confirm that informal micro-businesses provide on average half of all economic activity in developing countries. They find that these are utterly unproductive compared to formal firms, and conclude that the informal sector serves as a social security net ''keep[ing] millions of people alive, but disappearing over time'' (abstract). Robison (1986), Hill (1996, 1997) posit that the Indonesian government under Suharto always pointed to the lack of indigenous entrepreneurship , thereby motivating the nationalisation of all industries. Furthermore, the same literature also points to the fact that small businesses were mostly left out of development programmes because they were supposed less productive and having less productivity potential than larger ones. Vial (2008) challenges this view and shows that small firms represent about 70% of firms, 12% of total output, but contribute to 25% of total factor productivity growth on average over the period 1975-94 in the industrial sector (Table 10, p.316). ---------- Methodology/Key Propositions: A review of the empirical literature points at several under-researched questions. Firstly, we assess whether there is, evidence of small family-business entrepreneurship in Indonesia. Secondly, we examine and present the characteristics of these enterprises, along with the size of the sector, and its dynamics. Thirdly, we study whether these enterprises underperform compared to the larger scale industrial sector, as it is suggested in the literature. We reconsider performance measurements for micro-family owned businesses. We suggest that, beside productivity measures, performance could be appraised by both the survival probability of the firm, and by the amount of household assets formation. We compare micro-family-owned and larger industrial firms' survival probabilities after the 1997 crisis, their capital productivity, then compare household assets of families involved in business with those who do not. Finally, we examine human and social capital as moderators of enterprises' performance. In particular, we assess whether a higher level of education and community participation have an effect on the likelihood of running a family business, and whether it has an impact on households' assets level. We use the IFLS database compiled and published by RAND Corporation. The data is a rich community, households, and individuals panel dataset in four waves: 1993, 1997, 2000, 2007. We now focus on the waves 1997 and 2000 in order to investigate entrepreneurship behaviours in turbulent times, i.e. the 1997 Asian crisis. We use aggregate individual data, and focus on households data in order to study micro-family-owned businesses. IFLS data covers roughly 7,600 households in 1997 and over 10,000 households in 2000, with about 95% of 1997 households re-interviewed in 2000. Households were interviewed in 13 of the 27 provinces as defined before 2001. Those 13 provinces were targeted because accounting for 83% of the population. A full description of the data is provided in Frankenberg and Thomas (2000), and Strauss et alii (2004). We deflate all monetary values in Rupiah with the World Development Indicators Consumer Price Index base 100 in 2000. ---------- Results and Implications: We find that in Indonesia, entrepreneurship is widespread and two thirds of households hold one or several family businesses. In rural areas, in 2000, 75% of households run one or several businesses. The proportion of households holding both a farm and a non farm business is higher in rural areas, underlining the reliance of rural households on self-employment, especially after the crisis. Those businesses come in various sizes from very small to larger ones. The median business production value represents less than the annual national minimum wage. Figures show that at least 75% of farm businesses produce less than the annual minimum wage, with non farm businesses being more numerous to produce the minimum wage. However, this is only one part of the story, as production is not the only ''output'' or effect of the business. We show that the survival rate of those businesses ranks between 70 and 82% after the 1997 crisis, which contrasts with the 67% survival rate for the formal industrial sector (Ter Wengel & Rodriguez, 2006). Micro Family Owned Businesses might be relatively small in terms of production, they also provide stability in times of crisis. For those businesses that provide business assets figures, we show that capital productivity is fairly high, with rates that are ten times higher for non farm businesses. Results show that households running a business have larger family assets, and households are better off in urban areas. We run a panel logit model in order to test the effect of human and social capital on the existence of businesses among households. We find that non farm businesses are more likely to appear in households with higher human and social capital situated in urban areas. Farm businesses are more likely to appear in lower human capital and rural contexts, while still being supported by community participation. The estimation of our panel data model confirm that households are more likely to have higher family assets if situated in urban area, the higher the education level, the larger the assets, and running a business increase the likelihood of having larger assets. This is especially true for non farm businesses that have a clearly larger and more significant effect on assets than farm businesses. Finally, social capital in the form of community participation also has a positive effect on assets. Those results confirm the existence of a strong entrepreneurship culture among Indonesian households. Investigating survival rates also shows that those businesses are quite stable, even in the face of a violent crisis such as the 1997 one, and as a result, can provide a safety net. Finally, considering household assets - the returns of business to the household, rather than profit or productivity - the returns of business to itself, shows that households running a business are better off. While we demonstrate that uman and social capital are key to business existence, survival and performance, those results open avenues for further research regarding the factors that could hamper growth of those businesses in terms of output and employment.

Analysis of acoustic emission data for structural health monitoring applications

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acoustic emission (AE) is the phenomenon where high frequency stress waves are generated by rapid release of energy within a material by sources such as crack initiation or growth. AE technique involves recording these stress waves by means of sensors placed on the surface and subsequent analysis of the recorded signals to gather information such as the nature and location of the source. It is one of the several diagnostic techniques currently used for structural health monitoring (SHM) of civil infrastructure such as bridges. Some of its advantages include ability to provide continuous in-situ monitoring and high sensitivity to crack activity. But several challenges still exist. Due to high sampling rate required for data capture, large amount of data is generated during AE testing. This is further complicated by the presence of a number of spurious sources that can produce AE signals which can then mask desired signals. Hence, an effective data analysis strategy is needed to achieve source discrimination. This also becomes important for long term monitoring applications in order to avoid massive date overload. Analysis of frequency contents of recorded AE signals together with the use of pattern recognition algorithms are some of the advanced and promising data analysis approaches for source discrimination. This paper explores the use of various signal processing tools for analysis of experimental data, with an overall aim of finding an improved method for source identification and discrimination, with particular focus on monitoring of steel bridges.

New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldomly studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as oppose to purchasing behaviour) is behaviour that has been performed so frequently that it become habitual and involves minimal intentions or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses are on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demandingMarkov chainMonte Carlo (MCMC) methods. The standard VBGMMalgorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.

Quantitative approaches to content analysis : identifying conceptual drift across publication outlets

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by limitations of human coding of large data sets, and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.

A Bayesian analysis of an agricultural field trial with three spatial dimensions

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern technology now has the ability to generate large datasets over space and time. Such data typically exhibit high autocorrelations over all dimensions. The field trial data motivating the methods of this paper were collected to examine the behaviour of traditional cropping and to determine a cropping system which could maximise water use for grain production while minimising leakage below the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18 columns, in the lattice framework of an agricultural field. Bayesian conditional autoregressive (CAR) models are used to account for local site correlations. Conditional autoregressive models have not been widely used in analyses of agricultural data. This paper serves to illustrate the usefulness of these models in this field, along with the ease of implementation in WinBUGS, a freely available software package. The innovation is the fitting of separate conditional autoregressive models for each depth layer, the ‘layered CAR model’, while simultaneously estimating depth profile functions for each site treatment. Modelling interest also lay in how best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, and treated depth, the basis for the regression model, as measured with error, while fitting CAR neighbourhood models by depth layer. It is hierarchical, with separate onditional autoregressive spatial variance components at each depth, and the fixed terms which involve an errors-in-measurement model treat depth errors as interval-censored measurement error. The Bayesian framework permits transparent specification and easy comparison of the various complex models compared.

A comparison of two commercially available motion capture systems for gait analysis : high end vs low-cost

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In 1999 Richards compared the accuracy of commercially available motion capture systems commonly used in biomechanics. Richards identified that in static tests the optical motion capture systems generally produced RMS errors of less than 1.0 mm. During dynamic tests, the RMS error increased to up to 4.2 mm in some systems. In the last 12 years motion capture systems have continued to evolve and now include high-resolution CCD or CMOS image sensors, wireless communication, and high full frame sampling frequencies. In addition to hardware advances, there have also been a number of advances in software, which includes improved calibration and tracking algorithms, real time data streaming, and the introduction of the c3d standard. These advances have allowed the system manufactures to maintain a high retail price in the name of advancement. In areas such as gait analysis and ergonomics many of the advanced features such as high resolution image sensors and high sampling frequencies are not required due to the nature of the task often investigated. Recently Natural Point introduced low cost cameras, which on face value appear to be suitable as at very least a high quality teaching tool in biomechanics and possibly even a research tool when coupled with the correct calibration and tracking software. The aim of the study was therefore to compare both the linear accuracy and quality of angular kinematics from a typical high end motion capture system and a low cost system during a simple task.

Change Point Estimation in Monitoring Survival Time

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Precise identification of the time when a change in a hospital outcome has occurred enables clinical experts to search for a potential special cause more effectively. In this paper, we develop change point estimation methods for survival time of a clinical procedure in the presence of patient mix in a Bayesian framework. We apply Bayesian hierarchical models to formulate the change point where there exists a step change in the mean survival time of patients who underwent cardiac surgery. The data are right censored since the monitoring is conducted over a limited follow-up period. We capture the effect of risk factors prior to the surgery using a Weibull accelerated failure time regression model. Markov Chain Monte Carlo is used to obtain posterior distributions of the change point parameters including location and magnitude of changes and also corresponding probabilistic intervals and inferences. The performance of the Bayesian estimator is investigated through simulations and the result shows that precise estimates can be obtained when they are used in conjunction with the risk-adjusted survival time CUSUM control charts for different magnitude scenarios. The proposed estimator shows a better performance where a longer follow-up period, censoring time, is applied. In comparison with the alternative built-in CUSUM estimator, more accurate and precise estimates are obtained by the Bayesian estimator. These superiorities are enhanced when probability quantification, flexibility and generalizability of the Bayesian change point detection model are also considered.

On selecting an optimal wavelet for detecting singularities in traffic and vehicular data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Serving as a powerful tool for extracting localized variations in non-stationary signals, applications of wavelet transforms (WTs) in traffic engineering have been introduced; however, lacking in some important theoretical fundamentals. In particular, there is little guidance provided on selecting an appropriate WT across potential transport applications. This research described in this paper contributes uniquely to the literature by first describing a numerical experiment to demonstrate the shortcomings of commonly-used data processing techniques in traffic engineering (i.e., averaging, moving averaging, second-order difference, oblique cumulative curve, and short-time Fourier transform). It then mathematically describes WT’s ability to detect singularities in traffic data. Next, selecting a suitable WT for a particular research topic in traffic engineering is discussed in detail by objectively and quantitatively comparing candidate wavelets’ performances using a numerical experiment. Finally, based on several case studies using both loop detector data and vehicle trajectories, it is shown that selecting a suitable wavelet largely depends on the specific research topic, and that the Mexican hat wavelet generally gives a satisfactory performance in detecting singularities in traffic and vehicular data.

An investigation on multi-vehicle motorcycle crashes using log-linear models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motorcyclists are the most crash-prone road-user group in many Asian countries including Singapore; however, factors influencing motorcycle crashes are still not well understood. This study examines the effects of various roadway characteristics, traffic control measures and environmental factors on motorcycle crashes at different location types including expressways and intersections. Using techniques of categorical data analysis, this study has developed a set of log-linear models to investigate multi-vehicle motorcycle crashes in Singapore. Motorcycle crash risks in different circumstances have been calculated after controlling for the exposure estimated by the induced exposure technique. Results show that night-time influence increases crash risks of motorcycles particularly during merging and diverging manoeuvres on expressways, and turning manoeuvres at intersections. Riders appear to exercise more care while riding on wet road surfaces particularly during night. Many hazardous interactions at intersections tend to be related to the failure of drivers to notice a motorcycle as well as to judge correctly the speed/distance of an oncoming motorcycle. Road side conflicts due to stopping/waiting vehicles and interactions with opposing traffic on undivided roads have been found to be as detrimental factors on motorcycle safety along arterial, main and local roads away from intersections. Based on the findings of this study, several targeted countermeasures in the form of legislations, rider training, and safety awareness programmes have been recommended.

Analysis of marine conflicts

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The traffic conflict technique (TCT) is a powerful technique applied in road traffic safety assessment as a surrogate of the traditional accident data analysis. It has subdued the conceptual and implemental weaknesses of the accident statistics. Although this technique has been applied effectively in road traffic, it has not been practised well in marine traffic even though this traffic system has some distinct advantages in terms of having a monitoring system. This monitoring system can provide navigational information as well as other geometric information of the ships for a larger study area over a longer time period. However, for implementing the TCT in the marine traffic system, it should be examined critically to suit the complex nature of the traffic system. This paper examines the suitability of the TCT to be applied to marine traffic and proposes a framework for a follow up comprehensive conflict study.

Development of a data mining-based analysis framework for multi-attribute construction project information

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining techniques extract repeated and useful patterns from a large data set that in turn are utilized to predict the outcome of future events. The main purpose of the research presented in this paper is to investigate data mining strategies and develop an efficient framework for multi-attribute project information analysis to predict the performance of construction projects. The research team first reviewed existing data mining algorithms, applied them to systematically analyze a large project data set collected by the survey, and finally proposed a data-mining-based decision support framework for project performance prediction. To evaluate the potential of the framework, a case study was conducted using data collected from 139 capital projects and analyzed the relationship between use of information technology and project cost performance. The study results showed that the proposed framework has potential to promote fast, easy to use, interpretable, and accurate project data analysis.

Asset health prediction using the explicit hazard model

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The ability to estimate the asset reliability and the probability of failure is critical to reducing maintenance costs, operation downtime, and safety hazards. Predicting the survival time and the probability of failure in future time is an indispensable requirement in prognostics and asset health management. In traditional reliability models, the lifetime of an asset is estimated using failure event data, alone; however, statistically sufficient failure event data are often difficult to attain in real-life situations due to poor data management, effective preventive maintenance, and the small population of identical assets in use. Condition indicators and operating environment indicators are two types of covariate data that are normally obtained in addition to failure event and suspended data. These data contain significant information about the state and health of an asset. Condition indicators reflect the level of degradation of assets while operating environment indicators accelerate or decelerate the lifetime of assets. When these data are available, an alternative approach to the traditional reliability analysis is the modelling of condition indicators and operating environment indicators and their failure-generating mechanisms using a covariate-based hazard model. The literature review indicates that a number of covariate-based hazard models have been developed. All of these existing covariate-based hazard models were developed based on the principle theory of the Proportional Hazard Model (PHM). However, most of these models have not attracted much attention in the field of machinery prognostics. Moreover, due to the prominence of PHM, attempts at developing alternative models, to some extent, have been stifled, although a number of alternative models to PHM have been suggested. The existing covariate-based hazard models neglect to fully utilise three types of asset health information (including failure event data (i.e. observed and/or suspended), condition data, and operating environment data) into a model to have more effective hazard and reliability predictions. In addition, current research shows that condition indicators and operating environment indicators have different characteristics and they are non-homogeneous covariate data. Condition indicators act as response variables (or dependent variables) whereas operating environment indicators act as explanatory variables (or independent variables). However, these non-homogenous covariate data were modelled in the same way for hazard prediction in the existing covariate-based hazard models. The related and yet more imperative question is how both of these indicators should be effectively modelled and integrated into the covariate-based hazard model. This work presents a new approach for addressing the aforementioned challenges. The new covariate-based hazard model, which termed as Explicit Hazard Model (EHM), explicitly and effectively incorporates all three available asset health information into the modelling of hazard and reliability predictions and also drives the relationship between actual asset health and condition measurements as well as operating environment measurements. The theoretical development of the model and its parameter estimation method are demonstrated in this work. EHM assumes that the baseline hazard is a function of the both time and condition indicators. Condition indicators provide information about the health condition of an asset; therefore they update and reform the baseline hazard of EHM according to the health state of asset at given time t. Some examples of condition indicators are the vibration of rotating machinery, the level of metal particles in engine oil analysis, and wear in a component, to name but a few. Operating environment indicators in this model are failure accelerators and/or decelerators that are included in the covariate function of EHM and may increase or decrease the value of the hazard from the baseline hazard. These indicators caused by the environment in which an asset operates, and that have not been explicitly identified by the condition indicators (e.g. Loads, environmental stresses, and other dynamically changing environment factors). While the effects of operating environment indicators could be nought in EHM; condition indicators could emerge because these indicators are observed and measured as long as an asset is operational and survived. EHM has several advantages over the existing covariate-based hazard models. One is this model utilises three different sources of asset health data (i.e. population characteristics, condition indicators, and operating environment indicators) to effectively predict hazard and reliability. Another is that EHM explicitly investigates the relationship between condition and operating environment indicators associated with the hazard of an asset. Furthermore, the proportionality assumption, which most of the covariate-based hazard models suffer from it, does not exist in EHM. According to the sample size of failure/suspension times, EHM is extended into two forms: semi-parametric and non-parametric. The semi-parametric EHM assumes a specified lifetime distribution (i.e. Weibull distribution) in the form of the baseline hazard. However, for more industry applications, due to sparse failure event data of assets, the analysis of such data often involves complex distributional shapes about which little is known. Therefore, to avoid the restrictive assumption of the semi-parametric EHM about assuming a specified lifetime distribution for failure event histories, the non-parametric EHM, which is a distribution free model, has been developed. The development of EHM into two forms is another merit of the model. A case study was conducted using laboratory experiment data to validate the practicality of the both semi-parametric and non-parametric EHMs. The performance of the newly-developed models is appraised using the comparison amongst the estimated results of these models and the other existing covariate-based hazard models. The comparison results demonstrated that both the semi-parametric and non-parametric EHMs outperform the existing covariate-based hazard models. Future research directions regarding to the new parameter estimation method in the case of time-dependent effects of covariates and missing data, application of EHM in both repairable and non-repairable systems using field data, and a decision support model in which linked to the estimated reliability results, are also identified.

Hazard based models for freeway traffic incident duration

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Assessing and prioritising cost-effective strategies to mitigate the impacts of traffic incidents and accidents on non-recurrent congestion on major roads represents a significant challenge for road network managers. This research examines the influence of numerous factors associated with incidents of various types on their duration. It presents a comprehensive traffic incident data mining and analysis by developing an incident duration model based on twelve months of incident data obtained from the Australian freeway network. Parametric accelerated failure time (AFT) survival models of incident duration were developed, including log-logistic, lognormal, and Weibul-considering both fixed and random parameters, as well as a Weibull model with gamma heterogeneity. The Weibull AFT models with random parameters were appropriate for modelling incident duration arising from crashes and hazards. A Weibull model with gamma heterogeneity was most suitable for modelling incident duration of stationary vehicles. Significant variables affecting incident duration include characteristics of the incidents (severity, type, towing requirements, etc.), and location, time of day, and traffic characteristics of the incident. Moreover, the findings reveal no significant effects of infrastructure and weather on incident duration. A significant and unique contribution of this paper is that the durations of each type of incident are uniquely different and respond to different factors. The results of this study are useful for traffic incident management agencies to implement strategies to reduce incident duration, leading to reduced congestion, secondary incidents, and the associated human and economic losses.

Collective sampling and analysis of high order tensors for chatroom communications

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work investigates the accuracy and efficiency tradeoffs between centralized and collective (distributed) algorithms for (i) sampling, and (ii) n-way data analysis techniques in multidimensional stream data, such as Internet chatroom communications. Its contributions are threefold. First, we use the Kolmogorov-Smirnov goodness-of-fit test to show that statistical differences between real data obtained by collective sampling in time dimension from multiple servers and that of obtained from a single server are insignificant. Second, we show using the real data that collective data analysis of 3-way data arrays (users x keywords x time) known as high order tensors is more efficient than centralized algorithms with respect to both space and computational cost. Furthermore, we show that this gain is obtained without loss of accuracy. Third, we examine the sensitivity of collective constructions and analysis of high order data tensors to the choice of server selection and sampling window size. We construct 4-way tensors (users x keywords x time x servers) and analyze them to show the impact of server and window size selections on the results.

«
1
2
...
4
5
6
7
8
9
10
11
12
»