814 resultados para Hierarchical clustering model


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research is one of several ongoing studies conducted within the IT Professional Services (ITPS) research programme at Queensland University of Technology (QUT). In 2003, ITPS introduced the IS-Impact model, a measurement model for measuring information systems success from the viewpoint of multiple stakeholders. The model, along with its instrument, is robust, simple, yet generalisable, and yields results that are comparable across time, stakeholders, different systems and system contexts. The IS-Impact model is defined as “a measure at a point in time, of the stream of net benefits from the Information System (IS), to date and anticipated, as perceived by all key-user-groups”. The model represents four dimensions, which are ‘Individual Impact’, ‘Organizational Impact’, ‘Information Quality’ and ‘System Quality’. The two Impact dimensions measure the up-to-date impact of the evaluated system, while the remaining two Quality dimensions act as proxies for probable future impacts (Gable, Sedera & Chan, 2008). To fulfil the goal of ITPS, “to develop the most widely employed model” this research re-validates and extends the IS-Impact model in a new context. This method/context-extension research aims to test the generalisability of the model by addressing known limitations of the model. One of the limitations of the model relates to the extent of external validity of the model. In order to gain wide acceptance, a model should be consistent and work well in different contexts. The IS-Impact model, however, was only validated in the Australian context, and packaged software was chosen as the IS understudy. Thus, this study is concerned with whether the model can be applied in another different context. Aiming for a robust and standardised measurement model that can be used across different contexts, this research re-validates and extends the IS-Impact model and its instrument to public sector organisations in Malaysia. The overarching research question (managerial question) of this research is “How can public sector organisations in Malaysia measure the impact of information systems systematically and effectively?” With two main objectives, the managerial question is broken down into two specific research questions. The first research question addresses the applicability (relevance) of the dimensions and measures of the IS-Impact model in the Malaysian context. Moreover, this research question addresses the completeness of the model in the new context. Initially, this research assumes that the dimensions and measures of the IS-Impact model are sufficient for the new context. However, some IS researchers suggest that the selection of measures needs to be done purposely for different contextual settings (DeLone & McLean, 1992, Rai, Lang & Welker, 2002). Thus, the first research question is as follows, “Is the IS-Impact model complete for measuring the impact of IS in Malaysian public sector organisations?” [RQ1]. The IS-Impact model is a multidimensional model that consists of four dimensions or constructs. Each dimension is represented by formative measures or indicators. Formative measures are known as composite variables because these measures make up or form the construct, or, in this case, the dimension in the IS-Impact model. These formative measures define different aspects of the dimension, thus, a measurement model of this kind needs to be tested not just on the structural relationship between the constructs but also the validity of each measure. In a previous study, the IS-Impact model was validated using formative validation techniques, as proposed in the literature (i.e., Diamantopoulos and Winklhofer, 2001, Diamantopoulos and Siguaw, 2006, Petter, Straub and Rai, 2007). However, there is potential for improving the validation testing of the model by adding more criterion or dependent variables. This includes identifying a consequence of the IS-Impact construct for the purpose of validation. Moreover, a different approach is employed in this research, whereby the validity of the model is tested using the Partial Least Squares (PLS) method, a component-based structural equation modelling (SEM) technique. Thus, the second research question addresses the construct validation of the IS-Impact model; “Is the IS-Impact model valid as a multidimensional formative construct?” [RQ2]. This study employs two rounds of surveys, each having a different and specific aim. The first is qualitative and exploratory, aiming to investigate the applicability and sufficiency of the IS-Impact dimensions and measures in the new context. This survey was conducted in a state government in Malaysia. A total of 77 valid responses were received, yielding 278 impact statements. The results from the qualitative analysis demonstrate the applicability of most of the IS-Impact measures. The analysis also shows a significant new measure having emerged from the context. This new measure was added as one of the System Quality measures. The second survey is a quantitative survey that aims to operationalise the measures identified from the qualitative analysis and rigorously validate the model. This survey was conducted in four state governments (including the state government that was involved in the first survey). A total of 254 valid responses were used in the data analysis. Data was analysed using structural equation modelling techniques, following the guidelines for formative construct validation, to test the validity and reliability of the constructs in the model. This study is the first research that extends the complete IS-Impact model in a new context that is different in terms of nationality, language and the type of information system (IS). The main contribution of this research is to present a comprehensive, up-to-date IS-Impact model, which has been validated in the new context. The study has accomplished its purpose of testing the generalisability of the IS-Impact model and continuing the IS evaluation research by extending it in the Malaysian context. A further contribution is a validated Malaysian language IS-Impact measurement instrument. It is hoped that the validated Malaysian IS-Impact instrument will encourage related IS research in Malaysia, and that the demonstrated model validity and generalisability will encourage a cumulative tradition of research previously not possible. The study entailed several methodological improvements on prior work, including: (1) new criterion measures for the overall IS-Impact construct employed in ‘identification through measurement relations’; (2) a stronger, multi-item ‘Satisfaction’ construct, employed in ‘identification through structural relations’; (3) an alternative version of the main survey instrument in which items are randomized (rather than blocked) for comparison with the main survey data, in attention to possible common method variance (no significant differences between these two survey instruments were observed); (4) demonstrates a validation process of formative indexes of a multidimensional, second-order construct (existing examples mostly involved unidimensional constructs); (5) testing the presence of suppressor effects that influence the significance of some measures and dimensions in the model; and (6) demonstrates the effect of an imbalanced number of measures within a construct to the contribution power of each dimension in a multidimensional model.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the growing number of XML documents on theWeb it becomes essential to effectively organise these XML documents in order to retrieve useful information from them. A possible solution is to apply clustering on the XML documents to discover knowledge that promotes effective data management, information retrieval and query processing. However, many issues arise in discovering knowledge from these types of semi-structured documents due to their heterogeneity and structural irregularity. Most of the existing research on clustering techniques focuses only on one feature of the XML documents, this being either their structure or their content due to scalability and complexity problems. The knowledge gained in the form of clusters based on the structure or the content is not suitable for reallife datasets. It therefore becomes essential to include both the structure and content of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both these kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. The overall objective of this thesis is to address these issues by: (1) proposing methods to utilise frequent pattern mining techniques to reduce the dimension; (2) developing models to effectively combine the structure and content of XML documents; and (3) utilising the proposed models in clustering. This research first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. A clustering framework with two types of models, implicit and explicit, is developed. The implicit model uses a Vector Space Model (VSM) to combine the structure and the content information. The explicit model uses a higher order model, namely a 3- order Tensor Space Model (TSM), to explicitly combine the structure and the content information. This thesis also proposes a novel incremental technique to decompose largesized tensor models to utilise the decomposed solution for clustering the XML documents. The proposed framework and its components were extensively evaluated on several real-life datasets exhibiting extreme characteristics to understand the usefulness of the proposed framework in real-life situations. Additionally, this research evaluates the outcome of the clustering process on the collection selection problem in the information retrieval on the Wikipedia dataset. The experimental results demonstrate that the proposed frequent pattern mining and clustering methods outperform the related state-of-the-art approaches. In particular, the proposed framework of utilising frequent structures for constraining the content shows an improvement in accuracy over content-only and structure-only clustering results. The scalability evaluation experiments conducted on large scaled datasets clearly show the strengths of the proposed methods over state-of-the-art methods. In particular, this thesis work contributes to effectively combining the structure and the content of XML documents for clustering, in order to improve the accuracy of the clustering solution. In addition, it also contributes by addressing the research gaps in frequent pattern mining to generate efficient and concise frequent subtrees with various node relationships that could be used in clustering.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Accurate and detailed road models play an important role in a number of geospatial applications, such as infrastructure planning, traffic monitoring, and driver assistance systems. In this thesis, an integrated approach for the automatic extraction of precise road features from high resolution aerial images and LiDAR point clouds is presented. A framework of road information modeling has been proposed, for rural and urban scenarios respectively, and an integrated system has been developed to deal with road feature extraction using image and LiDAR analysis. For road extraction in rural regions, a hierarchical image analysis is first performed to maximize the exploitation of road characteristics in different resolutions. The rough locations and directions of roads are provided by the road centerlines detected in low resolution images, both of which can be further employed to facilitate the road information generation in high resolution images. The histogram thresholding method is then chosen to classify road details in high resolution images, where color space transformation is used for data preparation. After the road surface detection, anisotropic Gaussian and Gabor filters are employed to enhance road pavement markings while constraining other ground objects, such as vegetation and houses. Afterwards, pavement markings are obtained from the filtered image using the Otsu's clustering method. The final road model is generated by superimposing the lane markings on the road surfaces, where the digital terrain model (DTM) produced by LiDAR data can also be combined to obtain the 3D road model. As the extraction of roads in urban areas is greatly affected by buildings, shadows, vehicles, and parking lots, we combine high resolution aerial images and dense LiDAR data to fully exploit the precise spectral and horizontal spatial resolution of aerial images and the accurate vertical information provided by airborne LiDAR. Objectoriented image analysis methods are employed to process the feature classiffcation and road detection in aerial images. In this process, we first utilize an adaptive mean shift (MS) segmentation algorithm to segment the original images into meaningful object-oriented clusters. Then the support vector machine (SVM) algorithm is further applied on the MS segmented image to extract road objects. Road surface detected in LiDAR intensity images is taken as a mask to remove the effects of shadows and trees. In addition, normalized DSM (nDSM) obtained from LiDAR is employed to filter out other above-ground objects, such as buildings and vehicles. The proposed road extraction approaches are tested using rural and urban datasets respectively. The rural road extraction method is performed using pan-sharpened aerial images of the Bruce Highway, Gympie, Queensland. The road extraction algorithm for urban regions is tested using the datasets of Bundaberg, which combine aerial imagery and LiDAR data. Quantitative evaluation of the extracted road information for both datasets has been carried out. The experiments and the evaluation results using Gympie datasets show that more than 96% of the road surfaces and over 90% of the lane markings are accurately reconstructed, and the false alarm rates for road surfaces and lane markings are below 3% and 2% respectively. For the urban test sites of Bundaberg, more than 93% of the road surface is correctly reconstructed, and the mis-detection rate is below 10%.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper investigates a detailed Active Shock Control Bump Design Optimisation on a Natural Laminar Flow (NLF) aerofoil; RAE 5243 to reduce cruise drag at transonic flow conditions using Evolutionary Algorithms (EAs) coupled to a robust design approach. For the uncertainty design parameters, the positions of boundary layer transition (xtr) and the coefficient of lift (Cl) are considered (250 stochastic samples in total). In this paper, two robust design methods are considered; the first approach uses a standard robust design method, which evaluates one design model at 250 stochastic conditions for uncertainty. The second approach is the combination of a standard robust design method and the concept of hierarchical (multi-population) sampling (250, 50, 15) for uncertainty. Numerical results show that the evolutionary optimization method coupled to uncertainty design techniques produces useful and reliable Pareto optimal SCB shapes which have low sensitivity and high aerodynamic performance while having significant total drag reduction. In addition,it also shows the benefit of using hierarchical robust method for detailed uncertainty design optimization.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Navigational collisions are one of the major safety concerns in many seaports. Despite the extent of recent works done on port navigational safety research, little is known about harbor pilot’s perception of collision risks in port fairways. This paper uses a hierarchical ordered probit model to investigate associations between perceived risks and the geometric and traffic characteristics of fairways and the pilot attributes. Perceived risk data, collected through a risk perception survey conducted among the Singapore port pilots, are used to calibrate the model. Intra-class correlation coefficient justifies use of the hierarchical model in comparison with an ordinary model. Results show higher perceived risks in fairways attached to anchorages, and in those featuring sharper bends and higher traffic operating speeds. Lesser risks are perceived in fairways attached to shoreline and confined waters, and in those with one-way traffic, traffic separation scheme, cardinal marks and isolated danger marks. Risk is also found to be perceived higher in night.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The ability to detect unusual events in surviellance footage as they happen is a highly desireable feature for a surveillance system. However, this problem remains challenging in crowded scenes due to occlusions and the clustering of people. In this paper, we propose using the Distributed Behavior Model (DBM), which has been widely used in computer graphics, for video event detection. Our approach does not rely on object tracking, and is robust to camera movements. We use sparse coding for classification, and test our approach on various datasets. Our proposed approach outperforms a state-of-the-art work which uses the social force model and Latent Dirichlet Allocation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Motorcycles are overrepresented in road traffic crashes and particularly vulnerable at signalized intersections. The objective of this study is to identify causal factors affecting the motorcycle crashes at both four-legged and T signalized intersections. Treating the data in time-series cross-section panels, this study explores different Hierarchical Poisson models and found that the model allowing autoregressive lag 1 dependent specification in the error term is the most suitable. Results show that the number of lanes at the four-legged signalized intersections significantly increases motorcycle crashes largely because of the higher exposure resulting from higher motorcycle accumulation at the stop line. Furthermore, the presence of a wide median and an uncontrolled left-turn lane at major roadways of four-legged intersections exacerbate this potential hazard. For T signalized intersections, the presence of exclusive right-turn lane at both major and minor roadways and an uncontrolled left-turn lane at major roadways of T intersections increases motorcycle crashes. Motorcycle crashes increase on high-speed roadways because they are more vulnerable and less likely to react in time during conflicts. The presence of red light cameras reduces motorcycle crashes significantly for both four-legged and T intersections. With the red-light camera, motorcycles are less exposed to conflicts because it is observed that they are more disciplined in queuing at the stop line and less likely to jump start at the start of green.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study proposes a full Bayes (FB) hierarchical modeling approach in traffic crash hotspot identification. The FB approach is able to account for all uncertainties associated with crash risk and various risk factors by estimating a posterior distribution of the site safety on which various ranking criteria could be based. Moreover, by use of hierarchical model specification, FB approach is able to flexibly take into account various heterogeneities of crash occurrence due to spatiotemporal effects on traffic safety. Using Singapore intersection crash data(1997-2006), an empirical evaluate was conducted to compare the proposed FB approach to the state-of-the-art approaches. Results show that the Bayesian hierarchical models with accommodation for site specific effect and serial correlation have better goodness-of-fit than non hierarchical models. Furthermore, all model-based approaches perform significantly better in safety ranking than the naive approach using raw crash count. The FB hierarchical models were found to significantly outperform the standard EB approach in correctly identifying hotspots.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Navigational collisions are a major safety concern in many seaports. Despite the recent advances in port navigational safety research, little is known about harbor pilot’s perception of collision risks in anchorages. This study attempts to model such risks by employing a hierarchical ordered probit model, which is calibrated by using data collected through a risk perception survey conducted on Singapore port pilots. The hierarchical model is found to be useful to account for correlations in risks perceived by individual pilots. Results show higher perceived risks in anchorages attached to intersection, local and international fairway; becoming more critical at night. Lesser risks are perceived in anchorages featuring shoreline in boundary, higher water depth, lower density of stationary ships, cardinal marks and isolated danger marks. Pilotage experience shows a negative effect on perceived risks. This study indicates that hierarchical modeling would be useful for treating correlations in navigational safety data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, the goal of identifying disease subgroups based on differences in observed symptom profile is considered. Commonly referred to as phenotype identification, solutions to this task often involve the application of unsupervised clustering techniques. In this paper, we investigate the application of a Dirichlet Process mixture (DPM) model for this task. This model is defined by the placement of the Dirichlet Process (DP) on the unknown components of a mixture model, allowing for the expression of uncertainty about the partitioning of observed data into homogeneous subgroups. To exemplify this approach, an application to phenotype identification in Parkinson’s disease (PD) is considered, with symptom profiles collected using the Unified Parkinson’s Disease Rating Scale (UPDRS). Clustering, Dirichlet Process mixture, Parkinson’s disease, UPDRS.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Quality oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers’ expectations by supplying reliable, good quality products and services is the key factor for an organization and even government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and continuous improvement philosophy and now are being applied widely to improve the quality of products and services in industrial and business sectors. Recently SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the health sector characteristics and needs. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents some opportunities for transferring knowledge and future development in each sectors. Meanwhile considering capabilities of Bayesian approach particularly Bayesian hierarchical models and computational techniques in which all uncertainty are expressed as a structure of probability, facilitates decision making and cost-effectiveness analyses. Therefore, this research investigates the use of quality improvement cycle in a health vii setting using clinical data from a hospital. The need of clinical data for monitoring purposes is investigated in two aspects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in available datasets and data flow; then a data capturing algorithm using Bayesian decision making methods is developed to determine economical sample size for statistical analyses within the quality improvement cycle. Following ensuring clinical data quality, some characteristics of control charts in the health context including the necessity of monitoring attribute data and correlated quality characteristics are considered. To this end, multivariate control charts from an industrial context are adapted to monitor radiation delivered to patients undergoing diagnostic coronary angiogram and various risk-adjusted control charts are constructed and investigated in monitoring binary outcomes of clinical interventions as well as postintervention survival time. Meanwhile, adoption of a Bayesian approach is proposed as a new framework in estimation of change point following control chart’s signal. This estimate aims to facilitate root causes efforts in quality improvement cycle since it cuts the search for the potential causes of detected changes to a tighter time-frame prior to the signal. This approach enables us to obtain highly informative estimates for change point parameters since probability distribution based results are obtained. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios including step change, linear trend and multiple change in a Poisson process are developed and investigated. The benefits of change point investigation is revisited and promoted in monitoring hospital outcomes where the developed Bayesian estimator reports the true time of the shifts, compared to priori known causes, detected by control charts in monitoring rate of excess usage of blood products and major adverse events during and after cardiac surgery in a local hospital. The development of the Bayesian change point estimators are then followed in a healthcare surveillances for processes in which pre-intervention characteristics of patients are viii affecting the outcomes. In this setting, at first, the Bayesian estimator is extended to capture the patient mix, covariates, through risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in odds ratio of intensive care unit outcomes in a local hospital. Secondly, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention which is being monitored by riskadjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix and the survival function is constructed using survival prediction model. The simulation study undertaken in each research component and obtained results highly recommend the developed Bayesian estimators as a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances as well as industrial and business contexts. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The empirical results and simulations indicate that the Bayesian estimators are a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The advantages of the Bayesian approach seen in general context of quality control may also be extended in the industrial and business domains where quality monitoring was initially developed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Approximate clone detection is the process of identifying similar process fragments in business process model collections. The tool presented in this paper can efficiently cluster approximate clones in large process model repositories. Once a repository is clustered, users can filter and browse the clusters using different filtering parameters. Our tool can also visualize clusters in the 2D space, allowing a better understanding of clusters and their member fragments. This demonstration will be useful for researchers and practitioners working on large process model repositories, where process standardization is a critical task for increasing the consistency and reducing the complexity of the repository.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract. For interactive systems, recognition, reproduction, and generalization of observed motion data are crucial for successful interaction. In this paper, we present a novel method for analysis of motion data that we refer to as K-OMM-trees. K-OMM-trees combine Ordered Means Models (OMMs) a model-based machine learning approach for time series with an hierarchical analysis technique for very large data sets, the K-tree algorithm. The proposed K-OMM-trees enable unsupervised prototype extraction of motion time series data with hierarchical data representation. After introducing the algorithmic details, we apply the proposed method to a gesture data set that includes substantial inter-class variations. Results from our studies show that K-OMM-trees are able to substantially increase the recognition performance and to learn an inherent data hierarchy with meaningful gesture abstractions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.