16 results for Web Mining, Data Mining, User Topic Model, Web User Profiles
at University of Queensland eSpace - Australia
Abstract:
In this paper, we consider how refinements between state-based specifications (e.g., written in Z) can be checked by use of a model checker. Specifically, we are interested in the verification of downward and upward simulations, which are the standard approach to verifying refinements in state-based notations. We show how downward and upward simulations can be checked using existing temporal logic model checkers. In particular, we show how the branching-time temporal logic CTL can be used to encode the standard simulation conditions. We do this both for a blocking, or guarded, interpretation of operations (often used when specifying reactive systems) and for the more common non-blocking interpretation used in many state-based specification languages (for modelling sequential systems). The approach is general enough to use with any state-based specification language, and we illustrate how refinements between Z specifications can be checked with the SAL CTL model checker on a small example.
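For reference, a sketch of the downward simulation conditions such a CTL encoding must capture, stated here in the textbook non-blocking form with retrieve relation R linking abstract state A to concrete state C (this is the standard formulation, not necessarily the paper's exact notation):

```latex
% Initialisation: every concrete initial state represents some abstract one.
\forall\, C' \bullet CInit \Rightarrow \exists\, A' \bullet AInit \land R'
% Applicability: the concrete operation is enabled wherever the abstract one is.
\forall\, A;\ C \bullet R \land \mathrm{pre}\,AOp \Rightarrow \mathrm{pre}\,COp
% Correctness: every concrete step can be matched by an abstract step.
\forall\, A;\ C;\ C' \bullet R \land \mathrm{pre}\,AOp \land COp \Rightarrow \exists\, A' \bullet AOp \land R'
```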
Challenges related to data collection and dynamic model validation of a fertilizer granulation plant
Abstract:
Collaborative recommendation is one of the most widely used recommendation techniques: it recommends items to a visitor by referring to the preferences of other users similar to the current one. User profiling over Web transaction data can capture such informative knowledge about user tasks or interests. With the discovered usage pattern information, it is possible to recommend more relevant content to Web users, or to customize the Web presentation for visitors, via collaborative recommendation. In addition, it helps identify the underlying relationships among Web users, items, and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on user profiling. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as representatives of usage patterns. Moreover, the hidden task model is derived by characterizing the meaningful latent factor space. With the discovered user profiles, we then choose the best-matched profile, whose preferences are closest to those of the current user, and make collaborative recommendations based on the corresponding page weights appearing in the selected profile. Preliminary experimental results on real-world data sets show that the proposed approach can make recommendations accurately and efficiently.
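As a concrete illustration of the co-occurrence modelling step, here is a minimal PLSA fit by EM over a session-page count matrix. This is a generic sketch with invented variable names, not the authors' implementation, and the modified k-means profiling step is omitted:

```python
# Minimal PLSA sketch for session-page co-occurrence data.
import numpy as np

def plsa(counts, n_factors, n_iters=50, seed=0):
    """counts: (n_sessions, n_pages) matrix of page visit counts."""
    rng = np.random.default_rng(seed)
    n_s, n_p = counts.shape
    # Random initialisation of P(z), P(s|z), P(p|z).
    p_z = rng.dirichlet(np.ones(n_factors))
    p_s_z = rng.dirichlet(np.ones(n_s), size=n_factors)   # shape (z, s)
    p_p_z = rng.dirichlet(np.ones(n_p), size=n_factors)   # shape (z, p)
    for _ in range(n_iters):
        # E-step: P(z|s,p) proportional to P(z) P(s|z) P(p|z), shape (z, s, p).
        post = p_z[:, None, None] * p_s_z[:, :, None] * p_p_z[:, None, :]
        post /= post.sum(axis=0, keepdims=True) + 1e-12
        # M-step: re-estimate all distributions from expected counts.
        weighted = counts[None, :, :] * post               # shape (z, s, p)
        p_s_z = weighted.sum(axis=2)
        p_s_z /= p_s_z.sum(axis=1, keepdims=True) + 1e-12
        p_p_z = weighted.sum(axis=1)
        p_p_z /= p_p_z.sum(axis=1, keepdims=True) + 1e-12
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum()
    return p_z, p_s_z, p_p_z
```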
Abstract:
This paper presents load profiles of electricity customers, using the knowledge discovery in databases (KDD) procedure, a data mining technique, to determine the load profiles for different types of customers. Current load profiling methods are compared by analysing and evaluating the selected data mining classification techniques. The objective of this study is to determine the best load profiling methods and data mining techniques to classify, detect and predict non-technical losses in the distribution sector due to faulty metering and billing errors, as well as to gather knowledge on customer behaviour and preferences so as to gain a competitive advantage in the deregulated market. This paper focuses mainly on the comparative analysis of the selected classification techniques; a forthcoming paper will focus on the detection and prediction methods.
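A hypothetical sketch of the kind of comparison described: several classification techniques scored by cross-validation on load-profile feature vectors. The synthetic data, feature count, and classifier choices below are placeholders standing in for real metering data, not the paper's setup:

```python
# Compare classifiers on synthetic "load profile" vectors via 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# 24 features, e.g. one normalised load value per hour of the day.
X, y = make_classification(n_samples=500, n_features=24, n_informative=10,
                           n_classes=3, random_state=0)

for name, clf in {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
}.items():
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```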
Abstract:
The most widely accepted method for the design of autogenous and semi-autogenous (AG/SAG) mills is to carry out pilot-scale test work using a 1.8 m diameter by 0.6 m long pilot mill. The load in such a mill typically contains 250,000-450,000 particles larger than 6 mm, allowing correct representation of more than 90% of the charge in Discrete Element Method (DEM) simulations. Most AG/SAG mills use discharge grate slots which are 15 mm or more in width, and the mass in each size fraction usually decreases rapidly below grate size. This scale of DEM model is now within the range of standard workstations running an efficient DEM code. This paper describes various ways of extracting collision data from the DEM model and translating it into breakage estimates. Account is taken of the different breakage mechanisms (impact and abrasion) and of the specific impact histories of the particles in order to assess the breakage rates for the various size fractions in the mill. At some future time, the integration of smoothed particle hydrodynamics with DEM will allow for the inclusion of slurry within the pilot mill simulation. (C) 2004 Elsevier Ltd. All rights reserved.
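One simple form such post-processing can take is binning recorded collision energies by particle size fraction. The sketch below is a generic illustration with invented record layouts and toy data, not the paper's breakage model:

```python
# Build a collision energy spectrum per particle size fraction.
import numpy as np

def energy_spectra(collisions, size_edges, energy_edges):
    """collisions: (n, 2) array of [particle_size_mm, collision_energy_J]."""
    sizes, energies = collisions[:, 0], collisions[:, 1]
    spectra = []
    for lo, hi in zip(size_edges[:-1], size_edges[1:]):
        in_fraction = (sizes >= lo) & (sizes < hi)
        hist, _ = np.histogram(energies[in_fraction], bins=energy_edges)
        spectra.append(hist)
    return np.array(spectra)  # rows: size fractions, cols: energy bins

rng = np.random.default_rng(1)
toy = np.column_stack([rng.uniform(6, 200, 10000),       # sizes in mm
                       rng.lognormal(-2, 1.5, 10000)])   # energies in J
print(energy_spectra(toy, [6, 20, 60, 200], np.logspace(-4, 2, 13)).shape)
```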
Abstract:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
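To make the model class concrete, here is a minimal EM fit of an intercept-only ZIP model, i.e. the non-hierarchical special case; the paper's multi-level version adds covariates, random effects and REML variance-component estimation:

```python
# EM for the intercept-only ZIP model: with probability pi the count is a
# structural zero, otherwise it is Poisson(lam).
import numpy as np

def fit_zip(y, n_iters=200):
    y = np.asarray(y, dtype=float)
    pi, lam = 0.5, max(y.mean(), 1e-6)
    for _ in range(n_iters):
        # E-step: posterior probability that each observed zero is structural.
        tau = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
        # M-step: closed-form updates for both parameters.
        pi = tau.mean()
        lam = ((1 - tau) * y).sum() / (1 - tau).sum()
    return pi, lam

y = [0, 0, 0, 1, 2, 0, 3, 0, 0, 4]
print(fit_zip(y))  # (zero-inflation probability, Poisson mean) estimates
```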
Abstract:
In this paper, we describe the evaluation of a method for building detection by Dempster-Shafer fusion of LIDAR data and multispectral images. For that purpose, ground truth was digitised for two test sites with quite different characteristics. Using these data sets, the heuristic model for the probability mass assignments of the method is validated, and rules for tuning the parameters of this model are discussed. Further, we evaluate the contributions of the individual cues used in the classification process to the quality of the classification results. Our results show the degree to which the overall correctness of the results can be improved by fusing LIDAR data with multispectral images.
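The fusion step rests on Dempster's rule of combination, which the following generic sketch implements for a small frame of discernment. The cue names and mass values are invented for illustration, not taken from the paper:

```python
# Dempster's rule of combination for two mass functions.
from itertools import product

def combine(m1, m2):
    """m1, m2: dicts mapping frozenset hypotheses to masses summing to 1."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    # Normalise by the non-conflicting mass (assumes conflict < 1).
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# E.g. fusing a height cue with a spectral cue over {building, ground}:
height = {frozenset({"building"}): 0.7, frozenset({"building", "ground"}): 0.3}
ndvi = {frozenset({"ground"}): 0.4, frozenset({"building", "ground"}): 0.6}
print(combine(height, ndvi))
```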
Abstract:
Information processing speed, as measured by elementary cognitive tasks, is correlated with higher-order cognitive ability, such that increased speed relates to improved cognitive performance. The question of whether the genetic variation in Inspection Time (IT) and Choice Reaction Time (CRT) is associated with IQ through a unitary factor was addressed in this multivariate genetic study of IT, CRT, and IQ subtest scores. The sample included 184 MZ and 206 DZ twin pairs with a mean age of 16.2 years (range 15-18 years). They were administered a visual (pi-figure) IT task, a two-choice RT task, five computerized subtests of the Multidimensional Aptitude Battery, and the digit symbol substitution subtest from the WAIS-R. The data supported a factor model comprising a general factor, three group factors (verbal ability, visuospatial ability, broad speediness), and specific genetic factors, together with a shared environmental factor influencing all tests but IT, plus unique environmental factors that were largely specific to individual measures. The general genetic factor displayed factor loadings ranging between 0.35 and 0.66 for the IQ subtests, with IT and CRT loadings of -0.47 and -0.24, respectively. Results indicate that a unitary factor is insufficient to describe the entire relationship between cognitive speed measures and all IQ subtests, with independent genetic effects explaining further covariation between processing speed (especially CRT) and Digit Symbol.
Abstract:
In this paper we propose a new identification method based on the residual white noise autoregressive criterion (Pukkila et al., 1990) to select the order of VARMA structures. Results from extensive simulation experiments based on different model structures with varying numbers of observations and component series are used to demonstrate the performance of this new procedure. We also use economic and business data to compare the model structures selected by this order selection method with those identified in other published studies.
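In the same spirit, a rough order-selection loop can be sketched as follows: fit candidate VARMA orders and keep the one whose residuals look most like white noise. The sketch uses a plain residual-autocorrelation check rather than the published criterion itself, and the data are toy placeholders:

```python
# Screen candidate VARMA orders by how "white" their residuals are.
import numpy as np
from statsmodels.tsa.statespace.varmax import VARMAX

def max_abs_residual_autocorr(resid, max_lag=10):
    """Largest absolute residual autocorrelation over lags 1..max_lag."""
    r = resid - resid.mean(axis=0)
    den = (r * r).sum(axis=0)
    acs = [np.abs((r[k:] * r[:-k]).sum(axis=0) / den)
           for k in range(1, max_lag + 1)]
    return float(np.max(acs))

rng = np.random.default_rng(0)
y = np.diff(rng.normal(size=(201, 2)).cumsum(axis=0), axis=0)  # toy series

for order in [(1, 0), (0, 1), (1, 1)]:
    res = VARMAX(y, order=order).fit(disp=False)
    ac = max_abs_residual_autocorr(np.asarray(res.resid))
    # Compare with the approximate white-noise band ~ 2 / sqrt(n).
    print(order, round(ac, 3), "band:", round(2 / np.sqrt(len(y)), 3))
```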
Abstract:
Background: This study extended that of Kwon and Oei [Kwon, S.M., Oei, T.P.S., 2003. Cognitive change processes in a group cognitive behavior therapy of depression. J. Behav. Ther. Exp. Psychiatry, 3, 73-85], which outlined a number of testable models based on Beck's cognitive theory of depression. Specifically, the current study tested the following four competing models in patients with major depressive disorder: the causal, consequential, fully interactive and partially interactive cognitive models. Methods: A total of 168 clinically depressed outpatients were recruited into a 12-week group cognitive behaviour therapy program. Data were collected at three time points (baseline, mid-treatment and termination of therapy) using the ATQ, DAS and BDI. The data were analysed with Amos 4.01 (Arbuckle, J.L., 1999. Amos 4.1. Smallwaters, Chicago) structural equation modelling. Results: Dysfunctional attitudes, negative automatic thoughts and symptoms of depression reduced significantly during treatment. The causal and consequential models provided an equally adequate fit to the data; the fully interactive model provided the best fit. However, after removing non-significant pathways, it was found that reduced depressive symptoms contributed to reduced depressogenic automatic thoughts and dysfunctional attitudes, not the reverse. Conclusion: These findings did not fully support Beck's cognitive theory of depression, which holds that cognitions are primary in the reduction of depressed mood. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Security protocols are often modelled at a high level of abstraction, potentially overlooking implementation-dependent vulnerabilities. Here we use the Z specification language's rich set of data structures to formally model potentially ambiguous messages that may be exploited in a 'type flaw' attack. We then show how to formally verify whether or not such an attack is actually possible in a particular protocol using Z's schema calculus.
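The ambiguity being modelled can be illustrated with a toy check, written here in Python rather than Z and using invented message formats: two formats are confusable when some byte string parses as either, e.g. a 16-byte nonce posing as a 16-byte session key:

```python
# Toy illustration of the ambiguity underlying a type-flaw attack.

def confusable(fmt_a, fmt_b):
    """fmt: list of (field_name, byte_length). Equal total length means a
    message built to one format can be (mis)parsed as the other."""
    return sum(n for _, n in fmt_a) == sum(n for _, n in fmt_b)

nonce_msg = [("sender_id", 4), ("nonce", 16)]
key_msg = [("sender_id", 4), ("session_key", 16)]
print(confusable(nonce_msg, key_msg))  # True: a nonce can pose as a key
```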
Abstract:
The chromodomain is 40-50 amino acids in length and is conserved in a wide range of chromatin-associated and regulatory proteins involved in chromatin remodeling. Chromodomain-containing proteins can be classified into families based on their broader characteristics, in particular the presence of other types of domains, which correlate with different subclasses of the chromodomains themselves. Hidden Markov model (HMM)-generated profiles of different subclasses of chromodomains were used here to identify sequences encoding chromodomain-containing proteins in the mouse transcriptome and genome. A total of 36 different loci encoding proteins containing chromodomains, including 17 novel loci, were identified. Six of these loci (including three apparent pseudogenes, a novel HP1 ortholog, and two novel Msl-3 transcription factor-like proteins) are not present in the human genome, whereas the human genome contains four loci (two CDY orthologs and two apparent CDY pseudogenes) that are not present in the mouse. A number of these loci exhibit alternative splicing to produce different isoforms, including 43 novel variants, some of which lack the chromodomain. The likely functions of these proteins are discussed in relation to the known functions of other chromodomain-containing proteins within the same family.
Abstract:
In modern magnetic resonance imaging (MRI), both patients and radiologists are exposed to strong, nonuniform static magnetic fields inside or outside the scanner, in which body movement may induce electric currents in tissues that could possibly be harmful. This paper presents theoretical investigations into the spatial distribution of induced E-fields in a human model moving at various positions around the magnet. The numerical calculations are based on an efficient, quasistatic, finite-difference scheme and an anatomically realistic, full-body, male model. 3D field profiles from an actively-shielded 4 T magnet system are used, and the body model is projected through the field profile at normalized velocity. The simulation shows that it is possible to induce E-fields/currents near the level of physiological significance under some circumstances, and it provides insight into the spatial characteristics of the induced fields. The results are readily extrapolated to very high field strengths, allowing safety evaluation at a variety of field strengths and motion velocities.
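The underlying mechanism can be summarised by a standard order-of-magnitude relation (a textbook estimate, not the paper's finite-difference model): motion at velocity v through a static field gradient produces an effective rate of change of field, which by Faraday's law drives an electric field around a conducting loop of radius r:

```latex
% Effective field variation seen by moving tissue, and the induced E-field
% around a circular loop of radius r (order-of-magnitude estimate).
\frac{\partial \mathbf{B}}{\partial t} \approx (\mathbf{v}\cdot\nabla)\,\mathbf{B},
\qquad
|E| \approx \frac{r}{2}\left|\frac{dB}{dt}\right|
```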
Abstract:
This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people visit the website, while satisfaction refers to the usefulness of the site. We illustrate that the popularity of a website is a fuzzy logic problem; it is an important characteristic for a website to survive in Internet commerce. The satisfaction of a website is also a fuzzy logic problem, representing the degree of success in the application of information technology to the business. We propose a fuzzy logic framework for representing these two problems, based on web data mining techniques that fuzzify the attributes of a website.
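As a toy illustration of the fuzzification step, the sketch below maps one crisp attribute (daily visits) to membership degrees for linguistic popularity labels. The attribute, labels and breakpoints are all invented, since the paper proposes the general framework rather than fixed numbers:

```python
# Fuzzify a crisp website attribute into linguistic membership degrees.
def popularity_membership(daily_visits):
    """Membership degrees for 'low', 'medium', 'high' popularity."""
    def tri(x, a, b, c):
        # Triangular membership: rises on [a, b], falls on [b, c].
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return {
        "low": max(0.0, min(1.0, (500 - daily_visits) / 500)),
        "medium": tri(daily_visits, 200, 1000, 5000),
        "high": max(0.0, min(1.0, (daily_visits - 1000) / 4000)),
    }

print(popularity_membership(1500))
# -> {'low': 0.0, 'medium': 0.875, 'high': 0.125}
```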
Abstract:
There has been an increased demand for characterizing user access patterns using web mining techniques, since the informative knowledge extracted from web server log files offers benefits not only for web site structure improvement but also for a better understanding of user navigational behavior. In this paper, we present a web usage mining method which utilizes web usage and page linkage information to capture user access patterns based on a Probabilistic Latent Semantic Analysis (PLSA) model. A specific probabilistic model analysis algorithm, the EM algorithm, is applied to the integrated usage data to infer the latent semantic factors and to generate user session clusters that reveal user access patterns. Experiments have been conducted on a real-world data set to validate the effectiveness of the proposed approach. The results show that the presented method is capable of characterizing the latent semantic factors and generating user profiles in terms of weighted page vectors, which may reflect the common access interests exhibited by users within the same session cluster.
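As an illustration of how session clusters can be read off a fitted PLSA model, the sketch below groups sessions by their factor mixture P(z|s) using k-means. It is a hypothetical follow-on to the PLSA sketch given after an earlier abstract in this list (reusing its p_z and p_s_z outputs), not the authors' exact algorithm:

```python
# Cluster sessions by their latent-factor mixtures P(z|s).
import numpy as np
from sklearn.cluster import KMeans

def session_clusters(p_z, p_s_z, n_clusters=5):
    """p_z: (z,) factor priors; p_s_z: (z, s) session distributions P(s|z)."""
    # P(z|s) proportional to P(z) P(s|z); one mixture vector per session.
    post = (p_z[:, None] * p_s_z).T            # shape (session, factor)
    post /= post.sum(axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(post)
    return labels, post
```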