39 results for Probabilistic latent semantic analysis (PLSA)
in University of Queensland eSpace - Australia
Abstract:
Web transaction data between Web visitors and Web functionalities usually convey task-oriented user behavior patterns. Mining this type of click-stream data captures usage pattern information. Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to the user's preferred style. Traditional Web usage mining techniques, such as Web user session or Web page clustering, association rule mining, and frequent navigational path mining, can only discover usage patterns explicitly. They cannot, however, reveal the underlying navigational activities or identify the latent relationships associated with the patterns among Web users and Web pages. In this work, we propose a Web recommendation framework incorporating a Web usage mining technique based on the Probabilistic Latent Semantic Analysis (PLSA) model. The main advantage of this method is that it not only discovers usage-based access patterns but also reveals the underlying latent factors. With the discovered user access patterns, we then present users with content of greater interest via collaborative recommendation. To validate the effectiveness of the proposed approach, we conduct experiments on real-world datasets and make comparisons with some existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.
Abstract:
There has been an increased demand for characterizing user access patterns using web mining techniques, since the knowledge extracted from web server log files can benefit not only web site structure improvement but also the understanding of user navigational behavior. In this paper, we present a web usage mining method that utilizes web user usage and page linkage information to capture user access patterns based on the Probabilistic Latent Semantic Analysis (PLSA) model. A specific probabilistic model analysis algorithm, the EM algorithm, is applied to the integrated usage data to infer the latent semantic factors and to generate user session clusters that reveal user access patterns. Experiments have been conducted on a real-world data set to validate the effectiveness of the proposed approach. The results show that the presented method is capable of characterizing the latent semantic factors and generating user profiles in terms of weighted page vectors, which may reflect the common access interest exhibited by users in the same session cluster.
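The EM fitting step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `plsa`, the toy count matrix, and the symmetric parameterization P(z) P(s|z) P(p|z) over sessions s, pages p, and latent factors z are all assumptions made for illustration, and the integration of page linkage information is not shown.

```python
import random


def plsa(counts, n_factors, n_iter=50, seed=0):
    """Fit a PLSA model to a session-by-page count matrix with EM.

    counts[s][p] is how often page p occurs in session s (hypothetical data).
    Returns (P(z), P(s|z), P(p|z)) as plain lists.
    """
    rng = random.Random(seed)
    n_s, n_p = len(counts), len(counts[0])

    def random_dist(n):
        # Random strictly positive weights, normalized to sum to 1.
        w = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(w)
        return [x / total for x in w]

    p_z = [1.0 / n_factors] * n_factors
    p_s_z = [random_dist(n_s) for _ in range(n_factors)]  # P(s|z)
    p_p_z = [random_dist(n_p) for _ in range(n_factors)]  # P(p|z)

    for _ in range(n_iter):
        new_z = [0.0] * n_factors
        new_s = [[0.0] * n_s for _ in range(n_factors)]
        new_p = [[0.0] * n_p for _ in range(n_factors)]
        for s in range(n_s):
            for p in range(n_p):
                n = counts[s][p]
                if n == 0:
                    continue
                # E-step: posterior P(z | s, p) for this observed pair.
                joint = [p_z[z] * p_s_z[z][s] * p_p_z[z][p]
                         for z in range(n_factors)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_factors):
                    r = n * joint[z] / total  # expected count for factor z
                    new_z[z] += r
                    new_s[z][s] += r
                    new_p[z][p] += r
        # M-step: re-estimate the parameters from the expected counts.
        grand = sum(new_z)
        for z in range(n_factors):
            if new_z[z] > 0.0:
                p_s_z[z] = [v / new_z[z] for v in new_s[z]]
                p_p_z[z] = [v / new_z[z] for v in new_p[z]]
            p_z[z] = new_z[z] / grand
    return p_z, p_s_z, p_p_z


# Toy usage: two groups of sessions that visit disjoint sets of pages.
counts = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
p_z, p_s_z, p_p_z = plsa(counts, n_factors=2)
```

Each row of `p_p_z` can be read as a weighted page vector for one latent factor, which is the kind of quantity the abstract uses to describe user profiles.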
Abstract:
Collaborative recommendation is one of the widely used recommendation techniques; it recommends items to a visitor by referring to the preferences of other users who are similar to the current user. User profiling upon Web transaction data is able to capture such knowledge of user tasks or interests. With the discovered usage pattern information, it becomes possible to recommend more preferred content to Web users or to customize the Web presentation for visitors via collaborative recommendation. In addition, it is helpful to identify the underlying relationships among Web users, items, and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on a user profiling technique. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model the co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as the representatives of usage patterns. Moreover, the hidden task model is derived by characterizing the meaningful latent factor space. With the discovered user profiles, we then choose the best-matching profile, whose preferences are closest to those of the current user, and make collaborative recommendations based on the corresponding page weights in the selected user profile. The preliminary experimental results on real-world data sets show that the proposed approach is capable of making recommendations accurately and efficiently.
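The profile-matching step described in this abstract might look roughly like the sketch below. It assumes the user profiles are already available as weighted page vectors (the modified k-means step is not reproduced) and uses cosine similarity as the matching measure, which is an assumption since the abstract does not name one. The function names and the toy weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(session, profiles, top_n=2):
    """Pick the profile closest to the active session and return the indices
    of its highest-weighted pages that the session has not visited yet."""
    best = max(profiles, key=lambda prof: cosine(session, prof))
    candidates = [(w, page) for page, w in enumerate(best)
                  if session[page] == 0 and w > 0]
    # Highest weight first; ties broken by page index for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return [page for _, page in candidates[:top_n]]


# Hypothetical profiles over four pages, and an active session that has
# visited only page 0.
profiles = [[0.9, 0.8, 0.1, 0.0], [0.0, 0.1, 0.7, 0.9]]
session = [1, 0, 0, 0]
print(recommend(session, profiles))  # -> [1, 2]
```

Ranking unvisited pages by their weight in the selected profile mirrors the abstract's statement that recommendations are based on the page weights in the chosen profile.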
Abstract:
In this paper, we compare a well-known semantic space model, Latent Semantic Analysis (LSA), with another model, Hyperspace Analogue to Language (HAL), which is widely used in different areas, especially in automatic query refinement. We conduct this comparative analysis to prove our hypothesis that, with respect to the ability to extract lexical information from a corpus of text, LSA is quite similar to HAL. We regard HAL and LSA as black boxes. Through a Pearson's correlation analysis of the outputs of these two black boxes, we conclude that LSA correlates highly with HAL, and thus there is a justification that LSA and HAL can potentially play a similar role in facilitating automatic query refinement. This paper evaluates LSA in a new application area and contributes an effective way to compare different semantic space models.
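The black-box comparison described here reduces to computing Pearson's correlation coefficient between two lists of scores produced for the same inputs. A minimal sketch follows; the score lists are made-up placeholders, not outputs of LSA or HAL, and the function name `pearson` is an assumption.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Placeholder similarity scores that two models might assign to the same
# five word pairs (illustrative values only).
model_a = [0.10, 0.35, 0.50, 0.72, 0.90]
model_b = [0.05, 0.30, 0.55, 0.70, 0.95]
r = pearson(model_a, model_b)
```

A value of `r` close to 1 would indicate that the two models rank and scale the inputs in a similar way, which is the sense of "correlates highly" in the abstract.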
Abstract:
For zygosity diagnosis in the absence of genotypic data, or in the recruitment phase of a twin study where only single twins from same-sex pairs are being screened, or to provide a test for sample duplication leading to the false identification of a dizygotic pair as monozygotic, the appropriate analysis of respondents' answers to questions about zygosity is critical. Using data from a young adult Australian twin cohort (N = 2094 complete pairs and 519 singleton twins from same-sex pairs with complete responses to all zygosity items), we show that application of latent class analysis (LCA), fitting a 2-class model, yields results that show good concordance with traditional methods of zygosity diagnosis, but with certain important advantages. These include the ability, in many cases, to assign zygosity with specified probability on the basis of responses of a single informant (advantageous when one zygosity type is being oversampled); and the ability to quantify the probability of misassignment of zygosity, allowing prioritization of cases for genotyping as well as identification of cases of probable laboratory error. Out of 242 twins (from 121 like-sex pairs) where genotypic data were available for zygosity confirmation, only a single case was identified of incorrect zygosity assignment by the latent class algorithm. Zygosity assignment for that single case was identified by the LCA as uncertain (probability of being a monozygotic twin only 76%), and the co-twin's responses clearly identified the pair as dizygotic (probability of being dizygotic 100%). In the absence of genotypic data, or as a safeguard against sample duplication, application of LCA for zygosity assignment or confirmation is strongly recommended.
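To illustrate how a fitted 2-class latent class model can assign class membership "with specified probability" from one informant's answers, here is a minimal sketch of the posterior calculation via Bayes' rule. The class priors and item-endorsement probabilities below are invented for illustration and are not estimates from the study; the function name is hypothetical, and binary items that are conditionally independent within each class are assumed.

```python
def class_posteriors(responses, priors, item_probs):
    """Posterior class probabilities for one respondent.

    responses:  list of 0/1 answers to the binary items.
    priors:     prior probability of each latent class.
    item_probs: item_probs[c][i] is P(answer to item i is 1 | class c).
    Assumes items are conditionally independent given the class.
    """
    likelihoods = []
    for prior, probs in zip(priors, item_probs):
        like = prior
        for r, p in zip(responses, probs):
            like *= p if r == 1 else (1.0 - p)
        likelihoods.append(like)
    total = sum(likelihoods)
    return [like / total for like in likelihoods]


# Invented parameters: two classes, two yes/no items.
priors = [0.5, 0.5]
item_probs = [[0.9, 0.8],   # class 0
              [0.2, 0.3]]   # class 1
posterior = class_posteriors([1, 1], priors, item_probs)
# posterior[0] equals 0.36 / 0.39, roughly 0.923
```

The complement of the largest posterior gives a probability of misassignment, which is the quantity the abstract proposes for prioritizing cases for genotyping.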
Abstract:
This article applies methods of latent class analysis (LCA) to data on lifetime illicit drug use in order to determine whether qualitatively distinct classes of illicit drug users can be identified. Self-report data on lifetime illicit drug use (cannabis, stimulants, hallucinogens, sedatives, inhalants, cocaine, opioids and solvents) collected from a sample of 6265 Australian twins (average age 30 years) were analyzed using LCA. Rates of childhood sexual and physical abuse, lifetime alcohol and tobacco dependence, symptoms of illicit drug abuse/dependence and psychiatric comorbidity were compared across classes using multinomial logistic regression. LCA identified a 5-class model: Class 1 (68.5%) had low risks of the use of all drugs except cannabis; Class 2 (17.8%) had moderate risks of the use of all drugs; Class 3 (6.6%) had high rates of cocaine, other stimulant and hallucinogen use but lower risks for the use of sedatives or opioids. Conversely, Class 4 (3.0%) had relatively low risks of cocaine, other stimulant or hallucinogen use but high rates of sedative and opioid use. Finally, Class 5 (4.2%) had uniformly high probabilities for the use of all drugs. Rates of psychiatric comorbidity were highest in the polydrug class although the sedative/opioid class had elevated rates of depression/suicidal behaviors and exposure to childhood abuse. Aggregation of population-level data may obscure important subgroup differences in patterns of illicit drug use and psychiatric comorbidity. Further exploration of a 'self-medicating' subgroup is needed.
Abstract:
Context: The relationships among the different eating disorders that exist in the community are poorly understood, especially for residual disorders in which bingeing or purging occurs in the absence of other behaviors. Objective: To examine a community sample for the number of mutually exclusive weight and eating profiles. Design: Data regarding lifetime eating disorder symptoms and weight range were submitted to a latent profile analysis. Profiles were compared regarding personality, current eating and weight, retrospectively reported life events, and lifetime depressive psychopathology. Setting: Longitudinal study among female twins from the Australian Twin Registry in whom eating was assessed by a telephone interview. Participants: A community sample of 1002 twins (individuals) who had participated in earlier waves of data collection. Main Outcome Measures: Number and clinical character of latent profiles. Results: The best fit was a 5-profile solution with women who were (1) of normal weight with few lifetime eating disorders (4.3%), (2) overweight (10.6% had a lifetime eating disorder), (3) underweight and generally had no eating disorders except for 5.3% who had restricting anorexia nervosa, (4) of low to normal weight (89.0% had a lifetime eating disorder), and (5) obese (37.0% had a lifetime eating disorder). Each profile contained more than 1 type of lifetime eating disorder except for the third profile. Women in the first and third profiles had the best functioning, with women in the fourth and fifth profiles having similarly poorer functioning. The women in the fourth group had a symptom profile distinctive from the other 4 groups in terms of severity; they were also more likely to have had lifetime major depression and suicidality. Conclusion: Lifetime weight ranges and the severity of eating disorder symptoms affected clustering more than the type of eating disorder symptom.
Abstract:
A new methodology is proposed for the analysis of generation capacity investment in a deregulated market environment. This methodology makes the investment appraisal using a probabilistic framework. The probabilistic production simulation (PPC) algorithm is used to compute the expected energy generated, taking into account system load variations and plant forced outage rates, while the Monte Carlo approach is applied to model the electricity price variability seen in a realistic network. The model is able to capture the price uncertainties, and hence the profitability uncertainties, for generator companies. Seasonal variations in the electricity prices and the system demand are independently modeled. The method is validated on the IEEE RTS system, augmented with realistic market and plant data, by using it to compare the financial viability of several generator investments applying either conventional or directly connected generator (powerformer) technologies. The significance of the results is assessed using several financial risk measures.
Abstract:
Grid computing is an advanced technique for collaboratively solving complicated scientific problems using geographically and organisationally dispersed computational, data storage, and other resources. Application of grid computing could provide significant benefits to all aspects of power systems that involve using computers. Based on our previous research, this paper presents a novel grid computing approach for probabilistic small signal stability (PSSS) analysis in electric power systems with uncertainties. A prototype computing grid is successfully implemented in our research lab to carry out PSSS analysis on two benchmark systems. Compared to traditional computing techniques, grid computing has given better performance for PSSS analysis in terms of computing capacity, speed, accuracy, and stability. In addition, a computing grid framework for power system analysis has been proposed based on the recent study.
Abstract:
Latent class and genetic analyses were used to identify subgroups of migraine sufferers in a community sample of 6,265 Australian twins (55% female) aged 25-36 who had completed an interview based on International Headache Society (IHS) criteria. Consistent with prevalence rates from other population-based studies, 703 (20%) female and 250 (9%) male twins satisfied the IHS criteria for migraine without aura (MO), and of these, 432 (13%) female and 166 (6%) male twins satisfied the criteria for migraine with aura (MA) as indicated by visual symptoms. Latent class analysis (LCA) of IHS symptoms identified three major symptomatic classes, representing 1) a mild form of recurrent nonmigrainous headache, 2) a moderately severe form of migraine, typically without visual aura symptoms (although 40% of individuals in this class were positive for aura), and 3) a severe form of migraine typically with visual aura symptoms (although 24% of individuals were negative for aura). Using the LCA classification, many more individuals were considered affected to some degree than when using IHS criteria (35% vs. 13%). Furthermore, genetic model fitting indicated a greater genetic contribution to migraine using the LCA classification (heritability, h² = 0.40; 95% CI, 0.29-0.46) compared with the IHS classification (h² = 0.36; 95% CI, 0.22-0.42). Exploratory latent class modeling, fitting up to 10 classes, did not identify classes corresponding to either the IHS MO or MA classification. Our data indicate the existence of a continuum of severity, with MA more severe but not etiologically distinct from MO. In searching for predisposing genes, we should therefore expect to find some genes that may underlie all major recurrent headache subtypes, with modifying genetic or environmental factors that may lead to differential expression of the liability for migraine. (C) 2004 Wiley-Liss, Inc.
Abstract:
This paper explains and explores the concept of "semantic molecules" in the NSM methodology of semantic analysis. A semantic molecule is a complex lexical meaning which functions as an intermediate unit in the structure of other, more complex concepts. The paper undertakes an overview of different kinds of semantic molecule, showing how they enter into more complex meanings and how they themselves can be explicated. It shows that four levels of "nesting" of molecules within molecules are attested, and it argues that while some molecules, such as 'hands' and 'make', may well be language-universal, many others are language-specific.
Abstract:
Familial typical migraine is a common, complex disorder that shows strong familial aggregation. Using latent-class analysis (LCA), we identified subgroups of people with migraine/severe headache in a community sample of 12,245 Australian twins (60% female), drawn from two cohorts of individuals aged 23-90 years who completed an interview based on International Headache Society criteria. We report results from genomewide linkage analyses involving 756 twin families containing a total of 790 independent sib pairs (130 affected concordant, 324 discordant, and 336 unaffected concordant for LCA-derived migraine). Quantitative-trait linkage analysis produced evidence of significant linkage on chromosome 5q21 and suggestive linkage on chromosomes 8, 10, and 13. In addition, we replicated previously reported typical-migraine susceptibility loci on chromosomes 6p12.2-p21.1 and 1q21-q23, the latter being within 3 cM of the rare autosomal dominant familial hemiplegic migraine gene (ATP1A2), a finding which potentially implicates ATP1A2 in familial typical migraine for the first time. Linkage analyses of individual migraine symptoms for our six most interesting chromosomes provide tantalizing hints of the phenotypic and genetic complexity of migraine. Specifically, the chromosome 1 locus is most associated with phonophobia; the chromosome 5 peak is predominantly associated with pulsating headache; the chromosome 6 locus is associated with activity-prohibiting headache and photophobia; the chromosome 8 locus is associated with nausea/vomiting and moderate/severe headache; the chromosome 10 peak is most associated with phonophobia and photophobia; and the chromosome 13 peak is completely due to association with photophobia. These results will prove to be invaluable in the design and analysis of future linkage and linkage disequilibrium studies of migraine.