196 resultados para Probabilistic latent semantic analysis (PLSA)
Resumo:
Definition of disease phenotype is a necessary preliminary to research into genetic causes of a complex disease. Clinical diagnosis of migraine is currently based on diagnostic criteria developed by the International Headache Society. Previously, we examined the natural clustering of these diagnostic symptoms using latent class analysis (LCA) and found that a four-class model was preferred. However, the classes can be ordered such that all symptoms progressively intensify, suggesting that a single continuous variable representing disease severity may provide a better model. Here, we compare two models: item response theory and LCA, each constructed within a Bayesian context. A deviance information criterion is used to assess model fit. We phenotyped our population sample using these models, estimated heritability and conducted genome-wide linkage analysis using Merlin-qtl. LCA with four classes was again preferred. After transformation, phenotypic trait values derived from both models are highly correlated (correlation = 0.99) and consequently results from subsequent genetic analyses were similar. Heritability was estimated at 0.37, while multipoint linkage analysis produced genome-wide significant linkage to chromosome 7q31-q33 and suggestive linkage to chromosomes 1 and 2. We argue that such continuous measures are a powerful tool for identifying genes contributing to migraine susceptibility.
Resumo:
We address the problem of face recognition on video by employing the recently proposed probabilistic linear discrimi-nant analysis (PLDA). The PLDA has been shown to be robust against pose and expression in image-based face recognition. In this research, the method is extended and applied to video where image set to image set matching is performed. We investigate two approaches of computing similarities between image sets using the PLDA: the closest pair approach and the holistic sets approach. To better model face appearances in video, we also propose the heteroscedastic version of the PLDA which learns the within-class covariance of each individual separately. Our experi-ments on the VidTIMIT and Honda datasets show that the combination of the heteroscedastic PLDA and the closest pair approach achieves the best performance.
A tag-based personalized item recommendation system using tensor modeling and topic model approaches
Resumo:
This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called as tag assignment
Resumo:
This is a methodological paper describing when and how manifest items dropped from a latent construct measurement model (e.g., factor analysis) can be retained for additional analysis. Presented are protocols for assessment for retention in the measurement model, evaluation of dropped items as potential items separate from the latent construct, and post hoc analyses that can be conducted using all retained (manifest or latent) variables. The protocols are then applied to data relating to the impact of the NAPLAN test. The variables examined are teachers’ achievement goal orientations and teachers’ perceptions of the impact of the test on curriculum and pedagogy. It is suggested that five attributes be considered before retaining dropped manifest items for additional analyses. (1) Items can be retained when employed in service of an established or hypothesized theoretical model. (2) Items should only be retained if sufficient variance is present in the data set. (3) Items can be retained when they provide a rational segregation of the data set into subsamples (e.g., a consensus measure). (4) The value of retaining items can be assessed using latent class analysis or latent mean analysis. (5) Items should be retained only when post hoc analyses with these items produced significant and substantive results. These suggested exploratory strategies are presented so that other researchers using survey instruments might explore their data in similar and more innovative ways. Finally, suggestions for future use are provided.
Resumo:
Probabilistic load flow techniques have been adopted in AC electrified railways to study the load demand under various train service conditions. This paper highlights the differences in probabilistic load flow analysis between the usual power systems and power supply systems in AC railways; discusses the possible difficulties in problem formulation and presents the link between train movement and the corresponding power demand for load flow calculation.
Resumo:
Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, which paves the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we firstly investigated the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we proposed two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus is turned to the methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extended the model for detecting the interaction effects for population based case-control studies. In this part of study, we also develop a machine learning algorithm to cope with the large scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.
Resumo:
Aims and objectives. The aim of this study was to gain an understanding of the experiences and perspectives of intensive care nurses caring for critically ill obstetric patients. Background. Current literature suggests critically ill obstetric patients need specialised, technically appropriate care to meet their specific needs with which many intensive care nurses are unfamiliar. Furthermore, there is little research and evidence to guide the care of this distinct patient group. Design. This study used a descriptive qualitative design. Methods. Two focus groups were used to collect data from 10 Australian intensive care units nurses in May 2007. Open-ended questions were used to guide the discussion. Latent content analysis was used to analyse the data set. Each interview lasted no longer than 60 minutes and was recorded using audio tape. The full interviews were transcribed prior to in-depth analysis to identify major themes. Results. The themes identified from the focus group interviews were competence with knowledge and skills for managing obstetric patients in the intensive care unit, confidence in caring for obstetric patients admitted to the intensive care unit and acceptance of an expanded scope of practice perceived to include fundamental midwifery knowledge and skills. Conclusion. The expressed lack of confidence and competence in meeting the obstetric and support needs of critically ill obstetric women indicates a clear need for greater assistance and education of intensive care nurses. This in turn may encourage critical care nurses to accept an expanded role of clinical practice in caring for critically ill obstetric patients. Relevance to clinical practice. Recognition of the issues for nurses in successfully caring for obstetric patients admitted to an adult intensive care setting provides direction for designing education packages, ensuring specific carepaths and guidelines are in place and that support from a multidisciplinary team is available including midwifery staff.
Resumo:
Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real world applications often have access to only limited duration speech data. This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented which provide a comparison of Joint Factor Analysis (JFA) and i-vector based systems including various compensation techniques; Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA). Speaker verification performance for utterances with as little as 2 sec of data taken from the NIST Speaker Recognition Evaluations are presented to provide a clearer picture of the current performance characteristics of these techniques in short utterance conditions.
Resumo:
This paper investigates the effects of limited speech data in the context of speaker verification using a probabilistic linear discriminant analysis (PLDA) approach. Being able to reduce the length of required speech data is important to the development of automatic speaker verification system in real world applications. When sufficient speech is available, previous research has shown that heavy-tailed PLDA (HTPLDA) modeling of speakers in the i-vector space provides state-of-the-art performance, however, the robustness of HTPLDA to the limited speech resources in development, enrolment and verification is an important issue that has not yet been investigated. In this paper, we analyze the speaker verification performance with regards to the duration of utterances used for both speaker evaluation (enrolment and verification) and score normalization and PLDA modeling during development. Two different approaches to total-variability representation are analyzed within the PLDA approach to show improved performance in short-utterance mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development. The results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset suggest that the HTPLDA system can continue to achieve better performance than Gaussian PLDA (GPLDA) as evaluation utterance lengths are decreased. We also highlight the importance of matching durations for score normalization and PLDA modeling to the expected evaluation conditions. Finally, we found that a pooled total-variability approach to PLDA modeling can achieve better performance than the traditional concatenated total-variability approach for short utterances in mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development.
Resumo:
This paper investigates the use of the dimensionality-reduction techniques weighted linear discriminant analysis (WLDA), and weighted median fisher discriminant analysis (WMFD), before probabilistic linear discriminant analysis (PLDA) modeling for the purpose of improving speaker verification performance in the presence of high inter-session variability. Recently it was shown that WLDA techniques can provide improvement over traditional linear discriminant analysis (LDA) for channel compensation in i-vector based speaker verification systems. We show in this paper that the speaker discriminative information that is available in the distance between pair of speakers clustered in the development i-vector space can also be exploited in heavy-tailed PLDA modeling by using the weighted discriminant approaches prior to PLDA modeling. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that WLDA and WMFD projections before PLDA modeling can provide an improved approach when compared to uncompensated PLDA modeling for i-vector based speaker verification systems.
Resumo:
Chatrooms, for example Internet Relay Chat, are generally multi-user, multi-channel and multiserver chat-systems which run over the Internet and provide a protocol for real-time text-based conferencing between users all over the world. While a well-trained human observer is able to understand who is chatting with whom, there are no efficient and accurate automated tools to determine the groups of users conversing with each other. A precursor to analysing evolving cyber-social phenomena is to first determine what the conversations are and which groups of chatters are involved in each conversation. We consider this problem in this paper. We propose an algorithm to discover all groups of users that are engaged in conversation. Our algorithms are based on a statistical model of a chatroom that is founded on our experience with real chatrooms. Our approach does not require any semantic analysis of the conversations, rather it is based purely on the statistical information contained in the sequence of posts. We improve the accuracy by applying some graph algorithms to clean the statistical information. We present some experimental results which indicate that one can automatically determine the conversing groups in a chatroom, purely on the basis of statistical analysis.
Resumo:
The validity of the Multidimensional School Anger Inventory (MSAI) was examined with adolescents from 5 Pacific Rim countries (N ¼ 3,181 adolescents; age, M ¼ 14.8 years; 52% females). Confirmatory factor analyses examined configural invariance for the MSAI’s anger experience, hostility, destructive expression, and anger coping subscales. The model did not converge for Peruvian students. Using the top 4 loaded items for anger experience, hostility, and destructive expression configural invariance and partial metric and scalar invariances were found. Latent means analysis compared mean responses on each subscale to the U.S. sample. Students from other countries showed higher mean responses on the anger experience subscale (ds ¼ .37–.73). Australian (d ¼ .40) and Japanese students (d ¼ .21) had significantly higher mean hostility subscale scores. Australian students had higher mean scores on the destructive expression subscale (d ¼ .30), whereas Japanese students had lower mean scores (d ¼ 2.17). The largest latent mean gender differences (females lower than males) were for destructive expression among Australian (d ¼ 2.67), Guatemalan (d ¼ 2.42), and U.S. (d ¼ 2.66) students. This study supported an abbreviated, 12-item MSAI with partial invariance. Implications for the use of the MSAI in comparative research are discussed.
Resumo:
Migraine is a common neurological disorder with a strong genetic basis. However, the complex nature of the disorder has meant that few genes or susceptibility loci have been identified and replicated consistently to confirm their involvement in migraine. Approaches to genetic studies of the disorder have included analysis of the rare migraine subtype, familial hemiplegic migraine with several causal genes identified for this severe subtype. However, the exact genetic contributors to the more common migraine subtypes are still to be deciphered. Genome-wide studies such as genome-wide association studies and linkage analysis as well as candidate genes studies have been employed to investigate genes involved in common migraine. Neurological, hormonal and vascular genes are all considered key factors in the pathophysiology of migraine and are a focus of many of these studies. It is clear that the influence of individual genes on the expression of this disorder will vary. Furthermore, the disorder may be dependent on gene–gene and gene–environment interactions that have not yet been considered. In addition, identifying susceptibility genes may require phenotyping methods outside of the International Classification of Headache Disorders II criteria, such as trait component analysis and latent class analysis to better define the ambit of migraine expression.
Resumo:
A significant amount of speech data is required to develop a robust speaker verification system, but it is difficult to find enough development speech to match all expected conditions. In this paper we introduce a new approach to Gaussian probabilistic linear discriminant analysis (GPLDA) to estimate reliable model parameters as a linearly weighted model taking more input from the large volume of available telephone data and smaller proportional input from limited microphone data. In comparison to a traditional pooled training approach, where the GPLDA model is trained over both telephone and microphone speech, this linear-weighted GPLDA approach is shown to provide better EER and DCF performance in microphone and mixed conditions in both the NIST 2008 and NIST 2010 evaluation corpora. Based upon these results, we believe that linear-weighted GPLDA will provide a better approach than pooled GPLDA, allowing for the further improvement of GPLDA speaker verification in conditions with limited development data.
Resumo:
This paper proposes techniques to improve the performance of i-vector based speaker verification systems when only short utterances are available. Short-length utterance i-vectors vary with speaker, session variations, and the phonetic content of the utterance. Well established methods such as linear discriminant analysis (LDA), source-normalized LDA (SN-LDA) and within-class covariance normalisation (WCCN) exist for compensating the session variation but we have identified the variability introduced by phonetic content due to utterance variation as an additional source of degradation when short-duration utterances are used. To compensate for utterance variations in short i-vector speaker verification systems using cosine similarity scoring (CSS), we have introduced a short utterance variance normalization (SUVN) technique and a short utterance variance (SUV) modelling approach at the i-vector feature level. A combination of SUVN with LDA and SN-LDA is proposed to compensate the session and utterance variations and is shown to provide improvement in performance over the traditional approach of using LDA and/or SN-LDA followed by WCCN. An alternative approach is also introduced using probabilistic linear discriminant analysis (PLDA) approach to directly model the SUV. The combination of SUVN, LDA and SN-LDA followed by SUV PLDA modelling provides an improvement over the baseline PLDA approach. We also show that for this combination of techniques, the utterance variation information needs to be artificially added to full-length i-vectors for PLDA modelling.