196 results for Probabilistic latent semantic analysis (PLSA)


Relevance: 30.00%

Abstract:

Topic modeling has been widely utilized in information retrieval, text mining, text classification, and related fields. Most existing statistical topic modeling methods, such as LDA and pLSA, represent a topic by selecting single words from the multinomial word distribution for that topic. This term-based representation has two main shortcomings: first, popular or common words occur frequently across different topics, which makes topics ambiguous and hard to interpret; second, single words lack the coherent semantic meaning needed to represent topics accurately. To overcome these problems, this paper proposes a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.
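The first shortcoming above, common words bleeding across topics, can be illustrated with a small sketch. This is purely illustrative Python: the function name and the simple distinctiveness ratio are our own, not the paper's two-stage pattern mining method.

```python
# Illustrative sketch (not the paper's method): given per-topic word
# distributions, prefer words whose probability mass is concentrated in one
# topic, so shared "popular" words drop out of each topic's representation.

def discriminative_topic_terms(topic_word_probs, top_n=3):
    """topic_word_probs: dict topic -> dict word -> probability."""
    # Total probability mass of each word across all topics.
    totals = {}
    for dist in topic_word_probs.values():
        for word, p in dist.items():
            totals[word] = totals.get(word, 0.0) + p
    reps = {}
    for topic, dist in topic_word_probs.items():
        # Score by in-topic probability divided by cross-topic mass:
        # a word unique to this topic scores 1.0, a shared word scores less.
        ranked = sorted(dist, key=lambda w: dist[w] / totals[w], reverse=True)
        reps[topic] = ranked[:top_n]
    return reps

topics = {
    "t0": {"data": 0.3, "retrieval": 0.25, "query": 0.2, "index": 0.1},
    "t1": {"data": 0.3, "mining": 0.25, "pattern": 0.2, "rule": 0.1},
}
reps = discriminative_topic_terms(topics, top_n=2)
# The shared word "data" is pushed out of both topic representations.
```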

Relevance: 30.00%

Abstract:

Critical analysis and problem-solving skills are two graduate attributes that are important in ensuring that graduates are well equipped to work across research and practice settings within the discipline of psychology. Despite the importance of these skills, few psychology undergraduate programmes have undertaken any systematic development, implementation, and evaluation of curriculum activities to foster them. The current study reports on the development and implementation of a tutorial programme designed to enhance the critical analysis and problem-solving skills of undergraduate psychology students. Underpinned by collaborative learning and problem-based learning, the tutorial programme was administered to 273 third-year undergraduate students in psychology. Latent growth curve modelling revealed that students demonstrated a significant linear increase in self-reported critical analysis and problem-solving skills across the tutorial programme. The findings suggest that the development of an inquiry-based curriculum offers important opportunities for psychology undergraduates to develop critical analysis and problem-solving skills.

Relevance: 30.00%

Abstract:

Chinese modal particles feature prominently in Chinese people's daily use of the language, but, as Chinese linguists and teachers of Chinese as a foreign language commonly acknowledge, their pragmatic and semantic functions are elusive. This book originates from an extensive and intensive empirical study of the Chinese modal particle a (啊), one of the most frequently used modal particles in Mandarin Chinese. In order to capture all the uses and underlying meanings of the particle, the author transcribed the first 20 episodes, about 20 hours in length, of the popular Chinese TV drama series Kewang 'Expectations', yielding a corpus of more than 142,000 Chinese characters with a total of 1,829 instances of the particle, all used in meaningful communicative situations. Every occurrence of the particle was analysed within its context of use in terms of its pragmatic and semantic contributions to the host utterance. On this basis, the core meanings seen as constituting the modal nature of the particle were identified.

Relevance: 30.00%

Abstract:

This paper uses innovative content analysis techniques to map how the death of Oscar Pistorius' girlfriend, Reeva Steenkamp, was framed in Twitter conversations. Around 1.5 million posts from a two-week timeframe are analyzed with a combination of syntactic and semantic methods. The analysis is grounded in the frame analysis perspective and differs from sentiment analysis: instead of looking for explicit evaluations such as "he is guilty" or "he is innocent", the results show how opinions can be identified through complex articulations of more implicit symbolic devices, such as repeatedly mentioned examples and metaphors. Different frames are adopted by users as more information about the case is revealed: from a more episodic frame, dominant in the very beginning, to more systemic approaches that associate the event with urban violence, gun control issues, and violence against women. A detailed timeline of the discussions is provided.

Relevance: 30.00%

Abstract:

A common problem with using tensor models to generate quality recommendations for large datasets is scalability. In this paper, we propose the Tensor-based Recommendation using Probabilistic Ranking method, which generates the reconstructed tensor using block-striped parallel matrix multiplication and then probabilistically calculates user preferences to rank the recommended items. Empirical analysis on two real-world datasets shows that the proposed method scales to large tensor datasets and outperforms the benchmarking methods in terms of accuracy.
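The block-striped reconstruction step can be sketched as below, assuming a CP-style factorization with factor matrices for users, items and tags. This is a minimal NumPy illustration, not the authors' implementation; the point is that each row block of the user factor can be multiplied independently, which is what makes the step parallelizable.

```python
import numpy as np

# Illustrative sketch: reconstruct a user x item x tag tensor from assumed
# CP factors U, I, T by processing row blocks of U independently. In a real
# parallel setting each block would be handled by a separate worker.

def reconstruct_blockwise(U, I, T, n_blocks=2):
    n_users = U.shape[0]
    out = np.empty((n_users, I.shape[0], T.shape[0]))
    for rows in np.array_split(np.arange(n_users), n_blocks):
        # Each entry is sum_r U[u,r] * I[i,r] * T[t,r]; computing it per row
        # block touches only that stripe of U, so blocks are independent.
        out[rows] = np.einsum('ur,ir,tr->uit', U[rows], I, T)
    return out
```

The result is identical to a single full reconstruction; only the work is partitioned.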

Relevance: 30.00%

Abstract:

As a sequel to a paper that dealt with the analysis of two-way quantitative data in large germplasm collections, this paper presents analytical methods appropriate for two-way data matrices consisting of mixed data types, namely ordered multicategory and quantitative data. While various pattern analysis techniques have been identified as suitable for analysing the mixed data types that occur in germplasm collections, the clustering and ordination methods used often cannot deal explicitly with the computational consequences of large data sets (i.e. more than 5000 accessions) with incomplete information. However, it is shown that the ordination technique of principal component analysis and the mixture maximum likelihood method of clustering can be employed to achieve such analyses. Germplasm evaluation data for 11,436 accessions of groundnut (Arachis hypogaea L.) from the International Crops Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the post-rainy season and five ordered multicategory descriptors were used. Pattern analysis results generally indicated that the accessions could be separated into four regions along the continuum of growth habit (or plant erectness). Interpretation of accession membership in these regions was found to be consistent with taxonomic information, such as subspecies. Each growth habit region contained accessions from three of the most common groundnut botanical varieties, implying that within each habit type there is the full range of expression for the other descriptors used in the analysis. Using these types of insights, the patterns of variability in germplasm collections can provide scientists with valuable information for their plant improvement programs.
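The ordination step can be sketched with a minimal principal component analysis via the singular value decomposition. This is illustrative NumPy code, not the original analysis; in the paper's workflow the mixture maximum likelihood clustering would then be fitted alongside scores such as these.

```python
import numpy as np

# Minimal PCA ordination sketch: project accessions (rows) described by
# quantitative descriptors (columns) onto their leading principal components.

def pca_scores(X, n_components=2):
    Xc = X - X.mean(axis=0)                      # center each descriptor
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Rows of Vt are principal axes ordered by explained variance;
    # the projection gives each accession's coordinates along them.
    return Xc @ Vt[:n_components].T

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [2.0, 1.0]])
scores = pca_scores(X, n_components=2)
```

Real germplasm data would also need the incomplete-information handling the paper discusses before a step like this.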

Relevance: 30.00%

Abstract:

This study focuses on using the partial least squares (PLS) path modelling technique in archival auditing research by replicating the data and research questions from prior bank audit fee studies. PLS path modelling allows for inter-correlations among audit fee determinants by establishing latent constructs and multiple relationship paths in one simultaneous PLS path model. Endogeneity concerns about auditor choice can also be addressed with PLS path modelling. With a sample of US bank holding companies for the period 2003-2009, we examine the associations among on-balance-sheet financial risks, off-balance-sheet risks and audit fees, and also address the pervasive client size effect and the self-selection of auditors. The results confirm the dominant effect of size on audit fees, both directly and indirectly via its impact on other audit fee determinants. Even after simultaneously considering the self-selection of auditors, we still find audit fee premiums for Big N auditors, the second most important factor in audit fee determination. On-balance-sheet financial risk measures in terms of capital adequacy, loan composition, earnings and asset quality performance have positive impacts on audit fees. After allowing for the positive influence of on-balance-sheet financial risks and entity size on off-balance-sheet risk, the off-balance-sheet risk measure, SECRISK, is still positively associated with bank audit fees, both before and after the onset of the financial crisis. The consistency of these results with prior literature provides supporting evidence for, and enhances confidence in, the application of this new research technique in archival accounting studies.


Relevance: 30.00%

Abstract:

AIM: To assess the cost-effectiveness of an automated telephone-linked care intervention, Australian TLC Diabetes, delivered over 6 months to patients with established Type 2 diabetes mellitus and a high glycated haemoglobin level, compared to usual care. METHODS: A Markov model was designed to synthesize data from a randomized controlled trial of TLC Diabetes (n=120) and other published evidence. The 5-year model consisted of three health states related to glycaemic control ('sub-optimal': HbA1c ≥58 mmol/mol (≥7.5%); 'average': 48-57 mmol/mol (6.5-7.4%); 'optimal': <48 mmol/mol (<6.5%)) and a fourth state, all-cause death. Key outcomes of the model include discounted health system costs and quality-adjusted life years (QALYs) using SF-6D utility weights. Univariate and probabilistic sensitivity analyses were undertaken. RESULTS: Annual medication costs for the intervention group were lower than for usual care [intervention: £1076 (95% CI: £947, £1206) versus usual care: £1271 (95% CI: £1115, £1428), p=0.052]. The estimated mean cost for intervention group participants over five years, including the intervention cost, was £17,152, versus £17,835 for the usual care group. The corresponding mean QALYs were 3.381 (SD 0.40) for the intervention group and 3.377 (SD 0.41) for the usual care group. Results were sensitive to the model duration, utility values and medication costs. CONCLUSION: The Australian TLC Diabetes intervention was a low-cost investment for individuals with established diabetes and may result in medication cost savings to the health system. Although QALYs were similar between groups, other benefits arising from the intervention should also be considered when determining the overall value of this strategy.
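The structure of such a Markov cohort model can be sketched as below. All transition probabilities, costs and utility weights here are invented placeholders, not the paper's estimates; the sketch only shows how a cohort distribution over the three glycaemic states plus death accumulates discounted costs and QALYs over a 5-year horizon.

```python
import numpy as np

# Placeholder three-state + death Markov cohort model. P[i, j] is the
# one-year probability of moving from state i to state j (rows sum to 1);
# "dead" is absorbing. All numbers are illustrative, not the paper's values.
states = ["sub-optimal", "average", "optimal", "dead"]
P = np.array([[0.70, 0.20, 0.05, 0.05],
              [0.15, 0.65, 0.15, 0.05],
              [0.05, 0.20, 0.71, 0.04],
              [0.00, 0.00, 0.00, 1.00]])
annual_cost = np.array([4000.0, 3400.0, 3000.0, 0.0])   # health system costs
utility = np.array([0.62, 0.68, 0.72, 0.0])             # SF-6D-style weights

def run_model(start, years=5, discount=0.05):
    dist = np.array(start, dtype=float)   # cohort distribution over states
    cost = qalys = 0.0
    for year in range(years):
        df = 1.0 / (1.0 + discount) ** year   # discount factor for this cycle
        cost += df * dist @ annual_cost
        qalys += df * dist @ utility
        dist = dist @ P                        # advance the cohort one year
    return cost, qalys

cost, qalys = run_model([1.0, 0.0, 0.0, 0.0])
```

A probabilistic sensitivity analysis would rerun this with inputs drawn from distributions rather than point values.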

Relevance: 30.00%

Abstract:

In a tag-based recommender system, the multi-dimensional correlations must be modeled effectively to find quality recommendations. Recently, a few researchers have used tensor models in recommendation to represent and analyze the latent relationships inherent in multi-dimensional data. A common approach is to build the tensor model, decompose it, and then directly use the reconstructed tensor to generate recommendations based on the maximum values of its elements. To improve accuracy and scalability, we propose an implementation of the n-mode block-striped (matrix) product for scalable tensor reconstruction, together with probabilistic ranking of the candidate items generated from the reconstructed tensor. Testing on real-world datasets demonstrates that the proposed method outperforms the benchmarking methods in terms of recommendation accuracy and scalability.
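The probabilistic ranking step can be sketched as follows. This is an illustrative reading of the approach, not the authors' code: scores from a user's slice of the reconstructed tensor are normalized into a preference distribution, and the top items are returned instead of simply taking raw maximum tensor values.

```python
import numpy as np

# Illustrative sketch: rank items for a user from a reconstructed
# user x item x tag tensor by turning scores into probabilities.

def rank_items_for_user(recon, user, tag=None, top_k=3):
    # Per-item score: the given tag slice, or the best score over all tags.
    scores = recon[user].max(axis=1) if tag is None else recon[user, :, tag]
    scores = np.clip(scores, 0.0, None)          # reconstruction can go negative
    probs = scores / scores.sum()                # preference distribution
    order = np.argsort(-probs)[:top_k]           # highest preference first
    return list(order), probs

recon = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy reconstructed tensor
order, probs = rank_items_for_user(recon, user=0, top_k=2)
```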

Relevance: 30.00%

Abstract:

Objective: Research into youth caregiving in families where a parent experiences a significant medical condition has been hampered by a lack of contextually sensitive measures of the nature and breadth of young caregiving experiences. This study examined the factor structure and measurement invariance of such a measure, the Young Carer of Parents Inventory (YCOPI; Pakenham et al., 2006), using confirmatory factor analysis across three groups of youth. The YCOPI has two parts: the YCOPI-A, with five factors assessing caregiving experiences applicable to all caregiving contexts, and the YCOPI-B, with four factors that tap dimensions related to youth caregiving in the context of parental illness. Design: Two samples (ages 9-20 years) were recruited: a community sample of 2,429 youth, from which two groups were derived ("healthy" family [HF], n = 1,760; parental illness [PI], n = 446), and a sample of 130 youth of a parent with multiple sclerosis. Results: With some modification, the YCOPI-A demonstrated a replicable factor structure across the three groups, but exhibited only partial measurement invariance across the HF and PI groups. The impact of assuming full measurement invariance on latent mean differences appeared small, supporting use of the measure in research and applied settings when estimated using latent factors and controlling for measurement invariance. PI youth reported significantly higher scores than HF youth on all YCOPI-A subscales. The YCOPI-B requires some modifications, and further development work is recommended. Conclusion: The factor structure that emerged, together with the addition of new items, constitutes the YCOPI-Revised. Findings support the use of the YCOPI-Revised in research and applied settings.

Relevance: 30.00%

Abstract:

Local spatio-temporal features with a bag-of-visual-words model are a popular approach to human action recognition. Bag-of-features methods face several challenges, such as extracting appropriate appearance and motion features from videos, converting the extracted features into a form suitable for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification so as to improve overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task and hence yield poor results compared to the baseline bag-of-words framework. Supervised LDA techniques, on the other hand, learn the topic structure by considering the class labels and improve recognition accuracy significantly. MedLDA maximizes likelihood and within-class margins using max-margin techniques, yielding a sparse, highly discriminative topic structure, while css-LDA learns separate class-specific topics instead of a common set of topics across the entire dataset. In our representation, topics are first learned and each video is then represented as a topic proportion vector, i.e. a histogram over topics. Finally, SVM classification is performed on the learned topic proportion vectors. We demonstrate the efficiency of these two representation techniques through experiments on two popular datasets. Experimental results show significantly improved performance compared to the baseline bag-of-features framework, which uses k-means to construct a histogram of words from the feature vectors.
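The topic proportion vector representation can be sketched as below. This is illustrative only: in the paper the per-feature topic assignments come from MedLDA or css-LDA, and the resulting vectors are classified with an SVM; here we just show how assignments become a normalized histogram over topics.

```python
import numpy as np

# Illustrative sketch: turn the topic assignments of a video's local
# spatio-temporal features into a topic proportion vector (a histogram
# over topics, normalized to sum to 1), ready for an SVM classifier.

def topic_histogram(topic_assignments, n_topics):
    counts = np.bincount(topic_assignments, minlength=n_topics).astype(float)
    return counts / counts.sum()

# Toy example: 6 local features of one video, assigned to 4 learned topics.
assignments = np.array([0, 0, 1, 2, 2, 2])
vec = topic_histogram(assignments, n_topics=4)
```

One such vector per video would then form the training matrix for the SVM.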

Relevance: 30.00%

Abstract:

Conceptual combination plays a fundamental role in creating the broad range of compound phrases utilised in everyday language. While the systematicity and productivity of language provide a strong argument in favour of assuming compositionality, this very assumption is still regularly questioned in both cognitive science and philosophy. This article provides a novel probabilistic framework for assessing whether the semantics of conceptual combinations are compositional, and so can be considered a function of the semantics of the constituent concepts, or not. Rather than adjudicating between different grades of compositionality, the framework presented here contributes formal methods for determining a clear dividing line between compositional and non-compositional semantics. Compositionality is equated with a joint probability distribution modelling how the constituent concepts in the combination are interpreted. Marginal selectivity is emphasised as a pivotal probabilistic constraint for the application of the Bell/CH and CHSH systems of inequalities (referred to collectively as Bell-type). Non-compositionality is then equated with either a failure of marginal selectivity, or, in the presence of marginal selectivity, with a violation of Bell-type inequalities. In both non-compositional scenarios, the conceptual combination cannot be modelled using a joint probability distribution with variables corresponding to the interpretation of the individual concepts. The framework is demonstrated by applying it to an empirical scenario of twenty-four non-lexicalised conceptual combinations.
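The CHSH part of the Bell-type test can be sketched as follows, assuming two binary (+1/-1) interpretation variables per concept, each measured under two contexts (A, A' for one concept; B, B' for the other). The function names are ours, and the toy distributions are invented for illustration; a PR-box-style pattern is used only to show what a violation looks like.

```python
# Sketch of a CHSH check on interpretation probabilities. Each joint
# distribution maps an outcome pair (a, b) in {+1, -1}^2 to a probability.

def expectation(joint):
    """Correlation E[AB] under one pair of measurement contexts."""
    return sum(a * b * p for (a, b), p in joint.items())

def chsh(jAB, jABp, jApB, jApBp):
    """CHSH statistic; values above 2 violate the Bell-type bound,
    meaning no single joint distribution over A, A', B, B' exists,
    i.e. the combination is non-compositional in the paper's sense."""
    return abs(expectation(jAB) + expectation(jABp)
               + expectation(jApB) - expectation(jApBp))

corr = {(1, 1): 0.5, (-1, -1): 0.5}   # perfectly correlated outcomes
anti = {(1, -1): 0.5, (-1, 1): 0.5}   # perfectly anti-correlated outcomes
```

With all four context pairs perfectly correlated the statistic is exactly 2 (compositional boundary); swapping the last pair for the anti-correlated distribution yields 4, a maximal violation.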

Relevance: 30.00%

Abstract:

We used event-related functional magnetic resonance imaging (fMRI) to investigate neural responses associated with the semantic interference (SI) effect in the picture-word task. Independent-stage models of word production assume that the locus of the SI effect is at the conceptual processing level (Levelt et al. [1999]: Behav Brain Sci 22:1-75), whereas interactive models postulate that it occurs at phonological retrieval (Starreveld and La Heij [1996]: J Exp Psychol Learn Mem Cogn 22:896-918). In both types of model, resolution of the SI effect occurs through competitive spreading activation without the involvement of inhibitory links. These assumptions were tested by randomly presenting participants with trials from semantically related and lexical control distractor conditions and acquiring image volumes coincident with the estimated peak hemodynamic response for each trial. Overt vocalization of picture names occurred in the absence of scanner noise, allowing reaction time (RT) data to be collected. Analysis of the RT data confirmed the SI effect. Regions showing differential hemodynamic responses during the SI effect included the mid-section of the left middle temporal gyrus, the left posterior superior temporal gyrus, the left anterior cingulate cortex, and bilateral orbitomedial prefrontal cortex. Additional responses were observed in the frontal eye fields, left inferior parietal lobule, and right anterior temporal and occipital cortex. The results are interpreted as indirectly supporting interactive models that allow spreading activation between both the conceptual processing and phonological retrieval levels of word production. In addition, the data confirm that selective attention/response suppression plays a role in resolving the SI effect similar to the way in which Stroop interference is resolved. We conclude that neuroimaging studies can provide information about the neuroanatomical organization of the lexical system that may prove useful for constraining theoretical models of word production.

Relevance: 30.00%

Abstract:

Naming impairments in aphasia are typically targeted using semantic and/or phonologically based tasks. However, it is not known whether these treatments have different neural mechanisms. Eight participants with aphasia received twelve treatment sessions using an alternating treatment design, with fMRI scans pre- and post-treatment. Half the sessions employed Phonological Components Analysis (PCA), and half the sessions employed Semantic Feature Analysis (SFA). Pre-treatment activity in the left caudate correlated with greater immediate treatment success for items treated with SFA, whereas recruitment of the left supramarginal gyrus and right precuneus post-treatment correlated with greater immediate treatment success for items treated with PCA. The results support previous studies that have found greater treatment outcome to be associated with activity in predominantly left hemisphere regions, and suggest that different mechanisms may be engaged dependent on the type of treatment employed.