447 results for Latent Semantic Analysis
Abstract:
The aim of this paper is to provide a comparison of various algorithms and parameters for building reduced semantic spaces. The effect of dimension reduction, the stability of the representation and the effect of word order are examined in the context of five algorithms for building semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence-in-context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
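Of the five algorithms compared, random projection is the simplest to sketch: each context word receives a fixed random index vector, and a target word's semantic vector is the count-weighted sum of the index vectors of its co-occurring context words. Below is a minimal pure-Python illustration; the dimension k, the dense ±1 index vectors, and the toy contexts are arbitrary choices for demonstration, not the paper's settings.

```python
import math
import random

def random_projection_vectors(vocab, k, seed=0):
    """Assign each context word a fixed random +/-1 index vector of dimension k."""
    rng = random.Random(seed)
    return {w: [rng.choice((-1, 1)) for _ in range(k)] for w in vocab}

def semantic_vector(context_counts, index_vectors, k):
    """A word's semantic vector: the count-weighted sum of the index vectors
    of the context words it co-occurs with."""
    v = [0] * k
    for w, count in context_counts.items():
        for i, x in enumerate(index_vectors[w]):
            v[i] += count * x
    return v

def cosine(a, b):
    """Cosine similarity between two semantic vectors."""
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den else 0.0
```

On a TOEFL item, one would pick the alternative whose semantic vector has the highest cosine with the probe word's vector.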
Abstract:
This thesis addressed issues that have prevented qualitative researchers from using thematic discovery algorithms. The central hypothesis evaluated whether allowing qualitative researchers to interact with thematic discovery algorithms and incorporate domain knowledge improved their ability to address research questions and trust the derived themes. Non-negative Matrix Factorisation and Latent Dirichlet Allocation find latent themes within document collections, but these algorithms are rarely used because qualitative researchers do not trust, and cannot interact with, the themes that are automatically generated. The research determined the types of interactivity that qualitative researchers require and then evaluated interactive algorithms that matched these requirements. Theoretical contributions included the articulation of design guidelines for interactive thematic discovery algorithms, the development of an Evaluation Model and a Conceptual Framework for Interactive Content Analysis.
Abstract:
This article explores two matrix methods to induce the "shades of meaning" (SoM) of a word. A matrix representation of a word is computed from a corpus of traces based on the given word. Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD) each compute a set of vectors, with each vector corresponding to a potential shade of meaning. The two methods were evaluated based on loss of conditional entropy with respect to two sets of manually tagged data. One set reflects concepts generally appearing in text, and the second set comprises words used for investigations into word sense disambiguation. Results show that NMF consistently outperforms SVD for inducing both SoM of general concepts and word senses. The problem of inducing the shades of meaning of a word is more subtle than that of word sense induction, and hence relevant to thematic analysis of opinion, where nuances of opinion can arise.
Abstract:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models), for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR) and Midwives Data Collection (MDC) were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix will affect the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model, and along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model’s ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. 
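The first goal amounts to replacing the uniform 0/1 Queen-adjacency weights with weights that decay as a covariate difference between neighbouring areas grows, so smoothing borrows more strength from areas with similar maternal-age profiles. A hypothetical sketch; the exponential kernel and its scale parameter are assumptions for illustration, not the thesis's exact specification.

```python
import math

def covariate_weights(adjacency, ages, scale=5.0):
    """Neighbourhood weight matrix for a CAR model: start from 0/1 adjacency,
    then down-weight neighbours whose covariate value (e.g. median maternal
    age) differs from area i's. Non-neighbours keep weight zero."""
    n = len(ages)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i][j]:
                w[i][j] = math.exp(-abs(ages[i] - ages[j]) / scale)
    return w
```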
Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve on the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criteria (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. 
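The zero-inflation component of such a mixture can be written down directly: with probability pi an area contributes a structural zero, and otherwise its count follows a Poisson distribution. A minimal sketch of the ZIP probability mass function (parameter values in the test are illustrative, not estimates from the registry data):

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson: a structural zero with
    probability pi, otherwise Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (y == 0) + (1 - pi) * poisson
```

Setting pi = 0 recovers the usual Poisson model, which is why the ZIP model is described above as an extension to encompass excess zeros.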
By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
Abstract:
Understanding the complexities involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein-coding genetic paradigm. Latent variable models are well suited to the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this organ, often dismissed as a useless vestige, remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for an association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation of the Bayesian multiple changepoint model on aligned DNA sequences to more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function, such as GC content and type of mutation, into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
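For context on the twin-based heritability estimates mentioned above, the classic Falconer approximations from twin correlations are easy to state. Note these are only textbook approximations included for illustration, not the variance-components linkage method developed in the thesis.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's estimate of narrow-sense heritability from monozygotic and
    dizygotic twin correlations: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)

def shared_env_c2(r_mz, r_dz):
    """Companion estimate of the shared-environment component:
    c^2 = 2 * r_DZ - r_MZ."""
    return 2.0 * r_dz - r_mz
```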
Abstract:
Despite its widespread use, there has been limited examination of the underlying factor structure of the Psychological Sense of School Membership (PSSM) scale. The current study examined the psychometric properties of the PSSM to refine its utility for researchers and practitioners, using a sample of 504 Australian high school students. Results from exploratory and confirmatory factor analyses indicated that the PSSM is a multidimensional instrument. Factor analysis procedures identified three factors representing related aspects of students' perceptions of their school membership: caring relationships, acceptance, and rejection.
Abstract:
This thesis investigates the coefficient of performance (COP) of a hybrid liquid desiccant solar cooling system. This hybrid cooling system includes three sections: 1) a conventional air-conditioning section; 2) a liquid desiccant dehumidification section; and 3) an air mixture section. The air handling unit (AHU) with a mixture variable-air-volume design is included in the hybrid cooling system to control humidity. In the combined system, the air is first dehumidified in the dehumidifier and then mixed with ambient air by the AHU before entering the evaporator. Experiments using lithium chloride as the liquid desiccant have been carried out to evaluate the performance of the dehumidifier and regenerator. Based on the air mixture (AHU) design, models of the electrical coefficient of performance (ECOP), thermal coefficient of performance (TCOP) and whole-system coefficient of performance (COPsys) for the hybrid liquid desiccant solar cooling system were developed to evaluate its performance. These mathematical models can be used to describe the coefficient-of-performance trend under different ambient conditions, while also providing a convenient comparison with conventional air conditioning systems. These models provide good explanations of the relationship between the models' performance predictions and ambient air parameters. The simulation results reveal that the coefficient of performance in hybrid liquid desiccant solar cooling systems substantially depends on ambient air and dehumidifier parameters. The liquid desiccant experiments also prove that the latent component of the total cooling load requirements can be easily fulfilled by using the liquid desiccant dehumidifier. While cooling requirements can be met, the liquid desiccant system is still subject to hysteresis problems.
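The three performance measures have simple generic forms: useful cooling delivered per unit of electrical input (ECOP), per unit of thermal input to the regenerator (TCOP), and per unit of total energy supplied (COPsys). The sketch below uses these textbook ratio definitions only; the thesis's actual models additionally condition them on ambient air and dehumidifier parameters.

```python
def ecop(cooling_kw, electrical_kw):
    """Electrical COP: cooling output per unit of electrical input."""
    return cooling_kw / electrical_kw

def tcop(cooling_kw, thermal_kw):
    """Thermal COP: cooling output per unit of (solar) thermal input."""
    return cooling_kw / thermal_kw

def cop_sys(cooling_kw, electrical_kw, thermal_kw):
    """Whole-system COP: cooling output per unit of total energy supplied."""
    return cooling_kw / (electrical_kw + thermal_kw)
```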
Abstract:
This paper presents a study on estimating the latent demand for rail transit in an Australian context. Based on travel mode-choice modelling, a two-stage analysis approach is proposed, comprising market population identification and mode share estimation. A case study is conducted on the Midland-Fremantle rail transit corridor in Perth, Western Australia. The required data mainly include journey-to-work trip data from the Australian Bureau of Statistics Census 2006 and the work-purpose mode-choice model in the Perth Strategic Transport Evaluation Model. The market profile is analysed in terms of catchment areas, market population, mode shares, mode-specific trip distributions and average trip distances. A numerical simulation is performed to test the sensitivity of transit ridership to changes in fuel price. A corridor-level transit demand function of fuel price is thus obtained and its elasticity characteristics are discussed. This study explores a viable approach to developing a decision-support tool for assessing the short-term impacts of policy and operational adjustments on corridor-level demand for rail transit.
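The two-stage idea (mode-choice shares, then sensitivity to fuel price) can be caricatured with a binary logit in which fuel price enters the car utility: as fuel price rises, car utility falls and the rail share grows. All coefficients below are invented for illustration and are not values from the Perth Strategic Transport Evaluation Model.

```python
import math

def transit_share(fuel_price, beta_fuel=-0.05, v_transit=0.0, v_car_base=1.5):
    """Binary logit car-vs-rail sketch: fuel price lowers the car utility
    (beta_fuel < 0), shifting share toward rail. Coefficients are hypothetical."""
    v_car = v_car_base + beta_fuel * fuel_price
    e_car, e_rail = math.exp(v_car), math.exp(v_transit)
    return e_rail / (e_car + e_rail)

def arc_elasticity(demand_fn, price, dp=0.01):
    """Numerical elasticity of demand with respect to price:
    percentage change in demand per percentage change in price."""
    q0, q1 = demand_fn(price), demand_fn(price * (1 + dp))
    return ((q1 - q0) / q0) / dp
```

With these signs, the elasticity of the rail share with respect to fuel price comes out positive, matching the cross-elasticity behaviour the paper's demand function is built to capture.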
Abstract:
Asset health inspections can produce two types of indicators: (1) direct indicators (e.g. the thickness of a brake pad, or the crack depth on a gear) which directly relate to a failure mechanism; and (2) indirect indicators (e.g. indicators extracted from vibration signals and oil analysis data) which can only partially reveal a failure mechanism. While direct indicators enable more precise inferences about asset health condition, they are often more difficult to obtain than indirect indicators. The state space model provides an efficient approach to estimating direct indicators from indirect indicators. However, existing state space models for estimating direct indicators largely depend on assumptions such as discrete time, discrete state, linearity, and Gaussianity. The discrete-time assumption requires fixed inspection intervals. The discrete-state assumption entails discretising continuous degradation indicators, which often introduces additional errors. The linear and Gaussian assumptions are not consistent with the nonlinear and irreversible degradation processes in most engineering assets. This paper proposes a state space model without these assumptions. Monte Carlo-based algorithms are developed to estimate the model parameters and the remaining useful life. These algorithms are evaluated for performance using numerical simulations in MATLAB. The results show that both the parameters and the remaining useful life are estimated accurately. Finally, the new state space model is used to process vibration and crack depth data from an accelerated gearbox test. In this application, the new state space model achieves a better fit than a state space model with linear and Gaussian assumptions.
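A Monte Carlo filter of the kind described avoids the linear and Gaussian assumptions by propagating particles through an irreversible growth model for the direct indicator (e.g. crack depth) and reweighting them against each indirect observation. A minimal bootstrap particle filter sketch; the drift/noise model and all constants are assumptions for illustration, not the paper's calibrated model.

```python
import math
import random

def particle_filter_crack(observations, n_particles=500, drift=0.1,
                          process_sd=0.05, obs_sd=0.2, seed=1):
    """Bootstrap particle filter: the latent crack depth grows by a
    non-negative increment each step (irreversible degradation), and each
    indirect indicator observes it with Gaussian error. Returns the
    filtered mean depth at each step."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles
    means = []
    for y in observations:
        # propagate: growth increments truncated at zero (no crack healing)
        particles = [x + max(0.0, rng.gauss(drift, process_sd)) for x in particles]
        # weight particles by the observation likelihood
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:  # degenerate case: fall back to uniform weights
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        # multinomial resampling
        particles = rng.choices(particles, weights=weights, k=n_particles)
        means.append(sum(particles) / n_particles)
    return means
```

A remaining-useful-life estimate would then propagate the surviving particles forward (without observations) until each crosses a failure threshold.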
Abstract:
It is recognised that individuals do not always respond honestly when completing psychological tests. One of the foremost issues for research in this area is the inability to detect individuals attempting to fake. While a number of strategies have been identified in faking, a commonality of these strategies is the latent role of long-term memory. Seven studies were conducted in order to examine whether it is possible to detect the activation of faking-related cognitions using a lexical decision task. Study 1 found that engagement with experiential processing styles predicted the ability to fake successfully, confirming the role of associative processing styles in faking. After identifying appropriate stimuli for the lexical decision task (Studies 2A and 2B), Studies 3 to 5 examined whether a cognitive state of faking could be primed and subsequently identified using a lexical decision task. Throughout the course of these studies, the experimental methodology was increasingly refined in an attempt to successfully identify the relevant priming mechanisms. The results were consistent and robust throughout the three priming studies: faking good on a personality test primed positive faking-related words in the lexical decision tasks. Faking bad, however, did not result in reliable priming of negative faking-related cognitions. To more completely address potential issues with the stimuli and the possible role of affective priming, two additional studies were conducted. Studies 6A and 6B revealed that negative faking-related words were more arousing than positive faking-related words, and that positive faking-related words were more abstract than negative faking-related words and neutral words. Study 7 examined whether the priming effects evident in the lexical decision tasks occurred as a result of an unintentional mood induction while faking the psychological tests. Results were equivocal in this regard.
This program of research aligned the fields of psychological assessment and cognition to inform the preliminary development and validation of a new tool to detect faking. Consequently, an implicit technique to identify attempts to fake good on a psychological test has been identified, using long established and robust cognitive theories in a novel and innovative way. This approach represents a new paradigm for the detection of individuals responding strategically to psychological testing. With continuing development and validation, this technique may have immense utility in the field of psychological assessment.
Abstract:
Entity-oriented search has become an essential component of modern search engines. It focuses on retrieving a list of entities, or information about specific entities, instead of documents. In this paper, we study the problem of finding entity-related information, referred to as attribute-value pairs, which plays a significant role in searching for target entities. We propose a novel decomposition framework combining reduced relations and a discriminative model, the Conditional Random Field (CRF), for automatically finding entity-related attribute-value pairs in free-text documents. This decomposition framework allows us to locate potential text fragments and identify the hidden semantics, in the form of attribute-value pairs, for user queries. Empirical analysis shows that the decomposition framework outperforms pattern-based approaches due to its effective integration of syntactic and semantic features.
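Tagging text fragments with a linear-chain model like a CRF reduces at inference time to Viterbi search over tag sequences. The sketch below shows only that decoding step, with hand-set toy emission and transition scores standing in for learned CRF feature weights; the tag set and scores are invented for illustration.

```python
def viterbi(tokens, tags, emit, trans):
    """Most likely tag sequence under a linear-chain model.
    emit[tag][token] and trans[(prev_tag, tag)] are additive (log-)scores;
    unseen tokens and transitions default to -1.0 and 0.0 respectively."""
    scores = {t: emit[t].get(tokens[0], -1.0) for t in tags}
    backpointers = []
    for tok in tokens[1:]:
        new, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: scores[p] + trans.get((p, t), 0.0))
            new[t] = (scores[best_prev] + trans.get((best_prev, t), 0.0)
                      + emit[t].get(tok, -1.0))
            ptr[t] = best_prev
        scores, backpointers = new, backpointers + [ptr]
    # backtrack from the best final tag
    path = [max(tags, key=lambda t: scores[t])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With an ATTR-to-VAL transition bonus, a token sequence like "the weight 5kg" decodes to an attribute-value pair (ATTR = weight, VAL = 5kg).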
Abstract:
Critical analysis and problem-solving skills are two graduate attributes that are important in ensuring that graduates are well equipped in working across research and practice settings within the discipline of psychology. Despite the importance of these skills, few psychology undergraduate programmes have undertaken any systematic development, implementation, and evaluation of curriculum activities to foster these graduate skills. The current study reports on the development and implementation of a tutorial programme designed to enhance the critical analysis and problem-solving skills of undergraduate psychology students. Underpinned by collaborative learning and problem-based learning, the tutorial programme was administered to 273 third year undergraduate students in psychology. Latent Growth Curve Modelling revealed that students demonstrated a significant linear increase in self-reported critical analysis and problem-solving skills across the tutorial programme. The findings suggest that the development of inquiry-based curriculum offers important opportunities for psychology undergraduates to develop critical analysis and problem-solving skills.
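A latent growth curve model estimates, among other things, a mean rate of change across repeated measures. As a rough, simplified stand-in for that mean slope factor, one can average per-student OLS slopes across the assessment waves; a full LGCM would additionally model latent intercepts and slope variances, which this sketch omits.

```python
def ols_slope(y):
    """OLS slope of y against equally spaced time points 0..T-1."""
    t = list(range(len(y)))
    mt, my = sum(t) / len(t), sum(y) / len(y)
    num = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    den = sum((ti - mt) ** 2 for ti in t)
    return num / den

def mean_growth(scores_by_student):
    """Average per-student linear growth rate across waves: a crude proxy
    for the mean slope factor in a latent growth curve model."""
    slopes = [ols_slope(y) for y in scores_by_student]
    return sum(slopes) / len(slopes)
```

A significantly positive mean growth rate corresponds to the significant linear increase in self-reported skills reported above.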
Abstract:
Chinese modal particles feature prominently in Chinese people's daily use of the language, but their pragmatic and semantic functions are elusive, as commonly recognised by Chinese linguists and teachers of Chinese as a foreign language. This book originates from an extensive and intensive empirical study of the Chinese modal particle a (啊), one of the most frequently used modal particles in Mandarin Chinese. In order to capture all the uses and the underlying meanings of the particle, the author transcribed the first 20 episodes, about 20 hours in length, of the popular Chinese TV drama series Kewang 'Expectations', which yielded a corpus of more than 142,000 Chinese characters with a total of 1,829 instances of the particle, all used in meaningful communicative situations. Within its context of use, every single occurrence of the particle was analysed in terms of its pragmatic and semantic contributions to the hosting utterance. On this basis, the core meanings were identified, which were seen as constituting the modal nature of the particle.