17 results for Latent classes analysis

in Aston University Research Archive


Relevance:

80.00%

Abstract:

Bove, Pervan, Beatty, and Shiu [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.] develop and test a latent variable model of the role of service workers in encouraging customers' organizational citizenship behaviors. However, Bove et al. claim support for hypothesized relationships between constructs that, owing to insufficient discriminant validity among certain constructs, may be inaccurate. This research comment discusses what discriminant validity represents, outlines procedures for establishing it, and presents an example of inaccurate discriminant validity assessment based upon the work of Bove et al. Solutions to discriminant validity problems and a five-step procedure for assessing discriminant validity conclude the paper. The comment aims to motivate a review of discriminant validity issues and to assist future researchers conducting latent variable analysis.
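The comment's five-step procedure is not reproduced in this abstract, but one widely used discriminant validity check it relates to is the Fornell-Larcker criterion: a construct's average variance extracted (AVE) should exceed its squared correlation with every other construct. A minimal sketch, with loadings and the inter-construct correlation invented for illustration:

```python
# Hypothetical illustration of the Fornell-Larcker criterion for
# discriminant validity: each construct's average variance extracted
# (AVE) should exceed its squared correlation with every other
# construct. Loadings and the correlation below are invented.

def ave(loadings):
    """Average variance extracted from standardized indicator loadings."""
    return sum(l * l for l in loadings) / len(loadings)

# Invented standardized loadings for two latent constructs.
loadings = {
    "commitment": [0.82, 0.78, 0.85],
    "citizenship": [0.74, 0.80, 0.77],
}
phi = 0.81  # invented correlation between the two constructs

aves = {name: ave(ls) for name, ls in loadings.items()}
shared = phi ** 2  # variance the two constructs share

for name, value in aves.items():
    verdict = "supported" if value > shared else "questionable"
    print(f"{name}: AVE={value:.3f} vs shared variance={shared:.3f} -> {verdict}")
```

With these invented numbers one construct passes and the other fails, which is exactly the kind of borderline case the comment argues researchers should catch before interpreting structural paths.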

Relevance:

80.00%

Abstract:

Understanding a complex network's structure holds the key to understanding its function. The physics community has contributed a multitude of methods and analyses to this cross-disciplinary endeavor. Structural features exist on both the microscopic level, resulting from differences between the properties of single nodes, and the mesoscopic level, resulting from properties shared by groups of nodes. Disentangling the determinants of network structure on these different scales has remained a major, and so far unsolved, challenge. Here we show how multiscale generative probabilistic exponential random graph models, combined with efficient distributed message-passing inference techniques, can be used to achieve this separation of scales, leading to improved detection accuracy of latent classes as demonstrated on benchmark problems. The approach sheds new light on the statistical significance of motif distributions in neural networks and improves link-prediction accuracy, as exemplified for gene-disease associations in the highly consequential Online Mendelian Inheritance in Man database. © 2011 Reichardt et al.

Relevance:

80.00%

Abstract:

Summary writing is an important part of many English language examinations. As grading students' summaries is a very time-consuming task, computer-assisted assessment can help teachers carry out the grading more effectively. Several techniques, such as latent semantic analysis (LSA), n-gram co-occurrence and BLEU, have been proposed to support automatic evaluation of summaries, but their performance is not satisfactory for assessing summary writing. To improve on this, this paper proposes an ensemble approach that integrates LSA and n-gram co-occurrence. The proposed ensemble approach achieves high accuracy and improves performance quite substantially compared with current techniques. A summary assessment system based on the proposed approach has also been developed.
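The ensemble idea can be sketched as a weighted combination of the two component scores. In this toy version the LSA similarity is replaced by a plain term-vector cosine for brevity; a real system would compute the cosine in an SVD-reduced latent space, and the weight is invented:

```python
# Toy sketch of the ensemble: combine an n-gram co-occurrence score with
# a semantic-similarity score. The LSA component is replaced here by a
# plain term-vector cosine; a real system would compute the cosine in an
# SVD-reduced latent space. The weight w is invented.
import math
from collections import Counter

def ngrams(text, n=2):
    words = text.lower().split()
    return Counter(zip(*(words[i:] for i in range(n))))

def ngram_overlap(candidate, reference, n=2):
    """Fraction of reference n-grams recovered by the candidate summary."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not ref:
        return 0.0
    hit = sum(min(c, ref[g]) for g, c in cand.items() if g in ref)
    return hit / sum(ref.values())

def cosine(candidate, reference):
    a = Counter(candidate.lower().split())
    b = Counter(reference.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def ensemble_score(candidate, reference, w=0.5):
    return w * ngram_overlap(candidate, reference) + (1 - w) * cosine(candidate, reference)

reference = "the cat sat on the mat"
print(ensemble_score("the cat sat on the mat", reference))  # near 1.0 for identical text
print(ensemble_score("dogs bark loudly", reference))        # 0.0 for unrelated text
```

Partial summaries score between the two extremes, with the weight w controlling how much surface n-gram overlap counts against semantic similarity.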

Relevance:

80.00%

Abstract:

Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches employ only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of a support vector machine (SVM) classifier and ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining them gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighbouring binding sites are important for protein-DNA recognition and binding ability. Comparison between the proposed PDNAsite method and existing methods indicates that PDNAsite outperforms most existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is made freely available to the biological research community.
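The abstract does not specify how the SVM and ensemble learning are combined, so the following is only a generic majority-vote sketch of one plausible reading; the base classifiers are stand-in threshold functions, whereas PDNAsite's actual members are SVMs trained on spatial- and sequence-context features:

```python
# Generic majority-vote ensemble, sketched as one plausible reading of
# "integration of the SVM classifier and ensemble learning". The base
# classifiers here are invented stand-ins: each labels a residue
# "binding" when a toy score crosses its own threshold.
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by most base classifiers for sample x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

classifiers = [
    lambda x: "binding" if x > 0.3 else "non-binding",
    lambda x: "binding" if x > 0.5 else "non-binding",
    lambda x: "binding" if x > 0.7 else "non-binding",
]

print(majority_vote(classifiers, 0.6))  # two of three vote "binding"
print(majority_vote(classifiers, 0.2))  # none vote "binding"
```

Majority voting lets an ensemble tolerate individual members that over- or under-predict the positive class, which matters for imbalanced problems such as binding-site identification.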

Relevance:

30.00%

Abstract:

Visualization has proven to be a powerful and widely applicable tool for the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.
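The expectation-maximization fitting at the heart of the algorithm can be illustrated on a much simpler model. Below is a minimal EM loop for a two-component one-dimensional Gaussian mixture, a flat analogue of fitting one level of the hierarchy; the data and initial means are invented:

```python
# Minimal EM for a two-component 1-D Gaussian mixture: a flat,
# one-dimensional analogue of fitting one level of the hierarchical
# mixture. Data and initial values are invented for the sketch.
import math

def em_gmm(xs, mu, iters=50):
    pi, var = [0.5, 0.5], [1.0, 1.0]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate mixing weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var

# Two well-separated clusters around 0 and 10.
xs = [-0.5, 0.0, 0.3, 0.1, 9.8, 10.2, 10.0, 9.9]
pi, mu, var = em_gmm(xs, mu=[1.0, 9.0])
print(pi, mu, var)  # means converge near 0 and 10
```

In the hierarchical algorithm each component at one level is itself refined into a mixture of latent variable models at the next level, but the E-step/M-step alternation is the same.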

Relevance:

30.00%

Abstract:

Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
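The paper's EM algorithm for probabilistic PCA also estimates a noise variance; as a simplified, noise-free analogue of iteratively estimating the principal subspace, the leading principal axis of a sample covariance matrix can be found by power iteration (the data below are invented):

```python
# Iterative estimation of the leading principal axis by power iteration:
# a simplified, noise-free analogue of the EM procedure for probabilistic
# PCA (the full algorithm additionally estimates the noise variance).
import math

def covariance(data):
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n
             for j in range(d)] for i in range(d)]

def leading_axis(cov, iters=100):
    d = len(cov)
    w = [1.0] * d  # arbitrary nonzero starting vector
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalise.
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

# Invented 2-D data stretched along the x = y direction.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
w = leading_axis(covariance(data))
print(w)  # close to (1/sqrt(2), 1/sqrt(2))
```

The probabilistic formulation's advantage over this bare iteration is precisely what the abstract highlights: it yields a density model for the data, so likelihoods, missing values and mixtures of PCA models all become tractable.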

Relevance:

30.00%

Abstract:

This thesis presents the results from an investigation into the merits of analysing Magnetoencephalographic (MEG) data in the context of dynamical systems theory. MEG is the study both of methods for measuring the minute magnetic flux variations at the scalp that result from neuro-electric activity in the neocortex, and of the techniques required to process and extract useful information from these measurements. As a result of its unique mode of action - directly measuring neuronal activity via the resulting magnetic field fluctuations - MEG possesses a number of useful qualities which could make it a powerful addition to any brain researcher's arsenal. Unfortunately, MEG research has so far failed to fulfil its early promise, its progress hindered by a variety of factors.

Conventionally, the analysis of MEG has been dominated by the search for activity in certain spectral bands - the so-called alpha, delta, beta, and other bands commonly referred to in both academic and lay publications. Other efforts have centred upon generating optimal fits of "equivalent current dipoles" that best explain the observed field distribution. Many of these approaches carry the implicit assumption that the dynamics giving rise to the observed time series are linear, despite a variety of reasons to suspect that nonlinearity might be present in MEG recordings. By using methods that allow for nonlinear dynamics, the research described in this thesis avoids these restrictive linearity assumptions. A crucial concept underpinning this project is the belief that MEG recordings are mere observations of the evolution of the true underlying state, which is unobservable and is assumed to reflect some abstract brain cognitive state. Further, we maintain that it is unreasonable to expect these processes to be adequately described in the traditional way: as a linear sum of a large number of frequency generators.

One of the main objectives of this thesis is to show that much more effective and powerful analysis of MEG can be achieved if one assumes the presence of both linear and nonlinear characteristics from the outset. Our position is that the combined action of a relatively small number of these generators, coupled with external and dynamic noise sources, is more than sufficient to account for the complexity observed in MEG recordings.

Another problem that has plagued MEG researchers is the extremely low signal-to-noise ratio that is obtained. As the magnetic flux variations resulting from actual cortical processes can be extremely minute, the measuring devices used in MEG are necessarily extremely sensitive. The unfortunate side-effect is that even commonplace phenomena such as the earth's geomagnetic field can easily swamp the signals of interest. This problem is commonly addressed by averaging over a large number of recordings, but averaging has a number of notable drawbacks: in particular, it is difficult to synchronise high-frequency activity which might be of interest, and such signals are often cancelled out by the averaging process. Other problems include the high cost and low portability of state-of-the-art multichannel machines, with the result that the use of MEG has hitherto been restricted to large institutions able to afford their procurement and maintenance. In this project, we address these issues by working almost exclusively with single-channel, unaveraged MEG data. We demonstrate the applicability of a variety of methods originating from the fields of signal processing, dynamical systems, information theory and neural networks to the analysis of MEG data.

It is noteworthy that while modern signal processing tools such as independent component analysis, topographic maps and latent variable modelling have enjoyed extensive success in research areas ranging from financial time series modelling to the analysis of sunspot activity, their use in MEG analysis has thus far been extremely limited. It is hoped that this work will help to remedy this oversight.

Relevance:

30.00%

Abstract:

The number of fatal accidents in the agricultural, horticultural and forestry industry in Great Britain has declined from an annual rate of about 135 in the 1960s to its current level of about 50. Changes to the size and makeup of the population at risk mean that there has been no real improvement in fatal injury incidence rates for farmers. The Health and Safety Executive's (HSE) current system of accident investigation, recording, and analysis is directed primarily at identifying fault, allocating blame, and punishing wrongdoers. Relatively little information is recorded about the personal and organisational factors that contributed to, or failed to prevent, accidents. To develop effective preventive strategies, it is important to establish whether errors by the victims and others occur at the skills, rules, or knowledge level of functioning; are violations of some rule or procedure; or stem from failures to correctly appraise or control a hazard. A modified version of the Hale and Glendon accident causation model was used to study 230 fatal accidents. Inspectors' original reports were examined and expert judgement applied to identify and categorise the errors committed by each of the parties involved. The highest proportion of errors that led directly to accidents occurred whilst the victims were operating at the knowledge level. The mix and proportion of errors varied considerably between different classes of victim and kinds of accident. Different preventive strategies will be needed to address the problem areas identified.

Relevance:

30.00%

Abstract:

Sentiment analysis, or opinion mining, aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called the joint sentiment/topic (JST) model, which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification, which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset for classifying review sentiment polarity, and minimal prior information has also been explored to further improve sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.

Relevance:

30.00%

Abstract:

In this poster we presented our preliminary work on the study of spammer detection and analysis with 50 active honeypot profiles implemented on the Weibo.com and QQ.com microblogging networks. We picked out spammers from legitimate users by manually checking every captured user's microblog content. We built a spammer dataset for each social network community from these spammer accounts, as well as a legitimate user dataset. We analyzed several features of the two user classes and compared them; these features were found to be useful for distinguishing spammers from legitimate users. The following are several initial observations from our analysis of the features of spammers captured on Weibo.com and QQ.com.

- The following/follower ratio of spammers is usually higher than that of legitimate users. Spammers tend to follow a large number of users in order to gain popularity but usually have relatively few followers.
- There is a big gap between the average number of microblogs posted per day by the two classes. On Weibo.com, spammers post many microblogs every day, far more than legitimate users do, while on QQ.com spammers post far fewer microblogs than legitimate users. This is mainly due to the different strategies adopted by spammers on the two platforms.
- More spammers choose a cautious spam posting pattern. They mix spam microblogs with ordinary ones so that they can evade the anti-spam mechanisms used by the service providers.
- Aggressive spammers are more likely to be detected, so they tend to have a shorter life, while cautious spammers can live much longer and have a deeper influence on the network. The latter kind of spammer may become the trend among social network spammers. © 2012 IEEE.
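The first two features above can be computed directly from profile metadata. A minimal sketch, with field names and numbers invented for illustration:

```python
# Computing two of the discussed features from hypothetical profile
# records: following/follower ratio and average microblogs per day.
# Field names and all numbers are invented for the sketch.
def following_follower_ratio(user):
    return user["following"] / max(user["followers"], 1)

def posts_per_day(user):
    return user["microblogs"] / max(user["days_active"], 1)

users = [
    {"name": "spammer", "following": 2000, "followers": 40,
     "microblogs": 900, "days_active": 30},
    {"name": "legit", "following": 150, "followers": 120,
     "microblogs": 60, "days_active": 30},
]

for u in users:
    print(u["name"], following_follower_ratio(u), posts_per_day(u))
```

The `max(..., 1)` guards avoid division by zero for accounts with no followers or no recorded activity, which honeypot-captured spam accounts frequently have.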

Relevance:

30.00%

Abstract:

In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.

Relevance:

30.00%

Abstract:

PURPOSE: To determine by wavefront analysis the differences between eyes considered normal, eyes diagnosed with keratoconus, and eyes that have undergone penetrating keratoplasty. METHODS: The Nidek OPD-Scan wavefront aberrometer was used to measure ocular aberrations out to the sixth Zernike order. One hundred and thirty eyes that were free of ocular pathology, 41 eyes diagnosed with keratoconus, and 8 eyes that had undergone penetrating keratoplasty were compared for differences in root mean square (RMS) value. Three- and five-millimeter RMS values of the refractive power aberrometry maps of the three classes of eyes were compared, and radially symmetric and irregular higher-order aberration values were compared for differences in magnitude. RESULTS: RMS values were lower in eyes free of ocular pathology than in eyes with keratoconus and eyes that had undergone penetrating keratoplasty. The aberrations were larger with the 5-mm pupil. Coma and spherical aberration values were lower in normal eyes. CONCLUSION: Wavefront aberrometry of normal eyes, pathological eyes, and eyes after surgery may help to explain the visual distortions encountered by patients. The ability to measure highly aberrated eyes allows an objective assessment of the optical consequences of ocular pathology and surgery, and the Nidek OPD-Scan can be used in areas other than refractive surgery.
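The RMS values compared above follow from a simple property of normalised Zernike expansions: the RMS wavefront error is the square root of the sum of the squared coefficients. A minimal sketch, with coefficient values invented for illustration:

```python
# The RMS wavefront error of a set of normalised Zernike coefficients is
# the square root of the sum of their squares. Coefficient values below
# are invented for the sketch.
import math

def rms_wavefront_error(coeffs):
    """coeffs: normalised Zernike coefficients, piston and tilt excluded."""
    return math.sqrt(sum(c * c for c in coeffs))

normal_eye = [0.12, -0.05, 0.08]       # small higher-order terms
keratoconic_eye = [0.95, -0.40, 0.60]  # large coma-dominated terms

print(rms_wavefront_error(normal_eye))
print(rms_wavefront_error(keratoconic_eye))  # larger, as the results report
```

Because each coefficient contributes independently, enlarging the pupil admits higher-order terms and raises the total RMS, consistent with the larger aberrations reported for the 5-mm pupil.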

Relevance:

30.00%

Abstract:

This research explores how news media reports construct representations of a business crisis through language. In an innovative approach to dealing with the vast pool of potentially relevant texts, media texts concerning the BP Deepwater Horizon oil spill are gathered from three different time points: immediately after the explosion in 2010, one year later in 2011 and again in 2012. The three sets of 'BP texts' are investigated using discourse analysis and semi-quantitative methods within a semiotic framework that gives an account of language at the semiotic levels of sign, code, mythical meaning and ideology. The research finds in the texts three discourses of representation concerning the crisis that show a movement from the ostensibly representational to the symbolic and conventional: a discourse of 'objective factuality', a discourse of 'positioning' and a discourse of 'redeployment'. This progression can be shown to have useful parallels with Peirce's sign classes of Icon, Index and Symbol, with their implied movement from a clear motivation by the Object (in this case the disaster events), to an arbitrary, socially-agreed connection. However, the naturalisation of signs, whereby ideologies are encoded in ways of speaking and writing that present them as 'taken for granted' is at its most complete when it is least discernible. The findings suggest that media coverage is likely to move on from symbolic representation to a new kind of iconicity, through a fourth discourse of 'naturalisation'. Here the representation turns back towards ostensible factuality or iconicity, to become the 'naturalised icon'. This work adds to the study of media representation a heuristic for understanding how the meaning-making of a news story progresses. It offers a detailed account of what the stages of this progression 'look like' linguistically, and suggests scope for future research into both language characteristics of phases and different news-reported phenomena.