903 results for Probabilistic latent semantic analysis (PLSA)


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Bove, Pervan, Beatty, and Shiu [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.] develop and test a latent variable model of the role of service workers in encouraging customers' organizational citizenship behaviors. However, Bove et al. claim support for hypothesized relationships between constructs that, due to insufficient discriminant validity regarding certain constructs, may be inaccurate. This research comment discusses what discriminant validity represents and the procedures for establishing it, and presents an example of inaccurate discriminant validity assessment based upon the work of Bove et al. Solutions to discriminant validity problems and a five-step procedure for assessing discriminant validity then conclude the paper. This comment hopes to motivate a review of discriminant validity issues and offers assistance to future researchers conducting latent variable analysis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The paper deals with methods for selecting, from the Internet, natural-language text fragments relevant to a given theme. Relevance is estimated on the basis of semantic analysis of sentences. Syntactic and semantic connections between the words of a text are recognized by analyzing combinations of inflections and prepositions, without using the categories and rules of traditional grammar. The selection of thematic information from the Internet is organized cyclically, with a new search key formed automatically at each cycle of querying the Internet.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The study aims to analyze the content and the accuracy measures of the nursing diagnosis Ineffective Self Health in patients undergoing hemodialysis. This nursing diagnosis validation study was carried out in two stages: content analysis by judges and accuracy of clinical indicators. In the first stage, 22 judges evaluated the setting and location of the diagnosis, its clinical indicators and etiological factors, and their conceptual and empirical definitions. We used the binomial test to assess the proportion of judges who considered each component of the nursing diagnosis relevant. In the second stage, we used Latent Class Analysis to assess diagnostic accuracy by evaluating 200 patients in a hemodialysis clinic in northeastern Brazil. The research was approved by the Ethics Committee under Opinion No 387 837 and CAAE 18486413.0.0000.5537. The results show that the judges rated 12 clinical indicators and 22 etiological factors as pertinent. They proposed amending the nomenclature of five indicators and six factors, and reassigning one clinical indicator to etiology and three etiological factors to clinical indicators. Regarding the conceptual and empirical definitions, the judges rated as not relevant the conceptual and empirical definitions of one clinical indicator, the conceptual definitions of two etiological factors, and the empirical definitions of four etiological factors. In addition, changes were suggested in the conceptual and empirical definitions of two clinical indicators, the conceptual definitions of 12 etiological factors, and the empirical definitions of 11 etiological factors. The clinical indicators analyzed in the first stage were validated clinically in patients undergoing hemodialysis. The most frequent clinical indicators were changes in laboratory tests (100%) and daily life choices ineffective to achieve health goals (81%); the three most frequent etiological factors were unfavorable demographic factors (94.5%), beliefs (79%), and comorbidities (77.5%).
From the Latent Class Analysis, the prevalence of the diagnosis was estimated at 66.28%. The clinical indicators with the best sensitivity for the nursing diagnosis Ineffective Self Health were daily life choices ineffective to achieve health goals and expression of difficulty with prescribed regimens. In turn, the clinical indicators inappropriate medication use, no expression of desire to control the disease, irregular attendance at dialysis sessions, and infection were more specific to that diagnosis. Non-adherence to treatment was the only indicator whose confidence intervals for both sensitivity and specificity were statistically above 0.5, making it the indicator with the best diagnostic accuracy for inferring the nursing diagnosis Ineffective Self Health in hemodialysis patients. It is therefore believed that improving the components of this diagnosis will contribute to the development of nursing interventions better matched to the health status of individuals on hemodialysis, providing more scientifically grounded care.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This doctoral thesis analyzes the discursive representations of the bandit Lampião and his gang of bandits in news reports from Mossoró newspapers published in 1927, when the gang invaded the city of Mossoró, in the state of Rio Grande do Norte, on June 13 of that year. To this end, we take as a basis the theoretical assumptions of Text Linguistics, especially the narrower context of what is known today as Textual Analysis of Discourses (ADT), a theoretical and descriptive approach to the linguistic study of texts proposed by the French linguist Jean-Michel Adam. Within this approach, we are specifically interested in the semantic level of the text, highlighting the notion of discursive representation, studied on the basis of the operations of referencing, predication, modification, spatial and temporal location, connection, and analogy (ADAM, 2011; CASTILHO, 2010; KOCH, 2002, 2006; MARCUSCHI, 1998, 2008; NEVES, 2007; RODRIGUES, PASSEGGI & SILVA NETO, 2010). The corpus of this research consists of three news reports from the 1920s in the newspapers O Mossoroense, Correio do Povo, and O Nordeste, reconstituted through collection carried out in the archives of the Lauro da Escóssia Municipal Museum and the Memorial da Resistência de Mossoró, both located in Natal, and in the collection of newspaper reports on Lampião in Natal, Rio Grande do Norte, of the historian Raimundo Nonato. The discursive representations are constructed through the use of semantic analysis operations. For Lampião, the following representations are constructed: bandit, head of bandits, briber, defeated, Captain, and Lord. For the outlaws of Lampião's gang, the following discursive representations are constructed: group, gang, gangsters, mates, bloodthirsty pack, brigands, bandits, criminals, horde of burglars, and wild beasts.
These representations mainly reveal the views of the newspapers of that time, which chiefly represented the interests of traders, politicians, the government itself, and the population of Mossoró in general.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
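As a rough illustration of the two notions of dimensionality reduction contrasted above, the standard forms of the two models can be written side by side. The notation is an assumption made here for illustration, not taken from the thesis: \pi is the probability tensor of p categorical variables, k the number of latent classes, \nu the class weights, \lambda^{(j)}_h the class-specific distribution of variable j, and \theta_E the log-linear interaction terms.

```latex
% Latent class (PARAFAC-type) model: a sum of k rank-one nonnegative tensors,
% so the probability tensor has nonnegative rank at most k.
\pi_{c_1 \cdots c_p}
  = \Pr(y_1 = c_1, \ldots, y_p = c_p)
  = \sum_{h=1}^{k} \nu_h \prod_{j=1}^{p} \lambda^{(j)}_{h c_j},
\qquad
\nu_h \ge 0,\ \sum_{h=1}^{k} \nu_h = 1,\ \sum_{c} \lambda^{(j)}_{h c} = 1 .

% Log-linear model: the same tensor on the log scale, as a sum of interaction
% terms indexed by subsets E of the variables; sparsity means that
% \theta_E \equiv 0 for most subsets E.
\log \pi_{c_1 \cdots c_p}
  = \sum_{E \subseteq \{1, \ldots, p\}} \theta_E(c_E).
```

In the first form parsimony comes from a small k, in the second from a small number of nonzero interaction terms.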

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
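The slow mixing described above can be reproduced in miniature. The following is a minimal sketch of a truncated Normal data augmentation Gibbs sampler, not the thesis's own code. It assumes an intercept-only probit model with a flat prior on the intercept `beta`, and the data (3 successes in 100 trials) are hypothetical. All function names are illustrative.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0.

    Uses inverse-CDF sampling; adequate for moderate values of `mean`.
    """
    p_nonpos = STD_NORMAL.cdf(-mean)  # P(z <= 0) for z ~ N(mean, 1)
    if positive:
        u = p_nonpos + (1.0 - p_nonpos) * rng.random()
    else:
        u = p_nonpos * rng.random()
    # Clamp away from 0 and 1 so that inv_cdf stays defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


def probit_gibbs(y, n_iter, seed=0):
    """Gibbs sampler for an intercept-only probit model (flat prior assumed)."""
    rng = random.Random(seed)
    n = len(y)
    beta = 0.0
    chain = []
    for _ in range(n_iter):
        # Step 1: impute latent z_i | beta, y_i from truncated normals.
        z_sum = sum(sample_truncated_normal(beta, yi == 1, rng) for yi in y)
        # Step 2: beta | z ~ N(mean(z), 1/n) under the flat prior.
        beta = rng.gauss(z_sum / n, (1.0 / n) ** 0.5)
        chain.append(beta)
    return chain


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence of draws."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    y = [1] * 3 + [0] * 97  # hypothetical rare-event data
    chain = probit_gibbs(y, n_iter=1000, seed=1)
    print("lag-1 autocorrelation:", lag1_autocorrelation(chain[200:]))
```

Lowering the number of successes while increasing the length of `y` should push the lag-1 autocorrelation closer to one, which is the qualitative behavior the abstract describes for rare events.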

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The article identifies and analyzes the predominant discourse held by 12 boys and 7 girls in the 7th and 8th grades of primary school, from 4 educational establishments in the city of Talca, Chile, regarding the transgression of women's traditional identities in video games. To this end, during the first semester of 2014, within a teacher-training program in Visual Arts, a didactic strategy centered on graphic expression, called “Crea tu propia personaje para videojuego” ("Create your own video game character"), was implemented. Boys and girls, together with trainee professionals, took part in a methodological proposal based on Action Research and framed within professional practicums. The semantic analysis of drawings and stories leads to the conclusion that the representative images of women in video games transgress traditional gender identities within a predominantly androcentric framework.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper studies how se structures are represented in 20 verb entries across nine dictionaries of the Spanish language. These structures are numerous and are problematic for native and non-native speakers alike. The verbs analyzed are of medium-to-high frequency and, in most cases, highly polysemous, which makes it possible to observe interconnections between the different se structures and the different meanings of each verb. The data from the lexicographic analysis are cross-checked against a corpus analysis of the same units. The results show wide variation, both between and within dictionaries, in the data offered and in the way they are presented. The reasons range from the overall theoretical approach of each project to its practical implementation. This leads to the conclusion that further progress is needed in the dictionary model in use, so that lexico-grammatical phenomena such as se verbs are presented in an accurate, clear, and exhaustive way.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The present study focuses on the frequency of phrasal verbs with the particle up in the context of crime and police investigative work. This research emerges from the need to enlarge McCarthy and O’Dell’s (2004) scope from purely criminal behavior to police investigative actions. To do so, we relied on a corpus of 504,124 running words made up of spoken dialogues extracted from the script of the American TV series Castle shown on ABC since 2009. Based on Rudzka-Ostyn’s (2003) cognitive motivations for the particle up, we have identified five different meaning extensions for our phrasal verbs. Drawing from these findings, we have designed pedagogical activities for those L2 learners that study English at the Police Academy.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background. Individual trajectories toward aggression originate in early infancy, before there is intent to harm. We focused on infants who were contentious, i.e., prone to engage in anger and use of physical force with other people, and examined change in levels of contentiousness between 6 and 12 months of age with reference to later aggressive conduct problems.
Sample. The CCDS is a nationally representative sample of 321 firstborn children whose families were recruited from antenatal clinics in two National Health Service Trusts.
Method. Mothers, fathers, and a third family member or friend who knew infants well completed the Cardiff Infant Contentiousness Scale (CICS) at 6 months, which was stable from 6 to 12 months, and validated by direct observation of infants’ use of force against peers. Primary caregivers again completed the CICS at 12 months, and up to three informants completed the Child Behaviour Check List at mean ages of 36 and 84 months. We used Latent Transition Analysis to identify different groups of infants with respect to their patterns of contentiousness from 6 to 12 months.
Results. Three ordered classes of contentiousness from low to high were found at 6 and 12 months. Infants exposed to greater family adversity were more likely to move into the high-contentious class from 6 to 12 months. Higher contentiousness in infancy predicted more aggressive conduct problems at 33 months and thereafter.
Conclusions. Infants exposed to family adversity are already at disadvantage by 6 months and likely to escalate in their anger and aggressiveness over time.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this thesis I examine a variety of linguistic elements which involve "alternative" semantic values (a class arguably including focus, interrogatives, indefinites, and disjunctions) and the connections between these elements. This study focusses on the analysis of such elements in Sinhala, with comparison to Malayalam, Tlingit, and Japanese. The central part of the study concerns the proper syntactic and semantic analysis of Q[uestion]-particles (including Sinhala "da", Malayalam "-oo", Japanese "ka"), which, in many languages, appear not only in interrogatives, but also in the formation of indefinites, disjunctions, and relative clauses. This set of contexts is syntactically heterogeneous, and so syntax does not offer an explanation for the appearance of Q-particles in this particular set of environments. I propose that these contexts can be united in terms of semantics, as all involving some element which denotes a set of "alternatives". Both wh-words and disjunctions can be analysed as creating Hamblin-type sets of "alternatives". Q-particles can be treated as uniformly denoting variables over choice functions which apply to the aforementioned Hamblin-type sets, thus "restoring" the derivation to normal Montagovian semantics. The treatment of Q-particles as uniformly denoting variables over choice functions provides an explanation for why these particles appear in just this set of contexts: they all include an element with Hamblin-type semantics. However, we also find variation in the use of Q-particles, including, in some languages, the appearance of multiple morphologically distinct Q-particles in different syntactic contexts. Such variation can be handled largely by positing that Q-particles may vary in their formal syntactic feature specifications, determining which syntactic contexts they are licensed in.
The unified analysis of Q-particles as denoting variables over choice functions also raises various questions about the proper analysis of interrogatives, indefinites, and disjunctions, including issues concerning the nature of the semantics of wh-words and the syntactic structure of disjunction. As well, I observe that indefinites involving Q-particles have a crosslinguistic tendency to be epistemic indefinites, i.e. indefinites which explicitly signal ignorance of details regarding who or what satisfies the existential claim. I provide an account of such indefinites which draws on the analysis of Q-particles as variables over choice functions. These pragmatic "signals of ignorance" (which I argue to be presuppositions) also have a further role to play in determining the distribution of Q-particles in disjunctions. The final section of this study investigates the historical development of focus constructions and Q-particles in Sinhala. This diachronic study allows us not only to observe the origin and development of such elements, but also serves to delimit the range of possible synchronic analyses, thus providing us with further insights into the formal syntactic and semantic properties of Q-particles. This study highlights both the importance of considering various components of the grammar (e.g. syntax, semantics, pragmatics, morphology) and the use of philology in developing plausible formal analyses of complex linguistic phenomena such as the crosslinguistic distribution of Q-particles.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In Senegal, diarrheal diseases constitute a major burden that still weighs heavily on children's health. These diseases are influenced by a wide range of factors belonging to different levels and spheres of analysis. This article analyzes these risk factors and their relative role in childhood diarrheal diseases in Dakar. In doing so, it illustrates a new approach for synthesizing the network of these determinants. A latent class analysis (LCA) is first carried out, and the latent variables thus constructed are then used as explanatory variables in a three-level logistic regression. The results confirm that the determinants of childhood diarrhea belong to all three levels of analysis and that behavioral factors and neighborhood sanitation play a predominant role. The results also illustrate the usefulness of LCA for synthesizing several indicators in order to build an integrated causal picture while using parsimonious statistical models.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dissertation (master's), Universidade de Brasília, Faculdade de Educação, Programa de Pós-Graduação em Educação, 2016.