903 results for Probabilistic latent semantic analysis (PLSA)


Relevance: 30.00%

Abstract:

This paper presents research on the linguistic structure of Bulgarian bell knowledge. The idea of building a semantic structure of Bulgarian bells arose during the “Multimedia fund - BellKnow” project, in which a large amount of data about bells was collected: their structure, history, technical data, etc. This is the first attempt to explain bell knowledge through computational linguistics and to deliver a semantic representation of that knowledge. Based on this research, linguistic components aimed at realizing different types of analysis of text objects have been implemented in term dictionaries. We thus lay the foundation for linguistic analysis services in these digital dictionaries, aiding research into the kinds, number and frequency of the lexical units that constitute various bell objects.

Relevance: 30.00%

Abstract:

Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2014.

Relevance: 30.00%

Abstract:

2000 Mathematics Subject Classification: Primary 60J45, 60J50, 35Cxx; Secondary 31Cxx.

Relevance: 30.00%

Abstract:

The Center for Epidemiologic Studies-Depression Scale (CES-D) is the most frequently used scale for measuring depressive symptomatology in caregiving research. The aim of this study is to test its construct structure and measurement equivalence between caregivers from two Spanish-speaking countries. Face-to-face interviews were carried out with 595 female dementia caregivers from Madrid, Spain, and from Coahuila, Mexico. The structure of the CES-D was analyzed using exploratory and confirmatory factor analysis (EFA and CFA, respectively). Measurement invariance across samples was analyzed by comparing a baseline model with a more restrictive model. Significant differences between means were found for 7 items. The results of the EFA clearly supported a four-factor solution. The CFA for the whole sample with the four factors revealed high and statistically significant loading coefficients for all items (except item 4). When equality constraints were imposed to test for invariance between countries, the change in chi-square was significant, indicating that complete invariance could not be assumed. Significant between-country differences were found for three of the four latent factor mean scores. Although the results provide general support for the original four-factor structure, caution should be exercised when reporting comparisons of depression scores between Spanish-speaking countries.
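The factor-retention step of an EFA such as the one described can be sketched on synthetic data. The item counts, loading values, and use of the Kaiser criterion below are illustrative assumptions for the sketch, not the study's actual CES-D analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_factors, n_obs = 20, 4, 2000

# simulate item responses from a 4-factor model with simple structure
# (5 items per factor, loading 0.8) plus independent item noise
L = np.zeros((n_items, n_factors))
for f in range(n_factors):
    L[5 * f:5 * (f + 1), f] = 0.8
Z = rng.normal(size=(n_obs, n_factors))
X = Z @ L.T + rng.normal(0, 0.5, (n_obs, n_items))

# principal-axis-style extraction: eigenvalues of the item correlation matrix
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
n_retained = int((eigvals > 1).sum())  # Kaiser criterion: keep eigenvalues > 1
```

With this configuration the four block eigenvalues stand well above 1 while the rest fall well below it, so the criterion recovers the four-factor structure.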

Relevance: 30.00%

Abstract:

The focus of this thesis is the extension of topographic visualisation mappings to allow for the incorporation of uncertainty. Few visualisation algorithms in the literature are capable of mapping uncertain data, and fewer still can represent observation uncertainties in visualisations. Modifications are therefore made to NeuroScale, Locally Linear Embedding, Isomap and Laplacian Eigenmaps to incorporate uncertainty in the observation and visualisation spaces. The proposed mappings are called Normally-distributed NeuroScale (N-NS), T-distributed NeuroScale (T-NS), Probabilistic LLE (PLLE), Probabilistic Isomap (PIso) and Probabilistic Weighted Neighbourhood Mapping (PWNM). These algorithms generate a probabilistic visualisation space in which each latent visualised point is transformed to a multivariate Gaussian or T-distribution using a feed-forward RBF network. Two types of uncertainty are then characterised, dependent on the data and the mapping procedure. Data-dependent uncertainty is the inherent observation uncertainty, whereas mapping uncertainty is defined by the Fisher information of a visualised distribution; it indicates how well the data have been interpolated, offering a level of ‘surprise’ for each observation. These new probabilistic mappings are tested on three datasets of vectorial observations and three datasets of real-world time series observations for anomaly detection. In order to visualise the time series data, a method for analysing observed signals and noise distributions, Residual Modelling, is introduced. The performance of the new algorithms on the tested datasets is compared qualitatively with the latent space generated by the Gaussian Process Latent Variable Model (GPLVM), and a quantitative comparison using existing evaluation measures from the literature allows the performance of each mapping function to be compared. Finally, the mapping uncertainty measure is combined with NeuroScale to build a deep learning classifier, the Cascading RBF. This new structure is tested on the MNIST dataset, achieving world-record performance whilst avoiding the flaws seen in other deep learning machines.
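The feed-forward RBF stage that underlies mappings of the NeuroScale family can be sketched as follows. The centre selection, kernel width, and use of PCA coordinates as 2-D targets are illustrative stand-ins; the thesis's actual mappings train against distance-preserving objectives and propagate observation uncertainty, which this sketch omits:

```python
import numpy as np

def rbf_features(X, centres, width):
    """Gaussian RBF design matrix: Phi[i, j] = exp(-||x_i - c_j||^2 / (2 width^2))."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # observations in a 5-D data space

# stand-in 2-D visualisation targets: first two principal coordinates
U, s, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
Y = U[:, :2] * s[:2]

# feed-forward RBF network: fixed centres, linear readout fitted by least squares
centres = X[rng.choice(len(X), 10, replace=False)]
Phi = rbf_features(X, centres, width=2.0)
W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
Y_hat = Phi @ W  # latent (visualised) position for each observation
```

In the probabilistic variants, each `Y_hat` row would become the mean of a Gaussian or T-distribution rather than a point.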

Relevance: 30.00%

Abstract:

Nowadays, due to regulation and internal motivations, financial institutions attend more closely to their risks. Besides the previously dominant market and credit risks, a new trend is to handle operational risk systematically. Operational risk is the risk of loss resulting from inadequate or failed internal processes, people and systems, or from external events. We first present the basic features of operational risk and its modelling and regulatory approaches, and then analyse operational risk in a simulation model framework of our own development. Our approach is based on the analysis of a latent risk process instead of the manifest risk process that is widely used in the risk literature. In our model the latent risk process is a stochastic mean-reverting process, the so-called Ornstein-Uhlenbeck process. In the model framework we define a catastrophe as a breach of a critical barrier by the process. We analyse the distributions of catastrophe frequency, severity and first hitting time, not only for a single process but for a dual process as well. Based on our first results we could not falsify the Poisson character of the frequency or the long-tailed character of the severity; the distribution of the first hitting time requires more sophisticated analysis. At the end of the paper we examine the advantages of simulation-based forecasting, and finally we conclude with possible directions for further research.
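The latent-process setup can be sketched by simulating Ornstein-Uhlenbeck paths and recording barrier breaches. The parameter values and barrier level below are arbitrary illustrations, not those of the paper's framework:

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretisation of the mean-reverting OU process
    dX_t = theta * (mu - X_t) dt + sigma dW_t."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        x[t + 1] = (x[t] + theta * (mu - x[t]) * dt
                    + sigma * np.sqrt(dt) * rng.standard_normal())
    return x

def catastrophe_stats(paths, barrier):
    """Share of paths that ever breach the barrier, and first-hitting indices."""
    hits = paths >= barrier
    freq = hits.any(axis=1).mean()
    first = [int(np.argmax(h)) for h in hits if h.any()]
    return freq, first

rng = np.random.default_rng(42)
paths = np.array([simulate_ou(theta=1.0, mu=0.0, sigma=0.5, x0=0.0,
                              dt=0.01, n_steps=1000, rng=rng)
                  for _ in range(200)])
freq, first_times = catastrophe_stats(paths, barrier=1.0)
```

Histogramming `first_times` and the per-path breach counts gives empirical first-hitting-time and frequency distributions of the kind the paper studies.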

Relevance: 30.00%

Abstract:

An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of a Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared with a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. The application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations; such databases can be naturally accommodated by the semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed.

A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over the existing interfaces. It is optimizable and parallelizable, supports the definition of semantic userviews, and supports the interoperability of semantic databases with other data sources such as the World Wide Web and relational and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases.

The analysis and high-level design are presented of a system that exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use, allowing uniform access to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files and others via a common query interface. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system, providing an ODBC interface to the WWW as a data source, is discussed.

Relevance: 30.00%

Abstract:

To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interactions, regulatory relationships, metabolic relationships, genetic relationships, and more. With advances in science and technology, high-throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein and protein-DNA interactions. However, the data generated by high-throughput methods are prone to noise. Furthermore, the technology itself has limitations and cannot detect all kinds of relationships between genes and their products. There is thus a pressing need to investigate these relationships and their roles in a living system using bioinformatic approaches; this is a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatic techniques to estimate support for the data on these interactions. While protein-protein interactions and regulatory relationships can be detected by high-throughput biological techniques, semantic relationships cannot be detected by a single technique, but can be inferred using multiple sources of biological data. The contributions of this thesis involve the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These include (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to the functional annotation of genes, (iii) a novel method for inferring a functional network of genes, and (iv) techniques for clustering genes using multi-source data.
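The EM idea behind regulatory-element prediction can be sketched with a minimal MEME-style "one occurrence per sequence" model. This toy version omits the dissertation's use of comparative genomics data, and the motif, sequence lengths and smoothing constant are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = "ACGT"

def oops_em(seqs, w, n_iter=50):
    """EM for a width-w motif, one occurrence per sequence (uniform background).
    E-step: posterior over motif start positions under the current PWM.
    M-step: re-estimate the PWM from the posterior-weighted (soft) counts."""
    idx = [[ALPHA.index(c) for c in s] for s in seqs]
    pwm = np.full((w, 4), 0.25) + rng.uniform(0, 0.01, (w, 4))
    pwm /= pwm.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        counts = np.zeros((w, 4))
        for s in idx:
            n_pos = len(s) - w + 1
            lik = np.array([np.prod([pwm[j, s[p + j]] for j in range(w)])
                            for p in range(n_pos)])
            post = lik / lik.sum()
            for p, z in enumerate(post):          # soft counts
                for j in range(w):
                    counts[j, s[p + j]] += z
        pwm = (counts + 0.1) / (counts + 0.1).sum(axis=1, keepdims=True)
    return pwm

# toy data: the motif "TATAA" embedded at a random position in random background
motif = "TATAA"
seqs = []
for _ in range(30):
    bg = "".join(rng.choice(list(ALPHA), 20))
    p = int(rng.integers(0, 16))
    seqs.append(bg[:p] + motif + bg[p + 5:])

pwm = oops_em(seqs, 5)
consensus = "".join(ALPHA[i] for i in pwm.argmax(axis=1))
```

A comparative-genomics extension would additionally weight positions by cross-species conservation, which is where the dissertation's algorithm improves on this baseline.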

Relevance: 30.00%

Abstract:

This thesis research describes the design and implementation of a Semantic Geographic Information System (GIS) and the creation of its spatial database. The database schema is designed and created, and all textual and spatial data are loaded into the database with the help of the Semantic DBMS's Binary Database Interface, currently being developed at FIU's High Performance Database Research Center (HPDRC). A friendly graphical user interface is created together with the system's other main areas: the displaying process, data animation, and data retrieval. All these components are tightly integrated to form a novel and practical semantic GIS that has facilitated the interpretation, manipulation, analysis, and display of spatial data such as ocean temperature, ozone (TOMS), and simulated SeaWiFS data. At the same time, this system has played a major role in the testing process of the HPDRC's high-performance, efficient, parallel Semantic DBMS.

Relevance: 30.00%

Abstract:

The photoproduction of neutral kaons off a deuteron target has been investigated at the Tohoku University Laboratory of Nuclear Science. The particle identification (PID) methods investigated incorporated a combination of momentum, velocity (β = v/c), and energy deposition per unit length (dE/dx) measurements. The analysis demonstrates that energy deposition and time of flight are exceedingly useful, and a higher signal-to-background ratio was achieved with hard cuts applied in combination. A probabilistic likelihood estimation (LE) approach to PID was also explored. The probability of a particle being correctly identified by the LE method and the preliminary results indicate the need for highly precise limits on the distributions from which the parameters would be extracted. It was confirmed that these PID approaches are applicable for properly identifying pions in the analysis of this experiment. However, the background evident in the mass spectra points to the need for a higher level of proton identification.
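A likelihood-estimation PID of this kind can be sketched as below. The Gaussian means and widths for β and dE/dx are invented placeholders; as the abstract notes, real values would have to be extracted precisely from calibration distributions:

```python
import numpy as np

# Hypothetical (mean, sigma) calibration parameters per particle species
PARAMS = {
    "pion":   {"beta": (0.99, 0.01), "dedx": (1.8, 0.3)},
    "proton": {"beta": (0.75, 0.03), "dedx": (4.5, 0.8)},
}

def log_gauss(x, mean, sigma):
    """Log-density of a Gaussian measurement model."""
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def identify(beta, dedx):
    """Assign the species with the highest combined log-likelihood
    over the beta and dE/dx measurements."""
    scores = {sp: log_gauss(beta, *p["beta"]) + log_gauss(dedx, *p["dedx"])
              for sp, p in PARAMS.items()}
    return max(scores, key=scores.get)
```

For example, `identify(0.99, 1.7)` selects the pion hypothesis, while `identify(0.74, 4.2)` selects the proton hypothesis; momentum could be added as a third likelihood factor in the same way.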

Relevance: 30.00%

Abstract:

Background: Although tuberculosis is a major public-health problem in developing countries, Western countries also face high infection rates in some immigrant populations. The risk of developing active TB is 10% higher among people with latent TB if they do not receive adequate treatment. Timely detection and treatment of latent TB are necessary not only to preserve the health of the affected individual but also to reduce the socio-economic and health burden on the host country. Adherence rates for preventive treatment of latent TB are low, and an effective solution to this problem is required to control the prevalence of the infection. The objective of this thesis is to identify the factors that contribute to adherence to latent TB treatment among new arrivals in Western countries where endemic rates are low.

Methods: A systematic review was carried out using recognised scientific databases and repositories such as Medline, Medline In-Process, Embase, Global Health, CINAHL (Cumulative Index to Nursing and Allied Health Literature) and the Cochrane Library, among others. The studies included were published after 1997, in French or English, and conducted on immigrant populations in the West (Canada, the United States, Europe, the United Kingdom, Australia and New Zealand) with homogeneous socio-economic status.

Results: In total, nine (9) studies carried out in the United States on immigrants from various countries where TB is endemic were analysed: two (2) qualitative ethnographic studies, six (6) quantitative observational studies, and one (1) quantitative interventional study. Sociodemographic factors, individual and family characteristics, and determinants related to access to and delivery of health services and care were analysed to identify factors associated with treatment adherence. Age, number of years spent in the host country, sex, marital status, employment, country of origin, family support, and the secondary and adverse effects of TB treatment were not determining factors of adherence to preventive treatment. However, access to information and education about TB adapted to the languages and cultures of immigrant populations, explicit treatment objectives, the offer of shorter and better-tolerated treatment plans, a stable environment, supportive supervision, and adherence to medical follow-up by motivated providers emerged as determinants of treatment adherence.

Conclusion and recommendation: Poor adherence to treatment of latent TB infection (LTBI) by immigrant populations, who are already struggling with integration, communication and economic difficulties, is a risk factor for Western countries where endemic TB rates are low. The results of our study suggest that tailored interventions, individual follow-up, clinical supervision and shorter treatment plans can greatly improve rates of adherence to preventive treatment, making them a worthwhile investment for host countries.

Relevance: 30.00%

Abstract:

The importance of non-destructive techniques (NDT) in structural health monitoring programmes has been felt critically in recent times. The quality of the measured data, often affected by various environmental conditions, can be a guiding factor in the usefulness and prediction efficiency of the various detection and monitoring methods used in this regard. Often, preprocessing the acquired data in relation to the affecting environmental parameters can improve the information quality and lead to a significantly more efficient and correct prediction process. The improvement can be directly related to the final decision-making policy about a structure or a network of structures, and is compatible with general probabilistic frameworks for such assessment and decision-making programmes. This paper considers a preprocessing technique employed in an image-analysis-based structural health monitoring methodology to identify submarine pitting corrosion in the presence of variable luminosity, contrast and noise affecting the quality of the images. Preprocessing the gray-level threshold of the various images is observed to bring about a significant improvement in damage detection compared with an automatically computed gray-level threshold. Case-dependent adjustments of the threshold make it possible to obtain the best available information from an existing image. The corresponding improvements are assessed qualitatively in the present study.
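The baseline the paper adjusts against, an automatically computed gray-level threshold, can be sketched with Otsu's method on a synthetic "pitted plate" image. The image, pit geometry and noise level are invented for illustration; a case-dependent adjustment would then nudge `t_auto` per image:

```python
import numpy as np

def otsu_threshold(img):
    """Automatic gray-level threshold: maximise between-class variance
    over the 256-bin intensity histogram (Otsu's method)."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # class-0 probability
    mu = np.cumsum(p * np.arange(256))   # class-0 mean times omega
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b))

def segment(img, threshold):
    """Binary damage mask: pixels darker than the threshold flagged as pits."""
    return img < threshold

# synthetic image: bright plate with one dark pit, plus Gaussian noise
rng = np.random.default_rng(1)
img = np.full((64, 64), 200.0)
img[20:30, 20:30] = 60.0
img = np.clip(img + rng.normal(0, 10, img.shape), 0, 255)

t_auto = otsu_threshold(img)
mask = segment(img, t_auto)
```

Under variable luminosity or contrast, the automatically computed `t_auto` can drift; the paper's point is that manually adjusting it per image recovers damage information the automatic value misses.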

Relevance: 30.00%

Abstract:

Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories, while gene expression data, depending on the quantitative technology, may be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly rich. Efficient statistical approaches are therefore crucial in this big-data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; under this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes mixture proportions of topics represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including the construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.
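The low-rank covariance property of the factor model mentioned above can be checked numerically: data generated as x = Λz + ε with z ~ N(0, I) and diagonal noise have covariance ΛΛᵀ + Ψ. The dimensions and parameter values below are arbitrary for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 10, 2, 200_000  # observed dim, number of latent factors, samples

Lambda = rng.normal(size=(p, k))          # factor loadings
Psi = np.diag(rng.uniform(0.5, 1.0, p))   # diagonal noise covariance

# factor model: x = Lambda @ z + eps,  z ~ N(0, I_k),  eps ~ N(0, Psi)
Z = rng.normal(size=(n, k))
eps = rng.normal(size=(n, p)) * np.sqrt(np.diag(Psi))
X = Z @ Lambda.T + eps

# implied covariance is low-rank plus diagonal: Lambda Lambda^T + Psi
implied = Lambda @ Lambda.T + Psi
empirical = np.cov(X, rowvar=False)
```

With a large sample the empirical covariance matches the rank-k-plus-diagonal form entry by entry, which is exactly the compression a factor model exploits.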

Relevance: 30.00%

Abstract:

Advances in three related areas, state-space modeling, sequential Bayesian learning, and decision analysis, are addressed, along with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas together is Bayesian model emulation: solving challenging analytical and computational problems using creative model emulators. This idea drives theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic network studies. Examples and applications in financial time series and portfolio analysis, macroeconomics, and internet studies from computational advertising demonstrate the utility of the core methodological innovations.

Chapter 1 summarizes the three areas/problems and the key idea of emulation in each. Chapter 2 discusses the sequential analysis of latent threshold models, using emulating models that allow analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, the synthetic model that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network that relies on emulating the whole dependent network model by independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews these advances and offers concluding remarks.
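The conjugate sub-model idea from Chapter 4 can be sketched for a single network flow: Poisson counts with a Gamma prior on the rate update in closed form, so each flow can be tracked independently without sampling. The prior values and simulated rate below are illustrative, not the dissertation's model:

```python
import numpy as np

class GammaPoissonFlow:
    """Conjugate sub-model for one network flow: Poisson counts with a
    Gamma(a, b) prior on the rate. Each count updates (a, b) in closed form."""
    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b
    def update(self, count):
        self.a += count  # shape accumulates observed counts
        self.b += 1.0    # rate accumulates exposure (one interval per count)
    @property
    def rate_mean(self):
        """Posterior mean of the flow's rate."""
        return self.a / self.b

rng = np.random.default_rng(3)
flow = GammaPoissonFlow()
for c in rng.poisson(5.0, size=1000):  # simulated streaming counts on one flow
    flow.update(int(c))
```

After the stream, `flow.rate_mean` sits near the simulated rate of 5; a large network would run one such sub-model per flow, emulating the dependent joint model.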

Relevance: 30.00%

Abstract:

This paper takes some of Melanie Klein’s ideas, which Bion (1961/1998) previously used to understand group dynamics, to analyse the discipline of management studies since its ‘birth’ in the United States in the late 19th century. Specifically, it focuses on the idealisation of work and play, and argues that at its inception, for idiosyncratic historical reasons, the discipline was rooted in a ‘paranoid-schizoid’ position in which work was idealised as good and play as bad. The paper maps out the peculiar set of factors and influences that brought this about. It then examines whether and how, again following Klein, the discipline has evolved to the ‘depressive’ position, where the idealisations are replaced by a more ambiguous, holistic semantic frame. Seven different relationships between work and play are then described. The paper contends that the originary splitting and idealisation is foundational to the discipline, and provides an enduring basis for analysing management theory and practice. It concludes by using this splitting to map out five potential future trajectories for the discipline.