930 results for Probabilistic latent semantic analysis (PLSA)
Abstract:
Nowadays, financial institutions pay closer attention to their risks, driven by both regulation and internal motivations. Alongside the previously dominant market and credit risks, the new trend is to handle operational risk systematically. Operational risk is the risk of loss resulting from inadequate or failed internal processes, people and systems, or from external events. We first present the basic features of operational risk, its modelling, and the regulatory approaches, and then analyse operational risk within a simulation model framework of our own development. Our approach is based on the analysis of the latent risk process rather than the manifest risk process widely used in the risk literature. In our model the latent risk process is a stochastic process, the Ornstein-Uhlenbeck process, which is a mean-reverting process. Within the model framework we define a catastrophe as the breach of a critical barrier by the process. We analyse the distributions of catastrophe frequency, severity, and first time to hit, not only for a single process but for a dual process as well. Based on our first results we could not falsify the Poisson character of the frequency or the long-tailed character of the severity; the distribution of the first time to hit requires more sophisticated analysis. At the end of the paper we examine the advantages of simulation-based forecasting, and we conclude with possible directions for further research.
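As a hedged illustration of the latent-risk framework described above, the following sketch simulates an Ornstein-Uhlenbeck process with an Euler-Maruyama discretization and records each upward breach of a critical barrier as a catastrophe. All parameter values (mean-reversion speed, volatility, barrier level) are hypothetical and not taken from the paper.

```python
import numpy as np

def simulate_ou_catastrophes(theta=1.0, mu=0.0, sigma=0.5, barrier=1.5,
                             dt=0.01, horizon=100.0, seed=0):
    """Euler-Maruyama simulation of the OU latent risk process
    dX = theta*(mu - X) dt + sigma dW; a 'catastrophe' is recorded each
    time the process crosses the critical barrier from below."""
    rng = np.random.default_rng(seed)
    n_steps = int(horizon / dt)
    x = mu                               # start the process at its long-run mean
    hit_times = []
    for i in range(1, n_steps + 1):
        x_new = x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.normal()
        if x < barrier <= x_new:         # upward breach of the barrier
            hit_times.append(i * dt)
        x = x_new
    return hit_times                     # inspect frequency and inter-arrival times

print(len(simulate_ou_catastrophes()), "barrier breaches over the horizon")
```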
Abstract:
An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared with a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. The application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritance, and many-to-many relations. Such databases are naturally accommodated by the semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS under test. The results of the benchmark are analyzed. A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and it has several advantages over the existing interfaces: it is optimizable and parallelizable, and it supports the definition of semantic user views and the interoperability of semantic databases with other data sources such as the World Wide Web, relational databases, and object-oriented databases. A query is structured as a semantic database schema graph with interlinking conditionals; the query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases. Finally, the analysis and high-level design of a system is presented that exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use to allow uniform access to heterogeneous data sources, such as semantic databases, relational databases, web sites, ASCII files, and others, via a common query interface. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system, providing an ODBC interface to the WWW as a data source, is discussed.
Abstract:
To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interactions, regulatory relationships, metabolic relationships, genetic relationships, and more. With advances in science and technology, high-throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein and protein-DNA interactions. However, the data generated by high-throughput methods are prone to noise. Furthermore, the technology itself has its limitations and cannot detect all kinds of relationships between genes and their products. There is thus a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches, and doing so is a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate support for the data on these interactions. While protein-protein interactions and regulatory relationships can be detected by high-throughput biological techniques, semantic relationships cannot be detected by a single technique but can be inferred from multiple sources of biological data. The contributions of this thesis involve the development and application of a set of bioinformatic approaches that address the challenges mentioned above: (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to the functional annotation of genes, (iii) a novel method for inferring a functional network of genes, and (iv) techniques for clustering genes using multi-source data.
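As a minimal sketch of the kind of EM-based regulatory element (motif) discovery underlying contribution (i), the following code learns a position weight matrix under the simplifying assumption of exactly one motif occurrence per sequence; it omits the comparative-genomics extension, and all names and defaults are illustrative rather than the dissertation's algorithm.

```python
import numpy as np

ALPHABET = "ACGT"
A2I = {a: i for i, a in enumerate(ALPHABET)}

def em_motif(seqs, w, n_iter=100, pseudocount=0.5, seed=0):
    """Simplified EM motif finder: estimate a width-w position weight matrix
    (PWM) assuming each sequence contains exactly one motif occurrence."""
    rng = np.random.default_rng(seed)
    X = [np.array([A2I[c] for c in s]) for s in seqs]
    bg = np.bincount(np.concatenate(X), minlength=4).astype(float)
    bg /= bg.sum()                                   # background letter frequencies
    pwm = rng.dirichlet(np.ones(4), size=w)          # random initial PWM, shape (w, 4)
    for _ in range(n_iter):
        counts = np.full((w, 4), pseudocount)
        for x in X:
            n_pos = len(x) - w + 1
            # E-step: posterior probability of each motif start position
            logp = np.array([np.log(pwm[np.arange(w), x[j:j + w]]).sum()
                             - np.log(bg[x[j:j + w]]).sum()
                             for j in range(n_pos)])
            z = np.exp(logp - logp.max())
            z /= z.sum()
            # M-step accumulation: expected letter counts in each motif column
            for j in range(n_pos):
                counts[np.arange(w), x[j:j + w]] += z[j]
        pwm = counts / counts.sum(axis=1, keepdims=True)
    return pwm

print(np.round(em_motif(["ACGTACGTTTT", "GGGACGTACGA", "TTACGTACGCC"], w=8), 2))
```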
Abstract:
This thesis research describes the design and implementation of a Semantic Geographic Information System (GIS) and the creation of its spatial database. The database schema is designed and created, and all textual and spatial data are loaded into the database with the help of the Semantic DBMS's Binary Database Interface, currently being developed at FIU's High Performance Database Research Center (HPDRC). A friendly graphical user interface is created together with the system's other main areas: the display process, data animation, and data retrieval. All these components are tightly integrated to form a novel and practical semantic GIS that has facilitated the interpretation, manipulation, analysis, and display of spatial data such as ocean temperature, ozone (TOMS), and simulated SeaWiFS data. At the same time, this system has played a major role in testing the HPDRC's high-performance and efficient parallel Semantic DBMS.
Abstract:
The photoproduction of neutral kaons off a deuteron target has been investigated at the Tohoku University Laboratory of Nuclear Science. The particle identification (PID) methods investigated incorporated a combination of momentum, velocity (β = v/c), and energy deposition per unit length (dE/dx) measurements. The analysis demonstrates that energy deposition and time of flight are exceedingly useful, and a higher signal-to-background ratio was achieved when hard cuts on these quantities were applied in combination. A probabilistic likelihood estimation (LE) approach was also explored as a PID method. The probability of a particle being correctly identified by the LE method was evaluated, and the preliminary results indicate the need for highly precise constraints on the distributions from which the parameters are extracted. It was confirmed that these PID approaches are applicable for properly identifying pions in the analysis of this experiment. However, the background evident in the mass spectra points to the need for a higher level of proton identification.
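A hedged sketch of what a likelihood-estimation (LE) PID step could look like: each species is assigned a Gaussian likelihood in the measured β and dE/dx, and the normalized product gives an identification probability. The species table and all numerical values below are hypothetical placeholders, not calibration values from this experiment.

```python
import numpy as np

# Hypothetical per-species expectations; real values would come from detector calibration.
SPECIES = {
    # name: (expected beta, sigma_beta, expected dE/dx [arb. units], sigma_dEdx)
    "pion":   (0.99, 0.01, 1.8, 0.3),
    "proton": (0.75, 0.02, 4.5, 0.8),
}

def pid_probabilities(beta, dedx):
    """Gaussian likelihood of each species given measured beta and dE/dx,
    normalized to identification probabilities."""
    likes = {}
    for name, (b0, sb, e0, se) in SPECIES.items():
        lb = np.exp(-0.5 * ((beta - b0) / sb) ** 2) / (sb * np.sqrt(2 * np.pi))
        le = np.exp(-0.5 * ((dedx - e0) / se) ** 2) / (se * np.sqrt(2 * np.pi))
        likes[name] = lb * le
    total = sum(likes.values())
    return {k: v / total for k, v in likes.items()}

print(pid_probabilities(beta=0.97, dedx=2.0))
```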
Abstract:
Background: Although tuberculosis is a major public health problem in developing countries, Western countries also face high infection rates in some immigrant populations. The risk of developing active TB is 10% higher among people with latent TB if they do not receive adequate treatment. Timely detection and treatment of latent TB are necessary not only to preserve the health of the affected individual but also to reduce the socio-economic and health burden on the host country. Adherence rates to preventive treatment for latent TB are low, and an effective solution to this problem is required to control the prevalence of the infection. The objective of this thesis is to identify the factors that contribute to adherence to latent TB treatment among new immigrants in Western countries where endemic rates are low. Methods: A systematic review was conducted using recognized databases and scientific repositories such as Medline, Medline In-Process, Embase, Global Health, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), and the Cochrane Library, among others. The studies included were published after 1997 in French or English and were conducted among immigrant populations in Western countries (Canada, the United States, Europe, the United Kingdom, Australia, and New Zealand) with a homogeneous socio-economic status. Results: In total, nine (9) studies conducted in the United States on immigrants from different countries where TB is endemic were analysed: two (2) qualitative ethnographic studies, six (6) quantitative observational studies, and one (1) quantitative interventional study. Socio-demographic factors, individual and family characteristics, and determinants related to access to and delivery of health services and care were analysed to identify factors of treatment adherence. Age, number of years spent in the host country, sex, marital status, employment, country of origin, family support, and the secondary and adverse effects of TB treatment were not determining factors of adherence to preventive treatment. However, access to information and education on TB adapted to the languages and cultures of immigrant populations, explicit treatment objectives, the offer of shorter and better tolerated treatment plans, a stable environment, supportive follow-up, and adherence to medical follow-up with motivated providers emerged as determinants of treatment adherence. Conclusion and recommendation: Poor adherence to treatment of latent TB infection (LTBI) by immigrant populations, who are already struggling with integration, communication, and economic difficulties, is a risk factor for Western countries where endemic TB rates are low. The results of our study suggest that tailored interventions, individual follow-up, clinical support, and shorter treatment plans can greatly improve rates of adherence to preventive treatment, making them a worthwhile investment for host countries.
Abstract:
The importance of non-destructive techniques (NDT) in structural health monitoring programmes has become critical in recent times. The quality of the measured data, often affected by various environmental conditions, can be a guiding factor for the usefulness and prediction efficiency of the various detection and monitoring methods used in this regard. Often, preprocessing the acquired data in relation to the relevant environmental parameters can improve the information quality and lead to a significantly more efficient and correct prediction process. The improvement is directly relevant to the final decision-making policy for a structure or a network of structures and is compatible with general probabilistic frameworks for such assessment and decision-making programmes. This paper considers a preprocessing technique employed in an image-analysis-based structural health monitoring methodology to identify submarine pitting corrosion in the presence of variable luminosity, contrast, and noise affecting the quality of the images. Preprocessing the gray-level threshold of the various images is observed to bring about a significant improvement in damage detection compared with an automatically computed gray-level threshold. The case-dependent adjustment of the threshold makes it possible to obtain the best available information from an existing image. The corresponding improvements are examined qualitatively in the present study.
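A minimal sketch, assuming a scikit-image workflow, of the kind of gray-level-threshold preprocessing the paper describes: Otsu's automatic threshold plus a case-dependent manual offset. The function name, the offset parameter, and the dark-pixels-as-pits convention are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from skimage import io, filters

def segment_pits(image_path, threshold_offset=0.0):
    """Binarize a grayscale corrosion image. threshold_offset is the
    case-dependent manual adjustment applied on top of Otsu's automatic
    gray-level threshold (an offset of 0 reproduces the automatic result)."""
    img = io.imread(image_path, as_gray=True).astype(float)
    t_auto = filters.threshold_otsu(img)
    t = t_auto + threshold_offset        # preprocessing step: adjusted threshold
    return img < t                       # darker-than-threshold pixels as candidate pits

# Usage (hypothetical file): mask = segment_pits("corroded_plate.png", threshold_offset=-0.05)
```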
Abstract:
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous values or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, in a factor analysis model, a latent Gaussian-distributed multivariate vector is assumed; with this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including the construction of condition-specific gene co-expression networks, estimating topics shared among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.
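The factor-analysis example in the paragraph above can be made concrete with a short sketch: fitting a three-factor model yields a low-rank-plus-diagonal estimate of the covariance of the observed variables. The simulated data and dimensions are arbitrary.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated data: 200 samples of 30 observed variables driven by 3 latent Gaussian factors.
rng = np.random.default_rng(0)
W = rng.normal(size=(30, 3))                       # true factor loadings
Z = rng.normal(size=(200, 3))                      # latent factors
X = Z @ W.T + rng.normal(scale=0.5, size=(200, 30))

fa = FactorAnalysis(n_components=3).fit(X)
# Low-rank-plus-diagonal covariance estimate of the observed variables.
cov_lowrank = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
print(cov_lowrank.shape)                           # (30, 30), rank 3 apart from the diagonal
```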
Abstract:
Advances in three related areas (state-space modeling, sequential Bayesian learning, and decision analysis) are addressed, together with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas together is Bayesian model emulation: solving challenging analytical and computational problems using creative model emulators. This idea drives theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis, and statistical computation, across linked contexts of multivariate time series and dynamic network studies. Examples and applications in financial time series and portfolio analysis, macroeconomics, and internet studies from computational advertising demonstrate the utility of the core methodological innovations.
Chapter 1 summarizes the three areas/problems and the key idea of emulation in those areas. Chapter 2 discusses the sequential analysis of latent threshold models with the use of emulating models that allow analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, the synthetic model equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network that relies on emulating the whole dependent network model by independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews these advances and makes concluding remarks.
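A hedged simulation sketch of the latent threshold mechanism referenced in Chapter 2 (not the thesis's emulator-based filtering): a time-varying regression coefficient follows an AR(1) path and is set exactly to zero whenever its magnitude falls below a threshold, producing dynamic sparsity. All parameter values are illustrative.

```python
import numpy as np

def simulate_latent_threshold(T=500, phi=0.98, mu=0.3, sigma=0.1, d=0.25, seed=1):
    """Simulate a univariate latent threshold regression: the latent AR(1)
    coefficient beta_t is shrunk exactly to zero whenever |beta_t| < d."""
    rng = np.random.default_rng(seed)
    beta = np.empty(T)
    beta[0] = mu
    for t in range(1, T):
        beta[t] = mu + phi * (beta[t - 1] - mu) + sigma * rng.normal()
    x = rng.normal(size=T)
    beta_eff = np.where(np.abs(beta) >= d, beta, 0.0)   # latent thresholding
    y = beta_eff * x + 0.2 * rng.normal(size=T)
    return y, x, beta, beta_eff

y, x, beta, beta_eff = simulate_latent_threshold()
print("fraction of time the effective coefficient is zero:", np.mean(beta_eff == 0))
```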
Abstract:
This paper takes some of Melanie Klein’s ideas, which Bion (1961/1998) previously used to understand group dynamics, to analyse the discipline of management studies since its ‘birth’ in the United States in the late 19th century. Specifically, it focuses on the idealisation of work and play, and argues that at its inception, for idiosyncratic historical reasons, the discipline was rooted in a ‘paranoid-schizoid’ position in which work was idealised as good and play as bad. The paper maps out the peculiar set of factors and influences that brought this about. It then examines how and if, again following Klein, the discipline has evolved to the ‘depressive’ position, where the idealisations are replaced by a more ambiguous, holistic semantic frame. Seven different relationships between work and play are then described. The paper contends that the originary splitting and idealisation is foundational to the discipline, and provides an enduring basis for analysing management theory and practice. It concludes by using this splitting to map out five potential future trajectories for the discipline.
Abstract:
The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation, or a disease to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant problems with current diffusion models: (1) the assumption that contagions can only spread along observed ties, and (2) the assumption that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion, the use of enterprise collaboration software in a large technology company. I focus the empirical study on when people abandon innovations, a crucial and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze in fine detail when people abandon the software.
To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space, and simulates diffusion over the social space. Theoretically, the social space model integrates both actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives an explicit form to the reciprocal influences that cognition and social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion with schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it is spread without changes, the schema diffusion model allows people to modify information they receive to fit an underlying mental model of the information before they pass the information to others. Combining the latent space models with a schema notion for actors improves our models for social diffusion both theoretically and practically.
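A toy sketch of the latent space diffusion idea, under simplifying assumptions of my own: actors are placed in a latent social space, ties are redrawn at each step with probability decaying in latent distance, and contact with an adopter transmits the contagion. This illustrates the mechanism, not the dissertation's estimator.

```python
import numpy as np

def simulate_latent_space_diffusion(n=200, dim=2, steps=30, scale=1.0, seed=0):
    """Diffusion over a latent social space: ties are random draws whose
    probability decays with latent distance, rather than fixed conduits."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(n, dim))                      # latent positions
    dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    p_contact = np.exp(-dist / scale)                    # contact probability by distance
    np.fill_diagonal(p_contact, 0.0)
    adopted = np.zeros(n, dtype=bool)
    adopted[rng.integers(n)] = True                      # random seed adopter
    counts = [int(adopted.sum())]
    for _ in range(steps):
        contact = rng.random((n, n)) < p_contact         # ties redrawn each step
        exposed = (contact & adopted[None, :]).any(axis=1)
        adopted |= exposed                               # exposure transmits the contagion
        counts.append(int(adopted.sum()))
    return counts

print(simulate_latent_space_diffusion())
```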
The empirical case study focuses on how the changing value of an innovation, introduced by the innovations' network externalities, influences when people abandon the innovation. In it, I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovation, but also suggests a new approach -- computerized collaboration systems -- to collecting and analyzing data on organizational processes.
Abstract:
Bayesian methods offer a flexible and convenient probabilistic learning framework to extract interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and provide generic applicability to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
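As a rough, frequentist stand-in for the sparse latent factor idea mentioned above (not the Bayesian method developed in the dissertation), sparse PCA applied to simulated block-structured data recovers sparse loadings that reveal the common low-dimensional structure. The data and settings are arbitrary.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
# Two sparse latent factors, each loading on a disjoint block of 10 variables.
W = np.zeros((40, 2))
W[:10, 0], W[20:30, 1] = 1.0, 1.0
X = rng.normal(size=(300, 2)) @ W.T + 0.3 * rng.normal(size=(300, 40))

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)
print(np.round(spca.components_, 2))   # sparse loadings recover the two blocks
```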
Abstract:
The work presented in this dissertation is focused on applying engineering methods to develop and explore probabilistic survival models for the prediction of decompression sickness in US Navy divers. Mathematical modeling, computational model development, and numerical optimization techniques were employed to formulate and evaluate the predictive quality of models fitted to empirical data. In Chapters 1 and 2 we present general background information relevant to the development of probabilistic models applied to predicting the incidence of decompression sickness. The remainder of the dissertation introduces techniques developed in an effort to improve the predictive quality of probabilistic decompression models and to reduce the difficulty of model parameter optimization.
The first project explored seventeen variations of the hazard function using a well-perfused parallel compartment model. Models were parametrically optimized using the maximum likelihood technique. Model performance was evaluated using both classical statistical methods and model selection techniques based on information theory. The optimized model parameters were overall similar to those of previously published models. The results indicated that a novel hazard function definition, which included both an ambient pressure scaling term and individually fitted compartment exponent scaling terms, performed best.
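For orientation, a minimal sketch of maximum likelihood survival-model fitting on synthetic, hypothetical exposure data, using a constant hazard rather than the compartmental hazard functions studied in the project; all variable names and numbers are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(log_lam, time, event):
    """Constant-hazard (exponential) survival model with right censoring:
    h(t) = lam, S(t) = exp(-lam * t)."""
    lam = np.exp(log_lam)                 # optimize on the log scale to keep lam > 0
    return -(np.sum(event * np.log(lam)) - lam * np.sum(time))

# Hypothetical exposures: observation time and whether DCS occurred (1) or not (0).
rng = np.random.default_rng(0)
time = rng.exponential(scale=10.0, size=200)
event = (rng.random(200) < 0.3).astype(float)

fit = minimize(neg_log_likelihood, x0=[0.0], args=(time, event))
print("estimated hazard rate:", np.exp(fit.x[0]))
```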
We developed ten pharmacokinetic compartmental models that included explicit delay mechanics to determine whether predictive quality could be improved by including material transfer lags. A fitted discrete delay parameter augmented the inflow to the compartment systems from the environment. Based on the observation that, for many of our models, symptoms are often reported after risk accumulation begins, we hypothesized that including delays might improve the correlation between model predictions and observed data. Model selection techniques identified two models as having the best overall performance, but comparison with the best-performing model without delay, and model selection using our best-identified no-delay pharmacokinetic model, both indicated that the delay mechanism was not statistically justified and did not substantially improve model predictions.
Our final investigation explored parameter bounding techniques to identify parameter regions within which statistical model failure will not occur. Statistical model failure occurs when a model predicts zero probability of a diver experiencing decompression sickness for an exposure that is known to produce symptoms. Using a metric related to the instantaneous risk, we successfully identify regions where model failure will not occur and locate the boundaries of these regions with a root bounding technique. Several models are used to demonstrate the techniques, which may be employed to reduce the difficulty of model optimization in future investigations.
Abstract:
An investigation into karst hazard in southern Ontario has been undertaken with the intention of leading to the development of predictive karst models for this region. Such models are not currently feasible because of a lack of sufficient karst data, though this is not entirely due to a lack of karst features. Geophysical data were collected at Lake on the Mountain, Ontario as part of this karst investigation, in order to validate the long-standing hypothesis that Lake on the Mountain was formed by a sinkhole collapse. Sub-bottom acoustic profiling data were collected to image the lake-bottom sediments and bedrock. Vertical bedrock features interpreted as solutionally enlarged fractures were taken as evidence of karst processes on the lake bottom. Additionally, the bedrock topography shows a narrower and more elongated basin than was previously identified, lying parallel to a mapped fault system in the area. This suggests that Lake on the Mountain was formed over a fault zone, which also supports the sinkhole hypothesis, as a fault zone would provide groundwater pathways for karst dissolution. Previous sediment cores suggest that Lake on the Mountain formed at some point during the Wisconsinan glaciation, with glacial meltwater and glacial loading as potential contributing factors to sinkhole development. A probabilistic karst model for the state of Kentucky, USA, has been generated using the Weights of Evidence method. This model is presented as an example of the predictive capabilities of such data-driven modelling techniques and to show how similar models could be applied to karst in Ontario. The model was able to classify 70% of the validation dataset correctly while minimizing false positive identifications; this is moderately successful and could stand to be improved. Finally, suggestions for improving the current karst model of southern Ontario are offered, with the goal of increasing investigation into karst in Ontario and streamlining the reporting system for sinkholes, caves, and other karst features so as to improve the current Ontario karst database.
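A short sketch of the standard Weights of Evidence calculation used in models like the Kentucky example: for a binary evidence layer it returns W+, W-, and the contrast C = W+ - W-. The helper name and array-based layout are illustrative, and the sketch assumes every evidence/target combination occurs at least once.

```python
import numpy as np

def weights_of_evidence(evidence, target):
    """Weights of Evidence for one binary predictive layer.
    evidence, target: boolean arrays over map cells (evidence present?,
    karst feature present?).  Returns W+, W-, and the contrast C = W+ - W-."""
    e = np.asarray(evidence, dtype=bool)
    t = np.asarray(target, dtype=bool)

    def w(mask):
        # ln[ P(pattern | feature present) / P(pattern | feature absent) ]
        p_given_t = (mask & t).sum() / t.sum()
        p_given_not_t = (mask & ~t).sum() / (~t).sum()
        return np.log(p_given_t / p_given_not_t)

    w_plus, w_minus = w(e), w(~e)
    return w_plus, w_minus, w_plus - w_minus

# Usage with toy data: cells near a mapped fault (evidence) vs. known sinkholes (target).
rng = np.random.default_rng(0)
near_fault = rng.random(10_000) < 0.2
sinkhole = rng.random(10_000) < (0.05 + 0.10 * near_fault)
print(weights_of_evidence(near_fault, sinkhole))
```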