932 results for Probabilistic latent semantic model
Abstract:
Demand for characterizing user access patterns with web mining techniques has increased, since the knowledge extracted from web server log files can both inform improvements to web site structure and improve understanding of user navigational behavior. In this paper, we present a web usage mining method that uses web usage and page linkage information to capture user access patterns based on the Probabilistic Latent Semantic Analysis (PLSA) model. The Expectation-Maximization (EM) algorithm is applied to the integrated usage data to infer the latent semantic factors and to generate user session clusters that reveal user access patterns. Experiments have been conducted on a real-world data set to validate the effectiveness of the proposed approach. The results show that the presented method is capable of characterizing the latent semantic factors and generating user profiles in terms of weighted page vectors, which may reflect the common access interests exhibited by users within the same session cluster.
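The EM step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `plsa_em`, the toy session-by-page count matrix, the random initialisation, and the fixed iteration count are all hypothetical, and page linkage information is not modelled.

```python
import math
import random


def normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def plsa_em(counts, n_factors, n_iter=50, seed=0):
    """Fit P(z|session) and P(page|z) by EM on a session-by-page count matrix.

    Returns (p_z_s, p_p_z, history); history[i] is the log-likelihood of the
    parameters in force at the start of iteration i.
    """
    rng = random.Random(seed)
    n_sessions, n_pages = len(counts), len(counts[0])
    # Hypothetical initialisation: strictly positive random distributions.
    p_z_s = [normalize([rng.random() + 0.01 for _ in range(n_factors)])
             for _ in range(n_sessions)]
    p_p_z = [normalize([rng.random() + 0.01 for _ in range(n_pages)])
             for _ in range(n_factors)]
    history = []
    for _ in range(n_iter):
        exp_z_s = [[0.0] * n_factors for _ in range(n_sessions)]
        exp_p_z = [[0.0] * n_pages for _ in range(n_factors)]
        log_lik = 0.0
        for s in range(n_sessions):
            for p in range(n_pages):
                n = counts[s][p]
                if n == 0:
                    continue
                # E-step: P(z|s,p) is proportional to P(z|s) * P(p|z).
                joint = [p_z_s[s][z] * p_p_z[z][p] for z in range(n_factors)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_factors):
                    expected = n * joint[z] / total
                    exp_z_s[s][z] += expected
                    exp_p_z[z][p] += expected
        history.append(log_lik)
        # M-step: renormalise the expected counts into distributions.
        p_z_s = [normalize(row) for row in exp_z_s]
        p_p_z = [normalize(row) for row in exp_p_z]
    return p_z_s, p_p_z, history


# Toy data (made up): 4 sessions x 4 pages of visit counts.
toy_counts = [[5, 4, 0, 0], [4, 6, 0, 0], [0, 0, 3, 5], [0, 0, 6, 4]]
p_z_s, p_p_z, history = plsa_em(toy_counts, n_factors=2)
# One simple way to form session clusters: assign each session to its
# most probable latent factor.
clusters = [max(range(2), key=lambda z: row[z]) for row in p_z_s]
```

Each row of `p_p_z` plays the role of a weighted page vector for one latent factor. EM only finds a local optimum, so results can depend on the seed.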
Abstract:
Web transaction data between Web visitors and Web functionalities usually convey users' task-oriented behavior patterns. Mining this type of click-stream data makes it possible to capture usage pattern information. Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to the user's preferred style. Traditional Web usage mining techniques, such as Web user session or Web page clustering, association rules, and frequent navigational path mining, can only discover usage patterns explicitly. They cannot, however, reveal the underlying navigational activities or identify the latent relationships associated with the patterns among Web users and Web pages. In this work, we propose a Web recommendation framework that incorporates a Web usage mining technique based on the Probabilistic Latent Semantic Analysis (PLSA) model. The main advantage of this method is that it not only discovers usage-based access patterns but also reveals the underlying latent factors. Using the discovered user access patterns, we then present users with content more likely to interest them via collaborative recommendation. To validate the effectiveness of the proposed approach, we conduct experiments on real-world datasets and compare it with some existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.
Abstract:
Because document images are widely used for many purposes and a large number of document image repositories are now available, demand for robust information retrieval mechanisms and systems has grown. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes the content of document images, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with those created among their respective document images. Using those same document images, we ran further experiments to compare the performance of LinkDI with and without the LSI technique. Experimental results showed that LSI can mitigate the effects of common OCR misrecognition, which reinforces the feasibility of using LinkDI to relate heavily degraded OCR output.
Abstract:
This article develops a latent class model for estimating willingness-to-pay for public goods that simultaneously uses contingent valuation (CV) data and attitudinal data capturing protest attitudes related to the lack of trust in the public institutions providing those goods. A measure of the social cost associated with protest responses and the consequent loss in potential contributions for providing the public good is proposed. The presence of potential justification biases is further considered, that is, the possibility that for psychological reasons the response to the CV question affects the answers to the attitudinal questions. The results from our empirical application suggest that psychological factors should not be ignored in CV estimation for policy purposes, so that protest responses can be correctly identified.
Abstract:
According to a recent Eurobarometer survey (2014), 68% of Europeans tend not to trust national governments. As the increasing alienation of citizens from politics endangers democracy and welfare, governments, practitioners and researchers look for innovative means to engage citizens in policy matters. One of the measures intended to overcome the so-called democratic deficit is the promotion of civic participation. Digital media proliferation offers a set of novel characteristics related to interactivity, ubiquitous connectivity, social networking and inclusiveness that enable new forms of society-wide collaboration with a potential impact on leveraging participative democracy. Following this trend, e-Participation is an emerging research area that consists of the use of Information and Communication Technologies to mediate and transform the relations among citizens and governments towards increasing citizens’ participation in public decision-making. However, despite the widespread efforts to implement e-Participation through research programs, new technologies and projects, exhaustive studies on the achieved outcomes reveal that it has not yet been successfully incorporated in institutional politics. Given the problems underlying e-Participation implementation, the present research suggested that, rather than project-oriented efforts, the cornerstone for successfully implementing e-Participation in public institutions as a sustainable added-value activity is systematic organisational planning, embodying the principles of open-governance and open-engagement. It further suggested that BPM, as a management discipline, can act as a catalyst to enable the desired transformations towards value creation throughout the policy-making cycle, including political, organisational and, ultimately, citizen value.
Following these findings, the primary objective of this research was to provide an instrumental model to foster e-Participation sustainability across Government and Public Administration towards a participatory, inclusive, collaborative and deliberative democracy. The developed artefact, consisting of an e-Participation Organisational Semantic Model (ePOSM) underpinned by a BPM-steered approach, introduces this vision. This approach to e-Participation was modelled through a semi-formal lightweight ontology stack structured in four sub-ontologies, namely e-Participation Strategy, Organisational Units, Functions and Roles. The ePOSM facilitates e-Participation sustainability by: (1) Promoting a common and cross-functional understanding of the concepts underlying e-Participation implementation and of their articulation that bridges the gap between technical and non-technical users; (2) Providing an organisational model which allows a centralised and consistent roll-out of strategy-driven e-Participation initiatives, supported by operational units dedicated to the execution of transformation projects and participatory processes; (3) Providing a standardised organisational structure, goals, functions and roles related to e-Participation processes that enhances process-level interoperability among government agencies; (4) Providing a representation usable in software development for business process automation, which allows advanced querying using a reasoner or inference engine to retrieve concrete and specific information about the e-Participation processes in place. An evaluation of the achieved outcomes, as well as a comparative analysis with existing models, suggested that this innovative approach tackling the organisational planning dimension can constitute a stepping stone to harness e-Participation value.
Abstract:
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual words representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
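The second stage described above, a k-nearest-neighbor classifier trained on per-image topic distribution vectors, might look like the following minimal sketch. The topic vectors and labels are made up for illustration (in the paper they would come from pLSA), and `knn_predict` is a hypothetical helper name; Euclidean distance is an assumption, since the abstract does not state the metric.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query topic vector by majority vote among its k nearest neighbours."""
    # Pair each training vector's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3-topic distributions P(z|image) for six training images.
train_vectors = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],   # coast
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],   # forest
    [0.1, 0.1, 0.8], [0.1, 0.2, 0.7],   # city
]
train_labels = ["coast", "coast", "forest", "forest", "city", "city"]

predicted = knn_predict(train_vectors, train_labels, [0.75, 0.15, 0.10], k=3)
```

Working on a handful of topic proportions instead of a full visual-word histogram is the dimensionality reduction the abstract refers to.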
Abstract:
In this paper we present the theoretical and methodological foundations for the development of a multi-agent Selective Dissemination of Information (SDI) service model that applies Semantic Web technologies for specialized digital libraries. These technologies make it possible to achieve more efficient information management, improve agent–user communication processes, and facilitate accurate access to relevant resources. Other tools used are fuzzy linguistic modelling techniques (which ease the interaction between users and system) and natural language processing (NLP) techniques for semiautomatic thesaurus generation. Also, RSS feeds are used as “current awareness bulletins” to generate personalized bibliographic alerts.
Abstract:
We present a new approach to model and classify breast parenchymal tissue. Given a mammogram, we first discover the distribution of the different tissue densities in an unsupervised manner, and then use this tissue distribution to perform the classification. We achieve this using a classifier based on local descriptors and probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature. We studied the influence of different descriptors, such as texture and SIFT features, at the classification stage, showing that textons outperform SIFT in all cases. Moreover, we demonstrate that pLSA automatically extracts meaningful latent aspects, generating a compact tissue representation based on their densities that is useful for discriminating between classes in mammogram classification. We show the results of tissue classification over the MIAS and DDSM datasets. We compare our method with approaches that classified these same datasets, and our proposal shows better performance.
Abstract:
The growth of databases that contain increasingly difficult images and a larger number of categories is forcing the development of image representation techniques that are discriminative when working with multiple classes, and of algorithms that are efficient in learning and classification. This thesis explores the problem of classifying images according to the object they contain when a large number of categories is available. First, we investigate how a hybrid system formed by a generative model and a discriminative model can benefit the task of image classification where the level of human annotation is minimal. For this task we introduce a new vocabulary using a dense representation of color-SIFT descriptors, and then investigate how the different parameters affect the final classification. Next, we propose a method for incorporating spatial information into the hybrid system, showing that context information is of great help for image classification. We then introduce a new shape descriptor that represents the image by its local shape and its spatial layout, together with a kernel that incorporates this spatial information in pyramid form. Shape is represented by a compact vector, yielding a descriptor well suited for use with kernel-based learning algorithms. The experiments show that this shape information gives results similar to (and sometimes better than) appearance-based descriptors. We also investigate how different features can be combined for image classification and show that the proposed shape descriptor together with an appearance descriptor substantially improves classification. Finally, we describe an algorithm that automatically detects regions of interest during training and classification.
This provides a method for suppressing the image background and adds invariance to the position of objects within images. We show that using shape and appearance over this region of interest, with random forest classifiers, improves classification and computational time. We compare our results with results from the literature using the same databases as the authors, as well as the same training and classification protocols. All the innovations introduced are shown to improve the final image classification.
Abstract:
This volume is a serious attempt to open up the subject of European philosophy of science to real thought, to provide the structural basis for the interdisciplinary development of its specialist fields, and also to provoke reflection on the idea of ‘European philosophy of science’. These efforts should foster a contemporary reflection on what might be meant by philosophy of science in Europe and European philosophy of science, and on how awareness of it could help philosophers interpret and motivate their research through a stronger collective identity. The overarching aim is to set the background for a collaborative project organising, systematising, and ultimately forging an identity for European philosophy of science by creating research structures and developing research networks across Europe to promote its development.
Abstract:
Building Information Modeling (BIM) is the process of structuring, capturing, creating, and managing a digital representation of physical and/or functional characteristics of a built space [1]. Current BIM has limited ability to represent dynamic semantics and social information, and often fails to consider building activity, behavior and context, which limits integration with intelligent built-environment management systems. Research such as the development of Semantic Exchange Modules and the linking of IFC with semantic web structures demonstrates the need for building models to better support complex semantic functionality. To implement model semantics effectively, however, it is critical that model designers consider semantic information constructs. This paper discusses semantic models in relation to determining the most suitable information structure. We demonstrate how semantic rigidity can lead to significant long-term problems that can contribute to model failure. A sufficiently detailed feasibility study is advised to maximize the value of the semantic model. In addition, we propose a set of questions to be used during a model’s feasibility study, and guidelines to help assess the most suitable method for managing semantics in a built environment.
Abstract:
An ability to quantify the reliability of probabilistic flood inundation predictions is a requirement not only for guiding model development but also for their successful application. Probabilistic flood inundation predictions are usually produced by choosing a method of weighting the model parameter space, but previous work suggests that this choice leads to clear differences in inundation probabilities. This study aims to address the evaluation of the reliability of these probabilistic predictions. However, the lack of an adequate number of observations of flood inundation for a catchment limits the application of conventional methods of evaluating predictive reliability. Consequently, attempts have been made to assess the reliability of probabilistic predictions using multiple observations from a single flood event. Here, a LISFLOOD-FP hydraulic model of an extreme (>1 in 1000 years) flood event in Cockermouth, UK, is constructed and calibrated using multiple performance measures from both peak flood wrack mark data and aerial photography captured post-peak. These measures are used in weighting the parameter space to produce multiple probabilistic predictions for the event. Two methods of assessing the reliability of these probabilistic predictions using limited observations are utilized: an existing method assessing the binary pattern of flooding, and a method developed in this paper to assess predictions of water surface elevation. This study finds that the water surface elevation method has both a better diagnostic and discriminatory ability, but this result is likely to be sensitive to the unknown uncertainties in the upstream boundary condition.
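To illustrate how weighting the parameter space can yield a probabilistic inundation map, the following minimal sketch weights each model run by a performance score and sums the weights of the runs that predict a cell as wet. The abstract does not specify the weighting scheme, so the normalised-score weights, the function name `inundation_probability`, and the toy ensemble are all assumptions.

```python
def inundation_probability(wet_maps, scores):
    """Per-cell probability of flooding from a weighted ensemble of model runs.

    wet_maps: one list per model run, with 1 for a wet cell and 0 for a dry cell.
    scores:   one non-negative performance score per run (higher is better).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one run needs a positive score")
    # Normalise scores so the weights sum to 1 across the ensemble.
    weights = [score / total for score in scores]
    n_cells = len(wet_maps[0])
    return [
        sum(weight * run[cell] for weight, run in zip(weights, wet_maps))
        for cell in range(n_cells)
    ]


# Hypothetical ensemble: three parameter sets, three grid cells.
toy_maps = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
toy_scores = [0.5, 0.3, 0.2]
probabilities = inundation_probability(toy_maps, toy_scores)
```

Swapping in a different performance measure changes the weights and therefore the probabilities, which is the sensitivity the abstract highlights.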