858 resultados para Robust Probabilistic Model, Dyslexic Users, Rewriting, Question-Answering
Resumo:
Much of the real-world dataset, including textual data, can be represented using graph structures. The use of graphs to represent textual data has many advantages, mainly related to maintaining a more significant amount of information, such as the relationships between words and their types. In recent years, many neural network architectures have been proposed to deal with tasks on graphs. Many of them consider only node features, ignoring or not giving the proper relevance to relationships between them. However, in many node classification tasks, they play a fundamental role. This thesis aims to analyze the main GNNs, evaluate their advantages and disadvantages, propose an innovative solution considered as an extension of GAT, and apply them to a case study in the biomedical field. We propose the reference GNNs, implemented with methodologies later analyzed, and then applied to a question answering system in the biomedical field as a replacement for the pre-existing GNN. We attempt to obtain better results by using models that can accept as input both node and edge features. As shown later, our proposed models can beat the original solution and define the state-of-the-art for the task under analysis.
Resumo:
There has been an increased demand for characterizing user access patterns using web mining techniques since the informative knowledge extracted from web server log files can not only offer benefits for web site structure improvement but also for better understanding of user navigational behavior. In this paper, we present a web usage mining method, which utilize web user usage and page linkage information to capture user access pattern based on Probabilistic Latent Semantic Analysis (PLSA) model. A specific probabilistic model analysis algorithm, EM algorithm, is applied to the integrated usage data to infer the latent semantic factors as well as generate user session clusters for revealing user access patterns. Experiments have been conducted on real world data set to validate the effectiveness of the proposed approach. The results have shown that the presented method is capable of characterizing the latent semantic factors and generating user profile in terms of weighted page vectors, which may reflect the common access interest exhibited by users among same session cluster.
Resumo:
Online enquiry communities such as Question Answering (Q&A) websites allow people to seek answers to all kind of questions. With the growing popularity of such platforms, it is important for community managers to constantly monitor the performance of their communities. Although different metrics have been proposed for tracking the evolution of such communities, maturity, the process in which communities become more topic proficient over time, has been largely ignored despite its potential to help in identifying robust communities. In this paper, we interpret community maturity as the proportion of complex questions in a community at a given time. We use the Server Fault (SF) community, a Question Answering (Q&A) community of system administrators, as our case study and perform analysis on question complexity, the level of expertise required to answer a question. We show that question complexity depends on both the length of involvement and the level of contributions of the users who post questions within their community. We extract features relating to askers, answerers, questions and answers, and analyse which features are strongly correlated with question complexity. Although our findings highlight the difficulty of automatically identifying question complexity, we found that complexity is more influenced by both the topical focus and the length of community involvement of askers. Following the identification of question complexity, we define a measure of maturity and analyse the evolution of different topical communities. Our results show that different topical communities show different maturity patterns. Some communities show a high maturity at the beginning while others exhibit slow maturity rate. Copyright 2013 ACM.
Resumo:
A novel framework for probabilistic-based structural assessment of existing structures, which combines model identification and reliability assessment procedures, considering in an objective way different sources of uncertainty, is presented in this paper. A short description of structural assessment applications, provided in literature, is initially given. Then, the developed model identification procedure, supported in a robust optimization algorithm, is presented. Special attention is given to both experimental and numerical errors, to be considered in this algorithm convergence criterion. An updated numerical model is obtained from this process. The reliability assessment procedure, which considers a probabilistic model for the structure in analysis, is then introduced, incorporating the results of the model identification procedure. The developed model is then updated, as new data is acquired, through a Bayesian inference algorithm, explicitly addressing statistical uncertainty. Finally, the developed framework is validated with a set of reinforced concrete beams, which were loaded up to failure in laboratory.
Resumo:
We introduce a classification-based approach to finding occluding texture boundaries. The classifier is composed of a set of weak learners, which operate on image intensity discriminative features that are defined on small patches and are fast to compute. A database that is designed to simulate digitized occluding contours of textured objects in natural images is used to train the weak learners. The trained classifier score is then used to obtain a probabilistic model for the presence of texture transitions, which can readily be used for line search texture boundary detection in the direction normal to an initial boundary estimate. This method is fast and therefore suitable for real-time and interactive applications. It works as a robust estimator, which requires a ribbon-like search region and can handle complex texture structures without requiring a large number of observations. We demonstrate results both in the context of interactive 2D delineation and of fast 3D tracking and compare its performance with other existing methods for line search boundary detection.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
There are strong uncertainties regarding LAI dynamics in forest ecosystems in response to climate change. While empirical growth & yield models (G&YMs) provide good estimations of tree growth at the stand level on a yearly to decennial scale, process-based models (PBMs) use LAI dynamics as a key variable for enabling the accurate prediction of tree growth over short time scales. Bridging the gap between PBMs and G&YMs could improve the prediction of forest growth and, therefore, carbon, water and nutrient fluxes by combining modeling approaches at the stand level.Our study aimed to estimate monthly changes of leaf area in response to climate variations from sparse measurements of foliage area and biomass. A leaf population probabilistic model (SLCD) was designed to simulate foliage renewal. The leaf population was distributed in monthly cohorts, and the total population size was limited depending on forest age and productivity. Foliage dynamics were driven by a foliation function and the probabilities ruling leaf aging or fall. Their formulation depends on the forest environment.The model was applied to three tree species growing under contrasting climates and soil types. In tropical Brazilian evergreen broadleaf eucalypt plantations, the phenology was described using 8 parameters. A multi-objective evolutionary algorithm method (MOEA) was used to fit the model parameters on litterfall and LAI data over an entire stand rotation. Field measurements from a second eucalypt stand were used to validate the model. Seasonal LAI changes were accurately rendered for both sites (R-2 = 0.898 adjustment, R-2 = 0.698 validation). Litterfall production was correctly simulated (R-2 = 0.562, R-2 = 0.4018 validation) and may be improved by using additional validation data in future work. In two French temperate deciduous forests (beech and oak), we adapted phenological sub-modules of the CASTANEA model to simulate canopy dynamics, and SLCD was validated using LAI measurements. The phenological patterns were simulated with good accuracy in the two cases studied. However, IA/max was not accurately simulated in the beech forest, and further improvement is required.Our probabilistic approach is expected to contribute to improving predictions of LAI dynamics. The model formalism is general and suitable to broadleaf forests for a large range of ecological conditions. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
The determination of skeletal loading conditions in vivo and their relationship to the health of bone tissues, remain an open question. Computational modeling of the musculoskeletal system is the only practicable method providing a valuable approach to muscle and joint loading analyses, although crucial shortcomings limit the translation process of computational methods into the orthopedic and neurological practice. A growing attention focused on subject-specific modeling, particularly when pathological musculoskeletal conditions need to be studied. Nevertheless, subject-specific data cannot be always collected in the research and clinical practice, and there is a lack of efficient methods and frameworks for building models and incorporating them in simulations of motion. The overall aim of the present PhD thesis was to introduce improvements to the state-of-the-art musculoskeletal modeling for the prediction of physiological muscle and joint loads during motion. A threefold goal was articulated as follows: (i) develop state-of-the art subject-specific models and analyze skeletal load predictions; (ii) analyze the sensitivity of model predictions to relevant musculotendon model parameters and kinematic uncertainties; (iii) design an efficient software framework simplifying the effort-intensive phases of subject-specific modeling pre-processing. The first goal underlined the relevance of subject-specific musculoskeletal modeling to determine physiological skeletal loads during gait, corroborating the choice of full subject-specific modeling for the analyses of pathological conditions. The second goal characterized the sensitivity of skeletal load predictions to major musculotendon parameters and kinematic uncertainties, and robust probabilistic methods were applied for methodological and clinical purposes. The last goal created an efficient software framework for subject-specific modeling and simulation, which is practical, user friendly and effort effective. Future research development aims at the implementation of more accurate models describing lower-limb joint mechanics and musculotendon paths, and the assessment of an overall scenario of the crucial model parameters affecting the skeletal load predictions through probabilistic modeling.
Resumo:
This Letter addresses image segmentation via a generative model approach. A Bayesian network (BNT) in the space of dyadic wavelet transform coefficients is introduced to model texture images. The model is similar to a Hidden Markov model (HMM), but with non-stationary transitive conditional probability distributions. It is composed of discrete hidden variables and observable Gaussian outputs for wavelet coefficients. In particular, the Gabor wavelet transform is considered. The introduced model is compared with the simplest joint Gaussian probabilistic model for Gabor wavelet coefficients for several textures from the Brodatz album [1]. The comparison is based on cross-validation and includes probabilistic model ensembles instead of single models. In addition, the robustness of the models to cope with additive Gaussian noise is investigated. We further study the feasibility of the introduced generative model for image segmentation in the novelty detection framework [2]. Two examples are considered: (i) sea surface pollution detection from intensity images and (ii) image segmentation of the still images with varying illumination across the scene.
Resumo:
In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.
Resumo:
We propose three research problems to explore the relations between trust and security in the setting of distributed computation. In the first problem, we study trust-based adversary detection in distributed consensus computation. The adversaries we consider behave arbitrarily disobeying the consensus protocol. We propose a trust-based consensus algorithm with local and global trust evaluations. The algorithm can be abstracted using a two-layer structure with the top layer running a trust-based consensus algorithm and the bottom layer as a subroutine executing a global trust update scheme. We utilize a set of pre-trusted nodes, headers, to propagate local trust opinions throughout the network. This two-layer framework is flexible in that it can be easily extensible to contain more complicated decision rules, and global trust schemes. The first problem assumes that normal nodes are homogeneous, i.e. it is guaranteed that a normal node always behaves as it is programmed. In the second and third problems however, we assume that nodes are heterogeneous, i.e, given a task, the probability that a node generates a correct answer varies from node to node. The adversaries considered in these two problems are workers from the open crowd who are either investing little efforts in the tasks assigned to them or intentionally give wrong answers to questions. In the second part of the thesis, we consider a typical crowdsourcing task that aggregates input from multiple workers as a problem in information fusion. To cope with the issue of noisy and sometimes malicious input from workers, trust is used to model workers' expertise. In a multi-domain knowledge learning task, however, using scalar-valued trust to model a worker's performance is not sufficient to reflect the worker's trustworthiness in each of the domains. To address this issue, we propose a probabilistic model to jointly infer multi-dimensional trust of workers, multi-domain properties of questions, and true labels of questions. Our model is very flexible and extensible to incorporate metadata associated with questions. To show that, we further propose two extended models, one of which handles input tasks with real-valued features and the other handles tasks with text features by incorporating topic models. Our models can effectively recover trust vectors of workers, which can be very useful in task assignment adaptive to workers' trust in the future. These results can be applied for fusion of information from multiple data sources like sensors, human input, machine learning results, or a hybrid of them. In the second subproblem, we address crowdsourcing with adversaries under logical constraints. We observe that questions are often not independent in real life applications. Instead, there are logical relations between them. Similarly, workers that provide answers are not independent of each other either. Answers given by workers with similar attributes tend to be correlated. Therefore, we propose a novel unified graphical model consisting of two layers. The top layer encodes domain knowledge which allows users to express logical relations using first-order logic rules and the bottom layer encodes a traditional crowdsourcing graphical model. Our model can be seen as a generalized probabilistic soft logic framework that encodes both logical relations and probabilistic dependencies. To solve the collective inference problem efficiently, we have devised a scalable joint inference algorithm based on the alternating direction method of multipliers. The third part of the thesis considers the problem of optimal assignment under budget constraints when workers are unreliable and sometimes malicious. In a real crowdsourcing market, each answer obtained from a worker incurs cost. The cost is associated with both the level of trustworthiness of workers and the difficulty of tasks. Typically, access to expert-level (more trustworthy) workers is more expensive than to average crowd and completion of a challenging task is more costly than a click-away question. In this problem, we address the problem of optimal assignment of heterogeneous tasks to workers of varying trust levels with budget constraints. Specifically, we design a trust-aware task allocation algorithm that takes as inputs the estimated trust of workers and pre-set budget, and outputs the optimal assignment of tasks to workers. We derive the bound of total error probability that relates to budget, trustworthiness of crowds, and costs of obtaining labels from crowds naturally. Higher budget, more trustworthy crowds, and less costly jobs result in a lower theoretical bound. Our allocation scheme does not depend on the specific design of the trust evaluation component. Therefore, it can be combined with generic trust evaluation algorithms.
Resumo:
Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over-or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data is from different cell lines of glioblastoma tumors assayed with Agilent microarrays.
Resumo:
Multiple sampling is widely used in vadose zone percolation experiments to investigate the extent in which soil structure heterogeneities influence the spatial and temporal distributions of water and solutes. In this note, a simple, robust, mathematical model, based on the beta-statistical distribution, is proposed as a method of quantifying the magnitude of heterogeneity in such experiments. The model relies on fitting two parameters, alpha and zeta to the cumulative elution curves generated in multiple-sample percolation experiments. The model does not require knowledge of the soil structure. A homogeneous or uniform distribution of a solute and/or soil-water is indicated by alpha = zeta = 1, Using these parameters, a heterogeneity index (HI) is defined as root 3 times the ratio of the standard deviation and mean. Uniform or homogeneous flow of water or solutes is indicated by HI = 1 and heterogeneity is indicated by HI > 1. A large value for this index may indicate preferential flow. The heterogeneity index relies only on knowledge of the elution curves generated from multiple sample percolation experiments and is, therefore, easily calculated. The index may also be used to describe and compare the differences in solute and soil-water percolation from different experiments. The use of this index is discussed for several different leaching experiments. (C) 1999 Elsevier Science B.V. All rights reserved.
Resumo:
An important consideration in the development of mathematical models for dynamic simulation, is the identification of the appropriate mathematical structure. By building models with an efficient structure which is devoid of redundancy, it is possible to create simple, accurate and functional models. This leads not only to efficient simulation, but to a deeper understanding of the important dynamic relationships within the process. In this paper, a method is proposed for systematic model development for startup and shutdown simulation which is based on the identification of the essential process structure. The key tool in this analysis is the method of nonlinear perturbations for structural identification and model reduction. Starting from a detailed mathematical process description both singular and regular structural perturbations are detected. These techniques are then used to give insight into the system structure and where appropriate to eliminate superfluous model equations or reduce them to other forms. This process retains the ability to interpret the reduced order model in terms of the physico-chemical phenomena. Using this model reduction technique it is possible to attribute observable dynamics to particular unit operations within the process. This relationship then highlights the unit operations which must be accurately modelled in order to develop a robust plant model. The technique generates detailed insight into the dynamic structure of the models providing a basis for system re-design and dynamic analysis. The technique is illustrated on the modelling for an evaporator startup. Copyright (C) 1996 Elsevier Science Ltd
Resumo:
Dissertação de natureza científica realizada para obtenção do grau de Mestre em Engenharia Informática e de Computadores