928 results for Robust Probabilistic Model, Dyslexic Users, Rewriting, Question-Answering


Relevance:

100.00%

Publisher:

Abstract:

Recommender systems are important for helping users select relevant, personalised information from the massive amounts of data available. We propose a unified framework called the Preference Network (PN) that jointly models various types of domain knowledge for the task of recommendation. The PN is a probabilistic model that systematically combines content-based filtering and collaborative filtering into a single conditional Markov random field. Once estimated, it serves as a probabilistic database that supports various useful queries such as rating prediction and top-N recommendation. To handle the challenging problem of learning large networks of users and items, we employ a simple but effective pseudo-likelihood with regularisation. Experiments on movie rating data demonstrate the merits of the PN.
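As a rough illustration of this estimation strategy, the sketch below fits a heavily simplified binary preference MRF by regularised pseudo-likelihood: each item is treated as a logistic regression on the other items, with symmetric weight tying. The model, function names and hyperparameters are ours for illustration and are not the paper's actual Preference Network.

```python
import numpy as np

def fit_pseudo_likelihood(X, lam=0.1, lr=0.05, epochs=200):
    """X: (n_users, n_items) matrix of +/-1 preferences (hypothetical toy data).
    Maximises the regularised pseudo-likelihood sum_j log P(x_j | x_-j),
    with P(x_j = 1 | x_-j) = sigmoid(b_j + sum_k W_jk x_k) and W symmetric."""
    n, d = X.shape
    W, b = np.zeros((d, d)), np.zeros(d)
    for _ in range(epochs):
        field = X @ W + b                       # local field for every item
        g = -X / (1.0 + np.exp(X * field))      # d(-log sigmoid(x*field))/d(field)
        grad_W = (X.T @ g + g.T @ X) / (2 * n) + lam * W
        np.fill_diagonal(grad_W, 0.0)           # no self-coupling
        W -= lr * grad_W
        b -= lr * g.mean(axis=0)
    return W, b

rng = np.random.default_rng(0)
W, b = fit_pseudo_likelihood(rng.choice([-1.0, 1.0], size=(100, 8)))
```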

Relevance:

100.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

100.00%

Publisher:

Abstract:

There are strong uncertainties regarding LAI dynamics in forest ecosystems in response to climate change. While empirical growth and yield models (G&YMs) provide good estimates of tree growth at the stand level on yearly to decennial scales, process-based models (PBMs) use LAI dynamics as a key variable for accurately predicting tree growth over short time scales. Bridging the gap between PBMs and G&YMs could improve the prediction of forest growth and, therefore, of carbon, water and nutrient fluxes, by combining modeling approaches at the stand level. Our study aimed to estimate monthly changes of leaf area in response to climate variations from sparse measurements of foliage area and biomass. A leaf population probabilistic model (SLCD) was designed to simulate foliage renewal. The leaf population was distributed in monthly cohorts, and the total population size was limited depending on forest age and productivity. Foliage dynamics were driven by a foliation function and by the probabilities ruling leaf aging or fall, whose formulation depends on the forest environment. The model was applied to three tree species growing under contrasting climates and soil types. In tropical Brazilian evergreen broadleaf eucalypt plantations, the phenology was described using 8 parameters. A multi-objective evolutionary algorithm (MOEA) was used to fit the model parameters to litterfall and LAI data over an entire stand rotation. Field measurements from a second eucalypt stand were used to validate the model. Seasonal LAI changes were rendered accurately for both sites (R² = 0.898 adjustment, R² = 0.698 validation). Litterfall production was simulated correctly (R² = 0.562 adjustment, R² = 0.4018 validation) and may be improved by using additional validation data in future work. In two French temperate deciduous forests (beech and oak), we adapted phenological sub-modules of the CASTANEA model to simulate canopy dynamics, and SLCD was validated using LAI measurements. The phenological patterns were simulated with good accuracy in both cases studied. However, LAImax was not simulated accurately in the beech forest, and further improvement is required. Our probabilistic approach is expected to contribute to improving predictions of LAI dynamics. The model formalism is general and suitable for broadleaf forests across a large range of ecological conditions.
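A minimal sketch of the cohort bookkeeping such a model implies is given below: leaves enter in monthly cohorts, fall with an age-dependent probability, and the total population is capped. The function names, the trimming rule and all parameters are illustrative assumptions, not the published SLCD equations.

```python
import numpy as np

def step_month(cohorts, new_leaves, p_fall, max_population):
    """cohorts: leaf counts by age in months (index 0 = youngest).
    p_fall(age): probability that a leaf of that age falls this month."""
    survived = [n * (1.0 - p_fall(age)) for age, n in enumerate(cohorts)]
    litterfall = sum(cohorts) - sum(survived)
    cohorts = [new_leaves] + survived            # every cohort ages one month
    excess = sum(cohorts) - max_population       # cap the total population
    for age in range(len(cohorts) - 1, -1, -1):  # trim oldest leaves first
        if excess <= 0:
            break
        drop = min(excess, cohorts[age])
        cohorts[age] -= drop
        litterfall += drop
        excess -= drop
    return cohorts, litterfall

cohorts, litter = [0.0] * 12, []
for month in range(24):                          # toy seasonal foliation signal
    new = 100.0 * (1.0 + 0.5 * np.sin(2 * np.pi * month / 12))
    cohorts, fall = step_month(cohorts, new, lambda a: min(0.02 * a, 1.0), 1500.0)
    litter.append(fall)
```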

Relevance:

100.00%

Publisher:

Abstract:

The determination of skeletal loading conditions in vivo, and of their relationship to the health of bone tissues, remains an open question. Computational modeling of the musculoskeletal system is the only practicable method for analysing muscle and joint loading, although crucial shortcomings limit the translation of computational methods into orthopedic and neurological practice. Growing attention has focused on subject-specific modeling, particularly when pathological musculoskeletal conditions need to be studied. Nevertheless, subject-specific data cannot always be collected in research and clinical practice, and there is a lack of efficient methods and frameworks for building models and incorporating them in simulations of motion. The overall aim of this PhD thesis was to improve state-of-the-art musculoskeletal modeling for the prediction of physiological muscle and joint loads during motion. A threefold goal was articulated as follows: (i) develop state-of-the-art subject-specific models and analyze skeletal load predictions; (ii) analyze the sensitivity of model predictions to relevant musculotendon model parameters and kinematic uncertainties; (iii) design an efficient software framework simplifying the effort-intensive pre-processing phases of subject-specific modeling. The first goal underlined the relevance of subject-specific musculoskeletal modeling for determining physiological skeletal loads during gait, corroborating the choice of fully subject-specific modeling for the analysis of pathological conditions. The second goal characterized the sensitivity of skeletal load predictions to major musculotendon parameters and kinematic uncertainties, applying robust probabilistic methods for methodological and clinical purposes. The last goal produced an efficient software framework for subject-specific modeling and simulation that is practical, user friendly and effort effective. Future research aims at implementing more accurate models of lower-limb joint mechanics and musculotendon paths, and at assessing, through probabilistic modeling, the overall scenario of the crucial model parameters affecting skeletal load predictions.
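As an illustration of the kind of probabilistic sensitivity analysis described in the second goal, the sketch below perturbs a few musculotendon parameters within assumed distributions, re-evaluates a toy joint-load surrogate, and ranks the parameters by their correlation with the output. The surrogate model, parameter names and distributions are ours, not the thesis's models.

```python
import numpy as np

def joint_load(f_max, tendon_slack, pennation):
    # toy surrogate for a musculoskeletal simulation output (hypothetical)
    return f_max * np.cos(pennation) / tendon_slack

rng = np.random.default_rng(0)
n = 5000
params = {                                       # assumed parameter distributions
    "f_max": rng.normal(1000.0, 100.0, n),       # maximum isometric force [N]
    "tendon_slack": rng.normal(0.25, 0.02, n),   # tendon slack length [m]
    "pennation": rng.normal(0.2, 0.05, n),       # pennation angle [rad]
}
loads = joint_load(**params)
for name, values in params.items():              # rank by output sensitivity
    r = np.corrcoef(values, loads)[0, 1]
    print(f"{name}: correlation with predicted load = {r:+.2f}")
```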

Relevance:

100.00%

Publisher:

Abstract:

This Letter addresses image segmentation via a generative model approach. A Bayesian network (BN) in the space of dyadic wavelet transform coefficients is introduced to model texture images. The model is similar to a hidden Markov model (HMM), but with non-stationary transition conditional probability distributions. It is composed of discrete hidden variables and observable Gaussian outputs for the wavelet coefficients. In particular, the Gabor wavelet transform is considered. The introduced model is compared with the simplest joint Gaussian probabilistic model for Gabor wavelet coefficients on several textures from the Brodatz album [1]. The comparison is based on cross-validation and uses ensembles of probabilistic models instead of single models. In addition, the robustness of the models to additive Gaussian noise is investigated. We further study the feasibility of the introduced generative model for image segmentation in the novelty detection framework [2]. Two examples are considered: (i) sea surface pollution detection from intensity images and (ii) segmentation of still images with varying illumination across the scene.
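For concreteness, here is a minimal sketch of the Gaussian baseline referred to above: fit a joint Gaussian to the Gabor coefficient vectors of a texture and score held-out patches by log-likelihood, as in a cross-validation comparison. The filter responses are assumed to be precomputed; names and the covariance regularisation are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(coeffs):
    """coeffs: (n_patches, n_filters) Gabor responses for one texture."""
    mu = coeffs.mean(axis=0)
    cov = np.cov(coeffs, rowvar=False) + 1e-6 * np.eye(coeffs.shape[1])
    return mu, cov

def heldout_loglik(train, test):
    mu, cov = fit_gaussian(train)
    return multivariate_normal(mu, cov).logpdf(test).mean()

rng = np.random.default_rng(0)
responses = rng.standard_normal((500, 16))       # placeholder filter responses
print(heldout_loglik(responses[:400], responses[400:]))
```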

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and that each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on the social roles corresponding to the most common activities on social networks. Our model is instantiated on microblogs (Twitter) and community question-answering (cQA) sites (Yahoo! Answers), where the social roles on Twitter are "originators" and "propagators", and the roles in cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of the proposed method, we conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers, where an author's research expertise area is treated as a social role. A novel application that detects users' research interests through topical keyword labeling, based on the results of our multi-role model, is presented. The evaluation results show the feasibility and effectiveness of our model.
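The generative story can be pictured with a short sketch: a post's author draws one of her social roles, the role supplies a distribution over latent topics, and each word is drawn from the sampled topic's word distribution. The Dirichlet priors, sizes and names below are illustrative assumptions; the paper's regularization factors are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_roles, n_topics, vocab = 2, 5, 100             # e.g. asker / answerer in cQA
role_topic = rng.dirichlet(np.ones(n_topics), size=n_roles)
topic_word = rng.dirichlet(np.ones(vocab), size=n_topics)

def generate_post(user_role_probs, n_words=10):
    role = rng.choice(n_roles, p=user_role_probs)  # the user plays one role
    topics = rng.choice(n_topics, size=n_words, p=role_topic[role])
    return role, [rng.choice(vocab, p=topic_word[t]) for t in topics]

print(generate_post(np.array([0.7, 0.3])))
```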

Relevance:

100.00%

Publisher:

Abstract:

We propose three research problems to explore the relations between trust and security in the setting of distributed computation. In the first problem, we study trust-based adversary detection in distributed consensus computation. The adversaries we consider behave arbitrarily, disobeying the consensus protocol. We propose a trust-based consensus algorithm with local and global trust evaluations. The algorithm can be abstracted as a two-layer structure, with the top layer running a trust-based consensus algorithm and the bottom layer executing, as a subroutine, a global trust update scheme. We utilize a set of pre-trusted nodes, the headers, to propagate local trust opinions throughout the network. This two-layer framework is flexible in that it can easily be extended to accommodate more complicated decision rules and global trust schemes. The first problem assumes that normal nodes are homogeneous, i.e. it is guaranteed that a normal node always behaves as it is programmed. In the second and third problems, however, we assume that nodes are heterogeneous, i.e., given a task, the probability that a node generates a correct answer varies from node to node. The adversaries considered in these two problems are workers from the open crowd who either invest little effort in the tasks assigned to them or intentionally give wrong answers to questions. In the second part of the thesis, we consider a typical crowdsourcing task that aggregates input from multiple workers as a problem in information fusion. To cope with noisy and sometimes malicious input from workers, trust is used to model workers' expertise. In a multi-domain knowledge-learning task, however, scalar-valued trust is not sufficient to reflect a worker's trustworthiness in each of the domains. To address this issue, we propose a probabilistic model that jointly infers the multi-dimensional trust of workers, the multi-domain properties of questions, and the true labels of questions. Our model is flexible and extensible to incorporate metadata associated with questions. To show this, we further propose two extended models, one of which handles tasks with real-valued features while the other handles tasks with text features by incorporating topic models. Our models can effectively recover the trust vectors of workers, which can be very useful for future task assignment adaptive to workers' trust. These results can be applied to the fusion of information from multiple data sources such as sensors, human input, machine-learning results, or a hybrid of them. In the second subproblem, we address crowdsourcing with adversaries under logical constraints. We observe that in real-life applications questions are often not independent; instead, there are logical relations between them. Similarly, the workers who provide answers are not independent of each other either: answers given by workers with similar attributes tend to be correlated. We therefore propose a novel unified graphical model consisting of two layers. The top layer encodes domain knowledge, allowing users to express logical relations using first-order logic rules, and the bottom layer encodes a traditional crowdsourcing graphical model. Our model can be seen as a generalized probabilistic soft logic framework that encodes both logical relations and probabilistic dependencies. To solve the collective inference problem efficiently, we devised a scalable joint inference algorithm based on the alternating direction method of multipliers.
The third part of the thesis considers the problem of optimal task assignment under budget constraints when workers are unreliable and sometimes malicious. In a real crowdsourcing market, each answer obtained from a worker incurs a cost, associated both with the level of trustworthiness of the workers and with the difficulty of the tasks: access to expert-level (more trustworthy) workers is typically more expensive than access to the average crowd, and completing a challenging task costs more than a click-away question. We address the problem of optimally assigning heterogeneous tasks to workers of varying trust levels under budget constraints. Specifically, we design a trust-aware task allocation algorithm that takes as inputs the estimated trust of the workers and a pre-set budget, and outputs the optimal assignment of tasks to workers. We derive a bound on the total error probability that relates naturally to the budget, the trustworthiness of the crowd, and the costs of obtaining labels: a higher budget, more trustworthy crowds, and less costly jobs all yield a lower theoretical bound. Our allocation scheme does not depend on the specific design of the trust evaluation component, so it can be combined with generic trust evaluation algorithms.
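A much-simplified sketch of the joint inference idea from the second part is given below: an EM-style loop alternates between re-estimating true binary labels, weighting each worker by her per-domain trust, and re-estimating each worker's per-domain trust against the current labels. The structure, names and priors are illustrative, not the thesis's full probabilistic model.

```python
import numpy as np

def infer_trust(answers, domain, n_domains, iters=20):
    """answers: (n_workers, n_questions) in {0, 1}, NaN where unanswered.
    domain: (n_questions,) domain index of each question."""
    n_workers, n_questions = answers.shape
    trust = np.full((n_workers, n_domains), 0.7)   # prior: mildly reliable
    labels = np.nanmean(answers, axis=0) > 0.5     # initialise by majority vote
    for _ in range(iters):
        # E-step: re-estimate labels, weighting workers by per-domain trust
        logit = np.zeros(n_questions)
        for w in range(n_workers):
            t = np.clip(trust[w, domain], 1e-3, 1 - 1e-3)
            mask = ~np.isnan(answers[w])
            s = np.where(answers[w, mask] == 1, 1.0, -1.0)
            logit[mask] += s * np.log(t[mask] / (1 - t[mask]))
        labels = logit > 0
        # M-step: per-domain accuracy of each worker against current labels
        for w in range(n_workers):
            for d in range(n_domains):
                mask = (~np.isnan(answers[w])) & (domain == d)
                if mask.any():
                    trust[w, d] = (answers[w, mask] == labels[mask]).mean()
    return trust, labels
```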

Relevance:

100.00%

Publisher:

Abstract:

Stack Overflow is a highly successful community question answering (CQA) service for software developers, with more than three million users and more than ten thousand posts per day. The large volume of questions makes it difficult for users to find questions that they are interested in answering. In this paper, we propose a number of approaches to predict who will answer a new question, using the characteristics of the question (i.e. topic) and of the users (i.e. reputation), and the social network of Stack Overflow users (i.e. shared interest in a topic). Specifically, our approach identifies a group of users (candidates) who have the potential to answer a new question, using a feature-based prediction approach and a social-network-based prediction approach. We then develop predictive models of whether an identified candidate will answer a new question. This prediction helps motivate knowledge exchange in the community by routing relevant questions to potential answerers. The evaluation results demonstrate the effectiveness of our predictive models, achieving 44% precision, 59% recall, and 49% F-measure (averaged across all test sets). In addition, our candidate identification techniques can identify up to 12.8% of the users who actually answer questions (averaged across all test sets).
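For reference, the reported figures correspond to the usual set-based definitions, as in this small sketch comparing a predicted candidate list with the users who actually answered; the names and toy data are ours.

```python
def evaluate(predicted, actual):
    """Precision / recall / F-measure of a predicted answerer set."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f

print(evaluate({"alice", "bob", "carol"}, {"bob", "dave"}))  # toy example
```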

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a robust stochastic model for the incorporation of natural features within data fusion algorithms. The representation combines Isomap, a non-linear manifold learning algorithm, with Expectation Maximization, a statistical learning scheme. The representation is computed offline and results in a non-linear, non-Gaussian likelihood model relating visual observations such as color and texture to the underlying visual states. The likelihood model can be used online to instantiate, in real time, likelihoods corresponding to observed visual features. The likelihoods are expressed as a Gaussian Mixture Model so as to permit convenient integration within existing nonlinear filtering algorithms. The resulting compactness of the representation makes it especially suitable for decentralized sensor networks. Real visual data consisting of natural imagery acquired from an Unmanned Aerial Vehicle is used to demonstrate the versatility of the feature representation.
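A compact sketch of the offline/online split described above, using off-the-shelf implementations: embed precomputed colour/texture feature vectors with Isomap, fit a Gaussian mixture offline, and evaluate per-observation log-likelihoods online. The component counts, dimensions and use of scikit-learn are our assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.mixture import GaussianMixture

def build_likelihood_model(features, n_components=5, n_dims=3):
    embedding = Isomap(n_components=n_dims)      # offline non-linear manifold
    z = embedding.fit_transform(features)
    gmm = GaussianMixture(n_components=n_components).fit(z)
    return embedding, gmm

def online_likelihood(embedding, gmm, new_features):
    z = embedding.transform(new_features)        # fast online mapping
    return gmm.score_samples(z)                  # per-observation log-likelihood

rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 10))           # placeholder feature vectors
emb, gmm = build_likelihood_model(feats)
print(online_likelihood(emb, gmm, feats[:5]))
```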

Relevance:

100.00%

Publisher:

Abstract:

This paper develops and evaluates an enhanced corpus-based approach to semantic processing. Corpus-based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.
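One way to picture a tensor encoding model consistent with this description is sketched below: each word accumulates outer products of random environment vectors for its left and right neighbours, so word order is retained in the representation. The scheme, dimensionality and similarity measure are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64
env = {}                                          # random environment vectors

def vec(word):
    if word not in env:
        env[word] = rng.standard_normal(DIM) / np.sqrt(DIM)
    return env[word]

def build_memory(corpus):
    memory = {}
    for sent in corpus:
        for i, word in enumerate(sent):
            left = vec(sent[i - 1]) if i > 0 else np.zeros(DIM)
            right = vec(sent[i + 1]) if i < len(sent) - 1 else np.zeros(DIM)
            memory.setdefault(word, np.zeros((DIM, DIM)))
            memory[word] += np.outer(left, right)  # order-sensitive binding
    return memory

mem = build_memory([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(np.sum(mem["cat"] * mem["dog"]))             # structural similarity
```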

Relevance:

100.00%

Publisher:

Abstract:

Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages, and it is often considered a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, pages in different languages are rarely cross-linked, except for direct equivalent pages on the same subject. This poses serious difficulties for users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery across language domains. This study focuses specifically on Chinese / English link discovery (C/ELD), a special case of cross-lingual link discovery involving natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To assess the effectiveness of CLLD, a standard evaluation framework is also proposed; it includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation, so that the performance of CLLD approaches and systems can be quantified. This thesis contributes to research on natural language processing and cross-lingual information retrieval in CLLD as follows: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated to achieve high-precision English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia; this approach was examined, as part of the study, in experiments on the improved automatic generation of cross-lingual links. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research, which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD approaches. The evaluation methods and framework described in this thesis were used to quantify system performance in the NTCIR-9 Crosslink task, the first information retrieval track of its kind.
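A minimal sketch of the n-gram mutual information idea for segmentation, shown here for bigrams: compute the pointwise mutual information of adjacent characters from a reference corpus and place a word boundary wherever the association falls below a threshold. The threshold and toy corpus are illustrative; the thesis's method is more elaborate.

```python
import math
from collections import Counter

def segment(text, corpus, threshold=0.0):
    """Insert a boundary between adjacent characters with low bigram PMI."""
    chars = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    n = max(len(corpus) - 1, 1)
    def pmi(a, b):
        if bigrams[(a, b)] == 0:
            return float("-inf")
        p_ab = bigrams[(a, b)] / n               # PMI = log p(ab) / (p(a) p(b))
        return math.log(p_ab * len(corpus) ** 2 / (chars[a] * chars[b]))
    words, word = [], text[0]
    for a, b in zip(text, text[1:]):
        if pmi(a, b) < threshold:
            words.append(word)
            word = b
        else:
            word += b
    words.append(word)
    return words

print(segment("我们研究方法", "我们研究我们的研究方法"))  # toy corpus
```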

Relevance:

100.00%

Publisher:

Abstract:

1. In conservation decision-making, we operate within the confines of limited funding. Furthermore, we often assume particular relationships between management impact and our investment in management. The structure of these relationships, however, is rarely known with certainty: there is model uncertainty. We investigate how these two fundamentally limiting factors in conservation management, money and knowledge, affect optimal decision-making. 2. We use information-gap decision theory to find the strategies for maximizing the number of extant subpopulations of a threatened species that are most immune to failure due to model uncertainty, thus providing a robust framework for exploring optimal decision-making. 3. The performance of every strategy decreases as model uncertainty increases. 4. The strategy most robust to model uncertainty depends not only on what performance is perceived to be acceptable, but also on the available funding and the time horizon over which extinction is considered. 5. Synthesis and applications. We investigate the impact of model uncertainty on robust decision-making in conservation and how it is affected by available conservation funding. We show that subpopulation triage can be a natural consequence of robust decision-making. We highlight the need for managers to consider triage not as merely giving up, but as a tool for ensuring species persistence in light of the urgency of most conservation requirements, uncertainty, and the poor state of conservation funding. We illustrate this theory with a specific application to the allocation of funding to reduce poaching impact on the Sumatran tiger Panthera tigris sumatrae in Kerinci Seblat National Park.
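The info-gap calculation can be sketched in a few lines: the robustness of a strategy is the largest uncertainty horizon alpha at which its worst-case performance, over all models within alpha of the nominal model, still meets the required outcome. The linear benefit model and all numbers below are illustrative assumptions.

```python
import numpy as np

def robustness(benefit, nominal_slope, required, budget, alphas):
    """Largest horizon alpha whose worst-case benefit still meets 'required'."""
    best = 0.0
    for a in alphas:
        slopes = (nominal_slope * (1 - a), nominal_slope * (1 + a))
        if min(benefit(s, budget) for s in slopes) >= required:
            best = a
    return best

linear = lambda slope, budget: slope * budget     # toy impact-of-investment model
print(robustness(linear, nominal_slope=0.02, required=3.0, budget=200.0,
                 alphas=np.linspace(0.0, 1.0, 101)))   # -> 0.25
```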

Relevance:

100.00%

Publisher:

Abstract:

Decision-making for conservation is conducted within the margins of limited funding. Furthermore, to allocate these scarce resources we make assumptions about the relationship between management impact and expenditure, yet the structure of these relationships is rarely known with certainty. We present a summary of work investigating the impact of model uncertainty on robust decision-making in conservation and how it is affected by available conservation funding. We show that achieving robustness in conservation decisions can require a triage approach, and we emphasize the need for managers to consider triage not as surrendering but as rational decision-making that ensures species persistence in light of the urgency of the conservation problems, uncertainty, and the poor state of conservation funding. We illustrate this theory with a specific application to the allocation of funding to reduce poaching impact on the Sumatran tiger Panthera tigris sumatrae in Kerinci Seblat National Park, Indonesia. To conserve our environment, conservation managers must make decisions in the face of substantial uncertainty. Further, they must deal with the fact that limited budgets and temporal constraints have led to a lack of knowledge about the systems we are trying to preserve and about the benefits of the actions available to us (Balmford & Cowling 2006). Given this paucity of decision-informing data, there is a considerable need to assess the impact of uncertainty on the benefit of management options (Regan et al. 2005). Although models of management impact can improve decision-making (e.g. Tenhumberg et al. 2004), they typically rely on assumptions around which there is substantial uncertainty. Ignoring this 'model uncertainty' can lead to inferior decision-making (Regan et al. 2005) and, potentially, to the loss of the species we are trying to protect. Current methods used in ecology allow model uncertainty to be incorporated into the model selection process (Burnham & Anderson 2002; Link & Barker 2006), but do not enable decision-makers to assess how this uncertainty would change a decision. This is the basis of information-gap decision theory (info-gap): finding the strategies most robust to model uncertainty (Ben-Haim 2006). Info-gap has permitted conservation biology to make the leap from recognizing uncertainty to explicitly incorporating severe uncertainty into decision-making. In this paper we summarize McDonald-Madden et al. (2008a), who use an info-gap framework to address the impact of uncertainty in the functional representations of biological systems on conservation decision-making. Furthermore, we highlight the importance of two key elements limiting conservation decision-making, funding and knowledge, and how they interact to influence the best management strategy for a threatened species.

Relevance:

100.00%

Publisher:

Abstract:

This paper presents an unmanned aircraft system (UAS) that uses a probabilistic model for autonomous front-on environmental sensing or photography of a target. The system is based on low-cost, readily available sensors and is intended, in general, to improve the capabilities of dynamic waypoint-based navigation systems for a low-cost UAS operating in dynamic environments. The behavioural dynamics of target movement are accounted for in the design of a Kalman filter and a Markov model-based prediction algorithm. Geometrical concepts and the Haversine formula are applied to the maximum likelihood case in order to predict a future state of the target, thus delivering a new waypoint for autonomous navigation. Results of the application to aerial filming with a low-cost UAS are presented, achieving the desired goal of a maintained front-on perspective without significant constraint on the route or pace of target movement.
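A minimal sketch of the geometric ingredients named above: the Haversine great-circle distance, plus a constant-velocity extrapolation standing in for the paper's Kalman/Markov prediction stage. The motion model and function names are our simplifications.

```python
import math

R_EARTH_M = 6371000.0

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R_EARTH_M * math.asin(math.sqrt(a))

def predict_waypoint(lat, lon, lat_prev, lon_prev, dt, horizon):
    """Extrapolate the target's last step 'horizon' seconds ahead."""
    return (lat + (lat - lat_prev) / dt * horizon,
            lon + (lon - lon_prev) / dt * horizon)

wp = predict_waypoint(-27.47, 153.02, -27.48, 153.01, dt=1.0, horizon=5.0)
print(wp, haversine(-27.47, 153.02, *wp))
```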

Relevance:

100.00%

Publisher:

Abstract:

BACKGROUND: Monitoring studies have revealed high concentrations of pesticides in the drainage canals of paddy fields. As an assessment tool, it is important to be able to predict these concentrations under different management scenarios. A simulation model for predicting the pesticide concentration in a paddy block (PCPF-B) was evaluated and then used to assess the effect of water management practices on controlling pesticide runoff from paddy fields. RESULTS: The PCPF-B model achieved acceptable performance. The model was applied in a constrained probabilistic approach using the Monte Carlo technique to evaluate the best management practices for reducing runoff of pretilachlor into the canal. The probabilistic model predictions, using actual data on pesticide use and hydrological data in the canal, showed that the water holding period (WHP) and the excess water storage depth (EWSD) effectively reduced the loss and concentration of pretilachlor reaching the drainage canal from paddy fields. The WHP also reduced the timespan of pesticide exposure in the drainage canal. CONCLUSIONS: It is recommended that: (1) the WHP be applied for as long as possible, but for at least 7 days, depending on the pesticide and field conditions; and (2) an EWSD greater than 2 cm be maintained to store substantial rainfall in order to prevent paddy runoff, especially during the WHP.
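The constrained Monte Carlo screening can be sketched as follows: sample uncertain inputs, simulate a simple first-order dissipation of pesticide in the ponded water, and summarise the runoff mass for a given WHP. The toy dynamics and all ranges are illustrative assumptions standing in for the full PCPF-B model.

```python
import numpy as np

def runoff_mass(whp_days, n_sim=10000, seed=1):
    rng = np.random.default_rng(seed)
    applied = rng.uniform(200.0, 400.0, n_sim)    # g/ha applied (assumed range)
    half_life = rng.uniform(3.0, 10.0, n_sim)     # dissipation half-life, days
    remaining = applied * 0.5 ** (whp_days / half_life)
    release = rng.uniform(0.05, 0.3, n_sim)       # fraction drained after the WHP
    return remaining * release

for whp in (0, 3, 7):                             # longer WHP, lower 90th percentile
    print(whp, round(float(np.percentile(runoff_mass(whp), 90)), 1))
```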