858 resultados para Robust Probabilistic Model, Dyslexic Users, Rewriting, Question-Answering
Resumo:
The purpose of this study is to develop statistical methodology to facilitate indirect estimation of the concentration of antiretroviral drugs and viral loads in the prostate gland and the seminal vesicle. The differences in antiretroviral drug concentrations in these organs may lead to suboptimal concentrations in one gland compared to the other. Suboptimal levels of the antiretroviral drugs will not be able to fully suppress the virus in that gland, lead to a source of sexually transmissible virus and increase the chance of selecting for drug resistant virus. This information may be useful selecting antiretroviral drug regimen that will achieve optimal concentrations in most of male genital tract glands. Using fractionally collected semen ejaculates, Lundquist (1949) measured levels of surrogate markers in each fraction that are uniquely produced by specific male accessory glands. To determine the original glandular concentrations of the surrogate markers, Lundquist solved a simultaneous series of linear equations. This method has several limitations. In particular, it does not yield a unique solution, it does not address measurement error, and it disregards inter-subject variability in the parameters. To cope with these limitations, we developed a mechanistic latent variable model based on the physiology of the male genital tract and surrogate markers. We employ a Bayesian approach and perform a sensitivity analysis with regard to the distributional assumptions on the random effects and priors. The model and Bayesian approach is validated on experimental data where the concentration of a drug should be (biologically) differentially distributed between the two glands. In this example, the Bayesian model-based conclusions are found to be robust to model specification and this hierarchical approach leads to more scientifically valid conclusions than the original methodology. In particular, unlike existing methods, the proposed model based approach was not affected by a common form of outliers.
Resumo:
This doctoral thesis focuses on the modeling of multimedia systems to create personalized recommendation services based on the analysis of users’ audiovisual consumption. Research is focused on the characterization of both users’ audiovisual consumption and content, specifically images and video. This double characterization converges into a hybrid recommendation algorithm, adapted to different application scenarios covering different specificities and constraints. Hybrid recommendation systems use both content and user information as input data, applying the knowledge from the analysis of these data as the initial step to feed the algorithms in order to generate personalized recommendations. Regarding the user information, this doctoral thesis focuses on the analysis of audiovisual consumption to infer implicitly acquired preferences. The inference process is based on a new probabilistic model proposed in the text. This model takes into account qualitative and quantitative consumption factors on the one hand, and external factors such as zapping factor or company factor on the other. As for content information, this research focuses on the modeling of descriptors and aesthetic characteristics, which influence the user and are thus useful for the recommendation system. Similarly, the automatic extraction of these descriptors from the audiovisual piece without excessive computational cost has been considered a priority, in order to ensure applicability to different real scenarios. Finally, a new content-based recommendation algorithm has been created from the previously acquired information, i.e. user preferences and content descriptors. This algorithm has been hybridized with a collaborative filtering algorithm obtained from the current state of the art, so as to compare the efficiency of this hybrid recommender with the individual techniques of recommendation (different hybridization techniques of the state of the art have been studied for suitability). The content-based recommendation focuses on the influence of the aesthetic characteristics on the users. The heterogeneity of the possible users of these kinds of systems calls for the use of different criteria and attributes to create effective recommendations. Therefore, the proposed algorithm is adaptable to different perceptions producing a dynamic representation of preferences to obtain personalized recommendations for each user of the system. The hypotheses of this doctoral thesis have been validated by conducting a set of tests with real users, or by querying a database containing user preferences - available to the scientific community. This thesis is structured based on the different research and validation methodologies of the techniques involved. In the three central chapters the state of the art is studied and the developed algorithms and models are validated via self-designed tests. It should be noted that some of these tests are incremental and confirm the validation of previously discussed techniques. Resumen Esta tesis doctoral se centra en el modelado de sistemas multimedia para la creación de servicios personalizados de recomendación a partir del análisis de la actividad de consumo audiovisual de los usuarios. La investigación se focaliza en la caracterización tanto del consumo audiovisual del usuario como de la naturaleza de los contenidos, concretamente imágenes y vídeos. Esta doble caracterización de usuarios y contenidos confluye en un algoritmo de recomendación híbrido que se adapta a distintos escenarios de aplicación, cada uno de ellos con distintas peculiaridades y restricciones. Todo sistema de recomendación híbrido toma como datos de partida tanto información del usuario como del contenido, y utiliza este conocimiento como entrada para algoritmos que permiten generar recomendaciones personalizadas. Por la parte de la información del usuario, la tesis se centra en el análisis del consumo audiovisual para inferir preferencias que, por lo tanto, se adquieren de manera implícita. Para ello, se ha propuesto un nuevo modelo probabilístico que tiene en cuenta factores de consumo tanto cuantitativos como cualitativos, así como otros factores de contorno, como el factor de zapping o el factor de compañía, que condicionan la incertidumbre de la inferencia. En cuanto a la información del contenido, la investigación se ha centrado en la definición de descriptores de carácter estético y morfológico que resultan influyentes en el usuario y que, por lo tanto, son útiles para la recomendación. Del mismo modo, se ha considerado una prioridad que estos descriptores se puedan extraer automáticamente de un contenido sin exigir grandes requisitos computacionales y, de tal forma que se garantice la posibilidad de aplicación a escenarios reales de diverso tipo. Por último, explotando la información de preferencias del usuario y de descripción de los contenidos ya obtenida, se ha creado un nuevo algoritmo de recomendación basado en contenido. Este algoritmo se cruza con un algoritmo de filtrado colaborativo de referencia en el estado del arte, de tal manera que se compara la eficiencia de este recomendador híbrido (donde se ha investigado la idoneidad de las diferentes técnicas de hibridación del estado del arte) con cada una de las técnicas individuales de recomendación. El algoritmo de recomendación basado en contenido que se ha creado se centra en las posibilidades de la influencia de factores estéticos en los usuarios, teniendo en cuenta que la heterogeneidad del conjunto de usuarios provoca que los criterios y atributos que condicionan las preferencias de cada individuo sean diferentes. Por lo tanto, el algoritmo se adapta a las diferentes percepciones y articula una metodología dinámica de representación de las preferencias que permite obtener recomendaciones personalizadas, únicas para cada usuario del sistema. Todas las hipótesis de la tesis han sido debidamente validadas mediante la realización de pruebas con usuarios reales o con bases de datos de preferencias de usuarios que están a disposición de la comunidad científica. La diferente metodología de investigación y validación de cada una de las técnicas abordadas condiciona la estructura de la tesis, de tal manera que los tres capítulos centrales se estructuran sobre su propio estudio del estado del arte y los algoritmos y modelos desarrollados se validan mediante pruebas autónomas, sin impedir que, en algún caso, las pruebas sean incrementales y ratifiquen la validación de técnicas expuestas anteriormente.
Resumo:
Thanks to their inherent properties, probabilistic graphical models are one of the prime candidates for machine learning and decision making tasks especially in uncertain domains. Their capabilities, like representation, inference and learning, if used effectively, can greatly help to build intelligent systems that are able to act accordingly in different problem domains. Evolutionary algorithms is one such discipline that has employed probabilistic graphical models to improve the search for optimal solutions in complex problems. This paper shows how probabilistic graphical models have been used in evolutionary algorithms to improve their performance in solving complex problems. Specifically, we give a survey of probabilistic model building-based evolutionary algorithms, called estimation of distribution algorithms, and compare different methods for probabilistic modeling in these algorithms.
Resumo:
We present a biomolecular probabilistic model driven by the action of a DNA toolbox made of a set of DNA templates and enzymes that is able to perform Bayesian inference. The model will take single-stranded DNA as input data, representing the presence or absence of a specific molecular signal (the evidence). The program logic uses different DNA templates and their relative concentration ratios to encode the prior probability of a disease and the conditional probability of a signal given the disease. When the input and program molecules interact, an enzyme-driven cascade of reactions (DNA polymerase extension, nicking and degradation) is triggered, producing a different pair of single-stranded DNA species. Once the system reaches equilibrium, the ratio between the output species will represent the application of Bayes? law: the conditional probability of the disease given the signal. In other words, a qualitative diagnosis plus a quantitative degree of belief in that diagno- sis. Thanks to the inherent amplification capability of this DNA toolbox, the resulting system will be able to to scale up (with longer cascades and thus more input signals) a Bayesian biosensor that we designed previously.
Resumo:
Los sistemas de búsqueda de respuestas (BR) se pueden considerar como potenciales sucesores de los buscadores tradicionales de información en la Web. Para que sean precisos deben adaptarse a dominios concretos mediante el uso de recursos semánticos adecuados. La adaptación no es una tarea trivial, ya que deben integrarse e incorporarse a sistemas de BR existentes varios recursos heterogéneos relacionados con un dominio restringido. Se presenta la herramienta Maraqa, cuya novedad radica en el uso de técnicas de ingeniería del software, como el desarrollo dirigido por modelos, para automatizar dicho proceso de adaptación a dominios restringidos. Se ha evaluado Maraqa mediante una serie de experimentos (sobre el dominio agrícola) que demuestran su viabilidad, mejorando en un 29,5% la precisión del sistema adaptado.
Resumo:
This reply to Gash’s (Found Sci 2013) commentary on Nescolarde-Selva and Usó-Doménech (Found Sci 2013) answers the three questions raised and at the same time opens up new questions.
Resumo:
Decision support systems (DSS) support business or organizational decision-making activities, which require the access to information that is internally stored in databases or data warehouses, and externally in the Web accessed by Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces to query these sources of information ease to constrain dynamically query formulation based on user selections, but they present a lack of flexibility in query formulation, since the expressivity power is reduced to the user interface design. Natural language interfaces (NLI) are expected as the optimal solution. However, especially for non-expert users, a real natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by means of referencing previous questions or their answers (i.e. anaphora such as the pronoun reference in “What traits are affected by them?”), or by eliding parts of the question (i.e. ellipsis such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, in order to overcome one of the main problems of NLIs about the difficulty to adapt an NLI to a new domain, our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high NL ambiguity of the resolution process, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information.
Resumo:
Evolutionary algorithms perform optimization using a population of sample solution points. An interesting development has been to view population-based optimization as the process of evolving an explicit, probabilistic model of the search space. This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and the objective function, represented as an unknown density of assumed form. This leads to an update rule that is related and compared with previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering framework. Experimental results are presented that demonstrate the dynamics of the new algorithm on a set of simple test problems.
Resumo:
Diversos estudos sobre a aceitação e o uso de tecnologias de informação têm sido realizados por diferentes modelos conceituais que buscam demonstrar a existência de fatores que influenciam a intenção e comportamento dos usuários sobre um determinado sistema de informação. Alguns aspectos que potencializam as intenções de uso podem ser fundamentais para que o indivíduo adote, ou não, uma determinada tecnologia. Este estudo analisou os fatores antecedentes que podem influenciar a intenção de uso da Biblioteca Virtual de uma Instituição de Ensino Superior (IES). Dentre os modelos conceituais que estudam a adoção de tecnologias de informação, optou-se nesta pesquisa pelo Modelo de Aceitação de Tecnologia (TAM) proposto por Davis (1986). Neste modelo ainda foram adicionadas as variáveis externas Estímulo Docente e Hábito a fim de ampliar o poder de explicação da intenção de uso dos usuários em questão. Através de uma abordagem de investigação quantitativa, os dados foram coletados por meio de um instrumento de pesquisa com obtenção de resposta de 406 questionários. Os resultados obtidos por esta pesquisa confirmam a influência de diversos fatores posicionados em diferentes dimensões e proporcionam conclusões relevantes à adoção da biblioteca virtual. Conclui-se que o Estímulo Docente influencia positivamente a Facilidade de Uso, Utilidade Percebida e o Habito dos alunos de graduação sobre a intenção de uso da Biblioteca Virtual. Com isso, constatou-se que a influência do professor torna-se um fator determinante na intenção de uso da biblioteca virtual, o que ressalta a importância da orientação e recomendação dos professores no uso desta ferramenta tecnológica. O estudo concluiu que a utilidade percebida revelou-se o fator que mais influencia a intenção de uso desta tecnologia, pois há a percepção de que, caso o aluno venha utilizar a biblioteca virtual, pode-se melhorar seu desempenho nos estudos, entre outros aspectos. O Hábito também pode influenciar à adoção da biblioteca virtual de forma paralela às percepções sobre os benefícios que podem ser obtidos com o uso desta tecnologia.
Resumo:
Despite extensive progress on the theoretical aspects of spectral efficient communication systems, hardware impairments, such as phase noise, are the key bottlenecks in next generation wireless communication systems. The presence of non-ideal oscillators at the transceiver introduces time varying phase noise and degrades the performance of the communication system. Significant research literature focuses on joint synchronization and decoding based on joint posterior distribution, which incorporate both the channel and code graph. These joint synchronization and decoding approaches operate on well designed sum-product algorithms, which involves calculating probabilistic messages iteratively passed between the channel statistical information and decoding information. Channel statistical information, generally entails a high computational complexity because its probabilistic model may involve continuous random variables. The detailed knowledge about the channel statistics for these algorithms make them an inadequate choice for real world applications due to power and computational limitations. In this thesis, novel phase estimation strategies are proposed, in which soft decision-directed iterative receivers for a separate A Posteriori Probability (APP)-based synchronization and decoding are proposed. These algorithms do not require any a priori statistical characterization of the phase noise process. The proposed approach relies on a Maximum A Posteriori (MAP)-based algorithm to perform phase noise estimation and does not depend on the considered modulation/coding scheme as it only exploits the APPs of the transmitted symbols. Different variants of APP-based phase estimation are considered. The proposed algorithm has significantly lower computational complexity with respect to joint synchronization/decoding approaches at the cost of slight performance degradation. With the aim to improve the robustness of the iterative receiver, we derive a new system model for an oversampled (more than one sample per symbol interval) phase noise channel. We extend the separate APP-based synchronization and decoding algorithm to a multi-sample receiver, which exploits the received information from the channel by exchanging the information in an iterative fashion to achieve robust convergence. Two algorithms based on sliding block-wise processing with soft ISI cancellation and detection are proposed, based on the use of reliable information from the channel decoder. Dually polarized systems provide a cost-and spatial-effective solution to increase spectral efficiency and are competitive candidates for next generation wireless communication systems. A novel soft decision-directed iterative receiver, for separate APP-based synchronization and decoding, is proposed. This algorithm relies on an Minimum Mean Square Error (MMSE)-based cancellation of the cross polarization interference (XPI) followed by phase estimation on the polarization of interest. This iterative receiver structure is motivated from Master/Slave Phase Estimation (M/S-PE), where M-PE corresponds to the polarization of interest. The operational principle of a M/S-PE block is to improve the phase tracking performance of both polarization branches: more precisely, the M-PE block tracks the co-polar phase and the S-PE block reduces the residual phase error on the cross-polar branch. Two variants of MMSE-based phase estimation are considered; BW and PLP.
Resumo:
It is well known that even slight changes in nonuniform illumination lead to a large image variability and are crucial for many visual tasks. This paper presents a new ICA related probabilistic model where the number of sources exceeds the number of sensors to perform an image segmentation and illumination removal, simultaneously. We model illumination and reflectance in log space by a generalized autoregressive process and Hidden Gaussian Markov random field, respectively. The model ability to deal with segmentation of illuminated images is compared with a Canny edge detector and homomorphic filtering. We apply the model to two problems: synthetic image segmentation and sea surface pollution detection from intensity images.
Resumo:
Online communities are prime sources of information. The Web is rich with forums and Question Answering (Q&A) communities where people go to seek answers to all kinds of questions. Most systems employ manual answer-rating procedures to encourage people to provide quality answers and to help users locate the best answers in a given thread. However, in the datasets we collected from three online communities, we found that half their threads lacked best answer markings. This stresses the need for methods to assess the quality of available answers to: 1) provide automated ratings to fill in for, or support, manually assigned ones, and; 2) to assist users when browsing such answers by filtering in potential best answers. In this paper, we collected data from three online communities and converted it to RDF based on the SIOC ontology. We then explored an approach for predicting best answers using a combination of content, user, and thread features. We show how the influence of such features on predicting best answers differs across communities. Further we demonstrate how certain features unique to some of our community systems can boost predictability of best answers.
Resumo:
This paper examines the problems in the definition of the General Non-Parametric Corporate Performance (GNCP) and introduces a multiplicative linear programming as an alternative model for corporate performance. We verified and tested a statistically significant difference between the two models based on the application of 27 UK industries using six performance ratios. Our new model is found to be a more robust performance model than the previous standard Data Envelopment Analysis (DEA) model.
Resumo:
PowerAqua is a Question Answering system, which takes as input a natural language query and is able to return answers drawn from relevant semantic resources found anywhere on the Semantic Web. In this paper we provide two novel contributions: First, we detail a new component of the system, the Triple Similarity Service, which is able to match queries effectively to triples found in different ontologies on the Semantic Web. Second, we provide a first evaluation of the system, which in addition to providing data about PowerAqua's competence, also gives us important insights into the issues related to using the Semantic Web as the target answer set in Question Answering. In particular, we show that, despite the problems related to the noisy and incomplete conceptualizations, which can be found on the Semantic Web, good results can already be obtained.
Resumo:
Mobile advertising is a rapidly growing sector providing brands and marketing agencies the opportunity to connect with consumers beyond traditional and digital media and instead communicate directly on their mobile phones. Mobile advertising will be intrinsically linked with mobile search, which has transported from the internet to the mobile and is identified as an area of potential growth. The result of mobile searching show that as a general rule such search result exceed 160 characters; the dialog is required to deliver the relevant portion of a response to the mobile user. In this paper we focus initially on mobile search and mobile advert creation, and later the mechanism of interaction between the user’s request, the result of searching, advertising and dialog.