942 resultados para Data Driven Modeling


Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this thesis work we develop a new generative model of social networks belonging to the family of Time Varying Networks. The importance of correctly modelling the mechanisms shaping the growth of a network and the dynamics of the edges activation and inactivation are of central importance in network science. Indeed, by means of generative models that mimic the real-world dynamics of contacts in social networks it is possible to forecast the outcome of an epidemic process, optimize the immunization campaign or optimally spread an information among individuals. This task can now be tackled taking advantage of the recent availability of large-scale, high-quality and time-resolved datasets. This wealth of digital data has allowed to deepen our understanding of the structure and properties of many real-world networks. Moreover, the empirical evidence of a temporal dimension in networks prompted the switch of paradigm from a static representation of graphs to a time varying one. In this work we exploit the Activity-Driven paradigm (a modeling tool belonging to the family of Time-Varying-Networks) to develop a general dynamical model that encodes fundamental mechanism shaping the social networks' topology and its temporal structure: social capital allocation and burstiness. The former accounts for the fact that individuals does not randomly invest their time and social interactions but they rather allocate it toward already known nodes of the network. The latter accounts for the heavy-tailed distributions of the inter-event time in social networks. We then empirically measure the properties of these two mechanisms from seven real-world datasets and develop a data-driven model, analytically solving it. We then check the results against numerical simulations and test our predictions with real-world datasets, finding a good agreement between the two. Moreover, we find and characterize a non-trivial interplay between burstiness and social capital allocation in the parameters phase space. Finally, we present a novel approach to the development of a complete generative model of Time-Varying-Networks. This model is inspired by the Kaufman's adjacent possible theory and is based on a generalized version of the Polya's urn. Remarkably, most of the complex and heterogeneous feature of real-world social networks are naturally reproduced by this dynamical model, together with many high-order topological properties (clustering coefficient, community structure etc.).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

As torrents of new data now emerge from microbial genomics, bioinformatic prediction of immunogenic epitopes remains challenging but vital. In silico methods often produce paradoxically inconsistent results: good prediction rates on certain test sets but not others. The inherent complexity of immune presentation and recognition processes complicates epitope prediction. Two encouraging developments – data driven artificial intelligence sequence-based methods for epitope prediction and molecular modeling methods based on three-dimensional protein structures – offer hope for the future.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The paper introduces a method for dependencies discovery during human-machine interaction. It is based on an analysis of numerical data sets in knowledge-poor environments. The driven procedures are independent and they interact on a competitive principle. The research focuses on seven of them. The application is in Number Theory.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The performance of a supply chain depends critically on the coordinating actions and decisions undertaken by the trading partners. The sharing of product and process information plays a central role in the coordination and is a key driver for the success of the supply chain. In this paper we propose the concept of "Linked pedigrees" - linked datasets, that enable the sharing of traceability information of products as they move along the supply chain. We present a distributed and decentralised, linked data driven architecture that consumes real time supply chain linked data to generate linked pedigrees. We then present a communication protocol to enable the exchange of linked pedigrees among trading partners. We exemplify the utility of linked pedigrees by illustrating examples from the perishable goods logistics supply chain.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In non-linear random effects some attention has been very recently devoted to the analysis ofsuitable transformation of the response variables separately (Taylor 1996) or not (Oberg and Davidian 2000) from the transformations of the covariates and, as far as we know, no investigation has been carried out on the choice of link function in such models. In our study we consider the use of a random effect model when a parameterized family of links (Aranda-Ordaz 1981, Prentice 1996, Pregibon 1980, Stukel 1988 and Czado 1997) is introduced. We point out the advantages and the drawbacks associated with the choice of this data-driven kind of modeling. Difficulties in the interpretation of regression parameters, and therefore in understanding the influence of covariates, as well as problems related to loss of efficiency of estimates and overfitting, are discussed. A case study on radiotherapy usage in breast cancer treatment is discussed.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

La tesi presenta uno studio della libreria grafica per web D3, sviluppata in javascript, e ne presenta una catalogazione dei grafici implementati e reperibili sul web. Lo scopo è quello di valutare la libreria e studiarne i pregi e difetti per capire se sia opportuno utilizzarla nell'ambito di un progetto Europeo. Per fare questo vengono studiati i metodi di classificazione dei grafici presenti in letteratura e viene esposto e descritto lo stato dell'arte del data visualization. Viene poi descritto il metodo di classificazione proposto dal team di progettazione e catalogata la galleria di grafici presente sul sito della libreria D3. Infine viene presentato e studiato in maniera formale un algoritmo per selezionare un grafico in base alle esigenze dell'utente.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

One of the global phenomena with threats to environmental health and safety is artisanal mining. There are ambiguities in the manner in which an ore-processing facility operates which hinders the mining capacity of these miners in Ghana. These problems are reviewed on the basis of current socio-economic, health and safety, environmental, and use of rudimentary technologies which limits fair-trade deals to miners. This research sought to use an established data-driven, geographic information (GIS)-based system employing the spatial analysis approach for locating a centralized processing facility within the Wassa Amenfi-Prestea Mining Area (WAPMA) in the Western region of Ghana. A spatial analysis technique that utilizes ModelBuilder within the ArcGIS geoprocessing environment through suitability modeling will systematically and simultaneously analyze a geographical dataset of selected criteria. The spatial overlay analysis methodology and the multi-criteria decision analysis approach were selected to identify the most preferred locations to site a processing facility. For an optimal site selection, seven major criteria including proximity to settlements, water resources, artisanal mining sites, roads, railways, tectonic zones, and slopes were considered to establish a suitable location for a processing facility. Site characterizations and environmental considerations, incorporating identified constraints such as proximity to large scale mines, forest reserves and state lands to site an appropriate position were selected. The analysis was limited to criteria that were selected and relevant to the area under investigation. Saaty’s analytical hierarchy process was utilized to derive relative importance weights of the criteria and then a weighted linear combination technique was applied to combine the factors for determination of the degree of potential site suitability. The final map output indicates estimated potential sites identified for the establishment of a facility centre. The results obtained provide intuitive areas suitable for consideration

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Americans are accustomed to a wide range of data collection in their lives: census, polls, surveys, user registrations, and disclosure forms. When logging onto the Internet, users’ actions are being tracked everywhere: clicking, typing, tapping, swiping, searching, and placing orders. All of this data is stored to create data-driven profiles of each user. Social network sites, furthermore, set the voluntarily sharing of personal data as the default mode of engagement. But people’s time and energy devoted to creating this massive amount of data, on paper and online, are taken for granted. Few people would consider their time and energy spent on data production as labor. Even if some people do acknowledge their labor for data, they believe it is accessory to the activities at hand. In the face of pervasive data collection and the rising time spent on screens, why do people keep ignoring their labor for data? How has labor for data been become invisible, as something that is disregarded by many users? What does invisible labor for data imply for everyday cultural practices in the United States? Invisible Labor for Data addresses these questions. I argue that three intertwined forces contribute to framing data production as being void of labor: data production institutions throughout history, the Internet’s technological infrastructure (especially with the implementation of algorithms), and the multiplication of virtual spaces. There is a common tendency in the framework of human interactions with computers to deprive data and bodies of their materiality. My Introduction and Chapter 1 offer theoretical interventions by reinstating embodied materiality and redefining labor for data as an ongoing process. The middle Chapters present case studies explaining how labor for data is pushed to the margin of the narratives about data production. I focus on a nationwide debate in the 1960s on whether the U.S. should build a databank, contemporary Big Data practices in the data broker and the Internet industries, and the group of people who are hired to produce data for other people’s avatars in the virtual games. I conclude with a discussion on how the new development of crowdsourcing projects may usher in the new chapter in exploiting invisible and discounted labor for data.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Computers employing some degree of data flow organisation are now well established as providing a possible vehicle for concurrent computation. Although data-driven computation frees the architecture from the constraints of the single program counter, processor and global memory, inherent in the classic von Neumann computer, there can still be problems with the unconstrained generation of fresh result tokens if a pure data flow approach is adopted. The advantages of allowing serial processing for those parts of a program which are inherently serial, and of permitting a demand-driven, as well as data-driven, mode of operation are identified and described. The MUSE machine described here is a structured architecture supporting both serial and parallel processing which allows the abstract structure of a program to be mapped onto the machine in a logical way.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Human relationships have long been studied by scientists from domains like sociology, psychology, literature, etc. for understanding people's desires, goals, actions and expected behaviors. In this dissertation we study inter-personal relationships as expressed in natural language text. Modeling inter-personal relationships from text finds application in general natural language understanding, as well as real-world domains such as social networks, discussion forums, intelligent virtual agents, etc. We propose that the study of relationships should incorporate not only linguistic cues in text, but also the contexts in which these cues appear. Our investigations, backed by empirical evaluation, support this thesis, and demonstrate that the task benefits from using structured models that incorporate both types of information. We present such structured models to address the task of modeling the nature of relationships between any two given characters from a narrative. To begin with, we assume that relationships are of two types: cooperative and non-cooperative. We first describe an approach to jointly infer relationships between all characters in the narrative, and demonstrate how the task of characterizing the relationship between two characters can benefit from including information about their relationships with other characters in the narrative. We next formulate the relationship-modeling problem as a sequence prediction task to acknowledge the evolving nature of human relationships, and demonstrate the need to model the history of a relationship in predicting its evolution. Thereafter, we present a data-driven method to automatically discover various types of relationships such as familial, romantic, hostile, etc. Like before, we address the task of modeling evolving relationships but don't restrict ourselves to two types of relationships. We also demonstrate the need to incorporate not only local historical but also global context while solving this problem. Lastly, we demonstrate a practical application of modeling inter-personal relationships in the domain of online educational discussion forums. Such forums offer opportunities for its users to interact and form deeper relationships. With this view, we address the task of identifying initiation of such deeper relationships between a student and the instructor. Specifically, we analyze contents of the forums to automatically suggest threads to the instructors that require their intervention. By highlighting scenarios that need direct instructor-student interactions, we alleviate the need for the instructor to manually peruse all threads of the forum and also assist students who have limited avenues for communicating with instructors. We do this by incorporating the discourse structure of the thread through latent variables that abstractly represent contents of individual posts and model the flow of information in the thread. Such latent structured models that incorporate the linguistic cues without losing their context can be helpful in other related natural language understanding tasks as well. We demonstrate this by using the model for a very different task: identifying if a stated desire has been fulfilled by the end of a story.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Collecting and analyzing consumer data is essential in today’s data-driven business environment. However, consumers are becoming more aware of the value of the information they can provide to companies, thereby being more reluctant to share it for free. Therefore, companies need to find ways to motivate consumers to disclose personal information. The main research question of the study was formed as “How can companies motivate consumers to disclose personal information?” and it was further divided into two subquestions: 1) What types of benefits motivate consumers to disclose personal information? 2) How does the disclosure context affect the consumers’ information disclosure behavior? The conceptual framework consisted of a classification of extrinsic and intrinsic benefits, and moderating factors, which were recognized on the basis of prior research in the field. The study was conducted by using qualitative research methods. The primary data was collected by interviewing ten representatives from eight companies. The data was analyzed and reported according to predetermined themes. The findings of the study confirm that consumers can be motivated to disclose personal information by offering different types of extrinsic (monetary saving, time saving, self-enhancement, and social adjustment) and intrinsic (novelty, pleasure, and altruism) benefits. However, not all the benefits are equally useful ways to convince the customer to disclose information. Moreover, different factors in the disclosure context can either alleviate or increase the effectiveness of the benefits and the consumers’ motivation to disclose personal information. Such factors include the consumer’s privacy concerns, perceived trust towards the company, the relevancy of the requested information, personalization, website elements (especially security, usability, and aesthetics of a website), and the consumer’s shopping motivation. This study has several contributions. It is essential that companies recognize the most attractive benefits regarding their business and their customers, and that they understand how the disclosure context affects the consumer’s information disclosure behavior. The likelihood of information disclosure can be increased, for example, by offering benefits that meet the consumers’ needs and preferences, improving the relevancy of the asked information, stating the reasons for data collection, creating and maintaining a trustworthy image of the company, and enhancing the quality of the company’s website.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline. In this thesis, we consider the evaluation pipeline to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each of the steps of the web search evaluation pipeline. As a result of these improvements, the overall efficiency of the entire evaluation pipeline is increased. Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represents the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine’s query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step to be rejected after the online evaluation step. As a result, this would allow us to achieve a higher efficiency of the entire evaluation pipeline. Secondly, we state the problem of the optimised scheduling of online experiments. We tackle this problem by considering a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments, and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler on the second step of the evaluation pipeline. Consequently, we argue that the efficiency of the evaluation pipeline can be increased. Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, i.e. in domains with a grid-based representation, such as image search. Our study using datasets of interleaving experiments performed both in document and image search domains demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that the interleaving experiments can be deployed for a shorter period of time or use a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments. Finally, we propose to apply the sequential testing methods to reduce the mean deployment time for the interleaving experiments. We adapt two sequential tests for the interleaving experimentation. We demonstrate that one can achieve a significant decrease in experiment duration by using such sequential testing methods. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. Our further experimental study demonstrates that cumulative gains in the online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, and the sequential testing approaches. Overall, the central contributions of this thesis are the proposed approaches to improve the accuracy or efficiency of the steps of the evaluation pipeline: the offline evaluation frameworks for the query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for the efficient online interleaving evaluation, and a sequential testing approach for the online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine. These experiments demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Part 14: Interoperability and Integration

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Exhibitium Project , awarded by the BBVA Foundation, is a data-driven project developed by an international consortium of research groups . One of its main objectives is to build a prototype that will serve as a base to produce a platform for the recording and exploitation of data about art-exhibitions available on the Internet . Therefore, our proposal aims to expose the methods, procedures and decision-making processes that have governed the technological implementation of this prototype, especially with regard to the reuse of WordPress (WP) as development framework.