844 results for Heterogeneous information network
Abstract:
The ability to analyze, quantify and forecast epidemic outbreaks is fundamental when devising effective disease containment strategies. Policy makers are faced with the intricate task of drafting realistically implementable policies that strike a balance between risk management and cost. Two major techniques policy makers have at their disposal are epidemic modeling and contact tracing. Models are used to forecast the evolution of the epidemic both globally and regionally, while contact tracing is used to reconstruct the chain of people who have been potentially infected, so that they can be tested, isolated and treated immediately. However, both techniques might provide limited information, especially during an already advanced crisis when the need for action is urgent. In this paper we propose an alternative approach that goes beyond epidemic modeling and contact tracing, and leverages behavioral data generated by mobile carrier networks to evaluate contagion risk on a per-user basis. The individual risk represents the loss incurred by not isolating or treating a specific person, both in terms of how likely it is for this person to spread the disease and how many secondary infections they will cause. To this end, we develop a model, named Progmosis, which quantifies this risk based on movement and regional aggregated statistics about infection rates. We develop and release an open-source tool that calculates this risk based on cellular network events. We simulate a realistic epidemic scenario, based on an Ebola virus outbreak; we find that gradually restricting the mobility of a subset of individuals reduces the number of infected people after 30 days by 24%.
Abstract:
GitHub is the most popular repository for open source code (Finley 2011). It has more than 3.5 million users, as the company declared in April 2013, and more than 10 million repositories, as of December 2013. It has a publicly accessible API and, since March 2012, it also publishes a stream of all the events occurring on public projects. Interactions among GitHub users are of a complex nature and take place in different forms. Developers create and fork repositories, push code, approve code pushed by others, bookmark their favorite projects and follow other developers to keep track of their activities. In this paper we present a characterization of GitHub, as both a social network and a collaborative platform. To the best of our knowledge, this is the first quantitative study about the interactions happening on GitHub. We analyze the logs from the service over 18 months (between March 11, 2012 and September 11, 2013), describing 183.54 million events and we obtain information about 2.19 million users and 5.68 million repositories, both growing linearly in time. We show that the distributions of the number of contributors per project, watchers per project and followers per user show a power-law-like shape. We analyze social ties and repository-mediated collaboration patterns, and we observe a remarkably low level of reciprocity of the social connections. We also measure the activity of each user in terms of authored events and we observe that very active users do not necessarily have a large number of followers. Finally, we provide a geographic characterization of the centers of activity and we investigate how distance influences collaboration.
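As a rough illustration of the reciprocity measurement mentioned in the abstract above, the following Python sketch computes the fraction of directed "follow" edges that are returned. The function name `reciprocity`, the edge-list format and the toy user names are assumptions made for this example; the abstract does not state how the authors computed the measure.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists."""
    # Deduplicate edges and drop self-loops, which cannot be reciprocated.
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Hypothetical toy data: (follower, followed) pairs, not taken from the study.
follows = [("ana", "bo"), ("bo", "ana"), ("cy", "ana"), ("dee", "ana")]
print(reciprocity(follows))
```

A value near 0 means that few follow relations are returned, which is the pattern the abstract describes as a remarkably low level of reciprocity.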
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2015
Abstract:
The aim of this paper is to explore the management of information in an aerospace manufacturer's supply chain by analysing supply chain disruption risks. The social network perspective is used to examine the flows of information in the supply chain. These information flows are also explored in terms of push and pull information management. The supply chain risk management (SCRM) strategy is to assess how information is managed so that companies can gather the information they need to mitigate risk before any disruption to the supply chain occurs. There is a shortage of models for analysing the supply chain risk associated with information flows, possibly due to the omission of appropriate modelling techniques in this area (Tang and Nurmaya, 2011). This paper uses an exploratory case study consisting of a multi-method qualitative approach using fifteen interviews and four focus groups.
Abstract:
The focus of this thesis is the extension of topographic visualisation mappings to allow for the incorporation of uncertainty. Few visualisation algorithms in the literature are capable of mapping uncertain data, and fewer still are able to represent observation uncertainties in visualisations. As such, modifications are made to NeuroScale, Locally Linear Embedding, Isomap and Laplacian Eigenmaps to incorporate uncertainty in the observation and visualisation spaces. The proposed mappings are called Normally-distributed NeuroScale (N-NS), T-distributed NeuroScale (T-NS), Probabilistic LLE (PLLE), Probabilistic Isomap (PIso) and Probabilistic Weighted Neighbourhood Mapping (PWNM). These algorithms generate a probabilistic visualisation space with each latent visualised point transformed to a multivariate Gaussian or T-distribution, using a feed-forward RBF network. Two types of uncertainty are then characterised, dependent on the data and the mapping procedure. Data-dependent uncertainty is the inherent observation uncertainty, whereas mapping uncertainty is defined by the Fisher Information of a visualised distribution. This indicates how well the data has been interpolated, offering a level of ‘surprise’ for each observation. These new probabilistic mappings are tested on three datasets of vectorial observations and three datasets of real-world time series observations for anomaly detection. In order to visualise the time series data, a method for analysing observed signals and noise distributions, Residual Modelling, is introduced. The performance of the new algorithms on the tested datasets is compared qualitatively with the latent space generated by the Gaussian Process Latent Variable Model (GPLVM). A quantitative comparison using existing evaluation measures from the literature allows the performance of each mapping function to be compared. Finally, the mapping uncertainty measure is combined with NeuroScale to build a deep learning classifier, the Cascading RBF. This new structure is tested on the MNIST dataset, achieving world record performance whilst avoiding the flaws seen in other Deep Learning Machines.
Abstract:
This work looks into video quality assessment applied to the field of telecare and proposes an alternative metric to the more traditionally used PSNR based on the requirements of such an application. We show that the Pause Intensity metric introduced in [1] is also relevant and applicable to heterogeneous networks with a wireless last hop connected to a wired TCP backbone. We demonstrate through our emulation testbed that the impairments experienced in such a network architecture are dominated by continuity based impairments rather than artifacts, such as motion drift or blockiness. We also look into the implication of using Pause Intensity as a metric in terms of the overall video latency, which is potentially problematic should the video be sent and acted upon in real-time. We conclude that Pause Intensity may be used alongside the video characteristics which have been suggested as a measure of the overall video quality. © 2012 IEEE.
Abstract:
As one of the most popular deep learning models, the convolutional neural network (CNN) has achieved huge success in image information extraction. Traditionally, a CNN is trained by a supervised learning method with labeled data and used as a classifier by adding a classification layer at the end. Its capability of extracting image features is largely limited by the difficulty of setting up a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a so-called convolutional sparse auto-encoder (CSAE) algorithm to pre-train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm can be used to train the CNN with unlabeled artificial images, which enables easy expansion of training data and unsupervised learning. The CSAE algorithm is especially designed for extracting complex features from specific objects such as Chinese characters. After the features of artificial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first CNN convolutional layer, and then the CNN model is fine-tuned on scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and is evaluated with a multilingual image dataset, which labels Chinese, English and numeral texts separately. A detection precision gain of more than 10% is observed over two CNN models.
Abstract:
Heterogeneous datasets arise naturally in most applications due to the use of a variety of sensors and measuring platforms. Such datasets can be heterogeneous in terms of the error characteristics and sensor models. Treating such data is most naturally accomplished using a Bayesian or model-based geostatistical approach; however, such methods generally scale rather badly with the size of dataset, and require computationally expensive Monte Carlo based inference. Recently within the machine learning and spatial statistics communities many papers have explored the potential of reduced rank representations of the covariance matrix, often referred to as projected or fixed rank approaches. In such methods the covariance function of the posterior process is represented by a reduced rank approximation which is chosen such that there is minimal information loss. In this paper a sequential Bayesian framework for inference in such projected processes is presented. The observations are considered one at a time which avoids the need for high dimensional integrals typically required in a Bayesian approach. A C++ library, gptk, which is part of the INTAMAP web service, is introduced which implements projected, sequential estimation and adds several novel features. In particular the library includes the ability to use a generic observation operator, or sensor model, to permit data fusion. It is also possible to cope with a range of observation error characteristics, including non-Gaussian observation errors. Inference for the covariance parameters is explored, including the impact of the projected process approximation on likelihood profiles. We illustrate the projected sequential method in application to synthetic and real datasets. Limitations and extensions are discussed. © 2010 Elsevier Ltd.
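The idea of processing observations one at a time, each with its own sensor model and error level, can be sketched in Python for the simplest possible case. This is a scalar linear-Gaussian stand-in, not the projected process method and not the gptk API: the function names, the `(y, gain, noise_var)` tuple format and the sample readings are all assumptions made for illustration.

```python
def update(mean, var, y, gain, noise_var):
    """Fold one observation y = gain * x + noise into the Gaussian belief on x."""
    innovation_var = gain * var * gain + noise_var  # predictive variance of y
    k = var * gain / innovation_var                 # scalar Kalman-style gain
    new_mean = mean + k * (y - gain * mean)
    new_var = (1.0 - k * gain) * var
    return new_mean, new_var


def sequential_posterior(prior_mean, prior_var, observations):
    """Process (y, gain, noise_var) tuples one at a time."""
    mean, var = prior_mean, prior_var
    for y, gain, noise_var in observations:
        mean, var = update(mean, var, y, gain, noise_var)
    return mean, var


# Hypothetical readings from three sensors with different gains and noise levels.
readings = [(2.1, 1.0, 0.5), (4.3, 2.0, 1.0), (1.8, 1.0, 0.1)]
print(sequential_posterior(0.0, 10.0, readings))
```

Each update involves only scalar arithmetic, which mirrors the point made in the abstract that considering observations one at a time avoids high-dimensional integrals. In this Gaussian case the result matches the batch posterior regardless of the order of the readings; the non-Gaussian error models mentioned in the abstract would need an approximate update instead.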
Abstract:
In recent years, the boundaries between e-commerce and social networking have become increasingly blurred. Many e-commerce websites support the mechanism of social login, where users can sign in to the websites using their social network identities such as their Facebook or Twitter accounts. Users can also post their newly purchased products on microblogs with links to the e-commerce product web pages. In this paper, we propose a novel solution for cross-site cold-start product recommendation, which aims to recommend products from e-commerce websites to users at social networking sites in 'cold-start' situations, a problem which has rarely been explored before. A major challenge is how to leverage knowledge extracted from social networking sites for cross-site cold-start product recommendation. We propose to use the linked users across social networking sites and e-commerce websites (users who have social networking accounts and have made purchases on e-commerce websites) as a bridge to map users' social networking features to another feature representation for product recommendation. Specifically, we propose learning both users' and products' feature representations (called user embeddings and product embeddings, respectively) from data collected from e-commerce websites using recurrent neural networks and then applying a modified gradient boosting trees method to transform users' social networking features into user embeddings. We then develop a feature-based matrix factorization approach which can leverage the learnt user embeddings for cold-start product recommendation. Experimental results on a large dataset constructed from the largest Chinese microblogging service Sina Weibo and the largest Chinese B2C e-commerce website JingDong have shown the effectiveness of our proposed framework.
Abstract:
We present in this article an automated framework that extracts product adopter information from online reviews and incorporates the extracted information into feature-based matrix factorization for more effective product recommendation. Specifically, we propose a bootstrapping approach for the extraction of product adopters from review text and categorize them into a number of different demographic categories. The aggregated demographic information of many product adopters can be used to characterize both products and users in the form of distributions over different demographic categories. We further propose a graph-based method to iteratively update user- and product-related distributions more reliably in a heterogeneous user-product graph and incorporate them as features into the matrix factorization approach for product recommendation. Our experimental results on a large dataset crawled from JINGDONG, the largest B2C e-commerce website in China, show that our proposed framework outperforms a number of competitive baselines for product recommendation.
Abstract:
PHAR-QA, funded by the European Commission, is producing a framework of competences for pharmacy practice. The framework is in line with the EU directive on sectoral professions and takes into account the diversity of the pharmacy profession and the on-going changes in healthcare systems (with an increasingly important role for pharmacists), and in the pharmaceutical industry. PHAR-QA is asking academia, students and practicing pharmacists to rank competences required for practice. The results show that competences in the areas of drug interactions, need for drug treatment and provision of information and service were ranked highest whereas those in the areas of ability to design and conduct research and development and production of medicines were ranked lower. For the latter two categories, industrial pharmacists ranked them higher than did the other five groups.
Abstract:
The authors were pleased to take part in this research project initiated by one of Hungary’s largest employers. The goal of the project was to work out BI solutions to improve the company's business processes. Within the project, the authors first surveyed current trends in HR data mining. They reviewed the available tools for predicting employee promotion and investigated how results from social network analysis can be used in the field of enterprise security. It is always fortunate, and rather fruitful, when real business problems and resources meet the mainstream research of the scientific community. The authors are certain that the results published in this document will be beneficial for Foxconn in the near future. Of course, the work is not done. New research perspectives are continually opening up, and a huge amount of information is accumulating in enterprises, waiting to be discovered and analysed. The environment in which an enterprise operates is also changing dynamically, so the company faces new challenges and new types of business problems arise. The authors hope that their research experience will also help decision makers solve real-world business problems in the future.
Abstract:
With the widespread use of the Internet, the need for the rapid development of digital communication networks has prompted government policy makers to conceptualize and implement development policy. The advancement of the (information) society and the use of information and communication technology as a prerequisite for it are fundamentally determined by the development of broadband infrastructure and by whether broadband access to the digital telecommunication network is available. The propensity of the government to play a bigger role in the field of electronic communication has increased significantly since 2011. The setup of MVM NET Zrt. (Hungarian Electricity NET Ltd.), the realignment of NISZ Zrt. (National Infocommunication Services Company Limited by Shares, NISZ Ltd.), the GOP 3.1.2 tender and the plan to enable a new, i.e. the fourth, mobile network operator to enter the market all indicate the robust intention of the government to develop this field. The study shows the tools of government intervention available to incentivise the development of the electronic communication network. The author then analyses the four main government interventions to examine whether the decision on the role of the state was adequately well-founded.
Abstract:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, a single, generally accepted methodology providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources has not emerged in academia or industry. This thesis describes a heterogeneous database system being developed at the High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity, that is, in identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating the extents of the meta-data constructs of component schemas, which is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process, which acts as the interface between the integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
Abstract:
Query processing is a commonly performed procedure and a vital and integral part of information processing. It is therefore important and necessary for information processing applications to continuously improve the accessibility of data sources as well as the ability to perform queries on those data sources. It is well known that the relational database model and the Structured Query Language (SQL) are currently the most popular tools to implement and query databases. However, a certain level of expertise is needed to use SQL and to access relational databases. This study presents a semantic modeling approach that enables the average user to access and query existing relational databases without concern for the database's structure or technicalities. This method includes an algorithm to represent relational database schemas in a more semantically rich way. The result is a semantic view of the relational database. The user performs queries using an adapted version of SQL, namely Semantic SQL. This method substantially reduces the size and complexity of queries. Additionally, it shortens the database application development cycle and improves maintenance and reliability by reducing the size of application programs. Furthermore, a Semantic Wrapper tool illustrating the semantic wrapping method is presented. I further extend the use of this semantic wrapping method to heterogeneous database management. Relational and object-oriented databases and Internet data sources are considered to be part of the heterogeneous database environment. Semantic schemas resulting from the algorithm presented in the method were employed to describe the structure of these data sources in a uniform way. Semantic SQL was utilized to query various data sources. As a result, this method provides users with the ability to access and perform queries on heterogeneous database systems in a more natural way.