946 resultados para user data


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cloud storage has rapidly become a cornerstone of many businesses and has moved from an early adopters stage to an early majority, where we typically see explosive deployments. As companies rush to join the cloud revolution, it has become vital to create the necessary tools that will effectively protect users' data from unauthorized access. Nevertheless, sharing data between multiple users' under the same domain in a secure and efficient way is not trivial. In this paper, we propose Sharing in the Rain – a protocol that allows cloud users' to securely share their data based on predefined policies. The proposed protocol is based on Attribute-Based Encryption (ABE) and allows users' to encrypt data based on certain policies and attributes. Moreover, we use a Key-Policy Attribute-Based technique through which access revocation is optimized. More precisely, we show how to securely and efficiently remove access to a file, for a certain user that is misbehaving or is no longer part of a user group, without having to decrypt and re-encrypt the original data with a new key or a new policy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

L’évaluation de l’action humanitaire (ÉAH) est un outil valorisé pour soutenir l’imputabilité, la transparence et l’efficience de programmes humanitaires contribuant à diminuer les inéquités et à promouvoir la santé mondiale. L’EAH est incontournable pour les parties prenantes de programme, les bailleurs de fonds, décideurs et intervenants souhaitant intégrer les données probantes aux pratiques et à la prise de décisions. Cependant, l’utilisation de l’évaluation (UÉ) reste incertaine, l’ÉAH étant fréquemment menée, mais inutilisé. Aussi, les conditions influençant l’UÉ varient selon les contextes et leur présence et applicabilité au sein d’organisations non-gouvernementales (ONG) humanitaires restent peu documentées. Les évaluateurs, parties prenantes et décideurs en contexte humanitaire souhaitant assurer l’UÉ pérenne détiennent peu de repères puisque rares sont les études examinant l’UÉ et ses conditions à long terme. La présente thèse tend à clarifier ces enjeux en documentant sur une période de deux ans l’UÉ et les conditions qui la détermine, au sein d’une stratégie d’évaluation intégrée au programme d’exemption de paiement des soins de santé d’une ONG humanitaire. L’objectif de ce programme est de faciliter l’accès à la santé aux mères, aux enfants de moins de cinq ans et aux indigents de districts sanitaires au Niger et au Burkina Faso, régions du Sahel où des crises alimentaires et économiques ont engendré des taux élevés de malnutrition, de morbidité et de mortalité. Une première évaluation du programme d’exemption au Niger a mené au développement de la stratégie d’évaluation intégrée à ce même programme au Burkina Faso. La thèse se compose de trois articles. Le premier présente une étude d’évaluabilité, étape préliminaire à la thèse et permettant de juger de sa faisabilité. Les résultats démontrent une logique cohérente et plausible de la stratégie d’évaluation, l’accessibilité de données et l’utilité d’étudier l’UÉ par l’ONG. Le second article documente l’UÉ des parties prenantes de la stratégie et comment celle-ci servit le programme d’exemption. L’utilisation des résultats fut instrumentale, conceptuelle et persuasive, alors que l’utilisation des processus ne fut qu’instrumentale et conceptuelle. Le troisième article documente les conditions qui, selon les parties prenantes, ont progressivement influencé l’UÉ. L’attitude des utilisateurs, les relations et communications interpersonnelles et l’habileté des évaluateurs à mener et à partager les connaissances adaptées aux besoins des utilisateurs furent les conditions clés liées à l’UÉ. La thèse contribue à l’avancement des connaissances sur l’UÉ en milieu humanitaire et apporte des recommandations aux parties prenantes de l’ONG.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A user manual for the GeoMedia software program, along with some discussion of digital orthophotos.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recommendation systems aim to help users make decisions more efficiently. The most widely used method in recommendation systems is collaborative filtering, of which, a critical step is to analyze a user's preferences and make recommendations of products or services based on similarity analysis with other users' ratings. However, collaborative filtering is less usable for recommendation facing the "cold start" problem, i.e. few comments being given to products or services. To tackle this problem, we propose an improved method that combines collaborative filtering and data classification. We use hotel recommendation data to test the proposed method. The accuracy of the recommendation is determined by the rankings. Evaluations regarding the accuracies of Top-3 and Top-10 recommendation lists using the 10-fold cross-validation method and ROC curves are conducted. The results show that the Top-3 hotel recommendation list proposed by the combined method has the superiority of the recommendation performance than the Top-10 list under the cold start condition in most of the times.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recent paradigms in wireless communication architectures describe environments where nodes present a highly dynamic behavior (e.g., User Centric Networks). In such environments, routing is still performed based on the regular packet-switched behavior of store-and-forward. Albeit sufficient to compute at least an adequate path between a source and a destination, such routing behavior cannot adequately sustain the highly nomadic lifestyle that Internet users are today experiencing. This thesis aims to analyse the impact of the nodes’ mobility on routing scenarios. It also aims at the development of forwarding concepts that help in message forwarding across graphs where nodes exhibit human mobility patterns, as is the case of most of the user-centric wireless networks today. The first part of the work involved the analysis of the mobility impact on routing, and we found that node mobility significance can affect routing performance, and it depends on the link length, distance, and mobility patterns of nodes. The study of current mobility parameters showed that they capture mobility partially. The routing protocol robustness to node mobility depends on the routing metric sensitivity to node mobility. As such, mobility-aware routing metrics were devised to increase routing robustness to node mobility. Two categories of routing metrics proposed are the time-based and spatial correlation-based. For the validation of the metrics, several mobility models were used, which include the ones that mimic human mobility patterns. The metrics were implemented using the Network Simulator tool using two widely used multi-hop routing protocols of Optimized Link State Routing (OLSR) and Ad hoc On Demand Distance Vector (AODV). Using the proposed metrics, we reduced the path re-computation frequency compared to the benchmark metric. This means that more stable nodes were used to route data. The time-based routing metrics generally performed well across the different node mobility scenarios used. We also noted a variation on the performance of the metrics, including the benchmark metric, under different mobility models, due to the differences in the node mobility governing rules of the models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Americans are accustomed to a wide range of data collection in their lives: census, polls, surveys, user registrations, and disclosure forms. When logging onto the Internet, users’ actions are being tracked everywhere: clicking, typing, tapping, swiping, searching, and placing orders. All of this data is stored to create data-driven profiles of each user. Social network sites, furthermore, set the voluntarily sharing of personal data as the default mode of engagement. But people’s time and energy devoted to creating this massive amount of data, on paper and online, are taken for granted. Few people would consider their time and energy spent on data production as labor. Even if some people do acknowledge their labor for data, they believe it is accessory to the activities at hand. In the face of pervasive data collection and the rising time spent on screens, why do people keep ignoring their labor for data? How has labor for data been become invisible, as something that is disregarded by many users? What does invisible labor for data imply for everyday cultural practices in the United States? Invisible Labor for Data addresses these questions. I argue that three intertwined forces contribute to framing data production as being void of labor: data production institutions throughout history, the Internet’s technological infrastructure (especially with the implementation of algorithms), and the multiplication of virtual spaces. There is a common tendency in the framework of human interactions with computers to deprive data and bodies of their materiality. My Introduction and Chapter 1 offer theoretical interventions by reinstating embodied materiality and redefining labor for data as an ongoing process. The middle Chapters present case studies explaining how labor for data is pushed to the margin of the narratives about data production. I focus on a nationwide debate in the 1960s on whether the U.S. should build a databank, contemporary Big Data practices in the data broker and the Internet industries, and the group of people who are hired to produce data for other people’s avatars in the virtual games. I conclude with a discussion on how the new development of crowdsourcing projects may usher in the new chapter in exploiting invisible and discounted labor for data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The wide adaptation of Internet Protocol (IP) as de facto protocol for most communication networks has established a need for developing IP capable data link layer protocol solutions for Machine to machine (M2M) and Internet of Things (IoT) networks. However, the wireless networks used for M2M and IoT applications usually lack the resources commonly associated with modern wireless communication networks. The existing IP capable data link layer solutions for wireless IoT networks provide the necessary overhead minimising and frame optimising features, but are often built to be compatible only with IPv6 and specific radio platforms. The objective of this thesis is to design IPv4 compatible data link layer for Netcontrol Oy's narrow band half-duplex packet data radio system. Based on extensive literature research, system modelling and solution concept testing, this thesis proposes the usage of tunslip protocol as the basis for the system data link layer protocol development. In addition to the functionality of tunslip, this thesis discusses the additional network, routing, compression, security and collision avoidance changes required to be made to the radio platform in order for it to be IP compatible while still being able to maintain the point-to-multipoint and multi-hop network characteristics. The data link layer design consists of the radio application, dynamic Maximum Transmission Unit (MTU) optimisation daemon and the tunslip interface. The proposed design uses tunslip for creating an IP capable data link protocol interface. The radio application receives data from tunslip and compresses the packets and uses the IP addressing information for radio network addressing and routing before forwarding the message to radio network. The dynamic MTU size optimisation daemon controls the tunslip interface maximum MTU size according to the link quality assessment calculated from the radio network diagnostic data received from the radio application. For determining the usability of tunslip as the basis for data link layer protocol, testing of the tunslip interface is conducted with both IEEE 802.15.4 radios and packet data radios. The test cases measure the radio network usability for User Datagram Protocol (UDP) based applications without applying any header or content compression. The test results for the packet data radios reveal that the typical success rate for packet reception through a single-hop link is above 99% with a round-trip-delay of 0.315s for 63B packets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sharpening is a powerful image transformation because sharp edges can bring out image details. Sharpness is achieved by increasing local contrast and reducing edge widths. We present a method that enhances sharpness of images and thereby their perceptual quality. Most existing enhancement techniques require user input to improve the perception of the scene in a manner most pleasing to the particular user. Our goal of image enhancement is to improve the perception of sharpness in digital images for human viewers. We consider two parameters in order to exaggerate the differences between local intensities. The two parameters exploit local contrast and widths of edges. We start from the assumption that color, texture, or objects of focus such as faces affect the human perception of photographs. When human raters are presented with a collection of images with different sharpness and asked to rank them according to perceived sharpness, the results have shown that there is a statistical consensus among the raters. We introduce a ramp enhancement technique by modifying the optimal overshoot in the ramp for different region contrasts as well as the new ramp width. Optimal parameter values are searched to be applied to regions under the criteria mentioned above. In this way, we aim to enhance digital images automatically to create pleasing image output for common users.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable on structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with a few structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user specified dimensions, using semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Data leakage is a serious issue and can result in the loss of sensitive data, compromising user accounts and details, potentially affecting millions of internet users. This paper contributes to research in online security and reducing personal footprint by evaluating the levels of privacy provided by the Firefox browser. The aim of identifying conditions that would minimize data leakage and maximize data privacy is addressed by assessing and comparing data leakage in the four possible browsing modes: normal and private modes using a browser installed on the host PC or using a portable browser from a connected USB device respectively. To provide a firm foundation for analysis, a series of carefully designed, pre-planned browsing sessions were repeated in each of the various modes of Firefox. This included low RAM environments to determine any effects low RAM may have on browser data leakage. The results show that considerable data leakage may occur within Firefox. In normal mode, all of the browsing information is stored within the Mozilla profile folder in Firefox-specific SQLite databases and sessionstore.js. While passwords were not stored as plain text, other confidential information such as credit card numbers could be recovered from the Form history under certain conditions. There is no difference when using a portable browser in normal mode, except that the Mozilla profile folder is located on the USB device rather than the host's hard disk. By comparison, private browsing reduces data leakage. Our findings confirm that no information is written to the Firefox-related locations on the hard disk or USB device during private browsing, implying that no deletion would be necessary and no remnants of data would be forensically recoverable from unallocated space. However, two aspects of data leakage occurred equally in all four browsing modes. Firstly, all of the browsing history was stored in the live RAM and was therefore accessible while the browser remained open. Secondly, in low RAM situations, the operating system caches out RAM to pagefile.sys on the host's hard disk. Irrespective of the browsing mode used, this may include Firefox history elements which can then remain forensically recoverable for considerable time.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. The second model called Vertex Importance Query (VIQ) identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns top-k of them with a user-specified approximation terms and scoring functions. In the fourth model called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of answers such as the number of mapped blocks, vertices' properties in each block and so on and the most top-k probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation he considers important, (ii) how to score answers - irrespective of whether they are vertices or substitution, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Personal information is increasingly gathered and used for providing services tailored to user preferences, but the datasets used to provide such functionality can represent serious privacy threats if not appropriately protected. Work in privacy-preserving data publishing targeted privacy guarantees that protect against record re-identification, by making records indistinguishable, or sensitive attribute value disclosure, by introducing diversity or noise in the sensitive values. However, most approaches fail in the high-dimensional case, and the ones that don’t introduce a utility cost incompatible with tailored recommendation scenarios. This paper aims at a sensible trade-off between privacy and the benefits of tailored recommendations, in the context of privacy-preserving data publishing. We empirically demonstrate that significant privacy improvements can be achieved at a utility cost compatible with tailored recommendation scenarios, using a simple partition-based sanitization method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions on merging of clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak-edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers. This is done by using the sliders and by pointing and clicking on the representation to cut the branches of the tree in multiple-heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Choosing a single similarity threshold for cutting dendrograms is not sufficient for performing hierarchical clustering analysis of heterogeneous data sets. In addition, alternative automated or semi-automated methods that cut dendrograms in multiple levels make assumptions about the data in hand. In an attempt to help the user to find patterns in the data and resolve ambiguities in cluster assignments, we developed MLCut: a tool that provides visual support for exploring dendrograms of heterogeneous data sets in different levels of detail. The interactive exploration of the dendrogram is coordinated with a representation of the original data, shown as parallel coordinates. The tool supports three analysis steps. Firstly, a single-height similarity threshold can be applied using a dynamic slider to identify the main clusters. Secondly, a distinctiveness threshold can be applied using a second dynamic slider to identify “weak-edges” that indicate heterogeneity within clusters. Thirdly, the user can drill-down to further explore the dendrogram structure - always in relation to the original data - and cut the branches of the tree at multiple levels. Interactive drill-down is supported using mouse events such as hovering, pointing and clicking on elements of the dendrogram. Two prototypes of this tool have been developed in collaboration with a group of biologists for analysing their own data sets. We found that enabling the users to cut the tree at multiple levels, while viewing the effect in the original data, is a promising method for clustering which could lead to scientific discoveries.