882 results for large scale data gathering
Abstract:
In today's big data world, data is being produced in massive volumes, at great velocity, and from a variety of sources such as mobile devices, sensors, a plethora of small devices connected to the internet (the Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are increasingly used to derive value from this big data. A large portion of this data is stored and processed in the Cloud due to the several advantages the Cloud provides, such as scalability, elasticity, availability, low cost of ownership and overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has traditionally been used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby substantially reducing the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud.
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
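The neighborhood-centric programming model can be pictured with a small, hedged sketch: instead of writing per-vertex update functions, the analysis operates on each node's extracted ego network. The example below uses networkx and a toy graph; it only illustrates the idea and is not NSCALE's API or execution engine.

```python
# Minimal sketch of neighborhood-centric analysis (illustration only, not NSCALE's API).
# Each task receives a node's 1-hop ego network and computes a per-neighborhood metric.
import networkx as nx

# Toy social graph (assumed example data).
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])

def analyze_neighborhood(ego):
    """User program written against a subgraph (ego network), not a single vertex."""
    n, m = ego.number_of_nodes(), ego.number_of_edges()
    possible = n * (n - 1) / 2
    return m / possible if possible else 0.0   # density of the ego network

results = {}
for v in G.nodes():
    ego = nx.ego_graph(G, v, radius=1)   # extract only the subgraph of interest
    results[v] = analyze_neighborhood(ego)

print(results)
```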
Abstract:
In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to use eager or lazy replication, in order to minimize network communication and support low-latency querying. In the second part, I study the parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhood. I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries and also allows partial pre-computation of the aggregates to minimize query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs can be specified using both graph structure and activity conditions on the nodes. Queries in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.
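As a rough illustration of continuous neighborhood aggregation, the hedged sketch below incrementally maintains, for every node, a running count of events produced in its 1-hop neighborhood; the pre-compiled aggregation overlay and partial-aggregate sharing described above are not reproduced here, only the basic push-style incremental update.

```python
# Hedged sketch: incremental 1-hop neighborhood aggregation over an event stream.
# (Illustrative only; does not implement the dissertation's aggregation overlay.)
from collections import defaultdict

adjacency = {                      # toy undirected graph (assumed example data)
    "u1": ["u2", "u3"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2"],
}
neighborhood_counts = defaultdict(int)   # per-node running aggregate

def on_event(node):
    """When `node` produces an event, push the update into its own aggregate and every
    neighbor's aggregate, so queries read a pre-computed value instead of rescanning."""
    neighborhood_counts[node] += 1
    for nbr in adjacency.get(node, []):
        neighborhood_counts[nbr] += 1

for event_source in ["u1", "u3", "u1"]:   # simulated event stream
    on_event(event_source)

print(dict(neighborhood_counts))   # per-node count of events in its 1-hop neighborhood
```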
Abstract:
Reconstructing Northern Hemisphere ice-sheet oscillations and meltwater routing to the ocean is important to better understand the mechanisms behind abrupt climate changes. To date, research efforts have mainly focused on the North American (Laurentide) ice sheet (LIS), leaving the potential role of the European Ice Sheet (EIS), and of the Scandinavian ice sheet (SIS) in particular, largely unexplored. Using neodymium isotopes in detrital sediments deposited off the Channel River, we provide a continuous and well-dated record of the evolution of the EIS southern margin through the end of the last glacial period and during the deglaciation. Our results reveal that the evolution of the EIS margins was accompanied by substantial ice recession (especially of the SIS) and simultaneous release of meltwater to the North Atlantic. These events occurred both during the advance of the EIS to its LGM position (i.e., during Heinrich Stadial –HS– 3 and HS2; ∼31–29 ka and ∼26–23 ka, respectively) and during the deglaciation (i.e., at ∼22 ka, ∼20–19 ka and from 18.2 ± 0.2 to 16.7 ± 0.2 ka, which corresponds to the first part of HS1). The deglaciation was discontinuous in character, and similar in timing to that of the southern LIS margin, with moderate ice-sheet retreat (from 22.5 ± 0.2 ka in the Baltic lowlands) as soon as northern summer insolation began to increase (from ∼23 ka) and an acceleration of the margin retreat thereafter (from ∼20 ka). Importantly, our results show that EIS retreat events and the release of meltwater to the North Atlantic during the deglaciation coincide with AMOC destabilisation and interhemispheric climate changes. They thus suggest that the EIS, together with the LIS, could have played a critical role in the climatic reorganization that accompanied the last deglaciation. Finally, our data suggest that meltwater discharges to the North Atlantic produced by large-scale recession of continental parts of Northern Hemisphere ice sheets during Heinrich Stadials could have been a possible source of the oceanic perturbations (i.e., AMOC shutdown) responsible for the marine-based ice-stream purge cycle, or so-called Heinrich Events (HEs), that punctuate the last glacial period.
Abstract:
Increasing the size of the training data in many computer vision tasks has been shown to be very effective. Using large-scale image datasets (e.g., ImageNet) with simple learning techniques (e.g., linear classifiers), one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large-scale image sets is demanding in itself: they take a significant amount of memory, which makes it impossible to process the images with complex algorithms on single-CPU machines. Finding an efficient image representation can be the key to attacking this problem. Efficiency alone, however, is not enough for image understanding; the representation should also be comprehensive and rich in the semantic information it carries. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large-scale image classification. We present techniques to learn the binary features from supervised image sets (with different types of semantic supervision: class labels, textual descriptions). We also propose several problems that are important in finding and using efficient image representations.
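As a hedged illustration of how compact binary codes enable fast large-scale search, the sketch below generates codes with simple random hyperplane projections (locality-sensitive hashing) and compares images by Hamming distance; this is a generic baseline technique, not the learned, semantically supervised binary features proposed in this work.

```python
# Hedged sketch: random-hyperplane binary codes and Hamming-distance search.
# (Generic LSH baseline; not the learned binary features of the proposal.)
import numpy as np

rng = np.random.default_rng(0)
n_images, dim, n_bits = 10_000, 512, 64          # assumed sizes
features = rng.standard_normal((n_images, dim))  # stand-in for image descriptors

hyperplanes = rng.standard_normal((dim, n_bits))
codes = (features @ hyperplanes > 0)             # n_images x n_bits boolean code matrix

def hamming_search(query_code, codes, k=5):
    """Return indices of the k codes closest to the query in Hamming distance."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists)[:k]

query = (features[0] @ hyperplanes > 0)
print(hamming_search(query, codes))              # nearest neighbors by binary code
```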
Abstract:
The Nigerian scam, also known as advance-fee fraud or 419 scam, is a prevalent form of online fraudulent activity that causes financial loss to individuals and businesses. It has evolved from simple, non-targeted email messages to more sophisticated scams targeted at users of classifieds, dating and other websites. Even though such scams are frequently observed and reported by users, the community's understanding of Nigerian scams is limited since the scammers operate "underground". To better understand the underground Nigerian scam ecosystem and to seek effective methods to deter Nigerian scams and cybercrime in general, we conducted a series of active and passive measurement studies. Relying upon the analysis and insights gained from these studies, we make four contributions: (1) we analyze the taxonomy of the Nigerian scam and derive long-term trends in scams; (2) we provide insight into Nigerian scam and cybercrime ecosystems and their underground operation; (3) we propose a payment intervention as a potential deterrent to cybercrime operations in general and evaluate its effectiveness; and (4) we offer active and passive measurement tools and techniques that enable in-depth analysis of cybercrime ecosystems and deterrence against them. We first created and analyzed a repository of more than two hundred thousand user-reported scam emails, stretching from 2006 to 2014, from four major scam-reporting websites. We selected the ten most commonly observed scam categories and tagged 2,000 scam emails randomly selected from our repository. Based upon the manually tagged dataset, we trained a machine learning classifier and clustered all scam emails in the repository. From the clustering results, we found a strong and sustained upward trend for targeted scams and a downward trend for non-targeted scams. We then focused on two types of targeted scams that target users on Craigslist: sales scams and rental scams. We built an automated scam data collection system and gathered sales scam emails at large scale. Using the system, we posted honeypot ads on Craigslist and conversed automatically with the scammers. Through the email conversations, the system obtained additional confirmation of likely scam activities and collected additional information such as IP addresses and shipping addresses. Our analysis revealed that around 10 groups were responsible for nearly half of the over 13,000 total scam attempts we received. These groups used IP addresses and shipping addresses in both Nigeria and the U.S. We also crawled rental ads on Craigslist, identified rental scam ads among the large number of benign ads, and conversed with the potential scammers. Through in-depth analysis of the rental scams, we found seven major scam campaigns employing various operations and monetization methods. We also found that, unlike sales scammers, most rental scammers were in the U.S. The large-scale scam data and in-depth analysis provide useful insights on how to design effective deterrence techniques against cybercrime in general. Finally, we studied underground DDoS-for-hire services, also known as booters, and measured the effectiveness of undermining the payment systems of DDoS services. Our analysis shows that payment intervention can have the desired effect of limiting cybercriminals' ability to accept payments and increasing the risk of doing so.
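The classification step can be pictured with a small, hedged sketch: a bag-of-words (TF-IDF) representation of the email text feeding a linear classifier trained on the manually tagged examples. The category names and example emails are hypothetical; the dissertation's actual feature set and model are not reproduced here.

```python
# Hedged sketch: TF-IDF + linear classifier for scam-category labeling.
# (Illustrative only; the tagged emails, categories, and model choice are assumptions.)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical manually tagged examples (stand-ins for the 2,000 tagged emails).
emails = [
    "I am a prince and need your help transferring funds",
    "Your item is sold, please ship to my agent overseas",
    "Beautiful apartment for rent, wire the deposit today",
]
labels = ["advance-fee", "sales-scam", "rental-scam"]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(emails)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)

# Label the remaining (untagged) emails in the repository.
new_emails = ["Ship the laptop first, my agent will release payment"]
print(clf.predict(vectorizer.transform(new_emails)))
```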
Abstract:
Racism continues to thrive on the Internet. Yet, little is known about racism in online settings and its potential consequences. The purpose of this study was to develop the Perceived Online Racism Scale (PORS), the first measure to assess people's perceived experiences of online racism as they interact with others and consume information on the Internet. Items were developed through a multi-stage process based on a literature review, focus groups, and qualitative data collection. Based on a racially diverse large-scale sample (N = 1023), exploratory and confirmatory factor analyses provided support for a 30-item bifactor model with the following three factors: (a) the 14-item PORS-IP (personal experiences of racism in online interactions), (b) the 5-item PORS-V (observations of other racial/ethnic minorities being offended), and (c) the 11-item PORS-I (consumption of online content and information denigrating racial/ethnic minorities and highlighting racial injustice in society). Initial construct validity examinations suggest that PORS is significantly linked to psychological distress.
Abstract:
Forecasting abrupt variations in wind power generation (the so-called ramps) helps achieve large-scale wind power integration. One of the main issues to be confronted when addressing wind power ramp forecasting is how to identify relevant information from large datasets in order to optimally feed forecasting models. To this end, an innovative methodology for systematically relating multivariate datasets to ramp events is presented. The methodology comprises two stages: the identification of relevant features in the data and the assessment of the dependence between these features and ramp occurrence. As a test case, the proposed methodology was employed to explore the relationships between atmospheric dynamics at the global/synoptic scales and ramp events experienced at two wind farms located in Spain. The results suggested different degrees of connection between these atmospheric scales and ramp occurrence. For one of the wind farms, it was found that ramp events could be partly explained by regional circulations and zonal pressure gradients. To perform a comprehensive analysis of the underlying causes of ramps, the proposed methodology could be applied to datasets related to other stages of the wind-to-power conversion chain.
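To make the two-stage idea concrete, here is a hedged sketch that (1) labels ramp events as power changes exceeding a threshold within a time window and (2) scores the dependence between candidate atmospheric features and ramp occurrence with mutual information. The threshold, window, and toy features are assumptions, not the paper's actual definitions.

```python
# Hedged sketch: label ramp events, then score feature/ramp dependence.
# (Threshold, window length, data, and features are illustrative assumptions.)
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(1)
hours = 2_000
power = np.cumsum(rng.normal(0, 0.05, hours))    # toy power series (random walk, demo only)

# Stage 1: a ramp occurs when |P(t+w) - P(t)| exceeds a fixed threshold.
window, threshold = 4, 0.2
ramp = (np.abs(power[window:] - power[:-window]) > threshold).astype(int)

# Stage 2: dependence between candidate (toy) atmospheric features and ramp labels.
features = np.column_stack([
    rng.normal(size=hours - window),    # e.g. zonal pressure gradient index (toy)
    rng.normal(size=hours - window),    # e.g. synoptic circulation index (toy)
])
print(mutual_info_classif(features, ramp, random_state=0))
```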
Abstract:
By providing vehicle-to-vehicle and vehicle-to-infrastructure wireless communications, vehicular ad hoc networks (VANETs), also known as the "networks on wheels", can greatly enhance traffic safety, traffic efficiency and the driving experience for intelligent transportation systems (ITS). However, the unique features of VANETs, such as high mobility and uneven distribution of vehicular nodes, pose critical challenges to the efficiency and reliability of VANET implementations. This dissertation is motivated by the great application potential of VANETs in the design of efficient in-network data processing and dissemination. Considering the significance of message aggregation, data dissemination and data collection, this dissertation research targets enhancing traffic safety and traffic efficiency, as well as developing novel commercial applications based on VANETs, along four directions: 1) accurate and efficient message aggregation to detect on-road safety-relevant events, 2) reliable data dissemination to reliably notify remote vehicles, 3) efficient and reliable spatial data collection from vehicular sensors, and 4) novel promising applications that exploit the commercial potential of VANETs. Specifically, to enable cooperative detection of safety-relevant events on the roads, the structure-less message aggregation (SLMA) scheme is proposed to improve communication efficiency and message accuracy. The scheme of relative-position-based message dissemination (RPB-MD) is proposed to reliably and efficiently disseminate messages to all intended vehicles in the zone-of-relevance under varying traffic density. Because numerous vehicular sensor data become available through VANETs, the scheme of compressive-sampling-based data collection (CS-DC) is proposed to efficiently collect spatially relevant data at a large scale, especially in dense traffic. In addition, with novel and efficient solutions proposed for the application-specific issues of data dissemination and data collection, several appealing value-added applications for VANETs are developed to exploit their commercial potential, namely general-purpose automatic survey (GPAS), VANET-based ambient ad dissemination (VAAD) and VANET-based vehicle performance monitoring and analysis (VehicleView). Thus, by improving the efficiency and reliability of in-network data processing and dissemination, including message aggregation, data dissemination and data collection, together with the development of novel promising applications, this dissertation helps push VANETs further toward the stage of massive deployment.
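The compressive-sampling idea behind data collection schemes like CS-DC can be sketched generically: a sparse (or compressible) spatial signal is observed through far fewer random linear measurements than there are sensors, and then recovered with a sparse solver. The sketch below uses a random Gaussian measurement matrix and orthogonal matching pursuit; it illustrates the principle only and is not the CS-DC protocol.

```python
# Hedged sketch of compressive sampling: recover a sparse spatial signal
# from few random linear measurements (generic CS demo, not the CS-DC scheme).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(2)
n_sensors, n_measurements, sparsity = 200, 60, 8

# Sparse "road condition" signal: only a few sensors report non-zero deviations.
x = np.zeros(n_sensors)
x[rng.choice(n_sensors, sparsity, replace=False)] = rng.normal(0, 5, sparsity)

# Random measurement matrix: each measurement is a random weighted sum of readings,
# e.g. accumulated along a vehicle's forwarding path (toy assumption).
Phi = rng.standard_normal((n_measurements, n_sensors)) / np.sqrt(n_measurements)
y = Phi @ x

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity)
omp.fit(Phi, y)
x_hat = omp.coef_

print("relative recovery error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```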
Abstract:
The continual eruptive activity, the occurrence of an ancestral catastrophic collapse, and the inherent geologic features of Pacaya volcano (Guatemala) demand an evaluation of potential collapse hazards. This thesis merges field and laboratory techniques for better rock mass characterization of volcanic slopes and evaluation of slope stability. New field geological, structural, rock-mechanical and geotechnical data on Pacaya are reported and integrated with laboratory tests to better define the physical-mechanical rock mass properties. Additionally, these data are used in numerical models for the quantitative evaluation of lateral instability in the form of large sector collapses and shallow landslides. Regional tectonics and local structures indicate that the local stress regime is transtensional, with an ENE-WSW σ3 stress component. Aligned features trending NNW-SSE can be considered an expression of this weakness zone, which favors magma upwelling to the surface. Numerical modeling suggests that a large-scale collapse could be triggered by reasonable ranges of magma pressure (≥ 7.7 MPa if constant along a central dyke) and seismic acceleration (≥ 460 cm/s²), and that a layer of pyroclastic deposits beneath the edifice could have been a factor controlling the ancestral collapse. Finally, the formation of shear cracks within zones of maximum shear strain could provide conduits for lateral flow, which would account for the long lava flows erupted at lower elevations.
Abstract:
Analyzing large-scale gene expression data is a labor-intensive and time-consuming process. To make data analysis easier, we developed a set of pipelines for the rapid processing and analysis of poplar gene expression data for knowledge discovery. Of all the pipelines developed, the differentially expressed genes (DEGs) pipeline is designed to identify biologically important genes that are differentially expressed at one or more time points or conditions. The pathway analysis pipeline was designed to identify differentially expressed metabolic pathways. The protein domain enrichment pipeline can identify the protein domains enriched in the DEGs. Finally, the Gene Ontology (GO) enrichment analysis pipeline was developed to identify the GO terms enriched in the DEGs. Our pipeline tools can analyze both microarray data and high-throughput sequencing data, two types of data obtained by different technologies. Microarray technology measures gene expression levels via microarray chips, collections of microscopic DNA spots attached to a solid (glass) surface, whereas high-throughput sequencing, also called next-generation sequencing, is a newer technology that measures gene expression levels by directly sequencing mRNAs and obtaining each mRNA's copy number in cells or tissues. We also developed a web portal (http://sys.bio.mtu.edu/) to make all pipelines available to the public and help users analyze their gene expression data. In addition to the analyses mentioned above, it can also perform GO hierarchy analysis, i.e., construct GO trees from a list of GO terms given as input.
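Enrichment pipelines of this kind typically rest on a hypergeometric (one-sided Fisher) test: given a DEG list, is a GO term (or protein domain) over-represented relative to the genome background? The hedged sketch below shows that core calculation with scipy; the gene counts are made up and it is not the portal's actual implementation.

```python
# Hedged sketch: hypergeometric test for GO-term enrichment in a DEG list.
# (Counts are hypothetical; not the web portal's implementation.)
from scipy.stats import hypergeom

population = 30_000        # genes in the genome background (assumed)
annotated = 400            # genes annotated with the GO term of interest (assumed)
deg_list = 1_200           # number of differentially expressed genes (assumed)
overlap = 35               # DEGs carrying that GO term (assumed)

# P(X >= overlap) when drawing `deg_list` genes without replacement.
p_value = hypergeom.sf(overlap - 1, population, annotated, deg_list)
print(f"enrichment p-value: {p_value:.3g}")
```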
Abstract:
Recent marine long-offset transient electromagnetic (LOTEM) measurements yielded the offshore delineation of a fresh groundwater body beneath the seafloor in the region of Bat Yam, Israel. The LOTEM application was effective in detecting this freshwater body underneath the Mediterranean Sea and allowed an estimation of its seaward extent. However, the measured data set was insufficient to understand the hydrogeological configuration and mechanism controlling the occurrence of this fresh groundwater discovery. In particular, the lateral geometry of the freshwater boundary, which is important for hydrogeological modelling, could not be resolved. Without such an understanding, rational management of this unexploited groundwater reservoir is not possible. Two new high-resolution marine time-domain electromagnetic methods are theoretically developed to derive the hydrogeological structure of the western aquifer boundary. The first is called the Circular Electric Dipole (CED). It is the land-based analogue of the Vertical Electric Dipole (VED), which is commonly applied to detect resistive structures in the subsurface. Although the CED shows exceptional detectability characteristics in the step-off signal towards the sub-seafloor freshwater body, an actual application was not carried out within the scope of this study. It was found that the method suffers from insufficient signal strength to adequately delineate the resistive aquifer under realistic noise conditions. Moreover, modelling studies demonstrated that severe signal distortions are caused by the slightest geometrical inaccuracies. As a result, a successful application of the CED in Israel proved rather doubtful. A second method, called the Differential Electric Dipole (DED), is developed as an alternative to the intended CED method. Compared to the conventional marine time-domain electromagnetic system, which commonly applies a horizontal electric dipole transmitter, the DED is composed of two horizontal electric dipoles in an in-line configuration that share a common central electrode. Theoretically, the DED has detectability/resolution characteristics similar to those of the conventional LOTEM system. However, its superior lateral resolution towards multi-dimensional resistivity structures makes an application desirable. Furthermore, the method is less susceptible to geometrical errors, making an application in Israel feasible. Within the scope of this thesis, the novel marine DED method is substantiated using several one-dimensional (1D) and multi-dimensional (2D/3D) modelling studies. The main emphasis lies on the application in Israel. Preliminary resistivity models are derived from the previous marine LOTEM measurements and tested for a DED application. The DED method is effective in locating the two-dimensional resistivity structure at the western aquifer boundary. Moreover, a prediction regarding the hydrogeological boundary conditions is feasible, provided a brackish water zone exists at the head of the interface. A seafloor-based DED transmitter/receiver system was designed and built at the Institute of Geophysics and Meteorology at the University of Cologne. The first DED measurements were carried out in Israel in April 2016. The acquired data set is the first of its kind. The measured data are processed and subsequently interpreted using 1D inversion. The intended aim of interpreting both step-on and step-off signals could not be met, owing to the insufficient data quality of the latter.
Yet, the 1D inversion models of the DED step-on signals clearly detect the freshwater body for receivers located close to the Israeli coast. Additionally, a lateral resistivity contrast is observable in the 1D inversion models, which allows the seaward extent of this freshwater body to be constrained. A large-scale 2D modelling study followed the 1D interpretation. In total, 425,600 forward calculations were conducted to find a sub-seafloor resistivity distribution that adequately explains the measured data. The results indicate that the western aquifer boundary is located 3600 m to 3700 m off the coast. Moreover, a brackish water zone of 3 Ωm to 5 Ωm with a lateral extent of less than 300 m is likely located at the head of the freshwater aquifer. Based on these results, it is predicted that the sub-seafloor freshwater body is indeed open to the sea and may be vulnerable to seawater intrusion.
Abstract:
Canopy and aerodynamic conductances (gC and gA) are two of the key land surface biophysical variables that control the land surface response of land surface schemes in climate models. Their representation is crucial for predicting transpiration (λET) and evaporation (λEE) flux components of the terrestrial latent heat flux (λE), which has important implications for global climate change and water resource management. By physically integrating radiometric surface temperature (TR) into an integrated framework of the Penman-Monteith and Shuttleworth-Wallace models, we present a novel approach to directly quantify the canopy-scale biophysical controls on λET and λEE over multiple plant functional types (PFTs) in the Amazon Basin. Combining data from six LBA (Large-scale Biosphere-Atmosphere Experiment in Amazonia) eddy covariance tower sites and a TR-driven, physically based modeling approach, we identified the canopy-scale feedback-response mechanism between gC, λET, and atmospheric vapor pressure deficit (DA), without using any leaf-scale empirical parameterizations in the modeling. The TR-based model shows minor biophysical control on λET during the wet (rainy) seasons, when λET becomes predominantly radiation driven and net radiation (RN) determines 75 to 80 % of the variance of λET. However, biophysical control on λET increases dramatically during the dry seasons, and particularly in the 2005 drought year, explaining 50 to 65 % of the variance of λET and indicating that λET is substantially soil moisture driven during the rainfall deficit phase. Despite substantial differences in gA between forests and pastures, very similar canopy-atmosphere "coupling" was found in these two biomes due to a soil moisture-induced decrease in gC in the pasture. This revealed a pragmatic aspect of the TR-driven model behavior: it exhibits a high sensitivity of gC to per-unit change in wetness, whereas gA is only marginally sensitive to surface wetness variability. Our results reveal the occurrence of a significant hysteresis between λET and gC during the dry season for the pasture sites, which is attributed to relatively low soil water availability as compared to the rainforests, likely due to differences in rooting depth between the two systems. Evaporation was significantly influenced by gA for all the PFTs and across all wetness conditions. Our analytical framework logically captures the responses of gC and gA to changes in atmospheric radiation, DA, and surface radiometric temperature, and thus appears promising for improving existing land-surface-atmosphere exchange parameterizations across a range of spatial scales.
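For reference, the standard single-source Penman-Monteith form that underlies this framework can be written as below; the study itself integrates TR into a combined Penman-Monteith/Shuttleworth-Wallace scheme, which further partitions λE into λET and λEE, so this is only the basic relation linking the conductances to the flux.

```latex
\lambda E \;=\; \frac{\Delta\,(R_N - G) \;+\; \rho_a c_p\, D_A / r_a}
                     {\Delta \;+\; \gamma\left(1 + r_s / r_a\right)},
\qquad r_s = 1/g_C,\quad r_a = 1/g_A
```

Here Δ is the slope of the saturation vapour pressure curve, G the ground heat flux, ρ_a c_p the volumetric heat capacity of air, γ the psychrometric constant, and D_A the atmospheric vapour pressure deficit.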
Abstract:
The fourth industrial revolution, also known as Industry 4.0, has rapidly gained traction in businesses across Europe and the world, becoming a central theme in small, medium, and large enterprises alike. This new paradigm shifts the focus from locally based and barely automated firms to a globally interconnected industrial sector, stimulating economic growth and productivity, and supporting the upskilling and reskilling of employees. However, despite the maturity and scalability of information and cloud technologies, the support systems already present at the machine/field level are often outdated and lack the necessary security, access control, and advanced communication capabilities. This dissertation proposes architectures and technologies designed to bridge the gap between Operational and Information Technology in a manner that is non-disruptive, efficient, and scalable. The proposal presents cloud-enabled data-gathering architectures that make use of the newest IT and networking technologies to achieve the desired quality of service and non-functional properties. By harnessing industrial and business data, processes can be optimized even before product sale, while the integrated environment enhances data exchange for post-sale support. The architectures have been tested and have shown encouraging performance results, providing a promising solution for companies looking to embrace Industry 4.0, enhance their operational capabilities, and prepare for the upcoming fifth, human-centric, industrial revolution.
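A minimal, hedged sketch of the edge-to-cloud data-gathering pattern that such architectures build on: a gateway samples machine telemetry and forwards it to a cloud ingestion endpoint over HTTPS. The endpoint URL, token, and payload fields are hypothetical placeholders, not part of the dissertation's architecture.

```python
# Hedged sketch: periodic machine-telemetry upload from an edge gateway to the cloud.
# (Endpoint, token, and payload schema are hypothetical placeholders.)
import time
import random
import requests

INGEST_URL = "https://cloud.example.com/api/v1/telemetry"   # hypothetical endpoint
AUTH_TOKEN = "REPLACE_ME"                                    # hypothetical credential

def read_sensors():
    """Stand-in for reading values from fieldbus/PLC-connected sensors."""
    return {"machine_id": "press-07",
            "spindle_temp_c": round(random.uniform(40, 80), 1),
            "vibration_rms": round(random.uniform(0.1, 2.0), 3),
            "timestamp": time.time()}

def push_sample(sample, retries=3):
    """Send one sample; retry with exponential backoff on transient network errors."""
    for attempt in range(retries):
        try:
            resp = requests.post(INGEST_URL, json=sample,
                                 headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
                                 timeout=5)
            resp.raise_for_status()
            return True
        except requests.RequestException:
            time.sleep(2 ** attempt)
    return False

if __name__ == "__main__":
    push_sample(read_sensors())
```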
Abstract:
The purpose of this research study is to discuss privacy and data protection-related regulatory and compliance challenges posed by digital transformation in healthcare in the wake of the COVID-19 pandemic. The public health crisis accelerated the development of patient-centred remote/hybrid healthcare delivery models that make increased use of telehealth services and related digital solutions. The large-scale uptake of IoT-enabled medical devices and wellness applications, and the offering of healthcare services via healthcare platforms (online doctor marketplaces), have catalysed these developments. However, the use of new enabling technologies (IoT, AI) and the platformisation of healthcare pose complex challenges to the protection of patients' privacy and personal data. This happens at a time when the EU is drawing up a new regulatory landscape for the use of data and digital technologies. Against this background, the study presents an interdisciplinary (normative and technology-oriented) critical assessment of how the new regulatory framework may affect privacy and data protection requirements regarding the deployment and use of Internet of Health Things (hardware) devices and interconnected software (AI systems). The study also assesses key privacy and data protection challenges that affect healthcare platforms (online doctor marketplaces) in their offering of video API-enabled teleconsultation services and their (anticipated) integration into the European Health Data Space. The overall conclusion of the study is that regulatory deficiencies may create integrity risks for the protection of privacy and personal data in telehealth due to uncertainties about the proper interplay, legal effects and effectiveness of (existing and proposed) EU legislation. The proliferation of normative measures may increase compliance costs, hinder innovation and, ultimately, deprive European patients of state-of-the-art digital health technologies, which is, paradoxically, the opposite of what the EU plans to achieve.