962 results for data complexity
Abstract:
Advocates of Big Data assert that we are in the midst of an epistemological revolution, promising the displacement of the modernist methodological hegemony of causal analysis and theory generation. It is alleged that the growing ‘deluge’ of digitally generated data, and the development of computational algorithms to analyse them, has enabled new inductive ways of accessing everyday relational interactions through their ‘datafication’. This paper critically engages with these discourses of Big Data and complexity, particularly as they operate in the discipline of International Relations, where it is alleged that Big Data approaches have the potential for developing self-governing societal capacities for resilience and adaptation through the real-time reflexive awareness and management of risks and problems as they arise. The epistemological and ontological assumptions underpinning Big Data are then analysed to suggest that critical and posthumanist approaches have come of age through these discourses, enabling process-based and relational understandings to be translated into policy and governance practices. The paper thus raises some questions for the development of critical approaches to new posthuman forms of governance and knowledge production.
How the World Learned to Stop Worrying and Love Failure: Big Data, Resilience and Emergent Causality
Abstract:
In modernity, failure was the discourse of critique; today, it is increasingly the discourse of power: failure has changed its allegiances. Over the last two decades, failure has been enfolded into discourses of power, facilitating the development of new policy approaches. Foremost among governing approaches that seek to include and to govern through failure is that of resilience. This article seeks to reflect upon how the understanding of failure has been transformed in this process, particularly linking this transformation to the radical appreciation of contingency and of the limits to instrumental cause-and-effect approaches to rule. Whereas modernity was shaped by a contestation over failure as an epistemological boundary, under conditions of contingency and complexity there appears to be a new consensus on failure as an ontological necessity. This problematic ‘ontological turn’ is illustrated using examples of changing approaches to risks, especially anthropogenic understandings of environmental threats, formerly seen as ‘natural’.
Abstract:
Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.
Abstract:
Ecosystem engineers that increase habitat complexity are keystone species in marine systems, increasing shelter and niche availability, and therefore biodiversity. For example, kelp holdfasts form intricate structures and host the largest number of organisms in kelp ecosystems. However, methods that quantify 3D habitat complexity have only seldom been used in marine habitats, and never in kelp holdfast communities. This study investigated the role of kelp holdfasts (Laminaria hyperborea) in supporting benthic faunal biodiversity. Computer-aided tomography (CT) scanning was used to quantify the three-dimensional geometrical complexity of holdfasts, including volume, surface area and surface fractal dimension (FD). Additionally, the number of haptera, the number of haptera per unit of volume, and the age of kelps were estimated. These measurements were compared to faunal biodiversity and community structure, using partial least-squares regression and multivariate ordination. Holdfast volume explained most of the variance observed in biodiversity indices; however, all other complexity measures also contributed strongly to the variance observed. Multivariate ordinations further revealed that surface area and haptera per unit of volume accounted for the patterns observed in faunal community structure. Using 3D image analysis, this study makes a strong contribution to elucidating the quantitative mechanisms underlying the observed relationship between biodiversity and habitat complexity. Furthermore, the potential of CT scanning as an ecological tool is demonstrated, and a methodology for its use in future similar studies is established. Such spatially resolved image analysis could help identify structurally complex areas as biodiversity hotspots, and may support the prioritization of areas for conservation.
Abstract:
Provenance plays a pivotal role in tracing the origin of something and determining how and why it occurred. With the emergence of the cloud and the benefits it brings, there has been a rapid proliferation of services being adopted by the commercial and government sectors. However, trust and security concerns for such services are on an unprecedented scale. Currently, these services expose very little of their internal workings to their customers; this can cause accountability and compliance issues, especially in the event of a fault or error, when customers and providers are left to point fingers at each other. Provenance-based traceability provides a means to address part of this problem by capturing and querying events that occurred in the past to understand how and why they took place. However, due to the complexity of the cloud infrastructure, current provenance models lack the expressiveness required to describe the inner workings of a cloud service. For a complete solution, a provenance-aware policy language is also required so that operators and users can define policies for compliance purposes. Current policy standards do not cater for this requirement. To address these issues, in this paper we propose a provenance (traceability) model, cProv, and a provenance-aware policy language, cProvl, to capture traceability data and to express policies for validating against the model. For the implementation, we have extended the XACML 3.0 architecture to support provenance, and provided a translator that converts cProvl policies and requests into XACML form.
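The idea of capturing past events and then validating a policy against them can be illustrated with a minimal sketch. It does not reproduce cProv, cProvl or XACML: the `ProvEvent` record, the `trace` and `violations` helpers, and all actor and resource names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvEvent:
    """One traceability record: who did what to which resource, and when."""
    actor: str
    action: str
    resource: str
    timestamp: int  # hypothetical logical clock value


def trace(events, resource):
    """Return the events that touched `resource`, oldest first."""
    matching = (e for e in events if e.resource == resource)
    return sorted(matching, key=lambda e: e.timestamp)


def violations(events, resource, allowed_actors):
    """Return events on `resource` performed by actors outside the allowed set."""
    return [e for e in trace(events, resource) if e.actor not in allowed_actors]


# Hypothetical event log, deliberately out of chronological order.
log = [
    ProvEvent("backup-service", "read", "vm-42/disk", 30),
    ProvEvent("tenant-a", "write", "vm-42/disk", 10),
    ProvEvent("operator-7", "snapshot", "vm-42/disk", 20),
]

history = trace(log, "vm-42/disk")
flagged = violations(log, "vm-42/disk", {"tenant-a", "backup-service"})

print([e.actor for e in history])   # ['tenant-a', 'operator-7', 'backup-service']
print([e.action for e in flagged])  # ['snapshot']
```

Here the "policy" is just a set of permitted actors; a real policy language would express far richer conditions over the provenance graph.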
Abstract:
The last decades have been characterized by a continuous adoption of IT solutions in the healthcare sector, which has resulted in the proliferation of tremendous amounts of data over heterogeneous systems. Distinct data types are currently generated, manipulated, and stored in the several institutions where patients are treated. Sharing these data and providing integrated access to them will allow relevant knowledge to be extracted, which can lead to better diagnostics and treatments. This thesis proposes new integration models for gathering information and extracting knowledge from multiple and heterogeneous biomedical sources. The complexity of the scenario led us to split the integration problem according to the data type and to the usage specificity. The first contribution is a cloud-based architecture for exchanging medical imaging services. It offers a simplified registration mechanism for providers and services, promotes remote data access, and facilitates the integration of distributed data sources. Moreover, it is compliant with international standards, ensuring the platform's interoperability with current medical imaging devices. The second proposal is a sensor-based architecture for the integration of electronic health records. It follows a federated integration model and aims to provide a scalable solution to search and retrieve data from multiple information systems. The last contribution is an open architecture for gathering patient-level data from dispersed and heterogeneous databases. All the proposed solutions were deployed and validated in real-world use cases.
Abstract:
The ability to estimate the impact of ongoing climate change on the hydrological behaviour of hydrosystems is necessary for anticipating the inevitable adaptations that our societies must consider. In this context, this doctoral project presents a study assessing the sensitivity of future hydrological projections to: (i) the non-robustness of hydrological model parameter identification, (ii) the use of several equifinal parameter sets, and (iii) the use of different hydrological model structures. To quantify the impact of the first source of uncertainty on model outputs, four climatically contrasted sub-periods are first identified within the observed records. The models are calibrated on each of these four periods, and the resulting outputs are analysed in calibration and in validation following the four configurations of the Differential Split-Sample Test (Klemeš, 1986; Wilby, 2005; Seiller et al., 2012; Refsgaard et al., 2014). To study the second source of uncertainty, related to model structure, the equifinality of parameter sets is then taken into account by considering, for each type of calibration, the outputs associated with equifinal parameter sets. Finally, to evaluate the third source of uncertainty, five hydrological models of different levels of complexity (GR4J, MORDOR, HSAMI, SWAT and HYDROTEL) are applied to the Au Saumon River catchment in Quebec. The three sources of uncertainty are evaluated under both past observed climate conditions and future climate conditions. The results show that, given the evaluation method followed in this thesis, the use of hydrological models of different levels of complexity is the main source of variability in streamflow projections under future climate conditions.
This is followed by the lack of robustness in parameter identification. The hydrological projections generated by an ensemble of equifinal parameter sets are close to those associated with the optimal parameter set. Consequently, more effort should be invested in improving model robustness for climate change impact studies, notably by developing more appropriate model structures and by proposing calibration procedures that increase their robustness. This work provides a detailed answer regarding our ability to diagnose the impacts of climate change on the water resources of the Au Saumon catchment, and proposes an original methodological approach that can be directly applied or adapted to other hydro-climatic contexts.
Abstract:
This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed context-aware mobile application. The investigation focuses on reducing the work required to integrate new sensed context streams into an existing context-aware architecture. Current architectures require integration work for each new stream and new context that is encountered. This means of operation is acceptable for current fixed architectures. However, as systems become more mobile, the number of discoverable streams increases. Without the ability to discover and use these new streams, the functionality of any given device will be limited to the streams that it knows how to decode. The integration of new streams requires that the sensed context data be understood by the current application. If the new source provides data of a type that an application currently requires, then the new source should be connected to the application without any prior knowledge of the new source. If the type is similar and can be converted, then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) for semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. Such applications must handle input from a variety of sensors, refining the data locally and managing its communication from the device in volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls. This local control can determine how the information is communicated to others. This investigation focuses on the evaluation of three approaches to sensor data management. The first system is characterised by its static management based on prepended metadata. This was the reference system.
Developed for a mobile system, it processed the data based on the attached metadata. The code that performed the processing was static. The second system was developed to move away from static processing and to introduce greater freedom in handling the data stream; this resulted in a heavyweight approach. The approach focused on pushing the processing of the data into a number of networked nodes rather than using the monolithic design of the previous system. By creating a separate communication channel for the metadata, it is possible to be more flexible with the amount and type of data transmitted. The final system pulled the benefits of the other systems together. By providing a small management class that would load a separate handler based on the incoming data, dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches. The first is a quantitative analysis of the code to understand the relative complexity of the three systems. This was done by evaluating what changes to the system were involved for the new context. The second approach takes a qualitative view of the work required by the software engineer to reconfigure the systems to provide support for a new data stream. The evaluation highlights the various scenarios to which the three systems are best suited. There is always a trade-off in the development of a system, and the three approaches highlight this fact. A statically bound system can be quick to develop but may need to be completely rewritten if the requirements move too far. Alternatively, a highly dynamic system may be able to cope with new requirements, but the developer time needed to create such a system may be greater than that needed to create several simpler systems.
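The third design, a small manager that selects a handler from the incoming data, can be sketched as a registry lookup. This is an illustrative assumption rather than the thesis's implementation: the `HANDLERS` table, the `handler` decorator, the `manage` function and the stream type names are all hypothetical.

```python
# Hypothetical registry mapping a stream type name to the function that decodes it.
HANDLERS = {}


def handler(stream_type):
    """Decorator that registers a function as the handler for one stream type."""
    def register(func):
        HANDLERS[stream_type] = func
        return func
    return register


@handler("temperature_c")
def handle_celsius(payload):
    # Data already in the type the application requires.
    return {"kind": "temperature", "celsius": float(payload)}


@handler("temperature_f")
def handle_fahrenheit(payload):
    # A similar type that can be converted to the required one.
    return {"kind": "temperature", "celsius": (float(payload) - 32.0) * 5.0 / 9.0}


def manage(stream_type, payload):
    """Dispatch a reading to its handler; return None for unknown stream types."""
    func = HANDLERS.get(stream_type)
    if func is None:
        return None
    return func(payload)


reading = manage("temperature_f", "212")
unknown = manage("humidity", "40")
```

Supporting a new stream then means registering one more handler, while `manage` itself stays unchanged, which is the kind of trade-off between dynamism and readability the abstract describes.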
Abstract:
Due to the growth of design size and complexity, design verification is an important aspect of the logic circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation-based techniques. Using models to replicate the behaviour of an actual system is called simulation. In this thesis, a software/data structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call this system ZSIM. The ZSIM simulator targets low-cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and two other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to:
• Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions (section 5.3.1).
• Verify that, on sufficiently large circuits, substantial gains could be made from multicore parallelism (section 5.3.2).
• Show that a simulator using this approach outperforms an existing commercial simulator on a standard workstation (section 5.3.3).
• Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive supercomputers (section 5.3.5).
To evaluate ZSIM, two types of test circuits were used:
1. Circuits from the IWLS benchmark suite [1], which allow direct comparison with other published studies of parallel simulators.
2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits. The synthesizer allowed testing of a range of very large circuits, larger than the ones for which it was possible to obtain open source files.
The experimental results show that, with SIMD acceleration and multicore, ZSIM gained a peak parallelisation factor of 300 on the Intel Xeon Phi and 11 on the Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelisation gain of 10 on the Intel Xeon Phi and 4 on the Intel Xeon. Furthermore, it was shown that this simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than, a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that the ZSIM simulator running on a Xeon Phi machine gives simulation performance comparable to the IBM Blue Gene supercomputer at very much lower cost. The experimental results have shown that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting the Xeon Phi architecture, the automatic cache management of the Xeon Phi handles and manages the on-chip local store without any explicit mention of the local store being made in the architecture of the simulator itself. When targeting GPUs, however, explicit cache management in the program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability. Note that the same code was tested on both AMD and Xeon Phi machines. The same architecture that performs efficiently on the Xeon Phi was ported to a 64-core NUMA AMD Opteron. To conclude, the two main achievements are restated as follows. The primary achievement of this work was proving that the ZSIM architecture was faster than previously published logic simulators on low-cost platforms.
The secondary achievement was the development of a synthetic testing suite that went beyond the scale range that was previously publicly available, based on prior work that showed the synthesis technique is valid.
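One common way to simulate gates without locks, sketched below, is to store the circuit as flat index arrays and compute each step from a read-only copy of the previous state, so every gate writes only its own output slot. This is an assumption made for illustration, not a description of ZSIM's actual data structure; the net numbering, `step`, `simulate` and the full-adder netlist are all hypothetical.

```python
import operator

# Two-input gate types used in this sketch.
GATE_OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


def step(kinds, in_a, in_b, n_inputs, state):
    """Compute the next state from the previous one.

    Nets 0..n_inputs-1 are primary inputs; gate g drives net n_inputs + g.
    Operands are gathered by index from the old state, which is never
    modified, so no gate can observe a half-updated value.
    """
    a = [state[i] for i in in_a]  # gather first operands
    b = [state[i] for i in in_b]  # gather second operands
    outputs = [GATE_OPS[k](x, y) for k, x, y in zip(kinds, a, b)]
    return state[:n_inputs] + outputs


def simulate(kinds, in_a, in_b, inputs, steps):
    """Run `steps` synchronous updates starting with all gate outputs at 0."""
    state = list(inputs) + [0] * len(kinds)
    for _ in range(steps):
        state = step(kinds, in_a, in_b, len(inputs), state)
    return state


# Full adder: inputs a, b, cin on nets 0, 1, 2.
# net 3 = a XOR b, net 4 = net3 XOR cin (sum), net 5 = a AND b,
# net 6 = net3 AND cin, net 7 = net5 OR net6 (carry out).
KINDS = ["XOR", "XOR", "AND", "AND", "OR"]
IN_A = [0, 3, 0, 3, 5]
IN_B = [1, 2, 1, 2, 6]

# The circuit is three gates deep, so three steps are enough to settle.
final = simulate(KINDS, IN_A, IN_B, [1, 1, 1], steps=3)
total, carry = final[4], final[7]
```

The two list comprehensions in `step` play the role that gather instructions play on SIMD hardware: operands are collected by index into contiguous arrays before the gate operation is applied to all of them.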
Abstract:
The increasing use of fossil fuels, in step with the demographic explosion of cities, has a large environmental impact on society. To mitigate these impacts, regulatory requirements have positively influenced both the environmental consciousness of society and the strategic behavior of businesses. Along with this environmental awareness, regulatory bodies have gained ground and formulated new laws to control potentially polluting activities, mostly in the gas station sector. To increase its market competitiveness, this sector needs to respond quickly to internal and external pressures, adapting strategically to the new standards required to obtain the Green Badge. Gas stations have incorporated new strategies to attract and retain customers who present increasing social demands. In the social dimension, these projects help the local economy by generating jobs and distributing income. This research aims to align the social, economic and environmental dimensions in order to set sustainable performance indicators for the gas station sector in the city of Natal/RN. The Sustainable Balanced Scorecard (SBSC) framework was created with a set of indicators for mapping the production process of gas stations. This mapping aimed at identifying operational inefficiencies through multidimensional indicators. To carry out this research, a system for evaluating sustainability performance was developed, applying Data Envelopment Analysis (DEA) in a quantitative approach to detect the system's efficiency level. In order to understand the systemic complexity, sub-organizational processes were analyzed with Network Data Envelopment Analysis (NDEA), examining their micro-activities to identify and diagnose the real causes of overall inefficiency.
The sample comprised 33 gas stations, and the conceptual model included 15 indicators distributed across the three dimensions of sustainability: social, environmental and economic. These three dimensions were measured by means of classical input-oriented DEA-CCR models. To unify the performance scores of the individual dimensions, a single grouping index was designed based upon two means: arithmetic and weighted. After this, another analysis was performed to measure the four perspectives of the SBSC (learning and growth, internal processes, customers, and financial), unifying them by averaging the performance scores. NDEA results showed that no company achieved excellence in sustainability performance. Some gas stations with higher NDEA efficiency proved to be inefficient under certain perspectives of the SBSC. Next, a comparative analysis and assessment of sustainable performance among the gas stations was carried out, enabling entrepreneurs to evaluate their performance against market competitors. Diagnoses were also obtained to support the decision making of entrepreneurs in improving the management of organizational resources and to provide guidelines for the regulators. Finally, the average index of sustainable performance was 69.42%, representing the gas stations' efforts towards environmental suitability. These results point to a significant awareness in this segment, which still needs further action to enhance sustainability in the long term.
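The grouping index built from an arithmetic and a weighted mean can be illustrated with a short sketch. The dimension scores and weights below are invented for the example and are not taken from the study; `arithmetic_index` and `weighted_index` are hypothetical helper names.

```python
def arithmetic_index(scores):
    """Unweighted mean of the per-dimension efficiency scores."""
    return sum(scores.values()) / len(scores)


def weighted_index(scores, weights):
    """Mean of the per-dimension scores, weighted by dimension importance."""
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical efficiency scores for one gas station, on a 0..1 scale.
station_scores = {"social": 0.5, "environmental": 1.0, "economic": 0.75}
# Hypothetical weights that count the environmental dimension twice.
dimension_weights = {"social": 1, "environmental": 2, "economic": 1}

plain = arithmetic_index(station_scores)
weighted = weighted_index(station_scores, dimension_weights)
print(plain, weighted)  # 0.75 0.8125
```

The two indices diverge whenever a heavily weighted dimension scores above or below the others, which is why reporting both can be informative.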
Abstract:
The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level-Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications are rendering administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would help substantially reduce data center management complexity. We specifically addressed two crucial data center operations. First, we precisely estimated capacity requirements of client virtual machines (VMs) while renting server space in a cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are multifold. Cloud users can size their VMs appropriately and pay only for the resources that they need; service providers can also offer a new charging model based on the VMs' performance instead of their configured sizes. As a result, clients will pay exactly for the performance they are actually experiencing; on the other hand, administrators will be able to maximize their total revenue by utilizing application performance models and SLAs. This thesis made the following contributions. First, we identified resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment.
Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Network and Support Vector Machine, to accurately model the performance of virtualized applications. Moreover, we suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these modeling tools. Third, we presented an approach to optimal VM sizing by employing the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm which maximizes the SLA-generated revenue for a data center.
Abstract:
Background: This study is part of an interactive improvement intervention aimed at facilitating empowerment-based chronic kidney care using data from persons with CKD and their family members. There are many challenges to implementing empowerment-based care, and it is therefore necessary to study the implementation process. The aim of this study was to generate knowledge regarding the implementation process of an improvement intervention of empowerment for those who require chronic kidney care. Methods: A prospective single qualitative case study was chosen to follow the process of the implementation over a two-year period. Twelve health care professionals were selected based on their various role(s) in the implementation of the improvement intervention. Data collection comprised digitally recorded project group meetings, field notes of the meetings, and individual interviews before and after the improvement project. These multiple data were analyzed using qualitative latent content analysis. Results: Two facilitator themes emerged: Moving spirit and Encouragement. The healthcare professionals described a willingness to individualize care and to increase their professional development in the field of chronic kidney care. The implementation process was strongly reinforced by both the researchers working interactively with the staff, and the project group. One theme emerged as a barrier: the Limitations of the organization. Changes in the organization hindered the implementation of the intervention throughout the study period, and the lack of interplay in the organization most impeded the process. Conclusions: The findings indicated the complexity of maintaining a sustainable and lasting implementation over a period of two years. Implementing empowerment-based care was found to be facilitated by the cooperation between all involved healthcare professionals.
Furthermore, long-term improvement interventions need strong encouragement from all levels of the organization to maintain engagement, even when it is initiated by the health care professionals themselves.
Abstract:
The advent of omic data production has opened many new perspectives in the quest for modelling complexity in biophysical systems. With the capability of characterizing a complex organism through the patterns of its molecular states, observed at different levels through various omics, a new paradigm of investigation is arising. In this thesis, we investigate the links between perturbations of the human organism, described as the ensemble of crosstalk of its molecular states, and health. Machine learning plays a key role within this picture, both in omic data analysis and model building. We propose and discuss different frameworks developed by the author using machine learning for data reduction, integration, projection on latent features, pattern analysis, classification and clustering of omic data, with a focus on 1H NMR metabolomic spectral data. The aim is to link different levels of omic observations of molecular states, from nanoscale to macroscale, to study perturbations such as diseases and diet interpreted as changes in molecular patterns. The first part of this work focuses on the fingerprinting of diseases, linking cellular and systemic metabolomics with genomics to assess and predict the downstream effects of perturbations all the way down to the enzymatic network. The second part is a set of frameworks and models, developed with 1H NMR metabolomics at their core, to study the exposure of the human organism to diet and food intake in its full complexity, from epidemiological data analysis to molecular characterization of food structure.