952 resultados para Data recovery (Computer science)
Resumo:
Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.
Resumo:
Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.
Resumo:
Ecosystem engineers that increase habitat complexity are keystone species in marine systems, increasing shelter and niche availability, and therefore biodiversity. For example, kelp holdfasts form intricate structures and host the largest number of organisms in kelp ecosystems. However, methods that quantify 3D habitat complexity have only seldom been used in marine habitats, and never in kelp holdfast communities. This study investigated the role of kelp holdfasts (Laminaria hyperborea) in supporting benthic faunal biodiversity. Computer-aided tomography (CT-) scanning was used to quantify the three-dimensional geometrical complexity of holdfasts, including volume, surface area and surface fractal dimension (FD). Additionally, the number of haptera, number of haptera per unit of volume, and age of kelps were estimated. These measurements were compared to faunal biodiversity and community structure, using partial least-squares regression and multivariate ordination. Holdfast volume explained most of the variance observed in biodiversity indices, however all other complexity measures also strongly contributed to the variance observed. Multivariate ordinations further revealed that surface area and haptera per unit of volume accounted for the patterns observed in faunal community structure. Using 3D image analysis, this study makes a strong contribution to elucidate quantitative mechanisms underlying the observed relationship between biodiversity and habitat complexity. Furthermore, the potential of CT-scanning as an ecological tool is demonstrated, and a methodology for its use in future similar studies is established. Such spatially resolved imager analysis could help identify structurally complex areas as biodiversity hotspots, and may support the prioritization of areas for conservation.
Resumo:
Ecosystem engineers that increase habitat complexity are keystone species in marine systems, increasing shelter and niche availability, and therefore biodiversity. For example, kelp holdfasts form intricate structures and host the largest number of organisms in kelp ecosystems. However, methods that quantify 3D habitat complexity have only seldom been used in marine habitats, and never in kelp holdfast communities. This study investigated the role of kelp holdfasts (Laminaria hyperborea) in supporting benthic faunal biodiversity. Computer-aided tomography (CT-) scanning was used to quantify the three-dimensional geometrical complexity of holdfasts, including volume, surface area and surface fractal dimension (FD). Additionally, the number of haptera, number of haptera per unit of volume, and age of kelps were estimated. These measurements were compared to faunal biodiversity and community structure, using partial least-squares regression and multivariate ordination. Holdfast volume explained most of the variance observed in biodiversity indices, however all other complexity measures also strongly contributed to the variance observed. Multivariate ordinations further revealed that surface area and haptera per unit of volume accounted for the patterns observed in faunal community structure. Using 3D image analysis, this study makes a strong contribution to elucidate quantitative mechanisms underlying the observed relationship between biodiversity and habitat complexity. Furthermore, the potential of CT-scanning as an ecological tool is demonstrated, and a methodology for its use in future similar studies is established. Such spatially resolved imager analysis could help identify structurally complex areas as biodiversity hotspots, and may support the prioritization of areas for conservation.
Resumo:
This research paper presents the work on feature recognition, tool path data generation and integration with STEP-NC (AP-238 format) for features having Free form / Irregular Contoured Surface(s) (FICS). Initially, the FICS features are modelled / imported in UG CAD package and a closeness index is generated. This is done by comparing the FICS features with basic B-Splines / Bezier curves / surfaces. Then blending functions are caculated by adopting convolution theorem. Based on the blending functions, contour offsett tool paths are generated and simulated for 5 axis milling environment. Finally, the tool path (CL) data is integrated with STEP-NC (AP-238) format. The tool path algorithm and STEP- NC data is tested with various industrial parts through an automated UFUNC plugin.
Resumo:
Stealthy attackers move patiently through computer networks - taking days, weeks or months to accomplish their objectives in order to avoid detection. As networks scale up in size and speed, monitoring for such attack attempts is increasingly a challenge. This paper presents an efficient monitoring technique for stealthy attacks. It investigates the feasibility of proposed method under number of different test cases and examines how design of the network affects the detection. A methodological way for tracing anonymous stealthy activities to their approximate sources is also presented. The Bayesian fusion along with traffic sampling is employed as a data reduction method. The proposed method has the ability to monitor stealthy activities using 10-20% size sampling rates without degrading the quality of detection.
Resumo:
Abstract: Decision support systems have been widely used for years in companies to gain insights from internal data, thus making successful decisions. Lately, thanks to the increasing availability of open data, these systems are also integrating open data to enrich decision making process with external data. On the other hand, within an open-data scenario, decision support systems can be also useful to decide which data should be opened, not only by considering technical or legal constraints, but other requirements, such as "reusing potential" of data. In this talk, we focus on both issues: (i) open data for decision making, and (ii) decision making for opening data. We will first briefly comment some research problems regarding using open data for decision making. Then, we will give an outline of a novel decision-making approach (based on how open data is being actually used in open-source projects hosted in Github) for supporting open data publication. Bio of the speaker: Jose-Norberto Mazón holds a PhD from the University of Alicante (Spain). He is head of the "Cátedra Telefónica" on Big Data and coordinator of the Computing degree at the University of Alicante. He is also member of the WaKe research group at the University of Alicante. His research work focuses on open data management, data integration and business intelligence within "big data" scenarios, and their application to the tourism domain (smart tourism destinations). He has published his research in international journals, such as Decision Support Systems, Information Sciences, Data & Knowledge Engineering or ACM Transaction on the Web. Finally, he is involved in the open data project in the University of Alicante, including its open data portal at http://datos.ua.es
Resumo:
Abstract Massive Open Online Courses (MOOCs) generate enormous amounts of data. The University of Southampton has run and is running dozens of MOOC instances. The vast amount of data resulting from our MOOCs can provide highly valuable information to all parties involved in the creation and delivery of these courses. However, analysing and visualising such data is a task that not all educators have the time or skills to undertake. The recently developed MOOC Dashboard is a tool aimed at bridging such a gap: it provides reports and visualisations based on the data generated by learners in MOOCs. Speakers Manuel Leon is currently a Lecturer in Online Teaching and Learning in the Institute for Learning Innovation and Development (ILIaD). Adriana Wilde is a Teaching Fellow in Electronics and Computer Science, with research interests in MOOCs and Learning Analytics. Darron Tang (4th Year BEng Computer Science) and Jasmine Cheng (BSc Mathematics & Actuarial Science and starting MSc Data Science shortly) have been working as interns over this Summer (2016) as have been developing the MOOC Dashboard.
Resumo:
Thesis (Master's)--University of Washington, 2016-08
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Abstract not available
Resumo:
Abstract not available
Resumo:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
Resumo:
Responsible Research Data Management (RDM) is a pillar of quality research. In practice good RDM requires the support of a well-functioning Research Data Infrastructure (RDI). One of the challenges the research community is facing is how to fund the management of research data and the required infrastructure. Knowledge Exchange and Science Europe have both defined activities to explore how RDM/RDI are, or can be, funded. Independently they each planned to survey users and providers of data services and on becoming aware of the similar objectives and approaches, the Science Europe Working Group on Research Data and the Knowledge Exchange Research Data expert group joined forces and devised a joint activity to to inform the discussion on the funding of RDM/RDI in Europe.
Resumo:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable on structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with a few structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user specified dimensions, using semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.