124 resultados para big data storage

em Queensland University of Technology - ePrints Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The concept of big data has already outperformed traditional data management efforts in almost all industries. Other instances it has succeeded in obtaining promising results that provide value from large-scale integration and analysis of heterogeneous data sources for example Genomic and proteomic information. Big data analytics have become increasingly important in describing the data sets and analytical techniques in software applications that are so large and complex due to its significant advantages including better business decisions, cost reduction and delivery of new product and services [1]. In a similar context, the health community has experienced not only more complex and large data content, but also information systems that contain a large number of data sources with interrelated and interconnected data attributes. That have resulted in challenging, and highly dynamic environments leading to creation of big data with its enumerate complexities, for instant sharing of information with the expected security requirements of stakeholders. When comparing big data analysis with other sectors, the health sector is still in its early stages. Key challenges include accommodating the volume, velocity and variety of healthcare data with the current deluge of exponential growth. Given the complexity of big data, it is understood that while data storage and accessibility are technically manageable, the implementation of Information Accountability measures to healthcare big data might be a practical solution in support of information security, privacy and traceability measures. Transparency is one important measure that can demonstrate integrity which is a vital factor in the healthcare service. Clarity about performance expectations is considered to be another Information Accountability measure which is necessary to avoid data ambiguity and controversy about interpretation and finally, liability [2]. According to current studies [3] Electronic Health Records (EHR) are key information resources for big data analysis and is also composed of varied co-created values [3]. Common healthcare information originates from and is used by different actors and groups that facilitate understanding of the relationship for other data sources. Consequently, healthcare services often serve as an integrated service bundle. Although a critical requirement in healthcare services and analytics, it is difficult to find a comprehensive set of guidelines to adopt EHR to fulfil the big data analysis requirements. Therefore as a remedy, this research work focus on a systematic approach containing comprehensive guidelines with the accurate data that must be provided to apply and evaluate big data analysis until the necessary decision making requirements are fulfilled to improve quality of healthcare services. Hence, we believe that this approach would subsequently improve quality of life.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Increasingly larger scale applications are generating an unprecedented amount of data. However, the increasing gap between computation and I/O capacity on High End Computing machines makes a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more computing resource contentions with simulations. Such contentions severely damage the performance of simulation on HPE. Since different data processing strategies have different impact on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path. To find out the best strategy to reduce data movement in given situation, we propose a flexible data analytics (FlexAnalytics) framework in this paper. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. FlexAnalytics system enhances the scalability and flexibility of current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases – scientific data compression and remote visualization – have been applied in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that FlexAnalytics framework increases data transition bandwidth and improves the application end-to-end transfer performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big Data and predictive analytics have received significant attention from the media and academic literature throughout the past few years, and it is likely that these emerging technologies will materially impact the mining sector. This short communication argues, however, that these technological forces will probably unfold differently in the mining industry than they have in many other sectors because of significant differences in the marginal cost of data capture and storage. To this end, we offer a brief overview of what Big Data and predictive analytics are, and explain how they are bringing about changes in a broad range of sectors. We discuss the “N=all” approach to data collection being promoted by many consultants and technology vendors in the marketplace but, by considering the economic and technical realities of data acquisition and storage, we then explain why a “n « all” data collection strategy probably makes more sense for the mining sector. Finally, towards shaping the industry’s policies with regards to technology-related investments in this area, we conclude by putting forward a conceptual model for leveraging Big Data tools and analytical techniques that is a more appropriate fit for the mining sector.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present rate of technological advance continues to place significant demands on data storage devices. The sheer amount of digital data being generated each year along with consumer expectations, fuels these demands. At present, most digital data is stored magnetically, in the form of hard disk drives or on magnetic tape. The increase in areal density (AD) of magnetic hard disk drives over the past 50 years has been of the order of 100 million times, and current devices are storing data at ADs of the order of hundreds of gigabits per square inch. However, it has been known for some time that the progress in this form of data storage is approaching fundamental limits. The main limitation relates to the lower size limit that an individual bit can have for stable storage. Various techniques for overcoming these fundamental limits are currently the focus of considerable research effort. Most attempt to improve current data storage methods, or modify these slightly for higher density storage. Alternatively, three dimensional optical data storage is a promising field for the information storage needs of the future, offering very high density, high speed memory. There are two ways in which data may be recorded in a three dimensional optical medium; either bit-by-bit (similar in principle to an optical disc medium such as CD or DVD) or by using pages of bit data. Bit-by-bit techniques for three dimensional storage offer high density but are inherently slow due to the serial nature of data access. Page-based techniques, where a two-dimensional page of data bits is written in one write operation, can offer significantly higher data rates, due to their parallel nature. Holographic Data Storage (HDS) is one such page-oriented optical memory technique. This field of research has been active for several decades, but with few commercial products presently available. Another page-oriented optical memory technique involves recording pages of data as phase masks in a photorefractive medium. A photorefractive material is one by which the refractive index can be modified by light of the appropriate wavelength and intensity, and this property can be used to store information in these materials. In phase mask storage, two dimensional pages of data are recorded into a photorefractive crystal, as refractive index changes in the medium. A low-intensity readout beam propagating through the medium will have its intensity profile modified by these refractive index changes and a CCD camera can be used to monitor the readout beam, and thus read the stored data. The main aim of this research was to investigate data storage using phase masks in the photorefractive crystal, lithium niobate (LiNbO3). Firstly the experimental methods for storing the two dimensional pages of data (a set of vertical stripes of varying lengths) in the medium are presented. The laser beam used for writing, whose intensity profile is modified by an amplitudemask which contains a pattern of the information to be stored, illuminates the lithium niobate crystal and the photorefractive effect causes the patterns to be stored as refractive index changes in the medium. These patterns are read out non-destructively using a low intensity probe beam and a CCD camera. A common complication of information storage in photorefractive crystals is the issue of destructive readout. This is a problem particularly for holographic data storage, where the readout beam should be at the same wavelength as the beam used for writing. Since the charge carriers in the medium are still sensitive to the read light field, the readout beam erases the stored information. A method to avoid this is by using thermal fixing. Here the photorefractive medium is heated to temperatures above 150�C; this process forms an ionic grating in the medium. This ionic grating is insensitive to the readout beam and therefore the information is not erased during readout. A non-contact method for determining temperature change in a lithium niobate crystal is presented in this thesis. The temperature-dependent birefringent properties of the medium cause intensity oscillations to be observed for a beam propagating through the medium during a change in temperature. It is shown that each oscillation corresponds to a particular temperature change, and by counting the number of oscillations observed, the temperature change of the medium can be deduced. The presented technique for measuring temperature change could easily be applied to a situation where thermal fixing of data in a photorefractive medium is required. Furthermore, by using an expanded beam and monitoring the intensity oscillations over a wide region, it is shown that the temperature in various locations of the crystal can be monitored simultaneously. This technique could be used to deduce temperature gradients in the medium. It is shown that the three dimensional nature of the recording medium causes interesting degradation effects to occur when the patterns are written for a longer-than-optimal time. This degradation results in the splitting of the vertical stripes in the data pattern, and for long writing exposure times this process can result in the complete deterioration of the information in the medium. It is shown in that simply by using incoherent illumination, the original pattern can be recovered from the degraded state. The reason for the recovery is that the refractive index changes causing the degradation are of a smaller magnitude since they are induced by the write field components scattered from the written structures. During incoherent erasure, the lower magnitude refractive index changes are neutralised first, allowing the original pattern to be recovered. The degradation process is shown to be reversed during the recovery process, and a simple relationship is found relating the time at which particular features appear during degradation and recovery. A further outcome of this work is that the minimum stripe width of 30 ìm is required for accurate storage and recovery of the information in the medium, any size smaller than this results in incomplete recovery. The degradation and recovery process could be applied to an application in image scrambling or cryptography for optical information storage. A two dimensional numerical model based on the finite-difference beam propagation method (FD-BPM) is presented and used to gain insight into the pattern storage process. The model shows that the degradation of the patterns is due to the complicated path taken by the write beam as it propagates through the crystal, and in particular the scattering of this beam from the induced refractive index structures in the medium. The model indicates that the highest quality pattern storage would be achieved with a thin 0.5 mm medium; however this type of medium would also remove the degradation property of the patterns and the subsequent recovery process. To overcome the simplistic treatment of the refractive index change in the FD-BPM model, a fully three dimensional photorefractive model developed by Devaux is presented. This model shows significant insight into the pattern storage, particularly for the degradation and recovery process, and confirms the theory that the recovery of the degraded patterns is possible since the refractive index changes responsible for the degradation are of a smaller magnitude. Finally, detailed analysis of the pattern formation and degradation dynamics for periodic patterns of various periodicities is presented. It is shown that stripe widths in the write beam of greater than 150 ìm result in the formation of different types of refractive index changes, compared with the stripes of smaller widths. As a result, it is shown that the pattern storage method discussed in this thesis has an upper feature size limit of 150 ìm, for accurate and reliable pattern storage.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big data is big news in almost every sector including crisis communication. However, not everyone has access to big data and even if we have access to big data, we often do not have necessary tools to analyze and cross reference such a large data set. Therefore this paper looks at patterns in small data sets that we have ability to collect with our current tools to understand if we can find actionable information from what we already have. We have analyzed 164390 tweets collected during 2011 earthquake to find out what type of location specific information people mention in their tweet and when do they talk about that. Based on our analysis we find that even a small data set that has far less data than a big data set can be useful to find priority disaster specific areas quickly.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Summary: More than ever before contemporary societies are characterised by the huge amounts of data being transferred. Authorities, companies, academia and other stakeholders refer to Big Data when discussing the importance of large and complex datasets and developing possible solutions for their use. Big Data promises to be the next frontier of innovation for institutions and individuals, yet it also offers possibilities to predict and influence human behaviour with ever-greater precision

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main theme of this thesis is to allow the users of cloud services to outsource their data without the need to trust the cloud provider. The method is based on combining existing proof-of-storage schemes with distance-bounding protocols. Specifically, cloud customers will be able to verify the confidentiality, integrity, availability, fairness (or mutual non-repudiation), data freshness, geographic assurance and replication of their stored data directly, without having to rely on the word of the cloud provider.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big Data is a rising IT trend similar to cloud computing, social networking or ubiquitous computing. Big Data can offer beneficial scenarios in the e-health arena. However, one of the scenarios can be that Big Data needs to be kept secured for a long period of time in order to gain its benefits such as finding cures for infectious diseases and protecting patient privacy. From this connection, it is beneficial to analyse Big Data to make meaningful information while the data is stored securely. Therefore, the analysis of various database encryption techniques is essential. In this study, we simulated 3 types of technical environments, namely, Plain-text, Microsoft Built-in Encryption, and custom Advanced Encryption Standard, using Bucket Index in Data-as-a-Service. The results showed that custom AES-DaaS has a faster range query response time than MS built-in encryption. Furthermore, while carrying out the scalability test, we acknowledged that there are performance thresholds depending on physical IT resources. Therefore, for the purpose of efficient Big Data management in eHealth it is noteworthy to examine their scalability limits as well even if it is under a cloud computing environment. In addition, when designing an e-health database, both patient privacy and system performance needs to be dealt as top priorities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The promise of ‘big data’ has generated a significant deal of interest in the development of new approaches to research in the humanities and social sciences, as well as a range of important critical interventions which warn of an unquestioned rush to ‘big data’. Drawing on the experiences made in developing innovative ‘big data’ approaches to social media research, this paper examines some of the repercussions for the scholarly research and publication practices of those researchers who do pursue the path of ‘big data’–centric investigation in their work. As researchers import the tools and methods of highly quantitative, statistical analysis from the ‘hard’ sciences into computational, digital humanities research, must they also subscribe to the language and assumptions underlying such ‘scientificity’? If so, how does this affect the choices made in gathering, processing, analysing, and disseminating the outcomes of digital humanities research? In particular, is there a need to rethink the forms and formats of publishing scholarly work in order to enable the rigorous scrutiny and replicability of research outcomes?

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acoustic sensing is a promising approach to scaling faunal biodiversity monitoring. Scaling the analysis of audio collected by acoustic sensors is a big data problem. Standard approaches for dealing with big acoustic data include automated recognition and crowd based analysis. Automatic methods are fast at processing but hard to rigorously design, whilst manual methods are accurate but slow at processing. In particular, manual methods of acoustic data analysis are constrained by a 1:1 time relationship between the data and its analysts. This constraint is the inherent need to listen to the audio data. This paper demonstrates how the efficiency of crowd sourced sound analysis can be increased by an order of magnitude through the visual inspection of audio visualized as spectrograms. Experimental data suggests that an analysis speedup of 12× is obtainable for suitable types of acoustic analysis, given that only spectrograms are shown.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a ‘noisy’ environment such as contemporary social media, is to collect the pertinent information; be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or give an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the data sets collected and analyzed are preformed, that is they are built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders. The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting which of those need to be immediately placed in front of responders, for manual filtering and possible action. The paper suggests two solutions for this, content analysis and user profiling. In the former case, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand, and the urgency of the information, whilst the latter attempts to identify those users who are either serving as amplifiers of information or are known as an authoritative source. Through these techniques, the information contained in a large dataset could be filtered down to match the expected capacity of emergency responders, and knowledge as to the core keywords or hashtags relating to the current event is constantly refined for future data collection. The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to particular prediction market; tennis betting. As increasing numbers of professional sports men and women create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form and emotions which have the potential to impact on future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, as with American Football (NFL) and Baseball (MLB) has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services are still niche operations, much of the value of information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from twitter user lists and hash tag or keyword monitoring, assessing the value of the source, information, and the prediction markets to which it may relate. The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect and make the tweets available for exploration and analysis. A strategy to respond to changes in the social movement is also required or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list is created — in a way, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement’s presence in Twitter changes over time. We also discuss the opportunities and methods to extract data smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study. The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid in future data collection. The intention is that through the papers presented, and subsequent discussion, the panel will inform the wider research community not only on the objectives and limitations of data collection, live analytics, and filtering, but also on current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern health information systems can generate several exabytes of patient data, the so called "Health Big Data", per year. Many health managers and experts believe that with the data, it is possible to easily discover useful knowledge to improve health policies, increase patient safety and eliminate redundancies and unnecessary costs. The objective of this paper is to discuss the characteristics of Health Big Data as well as the challenges and solutions for health Big Data Analytics (BDA) – the process of extracting knowledge from sets of Health Big Data – and to design and evaluate a pipelined framework for use as a guideline/reference in health BDA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big data is certainly the buzz term in executive networking circles at the moment. Heralded by management consultancies and research organisations alike as the next big thing in business efficiency, it is shooting up the Gartner hype cycle to the giddy heights of the peak of inflated expectations before it tumbles down in to the trough of disillusionment

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One cannot help but be impressed by the inroads that digital oilfield technologies have made into the exploration and production (E&P) industry in the past decade. Today’s production systems can be monitored by “smart” sensors that allow engineers to observe almost any aspect of performance in real time. Our understanding of how reservoirs are behaving has improved considerably since the dawn of this revolution, and the industry has been able to move away from point answers to more holistic “big picture” integrated solutions. Indeed, the industry has already reaped the rewards of many of these kinds of investments. Many billions of dollars of value have been delivered by this heightened awareness of what is going on within our assets and the world around them (Van Den Berg et al. 2010).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Metaphors are a common instrument of human cognition, activated when seeking to make sense of novel and abstract phenomena. In this article we assess some of the values and assumptions encoded in the framing of the term big data, drawing on the framework of conceptual metaphor. We first discuss the terms data and big data and the meanings historically attached to them by different usage communities and then proceed with a discourse analysis of Internet news items about big data. We conclude by characterizing two recurrent framings of the concept: as a natural force to be controlled and as a resource to be consumed.