893 results for Big data


Relevance:

100.00%

Publisher:

Abstract:

Hotel chains have access to a treasure trove of “big data” on individual hotels’ monthly electricity and water consumption. Benchmarked comparisons of hotels within a specific chain create the opportunity to cost-effectively improve the environmental performance of specific hotels. This paper describes a simple approach for using such data to achieve the joint goals of reducing operating expenditure and meeting broader sustainability goals. In recent years, energy economists have used such “big data” to generate insights about energy consumption in the residential, commercial, and industrial sectors. Lessons from these studies are directly applicable to the hotel sector. A hotel’s administrative data provide a “laboratory” for conducting randomized controlled trials to establish what works in enhancing hotel energy efficiency.
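The benchmarking idea in this abstract can be illustrated with a minimal sketch: compare each hotel's monthly consumption against the chain-wide mean and flag outliers for an efficiency audit. The hotel names and readings below are hypothetical, and the one-standard-deviation threshold is an illustrative choice, not the paper's method.

```python
# Minimal sketch of chain-internal benchmarking, assuming hypothetical
# monthly electricity readings (kWh) per hotel; names are illustrative.
from statistics import mean, stdev

monthly_kwh = {
    "hotel_a": 210_000,
    "hotel_b": 185_000,
    "hotel_c": 320_000,  # candidate for an efficiency audit
    "hotel_d": 198_000,
}

values = list(monthly_kwh.values())
mu, sigma = mean(values), stdev(values)

# Flag hotels whose consumption is more than one standard deviation
# above the chain average -- a simple benchmarked comparison.
flagged = {h: round((v - mu) / sigma, 2)
           for h, v in monthly_kwh.items()
           if (v - mu) / sigma > 1.0}
print(flagged)
```

In practice the comparison would be normalised by room count, occupancy, and climate before flagging, but the within-chain z-score captures the core idea.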

Relevance:

100.00%

Publisher:

Abstract:

In today’s big data world, data is being produced in massive volumes, at great velocity, and from a variety of sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (the Internet of Things), social networks, communication networks, and many others. Interactive querying and large-scale analytics are increasingly used to derive value from this big data. A large portion of this data is stored and processed in the Cloud, owing to the several advantages the Cloud provides: scalability, elasticity, availability, low cost of ownership, and overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage, and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully.
I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has traditionally been used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing size (progressive samples) for exploratory querying, which provides data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud.
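The progressive-sampling idea behind such analytics can be sketched in a few lines: shuffle the data once under a fixed seed, then evaluate an aggregate over nested prefixes of increasing size, so early estimates refine deterministically toward the exact answer. This is a generic illustration of sampling-based progressive aggregation, not NOW!'s implementation; the data and sample fractions are made up.

```python
# Minimal sketch of progressive aggregation, in the spirit of
# sampling-based progressive analytics; data and sample sizes are
# illustrative, not the NOW! implementation.
import random

random.seed(42)                       # deterministic, repeatable semantics
data = [random.uniform(0, 100) for _ in range(100_000)]

def progressive_means(values, fractions=(0.01, 0.1, 0.5, 1.0)):
    """Yield (fraction, estimate) pairs over nested prefixes of a
    single shuffled copy, so each sample contains the previous one."""
    shuffled = values[:]              # shuffle once; prefixes are nested
    random.shuffle(shuffled)
    for f in fractions:
        k = max(1, int(len(shuffled) * f))
        yield f, sum(shuffled[:k]) / k

for frac, est in progressive_means(data):
    print(f"{frac:>5.0%} sample -> mean ~ {est:.2f}")
```

Because the samples are nested, work done on an early sample is a strict subset of the work for a later one, which is what makes reuse across samples possible.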
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, and link prediction. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them into distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
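The neighborhood-centric model described above can be illustrated with a toy example: extract each vertex's 1-hop ego network and run a user function over that subgraph, instead of restricting the program to a single vertex's state. The graph and the triangle-counting task below are illustrative and do not reflect NSCALE's actual API.

```python
# Minimal sketch of neighbourhood-centric processing: extract each
# vertex's 1-hop ego network and run a user function on the subgraph.
# The toy graph and triangle-count task are illustrative, not NSCALE's API.
from itertools import combinations

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def ego_network(center):
    """Vertices and edges of the 1-hop neighbourhood around `center`."""
    nodes = {center} | adj[center]
    return nodes, {(u, v) for u, v in combinations(sorted(nodes), 2)
                   if v in adj.get(u, set())}

def triangles_at(center):
    """User program written at the subgraph level: count triangles
    through `center` inside its ego network."""
    _, ego_edges = ego_network(center)
    return sum(1 for u, v in ego_edges
               if u != center and v != center
               and u in adj[center] and v in adj[center])

triangle_counts = {v: triangles_at(v) for v in adj}
print(triangle_counts)
```

The user function sees a whole subgraph at once, which is exactly what vertex-centric frameworks make awkward and what declarative subgraph extraction makes cheap.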

Relevance:

100.00%

Publisher:

Abstract:

The mainstay of Big Data is prediction: it allows practitioners, researchers, and policy analysts to predict trends based upon the analysis of large and varied sources of data, ranging from shifting social and political opinions to patterns in crime and consumer behaviour. Big Data has therefore shifted the criterion of success in science from causal explanation to predictive modelling and simulation. Nineteenth-century science sought to capture phenomena and show their appearance through causal mechanisms, while twentieth-century science attempted to save the appearances and relinquish causal explanations. Now twenty-first-century science, in the form of Big Data, is concerned with the prediction of appearances and nothing more. However, this pulls social science back towards a rule- or law-governed model of reality and away from a consideration of the internal nature of rules in relation to various practices. In effect, Big Data offers us no more than a world of surface appearance, and in doing so it makes any context-specific conceptual sensitivity disappear.

Relevance:

100.00%

Publisher:

Abstract:

Relational databases have been the dominant approach in large database systems since the 1980s. Over the past decade, nearly all industrial and personal information exchange has moved into the digital world. This has caused an enormous growth in data volumes, and the same growth continues exponentially. At the same time, non-relational databases, i.e. NoSQL databases, have risen to a prominent position. Many organisations handle large amounts of unstructured data, in which case a traditional relational database alone may not be the best, or even a sufficient, option. The shift in internet culture behind the term Web 2.0 favours the more adaptable and scalable NoSQL systems. Internet users, especially on social media, produce vast amounts of unstructured data. The information collected is no longer shaped according to a fixed model; a single record may include, for example, images, videos, references to instances created by other users, or address information. This thesis examines the architecture of NoSQL systems and their role, especially in large information systems, and compares their advantages and disadvantages relative to relational databases.
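The schema flexibility described above can be sketched with a document-style store, where records in the same collection need not share a fixed shape. A plain dict stands in here for a real NoSQL engine, and all record names and fields are illustrative.

```python
# Minimal sketch of schema-free, document-style storage: a plain dict
# stands in for a real NoSQL engine; records and fields are illustrative.
store = {}

def put(doc_id, document):
    store[doc_id] = document      # no schema enforced on write

def get(doc_id):
    return store.get(doc_id)

# Two records in the same "collection" with different shapes: one has
# images and a reference to another user's record, the other an address.
put("post:1", {"user": "alice", "text": "hello",
               "images": ["a.png"], "ref": "post:2"})
put("post:2", {"user": "bob", "address": {"city": "Helsinki"}})

print(get("post:1")["images"], get("post:2")["address"]["city"])
```

A relational schema would force both records into one table (with many NULL columns) or into several joined tables; the document model simply stores each record as it arrives, which is the trade-off the thesis compares.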

Relevance:

100.00%

Publisher:

Abstract:

The speed with which data has moved from being scarce, expensive and valuable (thus justifying detailed and careful verification and analysis) to a situation where streams of detailed data are almost too large to handle has caused a series of shifts. Legal systems already have severe problems keeping up with, or even in touch with, the rate at which unexpected outcomes flow from information technology. Until recently, Big Data applications were driven by the capacity to harness massive quantities of existing data. Now real-time data flows are rising swiftly, becoming more invasive, and offering monitoring potential that is eagerly sought by commerce and government alike. The ambiguities as to who owns this often remarkably intrusive personal data need to be resolved, and rapidly, but resolution is likely to encounter rising resistance from industrial and commercial bodies who see this data flow as ‘theirs’. There have been many changes in ICT that have led to stresses in resolving the conflicts between IP exploiters and their customers, but this one is of a different scale, owing to the wide potential for individual customisation of pricing and identification, and to the rising commercial value of integrated streams of diverse personal data. A new reconciliation between the parties involved is needed: new business models, and a shift in the current confusion over who owns what data into alignments that better accord with community expectations. After all, they are the customers, and the emergence of information monopolies needs to be balanced by appropriate consumer/subject rights. This will be a difficult discussion, but one that is needed to realise the great benefits clearly available to all if these issues can be positively resolved. The customers need to make these data flows contestable in some form. These Big Data flows are only going to grow and become ever more instructive.
A better balance is necessary. For the first time these changes are directly affecting the governance of democracies, as the very effective micro-targeting tools deployed in recent elections have shown. Yet the data gathered is not available to its subjects. This is not a survivable social model. The Private Data Commons needs our help. Businesses and governments exploit big data without regard for issues of legality, data quality, disparate data meanings, and process quality. This often results in poor decisions, with individuals bearing the greatest risk. The threats harbored by big data extend far beyond the individual, however, and call for new legal structures, business processes, and concepts such as a Private Data Commons. This Web extra is the audio part of a video in which author Marcus Wigan expands on his article "Big Data's Big Unintended Consequences" and discusses these themes.

Relevance:

100.00%

Publisher:

Abstract:

66 p.

Relevance:

100.00%

Publisher:

Abstract:

Paper presented at the 44th SEFI Conference, 12-15 September 2016, Tampere, Finland

Relevance:

100.00%

Publisher:

Abstract:

While governments are capturing as much data as they can about us for security reasons, private companies are doing the same in a practice that has become known as “Big Data”. Martin Hirst (p.19) explains how the data we generate is leading to a “surveillance economy” in which businesses build data profiles that enable them to target their advertising at people more effectively, while Benjamin Shiller (p.22) explains how your online purchasing decisions will enable merchants to alter their prices to extract the maximum value from individual customers.

Relevance:

100.00%

Publisher:

Abstract:

The intersection of network function virtualisation (NFV) technologies and big data has the potential to revolutionise today's telecommunication networks, from deployment to operations, resulting in significant reductions in capital expenditure (CAPEX) and operational expenditure, as well as cloud-vendor opportunities and additional revenue growth for operators. One contribution of this article is a comparison of the requirements for big data and network virtualisation, and the formulation of key performance indicators for distributed big data NFVs on an operator's infrastructure. Big data and virtualisation are highly interdependent; their intersections and dependencies are analysed, and the potential optimisation gains from open interfaces between big data and carrier-network NFV functional blocks in an adaptive environment are then discussed. Another contribution of this article is a comprehensive discussion of open-interface recommendations that enable globally collaborative and scalable virtualised big data applications.
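The kind of key performance indicator the article formulates can be sketched generically: collect per-node metrics from virtualised big-data functions and aggregate them into cluster-level KPIs. The metric names, node names, and figures below are hypothetical and not taken from the article.

```python
# Minimal sketch of formulating KPIs over virtualised big-data nodes;
# the metric names and sample figures are hypothetical, not from the article.
nodes = [
    {"name": "vnf-1", "records_s": 120_000, "cpu": 0.62},
    {"name": "vnf-2", "records_s": 95_000,  "cpu": 0.81},
    {"name": "vnf-3", "records_s": 140_000, "cpu": 0.55},
]

kpis = {
    # aggregate ingest rate across the virtualised cluster
    "total_throughput_rps": sum(n["records_s"] for n in nodes),
    # mean CPU utilisation, a proxy for CAPEX/OPEX headroom
    "mean_cpu_util": round(sum(n["cpu"] for n in nodes) / len(nodes), 2),
    # worst-case node, useful for elastic scale-out decisions
    "hottest_node": max(nodes, key=lambda n: n["cpu"])["name"],
}
print(kpis)
```

An open interface between the big-data layer and the NFV orchestrator would expose exactly such per-node metrics, allowing the adaptive scaling the article discusses.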

Relevance:

100.00%

Publisher:

Abstract:

Big data analytics for traffic accidents is a hot topic and has significant value for smart, safe urban traffic. Based on massive traffic-accident data from October 2014 to March 2015 in Xiamen, China, we propose a novel accident-occurrence analytics method in both spatial and temporal dimensions, predicting when and where an accident of a specific crash type will occur, and to whom. First, we analyse and visualise accident occurrences in both temporal and spatial views. Second, we illustrate spatio-temporal visualisation results through two case studies covering multiple road segments, and the impact of weather on crash types. These findings of accident-occurrence analysis and visualisation would not only help the traffic police department make instant personnel assignments among simultaneous accidents, but also inform individual drivers about accident-prone sections and the time spans that require their closest attention.
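The spatio-temporal counting behind such an analysis can be sketched simply: bucket accident records by road segment and hour of day, then surface the most accident-prone (segment, time) combination. The records below are made up for illustration and are not the Xiamen data set.

```python
# Minimal sketch of the spatio-temporal counting behind such analyses:
# bucket accidents by road segment and hour of day to surface
# accident-prone sections and time spans. Records are illustrative.
from collections import Counter

accidents = [  # (road_segment, hour_of_day, crash_type)
    ("Hubin Rd",   8, "rear-end"),
    ("Hubin Rd",   8, "side-swipe"),
    ("Hubin Rd",  18, "rear-end"),
    ("Xianyue Rd", 8, "rear-end"),
    ("Hubin Rd",   9, "rear-end"),
]

by_segment_hour = Counter((seg, hr) for seg, hr, _ in accidents)
hotspot, count = by_segment_hour.most_common(1)[0]
print(f"Most accident-prone: {hotspot[0]} at {hotspot[1]:02d}:00 "
      f"({count} accidents)")
```

The same grouping extended with crash type and weather gives the kind of conditional view the case studies describe.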