3 results for Impala, Hadoop, Big Data, HDFS, Social Business Intelligence, SBI, cloudera

in Cambridge University Engineering Department Publications Database


Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lends itself to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.
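The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions: the kernel value for two objects is taken to be the fraction of sampled random partitions in which both objects land in the same block. The partition sampler used here (nearest of a few randomly chosen centers) is a hypothetical stand-in for illustration; it is not the paper's Random Forest Kernel or Fast Cluster Kernel, and the names `random_partition` and `partition_kernel` are invented for this example.

```python
import random


def random_partition(points, num_centers, rng):
    """Sample one random partition: label each point by its nearest random center.

    Assumption: this sampler is illustrative only, not the paper's procedure.
    """
    centers = rng.sample(points, num_centers)

    def nearest(p):
        # Index of the center with the smallest squared Euclidean distance to p.
        return min(
            range(num_centers),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
        )

    return [nearest(p) for p in points]


def partition_kernel(points, num_partitions=200, num_centers=2, seed=0):
    """Estimate K[i][j] as the fraction of partitions where i and j share a block."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(num_partitions):
        labels = random_partition(points, num_centers, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / num_partitions for c in row] for row in counts]


# Two tight groups of points; within-group pairs should score higher.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
K = partition_kernel(points)
```

Under this assumption, every point always shares a block with itself, so the diagonal of `K` is 1, and the matrix is symmetric because block membership is symmetric. Nearby points share a block more often than distant ones, which is what makes the estimate behave like a similarity measure.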

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper will provide a rationale for developing control systems based on the availability of automated identification (Auto-ID) information. Much of the Auto-ID research has to date focussed on developing the essential infrastructure for dynamically extracting, networking and storing product data. These developments will help to revolutionise the accuracy, quality and timeliness of data acquired by Business Information Systems and should lead to major cost savings and performance improvements as a result. This paper introduces an additional phase of Auto-ID research and development in which the nature of control system decisions is reconsidered in the light of the availability of ubiquitous, unique, item-level information. The paper will: (i) indicate why the availability of ubiquitous, unique, item-level data can enable enhanced and fundamentally different control approaches, and highlight potential benefits from control systems incorporating this Auto-ID data; (ii) demonstrate what is required to develop control systems based around the availability of Auto-ID data; and (iii) outline the research challenges in determining how such systems will be developed.