9 results for Impala, Hadoop, Big Data, HDFS, Social Business Intelligence, SBI, cloudera
in the Bulgarian Digital Mathematics Library at IMI-BAS
Abstract:
Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication and Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor and its non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
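The large p, small n regime named in this abstract can be made concrete with a small sketch: ordinary least squares is ill-posed when p > n (the Gram matrix is singular), while a ridge penalty restores a unique solution. The example below, in plain NumPy, is purely illustrative; the data, dimensions and penalty value are invented for the demonstration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                        # far more features than observations
X = rng.standard_normal((n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.0, 0.5]        # only 3 informative features
y = X @ true_w + 0.01 * rng.standard_normal(n)

# Ordinary least squares fails here: X.T @ X is a singular p x p matrix
# when p > n. Adding the ridge penalty lam * I makes it invertible.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(w_ridge.shape)                 # one coefficient per feature: (50,)
```

The same regularization idea underlies the Penalization and Selection entries in the abstract's tool list: the penalty trades a little bias for a well-posed, lower-variance estimator.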
Abstract:
Sustainable development support, balanced scorecard development and business process modeling are viewed from the standpoint of systemology. The extensional, intentional and potential properties of a system are considered as necessary to satisfy the functional requirements of a meta-system. The correspondence between the extensional, intentional and potential properties of a system and its sustainable, unsustainable, crisis and catastrophic states is determined. The cause of the inaccessibility of the system mission is uncovered. The correspondence between the extensional, intentional and potential properties of a system and the balanced scorecard perspectives is shown. The IDEF0 function modeling method is checked against the balanced scorecard perspectives. The correspondence between balanced scorecard perspectives and IDEF0 notations is considered.
Abstract:
The real purpose of collecting big data is to identify causality, in the hope that this will facilitate credible predictivity. But the search for causality can trap one in an infinite regress, and thus one takes refuge in seeking associations between variables in data sets. Regrettably, the mere knowledge of associations does not enable predictivity. Associations need to be embedded within the framework of probability calculus to make coherent predictions. This is so because associations are a feature of probability models, and hence they do not exist outside the framework of a model. Measures of association, like correlation, regression and mutual information, can merely refute a preconceived model. Estimated measures of association do not lead to a probability model; a model is the product of pure thought. This paper discusses these and other fundamentals that are germane to seeking associations in particular, and machine learning in general. ACM Computing Classification System (1998): H.1.2, H.2.4, G.3.
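The abstract's point that an association only becomes predictive once embedded in a model can be illustrated with a short sketch. The correlation coefficient below is just a number; it is the additional, explicitly assumed linear (bivariate Gaussian) model that turns it into a prediction rule. The data and coefficients are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = 3.0 * x + rng.standard_normal(200)

# An association measure on its own: a single summary number.
r = np.corrcoef(x, y)[0, 1]

# To predict, the association must be embedded in a model. Assuming a
# linear (bivariate Gaussian) model converts r into a regression line:
slope = r * y.std() / x.std()
intercept = y.mean() - slope * x.mean()
y_hat = intercept + slope * x        # predictions exist only under the model
```

Without the modeling assumption, r says nothing about what y should be at a new x; with it, r, the means and the standard deviations fully determine the predictor.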
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2015.
Abstract:
Krasimir Manev, Neli Maneva, Haralambi Haralambiev - The business rules (BR) approach was introduced at the end of the last century to facilitate the specification of enterprise software and to allow it to better satisfy the needs of the corresponding business. Today most of the goals of the approach have been achieved, but efforts, in both research and practice, to establish a "formal basis for the reverse extraction of BR from existing systems" continue. The article presents an approach for extracting BR from program code based on static code analysis methods. Some advantages and disadvantages of such an approach are pointed out.
Abstract:
Open Research Data - a step-by-step guide through the research data lifecycle: data set creation, big data vs. long-tail, metadata, data centres/data repositories, open access for data, data sharing, data citation and publication.
Abstract:
This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored in this research using seven datasets where κ < 1. These techniques require careful tuning, so several extensions of cross-validation were investigated to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
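The tuning-by-cross-validation step described in this abstract can be sketched in a few lines: for each candidate penalty, fit a regularized model on k-1 folds and score it on the held-out fold, then keep the penalty with the lowest average error. This plain-NumPy sketch uses ridge regression and an invented synthetic HDLSS data set; the paper's actual algorithms, data and cross-validation extensions are not reproduced here.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge solution via the regularized normal equations (works for p > n)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean held-out squared error of ridge with penalty lam over k folds."""
    idx = np.arange(X.shape[0])
    err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w = ridge_fit(X[train], y[train], lam)
        err += np.mean((X[fold] @ w - y[fold]) ** 2)
    return err / k

rng = np.random.default_rng(2)
n, p = 40, 200                       # HDLSS: p >> n, so kappa = n/p < 1
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:5] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(n)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: cv_error(X, y, lam))
```

The "special attention to tuning" the abstract mentions shows up here as the choice of grid, fold count and error metric; in the HDLSS regime each fold has very few observations, which is exactly what motivates extended cross-validation schemes.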
Abstract:
The authors analyse some of the research outcomes achieved during the implementation of the EC GUIDE research project "Creating an European Identity Management Architecture for eGovernment", as well as their personal experience. The project goals and achievements are, however, considered in a broader context. The key role of Identity in the Information Society is emphasised, as is the fact that research and development in this field is still in its initial phase. The scope of research related to Identity, including that related to Identity Management and the Interoperability of Identity Management Systems, is expected to be further extended. The authors analyse the abovementioned issues in the context established by the EC European Interoperability Framework (EIF), a reference document on interoperability for the Interoperable Delivery of European eGovernment Services to Public Administrations, Business and Citizens (IDABC) Work Programme. This programme aims at supporting the pan-European delivery of electronic government services.
Abstract:
A formal description is presented of the multidimensional data model implemented in the METAS BI-Platform software suite. The article includes a description of the objects of the multidimensional model (dimensions, dimension sets, etc.), their properties and organization, as well as the operations performed on them. Methods for aggregating multidimensional data are described that allow arrays of numeric measures to be aggregated efficiently. The METAS BI-Platform is intended for the multidimensional analysis of data obtained from heterogeneous sources and simplifies the development of BI applications. The suite is a multi-tier application with a client-server architecture, where each tier corresponds to a degree of data abstraction. At the lowest level are drivers that access specific physical data sources. The next level is a virtual DBMS that provides unified access to the data, removing the need to account for the specifics of a particular DBMS when developing BI applications. A programming interface (API) for the suite has been implemented, and developers are provided with a set of ready-made components that can be used when building BI applications. This makes it possible to develop, on top of the suite, BI applications that meet the modern requirements placed on such systems.
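The core multidimensional operation the abstract describes, rolling a numeric measure up onto a chosen subset of dimensions, can be sketched in a few lines of plain Python. The fact rows, dimension names and values below are hypothetical and have nothing to do with METAS BI-Platform internals; the sketch only illustrates the general aggregation idea.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, year, product)
# plus one numeric measure.
facts = [
    ("EU", 2014, "A", 10.0),
    ("EU", 2014, "B", 5.0),
    ("EU", 2015, "A", 7.0),
    ("US", 2014, "A", 3.0),
]

def aggregate(facts, keep):
    """Sum the measure over all dimensions whose flag in `keep` is False."""
    totals = defaultdict(float)
    for region, year, product, value in facts:
        key = tuple(d for d, k in zip((region, year, product), keep) if k)
        totals[key] += value
    return dict(totals)

# Roll up onto (region,): year and product are summed away.
by_region = aggregate(facts, (True, False, False))
print(by_region)   # {('EU',): 22.0, ('US',): 3.0}
```

A virtual-DBMS layer of the kind the abstract describes would sit below such a function, so that the same roll-up logic runs unchanged regardless of which physical source the fact rows came from.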