6 results for Impala, Hadoop, Big Data, HDFS, Social Business Intelligence, SBI, cloudera

in Chinese Academy of Sciences Institutional Repositories Grid Portal


Relevance: 100.00%

Abstract:

Data on the social organization of two bands of black-and-white snub-nosed monkeys (Rhinopithecus bieti) were collected as the monkeys crossed an open spot at Nanren and Bamei (northwestern Yunnan, China), using a sampling rule in which individuals wit…

Relevance: 100.00%

Abstract:

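After introducing the concept of a data analysis agent, this paper proposes an architecture for the data analysis agent model and discusses its concrete application modes in different types of enterprises: the intra-enterprise agent mode and the extra-enterprise agent mode. It then compares the characteristics of the traditional data analysis model with those of the agent model, and finally gives an example of how the data analysis agent model is put into practice in an enterprise.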

Relevance: 100.00%

Abstract:

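As China's domestic financial industry gradually opens up, China UnionPay faces fierce competition from international bank card organizations, which have advanced IT and extensive management experience and hold a considerable competitive advantage over China UnionPay. To meet this challenge and become a well-known brand, UnionPay must accelerate its informatization and change its business philosophy, shifting from a business-centric to a customer-centric approach. Business intelligence technologies, namely data warehousing and data mining, are an important foundation for this informatization.

This paper first analyzes UnionPay's actual business requirements and, taking its specific business characteristics into account, designs and implements UnionPay's data warehouse system, describing in detail how data warehouse technology is currently applied at UnionPay. The system uses a bus architecture, which gives it good consistency and extensibility. The logical design of the data warehouse follows the dimensional modeling method, which substantially improves query performance; building on the logical design, the paper also carries out the physical design of the data warehouse. The paper further describes in detail the design and implementation of the key component of the data warehouse, the ETL system. This ETL system uses a modular design and is metadata-driven, with strengthened scheduling management and monitoring, which makes the ETL tool more intelligent and adaptable.

Having built the UnionPay data warehouse system, the paper analyzes in detail the OLAP analysis goals that UnionPay's business systems need to achieve and introduces the application of data mining to UnionPay customer classification, making the first attempt to build a customer classification model within the UnionPay data warehouse system. To build this model, clustering is first used to divide customers into groups, and an improved SLIQ algorithm is then used to construct the classification model. The paper improves SLIQ's handling of categorical (symbolic) attributes and its pruning algorithm and compares the results, obtaining a reasonably sound customer classification model that performed well in practice. This provides a practical reference for developing applications on UnionPay's data warehouse system.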
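The abstract says customers are first grouped by clustering before a classifier is trained, but it does not name the clustering algorithm. As a minimal sketch, assuming plain k-means (Lloyd's algorithm) over numeric per-customer features, the grouping step could look like this; the function name `kmeans` and the two toy features are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of numeric feature tuples.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids with k randomly chosen customers.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each customer joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

if __name__ == "__main__":
    # Hypothetical customers: (transactions per month, average amount).
    customers = [(2, 50.0), (3, 60.0), (2, 55.0), (40, 900.0), (42, 950.0), (38, 880.0)]
    labels, centers = kmeans(customers, k=2)
    print(labels, centers)
```

Because k-means uses Euclidean distance, real features would normally be standardized first; otherwise a large-valued feature such as transaction amount dominates the grouping.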

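SLIQ grows a decision tree by scoring candidate splits with the gini index, scanning attribute lists that are sorted once up front while updating class histograms for the two candidate children. The sketch below shows only that baseline step for one numeric attribute at one node; sorting inside the function stands in for SLIQ's one-time presorting, the midpoint threshold is a convention assumed here, and `best_numeric_split` is a hypothetical helper name. The paper's improvements to categorical-attribute handling and pruning are not described in enough detail to reproduce.

```python
from collections import Counter

def gini(hist):
    """Gini index 1 - sum_j p_j**2 of a class histogram."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in hist.values())

def best_numeric_split(values, labels):
    """Return (weighted_gini, threshold) of the best binary split A <= t.

    `below` and `above` are the class histograms of the left and right children.
    """
    pairs = sorted(zip(values, labels))
    below, above = Counter(), Counter(labels)
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(n - 1):
        v, y = pairs[i]
        below[y] += 1   # move record i from the right child to the left child
        above[y] -= 1
        nxt = pairs[i + 1][0]
        if v == nxt:
            continue    # a threshold must fall between distinct values
        n_left, n_right = i + 1, n - i - 1
        score = (n_left * gini(below) + n_right * gini(above)) / n
        if score < best[0]:
            best = (score, (v + nxt) / 2)  # midpoint threshold
    return best

if __name__ == "__main__":
    print(best_numeric_split([20, 25, 30, 45, 50, 60], ["low"] * 3 + ["high"] * 3))
    # (0.0, 37.5): a pure split between 30 and 45
```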
Relevance: 100.00%

Abstract:

Population data collected and stored by administrative region are statistical data. The traditional way of expressing such data spatially, as a uniform distribution within each administrative region, gives population data low spatial and temporal precision. Accurate population data with high spatial resolution are becoming increasingly important for regional planning, environmental protection, policy making, and rural-urban development, and the spatial distribution of population has become an important topic in GIS research. In this article, the author reviews progress in research on the spatial distribution of population. Supported by GIS, related geographical theory, and a grid data model, remote sensing data, terrain data, traffic data, river data, residential-area data, and socio-economic statistics were used to compute the spatial distribution of population in Fujian Province. The work comprises the following parts: (1) Simulation of township-level boundaries. An access cost surface for the study area was constructed from an access cost index, land use data, traffic data, river data, a DEM, and related socio-economic statistics. Using least-cost path queries and a weighted Voronoi diagram, a DVT (Demarcation of Villages and Towns) model was established to simulate township-level boundaries in Fujian Province. (2) Modeling of the spatial distribution of population. Drawing on geographical knowledge, seven impact factors (land use, altitude, slope, residential area, railways, roads, and rivers) were chosen as model parameters. With GIS support, the relationships between population distribution and these factors were analyzed quantitatively, and population density coefficients were calculated at the pixel scale. Finally, a township-level model of population spatial distribution was established by multiplicatively fusing the population density coefficients with the simulated township boundaries.
(3) Error testing and analysis of the modeled population distribution. The author analyzed both the numerical characteristics of the modeling error and its spatial distribution, and discussed the causes of the error.
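The DVT model is described only through its ingredients: an access cost surface, least-cost path queries, and a weighted Voronoi diagram. A minimal sketch of how those ingredients can fit together, under assumptions not taken from the article: accumulated cost distance from each town seat is computed with Dijkstra's algorithm over a cost raster (4-neighbour moves, step cost = mean of the two cells' costs), and each cell is assigned to the seat that minimizes cost distance divided by a per-town weight, i.e. a multiplicatively weighted Voronoi partition in cost space. The weights (for example, town size) and all function names are hypothetical.

```python
import heapq

def cost_distance(cost, source):
    """Accumulated least-cost distance from `source` (row, col) over a 2-D raster."""
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    r0, c0 = source
    dist[r0][c0] = 0.0
    heap = [(0.0, r0, c0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry; a cheaper path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Crossing between two cells costs the mean of their costs.
                nd = d + (cost[r][c] + cost[nr][nc]) / 2
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

def weighted_allocation(cost, towns, weights):
    """Label each cell with the index of the town minimizing cost_distance / weight."""
    surfaces = [cost_distance(cost, t) for t in towns]
    rows, cols = len(cost), len(cost[0])
    return [
        [min(range(len(towns)), key=lambda k: surfaces[k][r][c] / weights[k])
         for c in range(cols)]
        for r in range(rows)
    ]

if __name__ == "__main__":
    print(weighted_allocation([[1, 1, 1, 1]], [(0, 0), (0, 3)], [1, 1]))
    # [[0, 0, 1, 1]]
    print(weighted_allocation([[1, 1, 1, 1, 1]], [(0, 0), (0, 4)], [1, 4]))
    # [[0, 1, 1, 1, 1]]: the heavier town claims more cells
```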

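The final modeling step combines pixel-level density coefficients with the simulated township boundaries. One plausible reading, assumed here rather than taken from the article, is a dasymetric allocation: each pixel's per-factor coefficients are multiplied into a single weight, and each township's reported population is spread over its pixels in proportion to those weights, so that the pixel values sum back to the township total. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from math import prod

def allocate_population(township_of, factor_coeffs, township_pop):
    """Distribute each township's reported total over its pixels.

    township_of[i]   -- township id of pixel i (from the simulated boundaries)
    factor_coeffs[i] -- density coefficients of pixel i, one per impact factor
    township_pop[t]  -- reported population of township t
    """
    # Multiplicative fusion: one combined weight per pixel.
    weights = [prod(cs) for cs in factor_coeffs]
    totals = defaultdict(float)
    for t, w in zip(township_of, weights):
        totals[t] += w
    # Scale so each township's pixels sum to its reported population.
    return [
        township_pop[t] * w / totals[t] if totals[t] > 0 else 0.0
        for t, w in zip(township_of, weights)
    ]

if __name__ == "__main__":
    pixels = allocate_population(
        ["A", "A", "B"],
        [[1.0, 2.0], [1.0, 1.0], [0.5, 4.0]],
        {"A": 300, "B": 50},
    )
    print(pixels)  # [200.0, 100.0, 50.0]
```

With the fused weights in hand, the modeling error could then be examined by comparing the allocated values against independent counts, both numerically and by mapping the residuals.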
Relevance: 100.00%

Abstract:

Based on social survey data collected by local research groups in several Chinese counties over the past five years, the author posed and solved two key problems in social situation forecasting: (i) how individual-level attitude data can be integrated with macro-level social situation data, and (ii) how the predictive power of forecasting models built with different statistical methods can be compared. Five integrative statistics were used: 1) arithmetic mean (MEAN); 2) standard deviation (SD); 3) coefficient of variation (CV); 4) mixed second moment (M2); 5) tendency (TD). To solve the first problem, the five statistics were used to combine individual-level and macro-level social situation data at the county level, forming new integrated datasets. On this basis the author addressed the second problem: Multiple Regression Analysis (MRA), Discriminant Analysis (DA), and Support Vector Machine (SVM) were used to construct several forecasting models. Meta-analysis and power analysis were then used to compare the predictive power of the models within and across modeling methods along three dimensions: stepwise vs. enter variable selection, short-term vs. long-term forecasting, and the different integrative (statistic) models. The dissertation concludes that: 1) there are significant differences among the integrative (statistic) models, with tendency (TD) models having the highest power and coefficient of variation (CV) models the lowest; 2) there is no significant difference in power between stepwise and enter models, nor between short-term and long-term forecasting models; 3) there are significant differences among models built with different methods, with the support vector machine (SVM) having the highest statistical power.
This research lays a comprehensive foundation for exploring optimal social situation forecasting models in greater depth; it is also the first time that meta-analysis and power analysis have been applied to the assessment of such forecasting models.
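The first problem, integrating individual-level attitude data with county-level data, amounts to collapsing respondents' scores into per-county statistics. The sketch below computes three of the five statistics named (MEAN, SD, CV). The abstract does not define the mixed second moment (M2) or tendency (TD), so they are omitted rather than guessed. It assumes the sample standard deviation and hypothetical `(county, score)` records; `integrate_by_county` is an illustrative name.

```python
import statistics
from collections import defaultdict

def integrate_by_county(records):
    """Collapse individual attitude scores into county-level statistics.

    records: iterable of (county, score) pairs, one per respondent.
    Returns {county: {"MEAN": ..., "SD": ..., "CV": ...}}.
    """
    by_county = defaultdict(list)
    for county, score in records:
        by_county[county].append(score)
    out = {}
    for county, scores in by_county.items():
        mean = statistics.mean(scores)
        # Sample standard deviation; undefined for a single respondent, so use 0.0.
        sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
        # Coefficient of variation = SD / MEAN (NaN when the mean is zero).
        cv = sd / mean if mean else float("nan")
        out[county] = {"MEAN": mean, "SD": sd, "CV": cv}
    return out

if __name__ == "__main__":
    survey = [("X", 2), ("X", 4), ("Y", 3), ("Y", 3)]
    print(integrate_by_county(survey))
```

Each county's row would then be joined with that county's macro-level indicators to form the integrated dataset on which the MRA, DA, and SVM models are fitted.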