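868 results for Data mining
Abstract:
The purpose of studying a macro-level network security data mining system is to protect the availability, confidentiality, and integrity of critical network infrastructure in large networks. To this end, a system framework for macro-level network data mining is first proposed; the macro network mining subsystem and the situation analysis subsystem are then analyzed; finally, the platform is implemented using grid computing technology, and its runtime environment is described. The system is scalable and can effectively perform data mining and real-time situational awareness for macro-level networks.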
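Abstract:
Product diversification caused by market segmentation makes forecasting and planning more difficult, whereas identifying similar products can improve the inventory structure and promote sales growth. After discussing database design and the preparation of base data, this article proposes three methods for identifying similar products, gives worked examples, and concludes with a comparative analysis.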
Abstract:
A new model of pattern recognition principles, Biomimetic Pattern Recognition, has been proposed; it is based on "matter cognition" instead of "matter classification". As an important means of realizing Biomimetic Pattern Recognition, the mathematical model and analysis methods of artificial neural networks (ANN) have achieved a breakthrough: a novel general-purpose mathematical model has been advanced that can simulate all kinds of neuron architectures, including RBF and BP models. At the same time, this model has been realized in hardware, and the high-dimensional space geometry method, a new means of analyzing ANNs, has been studied.
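Abstract:
Condition-based maintenance (CBM) is an equipment maintenance strategy that has emerged in recent years. Its basic idea is to maintain machinery only when maintenance is needed, with the emphasis on maintenance being timely, accurate, and economical. Adopting this strategy can improve the safety and reliability of industrial production and systematically reduce operating costs. Condition prognostics for machinery is the core enabling technology of CBM, and studying it in depth is important for advancing CBM. However, because the related research began only recently, prognostics has not yet been implemented well; researchers need to keep trying new and effective methods to solve the problem and to speed the maturation of its implementation methods and applications. On this basis, this thesis explores new approaches to machinery prognostics from a data mining perspective and studies in depth a prognostics method for rotating machinery based on time-series data mining. The main work includes:
1. Combining the basic idea of CBM with practical application requirements, the basic meaning of machinery condition prognostics is analyzed systematically. The three basic prognostic functions (condition assessment, fault prediction, and remaining useful life prediction) are abstracted further, and a general prognostics workflow is proposed that comprises three subproblems: feature extraction, state prediction, and pattern matching. Based on a detailed analysis of the state of research on the theory and application of machinery condition prognostics, development trends for prognostics research and the research emphasis of each subproblem are put forward. The feasibility of using time-series data mining to solve the machinery condition prognostics problem is also analyzed.
2. Raw vibration time series fluctuate frequently and are heavily contaminated by noise, so they cannot be used directly to analyze the performance state of rotating machinery. Drawing on the idea in holographic diagnosis of fusing information to analyze the full picture of rotating machinery vibration, the concept of a holographic state matrix is proposed and defined. It uses multidimensional sequences on a time-like axis to represent the full vibration picture of the rotor system. This gives a high-level representation of the vibration time series, provides a good data source for subsequent prediction and matching/classification, and strengthens information retrieval and automatic knowledge acquisition in holographic diagnosis.
3. Performance-state prediction for rotating machinery is treated as a one-dimensional numerical time-series prediction problem in the context of rotating machinery maintenance. To address the weak long-term prediction ability and low degree of automation of existing methods, an ARIMA dynamic-interval prediction method is proposed. By sampling the time series at dynamic intervals for modeling and prediction, the method improves the accuracy of ARIMA models for long-term equipment state prediction, automates modeling and prediction, and meets the real-time requirements of CBM systems.
4. For performance-state feature data represented by holographic state matrices, a similarity matching method for holographic state matrices is proposed. A similarity measure is defined according to the characteristics of rotating machinery prognostics applications; a pruning search strategy is designed based on a triangle inequality for the approximate distance between holographic state matrices; and on this basis an efficient and accurate similarity matching algorithm is designed. Without relying on expert experience or manual confirmation, it achieves high-quality similarity matching of rotating machinery performance states within a given threshold range.
5. Time series of the basic vibration features of rotating machinery are massive, very high-dimensional, subject to frequent short-term fluctuations, and noisy. They differ from the financial and business data to which time-series data mining is traditionally applied, and applying traditional methods directly causes a sharp drop in search speed. To address this, a time-series similarity search method based on random projection is proposed. Using random projection, a statistical dimensionality reduction technique that has emerged in recent years, the method maps the original set of time series into a low-dimensional space and indexes it with an R*-tree, enabling fast similarity search over these time series while maintaining high accuracy.
6. Existing methods for classifying machinery performance states ignore misclassification costs. A cost-sensitive transductive classification method for rotating machinery performance states is therefore proposed. It combines the basic ideas and theory of cost-sensitive classification and transductive learning in a cost-sensitive transductive classification mechanism. While maintaining relatively high classification accuracy, the method markedly reduces the total misclassification cost.
7. Based on the basic idea of CBM, the basic structure of a CBM system for rotating machinery is designed. With the theoretical results of this thesis at its core, the basic functions and processing logic of each module are designed in detail, and a prototype CBM system for large rotating machinery is developed using mixed VC#.net and Matlab programming. The prototype verifies the operability and practicality of the proposed prognostics methods and contributes a useful exploration of CBM system application technology.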
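The random-projection similarity search mentioned in item 5 of the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`make_projection`, `project`, `nearest`) and the `shortlist` parameter are hypothetical, and a plain linear scan of the low-dimensional space stands in for the R*-tree index the abstract describes.

```python
import math
import random


def make_projection(d, k, seed=0):
    """Return a k x d Gaussian random matrix scaled by 1/sqrt(k).

    With this scaling, Euclidean distances are approximately preserved
    in expectation after projection (the usual random-projection argument).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, series):
    """Map a length-d series to k dimensions."""
    return [sum(w * x for w, x in zip(row, series)) for row in matrix]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates, matrix, shortlist=5):
    """Shortlist by distance in the projected space, then re-rank exactly.

    Assumption: a linear scan replaces the R*-tree index named in the abstract.
    """
    q_low = project(matrix, query)
    low = sorted(
        (euclidean(q_low, project(matrix, c)), i) for i, c in enumerate(candidates)
    )
    # Re-rank only the shortlisted candidates with the exact distance.
    best = min(low[:shortlist], key=lambda t: euclidean(query, candidates[t[1]]))
    return best[1]
```

In practice the projected vectors would be computed once and stored in a spatial index, so that each query costs a projection plus an index lookup instead of a full scan.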
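Abstract:
Web data mining is an emerging research field that applies data mining techniques and theory to mining WWW resources. This paper introduces the basic concepts and classification of Web data mining, presents its basic principles and methods, and finally points out its uses and looks ahead to its promising prospects.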
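Abstract:
For an enterprise, quality is the lifeblood of its products and services. Quality is affected by many factors in production and management activities and is a comprehensive reflection of all of an enterprise's work. At present, most product quality indicators are inspected only after the product has been manufactured. Inspection has a cost and sometimes requires destructive testing, for example a pull-to-failure test to measure tensile strength. Such lagging quality data is of little help for real-time quality control of the production process, and by the time a nonconforming product is found the loss can no longer be recovered, which greatly affects production quality and profitability. Carrying out the principle of prevention is the core and essence of modern quality management; to ensure and improve product quality, every indicator that affects quality must be managed comprehensively and systematically. How to relate production process parameters to product quality characteristics has therefore become a bottleneck for production fault prediction and diagnosis. This thesis first describes the development of a statistical process control (SPC) subsystem for a manufacturing execution system (MES) platform from several aspects, including the functional structure, the logical database design, and class relationships. To address the SPC subsystem's weakness in identifying the causes of abnormalities, data mining is combined with SPC to perform statistical analysis and in-depth mining of quality information from a gearbox final assembly line, and a model for online quality control and diagnosis of the production process is proposed. The model uses SPC to monitor abnormal production states and uses data mining to analyze large volumes of process inspection data. It finds the process steps and machining equipment most likely to be at fault and relates abnormal control chart states to production process parameters, so that abnormal states are detected and diagnosed in real time. The study shows that the theory and methods of data mining are suitable for quality control and can provide a new approach to product quality control.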
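As a rough illustration of the SPC monitoring step described in the abstract above, the sketch below computes individuals-chart control limits from a baseline sample and counts out-of-limit measurements per station. It is only a sketch under stated assumptions: the abstract does not say which control chart or mining technique was used, and the names `control_limits`, `out_of_control`, and the station labels are hypothetical.

```python
from collections import Counter
from statistics import mean

# Standard control-chart constant d2 for moving ranges of two observations.
D2 = 1.128


def control_limits(baseline):
    """Return (LCL, center line, UCL) for an individuals chart.

    Sigma is estimated from the average moving range, as is usual
    for individuals (I-MR) charts.
    """
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / D2
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(records, lcl, ucl):
    """Count out-of-limit measurements per station.

    `records` is an iterable of (station, value) pairs; the station
    with the highest count is a first candidate for diagnosis.
    """
    return Counter(station for station, value in records if value < lcl or value > ucl)
```

Counting violations per station is the simplest possible link between chart signals and process steps; a fuller diagnosis model would also relate the signals to recorded process parameters.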
Abstract:
On the issue of geological hazard evaluation (GHE), this thesis uses remote sensing and GIS systems as the experimental environment, supported by some program development. It combines knowledge of geo-hazard mechanisms, statistical learning, remote sensing (RS), hyperspectral recognition, spatial analysis, digital photogrammetry, and mineralogy, and uses geo-hazard samples from Hong Kong and the Three Parallel Rivers region as experimental data to study two core questions of GHE: geo-hazard information acquisition and the evaluation model. For landslide information acquisition by RS, three topics are presented: image enhancement for visual interpretation, automatic recognition of landslides, and quantitative mineral mapping. For the evaluation model, the support vector machine (SVM), a recent and powerful data mining method, is introduced to the GHE field, and a series of comparative experiments is carried out to verify its feasibility and efficiency. Furthermore, the thesis proposes a method to forecast the distribution of landslides for a known future rainfall, based on historical rainfall and the corresponding landslide susceptibility map. The details are as follows:
(a) Remote sensing image enhancement methods for geo-hazard visual interpretation. The quality of visual interpretation is determined by the RS data and the image enhancement method. The most effective and common technique is merging a high-spatial-resolution image with a multispectral image, but little research has addressed merging methods for geo-hazard recognition. Through comparative experiments on six mainstream merging methods and combinations of different remote sensing data sources, this thesis presents the merits of each method and qualitatively analyzes the effect of spatial resolution, spectral resolution, and acquisition time on the merged image.
(b) Automatic recognition of shallow landslides from RS images. The landslide inventory is the basis of landslide forecasting and landslide study. If landslide events are collected persistently, the geo-hazard inventory is updated in time, and the prediction model is improved continually, forecast accuracy will increase step by step. RS is a feasible way to obtain landslide information, a feasibility that follows from the characteristics of geo-hazard distribution. An automatic hierarchical approach is proposed to identify shallow landslides in vegetated regions by combining multispectral RS imagery with DEM derivatives, and an experiment is carried out to test its efficiency.
(c) Obtaining hazard-causing factors. Accurate environmental factors are the key to analyzing and predicting regional geological hazard risk. For predicting large debris flows, the main challenge is still determining the source material and its volume in the debris flow source region. Drawing on the strengths of various RS techniques, this thesis presents methods to obtain two important hazard-causing factors, the DEM and alteration minerals, and through spatial analysis finds the relationship between hydrothermal clay alteration minerals and geo-hazards in the arid-hot valleys of the Three Parallel Rivers region.
(d) Applying the support vector machine (SVM) to landslide susceptibility mapping. SVM, a recent and powerful method from statistical learning theory, is introduced to regional geological hazard evaluation (RGHE). SVM has proved to be an efficient statistical learning method that can handle both two-class and one-class samples, which avoids the need to produce 'pseudo' samples. Fifty-five years of historical samples from a natural terrain area of Hong Kong are used to assess the method: susceptibility maps obtained by one-class SVM and two-class SVM are compared with one obtained by logistic regression. The comparison shows that two-class SVM predicts better than logistic regression and one-class SVM. However, one-class SVM, which requires only failed cases, has an advantage over the other two methods because usually only "failed" case information is available in landslide susceptibility mapping.
(e) Predicting the distribution of rainfall-induced landslides by time-series analysis. Rainfall is the dominant trigger of landslides: more than 90% of the losses and casualties from landslides are caused by rainfall, so predicting landslide sites under a given rainfall is an important geological evaluation issue. Taking full account of both stable factors (the landslide susceptibility map) and dynamic factors (rainfall), a time-series linear regression analysis between rainfall and the landslide risk map is presented, and experiments on real samples show that the method performs well in a natural region of Hong Kong.
The following four practical or original findings are obtained:
1) RS methods to enhance geo-hazard images, automatically recognize shallow landslides, and obtain the DEM and minerals are studied, and detailed operating steps are given through examples. The conclusions are highly practical.
2) An exploratory study of the relationship between geo-hazards and alteration minerals in the arid-hot valley of the Jinshajiang River is presented. Based on standard USGS mineral spectra, the distribution of hydrothermal alteration minerals is mapped by the SAM method. Through statistical analysis of debris flows and hazard-causing factors, a strong correlation between debris flows and clay minerals is found and validated.
3) SVM theory (especially one-class SVM) is applied to landslide susceptibility mapping and its performance is evaluated systematically, which demonstrates the advantages of SVM in this field.
4) A time-series prediction method for the distribution of rainfall-induced landslides is established. In a natural study area, the distribution of landslides induced by a storm is predicted successfully under a real maximum 24 h rainfall, based on the regression between four historical storms and the corresponding landslides.
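The mineral mapping step that the abstract above attributes to the SAM method can be sketched as follows, assuming SAM refers to the Spectral Angle Mapper, which scores a pixel by the angle between its spectrum and a reference spectrum. The function names, the `max_angle` threshold, and the spectra in the usage note are hypothetical placeholders, not values from the thesis or from the USGS library.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors.

    The angle ignores overall brightness: scaling either spectrum
    by a positive constant leaves it unchanged.
    """
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = math.sqrt(sum(p * p for p in pixel)) * math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def classify(pixel, library, max_angle=0.1):
    """Return the best-matching library entry, or None if no angle is small enough.

    `library` maps a label to a reference spectrum; `max_angle` is an
    illustrative threshold, not a recommended value.
    """
    label, angle = min(
        ((name, spectral_angle(pixel, spectrum)) for name, spectrum in library.items()),
        key=lambda item: item[1],
    )
    return label if angle <= max_angle else None
```

For example, with a made-up library `{"mineral_a": [0.2, 0.4, 0.6], "mineral_b": [0.6, 0.4, 0.2]}`, a pixel `[0.1, 0.2, 0.3]` is assigned to `mineral_a` because it is a scaled copy of that reference spectrum.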
Abstract:
Population research is a frontier area of interest both in China and abroad, especially research on the spatial visualization of population and the design of geo-visualization systems, which provides a sound basis for understanding and analyzing regional differences in population distribution and its spatial patterns. With the development of GIS, geo-visualization theory plays an increasingly important role in many research fields, especially population information visualization, where major advances have been made recently. Nevertheless, current research pays little attention to system design for statistical geo-visualization of population information. This paper explores design theories and methodologies for a statistical geo-visualization system for population information. The research focuses on the framework, methodologies, and techniques for system design and construction. Its purpose is to develop a platform for a population atlas by integrating statistical mapping software previously developed and owned by the research group. As a modern tool, the system provides a spatial visual environment in which users can analyze the characteristics of population distribution and distinguish the interrelations among population components. First, the paper discusses the necessity of geo-visualization for population information and identifies the key issues in statistical geo-visualization system design, based on an analysis of domestic and international trends. Second, the design of the population geo-visualization system, including its structure, functionality, modules, and user interface, is studied on the basis of geo-visualization theory and technology. The proposed design is divided into three parts: a support layer, a technical layer, and a user layer. The support layer is the basic operation module and a main part of the system. The technical layer is the core of the system, supported by database and function modules. The database module mainly includes the integrated population database (comprising spatial data, attribute data, and geographical feature information), the cartographic symbol library, the color library, and the statistical analysis models. The function module consists of a thematic map maker component, a statistical graph maker component, a database management component, and a statistical analysis component. The user layer is an integrated platform that provides a visual interface through which users can query, analyze, and manage the statistical data and the electronic map. On this basis, China's E-atlas for population was designed and developed by integrating the fifth national census data with 1:400 million scale spatial data. The atlas illustrates the current state of China's population through about 200 thematic maps in 10 categories (environment, population distribution, sex and age, migration, ethnicity, family and marriage, birth, education, employment, housing). As a scientific reference tool, China's E-atlas for population has been highly regarded since its publication in early 2005. Finally, the paper analyzes the sex ratio in China in depth to show how the system's functions can be used to analyze a specific population problem and to perform data mining. The analysis results show that: 1. The sex ratio has increased in many regions since the fourth census in 1990, except in the cities of the eastern region, and high sex ratios are concentrated in hilly and low mountain areas with high illiteracy and poverty rates; 2. The statistical geo-visualization system is a powerful tool for handling population information: it can reflect the regional differences and variations of population in China and indicate the interrelations between population and other environmental factors. Although the author attempts to put forward an integrated design framework for the statistical geo-visualization system, many problems remain to be resolved as geo-visualization studies develop.
Abstract:
Nonlinear multivariate statistical techniques on fast computers offer the potential to capture more of the dynamics of the high-dimensional, noisy systems underlying financial markets than traditional models, while making fewer restrictive assumptions. This thesis presents a collection of practical techniques to address important estimation and confidence issues for Radial Basis Function networks arising from such a data-driven approach, including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the "data mining" problem. Novel applications in the finance area are described, including customized, adaptive option pricing and stock price prediction.
Abstract:
The C4.5 induction system. Key requirements for using the software. An illustrative example. Some usage tips.
Abstract:
Clare, A. and King, R.D. (2003) Predicting gene function in Saccharomyces cerevisiae. 2nd European Conference on Computational Biology (ECCB '03). (Published as a journal supplement in Bioinformatics 19: ii42-ii49.)