28 results for Naïve Bayes classifier
in Chinese Academy of Sciences Institutional Repositories Grid Portal
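Abstract:
Accurate network traffic classification underpins much network research and has long been an active topic in network measurement. In recent years, applying machine learning to traffic classification has become an emerging research direction in the field. Naïve Bayes (NB) and its improved variants are the methods most often used in current research. These methods are simple to implement and classify efficiently, but they depend heavily on the distribution of the sample space and are inherently unstable. This paper therefore proposes a traffic classification method based on the support vector machine (SVM). The method uses a nonlinear transformation and the structural risk minimization (SRM) principle to convert traffic classification into a quadratic optimization problem, and it achieves good classification accuracy and stability. Building on theoretical analysis, comparative experiments against the naïve Bayes algorithm on real network flow sets show that the SVM approach has three advantages for traffic classification: 1) flow attributes need not satisfy the conditional independence assumption, so no attribute filtering is required; 2) high classification accuracy is maintained even when prior knowledge is relatively scarce; 3) it does not depend on the distribution of the sample space and therefore classifies more stably.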
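The conditional independence assumption mentioned in the abstract above can be made concrete with a minimal Gaussian naïve Bayes sketch. This is not the paper's implementation: the function names `fit_gaussian_nb` and `predict`, the two flow attributes, and the four sample flows are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate a prior and per-attribute (mean, variance) for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    total = len(samples)
    for y, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[y] = (len(rows) / total, stats)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for attribute vector x."""
    best_label, best_score = None, -math.inf
    for y, (prior, stats) in model.items():
        score = math.log(prior)
        # Conditional independence: per-attribute log-likelihoods are simply summed.
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Hypothetical flows: (mean packet size in bytes, flow duration in seconds).
flows = [(1200.0, 30.0), (1100.0, 25.0), (90.0, 0.5), (120.0, 0.8)]
apps = ["bulk", "bulk", "interactive", "interactive"]
model = fit_gaussian_nb(flows, apps)
print(predict(model, (1000.0, 20.0)))
```

Because each attribute contributes an independent term to the score, correlated attributes are effectively counted twice, which is one reason attribute filtering is often applied before naïve Bayes and why the abstract lists its absence as an advantage of the SVM approach.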
Abstract:
Raman spectroscopy on single, living epithelial cells captured in a laser trap is shown to have diagnostic power for colorectal cancer. This new single-cell technology comprises three major components: primary culture processing of human tissue samples to produce single-cell suspensions, Raman detection on singly trapped cells, and diagnosis of the cells by artificial neural network classification. It is compared with DNA flow cytometry for similarities and differences. Its advantages over tissue Raman spectroscopy are also discussed. In the actual construction of a diagnostic model for colorectal cancer, real patient data were taken to generate a training set of 320 Raman spectra and a test set of 80. By incorporating outlier corrections into a conventional binary neural classifier, our network achieved significantly better predictions than logistic regression, with sensitivity improved from 77.5% to 86.3% and specificity improved from 81.3% to 86.3% for the training set, and moderate improvements for the test set. Most important, the network approach enables a sensitivity map analysis to quantitate the relevance of each Raman band to the normal-to-cancer transformation at the cell level. Our technique has direct clinical applications for diagnosing cancers and basic science potential in the study of the cell dynamics of carcinogenesis. (C) 2007 Society of Photo-Optical Instrumentation Engineers.
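For readers unfamiliar with the two figures of merit quoted above, the following sketch shows how sensitivity and specificity are computed from a binary confusion matrix. The counts are hypothetical, since the abstract does not report the per-class split of its spectra.

```python
def sensitivity(tp, fn):
    """Fraction of true cancer cases that the classifier flags as cancer."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true normal cases that the classifier flags as normal."""
    return tn / (tn + fp)

# Hypothetical counts, not taken from the paper.
tp, fn = 86, 14   # cancer spectra: correctly detected, missed
tn, fp = 81, 19   # normal spectra: correctly cleared, falsely flagged
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```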
Abstract:
Amblycipitidae Day, 1873 is an Asian family of catfishes (Siluriformes) usually considered to contain 28 species placed in three genera: Amblyceps (14 spp.), Liobagrus (12 spp.) and Xiurenbagrus (2 spp.). Morphology-based systematics has supported the monophyly of this family, with some authors placing Amblycipitidae within a larger group including Akysidae, Sisoridae and Aspredinidae, termed the Sisoroidea. Here we investigate the phylogenetic relationships among four species of Amblyceps, six species of Liobagrus and the two species of Xiurenbagrus with respect to other sisoroid taxa as well as other catfish groups using 6100 aligned base pairs of DNA sequence data from the rag1 and rag2 genes of the nuclear genome and from three regions (cyt b, COI, ND4 plus tRNA-His and tRNA-Ser) of the mitochondrial genome. Parsimony and Bayesian analyses of the data indicate strong support for a diphyletic Amblycipitidae in which the genus Amblyceps is the sister group to the Sisoridae and a clade formed by the genera Liobagrus and Xiurenbagrus is the sister group to Akysidae. These taxa together form a well-supported monophyletic group that assembles all Asian sisoroid taxa but excludes the South American Aspredinidae. Results for aspredinids are consistent with previous molecular studies indicating that these catfishes are not sisoroids but the sister group to the South American doradoid catfishes (Auchenipteridae + Doradidae). The redefined sisoroid clade plus Bagridae, Horabagridae and (Ailia + Laides) make up a larger monophyletic group informally termed "Big Asia." Likelihood-based SH tests and Bayes factor comparisons of the rag and the mitochondrial data partitions, considered separately and combined, reject both the hypothesis of amblycipitid monophyly and the hypothesis of aspredinid inclusion within Sisoroidea. This result for amblycipitids conflicts with a number of well-documented morphological synapomorphies that we briefly review. Possible nomenclatural changes for amblycipitid taxa are noted.
Abstract:
This paper proposes a new classifier for speaker identification based on biomimetic pattern recognition (BPR). Unlike traditional speaker recognition methods such as DWT, HMM, GMM and SVM, the proposed classifier is built from a finite set of subspaces that reasonably cover the points in high-dimensional space according to the distribution of the speech feature points. The classifier has been used in a speaker identification system. Experimental results show that it performs better, especially with fewer samples. The proposed classifier also uses a much simpler modeling structure than the GMM. In addition, because BPR rests on the basic idea of "cognition", enrolling new speakers does not require retraining the existing system.
Abstract:
This paper describes ground target detection, classification and sensor fusion in a distributed fiber seismic sensor network. Compared with the conventional piezoelectric seismic sensors used in UGS, fiber optic sensors offer high sensitivity and resistance to electromagnetic disturbance. We have developed a fiber seismic sensor network for target detection and classification. Ground target recognition based on seismic sensors is nevertheless very challenging because of the non-stationary character of seismic signals and the complicated real-life application environment. To address these difficulties, we study robust feature extraction and classification algorithms adapted to the fiber sensor network. A united multi-feature (UMF) method is used, and an adaptive threshold detection algorithm is proposed to minimize the false alarm rate. The system considers three kinds of targets: personnel, wheeled vehicles and tracked vehicles. Classification simulation results show that the SVM classifier outperforms the GMM and BPNN. A sensor fusion method based on D-S evidence theory is discussed to make full use of the information from the fiber sensor array and improve the overall performance of the system. A field experiment was organized to test the performance of the fiber sensor network and to gather real target signals for classification testing.
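The D-S (Dempster-Shafer) fusion step mentioned above can be sketched with Dempster's rule of combination. The abstract gives no details of the authors' fusion scheme, so this is only a generic illustration: the function name `dempster_combine` and the two sensor mass assignments are hypothetical, while the three target classes are those named in the abstract.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass)."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

P = frozenset({"personnel"})
W = frozenset({"wheeled"})
T = frozenset({"tracked"})
ANY = P | W | T  # ignorance: mass not committed to any single class

# Hypothetical per-sensor beliefs, not taken from the paper.
sensor_a = {P: 0.6, W: 0.1, ANY: 0.3}
sensor_b = {P: 0.5, T: 0.2, ANY: 0.3}

fused = dempster_combine(sensor_a, sensor_b)
decision = max(fused, key=fused.get)
print(sorted(decision), round(fused[decision], 4))
```

When two sensors agree on a class, the fused mass for that class exceeds either individual mass, which is the sense in which fusion over a sensor array can improve overall performance.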
Abstract:
Automatic molecular classification of cancer based on DNA microarrays has many advantages over conventional classification based on the morphological appearance of the tumor. Artificial neural networks are a general approach to automatic classification. In this paper, a Direction-Basis-Function neuron and a Priority-Ordered algorithm are applied to neural networks, and the leukemia gene expression dataset is used as an example to test the classifier. The results of our method are compared with those of SVM and show that our method performs better than SVM.
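Abstract:
Eu2+ ions show different transition emission forms in different complex fluorides: mainly the broad-band 5d → 4f transition, lying between 365 nm and 650 nm, and the narrow-band 4f → 4f transition, centered near 360 nm. The transition form of the Eu2+ ion is determined by the chemical composition of the host. This work uses several pattern recognition methods (KNN, ALKNN, BAYES, LLM, SIMCA and PCA) to study the relationship between the transition emission form of Eu2+ ions in different complex fluoride hosts and the crystal structure of the host, and to find general regularities in the host composition that gives rise to f → f transitions of the Eu2+ ion. Ninety complex fluorides (AB_mF_n) were collected as the sample set and divided into two classes according to the Eu2+ transition form: 45 hosts with the f → f transition and 45 hosts without it. Sixty-three hosts were randomly selected as the training set, and the rest formed the validation set. Each host sample was described by 12 crystal structure parameters. Because the parameters differ little in magnitude, the raw data were not scaled. Feature extraction is an important step in pattern recognition analysis. Combining features of the variance weight method, the BAYES feature evaluation method and the SIMCA variable relevance evaluation method, this work established an empirical evaluation criterion: d(i) = -5.0 + 2.3V(i) + 0.89f(i) + 7.2W(i). According to this empirical formula, the variables Z_B/r_(kB), r_(covA)/r_(covB) and Z_B/r_(covB) were selected, and the variables Xσ_A, Xσ_B and r_(covA) were deleted. The remaining variables had similar D values, so they were selected by exhaustive search, with the result that M, Z'_A and r_(covB) were chosen. These 6 selected variables were then taken as the variables most relevant to the transition emission problem for further analysis. Principal component analysis of the training set samples using the 6 selected variables showed that the first three principal components explain more than 99% of the information in the original data. Three-dimensional and two-dimensional maps were therefore drawn using principal components 1-3 and principal components 1 and 3, respectively. The results show that the two classes of host samples fall largely in different regions. The sample set was further analyzed with the other pattern recognition methods, using the 12-dimensional and the 6-dimensional variables respectively. All of these methods classified the training set fairly well. With the 6-dimensional features, the correct classification rate reached 79.4-96.8%, which indicates that most of the variables relevant to the transition problem had been selected. The results nevertheless show some differences among the methods in classifying the training set. We attribute this to the different requirements the methods place on the data structure. Experiments showed that the Bayes linear discriminant method gave the best classification of this sample set. Applying the Bayes linear discriminant method yielded a classification model for the host samples. This model was used to discuss the influence of each structural parameter on the spectral structure of the Eu2+ ion, and computer predictions were made of the Eu2+ spectral structure in seven unknown hosts. The results indicate that KTbF_4, KBF_4, NaIn_2F_7 and KLu_2F_7 are hosts with f → f transition emission, whereas NaCaF_3, MgBeF_4 and MgAlF_5 are hosts without f → f transition emission.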
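The empirical criterion d(i) = -5.0 + 2.3V(i) + 0.89f(i) + 7.2W(i) quoted in the abstract above can be evaluated as in the sketch below. The abstract does not define V(i), f(i) and W(i); the sketch assumes they are per-variable scores from the three evaluation methods it names. The function name `criterion`, the variable names and all score values are hypothetical.

```python
def criterion(v, f, w):
    """Empirical criterion d(i) = -5.0 + 2.3*V(i) + 0.89*f(i) + 7.2*W(i)."""
    return -5.0 + 2.3 * v + 0.89 * f + 7.2 * w

# Hypothetical (V, f, W) scores for three made-up candidate variables.
scores = {
    "var_a": (1.2, 0.8, 0.9),
    "var_b": (0.4, 0.3, 0.2),
    "var_c": (0.9, 1.1, 0.5),
}

# Rank the candidates from highest to lowest criterion value.
ranked = sorted(scores, key=lambda name: criterion(*scores[name]), reverse=True)
for name in ranked:
    print(name, round(criterion(*scores[name]), 3))
```

Under this reading, variables with clearly high values would be kept and those with clearly low values dropped, while variables with similar values cannot be separated by the criterion alone, which matches the abstract's use of exhaustive search for the remaining candidates.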
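Abstract:
The use of highly active antiretroviral therapy (HAART) has greatly reduced AIDS morbidity and mortality and prolonged the lives of people infected with HIV. HIV drug resistance, however, substantially undermines the efficacy of HAART, and the emergence of resistant strains has become the main factor limiting the effect of antiviral treatment. Drug resistance surveillance guidelines in Europe and the United States both recommend primary resistance testing in newly infected patients who have not received antiviral drugs. At the end of 2003 the Chinese government introduced the "Four Frees and One Care" policy for AIDS treatment and progressively rolled out large-scale free antiviral treatment nationwide. Monitoring drug resistance among treatment-naive HIV-1-infected people in China can provide a scientific basis for designing rational drug regimens and reducing the emergence of resistant strains. According to the World Health Organization (WHO) "HIV drug resistance surveillance guidelines", HIV-1-infected individuals among unpaid blood donors can be regarded as a newly diagnosed, untreated HIV population. Plasma and peripheral blood mononuclear cells (PBMC) from unpaid blood donors in Yunnan were analyzed to study drug resistance in this population. Laboratory serological methods are available to distinguish recent from long-term HIV-1 infection. Using the BED-CEIA method, recently infected individuals were identified among voluntary counseling and testing (VCT) populations in Henan, Anhui and Shanxi and studied for drug resistance genes, with some long-term infected individuals studied as controls. After nucleic acid extraction from the samples, nested polymerase chain reaction (nested-PCR) was used to amplify the pol gene region (the full length of the protease region, amino acids 1-99, and amino acids 1-242 of the reverse transcriptase region). PCR products were sequenced by the dideoxy method. The sequences were used together with reference strains from the Los Alamos HIV Database to build phylogenetic trees for subtype analysis, and drug resistance was analyzed with the Stanford HIV Drug Resistance Database. The study found that 52 unpaid blood donors in Yunnan Province in 2005-2006 were HIV-1 positive; viral genes were successfully amplified from the plasma and corresponding PBMC samples of 49 of them. Sequence analysis showed the HIV subtype distribution to be CRF08_BC (51.0%), CRF07_BC (24.5%), CRF01_AE (20.4%) and B (4.1%). No major protease inhibitor (PI) resistance mutations were found in any sample; only 7 minor PI resistance mutations were found, in 6 samples (11.7%). In addition, 10 nucleoside reverse transcriptase inhibitor (NRTI) resistance mutations were found in 9 samples (18.4%), and 1 sample (2.0%) carried a non-nucleoside reverse transcriptase inhibitor (NNRTI) resistance mutation. For specific PI, NRTI and NNRTI drugs, only 1 case each showed potential low-level resistance and remained clinically sensitive to the drugs. Viral drug resistance did not differ significantly between PBMC and plasma. Of 10310 samples collected in 2006-2007 from 27 VCT sites in Henan, Anhui and Shanxi, WB and BED-CEIA testing identified 63 recently infected individuals, from whom 50 plasma samples were successfully analyzed; from a random sample of long-term infection VCT samples in Henan, 19 were successfully analyzed. Among the 69 successfully analyzed VCT samples, the HIV subtype distribution was B' (95.7%), CRF01_AE (2.9%) and C (1.4%). No major PI resistance-associated mutations were detected in these samples; 27 minor PI resistance-associated mutations were present in 26 samples (37.7%), 6 NRTI resistance-associated mutations occurred in 3 samples (4.3%), and 8 NNRTI resistance-associated mutations occurred in 7 samples (10.1%). Comparison with the Stanford database revealed no clinical resistance to PI drugs, but 2 cases (2.8%) were resistant to NRTI drugs: 1 had the M184V mutation, conferring high-level resistance to lamivudine (3TC) and emtricitabine (FTC), and 1 carried the triple mutation T215Y, M41L and L210W, with intermediate resistance to abacavir (ABC), didanosine (ddI) and tenofovir (TDF) and high-level resistance to zidovudine (AZT) and stavudine (d4T). For NNRTI drugs, 3 strains (4.3%) were resistant: 1 had the K103N mutation, conferring high-level resistance to nevirapine (NVP), delavirdine (DLV) and efavirenz (EFV); 1 had the Y188L mutation, conferring high-level resistance to NVP and EFV; and 1 carried the double mutation K101E and G190A, conferring high-level resistance to NVP and intermediate resistance to DLV, EFV and etravirine (ETR). Comparison of subtypes and drug resistance between long-term and recently infected individuals showed no significant differences. The results show that drug resistance is at a low prevalence among untreated HIV-1-infected people in Yunnan, Henan and Anhui. In subtype distribution, CRF_BC predominates among unpaid blood donors in Yunnan, and B' predominates in the VCT populations of Henan and Anhui. Drug resistance surveillance should be continued in untreated populations.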
Abstract:
In this paper, the molecular connectivity indices and the electronic charge parameters of forty-eight phenol compounds have been calculated and applied to study the relationship between the partition coefficients and the structure of phenol compounds. The results demonstrate that the properties of the compounds can be described better with selected parameters, and that the results obtained by the neural network are superior to those obtained by multiple regression.