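Yuanyuan Zhang1, Qianqian Yin2, Pengcheng Du3, Yibo Ding2, Junyan Han3, Qianru Lin2, Liying Ma2
  1. State Key Laboratory of Infectious Disease Prevention and Control, Division of Virology and Immunology, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China; Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
    2. State Key Laboratory of Infectious Disease Prevention and Control, Division of Virology and Immunology, National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China
    3. Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
  • Received: 2020-02-18 Published: 2021-02-15
  • Corresponding author: Liying Ma
  • Funding:
    National Science and Technology Major Project of the 13th Five-Year Plan (No. 2018ZX10101002); National Natural Science Foundation of China (No. 81871694)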

Evaluation on the accuracy of consensus sequence of Next-generation sequencing of blood DNA for human immunodeficiency virus dominant quasispecies

Yuanyuan Zhang1, Qianqian Yin2, Pengcheng Du3, Yibo Ding2, Junyan Han3, Qianru Lin2, Liying Ma2

  1. National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China; Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
    2. National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing 102206, China
    3. Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, China
  • Received: 2020-02-18 Published: 2021-02-15
  • Corresponding author: Liying Ma

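A total of 32 whole-blood samples were collected from 29 HIV-infected patients; 3 of these patients developed drug resistance during follow-up, and two samples were collected from each of them, one before treatment and one after resistance emerged. DNA was extracted from the samples, and the amplified HIV pol region was sequenced by both next-generation sequencing (NGS) and Sanger (first-generation) sequencing. Sanger and NGS data were processed with Sequencher (4.10.1) and Bowtie 2 (v2.2.5), respectively, and the NGS consensus sequence was determined with in-house analysis scripts. Cluster analysis was performed with MEGA software to compare the NGS consensus sequences with the Sanger sequences.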






To assess the representativeness and accuracy of the consensus sequence generated by next-generation sequencing (NGS) for human immunodeficiency virus (HIV) quasispecies.


A total of 32 blood samples were collected from 29 patients with HIV infection; for 3 of these patients, a second sample was collected when virologic failure occurred. The HIV pol gene was amplified from sample DNA and sequenced by both Sanger sequencing and NGS. The Sanger and NGS data were processed with Sequencher (v4.10.1) and Bowtie2 (v2.2.5), respectively. The NGS consensus sequence was then determined with in-house Perl scripts. Finally, cluster analysis was performed with MEGA software, and the sites that differed between the NGS consensus sequence and the Sanger sequence were identified and analyzed.
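The abstract does not describe how the in-house scripts derive the consensus, so the following is only a minimal Python sketch of one common approach: a majority-rule call at each position. It assumes the reads have already been aligned to the reference (for example by Bowtie2) and padded with `-` into equal-length strings; the function name `consensus_from_reads` and the `min_depth` parameter are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def consensus_from_reads(aligned_reads, min_depth=1, gap_char="-"):
    """Majority-rule consensus over equal-length, pre-aligned reads.

    Illustrative assumption only: the authors' Perl scripts may use
    different rules (quality filters, frequency thresholds, etc.).
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Count the bases covering this position, ignoring padding gaps.
        counts = Counter(base.upper() for base in column if base != gap_char)
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too little coverage to call a base
            continue
        # Most frequent base wins; ties go to the base seen first.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ATGA", "-CGA"]
    print(consensus_from_reads(reads))
```

A call of this kind reports only the dominant base at each site, which is why the resulting sequence is described as representing the dominant quasispecies rather than minority variants.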


Cluster analysis showed that for 90.6% (29/32) of the samples, the Sanger sequence and the NGS consensus sequence derived from the same sample clustered in the same branch, with an average credibility of 95.5%. The per-site nucleotide accuracy of the consensus sequences was 99.6% relative to Sanger sequencing. Furthermore, 81.6% (155/190) of the inconsistent sites were transitions, and transitions between guanine (G) and adenine (A) were the most common, accounting for 39% (74/190).
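The per-site comparison behind these figures can be illustrated with a short Python sketch. It assumes the NGS consensus and the Sanger sequence have already been aligned to equal length; the function name `classify_mismatches` is hypothetical, and placing any non-ACGT symbol (such as an ambiguity code in a Sanger read) in a separate "other" class is an assumption of this sketch, not a rule stated in the abstract.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_mismatches(consensus, sanger):
    """Count mismatch types between two aligned, equal-length sequences.

    Returns (counts, accuracy), where accuracy is the fraction of
    compared sites at which the two sequences agree.
    """
    if len(consensus) != len(sanger):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"transition": 0, "transversion": 0, "other": 0}
    for a, b in zip(consensus.upper(), sanger.upper()):
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            counts["transition"] += 1    # A<->G or C<->T
        elif pair <= BASES:
            counts["transversion"] += 1  # purine <-> pyrimidine
        else:
            counts["other"] += 1         # gap, N, or ambiguity code

    compared = len(consensus)
    mismatches = sum(counts.values())
    accuracy = (compared - mismatches) / compared if compared else float("nan")
    return counts, accuracy


if __name__ == "__main__":
    counts, accuracy = classify_mismatches("ACGTAC", "GCTTAN")
    print(counts, accuracy)
```

Summing such counts over all sample pairs gives totals of the kind reported above, where 155 of the 190 inconsistent sites were transitions and 74 of them were G/A changes.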


The NGS consensus sequence can be used to study HIV dominant quasispecies, and it yields results that are highly consistent with Sanger sequencing.

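Table 1 Specific primers for PCR amplification in Sanger sequencing and deep sequencing
Figure 1 Clustering of sample Sanger sequences with reference sequences of the major subtypes in China
Figure 2 Phylogenetic tree of Sanger sequences and NGS consensus sequences
Table 2 Types and numbers of bases inconsistent between NGS consensus sequences and Sanger sequences
Table 3 Types and numbers of bases completely inconsistent between NGS consensus sequences and Sanger sequences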
