解决方案如下:

  1. Download Genbank from NCBI;
  2. Convert Genbank to GFF3 with BioPerl;
  3. Modify the GFF3 file;
  4. Download Fasta from NCBI or use Genbank conversion;
  5. Use the modified GFF3 file in rsem-prepare-reference;

主要代码如下:
bp_genbank2gff3 --GFF_VERSION 3 file.gb awk '$3!~/pseudo/' file.gb.gff > file.rsem.gff # or more restrictly with only protein coding genes awk '$3~/^(gene|mRNA|exon|polypeptide)$/' file.gb.gff > file.cds.gff # remove the version number in fasta accession if no transcript is extract rsem-prepare-reference file.fasta --gff3 file.rsem.gff reference_name

这里主要解决了以下错误:
Line x does not have attribute Parent

由于rsem遵循gene->transcript->exon的结构,并利用exon来作为RNA的最后形式(经过剪切和后处理的RNA),要求exon都有(且仅有)一个Parent。而GFF3文件中pseudogene和pseudogenic_exon是没有Parent的。

以下为rsem-gff3-to-gtf line 27-32。细菌基因组的GFF3文件在gene->transcript->exon中的映射关系如下:gene = type_gene, mRNA(tRNA, etc) = type_transcript, CDS = exon.

type_gene = ["gene", "snRNA_gene", "transposable_element_gene", "ncRNA_gene", "telomerase_RNA_gene", "rRNA_gene", "tRNA_gene", "snoRNA_gene", "mt_gene", "miRNA_gene", "lincRNA_gene", "RNA", "VD_gene_segment"] type_transcript = ["transcript", "primary_transcript", "mRNA", "ncRNA", "tRNA", "rRNA", "snRNA", "snoRNA", "miRNA", "pseudogenic_transcript", "lincRNA", "NMD_transcript_variant", "aberrant_processed_transcript", "nc_primary_transcript", "processed_pseudogene", "mRNA_TE_gene"] type_exon = ["exon", "CDS", "five_prime_UTR", "three_prime_UTR", "UTR", "noncoding_exon", "pseudogenic_exon"]

作者简介

Chun-Hui Gao is a Research Associate at Huazhong Agricultural University.

重复使用

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT. The full source is available at https://github.com/yihui/hugo-prose.

欢迎修订

如果您发现本文里含有任何错误(包括错别字和标点符号),欢迎在本站的 GitHub 项目里提交修订意见。

引用本文

如果您使用了本文的内容,请按照以下方式引用:

gaoch (2017). RSEM: rsem-prepare-reference with NCBI GenBank/GFF3 File. BIO-SPRING. /post/2017/06/01/2017-06-01-rsem-rsem-prepare-reference-with-ncbi-genbankgff3-file/

BibTeX citation

@misc{
  title = "RSEM: rsem-prepare-reference with NCBI GenBank/GFF3 File",
  author = "gaoch",
  year = "2017",
  journal = "BIO-SPRING",
  note = "/post/2017/06/01/2017-06-01-rsem-rsem-prepare-reference-with-ncbi-genbankgff3-file/"
}