Enhanced whole genome sequence and annotation of Clostridium stercorarium DSM8532T using RNA-seq transcriptomics and high-throughput proteomics

dc.contributor.author: Schellenberg, John J
dc.contributor.author: Verbeke, Tobin J
dc.contributor.author: McQueen, Peter
dc.contributor.author: Krokhin, Oleg V
dc.contributor.author: Zhang, Xiangli
dc.contributor.author: Alvare, Graham
dc.contributor.author: Fristensky, Brian
dc.contributor.author: Thallinger, Gerhard G
dc.contributor.author: Henrissat, Bernard
dc.contributor.author: Wilkins, John A
dc.contributor.author: Levin, David B
dc.contributor.author: Sparling, Richard
dc.date.accessioned: 2014-07-15T11:08:16Z
dc.date.available: 2014-07-15T11:08:16Z
dc.date.issued: 2014-07-07
dc.date.updated: 2014-07-15T11:08:17Z
dc.description.abstract:
Background: Growing interest in cellulolytic clostridia with potential for consolidated biofuels production is tempered by low conversion of raw substrates to desired end products. Strategies to improve conversion are likely to benefit from emerging techniques to define the molecular systems biology of these organisms. Clostridium stercorarium DSM8532T is an anaerobic thermophile with demonstrated high ethanol production on cellulose and hemicellulose. Although several lignocellulolytic enzymes in this organism have been well characterized, details concerning carbohydrate transporters and central metabolism have not been described. Therefore, the goal of this study is to define an improved whole genome sequence (WGS) for this organism using in-depth molecular profiling by RNA-seq transcriptomics and tandem mass spectrometry-based proteomics.
Results: A paired-end Roche/454 WGS assembly was closed through application of an in silico algorithm designed to resolve repetitive sequence regions, resulting in a circular replicon with one gap and a region of 2 kilobases with 10 ambiguous bases. RNA-seq transcriptomics resulted in nearly complete coverage of the genome, identifying errors in homopolymer length attributable to 454 sequencing. Peptide sequences resulting from high-throughput tandem mass spectrometry of trypsin-digested protein extracts were mapped to 1,755 annotated proteins (68% of all protein-coding regions). Proteogenomic analysis confirmed the quality of annotation and improvement pipelines, identifying a missing gene and an alternative reading frame. Peptide coverage of genes hypothetically involved in substrate hydrolysis, transport and utilization confirmed multiple pathways for glycolysis, pyruvate conversion and recycling of intermediates. No sequences homologous to transaldolase, a central enzyme in the pentose phosphate pathway, were observed by any method, despite demonstrated growth of this organism on xylose and xylan hemicellulose.
Conclusions: Complementary omics techniques confirm the quality of genome sequence assembly, annotation and error-reporting. Nearly complete genome coverage by RNA-seq likely indicates background DNA in RNA extracts; however, these preparations resulted in WGS enhancement and transcriptome profiling in a single Illumina run. No detection of transaldolase by any method, despite xylose utilization by this organism, indicates an alternative pathway for sedoheptulose-7-phosphate degradation. This report combines next-generation omics techniques to elucidate previously undefined features of substrate transport and central metabolism for this organism and its potential for consolidated biofuels production from lignocellulose.
dc.description.version: Peer Reviewed
dc.identifier.citation: BMC Genomics. 2014 Jul 07;15(1):567
dc.identifier.doi: http://dx.doi.org/10.1186/1471-2164-15-567
dc.identifier.uri: http://hdl.handle.net/1993/23691
dc.language.rfc3066: en
dc.rights: open access (en_US)
dc.rights.holder: John J Schellenberg et al.; licensee BioMed Central Ltd.
dc.title: Enhanced whole genome sequence and annotation of Clostridium stercorarium DSM8532T using RNA-seq transcriptomics and high-throughput proteomics
dc.type: Journal Article
Files
Original bundle (5 of 6 files shown)
Name: 1471-2164-15-567.xml; Size: 141.33 KB; Format: Extensible Markup Language
Name: 1471-2164-15-567.pdf; Size: 3.2 MB; Format: Adobe Portable Document Format
Name: 1471-2164-15-567-S1.docx; Size: 64.24 KB; Format: Unknown data format
Name: 1471-2164-15-567-S2.txt; Size: 172.28 KB; Format: Plain Text
Name: 1471-2164-15-567-S3.txt; Size: 7.64 KB; Format: Plain Text
License bundle
Name: license.txt; Size: 2.17 KB; Format: Item-specific license agreed to upon submission