Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2

Bibliographic Details
Main Authors: Dutcher Susan K, Kulp David C, Li Linya, Kwan Alan L, Stormo Gary D
Format: Article
Language: English
Published: BMC 2009-05-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/10/210
id doaj-97bdb753ed564045a88366e325f19682
record_format Article
spelling doaj-97bdb753ed564045a88366e325f19682 2020-11-24T21:15:34Z eng BMC BMC Genomics 1471-2164 2009-05-01 10 1 210 10.1186/1471-2164-10-210 Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2 Dutcher Susan K, Kulp David C, Li Linya, Kwan Alan L, Stormo Gary D <p>Abstract</p> <p>Background</p> <p>The availability of whole-genome sequences allows for the identification of the entire set of protein-coding genes as well as their regulatory regions. This can be accomplished using multiple complementary methods that include ESTs, homology searches and <it>ab initio</it> gene predictions. Previously, the Genie gene-finding algorithm was trained on a small set of <it>Chlamydomonas</it> genes and shown to improve the accuracy of gene prediction in this species compared to other available programs. To improve <it>ab initio</it> gene finding in <it>Chlamydomonas</it>, we construct a new training set of over 2,300 cDNAs by assembling over 167,000 <it>Chlamydomonas</it> EST entries in GenBank with the EST assembly tool PASA.</p> <p>Results</p> <p>Our cDNA-trained gene-finder, GreenGenie2, attains 83% sensitivity and 83% specificity for exons on short-sequence predictions. We predict about 12,000 genes in version v3 of the <it>Chlamydomonas</it> genome assembly, most of which (78%) are either identical to or significantly overlap entries in the published catalog of <it>Chlamydomonas</it> genes <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Of the published catalog, 22% is absent from the GreenGenie2 predictions; conversely, 23% of the GreenGenie2 predictions are absent from the published gene catalog. Randomly chosen gene models were tested by RT-PCR, and most of the results support the GreenGenie2 predictions.</p> <p>Conclusion</p> <p>These data suggest that training with EST assemblies is highly effective and that GreenGenie2 is a valuable, complementary tool for predicting genes in <it>Chlamydomonas reinhardtii</it>.</p> http://www.biomedcentral.com/1471-2164/10/210
collection DOAJ
language English
format Article
sources DOAJ
author Dutcher Susan K
Kulp David C
Li Linya
Kwan Alan L
Stormo Gary D
spellingShingle Dutcher Susan K
Kulp David C
Li Linya
Kwan Alan L
Stormo Gary D
Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2
BMC Genomics
author_facet Dutcher Susan K
Kulp David C
Li Linya
Kwan Alan L
Stormo Gary D
author_sort Dutcher Susan K
title Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2
title_short Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2
title_full Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2
title_fullStr Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2
title_full_unstemmed Improving Gene-finding in <it>Chlamydomonas reinhardtii</it>: GreenGenie2
title_sort improving gene-finding in <it>chlamydomonas reinhardtii</it>:greengenie2
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2009-05-01
description <p>Abstract</p> <p>Background</p> <p>The availability of whole-genome sequences allows for the identification of the entire set of protein-coding genes as well as their regulatory regions. This can be accomplished using multiple complementary methods that include ESTs, homology searches and <it>ab initio</it> gene predictions. Previously, the Genie gene-finding algorithm was trained on a small set of <it>Chlamydomonas</it> genes and shown to improve the accuracy of gene prediction in this species compared to other available programs. To improve <it>ab initio</it> gene finding in <it>Chlamydomonas</it>, we construct a new training set of over 2,300 cDNAs by assembling over 167,000 <it>Chlamydomonas</it> EST entries in GenBank with the EST assembly tool PASA.</p> <p>Results</p> <p>Our cDNA-trained gene-finder, GreenGenie2, attains 83% sensitivity and 83% specificity for exons on short-sequence predictions. We predict about 12,000 genes in version v3 of the <it>Chlamydomonas</it> genome assembly, most of which (78%) are either identical to or significantly overlap entries in the published catalog of <it>Chlamydomonas</it> genes <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Of the published catalog, 22% is absent from the GreenGenie2 predictions; conversely, 23% of the GreenGenie2 predictions are absent from the published gene catalog. Randomly chosen gene models were tested by RT-PCR, and most of the results support the GreenGenie2 predictions.</p> <p>Conclusion</p> <p>These data suggest that training with EST assemblies is highly effective and that GreenGenie2 is a valuable, complementary tool for predicting genes in <it>Chlamydomonas reinhardtii</it>.</p>
url http://www.biomedcentral.com/1471-2164/10/210
work_keys_str_mv AT dutchersusank improvinggenefindinginitchlamydomonasreinhardtiiitgreengenie2
AT kulpdavidc improvinggenefindinginitchlamydomonasreinhardtiiitgreengenie2
AT lilinya improvinggenefindinginitchlamydomonasreinhardtiiitgreengenie2
AT kwanalanl improvinggenefindinginitchlamydomonasreinhardtiiitgreengenie2
AT stormogaryd improvinggenefindinginitchlamydomonasreinhardtiiitgreengenie2
_version_ 1716744790430711808