COGs can be used to predict the function of homologous proteins in poorly studied species and to trace their evolutionary divergence from a common ancestor, which makes them a powerful tool for the functional annotation of uncharacterized proteins.
NCBI provides a COG database consisting of 4,873 COGs that together contain over 136,000 proteins from the genomes of 50 bacteria, 13 archaea and 3 unicellular eukaryotes. The database classifies the proteins encoded in completely sequenced genomes according to the concept of orthology. Because genes can be duplicated within a lineage, this classification allows for one-to-many and many-to-many orthologous relationships between the genes of different species, in addition to simple one-to-one relationships.
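The kinds of orthologous relationship mentioned above can be illustrated with a minimal sketch. It assumes a single cluster is represented as a mapping from species to its member proteins; the cluster `toy_cog`, the species and protein names, and the helper functions are all hypothetical and are not taken from the NCBI database or its tools.

```python
from itertools import combinations

# Hypothetical toy cluster: species name -> member proteins.
# All names are invented for illustration only.
toy_cog = {
    "species_A": ["A1"],
    "species_B": ["B1", "B2"],
    "species_C": ["C1", "C2", "C3"],
}


def relationship(n_first: int, n_second: int) -> str:
    """Label a relationship from the member counts of two species.

    Assumes each species contributes at least one protein to the cluster.
    """
    if n_first == 1 and n_second == 1:
        return "one-to-one"
    if n_first == 1 or n_second == 1:
        return "one-to-many"
    return "many-to-many"


def classify_pairs(cog: dict) -> dict:
    """Return {(species_x, species_y): label} for every species pair."""
    return {
        (x, y): relationship(len(cog[x]), len(cog[y]))
        for x, y in combinations(sorted(cog), 2)
    }


if __name__ == "__main__":
    for pair, label in classify_pairs(toy_cog).items():
        print(pair, label)
```

In this toy cluster, a species with a single member protein paired with a species that has several members is labelled one-to-many, while two species that each have several members are labelled many-to-many. The sketch only labels relationships within a cluster that is already given; it does not attempt to reproduce how COGs are actually constructed.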