COGs, Clusters of Orthologous Groups, are groups of three or more orthologous genes, meaning that the genes are direct evolutionary counterparts; each group is considered to represent an 'ancient conserved domain'. A COG is defined as three or more proteins from the genomes of distant species that are more similar to each other than to any other protein in the individual genomes.
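The definition above can be illustrated with a minimal sketch that reads it literally: three proteins from three different genomes form a candidate group when each one is the best hit of the other two. The similarity scores, the protein names and the `genome:protein` naming scheme are all invented for illustration, and the actual NCBI construction procedure may differ in its details.

```python
from itertools import combinations

# Hypothetical similarity scores between proteins named "genome:protein".
# The values are made up; a real analysis would use sequence comparison scores.
EXAMPLE_SCORES = {
    ("A:a1", "B:b1"): 90,
    ("A:a1", "C:c1"): 85,
    ("B:b1", "C:c1"): 88,
    ("A:a2", "B:b1"): 40,
    ("A:a2", "C:c1"): 35,
    ("A:a1", "B:b2"): 30,
}


def genome_of(protein):
    """Return the genome label, i.e. the part before the colon."""
    return protein.split(":")[0]


def score(scores, p, q):
    """Look up a symmetric similarity score; unlisted pairs score 0."""
    return scores.get((p, q), scores.get((q, p), 0))


def best_hit(scores, proteins, query, target_genome):
    """Return the protein in target_genome most similar to query, or None."""
    candidates = [
        p for p in proteins
        if genome_of(p) == target_genome and score(scores, query, p) > 0
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: score(scores, query, p))


def best_hit_triangles(scores):
    """Find trios from three genomes in which every member is the best hit
    of every other member in that member's genome."""
    proteins = sorted({p for pair in scores for p in pair})
    triangles = []
    for trio in combinations(proteins, 3):
        if len({genome_of(p) for p in trio}) != 3:
            continue  # the three proteins must come from three genomes
        if all(
            best_hit(scores, proteins, p, genome_of(q)) == q
            for p in trio for q in trio if p != q
        ):
            triangles.append(trio)
    return triangles


print(best_hit_triangles(EXAMPLE_SCORES))
```

In this toy data set only `a1`, `b1` and `c1` are mutually most similar, so they form the single candidate group; `a2` and `b2` have weaker matches and are left out.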

Visualization of COGs. Courtesy of NCBI (3).

COGs can be used to predict the function of homologous proteins in poorly studied species and to track evolutionary divergence from a common ancestor, which makes them a powerful tool for the functional annotation of uncharacterized proteins.
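As a rough sketch of how such functional annotation could work, the example below proposes a function for each uncharacterized protein from the most common annotation among the characterized members of its COG. The COG identifiers, protein names and the majority-vote rule are assumptions made for illustration and are not taken from the NCBI database or its procedures.

```python
from collections import Counter


def transfer_annotations(cog_members, known_functions):
    """Propose a function for each unannotated protein, using the most
    common annotation among the annotated members of the same COG."""
    proposals = {}
    for cog_id, members in cog_members.items():
        labels = Counter(
            known_functions[m] for m in members if m in known_functions
        )
        if not labels:
            continue  # no characterized member, so nothing to transfer
        consensus, _count = labels.most_common(1)[0]
        for m in members:
            if m not in known_functions:
                proposals[m] = consensus
    return proposals


# Hypothetical COG membership and annotations.
cog_members = {
    "COG_X": ["ecoli_p1", "bsub_p7", "novel_p3"],
    "COG_Y": ["novel_p9", "novel_p4"],
}
known_functions = {
    "ecoli_p1": "DNA gyrase subunit",
    "bsub_p7": "DNA gyrase subunit",
}

print(transfer_annotations(cog_members, known_functions))
```

Here `novel_p3` inherits the annotation shared by the characterized members of its group, while the proteins in `COG_Y` receive no proposal because no member of that group has a known function.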

NCBI provides a COG database that consists of 4,873 COGs comprising over 136,000 proteins from the genomes of 50 bacteria, 13 archaea and 3 unicellular eukaryotes. The database classifies proteins from completely sequenced genomes using the orthology concept. To identify orthologous proteins, this concept takes into account one-to-many and many-to-many relationships between genes of different species, not only one-to-one pairs.


  1. NCBI COGs database.
  2. Chapter 22 of the NCBI Handbook: The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes. NCBI Bookshelf ID: NBK21101.
  3. NCBI News Letter: Protein Families and Genome Evolution. Published Feb 1998.
