Matters arising

  1. New section on front page provides direct links to everyone's contributions
  2. Science online discussion of ENCODE
  3. Wikipedia: Human genome
  4. You are encouraged to drop by my office at any time for a short chat about how well this course is meeting your needs so far, and how we might improve it.
  5. Completion of textbook chapter 2?
  6. Access to chapter 3?
  7. Other issues?

Sequence Databases

  1. International Nucleotide Sequence Database Collaboration (INSDC): Wikipedia article
  2. Genbank at National Center for Biotechnology Information (NCBI): Wikipedia article
  3. EMBL-Bank at European Bioinformatics Institute (EBI)
  4. DDBJ (DNA Data Bank of Japan)

NCBI Data Access

  1. Google search for the GenBank ID, if you have it from a publication. Example: NC_005253. (Note that this also works well for identification numbers for papers in PubMed (PMID) and PubMed Central (PMCID).)
  2. Multi-database web search via NCBI Entrez
  3. Individual database web search via NCBI PubMed page
  4. Search for complete genomes
  5. Search for all sequences
  6. Automating searches via NCBI E-Utilities services
  7. My NCBI accounts and services
  8. Value of connecting to NCBI via UVM
  9. Exercise => Find YFG at NCBI
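
As an illustration of item 6, here is a minimal sketch of how a request to the E-Utilities EFetch service can be put together. It assumes the `nuccore` (nucleotide) database and reuses the example ID NC_005253 from item 1; the helper name `efetch_url` is made up for this example, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-Utilities services.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an EFetch URL for one record (hypothetical helper; no request is sent)."""
    params = {
        "db": db,              # database to query; nuccore is assumed here
        "id": accession,       # accession number or other record ID
        "rettype": rettype,    # "fasta" for FASTA, "gb" for GenBank flat file
        "retmode": "text",     # plain-text output
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


fasta_url = efetch_url("NC_005253")
genbank_url = efetch_url("NC_005253", rettype="gb")
print(fasta_url)
print(genbank_url)
```

To download the record, the URL could be passed to `urllib.request.urlopen`. NCBI asks automated clients to keep their request rate low, so a script that loops over many IDs should pause between requests.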

EBI Data Access

Multi-database web search via Explore the EBI

  1. Complete genomic sequences
  2. All Nucleotide sequences
  3. Macromolecular structures
  4. ENSEMBL genome browser
  5. UniProt
  6. Exercise => Find YFG at EBI

What is in a sequence file?

  1. FASTA format: Wikipedia article
  2. GenBank format:
  3. RefSeq: The Reference Sequence Database
  4. Exercise => Download YFG in both FASTA and GenBank formats
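
To make the FASTA layout concrete, the following is a rough sketch of a reader for it, assuming the usual convention that each record starts with a `>` header line followed by one or more lines of sequence. The function name `parse_fasta` and the two records are invented for illustration; they are not real database entries.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example data, not taken from GenBank.
example = """>seq1 hypothetical example record
ATGGCGTAC
GGATTA
>seq2 another made-up record
TTGACA
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

A GenBank-format file carries far more than this (feature table, references, source organism), which is why the exercise asks for both formats.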

Introduction to Geneious

  1. Installation issues?
  2. Introduction to the interface (as time permits)

Assignments for Wed 19 Sept

  1. Download YFG sequence in Geneious
  2. Extract feature table and add to YFG article
  3. Extract annotated map of gene, mRNA, and region of chromosome, and add to YFG page
  4. Explore sequence-relatives of YFG (in same or other species, as you wish)
  5. Continue adding other content to your YFG page
  6. Start reading Chapter 3

Class Notes

International Nucleotide Sequence Database Collaboration (INSDC): integrates the three big databases: GenBank at the National Center for Biotechnology Information (NCBI), EMBL-Bank at the European Bioinformatics Institute (EBI), and DDBJ (DNA Data Bank of Japan). They contain the same information, but the interface varies.

NCBI Data Access:

  • Accession number ("access code"): a unique ID for each sequence record; it is the easiest way to find a gene of interest
  • RefSeq (reference sequences): this collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. About 10% of sequences in GenBank are RefSeq.
  • FASTA format: a text-based format for representing nucleotide or protein sequences. It provides basic information in a compact way.
  • Graphic format: a browser-window map of the sequence and its chromosome location. It is possible to select a gene and zoom into a certain region of DNA; this is a genome browser, a web-based application.
  • "My NCBI" site: NCBI gateway for your personal research; a place to store articles, save searches and have them emailed to you, make collections, and keep a search history
  • NCBI access:
    • PubMed: provides entrance to a particular database
    • Entrez: a search tool that searches all available databases
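
One practical consequence of the notes above is that a RefSeq record can usually be recognized from its ID alone. The sketch below assumes the common RefSeq pattern of a two-letter prefix, an underscore, and digits (as in NC_005253), and covers only three well-known prefixes. It is a simplification: some RefSeq accessions do not fit this pattern. The function name and the example strings are illustrative only.

```python
import re

# Simplified pattern: two letters, underscore, digits, optional ".version".
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")

# A few well-known RefSeq prefixes; this table is deliberately incomplete.
REFSEQ_PREFIXES = {
    "NC": "RefSeq genomic molecule",
    "NM": "RefSeq mRNA",
    "NP": "RefSeq protein",
}


def describe_accession(accession):
    """Guess the record type from the shape of an accession (hypothetical helper)."""
    cleaned = accession.strip().upper()
    if not REFSEQ_PATTERN.match(cleaned):
        return "not RefSeq-style"
    return REFSEQ_PREFIXES.get(cleaned[:2], "RefSeq-style, other type")


for acc in ["NC_005253", "NM_000546.6", "U49845"]:
    print(acc, "->", describe_accession(acc))
```

An ID without the underscore prefix, such as the last example, is more likely a primary GenBank/EMBL/DDBJ accession than a RefSeq one.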

EBI Data Access:

  • It has a simpler search box than NCBI and is more readable and user-friendly
  • It provides a nice summary of the gene and protein
  • The ENSEMBL gene ID is not identical to the NCBI ID