The journal Science held an online discussion with the leaders of the ENCODE project (Ewan Birney and John Stamatoyannopoulos) on 13 September 2012. The transcript is below.
Please note that they took three questions from someone you know!
Elizabeth Pennisi: Good Afternoon and welcome to Science Live. Today we're going to talk about new treasures in our DNA. The sequencing of the human genome drove home the discovery that genes were just a small part of our total DNA—what made up much of the rest remained a big mystery. Last week, a massive international project took a stab at solving this mystery with the publication of more than 30 papers in key journals describing the function of much of our DNA. In them, they assert that much of the genome has biochemical activity. The project is called the Encyclopedia of DNA Elements, or ENCODE.
Elizabeth Pennisi: Well covered in the media, the work has also lit up the blogosphere. Some scientists have questioned the project’s conclusions; others say the media and the project oversold the findings; others tout the results as being supportive of intelligent design theory.
Elizabeth Pennisi: We have with us today two ENCODE researchers. John Stamatoyannopoulos is an associate professor of genome sciences and medicine at the University of Washington School of Medicine. Ewan Birney, associate director of the EMBL-European Bioinformatics Institute (EMBL-EBI), was the data coordinator for the project.
Welcome John and Ewan. Can you tell me what this week has been like for you?
Ewan Birney: Definitely busy :) - fun, but exhausting (both physically and mentally) to talk to so many people in so many different forums. It's great though to see the papers and see questions come in about the datasets and how to use them.
John A. Stamatoyannopoulos MD: Busy! It was really great to see all of these findings come out -- we've been living with the data for a while as it has been generated and analyzed, and I was excited to see how many people from widely varying backgrounds were able to see and appreciate the results. I'm sure it will take a while for everything to sink in -- the data are incredibly deep. I gave quite a number of interviews, ranging from conventional media to medical and scientific trade publications. The broad interest in the ENCODE data has been very gratifying and I think highlights the general curiosity about the genome and a sense of its importance.
Comment From Jerry Wickey
My question might be most appropriate for John. While most of us presume that some or even many non-coding regions act to regulate gene expression, to my knowledge, there has been no hypothetical mechanism of regulation proposed which can resolve gene expression generally, only bits and pieces of activity here and there by mechanisms which are often unique to the gene in which they are discovered. As a computer programmer, I realize that in all self-determining machines, such as the genetic central dogma, a mechanism for resolving what Alan Turing and Alonzo Church styled the lambda function must exist, otherwise the self-determining system is unable to resolve its next activity, e.g. no gene expression. I also find that short sequences of RNA can be written which theoretically pair to resolve the fundamental logical operators. That is: two short RNA sequences pair, resulting in a third short RNA sequence, a portion of which is paired and no longer available for further pairing, while the remaining portion presents a unique sequence available for pairing. In a soup of many such short RNA, many different expression scenarios can be evaluated and a resolution quickly derived. Such a mechanism could provide a "universal language" of gene expression which allows genes from one organism to seamlessly communicate with the genetic machinery of another organism, facilitating evolutionary adaptations, and allowing multiple genes to coordinate gene expression. Has any researcher inve
John A. Stamatoyannopoulos MD: As you point out, endocrine or hormone response pathways are incredibly conserved -- though some have been repurposed during evolution. Hormones are an interesting case because they act directly through regulatory DNA -- the hormone molecule binds to a protein that recognizes specific DNA sequences and sets off the gene expression events that change how the cell responds. The so-called epigenetic effects are mediated through these core regulatory circuits. One interesting finding that is directly relevant to ENCODE data is that the pre-existing landscape of active regulatory DNA is the major determinant of how cells respond to hormones. Thus the maps ENCODE has created will be directly relevant for understanding how different hormones might influence different cell types -- though this will take time to work out.
Comment From JohnBTV
Any thoughts about how ENCODE and its successors will/should change the way we teach genetics and molecular biology at the university level?
John A. Stamatoyannopoulos MD:
As the instructor of an undergraduate course in this area, I am quite familiar with all of the available texts -- and unfortunately none of them cover genome regulation particularly well. One strategy that we have found to be particularly effective is to introduce the students to an online genome browser that lets them 'surf' ENCODE and other genomic data. These have now become quite easy to use, and require little formal introduction. In short order, they start to develop a feel for how the data are 'lighting up' the genome --- and it stimulates their interest and questions in topics that would otherwise have been quite dry to hear about with lecture slides alone.
Comment From Prabhat
Genes are fate deciding units of a cell and our hard working and dedicated great Scientists are engaged in molecular research, at cumulative rate, to know everything about genes and their control mechanism; why GENETIC ENGINEERING and BIOTECHNOLOGY have limited capability to find cure for diseases like Parkinson, Cancer, AIDS, Prions related diseases etc.?
ENCODE is a foundational resource - the next layer on top of the genome. And it's going to involve many people - basic researchers, clinical researchers and practicing doctors - to help with this. It's very unlikely that it will involve genetic engineering, but rather more prosaic things like better diagnostics, or a better understanding of which drugs would work in which scenario.
Comment From G. Sperber
Why can I not receive the sound from this Webinar?
There is no sound component to this online chat.
Comment From mem_somerville
I have a question for Ewan. Some months ago a rogue tweet went around that said this: Ewan Birney: ENCODE data analysis confirms that HeLa cells not human #genomics #bog12 https://twitter.com/genomebiol... Can you clarify that, maybe go into a bit more detail on this? Or point me to one of the papers that has the evaluation of the HeLa features? Honestly I haven't had time to get through all of them yet...
HeLa is one of the best laboratory cell lines; it came from a cervical carcinoma (cancer) from an African American woman, Henrietta Lacks. There is a very good book about this ('The Immortal Life of Henrietta Lacks') which I highly recommend reading. Like nearly all laboratory cell lines, its genome has undergone rearrangements and mutations while being kept in the lab, meaning that although it is an extremely useful model cell line, the *cell line* is no longer a very good model for a normal human genome. In ENCODE we also used a number of primary and ES cell lines, which are much closer to a normal human genome. The twitter comment was taken from a talk on ENCODE, and out of context (the danger of summarising in 140 characters!).
Comment From mohamad
How much of the genome do we now understand?
John A. Stamatoyannopoulos MD:
This is actually a fairly deep question since annotating the genome -- ie, saying that here we see such-and-such kind of element, and there in these DNA bases something else -- does not imply that we fully understand it. This understanding is going to come from further experimentation and analysis that allows us to tie together different components. For example, we have annotated about 4 million DNA switches that are characteristic of gene control elements, but we have only connected a fraction of these (about 600,000) with the genes they likely control.
So there is quite a bit left to understand and piece together -- but now we have a huge inventory of parts to work with.
Comment From Thom Nelson
For John or Ewan (or both!): What are your thoughts on how the data generated by ENCODE will affect how we study genetics and genomics in other taxa? Other vertebrates? Invertebrates?
There was a successful sister project to ENCODE - modENCODE - that applied similar techniques in fly (Drosophila) and worm (C. elegans), and now, in the next phase, mouse will become well annotated in ENCODE. Furthermore I know of many other communities which are either informally or formally doing "ENCODE-like" projects.
All of this speaks to the impact these assays have on our understanding of how organisms work. One doesn't necessarily have to be as complete as the human ENCODE to gain useful insights, and one can look at the analysis of the human ENCODE to work out some of the best combinations of assays to try (though every organism might have its own particular take on, say, specific histone modifications).
Comment From JohnBTV
Just want to say a word of thanks for those who worked to make all of the ENCODE data and papers available free for everyone online, especially journals like Nature and Science that don't normally do so. And the iPad app is cute to show off, but I didn't see anything that was not available via the web site. You should have had Coldplay do a video or something. :-)
Thanks. It was a lot of hard work from a lot of people - and the key thing is getting access to the data. Check out the virtual machine if you are really geeky!
Comment From dcgent
Creationists/Intelligent design advocates are coopting the ENCODE results as evidence that a God or other deity designed a genome with "junk"--are you surprised at that and what's your reaction. Does ENCODE's data support the classic theory of evolution or speak to it at all?
John A. Stamatoyannopoulos MD:
ENCODE's data provide a unique and powerful window through which to view evolutionary change. We can see those changes directly by lining up the genome sequences of many different organisms -- these line-ups have revealed millions of regions where all the genomes agree, indicating sequences that have been specially preserved by evolution while others have decayed away (ie freely changed their letter codes). We now see that a large proportion of these 'conserved' regions are lighted up by ENCODE annotations, indicating that they are marking spots in the genome that contain important instructions for cell function.
Comment From mohamad
What is a genome?
The genome is the complete set of DNA in each of our cells. It encodes all our genes (the best understood being those which make proteins, the so-called protein-coding genes), and there is also a considerable amount of non-genic DNA in our genomes.
One copy of our genome is made up of 24 molecules of DNA, and the DNA is made from 4 possible chemical components. Rather than spelling the chemical compounds out long hand, atom by atom, we abbreviate them as 4 letters: A, T, G and C. The genome has 3 billion letters in 24 molecules.
Comment From Robin
Looking to understand some basics here. Understand that new findings support cell regulation at a very basic level. Does this include external environmental influences? Can emotions also impact gene switching? How does work performed by "junk DNA" differ from epigenetics?
John A. Stamatoyannopoulos MD:
Cells that make up the body tissues have many complex and remarkable mechanisms for sensing the environment -- temperature, chemicals, etc. Most such signals are relayed down to the cell nucleus where the genome 'lives', and cause changes in what genes are switched on or off, and to what degree. So in that sense the environment can directly affect gene switches.
Emotions are more complex, but we have some reason to believe that similar kinds of circuits may be involved. For example we can see changes in the brain's utilization of glucose and various chemicals based on emotional state which are confined to specific brain regions. And it is very likely that the individual neurons involved in these regions are undergoing changes in their gene activation patterns. While this is difficult to study directly in humans, some researchers are looking at just this question using mice (though with more primitive emotions such as fear or satiety).
Comment From Ward
As a student in high school, I don't always understand the articles. But my question is simpler than others (and may sound stupid). Can you tell me more about the ENCODE project and how it is helping in the understanding of our DNA as well as disease?
The ENCODE project aimed to start our understanding of how the human genome works. We know that (nearly) all the information that determines a human is in the genome, as we all start off as a single cell with this DNA. However, we had a patchy understanding of how it works, in particular away from protein coding genes.
To work out how the genome works, we used the fact there are many tiny machines (proteins and RNA - RNA is very like DNA) in each of our cells which know how to "read" parts of the genome. By monitoring where these little molecular machines are on the genome, or how parts of the DNA are copied into RNA (there are quite a few different types of RNA as well), we start to gain some insight into the genome.
We did many such experiments, across different cell types (eg, one cell type was very similar to a liver cell type; another was very similar to a white blood cell). This way not only can we see what is similar, we can also see differences between these cell types.
There is a lot more to get to know and understand here - this is definitely closer to the start than the end. But it is a substantial amount of data, and analysis, to start on this journey.
If you are at high school now, there will be plenty to do when you graduate if you'd like to work in this field :)
Comment From Brian A.
I am sitting in my college biology class right now and this is very interesting to read all of the advancements that have been made as well as seeing my professor explaining what has been learned and studied in the past. Thank you all for all your hard work and dedication!!!
Comment From Chris Day
Is it accurate to say that DNA is 'functional' because it is transcribed? Couldn't this be due to cryptic promoters or due to sloppy terminators? The 80% number seems particularly high given that there are more compact genomes out there too.
The word "functional" is a very context dependent word, and very early on in our paper we operationally define the phrase "a biochemical functional element" as something that emerges from these experiments. "Biochemical activity" is an alternative way of saying this. You are right that there are "sloppy" processes in the cell; for example, we know that a particular type of promoter (CpG promoters) is not particularly choosy about the precise nucleotide it starts on. However, this "sloppiness" might well be used by the cell sometimes, and in any case, just because some things are due to more random processes does not mean all of them are. One has to start by mapping these elements.
As for genome size, it is clear that the genome size is uncorrelated with (apparent) complexity of organism. In particular in the bony fish clades there are similar looking fish with (I think) 50 fold differences in genome size. A considerable amount of this is really how tolerant/permissive a genome is about transposon repeats. Recently we've learnt a lot about how genomes have specific molecular immune systems to suppress repeat behaviour (interestingly, involving non-coding RNA), and my own instinct is that most of the variation in genome size is really about the most recent history of the local arms race between the host genome's 'molecular immune system' and its repeats.
But - to stress - there is a lot more to understand about how these processes work, and I've been surprised and impressed by recent work showing how repeats have been co-opted for specific functions (the Odom/Flicek lab has particularly nice work on this, as do others).
Comment From Lina Riego
Are you interested in doing an ENCODE project on organisms more closely related to humans, like chimps? Would that be informative?
Ewan Birney: Yes - I think this would be great! I already know of some groups doing this (Greg Crawford did a great piece of work on DNaseI in chimp, published in PLoS Genetics). It is also going to be really good to look at ENCODE-like information across different individual humans - John, myself, Greg Crawford, Jason Lieb, Mike Snyder, Yoav Gilad and many other groups are looking at this.
Comment From G. Tonsmann
What percentage of the total human genome has been attributed a function (in one form or another) according to ENCODE? Even if it has not been certified that this function is real.
John A. Stamatoyannopoulos MD:
To annotate functional elements in the genome, ENCODE has been employing technologies that recognize the characteristic biochemical 'signatures' of different kinds of elements. For example, there is a specific signature of regulatory DNA regions -- the gene-controlling switches -- that was extensively investigated by hundreds of laboratories before ENCODE started, so we are pretty sure that when we see this signature, we know it is marking regulatory DNA. By scanning the genome in hundreds of different cell and tissue samples, we have annotated regulatory regions that -- when tallied together -- account for around 40% of the genome sequence. But very few of these are utilized in any individual cell type -- in, for example, skin cells, only about 1% of the genome appears to be active regulatory DNA (with the other regulatory regions important for other cells lying dormant). Genes occupy somewhat less of the genome. Traditional protein-making genes take up about 1.5% of the genome. Other genes that make only RNA products are a few % more. So overall, we can now directly assign activities to nearly half the genome. The expectation is that, if we plucked these sequences out of the genome, we would see changes in genome function such as changes in gene activity. If we include RNA production, we can 'see' biochemical products coming from a larger fraction of the genome -- roughly 80%. But in many cases, this includes sequences that the RNA-producing engine (RNA polymerase) is merely running over.
Comment From JohnBTV
The logistics of ENCODE must have been overwhelming, with hundreds of scientists and dozens of labs participating. Mike Eisen has some things to say about the funding of large-scale science at the expense of individual laboratories, and we were talking about this in a group meeting yesterday. Your thoughts?
The logistics were challenging but doable - it's mainly about organisation, but there are a lot of details to work out. There is a long debate about the relative balance between "small" and "large" science. As I said in my Nature commentary, I believe the vast majority of funds should go to "small" science; however, even in this scenario, as large projects tend to concentrate funds in a small set of groups, these large projects have to be run with care. Mike Eisen has some strong views, and he is very keen to ensure that the creativity of small science is both well funded and not stifled by these projects. I also agree on both of these things, though I think we would disagree on what that means in practice. However, a really good thing is that there is sensible, open debate which informs the strategy of funding bodies. From this perspective I am very glad to work in genomics where we have these debates as colleagues. Often over beer :)
Comment From nickjay
What is the next step in translating ENCODE data into the clinic? In terms of how clinicians will interface with epigenetic data in the front line. Is the reality going to be epigenetic screening, disease markers, personalized medicines for epigenomic manipulation of disease?
John A. Stamatoyannopoulos MD:
The genome is definitely coming to the clinic -- but there are many levels of filtering that separate genomic data from clinical decision making.
To be important in the clinic, genomic data (or really any kind of test result) must be actionable -- that is, it must result in a specific treatment path that is different from what one is doing currently. Those actions may range from recommendations such as 'avoid this kind of food' to 'use this specific drug'. Currently we have relatively little information that connects specific DNA changes with actionable clinical practices. One fundamental key to making such connections is understanding exactly what such changes do at the molecular level. Here is where ENCODE is going to make a big difference, by accelerating the work of hundreds of researchers around the world who are trying to understand how particular genes function and to gain insights into their roles in normal physiology and disease.
ENCODE provides the first large-scale lens through which to view DNA changes outside of genes -- but it will be quite some time before this will translate into clinically relevant information. The immediate next steps will be to integrate ENCODE data with data from genetic studies of disease as completely as possible to understand which DNA changes we should focus on.
Comment From Curtis
So I am a little confused. I thought a large amount of bioinformatic analysis determined that the majority of the genome contains a near random assortment of ATGC's. But the ENCODE reports seem to suggest that the majority of the DNA is nonrandom. How do you reconcile this?
Ewan Birney:
The DNA letters in our genome in some sense are clearly not random (for example, if I told you the chimpanzee's genome, you could guess the human genome very well). However there are two scenarios where we use the word 'random' about the genome bases. Firstly there are random processes - such as mutational processes which often have some characteristics (a bit like the dice being loaded) but do seem to operate like a "dice". Secondly in bioinformatics we very often have a random model of DNA as a way of "modelling" DNA bases, in other words trying to simulate what DNA "would look like" - these random models rarely are based on any mechanistic understandings of the DNA, rather they are a sort of statistically ok model.
So - there is no contradiction between the findings of ENCODE and the fact we use random models in our analysis (indeed - those random models are often critical in our analysis)
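As an illustration of the kind of random model described in this answer, here is a minimal sketch that is not part of the chat. It assumes the simplest possible model, in which each base is drawn independently according to observed base frequencies; the function names and the example sequence are made up for illustration, and real analyses typically use richer models.

```python
import random
from collections import Counter

BASES = "ACGT"

def base_frequencies(seq):
    """Estimate the frequency of each base in an observed sequence."""
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}

def simulate_sequence(freqs, length, seed=None):
    """Draw each base independently from the given frequencies.

    This is a purely statistical model: it knows nothing about
    mutation mechanisms, genes, or regulatory elements.
    """
    rng = random.Random(seed)
    weights = [freqs[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))

# Hypothetical observed sequence (made up, not real genomic data).
observed = "ATGCGATATTTAAACGCGTATATATTAGC"
freqs = base_frequencies(observed)
simulated = simulate_sequence(freqs, 1000, seed=0)
```

A simulated sequence like this can serve as a baseline: a pattern that shows up in real DNA much more often than in the simulated sequence is a candidate for something non-random.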
Elizabeth Pennisi:
We have to wrap up now. My apologies if we didn't get to all of your questions. But tune in next Thursday. Next week's chat is on calorie restriction in primates.