A Conversation with Dr. John Stamatoyannopoulos about ENCODE and Cancer Research
What are the goals of ENCODE?
When ENCODE began in 2003, the overall goal was to annotate all the functional components of the human genome, with an emphasis on the regions that regulate genes. Genes occupy about 2 percent of the genome. And by genes I mean the regions of the genome that get transcribed into RNA and then turned into the proteins that make all the tissues of the body, such as skin and muscle and hair.
Hidden in the remaining 98 percent of the genome are the instructions that tell the genes how to switch on and off in different kinds of cells. A chief goal of ENCODE has been to find those instructions and understand how they are written in the genome.
What do the instructions look like?
In essence, these instructions are organized into millions of DNA “switches.” These switches consist of strings of genetic letters, maybe 100 to 200 letters long, that can be thought of as sentences made up of short DNA words. The DNA words function as docking sites for special regulatory proteins.
When the proteins in a switch are docked to their respective words, the switch becomes active. We have identified, so far, around 4 million switches in the human genome, and there are undoubtedly many more that have not been seen yet.
What was known about regulatory DNA when the project began?
At that time it was well known that genes have switches that control their activity. Up until 2003, perhaps a few hundred switches had been identified. But nobody knew how much of the genome sequence was actually used for things like these switches. Nobody knew how many switches were out there or how many it took to control a gene.
How much of the human genome was thought to be useful?
At the time the project began, the widely quoted estimate was that about 5 percent of the human genome was actually good for something. The outcome of the ENCODE project so far shows that this number was a significant underestimate. As summarized in a paper by ENCODE investigators in Nature, approximately 80 percent of the genome yields a specific biochemical activity, such as the production of RNA or interactions between proteins and DNA.
So, it appears that an extraordinary amount of the genome is used for something. A significant component of this is the switches for controlling genes. Altogether, at least 40 percent of the genome appears to encode such switches, which is a lot!
By comparison, around 5 percent of the genome encodes the information that specifies proteins or the core building blocks of so-called non-coding or RNA genes. And there are many other classes of functional elements, some of which we haven’t yet studied.
What have you learned that is relevant to cancer?
First of all, cancer cells have specialized regulatory DNA switches that we, thus far, do not see in normal cells. We don’t know how to read the information in these switches completely yet, but it should help us decipher the regulatory pathways that are active in cancer cells but not normal cells. And in the past, when we have been able to understand cellular pathways in detail, new types of treatments have followed. This is a very exciting avenue that will be aggressively pursued by many people.
A second interesting aspect of this work relates to studies of genetic variation that has been linked to various cancers through genome-wide association studies. To date, researchers have identified a few hundred genetic variants (single-letter DNA changes) associated with the risk for any of 17 kinds of cancer. But 95 percent of these genetic variants do not map to genes. In fact, we reported in Science that the majority of these variants lie in the switches that control genes.
What struck you about the work published so far?
Cancer cells use the genome in ways that we did not anticipate. The results to date show that they take advantage of the genome’s control circuitry in subtle but important ways. A further understanding of this will deepen our knowledge of what makes a cancer cell tick.
Cancer cells also seem to use special sets of instructions already written into the genome that likely were there originally for some other purpose. Nonetheless, these cancer cells have somehow figured out how to use the instructions. We can now try to understand how this happens.
Has the project changed your thinking about genes?
Yes. It’s clear that we need to think now in terms of groups or networks of genes—of maybe dozens or hundreds of genes, all functioning together. This whole idea of looking at individual genes that are responsible for this or for that is becoming outdated.
How will the results help cancer researchers?
The immediate effect could be to greatly accelerate basic research projects that are already under way. For example, many researchers want to know how a particular gene is regulated, but this typically requires costly and time-consuming experiments. Now, for a huge number of genes, researchers will be able to just look this information up online using a viewing application called a genome browser. This is particularly true for genes in cancer cells, because ENCODE used many cancer cell lines.
Ultimately, I think the results will enable a more sophisticated understanding of how cancer cells function relative to normal cells, and this should improve our ability to devise strategies to fight the disease.
This will take time?
Yes. It is important for people to understand the timeline.
We hope these results will accelerate the overall research timetable by helping researchers discover new things more quickly, but nobody should expect that this is going to change clinical practice in the next couple of years.
What is the next phase of your ENCODE research?
We’ve learned that many of these functional elements are highly specialized for different kinds of cells. So creating a complete catalog will require the study of a great many different kinds of cell types and cell states—only a fraction of which have been covered to date. Most of these are going to be normal cells.
And you will continue to study cancer cells as well?
Yes. ENCODE has studied nearly 30 different cancer cell lines so far, but there are obviously many different kinds of cancers; it’s an incredibly interesting area because cancer cells seem to be unique relative to one another as well as relative to normal cells. One of the key advantages of a project like ENCODE is that we gain valuable insights about abnormal cells by making comparisons with a deep catalog of genomic elements that function in normal cells.
The more we study the genome the more interesting things we find. I don’t think anybody should write off how much information is really there.
—Interviewed by Edward R. Winstead