Model splicing

This is another archival repost, written for the old blog in January 2008.

The central dogma of molecular biology, first stated by Francis Crick in 1958, describes the flow of information between DNA, RNA, and proteins.[1] The central dogma is interesting, but I believe that its use in teaching is somewhat misleading and gives it undue importance. If you’ve come across the central dogma before, it was probably in an undergraduate or perhaps high-school lecture, where it was casually mentioned when explaining that gene expression involves the flow of information from DNA sequence to messenger RNA, and from there to protein sequence and structure. Because we think of gene expression in terms of the information-carrying molecule, introductory biology teaches gene expression in those terms: as a two-step process of transcription (DNA to RNA) and translation (RNA to protein).

Gene expression is not a two-step process, and of all the steps involved, transcription and translation are not necessarily the most interesting. This week’s Thursday paper is “Pre-mRNA Secondary Structures Influence Exon Recognition”,[2] by Hiller, Zhang, Backofen, and Stamm, and it looks at a particular aspect of one of the lesser-known steps: alternative splicing. The story as told by introductory biology is that DNA is transcribed into messenger RNA (mRNA): a carbon copy of the information in DNA whose sole purpose is to convey the information from the precious DNA archive, which is kept safely in the nucleus, out to the sites where the proteins are produced. In fact, the result of transcription is “pre-mRNA” (or “primary transcript”), which undergoes a series of modifications before it is ready for translation. One such modification is splicing.

When researchers started examining the human genome, they were surprised at how few genes they found: estimates eventually came down from hundreds of thousands to something around 25,000. But they were sure there were far more proteins than that. The reason proteins outnumber genes is that evolution has stumbled upon an efficient way of organising things: make several proteins with a single gene. Most genes (in “higher” organisms, anyway) are split into many “exons”, each specifying a different section of the protein sequence, separated by “introns” (non-coding sequences which contain metadata). The protein coding sequence may thus be split into sections A, B, and C, and the gene may have, say, three alternative versions of each; the protein can be constructed with A1, B1, and C1, or A1, B2, and C3, or A1 and C1 alone, and so on. Splicing is the process that organises the exons. What Hiller et al are asking is: how does the gene expression machinery know which exons to pick for the protein desired?
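To get a feel for how quickly the combinations multiply, here is a small Python sketch that enumerates the possible exon choices for the toy gene above. It is only an illustration: the exon names are the made-up labels from the example, and the rule that any section may be skipped (as long as at least one exon is kept) is my own simplifying assumption, not a statement about how real genes behave.

```python
from itertools import product

def isoforms(sections):
    """List every exon combination: one version per section, or skip it.

    Assumption for illustration only: any section may be skipped,
    but at least one exon must remain in the final transcript.
    """
    # None stands for "this section is spliced out entirely".
    options = [versions + [None] for versions in sections]
    result = []
    for combo in product(*options):
        kept = tuple(exon for exon in combo if exon is not None)
        if kept:  # discard the empty transcript
            result.append(kept)
    return result

# Toy gene from the text: sections A, B and C, three versions of each.
sections = [
    ["A1", "A2", "A3"],
    ["B1", "B2", "B3"],
    ["C1", "C2", "C3"],
]
all_isoforms = isoforms(sections)
print(len(all_isoforms))  # 63, i.e. (3 + 1) ** 3 - 1
```

Under those assumptions, a single gene with nine exons could in principle specify 63 different proteins, including the A1-B2-C3 and A1-C1 variants mentioned above.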

It has already been shown that certain sequences within the pre-mRNA, and even features on the DNA, act as signals for the splicing machinery, by altering how the RNA and splicing machinery interact. At the DNA level, alternative promoter sequences situated at different positions upstream of the gene allow the production of a variety of different primary transcripts. Then there are “enhancer” and “silencer” sites, which occur both in introns and in exons (the four are ESEs, ESSs, ISEs and ISSs), and are collectively known as splicing regulatory motifs.[3] We tend to talk about the information contained in nucleic acids in terms of nucleotide sequence. However, unlike DNA, which forms the famous double-stranded helical structure, RNAs are (usually) single-stranded, but can fold into a variety of 3-dimensional structures by forming double-stranded sections with distant regions on the RNA strand. Often, this 3-D structure is what determines how RNAs interact with other biological molecules. So, the first question that Hiller et al asked was: what is the relationship between splicing regulatory motifs and 3-D structure?

This question is rather difficult to answer. Very few mRNA structures have been empirically determined, and the methods for determining them remain expensive and time-consuming. However, enough is known about how these structures form to allow the creation of computer programs to predict likely 3-D structures. Variables affecting structure include the locations at which protein co-factors bind to the transcript, the length of the double-stranded region that is formed by folding, the proximity of the sections which come together to form double-stranded regions, and most importantly, energy minimisation. Using this knowledge, each nucleotide in the pre-mRNA is assigned a probability of being unpaired in the folded structure. While structure has to be estimated, the location of the regulatory motifs is on firmer ground: the AEdb database contains gene sequences for which the locations of motifs have previously been determined experimentally. Using the predicted structures of these sequences, it was found that splicing regulatory motifs are far more likely to be single-stranded than the average sequence.
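The general shape of that analysis can be sketched in a few lines of Python. To be clear, this is not the method used by Hiller et al: as a stand-in for energy minimisation it uses the much cruder Nussinov-style approach of maximising the number of Watson-Crick base pairs, and it gives each nucleotide a plain paired/unpaired call from one predicted structure rather than a probability. The sequence, the motif position, and the function names `fold` and `unpaired_fraction` are all invented for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def fold(seq, min_loop=3):
    """Return one set of nested base pairs that maximises the pair count.

    Toy Nussinov-style dynamic programme, not real energy minimisation.
    min_loop is the minimum number of unpaired bases inside a hairpin.
    """
    n = len(seq)
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs in seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in WATSON_CRICK:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back through the table to recover one optimal set of pairs.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in WATSON_CRICK:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return pairs

def unpaired_fraction(pairs, start, end):
    """Fraction of positions in [start, end) that are not in any pair."""
    paired = {pos for pair in pairs for pos in pair}
    region = range(start, end)
    return sum(1 for pos in region if pos not in paired) / len(region)

seq = "GGGAAAACCC"  # made-up hairpin: GGG stem, AAAA loop, CCC stem
pairs = fold(seq)
motif_unpaired = unpaired_fraction(pairs, 3, 7)  # hypothetical motif = loop
overall_unpaired = unpaired_fraction(pairs, 0, len(seq))
print(motif_unpaired, overall_unpaired)  # 1.0 0.4
```

In this toy hairpin the imaginary motif sits entirely in the loop, so it is fully unpaired while only 40% of the sequence as a whole is. The real analysis makes the same kind of comparison, but across many experimentally located motifs and with proper thermodynamic structure prediction.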

That is indicative of something interesting, but it’s not very convincing on its own. So Hiller et al set out to show that alternative splicing is affected by the single- versus double-stranded structure of splicing regulatory motifs. They looked at the SXN minigene, a gene whose splice variants are already well understood, and in which the effect of motif sequences on splicing has already been characterised. They engineered versions of the SXN gene which had either silencers or enhancers (or, as a control, random sequences of equal length) added either within a single-stranded section or within a double-stranded section. They predicted that regulatory motifs would be less effective when hidden in double-stranded structures, and this is what they found. Enhancers, whose job is to make sure that the exon is kept, and silencers, whose job is to make sure that the exon is removed, only worked efficiently when located in single-stranded structures.

The conclusion, therefore, is that mRNA structure is part of what they call the “splicing code” (an analogy to the genetic code, which maps nucleotides to amino acids). This conclusion is not particularly surprising: it has long been known that many DNA- and RNA-interacting proteins directly interact with unpaired nucleotides.[4] But it leads me to a hypothesis that I’d like to put to the splicing experts, since I don’t have enough background in this field, and have not yet had time to read up on whether it’s a plausible hypothesis, or even a novel one. My hypothesis is this: alternative promoters cause a frame-shift of the transcript in terms of the regulatory motifs. Different promoters will therefore be associated with a different set of regulatory motifs because the transcript that they produce has a different 3-D structure. Is such a simple solution possible, and has anybody else considered it?


  1. Crick, F. (1970). Central Dogma of Molecular Biology. Nature 227, 561-563.
  2. Hiller, M., Zhang, Z., Backofen, R., Stamm, S. (2007). Pre-mRNA Secondary Structures Influence Exon Recognition. PLoS Genetics 3(11), e204. DOI: 10.1371/journal.pgen.0030204
  3. Blencowe, B.J. (2000). Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25, 106-110. (Cited in Hiller et al 2007.)
  4. e.g. Auweter, S.D., Oberstrass, F.C., Allain, F.H. (2006). Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res 34, 4943-4959.
