CRISPR/Cas-9 genome editing workshop
The 2017 workshop will focus on the recent Nature report “Correction of a pathogenic gene mutation in human embryos” by Ma et al. During the workshop we will engineer the reagents used to correct a pathogenic MYBPC3 variant. Pathogenic variants in the MYBPC3 gene cause autosomal dominant hypertrophic cardiomyopathy.
Benchling: although there are many online tools available to help visualize and edit DNA, we recommend using Benchling. Follow the link and set up your own account to follow the rest of the workshop.
MYBPC3: to learn more about this gene and its involvement in cardiomyopathy, follow this link to Genetics Home Reference.
Step 1. Visualize the MYBPC3 variant in cDNA / genomic DNA
The variant edited out with CRISPR in the study above was a “heterozygous dominant 4-bp GAGT deletion (g.9836_9839 del., NC_000011.10) in exon 16 of MYBPC3”. The notation g.9836_9839 del refers to positions in the genomic DNA of the NCBI reference NC_000011.10, which specifies the genetic code for MYBPC3.
Following the NCBI link above you should see the following page:
The header of this genomic reference shows that we are looking at a locus on human chromosome 11 in the sequenced genomic assembly GRCh38. Much of the information is not pertinent to this workshop, so scroll down until you reach FEATURES:
Since we hope to change genomic DNA, we have to take intron-exon boundaries into consideration. In the mRNA section of the page above, these boundaries are indicated by comma-separated regions: join(1..80, 1198..1464 etc.). The first exon spans base pairs 1 to 80, the second spans 1198 to 1464, and so on. Between them are introns. Introns are regions of genomic DNA that are not expressed; they are spliced out of the RNA and never become protein. Our deletion g.9836_9839 del lies in exon 16. We can confirm this by looking at the list of exonic coordinates above: g.9836_9839 lies well inside the roughly 100 base pair exon 16 (9768..9873), 68 bp from its start and 34 bp from its end. Checking that the mutation does not lie close to an intron is important, because a guide RNA or repair template designed from the coding sequence would not match genomic DNA across an intron-exon boundary. In this case we can model our CRISPR-Cas9 experiment using the protein coding sequence without having to model intron-exon boundaries.
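The exon check described above can also be done in a few lines of Python. This is only a sketch: the dictionary holds just the three exons whose coordinates are quoted in this workshop (the full join(...) list is on the NCBI page), and the names KNOWN_EXONS, find_exon and distance_to_boundaries are made up for illustration.

```python
# Exon coordinates quoted in the text (1-based, inclusive).
# Assumption: only the exons mentioned in this workshop are listed here;
# copy the full join(...) list from the NCBI page for a complete check.
KNOWN_EXONS = {1: (1, 80), 2: (1198, 1464), 16: (9768, 9873)}


def find_exon(position, exons=KNOWN_EXONS):
    """Return the number of the exon containing a genomic position, or None."""
    for number, (start, end) in exons.items():
        if start <= position <= end:
            return number
    return None  # intronic, or in an exon that was not listed


def distance_to_boundaries(first, last, exon):
    """Distance in bp from a variant (first..last) to the exon start and end."""
    start, end = exon
    return first - start, end - last


deletion = (9836, 9839)  # g.9836_9839 del
print(find_exon(deletion[0]))                              # 16
print(distance_to_boundaries(*deletion, KNOWN_EXONS[16]))  # (68, 34)
```

A position that falls between two listed exons returns None, which here would mean either an intron or an exon missing from the dictionary.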
Click on the link /transcript_id = “NM_000256.3” to reach the mRNA record, from which we will copy the coding region of the MYBPC3 mRNA sequence into Benchling. On the next page scroll to CDS (coding DNA sequence) and click on it. Then click on the FASTA button to display the sequence in FASTA format, which is the easiest format from which to copy and paste the raw coding region sequence.
Copy the coding region sequence, open the Benchling tab, create a new sequence, name it MYBPC3_CDS and save it.
We can now go to the next step in this workshop.
Step 2. Prepare protein translation of normal and variant (mutant) cDNA
The central dogma of molecular biology is as follows:
DNA -> RNA -> Protein
We have moved on from DNA to RNA; now let's look at the protein.
In your cDNA Benchling file select all bases (Ctrl + A), right-click on the selection and choose create translation, forward. You will now see the protein sequence displayed beneath the cDNA sequence.
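For readers who want to see what the forward translation does under the hood, here is a minimal Python sketch using the standard genetic code. The short input sequence is a made-up toy example, not the MYBPC3 CDS, and the function name translate is our own.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Toy sequence (an assumption for illustration): ATG GCC AAG TGA
print(translate("ATGGCCAAGTGA"))  # MAK*
```

Pasting the sequence you saved as MYBPC3_CDS into translate() should give the same protein that Benchling displays, provided the sequence starts at the ATG start codon.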
The g.9836_9839 del genomic deletion corresponds to the CDS (coding DNA sequence) deletion c.1420-1423. Let's mark this as an annotation feature in Benchling: find and select positions 1420 to 1423, right-click on the selection and choose create annotation, then save this annotation as “Del”.
Now let's see what happens to the protein sequence if we delete these 4 base pairs. Use the “delete” button to delete the Del sequence, and then undo the deletion (“Ctrl + Z”). You can then delete the sequence again to go back and forth between the deletion and the normal sequence.
You should see that the protein sequence changes drastically downstream of the GAGT deletion, and that * characters appear, which mean stop. As a result of this deletion, protein translation halts partway through the protein, which is not a good thing for the protein; in fact stop mutations (a.k.a. stop-gained mutations) are almost always pathogenic. Note also that this deletion changes a run of amino acids before the stop. This is because we deleted 4 base pairs, and since amino acids are encoded by three base pairs each, the 4 base pair deletion shifted the reading frame before introducing a stop codon. This is called a frameshift mutation. If any kind of mutation is worse than a stop gain, it is a frameshift mutation.
After making the deletion, let's save this sequence as MYBPC_cDNA_del1420_1423.
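The frameshift effect can be reproduced on a small scale in Python. The sketch below uses a made-up 27 bp toy sequence that happens to contain GAGT at positions 7 to 10; it is not the real MYBPC3 sequence, and delete_range and translate are illustrative helper names.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_range(seq, start, end):
    """Delete positions start..end (1-based, inclusive), as in c.1420-1423."""
    return seq[:start - 1] + seq[end:]


# Toy CDS (assumption for illustration): ATG GCC GAG TCT GCA CTG AAC GGA TAA
normal = "ATGGCCGAGTCTGCACTGAACGGATAA"
mutant = delete_range(normal, 7, 10)  # removes the 4 bases GAGT

print(translate(normal))  # MAESALNG*
print(translate(mutant))  # MALH*TD
```

In the toy mutant, the amino acids after the deletion change (LH instead of ESALNG) and a premature stop appears, which is the same pattern you should see in Benchling for the real deletion.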
We are now ready for the next step in this workshop.
Step 3. Search for and select the best CRISPR/Cas-9 guide RNA (gRNA)
We will be trying to reverse the mutation (del 1420-1423) back to normal. To do this we will need to cut the DNA using CRISPR-Cas9 technology as close to the site of the mutation as possible. Select the whole MYBPC_cDNA_del1420_1423 sequence. Choose the “CRISPR” tool on the right side panel of the Benchling UI, and press “design and analyze guides”.
Next, we select the cDNA region that we want to edit. Let's pick 1300 – 1550 to visualize all guide RNAs surrounding the mutation we want to edit out. Then click the green “Finish” button and the green “+” button on the page after that.
On this page we have to do two things. First, click on “No regions set” to define our cDNA position within the human genome.
Choose “Find genome matches”.
Benchling will scan your sequence file and identify the best matching sequence in its databases. Pick the first match on chromosome 11 and click “Set genome region”.
Now we are presented with a sortable list of all CRISPR guide RNAs. Here we want to select the guide RNA closest to the mutation that also has the highest On-target and Off-target scores, to maximize the chance of successful gene editing. In Benchling a higher Off-target score indicates a more specific guide, so higher is better for both scores. Note also that we can confirm the PAM sequence for each of the guide RNAs.
Once selected we can proceed to the next step.
We can now move back to our sequence map by selecting the sequence map tab in the upper left corner.
It looks like our guide RNA is selected and in the right position. Note also the PAM sequence (AGG) immediately following the guide RNA.
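To make the relationship between guide RNA and PAM concrete, the sketch below scans a sequence for NGG PAM sites and reports the 20 bases immediately before each one. It is a simplified illustration, not a substitute for Benchling's guide design: it looks only at the forward strand, does no on-target or off-target scoring, and assumes the commonly cited SpCas9 conventions of a 20-nucleotide guide and a cut 3 bp upstream of the PAM. The toy sequence and function names are made up.

```python
import re


def find_guides(seq, guide_len=20):
    """List candidate guides on the forward strand: guide_len bases followed by NGG."""
    guides = []
    # The lookahead lets overlapping PAM sites be found.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start >= guide_len:  # need room for a full-length guide
            guides.append({
                "protospacer": seq[pam_start - guide_len:pam_start],
                "pam": seq[pam_start:pam_start + 3],
                # Assumed cut position: 3 bp upstream of the PAM (0-based index).
                "cut_index": pam_start - 3,
            })
    return guides


def nearest_guide(guides, target_index):
    """Pick the guide whose assumed cut position is closest to target_index."""
    return min(guides, key=lambda g: abs(g["cut_index"] - target_index))


# Toy sequence (assumption): 20 bases, then the PAM AGG, then TTT.
TOY = "ACGT" * 5 + "AGG" + "TTT"
guides = find_guides(TOY)
print(guides[0]["protospacer"], guides[0]["pam"], guides[0]["cut_index"])
# ACGTACGTACGTACGTACGT AGG 17
```

Running find_guides on the 1300 to 1550 region of your own sequence should list forward-strand candidates that can be compared against the guides Benchling reports.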
Step 4. Prepare the repair template DNA to correct the pathogenic variant
We have now selected a good guide RNA, which will direct the Cas9 enzyme to the site of the mutation so that Cas9 cuts the DNA at, or very close to, the position we want to edit. With a strategy in place to cut the diseased DNA strand, we need to design a repair template DNA that meets three important parameters:
1. The repair template is at least 100 base pairs long.
2. The repair template encodes a protein sequence identical to the normal one.
3. The repair template must not be cuttable by our gRNA-Cas-9 pair.
In our case it is fairly easy to meet all three parameters. We can just copy and paste a 100 base pair region of the normal sequence surrounding the mutation site (50 bp on each side). Because our guide RNA was designed against the mutated strand, which is missing four base pairs, it is unlikely to cut the normal DNA carried by the repair template.
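The copy-and-paste step can be sketched in Python as follows. Everything here is a toy stand-in: the “normal” sequence is an artificial 304 bp string with GAGT in the middle, and the “guide” is simply the 20 bases spanning the deletion junction in the mutant, chosen to illustrate why a guide designed on the mutant need not match the normal template. A real check would also need to consider the reverse strand and near-matches.

```python
def centred_window(seq, start, end, flank=50):
    """Return 2 * flank bases centred on the feature start..end (1-based, inclusive)."""
    centre = (start - 1 + end) // 2
    return seq[centre - flank:centre + flank]


# Artificial toy sequences (assumptions for illustration only).
LEFT = "AC" * 75
RIGHT = "TG" * 75
normal = LEFT + "GAGT" + RIGHT
del_start, del_end = 151, 154            # where GAGT sits in the toy sequence

mutant = normal[:del_start - 1] + normal[del_end:]
template = centred_window(normal, del_start, del_end)

# Hypothetical 20-base guide spanning the deletion junction in the mutant.
junction = del_start - 1
guide = mutant[junction - 10:junction + 10]

print(len(template))        # 100
print("GAGT" in template)   # True: the template carries the normal bases
print(guide in mutant)      # True: the guide matches the mutant
print(guide in template)    # False: the 4 restored bases interrupt the match
```

The last line is the programmatic version of parameter 3: in the template, the four restored base pairs split the sequence that the guide was designed to recognize.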
What if the mutation was just a 1 base pair substitution?
In this case we will have to revisit a key quality of protein translation: redundancy. Most amino acids are encoded by more than one codon, so we can remove a base pair that Cas-9 needs for cutting while retaining the same amino acid sequence. In principle, we are introducing a synonymous mutation into the repaired DNA sequence. Synonymous mutations are 1 base pair changes to DNA that do not alter the protein sequence. To meet our three repair template parameters we would first copy and paste a 100 base pair sequence from our original cDNA in a way that places the mutation in the middle of the sequence. Next, we would identify the Cas-9 PAM site. After that, we would determine how to alter the PAM sequence without changing the amino acid sequence encoded by the repair template. The PAM sequence (NGG, where N stands for any nucleotide, such as AGG in the example above) is crucial for Cas-9 to cut DNA efficiently, so substituting it with, say, NGA should prevent Cas-9 from cutting the repair template.
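The search for a synonymous PAM change can be sketched in Python. The example below tries every single-base substitution at the two G positions of an NGG PAM and keeps those that leave the translated protein unchanged. The toy CDS, in which the PAM AGG happens to coincide with an arginine codon, is an assumption chosen so that AGG to AGA is synonymous; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_pam_edits(cds, pam_start):
    """Single-base changes to the GG of an NGG PAM that keep the protein identical.

    pam_start is the 0-based index of the N in NGG; cds must be in frame 1.
    Returns tuples of (index, old_base, new_base, new_pam).
    """
    original_protein = translate(cds)
    edits = []
    for idx in (pam_start + 1, pam_start + 2):  # the two G positions
        for base in "ACGT":
            if base == cds[idx]:
                continue
            edited = cds[:idx] + base + cds[idx + 1:]
            if translate(edited) == original_protein:
                edits.append((idx, cds[idx], base, edited[pam_start:pam_start + 3]))
    return edits


# Toy CDS (assumption): ATG GCC AAG AGG CTG TAA, with the PAM AGG at index 9.
TOY_CDS = "ATGGCCAAGAGGCTGTAA"
print(synonymous_pam_edits(TOY_CDS, 9))  # [(11, 'G', 'A', 'AGA')]
```

Whether a synonymous option exists depends on where the PAM falls relative to the codon boundaries; if the function returns an empty list, a synonymous change elsewhere in the guide target sequence would have to be considered instead.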