Bioinformatics 1

Background

Submit: either using the submit function on DICE or by dropping off a paper copy at the ITO (level 4, Appleton Tower) before the deadline.

This assignment is worth 20 marks. It is due at 4pm on October 28th. Please remember that late assignments will not be marked. Plagiarism will be checked automatically; please read the guidelines and adhere to them completely. Make sure you show how you arrived at your solution. Please contact us to clarify anything you are unsure of.

The aim of this assignment is to explore some core molecular biology and bioinformatics concepts, and to perform manipulations as shown in the lectures. To make this a little more interesting, we will link it to the investigation of a human disease for which there is a suspected genetic link.

To get started, select any human disease, rare or common, and do some basic background reading into it. Often a simple web search such as (disease name) genetic basis will lead you to key findings pretty quickly.

Question 1. (3 marks)
What is the name of the disease you have selected? Briefly explain why it is thought there is a genetic basis for this disease. What is the human name for the gene that is thought to be involved? Is this gene known by any other names? Whether yes or no, explain how you investigated this.

Question 2. (2 marks)
Download the cDNA sequence for your gene (if several are listed, choose the main one, which is usually the first in the list). Where did you get this sequence from, and what is its unique identifier, so that someone else could be sure they were looking at the same sequence? How many base pairs long is your sequence?

Question 3. (5 marks)
Translate your cDNA sequence into protein/amino acid sequence. How many amino acids does your protein contain? Of the 64 possible codons available, how many are used? What is the most common amino acid in the protein? How many codons for this amino acid exist and how often is each used?
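If you choose to do the translation and codon counting programmatically, the steps might look like the minimal sketch below. It assumes the cDNA is already in frame, beginning at the start codon, and that the standard genetic code applies; the function names (split_codons, translate, codon_usage) are illustrative only and not part of any course material.

```python
from collections import Counter
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cdna):
    """Split an in-frame sequence into complete codons (trailing bases dropped)."""
    seq = cdna.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(cdna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for codon in split_codons(cdna):
        aa = CODON_TABLE.get(codon, "X")  # "X" marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def codon_usage(cdna):
    """Count how often each codon occurs in the sequence."""
    return Counter(split_codons(cdna))


# Toy sequence, not one of the assignment sequences.
toy = "ATGAAAAAGTAA"
print(translate(toy))                 # amino acid sequence
print(len(codon_usage(toy)))          # number of distinct codons used
print(Counter(translate(toy)).most_common(1))  # most common amino acid
```

Whichever tool you use, remember to state it in your answer and to show how you arrived at your counts.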

Question 4. (5 marks)
Now look at the following database:
http://www.kazusa.or.jp/codon/
The codon usage database lists the frequency with which each codon is used in a species (different species prefer different codons). Sequences which contain too many rare codons slow down translation and inhibit protein expression; in extreme cases, rare codons are thought to introduce translation errors when the rare tRNA is not available. If you were to try to express your human cDNA sequence in yeast, which codons in your sequence might cause problems for expression?
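One possible way to organise this comparison is sketched below. It assumes you have copied the yeast codon frequencies (per thousand codons) from the database into a dictionary yourself; the function name flag_rare_codons, the cutoff of 10 per thousand, and the numbers in toy_table are all placeholders chosen for illustration, not real database values or a recommended threshold.

```python
def flag_rare_codons(cdna, freq_per_thousand, threshold=10.0):
    """Return (codon position, codon, frequency) for codons below the cutoff.

    freq_per_thousand maps codons (DNA alphabet) to their frequency per
    thousand codons in the host species. The threshold is an arbitrary
    assumption; justify whatever cutoff you actually use.
    """
    seq = cdna.upper()
    hits = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        freq = freq_per_thousand.get(codon)
        if freq is not None and freq < threshold:
            hits.append((i // 3 + 1, codon, freq))  # positions are 1-based
    return hits


# Placeholder frequencies for demonstration only; replace them with the
# values you look up in the codon usage database.
toy_table = {"ATG": 20.0, "CGG": 2.0, "AAA": 40.0}
print(flag_rare_codons("ATGCGGAAA", toy_table))
```

Note that the database may list codons in the RNA alphabet (U rather than T), in which case the keys need converting before the lookup.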

Question 5. (5 marks)
We now turn to sequence alignment. You are given the following coding sequence fragments. These are thought to encode homologous proteins in different species (1 extra mark if you can give the gene names, and 1 extra mark for the likely species). The sequences begin with the start codon:

1. ATGCCGGCGGGCATGACGAAGCATGGCTCCCGCTCCACCAGCTCG
2. ATGCCCGGGTGGATGAATAAGCATGGATCTCGATCGACTACCTCG
3. ATGCCGGCGGGCATGACGAAGCATGGCTCGCGCTCCACCAGCTCG
4. ATGGTCGGCGAACGCGACAGGGACCGTGAGGCGGTACGCTGGGCA
5. ATGGTCGGCGAACGCGACAGGGACCGATGAGGCGGATACGCTGGG

Use the Needleman-Wunsch algorithm to compare sequence 1, which is the human gene, to sequences 2-4. If you compare DNA, use the scoring: match +2, mismatch -1, indel -1. If you do this on paper, start with codon 10 and go up to codon 12. If you use the protein sequence, start at codon 6, and state which scoring matrix you used.
Write out the optimal alignment and give its score. What can you say about the relatedness of the species? Which species is likely to be a better model of the human version, 2 or 4?
Sequence 5 is derived from sequence 4, but now has a mutation with potentially rather severe consequences. Use a codon table to find out why (1 extra mark if you can identify the phenotype; the gene name is a hint).
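If you would like to check a paper alignment from Question 5 by computer, a minimal sketch of global alignment with the DNA scoring given above (match +2, mismatch -1, indel -1) could look as follows. The function name needleman_wunsch and the toy input are illustrative assumptions; where several optimal alignments exist, this traceback reports only one of them (it prefers the diagonal move), so your own alignment may differ while still having the same score.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, indel=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * indel
    for j in range(1, m + 1):
        score[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + indel,          # gap in b
                score[i][j - 1] + indel,          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Toy input, not one of the assignment sequences.
best, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(best)
print(row_a)
print(row_b)
```

A program is no substitute for showing your working: if you align on paper, include the filled-in score matrix and the traceback path.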
