Lab Solution

Control Structures

You can download all of our solutions as lab04.py

  1. What percentage of codons across all primary reading frames are ATG?
    If the sequence were completely random, we'd expect 1/64 = 1.5625% of the triples to be ATG. We observe 1.196% for guinea pig and 0.978% for human.

    count = 0
    for k in range(len(dna)-2):    # N.B. stop value
        if dna[k:k+3] == 'ATG':
            count += 1
    
    percent = count/(len(dna)-2) * 100
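
    As a rough sketch, the same loop can be wrapped in a function so it can be tried on a short string by hand. The function name percent_atg and the sample sequence are made up for illustration and are not part of lab04.py; the guard for very short sequences is an added assumption.

```python
def percent_atg(dna):
    """Return the percentage of length-3 windows in dna that equal 'ATG'."""
    windows = len(dna) - 2          # number of start positions for a triple
    if windows <= 0:                # too short to hold any triple (added guard)
        return 0.0
    count = 0
    for k in range(windows):
        if dna[k:k+3] == 'ATG':
            count += 1
    return 100 * count / windows


sample = 'ATGCATGA'                 # made-up sequence, not lab data
print(percent_atg(sample))          # 2 of the 6 windows are ATG
```

    The six windows of the sample are ATG, TGC, GCA, CAT, ATG and TGA, so two of six match.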
          

  2. If two consecutive nucleotides match each other, how often is the next nucleotide that same nucleotide?
    If nucleotides were completely random, we'd expect 25%.
    We observe 28.392% in guinea pig and 30.620% in human.

    doubles = 0
    triples = 0
    for k in range(len(dna)-2):    # N.B. stop value
        if dna[k] == dna[k+1]:     # neighbors match
            doubles += 1
            if dna[k] == dna[k+2]: # the third of the triple matches as well
                triples += 1
    
    percent = 100*triples/doubles
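
    One way to sanity-check the 25% expectation is to run the same counting loop on a randomly generated sequence. This is only a sketch: the function name percent_triples, the seed, and the sequence length are arbitrary choices for illustration, and the check for doubles being zero is an added assumption.

```python
import random


def percent_triples(dna):
    """Of the positions where dna[k] == dna[k+1], what percent also match dna[k+2]?"""
    doubles = 0
    triples = 0
    for k in range(len(dna) - 2):
        if dna[k] == dna[k+1]:
            doubles += 1
            if dna[k] == dna[k+2]:
                triples += 1
    # Added guard: avoid dividing by zero when no neighbors match.
    return 100 * triples / doubles if doubles else 0.0


rng = random.Random(0)              # fixed seed so the run is repeatable
random_dna = ''.join(rng.choice('ACGT') for _ in range(200_000))
print(percent_triples(random_dna))  # expected to land close to 25
```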
          

  3. How many times does a motif of the form CC?AT occur within the sequence? (where ? could be anything)
    For guinea pig, 111 times; for humans, 132 times.

    total = 0
    for k in range(len(dna)-4):    # N.B. stop value
        if dna[k:k+2] == 'CC' and dna[k+3:k+5] == 'AT':
            total += 1
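
    The same count can also be sketched with Python's re module, where '.' plays the role of the '?' wildcard. This is an alternative, not the lab's approach; the function name count_motif and the sample string are made up, and it assumes dna contains no newline characters (which '.' would not match).

```python
import re


def count_motif(dna):
    """Count start positions k where dna[k:k+5] has the form CC?AT."""
    # The lookahead (?=...) tests every start position without consuming
    # characters, mirroring the position-by-position loop.
    return sum(1 for _ in re.finditer(r'(?=CC.AT)', dna))


sample = 'GGCCAATTCCGATCCTAA'       # made-up sequence, not lab data
print(count_motif(sample))          # CCAAT at index 2 and CCGAT at index 8
```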
          

  4. When the motif CC?AT does occur, what percentage of the time is the middle nucleotide an A? (This is the so-called cat box, CCAAT.)
    For guinea pig, 27.027%; for humans, 21.212%.

    motifs = 0
    catbox = 0
    for k in range(len(dna)-4):    # N.B. stop value
        if dna[k:k+2] == 'CC' and dna[k+3:k+5] == 'AT':
            motifs += 1
            if dna[k+2] == 'A':
                catbox += 1
    
    percent = 100*catbox/motifs
          

  5. The pattern CCAAT is known as a "cat" box. What are the relative percentages of the bases immediately following the pattern CCAA in the dna?
    Guinea Pig
    A: 28.431% C: 31.373% G: 10.784% T: 29.412%
    Human
    A: 39.416% C: 29.197% G: 10.949% T: 20.438%

    ca = 0   # count for A
    cc = 0   # count for C
    cg = 0   # count for G
    ct = 0   # count for T
    for k in range(len(dna)-4):    # N.B. stop value
        if dna[k:k+4] == 'CCAA':
            if dna[k+4] == 'A':
                ca += 1
            elif dna[k+4] == 'C':
                cc += 1
            elif dna[k+4] == 'G':
                cg += 1
            elif dna[k+4] == 'T':
                ct += 1
    
    total = ca+cc+cg+ct
    # ... can then display ca/total, cc/total, and so on
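
    The display step mentioned in the last comment might look like the following sketch, which keeps the four counters in a dictionary instead of separate variables. The function name following_base_percentages, its pattern parameter, and the sample string are hypothetical and not part of lab04.py.

```python
def following_base_percentages(dna, pattern='CCAA'):
    """Return {base: percent} for the base right after each occurrence of pattern."""
    counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
    n = len(pattern)
    for k in range(len(dna) - n):   # stop early enough that dna[k+n] exists
        if dna[k:k+n] == pattern:
            base = dna[k+n]
            if base in counts:      # ignore anything other than A, C, G, T
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:                  # added guard: pattern never found
        return {base: 0.0 for base in counts}
    return {base: 100 * c / total for base, c in counts.items()}


sample = 'CCAATGCCAAGTCCAAT'        # made-up sequence, not lab data
for base, pct in following_base_percentages(sample).items():
    print(f'{base}: {pct:.3f}%')
```

    In the sample, CCAA is followed by T twice and by G once, so T comes out at two thirds and G at one third.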
          


Last modified: Thursday, 07 February 2019