We demonstrate the convenience of index-based loops when we need to examine the neighbors of an element at fixed relative positions. But great care is needed in choosing the precise range of the indices.
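As a small illustration of why the range matters, here is a minimal sketch on a short made-up string. The name `sample`, its contents, and the helper `same_neighbor_count` are assumptions for illustration, not part of the lab's data or solutions. Letting j run over all of range(len(sample)) eventually asks for the character one past the end, which raises an IndexError; stopping at len(sample)-1 avoids this.

```python
sample = "GATTACA"   # hypothetical toy sequence, not the lab data

def same_neighbor_count(seq, stop):
    """Count indices j in range(stop) for which seq[j] == seq[j+1]."""
    count = 0
    for j in range(stop):
        if seq[j] == seq[j+1]:
            count += 1
    return count

# Correct range: j stops before the last index, so seq[j+1] always exists.
good = same_neighbor_count(sample, len(sample) - 1)

# Range that is one too large: the final pass looks past the end of the string.
try:
    same_neighbor_count(sample, len(sample))
    overran = False
except IndexError:
    overran = True

print(good, overran)
```

The only adjacent repeat in this toy string is the 'TT' in the middle.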
You can download all of our solutions as lab02Alt.py
How often is a base followed immediately by the same base?
If this were completely random, we'd expect this to be 0.25.
We consider every index j up to, but not including, len(dna)-1 as the first of the two neighbors.

    count = 0
    for j in range(len(dna)-1):        # len(dna)-1 possible starting locations
        if dna[j] == dna[j+1]:
            count += 1
    percent = count/float(len(dna)-1)
What percentage of consecutive bases are pattern 'AT'?
If this were completely random, we'd expect 1/16=0.0625 of the pairs,
yet we observe 0.094.
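The 1/16 figure comes from assuming each base is chosen uniformly and independently: 1/4 for the 'A' times 1/4 for the 'T'. As a rough sanity check, the sketch below estimates the same fraction on a randomly generated sequence. The helper name `fraction_of_pattern`, the sequence length, and the seed are arbitrary choices made for illustration; this is not the lab's data.

```python
import random

def fraction_of_pattern(seq, pattern):
    """Fraction of possible starting locations at which pattern occurs in seq."""
    width = len(pattern)
    starts = len(seq) - width + 1          # number of possible starting locations
    count = 0
    for j in range(starts):
        if seq[j:j+width] == pattern:
            count += 1
    return count / float(starts)

rng = random.Random(0)                     # fixed seed so the run is repeatable
random_dna = ''.join(rng.choice('ACGT') for _ in range(100000))
estimate = fraction_of_pattern(random_dna, 'AT')
print(estimate)                            # should land near 1/16 = 0.0625
```

An observed value of 0.094 in the real data is well above what such a random sequence typically produces.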
We let index j vary anywhere from 0 to len(dna)-2, as a potential starting place for an AT occurrence.

    count = 0
    for j in range(len(dna)-1):        # len(dna)-1 possible starting locations
        if dna[j:j+2] == 'AT':
            count += 1
    percent = count/float(len(dna)-1)
What are the relative percentages of the bases that immediately
follow an 'A'?
We find the following:
| A: | 0.3160823594880356 |
| C: | 0.23298089408273048 |
| G: | 0.15785568540159525 |
| T: | 0.29308106102763865 |
    countA = countC = countG = countT = 0
    for j in range(len(dna)-1):        # len(dna)-1 possible starting locations
        if dna[j] == 'A':
            if dna[j+1] == 'A':
                countA += 1
            elif dna[j+1] == 'C':
                countC += 1
            elif dna[j+1] == 'G':
                countG += 1
            else:
                countT += 1
    total = float(countA + countC + countG + countT)
    print("percent of bases following an A:")
    print('A', countA/total)
    print('C', countC/total)
    print('G', countG/total)
    print('T', countT/total)
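The four separate counters can also be replaced by a dictionary keyed on the following base. This is only an alternative sketch, not the lab's official solution; the helper name `follower_fractions` and the toy string are assumptions for illustration.

```python
def follower_fractions(seq, base):
    """Map each base that follows `base` in seq to its relative frequency."""
    counts = {}
    for j in range(len(seq) - 1):          # last index has no follower
        if seq[j] == base:
            nxt = seq[j+1]
            counts[nxt] = counts.get(nxt, 0) + 1
    total = float(sum(counts.values()))
    if total == 0:                         # base never appears with a follower
        return {}
    return {b: c / total for b, c in counts.items()}

toy = "AACATAGA"                           # hypothetical toy sequence
fractions = follower_fractions(toy, 'A')
print(fractions)
```

Note that the final 'A' of the toy string is never examined as a starting position, since nothing follows it.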
What percentage of the time is a base the same as the base that was TWO earlier?
If this were completely random, we'd expect 0.25; we
observe 0.264599083279.
Rather than keeping two variables for the previous two characters and carefully updating their roles after each pass of the loop, an index-based loop lets us compare dna[j] with dna[j+2] directly. Here j is the location of the first of the two positions, so it ranges from 0 up to, but not including, len(dna)-2.

    count = 0
    for j in range(len(dna)-2):        # len(dna)-2 possible locations for first of the two
        if dna[j] == dna[j+2]:
            count += 1
    print("percent of base matches two apart:")
    print(count/float(len(dna)-2))

Note that we could also write this loop by having j represent the location of the latter of the two positions of interest.

    count = 0
    for j in range(2, len(dna)):       # start j at 2
        if dna[j-2] == dna[j]:
            count += 1
    print("percent of base matches two apart:")
    print(count/float(len(dna)-2))
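The same indexing idea works for any fixed distance between the two positions. The sketch below uses a hypothetical helper, `fraction_matching_at_gap`, and a made-up toy string; it assumes the sequence is longer than the gap.

```python
def fraction_matching_at_gap(seq, gap):
    """Fraction of positions j for which seq[j] == seq[j+gap].

    Assumes len(seq) > gap, so that at least one pair exists.
    """
    starts = len(seq) - gap                # possible locations for the first of the pair
    count = 0
    for j in range(starts):
        if seq[j] == seq[j+gap]:
            count += 1
    return count / float(starts)

toy = "ACACGT"                             # hypothetical toy sequence
two_apart = fraction_matching_at_gap(toy, 2)
adjacent = fraction_matching_at_gap(toy, 1)
print(two_apart, adjacent)
```

With gap 1 this reduces to the adjacent-neighbor question, and with gap 2 to the question here.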
How many times does the pattern CCAAT occur?
We want you to determine this WITHOUT use of the built-in count method.
Hint: Keep track of a sliding window of the most recent five characters
The sliding window approach can now be extended to a longer window.

    count = 0
    for j in range(len(dna)-4):
        if dna[j:j+5] == 'CCAAT':
            count += 1
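The window width need not be hard-coded. As a sketch, a hypothetical helper `count_pattern` can derive the range from the pattern's length. Unlike the built-in count method, which counts non-overlapping occurrences, this loop counts every starting position, so the two can differ for patterns that overlap themselves ('CCAAT' cannot overlap itself, so for it the results agree).

```python
def count_pattern(seq, pattern):
    """Count every starting index at which pattern occurs in seq."""
    width = len(pattern)
    count = 0
    for j in range(len(seq) - width + 1):  # last valid start is len(seq)-width
        if seq[j:j+width] == pattern:
            count += 1
    return count

print(count_pattern("CCAATCCAAT", "CCAAT"))   # toy string, not the lab data
print(count_pattern("AAAA", "AA"), "AAAA".count("AA"))
```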
What is the length of the longest consecutive sequence of a repeated base and which base is it?
As discovered in the first lab, there are 9 consecutive A's;
this is the longest such streak for any base.
This task doesn't really seem that amenable to an index-based approach. (But once we introduce while loops in Python, we might consider another approach for this task.)
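For the curious, here is one possible sketch of a while-loop approach. It is not necessarily the approach the course later adopts. The helper name `longest_run` and the toy string are assumptions for illustration. The outer index j marks the start of a streak and the inner index k advances to the first position where the streak ends.

```python
def longest_run(seq):
    """Return (base, length) for the longest streak of one repeated base.

    On ties, the earliest streak wins; an empty sequence gives ('', 0).
    """
    best_base, best_len = '', 0
    j = 0
    while j < len(seq):
        k = j + 1
        while k < len(seq) and seq[k] == seq[j]:
            k += 1                         # extend the streak that starts at j
        if k - j > best_len:
            best_base, best_len = seq[j], k - j
        j = k                              # next streak starts where this one ended
    return best_base, best_len

print(longest_run("GATTTACCA"))            # toy string, not the lab data
```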