It looks like they cannot be unambiguously mapped to amino acids. I wonder if it would be sensible in this case to invent a new symbol expressing all the possibilities, eg, just glom together the names of the possible amino acids in sorted order.<meta charset="utf-8"> Can we count on them using the same symbol coding for multiple nucleotides at the same sites across sequences? -- probably not, right? Do the sequence matchers already take care of this?<div>
<br></div><div><div>>>> import dna</div><div># First PR.Seq from the training data:</div><div>>>> t = 'CCTCAAATCACTCTTTGGCAACGACCCCTCGTCCCAATAAGGATAGGGGGGCAACTAAAGGAAGCYCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGACATGGAGTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAARACAGTATGATCAGRTACCCATAGAAATCTATGGACATAAAGCTGTAGGTACAGTATTAATAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACTTTAAATTTY'</div>
<div>>>> dna.DisambiguateAmino(t)Traceback (most recent call last): File "<stdin>", line 1, in <module></div><div> File "dna.py", line 40, in DisambiguateAmino</div><div> raise Exception('Wrong number: <<%s>> for %s' % (', '.join(possibilities), triple))</div>
<div>Exception: Wrong number: <<Arg, Lys>> for ARA</div></div><div><br></div><div>I hope someone can find use for the dictionaries in the attached code anyway.</div>