It looks like they cannot be unambiguously mapped to amino acids. I wonder if it would be sensible in this case to invent a new symbol expressing all the possibilities, eg, just glom together the names of the possible amino acids in sorted order.<meta charset="utf-8"> Can we count on them using the same symbol coding for multiple nucleotides at the same sites across sequences? -- probably not, right? Do the sequence matchers already take care of this?<div>

<br></div><div><div>>>> import dna</div><div># First PR.Seq from the training data:</div><div>>>> t = 'CCTCAAATCACTCTTTGGCAACGACCCCTCGTCCCAATAAGGATAGGGGGGCAACTAAAGGAAGCYCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGACATGGAGTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAARACAGTATGATCAGRTACCCATAGAAATCTATGGACATAAAGCTGTAGGTACAGTATTAATAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACTTTAAATTTY'</div>

<div>>>> dna.DisambiguateAmino(t)Traceback (most recent call last):  File "<stdin>", line 1, in <module></div><div>  File "dna.py", line 40, in DisambiguateAmino</div><div>    raise Exception('Wrong number: <<%s>> for %s' % (', '.join(possibilities), triple))</div>

<div>Exception: Wrong number: <<Arg, Lys>> for ARA</div></div><div><br></div><div>I hope someone can find use for the dictionaries in the attached code anyway.</div>