[ml] Kaggle HIV update

Mike Schachter mike at mindmech.com
Wed Jun 30 04:23:03 UTC 2010

Thanks, David. The Python script you attached is super helpful!

I'm using it to generate a list of the possible amino acids for each
codon. I'll be posting the data sometime tomorrow, along with some
other things.
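For reference, here is a minimal sketch of that expansion step. It is not David's dna.py, whose dictionaries aren't reproduced here. It assumes the sequences use the standard IUPAC nucleotide ambiguity codes and the standard genetic code, and the names IUPAC, CODON_TABLE and possible_amino_acids are made up for illustration. Each ambiguity code is expanded into the bases it can stand for, every concrete codon is translated, and the distinct amino acids come back in sorted order.

```python
from itertools import product

# Hypothetical tables (not taken from dna.py). Assumes standard IUPAC codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, one-letter codes for codons in TCAG order
# (first base slowest, third base fastest); '*' marks a stop codon.
AMINO_1 = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
           "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE[aa]
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_1)
}


def possible_amino_acids(codon):
    """Sorted list of amino acids a codon with IUPAC ambiguity codes could encode."""
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("expected 3 nucleotides, got %r" % codon)
    # Expand each position into its concrete bases, then translate every combination.
    choices = [IUPAC[base] for base in codon]
    return sorted({CODON_TABLE["".join(bases)] for bases in product(*choices)})


print(possible_amino_acids("ARA"))  # ['Arg', 'Lys']
print(possible_amino_acids("GCN"))  # ['Ala']
```

ARA gives the same Arg/Lys pair reported in David's traceback below, while GCN collapses to a single amino acid because the third position of alanine codons is fully degenerate.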


On Wed, Jun 23, 2010 at 9:01 AM, David Faden <dfaden at gmail.com> wrote:

> It looks like they cannot be unambiguously mapped to amino acids. I wonder
> if it would be sensible in this case to invent a new symbol expressing all
> the possibilities, e.g., just glom together the names of the possible amino
> acids in sorted order. Can we count on them using the same symbol coding for
> multiple nucleotides at the same sites across sequences? Probably not,
> right? Do the sequence matchers already take care of this?
> >>> import dna
> # First PR.Seq from the training data:
> >>> t =
> >>> dna.DisambiguateAmino(t)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "dna.py", line 40, in DisambiguateAmino
>     raise Exception('Wrong number: <<%s>> for %s' % (', '.join(possibilities), triple))
> Exception: Wrong number: <<Arg, Lys>> for ARA
> I hope someone can find use for the dictionaries in the attached code
> anyway.
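David's combined-symbol idea could be sketched roughly as follows: instead of raising when a codon has more than one reading, as DisambiguateAmino does in the session above, return the sorted possibilities joined into one symbol. This is not the code from dna.py. The function names amino_symbol and translate, the '/' separator, and the reliance on standard IUPAC codes plus the standard genetic code are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical tables (not taken from dna.py). Assumes standard IUPAC codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_1 = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
           "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODON_TABLE = {
    "".join(codon): THREE[aa]
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_1)
}


def amino_symbol(codon, sep="/"):
    """One symbol per codon: the amino acid name if unambiguous, otherwise
    all possible names in sorted order joined by `sep` (e.g. 'Arg/Lys')."""
    choices = [IUPAC[base] for base in codon.upper()]
    names = sorted({CODON_TABLE["".join(bases)] for bases in product(*choices)})
    return sep.join(names)


def translate(seq, sep="/"):
    """Translate a sequence codon by codon; a trailing partial codon is dropped."""
    return [amino_symbol(seq[i:i + 3], sep) for i in range(0, len(seq) - 2, 3)]


print(translate("ATGARAGCN"))  # ['Met', 'Arg/Lys', 'Ala']
```

Passing sep="" gives the literal glommed form, e.g. 'ArgLys'; the separator only makes the combined symbols easier to read.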
