[ml] Kaggle HIV update

Mike Schachter mike at mindmech.com
Wed Jun 30 04:23:03 UTC 2010


Thanks, David! The Python script you attached is super helpful.

I'm using it to generate a list of possible amino acids for each
codon. I'll post the data sometime tomorrow, along with a few
other things.
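Roughly, the idea is to expand each ambiguity code into the bases it can
stand for, translate every resulting concrete codon, and keep the distinct
amino acids. Here's a minimal, self-contained sketch of that. It is not
David's dna.py; all the names are my own, hypothetical ones. It assumes the
standard genetic code, the standard IUPAC nucleotide codes, and reading
frame 0, and any other character (e.g. a gap) raises a KeyError.
composite_symbol() glues the sorted names together along the lines of
David's suggestion below.

```python
from itertools import product

# Standard IUPAC nucleotide codes -> the concrete bases each can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code: codons enumerated in TCAG order, first base
# varying slowest ("*" marks a stop codon).
BASES = "TCAG"
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_1)
}

# One-letter codes -> three-letter names, like the ones in dna.py's output.
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def possible_aminos(codon):
    """Sorted three-letter names of every amino acid a (possibly
    ambiguous) codon could encode."""
    choices = [IUPAC[base] for base in codon.upper()]
    return sorted({THREE[CODON_TABLE["".join(c)]] for c in product(*choices)})


def possible_aminos_for_sequence(seq):
    """Possible amino acids for each codon of seq, reading frame 0;
    a trailing partial codon is dropped."""
    return [possible_aminos(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)]


def composite_symbol(codon):
    """Hypothetical merged symbol: the possible names glued together
    in sorted order, e.g. 'ArgLys'."""
    return "".join(possible_aminos(codon))
```

For the ARA codon from David's traceback, possible_aminos("ARA") gives
['Arg', 'Lys'] and composite_symbol("ARA") gives 'ArgLys', while an
ambiguous third position such as GCN still maps to just ['Ala'].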

  mike



On Wed, Jun 23, 2010 at 9:01 AM, David Faden <dfaden at gmail.com> wrote:

> It looks like they cannot be unambiguously mapped to amino acids. I wonder
> if it would be sensible in this case to invent a new symbol expressing all
> the possibilities, e.g., just glom together the names of the possible amino
> acids in sorted order. Can we count on them using the same symbol coding for
> multiple nucleotides at the same sites across sequences? Probably not,
> right? Do the sequence matchers already take care of this?
>
> >>> import dna
> # First PR.Seq from the training data:
> >>> t = 'CCTCAAATCACTCTTTGGCAACGACCCCTCGTCCCAATAAGGATAGGGGGGCAACTAAAGGAAGCYCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGACATGGAGTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAARACAGTATGATCAGRTACCCATAGAAATCTATGGACATAAAGCTGTAGGTACAGTATTAATAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACTTTAAATTTY'
> >>> dna.DisambiguateAmino(t)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "dna.py", line 40, in DisambiguateAmino
>     raise Exception('Wrong number: <<%s>> for %s' % (', '.join(possibilities), triple))
> Exception: Wrong number: <<Arg, Lys>> for ARA
>
> I hope someone finds the dictionaries in the attached code useful
> anyway.
>


More information about the ml mailing list