[ml] Kaggle HIV update
Mike Schachter
mike at mindmech.com
Wed Jun 30 04:23:03 UTC 2010
Thanks, David. The Python script you attached is super helpful!
I'm using it to generate a list of possible amino acids for each
codon. I'll be posting the data sometime tomorrow along with some
other things.
mike
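The mapping described above, from each codon to the list of amino acids it could encode, can be sketched roughly as follows. This is not David's dna.py, which is not reproduced in the archive. It is a minimal, hypothetical version that assumes the sequences use the standard IUPAC nucleotide ambiguity codes (R = A or G, Y = C or T, and so on) and the standard genetic code. The function names possible_amino_acids and translate_ambiguous are made up for illustration.

```python
from itertools import product

# Assumed: standard IUPAC nucleotide ambiguity codes (e.g. R = A or G, Y = C or T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumed: standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_1)}

# One-letter to three-letter names, matching the "Arg, Lys" style of the traceback.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def possible_amino_acids(codon):
    """Hypothetical helper: sorted three-letter names of every amino acid the codon could encode."""
    # Expand each ambiguity code into its concrete bases, e.g. ARA -> AAA, AGA.
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    return sorted({THREE_LETTER[CODON_TABLE["".join(e)]] for e in expansions})


def translate_ambiguous(seq):
    """Hypothetical helper: translate codon by codon, keeping every possibility instead of raising."""
    # A trailing partial codon (len(seq) % 3 leftover bases) is ignored.
    return [possible_amino_acids(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)]


print(possible_amino_acids("ARA"))       # ['Arg', 'Lys']
print(translate_ambiguous("CCTCAAATC"))  # [['Pro'], ['Gln'], ['Ile']]
```

Returning a list per codon, rather than raising when there is more than one candidate, keeps the ambiguity in the data so it can be handled later. Characters outside the IUPAC table (gaps, for instance) would raise a KeyError in this sketch.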
On Wed, Jun 23, 2010 at 9:01 AM, David Faden <dfaden at gmail.com> wrote:
> It looks like codons containing ambiguous nucleotides cannot be unambiguously
> mapped to amino acids. I wonder if it would be sensible in this case to invent
> a new symbol expressing all the possibilities, e.g., just glom together the
> names of the possible amino acids in sorted order. Can we count on them using
> the same symbol coding for multiple nucleotides at the same sites across
> sequences? Probably not, right? Do the sequence matchers already take care of
> this?
>
> >>> import dna
> # First PR.Seq from the training data:
> >>> t = 'CCTCAAATCACTCTTTGGCAACGACCCCTCGTCCCAATAAGGATAGGGGGGCAACTAAAGGAAGCYCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGACATGGAGTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAARACAGTATGATCAGRTACCCATAGAAATCTATGGACATAAAGCTGTAGGTACAGTATTAATAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACTTTAAATTTY'
> >>> dna.DisambiguateAmino(t)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "dna.py", line 40, in DisambiguateAmino
>     raise Exception('Wrong number: <<%s>> for %s' % (', '.join(possibilities), triple))
> Exception: Wrong number: <<Arg, Lys>> for ARA
>
> I hope someone can find a use for the dictionaries in the attached code
> anyway.
>
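David's suggestion of gluing the sorted names of the possible amino acids into a new symbol, instead of raising an exception the way DisambiguateAmino does, might look something like the sketch below. It is hypothetical: glom_symbol and glom_sequence are invented names, and the per-codon possibilities are written out by hand. They follow from R standing for A or G, so ARA reads as either AAA (Lys) or AGA (Arg).

```python
def glom_symbol(possibilities):
    """Hypothetical helper: collapse the possible amino acids at one codon into one symbol.

    Unambiguous codons keep their usual name; ambiguous ones become the sorted
    names glued together, e.g. {"Lys", "Arg"} -> "ArgLys".
    """
    names = sorted(set(possibilities))
    return "".join(names)


def glom_sequence(per_codon_possibilities):
    """Hypothetical helper: turn a per-codon list of possibilities into one token per codon."""
    return [glom_symbol(p) for p in per_codon_possibilities]


# Possibilities written out by hand: ARA can be AAA (Lys) or AGA (Arg).
print(glom_sequence([["Pro"], ["Lys", "Arg"], ["Ile"]]))  # ['Pro', 'ArgLys', 'Ile']
```

Sorting makes the symbol independent of the order in which the possibilities were found, so the same set of candidate amino acids always yields the same token and can be compared across sequences.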