It looks like the sequences are already coded as amino acids rather than as nucleotide triplets? &lt;<a href="http://www.biogem.org/Accelrys/Sequencing/symbols_amino_acids.html">http://www.biogem.org/Accelrys/Sequencing/symbols_amino_acids.html</a>&gt;<br>

<br><div class="gmail_quote">On Mon, Jun 21, 2010 at 10:29 PM, Thomas Lotze <span dir="ltr">&lt;<a href="mailto:thomas.lotze@gmail.com">thomas.lotze@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

I committed some Python for generating base-pair triplet count features, and R code for computing triplet frequencies and fitting a basic GLM on the most frequent triplets.<br>(The Noisebridge machine learning SourceForge git repository is here: <a href="https://sourceforge.net/scm/?type=git&group_id=326816" target="_blank">https://sourceforge.net/scm/?type=git&group_id=326816</a>. To download the files, run "git clone git://<a href="http://ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge" target="_blank">ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge</a>" or, better yet, ask Mike to give you read/write access to this project so you can upload code as well.)<br>


<br>This got me to 53.8462 MCE, 36th out of 49 teams.<br><br>See you tomorrow night at 9 for fun with Hadoop!<br><font color="#888888">-Thomas<br>

</font><br>_______________________________________________<br>

ml mailing list<br>

<a href="mailto:ml@lists.noisebridge.net">ml@lists.noisebridge.net</a><br>

<a href="https://www.noisebridge.net/mailman/listinfo/ml" target="_blank">https://www.noisebridge.net/mailman/listinfo/ml</a><br>

<br></blockquote></div><br>
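For anyone following along, here is a minimal sketch of what triplet count features could look like. It is not the code committed to the repository; the function name `triplet_counts`, the `alphabet` and `overlapping` parameters, and the choice to drop triplets containing symbols outside the alphabet are all assumptions made for illustration.<br>

```python
from collections import Counter
from itertools import product


def triplet_counts(sequence, alphabet="ACGT", overlapping=True):
    """Return a dict mapping every possible triplet over `alphabet` to its count.

    Hypothetical helper, not the repository's implementation.
    Triplets containing symbols outside `alphabet` are silently dropped.
    """
    seq = sequence.upper()
    # Slide one position at a time, or jump three at a time for codon-style reading.
    step = 1 if overlapping else 3
    seen = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))
    # Enumerate all triplets so every sequence yields the same feature order.
    keys = ["".join(p) for p in product(alphabet, repeat=3)]
    return {k: seen.get(k, 0) for k in keys}


features = triplet_counts("ACGTACG")
print(features["ACG"], len(features))
```

If the sequences really are amino acids, the same function could be called with a 20-letter alphabet such as `"ACDEFGHIKLMNPQRSTVWY"`, which gives 20^3 = 8000 features per sequence instead of 64, so keeping only the most frequent triplets becomes more important.<br>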