[ml] Kaggle HIV update

Mike Schachter mike at mindmech.com
Wed Jun 23 00:27:59 UTC 2010


Hey David,

Unfortunately I don't think the sequences are amino acid sequences.

For the PR sequences, most of them have a length of 297. If each one is a
DNA sequence, then at three nucleotides per codon it codes for
297 / 3 = 99 amino acids. A quick look shows that HIV-1 Protease (the
protein whose sequence we're dealing with in the first sequence column)
is 99 amino acids long:

http://www.bioafrica.net/proteomics/POL-PRprot.html
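
Here's a rough sketch of that length check in Python. The function names
and the example string are made up for illustration (the string is not
from the competition data), and the letter set is the standard IUPAC
nucleotide alphabet, so treat it as a sanity check rather than anything
definitive:

```python
# IUPAC nucleotide codes: the four bases plus the standard ambiguity symbols.
IUPAC_NUCLEOTIDES = set("ACGTRYKMSWBDHVN")


def looks_like_dna(seq):
    """Return True if every character is an IUPAC nucleotide code."""
    seq = seq.upper()
    return len(seq) > 0 and all(ch in IUPAC_NUCLEOTIDES for ch in seq)


def codon_count(seq):
    """Return the number of whole codons, or None if the length is not a multiple of 3."""
    if len(seq) % 3 != 0:
        return None
    return len(seq) // 3


# Made-up 297-character string standing in for a typical PR entry.
example = "CCTCAGATC" * 33
print(len(example), looks_like_dna(example), codon_count(example))
```

One caveat: letters like A, C, G and T are also valid amino acid codes,
so this check can only rule DNA out, not prove it. Letters such as E, F,
I, L, P and Q never appear in nucleotide codes, so a sequence containing
them would have to be protein.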

Does that make sense? If it does, then some of the sequences in the data
are just noisy and of poor quality, and we're going to have to throw those
out before running the rest through a sequence aligner. I'm in the process
of doing this now, and will let everyone know how things are coming along
at the meeting.
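
A minimal sketch of the kind of filter I mean, in Python. What counts as
"noisy" here is an assumption for illustration only: anything that isn't
exactly 297 characters of plain A/C/G/T gets set aside. The function name
and the sample strings are hypothetical, not the actual cleanup code:

```python
PLAIN_BASES = set("ACGT")


def split_clean_and_noisy(sequences, expected_length=297):
    """Split sequences into (clean, noisy) lists.

    Assumption: a sequence is "clean" if it has the expected length and
    contains only unambiguous bases; everything else is treated as noisy.
    """
    clean, noisy = [], []
    for seq in sequences:
        s = seq.strip().upper()
        if len(s) == expected_length and set(s) <= PLAIN_BASES:
            clean.append(s)
        else:
            noisy.append(s)
    return clean, noisy


# Made-up inputs: one well-formed entry, one too short, one with ambiguity codes.
samples = ["ACG" * 99, "ACG" * 90, "ACN" * 99]
clean, noisy = split_clean_and_noisy(samples)
print(len(clean), "clean,", len(noisy), "noisy")
```

A looser rule (say, allowing a few ambiguity codes) would keep more of
the data; which threshold makes sense depends on how the aligner copes.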

See everyone tonight!

   mike




On Tue, Jun 22, 2010 at 8:37 AM, David Faden <dfaden at gmail.com> wrote:

> It looks like the sequences are already coded in terms of amino acids
> rather than nucleotide triples?
> <http://www.biogem.org/Accelrys/Sequencing/symbols_amino_acids.html>
>
> On Mon, Jun 21, 2010 at 10:29 PM, Thomas Lotze <thomas.lotze at gmail.com> wrote:
>
>> I committed some Python for generating base pair triplet count features,
>> and R code for determining frequency and doing a basic GLM including the
>> most frequent triplets.
>> (The Noisebridge machine learning SourceForge git repository is here:
>> https://sourceforge.net/scm/?type=git&group_id=326816
>> To download the files, run
>> "git clone git://ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge"
>> or, better yet, ask Mike to give you read/write access to this project so
>> you can upload code as well.)
>>
>> This got me to 53.8462 MCE, 36th out of 49 teams.
>>
>> See you tomorrow night at 9 for fun with Hadoop!
>> -Thomas
>>
>> _______________________________________________
>> ml mailing list
>> ml at lists.noisebridge.net
>> https://www.noisebridge.net/mailman/listinfo/ml
>>
>>
>