[ml] Kaggle HIV update

David Faden dfaden at gmail.com
Tue Jun 22 18:04:03 UTC 2010


Could missing PR.Seq be informative?

> d0 <- read.csv("Downloads/ml-noisebridge/kaggle/training_data.csv",
+                stringsAsFactors=FALSE)
> d0$PR.Seq.len <- nchar(d0$PR.Seq)
> mean(d0$Resp[d0$PR.Seq.len == 0])
[1] 0.2375
> mean(d0$Resp[d0$PR.Seq.len != 0])
[1] 0.2032609
> sum(d0$PR.Seq.len == 0)
[1] 80

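# Monte Carlo estimate of P(p1 > p2), where p1 and p2 are the response rates
# behind two vectors of 0/1 outcomes, each given a uniform Beta(1, 1) prior.
posteriorProb1IsGreater <- function(trials1, trials2, reps=10000) {
  # Each posterior is Beta(successes + 1, failures + 1).
  p1 <- rbeta(reps, sum(trials1) + 1, sum(1 - trials1) + 1)
  p2 <- rbeta(reps, sum(trials2) + 1, sum(1 - trials2) + 1)
  return(mean(p1 > p2))
}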

> posteriorProb1IsGreater(d0$Resp[d0$PR.Seq.len == 0],
+                         d0$Resp[d0$PR.Seq.len != 0])
[1] 0.7949
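
For anyone who wants to check this without the CSV, here is a rough Python
sketch of the same comparison. It is not the code that produced the number
above. The counts are assumptions inferred from the printed means: 19 of 80
responders when PR.Seq is missing (0.2375 * 80) and 187 of 920 otherwise
(0.2032609 * 920). The function name, its arguments, and the fixed seed are
made up for illustration.

```python
import random


def posterior_prob_1_is_greater(successes1, n1, successes2, n2,
                                reps=10000, seed=0):
    """Estimate P(p1 > p2) under independent uniform Beta(1, 1) priors."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    wins = 0
    for _ in range(reps):
        # Posterior for each rate is Beta(successes + 1, failures + 1).
        p1 = rng.betavariate(successes1 + 1, n1 - successes1 + 1)
        p2 = rng.betavariate(successes2 + 1, n2 - successes2 + 1)
        wins += p1 > p2
    return wins / reps


# Counts inferred from the means printed above (assumed, not read from the data).
print(posterior_prob_1_is_greater(19, 80, 187, 920))
```

If the inferred counts are right, the estimate should land near the 0.79
reported above, up to Monte Carlo noise.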

A posterior probability of about 0.79 based on only 80 rows is weak evidence,
so I guess we may want to ignore this anyway. Well, I will shut up until
I have a model to contribute.

Thanks for setting this up!

David

On Tue, Jun 22, 2010 at 8:37 AM, David Faden <dfaden at gmail.com> wrote:

> It looks like the sequences are already coded in terms of amino acids
> rather than nucleotide triples?
> <http://www.biogem.org/Accelrys/Sequencing/symbols_amino_acids.html>
>
> On Mon, Jun 21, 2010 at 10:29 PM, Thomas Lotze <thomas.lotze at gmail.com> wrote:
>
>> I committed some Python for generating base pair triplet count features,
>> and R code for determining frequency and doing a basic GLM including the
>> most frequent triplets.
>> (The Noisebridge machine learning SourceForge git repository is here:
>> https://sourceforge.net/scm/?type=git&group_id=326816
>> To download the files, run
>> "git clone git://ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge"
>> or, better yet, ask Mike to give you read/write access to this project so
>> you can upload code as well.)
>>
>> This got me to 53.8462 MCE, 36th out of 49 teams.
>>
>> See you tomorrow night at 9 for fun with Hadoop!
>> -Thomas
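
For readers without access to the repository, a minimal sketch of what
"triplet count features" could look like in Python follows. This is not
Thomas's committed script. The function name, the `step` parameter, and the
choice of an overlapping sliding window by default are assumptions, and the
example sequence is made up.

```python
from collections import Counter


def triplet_counts(seq, step=1):
    """Count length-3 substrings of seq, advancing `step` characters at a time.

    step=1 gives overlapping triplets; step=3 gives non-overlapping,
    codon-style triplets read from the first character. Both are assumptions
    about how the features might be defined.
    """
    seq = seq.upper()
    # Stopping at len(seq) - 2 ensures only full triplets are counted.
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))


# Made-up sequence, for illustration only.
counts = triplet_counts("CCTCAGATCACT")
print(counts.most_common(3))
```

Each sequence's counts can then be laid out as one row of a feature table,
with one column per triplet, before fitting something like the GLM mentioned
above.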