[ml] Kaggle HIV update
David Faden
dfaden at gmail.com
Tue Jun 22 18:04:03 UTC 2010
Could missing PR.Seq be informative?
> d0 <- read.csv("Downloads/ml-noisebridge/kaggle/training_data.csv",
+                stringsAsFactors=FALSE)
> d0$PR.Seq.len <- nchar(d0$PR.Seq)
> mean(d0$Resp[d0$PR.Seq.len == 0])
[1] 0.2375
> mean(d0$Resp[d0$PR.Seq.len != 0])
[1] 0.2032609
> sum(d0$PR.Seq.len == 0)
[1] 80
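# Posterior probability that group 1's response rate exceeds group 2's,
# using uniform Beta(1, 1) priors and Monte Carlo draws from each posterior.
posteriorProb1IsGreater <- function(trials1, trials2, reps=10000) {
  p1 <- rbeta(reps, sum(trials1) + 1, sum(1 - trials1) + 1)
  p2 <- rbeta(reps, sum(trials2) + 1, sum(1 - trials2) + 1)
  return(mean(p1 > p2))
}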
> posteriorProb1IsGreater(d0$Resp[d0$PR.Seq.len == 0],
+                         d0$Resp[d0$PR.Seq.len != 0])
[1] 0.7949
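A possible cross-check on the simulated number, as a rough sketch only: with the same uniform Beta(1, 1) priors, the probability that p1 exceeds p2 can be written as a one-dimensional integral and evaluated with integrate(), which removes the Monte Carlo noise. The helper name posteriorProb1IsGreaterExact is made up for this illustration. The toy vectors below are also assumptions: 19 of 80 follows from the mean and count printed above, while 187 of 920 is only a guess that happens to match the printed mean of 0.2032609.

```r
# Hypothetical helper (not part of the original code): quadrature version of
# posteriorProb1IsGreater, assuming the same uniform Beta(1, 1) priors.
posteriorProb1IsGreaterExact <- function(trials1, trials2) {
  a1 <- sum(trials1) + 1
  b1 <- sum(1 - trials1) + 1
  a2 <- sum(trials2) + 1
  b2 <- sum(1 - trials2) + 1
  # P(p1 > p2) = integral over x of [density of p1 at x] * P(p2 < x)
  integrand <- function(x) dbeta(x, a1, b1) * pbeta(x, a2, b2)
  integrate(integrand, lower = 0, upper = 1)$value
}

# Stand-in response vectors (assumed counts, see the note above).
respMissing <- c(rep(1, 19), rep(0, 61))     # 19 / 80   = 0.2375
respPresent <- c(rep(1, 187), rep(0, 733))   # 187 / 920 = 0.2032609 (assumed)

exact <- posteriorProb1IsGreaterExact(respMissing, respPresent)
print(exact)
```

If those counts are right, the result should land close to the simulated value; on the real data the call would take the same two d0$Resp subsets as above.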
A posterior probability of about 0.79, based on only 80 rows with a missing
PR.Seq, is weak evidence, so I guess we may want to ignore this anyway.
Well, I will shut up until I have a model to contribute.
Thanks for setting this up!
David
On Tue, Jun 22, 2010 at 8:37 AM, David Faden <dfaden at gmail.com> wrote:
> It looks like the sequences are already coded in terms of amino acids
> rather than nucleotide triples?
> <http://www.biogem.org/Accelrys/Sequencing/symbols_amino_acids.html>
>
> On Mon, Jun 21, 2010 at 10:29 PM, Thomas Lotze <thomas.lotze at gmail.com> wrote:
>
>> I committed some Python for generating base pair triplet count features,
>> and R code for determining frequency and doing a basic GLM including the
>> most frequent triplets.
>> (The Noisebridge machine learning SourceForge git repository is here:
>> https://sourceforge.net/scm/?type=git&group_id=326816
>> To download the files, run
>> "git clone git://ml-noisebridge.git.sourceforge.net/gitroot/ml-noisebridge/ml-noisebridge"
>> or, better yet, ask Mike to give you read/write access to this project so
>> you can upload code as well.)
>>
>> This got me to 53.8462 MCE, 36th out of 49 teams.
>>
>> See you tomorrow night at 9 for fun with Hadoop!
>> -Thomas
>>
>> _______________________________________________
>> ml mailing list
>> ml at lists.noisebridge.net
>> https://www.noisebridge.net/mailman/listinfo/ml
>>
>>
>