I checked out 'Bishop's Machine Learning book':<div><br></div><div><a href="http://research.microsoft.com/en-us/um/people/cmbishop/prml/">http://research.microsoft.com/en-us/um/people/cmbishop/prml/</a><br><br>
</div><div>*Very* good resource. The sample chapter available for download (chapter 8) covers this type of material quite well. I haven't had any of this theory before, so I think the answer to most of my questions is to sit down and work through this...</div>
<div><br></div><div>It would be more fun with help and nudges from peers, however..</div><div><br></div><div>Cheers,</div><div><br></div><div><br></div><div>Glen</div><div><br></div><div><br><div class="gmail_quote">On Tue, Jul 13, 2010 at 2:50 PM, Vicente Malave <span dir="ltr"><<a href="mailto:vicente.malave@gmail.com">vicente.malave@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">HMMs are made of a few building blocks, each of which is conceptually<br>
simple, but it can be a lot to digest all at once. Some of these are<br>
identified here, like conditional probability and Bayes' rule. A few more<br>
are the Viterbi algorithm to perform inference (a special case of dynamic<br>
programming), EM to actually learn the parameters, the notion of a<br>
generative model, latent variables, and what 'Markov' means.<br>
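<br>
To give a sense of scale: once you have the transition and emission tables,<br>
Viterbi itself is only a dozen or so lines. Here's a rough, untested sketch<br>
in plain Python with a made-up two-state toy model (the states, the<br>
observations and all the numbers are invented just for illustration):<br>
<br>
import math<br>
<br>
# Toy HMM: two hidden states, three possible observations (numbers made up).<br>
states = ['Rainy', 'Sunny']<br>
start_p = {'Rainy': 0.6, 'Sunny': 0.4}<br>
trans_p = {'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},<br>
           'Sunny': {'Rainy': 0.4, 'Sunny': 0.6}}<br>
emit_p = {'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},<br>
          'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}<br>
<br>
def viterbi(obs):<br>
    # V[t][s] = best log-probability of any state path ending in s at time t<br>
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]<br>
    back = [{}]<br>
    for t in range(1, len(obs)):<br>
        V.append({})<br>
        back.append({})<br>
        for s in states:<br>
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))<br>
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])<br>
            back[t][s] = prev<br>
    # Trace back from the best final state.<br>
    path = [max(states, key=lambda s: V[-1][s])]<br>
    for t in range(len(obs) - 1, 0, -1):<br>
        path.insert(0, back[t][path[0]])<br>
    return path<br>
<br>
print(viterbi(['walk', 'shop', 'clean']))  # should print ['Sunny', 'Rainy', 'Rainy']<br>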
<br>
I have a specific toy example I'd like to work through (and write<br>
code for!) because it's tangentially relevant to my current job:<br>
classifying the language a word is from given its string of<br>
characters: for example, are these words probably English, or something<br>
else using the same character set? The nice thing is that you'll be able to<br>
sort of understand it and sanity-check it by inspection, and you can<br>
see whether the HMM produces gibberish that is recognizably English<br>
gibberish.<br>
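<br>
Before the 'hidden' part, the fully observed version of that toy problem is<br>
just bigram counting, which makes a nice warm-up. A rough, untested sketch<br>
(the training word lists are made up; in practice you'd use a dictionary<br>
file per language):<br>
<br>
from collections import defaultdict<br>
import math<br>
<br>
def train_bigrams(words):<br>
    # P(next char | current char) with add-one smoothing over a fixed alphabet.<br>
    counts = defaultdict(lambda: defaultdict(int))<br>
    for w in words:<br>
        w = '^' + w.lower() + '$'        # word start/end markers<br>
        for a, b in zip(w, w[1:]):<br>
            counts[a][b] += 1<br>
    alphabet = set('abcdefghijklmnopqrstuvwxyz') | {'^', '$'}<br>
    return {a: {b: (counts[a][b] + 1) / (sum(counts[a].values()) + len(alphabet))<br>
                for b in alphabet}<br>
            for a in alphabet}<br>
<br>
def log_likelihood(word, model):<br>
    w = '^' + word.lower() + '$'<br>
    return sum(math.log(model[a][b]) for a, b in zip(w, w[1:]))<br>
<br>
english = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']<br>
other   = ['zorro', 'rapido', 'salta', 'sobre', 'perro', 'flojo']<br>
en, ot = train_bigrams(english), train_bigrams(other)<br>
for w in ['though', 'perros']:<br>
    print(w, log_likelihood(w, en) - log_likelihood(w, ot))  # > 0 means 'more English'<br>
<br>
An HMM version would put a hidden state behind each character (so it can<br>
pick up things like vowel/consonant patterns) and replace the single product<br>
with the forward algorithm, but the counting flavour is much the same.<br>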
<br>
I generally like Bishop's machine learning book, if you have or can<br>
obtain a copy of it. I usually do not like Wikipedia entries on<br>
math/CS topics because the quality varies widely.<br>
<br>
Something free is the classic tutorial paper that introduced them to<br>
the field, Rabiner (1989)<br>
(this link should work for everyone; let me know if not, since all journals<br>
should be open access):<br>
<a href="http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf" target="_blank">http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf</a><br>
but it is kind of dense. Actually, the Wikipedia page seems to follow<br>
the outline of Rabiner's presentation.<br>
<br>
Oh yeah, I'm going to your ML group meeting and look forward to<br>
meeting you guys.<br>
<br>
Vicente Malave<br>
<div><div></div><div class="h5"><br>
<br>
<br>
On Tue, Jul 13, 2010 at 2:16 PM, Glen Jarvis <<a href="mailto:glen@glenjarvis.com">glen@glenjarvis.com</a>> wrote:<br>
> I was invited to this list for questions I have about Hidden Markov Models.<br>
> Please forgive the extra chatter.<br>
><br>
> I work at a Phylogenomics lab and we do HMM stuff all the time -- with<br>
> either the SAM software tool (<a href="http://compbio.soe.ucsc.edu/sam.html" target="_blank">http://compbio.soe.ucsc.edu/sam.html</a>) or the<br>
> newer tool HMMER (<a href="http://hmmer.janelia.org/" target="_blank">http://hmmer.janelia.org/</a>). Conceptually I have 'enough'<br>
> knowledge -- like a PC user has enough to be able to point and click to<br>
> read their email.<br>
> But, the fundamental "could I write my own toy example" always escapes me.<br>
> Here's what I know (from the ground up):<br>
> Conditional Probability:<br>
> Although I never reviewed this Wiki page, it's probably a fairly good<br>
> introduction:<br>
> <a href="http://en.wikipedia.org/wiki/Conditional_probability" target="_blank">http://en.wikipedia.org/wiki/Conditional_probability</a><br>
><br>
> Bayes' Theorem:<br>
> This wiki page doesn't explain it in as clear a way as I would to<br>
> someone just starting out. I'd start with a conditional tree. The classic<br>
> textbook example: the base decision (the root of the tree) is whether the<br>
> owner purchased a warranty, with yes as the left branch and no as the<br>
> right branch. Then there is a second condition, say the radio breaks,<br>
> which forks the left branch again. Then we can ask things like "given<br>
> that someone purchased a warranty, what's the probability that they have<br>
> a broken radio?"... Obviously this would be much clearer with a nice<br>
> whiteboard and a decision tree. Bayes' theorem has always twisted a mind<br>
> or two in probability class, but it's actually not that hard once you get<br>
> your mind around it (although I need a refresher).<br>
> <a href="http://en.wikipedia.org/wiki/Bayes'_theorem" target="_blank">http://en.wikipedia.org/wiki/Bayes'_theorem</a><br>
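><br>
> To check my own understanding I'd want to reproduce that numerically.<br>
> Something like this (untested Python, and the numbers are invented just for<br>
> illustration): reading the tree forward gives P(broken radio | warranty)<br>
> directly, and Bayes' theorem is what lets you flip it around to<br>
> P(warranty | broken radio):<br>
><br>
> p_warranty = 0.30                  # P(purchased a warranty), made up<br>
> p_broken_given_w = 0.10            # P(radio breaks | warranty), made up<br>
> p_broken_given_no_w = 0.05         # P(radio breaks | no warranty), made up<br>
><br>
> # Total probability of a broken radio.<br>
> p_broken = p_broken_given_w * p_warranty + p_broken_given_no_w * (1 - p_warranty)<br>
><br>
> # Bayes' theorem flips the conditioning.<br>
> p_w_given_broken = p_broken_given_w * p_warranty / p_broken<br>
><br>
> print(p_broken)           # 0.065<br>
> print(p_w_given_broken)   # about 0.46<br>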
><br>
> The Simplest Bayesian Network<br>
> I'm obviously working myself up to the Wiki page for an HMM<br>
> (<a href="http://en.wikipedia.org/wiki/Hidden_Markov_model" target="_blank">http://en.wikipedia.org/wiki/Hidden_Markov_model</a>). It explains that "a<br>
> HMM can be considered as the simplest dynamic Bayesian network" [1].<br>
> But the dynamic Bayesian network Wiki page<br>
> (<a href="http://en.wikipedia.org/wiki/Dynamic_Bayesian_network" target="_blank">http://en.wikipedia.org/wiki/Dynamic_Bayesian_network</a>) used to talk about<br>
> grass, water and sprinklers as an example. Although I didn't get it 100%,<br>
> I always felt comfortable that there was a real-world example.<br>
> With all of that said, I still couldn't "do it." How is this different<br>
> from a Bayesian decision tree? If I had a simple exercise that I could work<br>
> through, with an 'answer to compare to', it would help.<br>
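><br>
> For instance, for that grass/sprinkler example (adding rain as the third<br>
> variable), the kind of exercise I mean would look something like this<br>
> (untested sketch; the conditional probability tables are numbers I made up<br>
> in the spirit of that example, not necessarily the Wiki page's):<br>
><br>
> from itertools import product<br>
><br>
> def p_rain(r):         return 0.2 if r else 0.8<br>
> def p_sprinkler(s, r):               # sprinkler rarely runs when it rains<br>
>     p_on = 0.01 if r else 0.4<br>
>     return p_on if s else 1 - p_on<br>
> def p_wet(w, s, r):                  # wet grass needs the sprinkler or rain<br>
>     p_w = {(True, True): 0.99, (True, False): 0.9,<br>
>            (False, True): 0.8, (False, False): 0.0}[(s, r)]<br>
>     return p_w if w else 1 - p_w<br>
><br>
> def joint(w, s, r):<br>
>     return p_rain(r) * p_sprinkler(s, r) * p_wet(w, s, r)<br>
><br>
> # P(Rain | GrassWet) by brute-force enumeration over everything unobserved.<br>
> num = sum(joint(True, s, True) for s in (True, False))<br>
> den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))<br>
> print(num / den)   # about 0.36 with these numbers<br>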
><br>
> Best of all, I'd like to write a very small toy program. I'm most<br>
> comfortable with protein multiple sequence alignments (MSAs), so if I could<br>
> make a very basic HMM from a small MSA, it'd really help.<br>
> Can anyone explain to me, as if they were speaking to a 6-year-old, how a<br>
> simple Bayesian network (HMM) can be created? If you want an example, imagine<br>
> the following:<br>
><br>
><br>
><br>
> * There are 20 letters in the alphabet (all letters except JOBZUX --<br>
> an interesting way to remember it :))<br>
> * An example of a set of sequences that have been aligned follows:<br>
> DLITPLHTYMITGN-VCVHVIKKIIELGGDMDMKCV<br>
> NLITPLHSYLRRDELISASVLKKVIELGADRNLRCC<br>
> HLITPLHSYLRRDESISASVLKKVIELGADRNLRCC<br>
> HLITPLHSYLRRDESISASVLKKVIELGADRNLRCC<br>
> HLITPLHSYLRRDESISASVLKKVIELGADRNLRCC<br>
> DLITPLHTYMITGN-VCVHVIKKIIELGGDMDMKCV<br>
> NLITPLHTYMITGN-VCVHVIKKIIELGGDMDMKCI<br>
> NLITPLHTYMITGN-VCVDVIKKIIELGGDMDMKCV<br>
> DLITPLHTYTITGN-VCAYVIKKIIELGGDMDMKCV<br>
> DLITPLHTYMITGN-VCVHVIKKIIELG--------<br>
> So, in this example, the first column is more variable than the second or<br>
> the fourteenth.<br>
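><br>
> The per-column counting part is the bit I think I could already write.<br>
> Something like this, in Python (untested, no pseudocounts, gaps just skipped):<br>
><br>
> from collections import Counter<br>
><br>
> msa = ["DLITPLHTYMITGN-VCVHVIKKIIELGGDMDMKCV",<br>
>        "NLITPLHSYLRRDELISASVLKKVIELGADRNLRCC",<br>
>        "HLITPLHSYLRRDESISASVLKKVIELGADRNLRCC",<br>
>        "HLITPLHSYLRRDESISASVLKKVIELGADRNLRCC",<br>
>        "HLITPLHSYLRRDESISASVLKKVIELGADRNLRCC",<br>
>        "DLITPLHTYMITGN-VCVHVIKKIIELGGDMDMKCV",<br>
>        "NLITPLHTYMITGN-VCVHVIKKIIELGGDMDMKCI",<br>
>        "NLITPLHTYMITGN-VCVDVIKKIIELGGDMDMKCV",<br>
>        "DLITPLHTYTITGN-VCAYVIKKIIELGGDMDMKCV",<br>
>        "DLITPLHTYMITGN-VCVHVIKKIIELG--------"]<br>
><br>
> def column_frequencies(column):<br>
>     # Emission estimate for one column: observed residue frequencies, gaps ignored.<br>
>     letters = [c for c in column if c != '-']<br>
>     return {aa: n / len(letters) for aa, n in Counter(letters).items()}<br>
><br>
> for i in (0, 1, 13):                 # columns 1, 2 and 14 (1-based)<br>
>     print(i + 1, column_frequencies([seq[i] for seq in msa]))<br>
><br>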
> I should then be able to score another sequence against the HMM and see<br>
> whether it matches (notice that lowercase letters mean an HMM insert state):<br>
> masltehaivnvrkliystcledfdnristnarinnydpddgycsdgdiysynhtvrykhikvfkkkyyg<br>
> idnrqrqqytdsktalidiigsmilmlkadrknkslvdqykkfvkyiikdnksktanhvfdipnngdmdi<br>
> lytyfnsprtrcikldlikymvdvgivnlnyvckktgygilhaylgnmnvdidilewlcnngvdvnlqns<br>
> ......................................................................<br>
> ...............NLITPLHTYMITGN.VCVDVIKKIIELGGDMDMKCVngmspimtymtnidnvnpe<br>
> itnayiesldgdkvknipmilhsyitlarnidisvvysflqpgvklhykdsagrtclhqyilrhnistni<br>
> ikllheygndvnepdnigntvlhtylsmlsvvhildpetdndirldviqcllslgaditavnclgytplt<br>
> syictaqnymyydiidclisdkvlnmvkhrilqdllirvddtpciihhiiakyniptdlytdeyepydst<br>
> dihdvyhcaiierynnavcetsgmtplhvsiishtnanivmdsfvyllsiqaniniptkngvdplmltme<br>
> nnmlsghqwylvknildkrpnvdivisfldkcyaagkfpslllseddiikptlrlalmlagldycnkcie<br>
> ymerdiaildnshamflafdklvsirdnidkltklhinsrsnisiydilvskcykediithrenhnlvac<br>
> chgndplydiinkyitdarsmyyiandisryimdmypvmripvpllfsciigifrltyfkkiiidrhhds<br>
> finarltdea<br>
><br>
> I know we can use a substitution matrix (e.g., BLOSUM62;<br>
> <a href="http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm" target="_blank">http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm</a>) to say how likely<br>
> it is for one letter to change into another, but how do I incorporate that<br>
> likelihood given the likelihood of the previous letter? For instance, the<br>
> likelihood that the second column is an L, given that the first column is a<br>
> D, is obviously pretty high in the above example.<br>
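><br>
> My guess is that the column-to-column part is carried by the transitions<br>
> between the hidden match/insert/delete states rather than by the letters<br>
> themselves, so the counting would look something like this (untested,<br>
> reusing the msa list from above, ignoring insert states and any<br>
> substitution-matrix pseudocounts):<br>
><br>
> def transition_counts(msa, i):<br>
>     # Between columns i and i+1: does each sequence keep a residue or drop to a gap?<br>
>     mm = md = dm = dd = 0<br>
>     for seq in msa:<br>
>         a, b = seq[i] != '-', seq[i + 1] != '-'<br>
>         if a and b:       mm += 1<br>
>         elif a:           md += 1<br>
>         elif b:           dm += 1<br>
>         else:             dd += 1<br>
>     return {'m->m': mm, 'm->d': md, 'd->m': dm, 'd->d': dd}<br>
><br>
> print(transition_counts(msa, 13))   # column 14 -> 15, where several rows have a gap<br>
> print(transition_counts(msa, 27))   # column 28 -> 29, where the truncated row drops out<br>
><br>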
> Here's a snippet of what was generated for this example:<br>
> HMM        A        C        D        E        F        G        H        I        K        L        M        N        P        Q        R        S        T        V        W        Y<br>
>          m->m     m->i     m->d     i->m     i->i     d->m     d->d<br>
> COMPO  2.95757  3.22921  2.84828  2.93689  4.02506  2.84493  3.33255  2.33133  2.65067  2.12155  3.45566  3.32303  3.36052  3.85947  3.13260  2.99602  2.82345  2.43542  5.63061  3.48580<br>
>        2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503<br>
>        0.01686  4.48732  5.20967  0.61958  0.77255  0.00000        *<br>
>     1  3.06925  5.62515  1.55218  2.36308  4.87352  3.48242  2.02657  4.44300  2.84967  3.92317  4.73346  1.82577  4.04654  3.07283  3.40453  2.97597  3.33762  4.00967  6.04161  4.55323      1 x -<br>
>        2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503<br>
>        0.01686  4.48732  5.20967  0.61958  0.77255  0.48576  0.95510<br>
><br>
> Err... at this point I'm so confused about what the actual product of an HMM<br>
> actually is that I forget everything I did know...<br>
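><br>
> One guess I'd like to verify: if those numbers are negative natural-log<br>
> probabilities, then exp(-x) should turn a row back into probabilities that<br>
> sum to about 1 (untested):<br>
><br>
> import math<br>
> compo = [2.95757, 3.22921, 2.84828, 2.93689, 4.02506, 2.84493, 3.33255,<br>
>          2.33133, 2.65067, 2.12155, 3.45566, 3.32303, 3.36052, 3.85947,<br>
>          3.13260, 2.99602, 2.82345, 2.43542, 5.63061, 3.48580]<br>
> probs = [math.exp(-x) for x in compo]<br>
> print(sum(probs))   # close to 1.0 if the guess is right<br>
> print(max(probs))   # the largest single probability in that background composition<br>
><br>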
> Has anyone worked with HMMs in a different context? Are there toy examples<br>
> we could write? Do you have any input on this?<br>
><br>
><br>
> Cheers,<br>
><br>
> Glen<br>
> [1] <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model" target="_blank">http://en.wikipedia.org/wiki/Hidden_Markov_model</a><br>
> --<br>
> Whatever you can do or imagine, begin it;<br>
> boldness has beauty, magic, and power in it.<br>
><br>
> -- Goethe<br>
><br>
</div></div>> _______________________________________________<br>
> ml mailing list<br>
> <a href="mailto:ml@lists.noisebridge.net">ml@lists.noisebridge.net</a><br>
> <a href="https://www.noisebridge.net/mailman/listinfo/ml" target="_blank">https://www.noisebridge.net/mailman/listinfo/ml</a><br>
><br>
><br>
</blockquote></div><br><br clear="all"><br>-- <br>Whatever you can do or imagine, begin it;<br>boldness has beauty, magic, and power in it.<br><br>-- Goethe <br>
</div>