[ml] Edit distance data and code
Adam Bossy
adambossy at gmail.com
Mon Jul 12 20:12:56 UTC 2010
Folks, I've pushed the edit distance data, and the code that calculates
it, to our git repository. There are four files in total:
1) src/cluster_rtseq.py - The code that computes the Levenshtein
distance between string pairs and calls the scipy clustering algorithm
2) src/sample.py - Sample code for hierarchical clustering with scipy
3) src/print_matrix.py - Print the edit distance matrix (per Theo and
Erin's request)
4) data/similarity_matrix.csv - The output of print_matrix.py. Feel
free to tweak the Python file to match your needs
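For anyone who wants the gist without opening the repo, here is a minimal sketch of the edit distance step in plain Python. It is not the code in src/cluster_rtseq.py; the function names levenshtein and distance_matrix are made up for illustration, and it assumes unit cost for insertions, deletions, and substitutions.

```python
def levenshtein(a, b):
    """Return the Levenshtein (edit) distance between strings a and b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def distance_matrix(strings):
    """Build the symmetric pairwise edit distance matrix as nested lists."""
    n = len(strings)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = levenshtein(strings[i], strings[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Calling distance_matrix on a list of strings gives a square matrix with zeros on the diagonal, which is the shape of data that a CSV like similarity_matrix.csv would hold.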
You'll need to install both scipy and numpy to run any of this code.
I'm doing this on a remote slice -- if anybody can get this running
with the matplotlib package for visualizations, that would be great,
since we could then visualize the dendrogram output. I messed up the
Python install on my Mac, so I won't be able to set it up myself
without going through the painful process of fixing it.
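In case it helps whoever picks this up, here is a rough sketch of how the clustering and dendrogram steps could look. It is an assumption about the approach, not what src/cluster_rtseq.py or src/sample.py actually do: the function names, the "average" linkage method, and the output path are all placeholders. It uses matplotlib's non-interactive "Agg" backend so the figure is written to a file, which should also work on a headless remote machine.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform


def cluster_from_matrix(dist_matrix, method="average"):
    """Run hierarchical clustering on a square, symmetric distance matrix."""
    square = np.asarray(dist_matrix, dtype=float)
    # linkage() expects the condensed (upper-triangle) form, so convert first.
    # checks=True raises if the matrix is not symmetric with a zero diagonal.
    condensed = squareform(square, checks=True)
    return linkage(condensed, method=method)


def save_dendrogram(Z, labels, path="dendrogram.png"):
    """Draw the dendrogram for linkage matrix Z and save it to an image file."""
    # Imported here so the clustering step works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # file-only backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 4))
    dendrogram(Z, labels=labels, ax=ax)
    ax.set_ylabel("edit distance")
    fig.savefig(path)
    plt.close(fig)
```

The linkage method is a judgment call; "average" is just a placeholder here, and "single" or "complete" may suit the data better.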
Adam