[ml] k-means data clustering... Bueller?

David Faden dfaden at gmail.com
Thu Jun 18 00:42:58 UTC 2009


I've attached some C code for k-means. I think it was partially
machine-translated from Fortran, so it's somewhat ugly. The author
has okayed its use. (I guess you might want to go with a GPL
implementation just to be perfectly safe, though.) I believe it should
build with no dependencies.

David

On Tue, Jun 16, 2009 at 5:19 PM, Almir Karic <almir at kiberpipa.org> wrote:
> i have toyed with it: scipy.cluster.vq.kmeans2 :D does pretty much everything
> we need/want. you can pass it means, or you can tell it how many means you want
> and it randomly picks them (of course, in both cases telling you which of the
> means your vectors are closest to)
>
> On Tue, Jun 16, 2009 at 03:04:02PM -0700, Josh Myer wrote:
>> Following up on Michael's post about HMM libraries, has anyone started
>> on the k-means part of the wiimote stuff?  Any libraries to post
>> about, etc?
>> --
>> Josh Myer   650.248.3796
>>   josh at joshisanerd.com
>> _______________________________________________
>> ml mailing list
>> ml at lists.noisebridge.net
>> https://www.noisebridge.net/mailman/listinfo/ml
> _______________________________________________
> ml mailing list
> ml at lists.noisebridge.net
> https://www.noisebridge.net/mailman/listinfo/ml
>



-- 
David Faden, dfaden at iastate.edu
AIM: pitulx
-------------- next part --------------
#ifndef KMEANS_H
#define KMEANS_H

/**
 * Run k-means on one-dimensional data.
 * Returns a nonzero value to indicate failure.
 */
int kmeans1dCenters(const double* y, int len,
		    double* centers, int k);

/**
 * Run k-means on one-dimensional data.
 * Returns a nonzero value to indicate failure.
 *
 * classes -- len-length array; classes[i] is the index of the center
 *            closest to the ith point on exit
 * counts -- k-length array; counts[j] is the number of points assigned
 *           to the jth center on exit.
 */
int kmeans1d(const double* y, int len, double* centers,
	      int k, int* classes, int* counts);

/** ALGORITHM AS 136, Appl. Statist. (1979) Vol. 28, No. 1.
 *  Divide M points in N-dimensional space into K clusters so that the
 *  within-cluster sum of squares is minimized.
 *
 *  a -- m x n input array of points
 *  m -- number of points (rows of a)
 *  n -- dimension of a point (length of a row of a)
 *  c -- k x n array of centers
 *  k -- number of clusters
 *  ic1 -- m-length workspace (closest center to the ith point)
 *  nc -- k-length workspace (number of points in the jth cluster)
 *  iter -- maximum number of iterations
 *  wss -- k-length workspace (within-cluster sum of squares)
 *  ifault -- a nonzero value indicates an error
 */
void kmeans(const double * const * a, int m, int n,
	    double **c, int k, int *ic1, int *nc,
	    int iter, double *wss, int *ifault);

#endif
