nltk.cluster.KMeansClusterer.__init__

KMeansClusterer.__init__(num_means, distance, repeats=1, conv_test=1e-06, initial_means=None, normalise=False, svd_dimensions=None, rng=None, avoid_empty_clusters=False)
Parameters:
  • num_means (int) – the number of means to use (may use fewer)
  • distance (function taking two vectors and returning a float) – measure of distance between two vectors
  • repeats (int) – number of randomised clustering trials to use
  • conv_test (number) – maximum variation in mean differences before deemed convergent
  • initial_means (sequence of vectors) – set of k initial means
  • normalise (boolean) – whether vectors should be normalised to length 1
  • svd_dimensions (int) – number of dimensions to use in reducing vector dimensionality with SVD
  • rng (Random) – random number generator (or None)
  • avoid_empty_clusters (boolean) – include current centroid in computation of next one; avoids undefined behavior when clusters become empty
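
The parameters above can be illustrated with a minimal sketch. It assumes `numpy` is installed and uses `euclidean_distance` from `nltk.cluster` as the `distance` function. The sample vectors, the seed, and the variable names are made up for illustration and are not part of the NLTK documentation.

```python
import random

import numpy
from nltk.cluster import KMeansClusterer, euclidean_distance

# Illustrative data (an assumption): two well-separated groups of 2-D points.
vectors = [
    numpy.array(point, dtype=float)
    for point in [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
]

# num_means=2 and a distance function are required; the rest are optional.
# A seeded random.Random makes the randomised trials reproducible.
clusterer = KMeansClusterer(
    2,
    euclidean_distance,
    repeats=5,
    rng=random.Random(0),
)

# With assign_clusters=True, cluster() returns one cluster index per vector.
assignments = clusterer.cluster(vectors, assign_clusters=True)

# means() returns the learned centroids; classify() labels an unseen vector.
means = clusterer.means()
label = clusterer.classify(numpy.array([9.0, 9.0]))

print(assignments)
print(means)
print(label)
```

Passing `initial_means` instead of relying on `rng` gives fixed starting centroids. In that case `repeats` is normally left at 1, since later trials would draw new random means.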