nltk.cluster.KMeansClusterer

class nltk.cluster.KMeansClusterer(num_means, distance, repeats=1, conv_test=1e-06, initial_means=None, normalise=False, svd_dimensions=None, rng=None, avoid_empty_clusters=False)

The K-means clusterer starts with k arbitrarily chosen means, then allocates each vector to the cluster with the closest mean. It then recalculates the mean of each cluster as the centroid of the vectors in that cluster. This process repeats until the cluster memberships stabilise. This is a hill-climbing algorithm which may converge to a local optimum. Hence the clustering is often repeated with random initial means, and the most commonly occurring output means are chosen.
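The allocate-then-recompute loop described above can be sketched in plain Python. This is an illustrative toy, not NLTK's implementation: the function name `kmeans_sketch`, its `seed` parameter, and the choice to stop once the means move by less than `conv_test` are assumptions made for the example.

```python
import math
import random


def kmeans_sketch(vectors, k, initial_means=None, conv_test=1e-6, seed=0):
    """Toy k-means (hypothetical helper, not part of NLTK)."""
    rng = random.Random(seed)
    # Start from the supplied means, or pick k of the input vectors at random.
    means = [list(m) for m in (initial_means or rng.sample(vectors, k))]
    while True:
        # Allocation step: each vector joins the cluster with the closest mean.
        clusters = [[] for _ in means]
        for v in vectors:
            nearest = min(range(len(means)), key=lambda i: math.dist(v, means[i]))
            clusters[nearest].append(v)
        # Update step: each mean becomes the centroid of its cluster.
        # In this sketch an empty cluster simply keeps its previous mean.
        new_means = [
            [sum(col) / len(c) for col in zip(*c)] if c else m
            for c, m in zip(clusters, means)
        ]
        # Stop once the means have effectively stopped moving.
        shift = sum(math.dist(a, b) for a, b in zip(means, new_means))
        means = new_means
        if shift < conv_test:
            return means, clusters


means, clusters = kmeans_sketch(
    [(2, 1), (1, 3), (4, 7), (6, 7)], 2, initial_means=[(4, 3), (5, 5)]
)
print(means)     # centroids of the two final clusters
print(clusters)  # the vectors grouped by cluster
```

Because the result depends on the starting means, running such a loop several times from different random starts (the role of `repeats` in the class signature) reduces the chance of settling on a poor local optimum.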

Methods

__init__(num_means, distance[, repeats, ...]) num_means: the number of means to use (may use fewer).
classification_probdist(vector) Classifies the token into a cluster, returning a probability distribution over the cluster identifiers.
classify(vector)
classify_vectorspace(vector)
cluster(vectors[, assign_clusters, trace])
cluster_name(index) Returns the name of the cluster at index.
cluster_names() Returns the names of the clusters.
cluster_vectorspace(vectors[, trace])
likelihood(vector, label)
likelihood_vectorspace(vector, cluster) Returns the likelihood of the vector belonging to the cluster.
means() The means used for clustering.
num_clusters()
unicode_repr()
vector(vector) Returns the vector after normalisation and dimensionality reduction.
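A short usage sketch tying several of the methods above together. It assumes `nltk` and `numpy` are installed and uses `nltk.cluster.euclidean_distance` as the distance function; the sample vectors and starting means are made up for illustration.

```python
import numpy
from nltk.cluster import KMeansClusterer, euclidean_distance

# Four 2-D points forming two visually obvious groups (illustrative data).
vectors = [numpy.array(v) for v in [[2, 1], [1, 3], [4, 7], [6, 7]]]

# Fixing initial_means makes the run deterministic; omit it to start from
# randomly sampled vectors instead.
clusterer = KMeansClusterer(2, euclidean_distance, initial_means=[[4, 3], [5, 5]])

# assign_clusters=True returns the cluster identifier for each input vector.
clusters = clusterer.cluster(vectors, assign_clusters=True)
print("assignments:", clusters)
print("means:", clusterer.means())

# Classify a previously unseen vector against the learned means.
print("new point ->", clusterer.classify(numpy.array([3, 3])))
```

When no `initial_means` are supplied, passing a larger `repeats` value and a seeded `rng` is a reasonable way to get more stable, reproducible results.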