nltk.cluster.GAAClusterer

class nltk.cluster.GAAClusterer(num_clusters=1, normalise=True, svd_dimensions=None)

The Group Average Agglomerative clusterer starts with each of the N vectors as a singleton cluster. It then iteratively merges the pair of clusters whose centroids are closest. This continues until only one cluster remains. The order of merges gives rise to a dendrogram: a tree in which earlier merges sit lower than later ones. The membership for any given number of clusters c, 1 <= c <= N, can be found by cutting the dendrogram at depth c.
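The merge-then-cut procedure described above can be illustrated with a minimal pure-Python sketch. It is not NLTK's implementation: the functions `normalise`, `gaa_merges` and `cut` are hypothetical, and the sketch assumes "closest centroids" means the highest dot product between the mean unit vectors of two clusters (which equals their average pairwise cosine similarity). Cutting at c clusters is done by replaying only the first N - c recorded merges.

```python
import math

def normalise(v):
    # Scale a vector to unit length so that dot products equal cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gaa_merges(vectors):
    """Merge clusters until one remains and return the merge order.

    Each cluster is stored as (member indices, sum of its unit vectors).
    Dividing the dot product of two sums by |A| * |B| gives the dot product
    of the two centroids (mean unit vectors), i.e. the group-average cosine.
    """
    clusters = [([i], normalise(v)) for i, v in enumerate(vectors)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ma, sa), (mb, sb) = clusters[i], clusters[j]
                sim = dot(sa, sb) / (len(ma) * len(mb))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        (ma, sa), (mb, sb) = clusters[i], clusters[j]
        merges.append((ma, mb))
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append((ma + mb, [x + y for x, y in zip(sa, sb)]))
    return merges

def cut(n, merges, c):
    """Recover c clusters by replaying only the first n - c merges."""
    groups = {frozenset([i]) for i in range(n)}
    for ma, mb in merges[: n - c]:
        groups.remove(frozenset(ma))
        groups.remove(frozenset(mb))
        groups.add(frozenset(ma) | frozenset(mb))
    return sorted(sorted(g) for g in groups)

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
merges = gaa_merges(vectors)
print(cut(len(vectors), merges, 2))  # [[0, 1], [2, 3]]
```

Recording the full merge order once and cutting afterwards means any c can be inspected without re-clustering, which mirrors the dendrogram idea in the description.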

This clusterer supports only the cosine similarity metric, which allows the clustering process to be sped up.
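The source does not say which speed-up it relies on. One well-known reason cosine similarity helps group-average clustering is the identity sketched below, offered as an assumption rather than a description of NLTK's code: for unit vectors, the average pairwise similarity between clusters A and B equals the dot product of their summed vectors divided by |A| * |B|. The helper names `unit`, `naive_group_average` and `fast_group_average` are hypothetical.

```python
import math
import random

def unit(v):
    # Normalise to unit length; cosine similarity then reduces to a dot product.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def naive_group_average(A, B):
    # Average cosine similarity over all |A| * |B| cross-cluster pairs.
    return sum(dot(a, b) for a in A for b in B) / (len(A) * len(B))

def fast_group_average(A, B):
    # Same quantity from the per-dimension sums of each cluster's unit vectors,
    # costing one dot product instead of |A| * |B| of them.
    sum_a = [sum(col) for col in zip(*A)]
    sum_b = [sum(col) for col in zip(*B)]
    return dot(sum_a, sum_b) / (len(A) * len(B))

rng = random.Random(0)
A = [unit([rng.uniform(-1, 1) for _ in range(5)]) for _ in range(4)]
B = [unit([rng.uniform(-1, 1) for _ in range(5)]) for _ in range(3)]
print(naive_group_average(A, B), fast_group_average(A, B))
```

Because the summed vector of a merged cluster is just the sum of its two parts' summed vectors, the fast form can be kept up to date cheaply as clusters merge.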

Methods

__init__([num_clusters, normalise, ...])
classification_probdist(vector) Classifies the token into a cluster, returning a probability distribution over the cluster identifiers.
classify(vector)
classify_vectorspace(vector)
cluster(vectors[, assign_clusters, trace])
cluster_name(index) Returns the name of the cluster at index.
cluster_names() Returns the names of the clusters.
cluster_vectorspace(vectors[, trace])
dendrogram() Returns the dendrogram representing the current clustering.
likelihood(vector, label)
likelihood_vectorspace(vector, cluster) Returns the likelihood of the vector belonging to the cluster.
num_clusters()
unicode_repr()
update_clusters(num_clusters)
vector(vector) Returns the vector after normalisation and dimensionality reduction.
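The methods above include classify, likelihood and classification_probdist. The sketch below shows one plausible way they could fit together, assuming that a new vector is assigned to the cluster whose centroid has the highest cosine similarity and that the likelihood is a hard 1.0/0.0 score. Both behaviours are assumptions, and `cosine`, `classify`, `likelihood` and `classification_probdist` here are hypothetical plain functions taking an explicit list of centroids, not the NLTK methods; check the NLTK source for the actual behaviour.

```python
import math

def cosine(a, b):
    # Cosine similarity; assumes neither vector is all zeros.
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def classify(vector, centroids):
    # Assumption: pick the centroid with the highest cosine similarity
    # (max returns the first index on ties).
    return max(range(len(centroids)), key=lambda i: cosine(vector, centroids[i]))

def likelihood(vector, cluster, centroids):
    # Assumption: hard assignment, 1.0 for the predicted cluster, 0.0 otherwise.
    return 1.0 if classify(vector, centroids) == cluster else 0.0

def classification_probdist(vector, centroids):
    # Normalise the per-cluster likelihoods into a distribution over indices.
    scores = {i: likelihood(vector, i, centroids) for i in range(len(centroids))}
    total = sum(scores.values())
    return {i: s / total for i, s in scores.items()}

centroids = [[1.0, 0.0], [0.0, 1.0]]
print(classify([2.0, 0.5], centroids))  # 0
print(classification_probdist([2.0, 0.5], centroids))  # {0: 1.0, 1: 0.0}
```

Under these assumptions the probability distribution is degenerate: all mass falls on the single predicted cluster.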