nltk.cluster.EMClusterer

class nltk.cluster.EMClusterer(initial_means, priors=None, covariance_matrices=None, conv_threshold=1e-06, bias=0.1, normalise=False, svd_dimensions=None)

The Gaussian EM clusterer models the vectors as being produced by a mixture of k Gaussian sources. The parameters of these sources (prior probability, mean and covariance matrix) are chosen to maximise the likelihood of the given data, using the expectation maximisation algorithm. The algorithm starts with k arbitrarily chosen means, priors and covariance matrices. In the ‘E’ step it calculates the membership probability of each vector in each cluster. In the ‘M’ step it updates the cluster parameters to their maximum likelihood estimates given those membership probabilities. The two steps repeat until the likelihood of the data no longer increases significantly, that is, until the change between iterations falls below conv_threshold.
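The E and M steps described above can be sketched in a few lines of plain Python. This is an illustrative one-dimensional version, not NLTK's implementation: the function names `gaussian_pdf`, `log_likelihood` and `em_1d` and the `max_iter` guard are hypothetical. It also assumes that `bias` is a small constant added to each variance so that it cannot collapse to zero.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, priors, means, variances):
    """Log-likelihood of the data under the current mixture."""
    return sum(
        math.log(sum(p * gaussian_pdf(x, m, v)
                     for p, m, v in zip(priors, means, variances)))
        for x in data
    )


def em_1d(data, initial_means, conv_threshold=1e-6, bias=0.1, max_iter=1000):
    """Fit a 1-D Gaussian mixture with EM (hypothetical helper, not NLTK code)."""
    k = len(initial_means)
    means = list(initial_means)
    priors = [1.0 / k] * k      # start from uniform priors
    variances = [1.0] * k       # start from unit variances
    last = log_likelihood(data, priors, means, variances)

    for _ in range(max_iter):
        # E step: membership probability of each point in each cluster.
        resp = []
        for x in data:
            weights = [p * gaussian_pdf(x, m, v)
                       for p, m, v in zip(priors, means, variances)]
            total = sum(weights)
            resp.append([w / total for w in weights])

        # M step: maximum likelihood update of each cluster's parameters.
        for j in range(k):
            mass = sum(r[j] for r in resp)
            if mass == 0.0:
                continue  # empty cluster: keep its previous parameters
            new_mean = sum(r[j] * x for r, x in zip(resp, data)) / mass
            spread = sum(r[j] * (x - new_mean) ** 2 for r, x in zip(resp, data)) / mass
            means[j] = new_mean
            variances[j] = spread + bias  # assumed role of bias: avoid zero variance
            priors[j] = mass / len(data)

        # Stop once the log-likelihood changes by less than the threshold.
        current = log_likelihood(data, priors, means, variances)
        if abs(current - last) < conv_threshold:
            break
        last = current

    return priors, means, variances


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
priors, means, variances = em_1d(data, initial_means=[0.0, 6.0])
```

With two well-separated groups like these, the fitted means should settle near the centre of each group and the priors should sum to one.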

Methods

__init__(initial_means[, priors, ...]) Creates an EM clusterer with the given starting parameters, convergence threshold and vector mangling parameters.
classification_probdist(vector) Classifies the token into a cluster, returning a probability distribution over the cluster identifiers.
classify(vector)
classify_vectorspace(vector)
cluster(vectors[, assign_clusters, trace])
cluster_name(index) Returns the name of the cluster at index.
cluster_names() Returns the names of the clusters.
cluster_vectorspace(vectors[, trace])
likelihood(vector, label)
likelihood_vectorspace(vector, cluster)
num_clusters()
unicode_repr()
vector(vector) Returns the vector after normalisation and dimensionality reduction.
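A minimal usage sketch of the methods listed above, assuming NLTK and NumPy are installed. The data points and starting means are made up for illustration, and the cluster labels returned may vary between versions.

```python
import numpy
from nltk.cluster import EMClusterer

# Illustrative data: two loose groups of 2-D points (made-up values).
vectors = [numpy.array(v) for v in [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.3],
]]

# One starting mean per cluster; priors and covariance matrices are left
# at their defaults (None), as in the constructor signature.
initial_means = [[0.0, 0.0], [5.0, 5.0]]
clusterer = EMClusterer(initial_means, bias=0.1)

# Fit the mixture and, with assign_clusters=True, get one label per vector.
assignments = clusterer.cluster(vectors, assign_clusters=True)

# Classify an unseen vector and inspect its membership probabilities.
new_point = numpy.array([4.8, 5.1])
label = clusterer.classify(new_point)
pdist = clusterer.classification_probdist(new_point)

print("clusters:", clusterer.num_clusters())
print("assignments:", assignments)
print("label for new point:", label)
for name in pdist.samples():
    print("P(cluster %s) = %.3f" % (name, pdist.prob(name)))
```

Passing `trace=True` to `cluster` prints progress during fitting, which can help when checking convergence.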