gensim.models.Word2Vec.load_word2vec_format¶
-
classmethod
Word2Vec.
load_word2vec_format
(fname, fvocab=None, binary=False, encoding='utf8', unicode_errors='strict', limit=None, datatype=<type 'numpy.float32'>)[source]¶ Load the input-hidden weight matrix from the original C word2vec-tool format.
Note that the information stored in the file is incomplete (the binary tree is missing), so while you can query for word similarity etc., you cannot continue training with a model loaded this way.
binary is a boolean indicating whether the data is in binary word2vec format. norm_only is a boolean indicating whether to only store normalised word2vec vectors in memory. Word counts are read from fvocab filename, if set (this is the file generated by -save-vocab flag of the original C tool).
If you trained the C model using non-utf8 encoding for words, specify that encoding in encoding.
unicode_errors, default ‘strict’, is a string suitable to be passed as the errors argument to the unicode() (Python 2.x) or str() (Python 3.x) function. If your source file may include word tokens truncated in the middle of a multibyte unicode character (as is common from the original word2vec.c tool), ‘ignore’ or ‘replace’ may help.
limit sets a maximum number of word-vectors to read from the file. The default, None, means read all.
datatype (experimental) can coerce dimensions to a non-default float type (such as np.float16) to save memory. (Such types may result in much slower bulk operations or incompatibility with optimized routines.)