nltk.LazyZip

class nltk.LazyZip(*lists)[source]

A lazy sequence whose elements are tuples, each containing the i-th element from each of the argument sequences. The returned list is truncated in length to the length of the shortest argument sequence. The tuples are constructed lazily – i.e., when you read a value from the list, LazyZip will calculate that value by forming a tuple from the i-th element of each of the argument sequences.

LazyZip is essentially a lazy version of the Python primitive function zip. In particular, an evaluated LazyZip is equivalent to a zip:

>>> from nltk.util import LazyZip
>>> sequence1, sequence2 = [1, 2, 3], ['a', 'b', 'c']
>>> zip(sequence1, sequence2) 
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> list(LazyZip(sequence1, sequence2))
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> sequences = [sequence1, sequence2, [6,7,8,9]]
>>> list(zip(*sequences)) == list(LazyZip(*sequences))
True

Lazy zips can be useful for conserving memory in cases where the argument sequences are particularly long.

A typical example of a use case for this class is combining long sequences of gold standard and predicted values in a classification or tagging task in order to calculate accuracy. By constructing tuples lazily and avoiding the creation of an additional long sequence, memory usage can be significantly reduced.

Methods

__init__(*lists)
param lists:the underlying lists
count(value) Return the number of times this list contains value.
index(value[, start, stop]) Return the index of the first occurrence of value in this list that is greater than or equal to start and less than stop.
iterate_from(index)
unicode_repr() Return a string representation for this corpus view that is similar to a list’s representation; but if it would be more than 60 characters long, it is truncated.