nltk.pad_sequence()

nltk.pad_sequence(sequence, n, pad_left=False, pad_right=False, left_pad_symbol=None, right_pad_symbol=None)

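Returns a padded sequence of items before ngram extraction. On each side where padding is requested, n - 1 copies of that side's pad symbol are added, so with n=2 (as in the examples) each padded side receives exactly one symbol.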

>>> from nltk.util import pad_sequence
>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
['<s>', 1, 2, 3, 4, 5, '</s>']
>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, left_pad_symbol='<s>'))
['<s>', 1, 2, 3, 4, 5]
>>> list(pad_sequence([1,2,3,4,5], 2, pad_right=True, right_pad_symbol='</s>'))
[1, 2, 3, 4, 5, '</s>']
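The examples above all use n=2, where each padded side gets a single symbol. To make the general n - 1 rule concrete, here is a minimal standard-library sketch of the same padding behavior. `pad_sequence_sketch` is a hypothetical name used only for illustration; it is not part of NLTK and is not presented as NLTK's actual source.

```python
from itertools import chain
from typing import Any, Iterable, Iterator


def pad_sequence_sketch(
    sequence: Iterable[Any],
    n: int,
    pad_left: bool = False,
    pad_right: bool = False,
    left_pad_symbol: Any = None,
    right_pad_symbol: Any = None,
) -> Iterator[Any]:
    """Hypothetical re-implementation: pad with n - 1 symbols on each requested side."""
    items: Iterator[Any] = iter(sequence)
    if pad_left:
        # Prepend n - 1 copies so the first real item can end an n-gram.
        items = chain((left_pad_symbol,) * (n - 1), items)
    if pad_right:
        # Append n - 1 copies so the last real item can start an n-gram.
        items = chain(items, (right_pad_symbol,) * (n - 1))
    return items


# With n=3, each padded side receives two symbols.
print(list(pad_sequence_sketch([1, 2, 3], 3, pad_left=True, pad_right=True,
                               left_pad_symbol="<s>", right_pad_symbol="</s>")))
# → ['<s>', '<s>', 1, 2, 3, '</s>', '</s>']
```

Because the result is built from chained iterators, nothing is copied up front, which is why the examples wrap the call in list() to see the padded items.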

Parameters:
  • sequence (sequence or iter) – the source data to be padded
  • n (int) – the degree of the ngrams
  • pad_left (bool) – whether the ngrams should be left-padded
  • pad_right (bool) – whether the ngrams should be right-padded
  • left_pad_symbol (any) – the symbol to use for left padding (default is None)
  • right_pad_symbol (any) – the symbol to use for right padding (default is None)
Return type:
  iter (a lazy iterator over the padded items; wrap it in list() to materialize it, as the examples do)
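Padding exists so that ngram extraction also produces ngrams that touch the sequence boundaries. A minimal sketch of that pipeline, assuming padding on both sides with '<s>' and '</s>': `pad_both` and `sliding_ngrams` are hypothetical helpers written with the standard library only. With NLTK installed, `pad_sequence(tokens, n, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>')` should be usable in place of `pad_both`.

```python
from itertools import chain, tee
from typing import Any, Iterable, Iterator, Tuple


def pad_both(items: Iterable[Any], n: int) -> Iterator[Any]:
    # Hypothetical helper: n - 1 start symbols, the items, then n - 1 end symbols.
    return chain(("<s>",) * (n - 1), items, ("</s>",) * (n - 1))


def sliding_ngrams(items: Iterable[Any], n: int) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical helper: yield every contiguous window of length n."""
    # Make n independent copies of the stream and advance copy i by i steps.
    copies = tee(items, n)
    for offset, it in enumerate(copies):
        for _ in range(offset):
            next(it, None)
    # zip stops at the shortest copy, so only full windows are produced.
    return zip(*copies)


tokens = ["the", "cat", "sat"]
print(list(sliding_ngrams(pad_both(tokens, 2), 2)))
# → [('<s>', 'the'), ('the', 'cat'), ('cat', 'sat'), ('sat', '</s>')]
```

With padding on both sides, a sequence of length L yields L + n - 1 ngrams instead of L - n + 1, so the first and last tokens each appear in n ngrams.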