nltk.windowdiff()

nltk.windowdiff(seg1, seg2, k, boundary='1', weighted=False)

Compute the windowdiff score for a pair of segmentations. A segmentation is any sequence over a vocabulary of two items (e.g. “0”, “1”), where the specified boundary value marks the edge of a segment. Lower scores indicate closer agreement; two identical segmentations score 0.

>>> from nltk import windowdiff
>>> s1 = "000100000010"
>>> s2 = "000010000100"
>>> s3 = "100000010000"
>>> '%.2f' % windowdiff(s1, s1, 3)
'0.00'
>>> '%.2f' % windowdiff(s1, s2, 3)
'0.30'
>>> '%.2f' % windowdiff(s2, s3, 3)
'0.80'

Parameters:
  • seg1 (str or list) – a segmentation
  • seg2 (str or list) – a segmentation
  • k (int) – window width
  • boundary (str or int or bool) – boundary value
  • weighted (bool) – if True, use the weighted variant of windowdiff
Return type:
  float
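
To make the scores in the examples above easier to interpret, here is a minimal pure-Python sketch of the usual WindowDiff calculation: slide a window of width k over both segmentations, count the boundary markers inside each window, and average the per-window disagreement. The function name `windowdiff_sketch` and its error messages are hypothetical and written only for illustration; this is not NLTK's source, and the treatment of `weighted` (adding the size of the count difference instead of at most 1 per window) is an assumption about what the weighted variant means.

```python
def windowdiff_sketch(seg1, seg2, k, boundary="1", weighted=False):
    """Illustrative re-implementation (hypothetical helper, not NLTK's code)."""
    if len(seg1) != len(seg2):
        raise ValueError("segmentations must have the same length")
    if not 0 < k <= len(seg1):
        raise ValueError("k must be between 1 and the segmentation length")

    n_windows = len(seg1) - k + 1
    total = 0
    for i in range(n_windows):
        # Count boundary markers inside the window starting at position i.
        count1 = seg1[i:i + k].count(boundary)
        count2 = seg2[i:i + k].count(boundary)
        diff = abs(count1 - count2)
        # Unweighted: a disagreeing window costs 1.
        # Weighted (assumed meaning): it costs the size of the disagreement.
        total += diff if weighted else min(1, diff)
    return total / n_windows


s1 = "000100000010"
s2 = "000010000100"
s3 = "100000010000"
print("%.2f" % windowdiff_sketch(s1, s2, 3))  # 0.30
print("%.2f" % windowdiff_sketch(s2, s3, 3))  # 0.80

# List input with an integer boundary value.
a = [0, 1, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 0]
print("%.2f" % windowdiff_sketch(a, b, 3, boundary=1))                 # 0.60
print("%.2f" % windowdiff_sketch(a, b, 3, boundary=1, weighted=True))  # 1.00
```

In the last two calls, the first two windows each contain two boundaries in `a` and none in `b`. The unweighted score counts each of those windows once, while the weighted score counts the full difference of 2, which is why the two results differ.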