nltk.bleu()

nltk.bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None)

Calculate BLEU score (Bilingual Evaluation Understudy) from Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. "BLEU: a method for automatic evaluation of machine translation." In Proceedings of ACL. http://www.aclweb.org/anthology/P02-1040.pdf
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...                'ensures', 'that', 'the', 'military', 'always',
...                'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...                'forever', 'hearing', 'the', 'activity', 'guidebook',
...                'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will', 'forever',
...               'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']
>>> sentence_bleu([reference1, reference2, reference3], hypothesis1)
0.5045...
>>> sentence_bleu([reference1, reference2, reference3], hypothesis2)
0.3969...
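The scores above combine clipped (modified) n-gram precision with a brevity penalty. A minimal from-scratch sketch of that textbook definition (an illustration only, not NLTK's implementation, which additionally handles smoothing and corpus-level aggregation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25)):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty."""
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        # Clip each hypothesis n-gram count by its maximum count
        # in any single reference ("modified precision").
        max_ref = Counter()
        for ref in references:
            for gram, c in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], c)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if clipped == 0:
            # Any zero precision drives the geometric mean to zero.
            return 0.0
        log_sum += w * math.log(clipped / total)
    # Brevity penalty: penalise hypotheses shorter than the
    # closest-length reference (ties broken toward the shorter one).
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - hyp_len), rl))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_sum)
```

Note that under this strict definition a hypothesis with no higher-order n-gram overlap scores exactly 0.0; graded handling of such zero counts is precisely what the `smoothing_function` argument exists for.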
The default BLEU calculates a score for up to 4-grams using uniform weights. To evaluate your translations with higher- or lower-order n-grams, use customized weights, e.g. when accounting for up to 5-grams with uniform weights:
>>> weights = (0.1666, 0.1666, 0.1666, 0.1666, 0.1666)
>>> sentence_bleu([reference1, reference2, reference3], hypothesis1, weights)
0.45838627164939455
Parameters:
    - references (list(list(str))) – reference sentences
    - hypothesis (list(str)) – a hypothesis sentence
    - weights (list(float)) – weights for unigrams, bigrams, trigrams and so on
Returns:
    The sentence-level BLEU score.
Return type:
    float
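The `smoothing_function` argument in the signature accepts one of the smoothing methods exposed by NLTK's `SmoothingFunction` class. A short usage sketch, assuming NLTK 3.x, where `sentence_bleu` and `SmoothingFunction` live in `nltk.translate.bleu_score` (the toy sentences here are illustrative, not from the docstring above):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [['the', 'cat', 'sat', 'on', 'the', 'mat']]
hypothesis = ['the', 'cat', 'is', 'on', 'the', 'mat']

# Without smoothing, a missing higher-order n-gram overlap drives the
# geometric mean (and hence the score) to effectively zero, and NLTK
# emits a warning about the 0-count overlaps.
plain = sentence_bleu(reference, hypothesis)

# method1 adds a small epsilon to zero counts, so partly matching
# hypotheses still receive a graded, non-negligible score.
smoothed = sentence_bleu(reference, hypothesis,
                         smoothing_function=SmoothingFunction().method1)
```

Other methods on `SmoothingFunction` (e.g. `method2` through `method7`) implement the smoothing techniques from Chen and Cherry (2014).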