nltk.StackDecoder

class nltk.StackDecoder(phrase_table, language_model)

Phrase-based stack decoder for machine translation

>>> from math import log
>>> from nltk.translate import PhraseTable, StackDecoder
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), log(0.2))
>>> phrase_table.add(('erwartet',), ('expects',), log(0.8))
>>> phrase_table.add(('erwartet',), ('expecting',), log(0.2))
>>> phrase_table.add(('niemand', 'erwartet'), ('one', 'does', 'not', 'expect'), log(0.1))
>>> phrase_table.add(('die', 'spanische', 'inquisition'), ('the', 'spanish', 'inquisition'), log(0.8))
>>> phrase_table.add(('!',), ('!',), log(0.8))
>>> #  nltk.model should be used here once it is implemented
>>> from collections import defaultdict
>>> language_prob = defaultdict(lambda: -999.0)
>>> language_prob[('nobody',)] = log(0.5)
>>> language_prob[('expects',)] = log(0.4)
>>> language_prob[('the', 'spanish', 'inquisition')] = log(0.2)
>>> language_prob[('!',)] = log(0.1)
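>>> class StubLanguageModel:
...     # Stand-in language model that looks up fixed log probabilities
...     def probability_change(self, context, phrase):
...         return language_prob[phrase]
...     def probability(self, phrase):
...         return language_prob[phrase]
>>> language_model = StubLanguageModel()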
>>> stack_decoder = StackDecoder(phrase_table, language_model)
>>> stack_decoder.translate(['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!'])
['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']
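
To see why the example above yields 'nobody' rather than 'no one', the following minimal sketch adds the phrase-table and language-model log probabilities for the two candidates. It is a simplification and not the decoder's actual scoring: distortion and future-score terms are left out, and the helper name `simple_score` is hypothetical.

```python
from collections import defaultdict
from math import log

# Log probabilities copied from the example above.
translation_logprob = {
    (('niemand',), ('nobody',)): log(0.8),
    (('niemand',), ('no', 'one')): log(0.2),
}
# Unseen phrases fall back to a very low log probability.
language_prob = defaultdict(lambda: -999.0)
language_prob[('nobody',)] = log(0.5)


def simple_score(src_phrase, trg_phrase):
    """Hypothetical helper: translation log prob plus language model log prob."""
    return translation_logprob[(src_phrase, trg_phrase)] + language_prob[trg_phrase]


score_nobody = simple_score(('niemand',), ('nobody',))
score_no_one = simple_score(('niemand',), ('no', 'one'))
print(score_nobody > score_no_one)
```

Because ('no', 'one') has no entry in the stub language model, it receives the fallback value of -999.0 and loses to ('nobody',) by a wide margin.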

Methods

__init__(phrase_table, language_model)
    param phrase_table: Table of translations for source language
compute_future_scores(src_sentence)
    Determines the approximate scores for translating every ...
distortion_score(hypothesis, ...)
expansion_score(hypothesis, ...)
    Calculate the score of expanding hypothesis with ...
find_all_src_phrases(src_sentence)
    Finds all subsequences in src_sentence that have a phrase ...
future_score(hypothesis, future_score_table, ...)
    Determines the approximate score for translating the ...
translate(src_sentence)
    param src_sentence: Sentence to be translated
valid_phrases(all_phrases_from, hypothesis)
    Extract phrases from all_phrases_from that contains words ...
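
As an illustration of what finding the subsequences of a source sentence that have a phrase translation can look like, here is a minimal sketch. It is not the library's implementation: a plain set of source phrases stands in for the phrase table, and the function name `find_translatable_spans` and its return format are assumptions made for this example.

```python
def find_translatable_spans(src_sentence, src_phrases):
    """Hypothetical sketch: for each start index, collect the exclusive end
    indices of contiguous subsequences that appear in src_phrases."""
    spans_from = [[] for _ in src_sentence]
    for start in range(len(src_sentence)):
        for end in range(start + 1, len(src_sentence) + 1):
            if tuple(src_sentence[start:end]) in src_phrases:
                spans_from[start].append(end)
    return spans_from


# Source phrases taken from the phrase table in the class example.
src_phrases = {
    ('niemand',),
    ('erwartet',),
    ('niemand', 'erwartet'),
    ('die', 'spanische', 'inquisition'),
    ('!',),
}
sentence = ['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!']
spans = find_translatable_spans(sentence, src_phrases)
print(spans)  # [[1, 2], [2], [5], [], [], [6]]
```

Positions 3 and 4 have empty lists because no phrase in the table starts at 'spanische' or 'inquisition'; those words are only covered by the three-word phrase starting at position 2.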

Attributes

distortion_factor
    float: Amount of reordering of source phrases.
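
One common way a reordering factor enters a phrase-based decoder is as a log-domain penalty proportional to how far the next source phrase starts from the end of the previous one. The sketch below illustrates that idea only; the function name `distortion_penalty`, its parameters, and the value 0.5 are assumptions for this example, not confirmed details of this class.

```python
from math import log


def distortion_penalty(prev_span_end, next_span_start, distortion_factor=0.5):
    """Hypothetical sketch: penalty grows linearly with the jump distance.

    distortion_factor is assumed to lie in (0, 1]; smaller values punish
    reordering more heavily, and 1.0 makes reordering free.
    """
    distance = abs(next_span_start - prev_span_end)
    return distance * log(distortion_factor)


# Translating the next phrase right after the previous one costs nothing.
monotone = distortion_penalty(prev_span_end=2, next_span_start=2)
# Skipping three source words ahead incurs a negative log-domain penalty.
jump = distortion_penalty(prev_span_end=2, next_span_start=5)
print(monotone, jump)
```

Under this formulation, a lower factor pushes the search toward monotone translations, which suits language pairs with similar word order.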