nltk.StackDecoder
class nltk.StackDecoder(phrase_table, language_model)
Phrase-based stack decoder for machine translation.
>>> from math import log
>>> from nltk.translate import PhraseTable, StackDecoder
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), log(0.2))
>>> phrase_table.add(('erwartet',), ('expects',), log(0.8))
>>> phrase_table.add(('erwartet',), ('expecting',), log(0.2))
>>> phrase_table.add(('niemand', 'erwartet'), ('one', 'does', 'not', 'expect'), log(0.1))
>>> phrase_table.add(('die', 'spanische', 'inquisition'), ('the', 'spanish', 'inquisition'), log(0.8))
>>> phrase_table.add(('!',), ('!',), log(0.8))

>>> # nltk.model should be used here once it is implemented
>>> from collections import defaultdict
>>> # Phrases missing from the table get a very low log probability
>>> language_prob = defaultdict(lambda: -999.0)
>>> language_prob[('nobody',)] = log(0.5)
>>> language_prob[('expects',)] = log(0.4)
>>> language_prob[('the', 'spanish', 'inquisition')] = log(0.2)
>>> language_prob[('!',)] = log(0.1)
>>> # Anonymous stand-in object exposing probability_change() and probability()
>>> language_model = type('', (object,), {
...     'probability_change': lambda self, context, phrase: language_prob[phrase],
...     'probability': lambda self, phrase: language_prob[phrase],
... })()

>>> stack_decoder = StackDecoder(phrase_table, language_model)

>>> stack_decoder.translate(['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!'])
['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']
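The example above builds its language model as an anonymous object with two methods, probability_change(context, phrase) and probability(phrase). The same stand-in may be easier to read as a named class. The following is a minimal sketch under that assumption; StubLanguageModel and its default parameter are hypothetical names introduced here for illustration and are not part of NLTK.

```python
from collections import defaultdict
from math import log


class StubLanguageModel:
    """Hypothetical stand-in with the two methods used in the example above."""

    def __init__(self, phrase_log_probs, default=-999.0):
        # Phrases not listed fall back to a very low log probability.
        self._log_probs = defaultdict(lambda: default, phrase_log_probs)

    def probability(self, phrase):
        # Log probability of the phrase on its own.
        return self._log_probs[tuple(phrase)]

    def probability_change(self, context, phrase):
        # The context is ignored, mirroring the lambda-based stand-in.
        return self._log_probs[tuple(phrase)]


language_model = StubLanguageModel({
    ('nobody',): log(0.5),
    ('expects',): log(0.4),
    ('the', 'spanish', 'inquisition'): log(0.2),
    ('!',): log(0.1),
})
```

An instance built this way exposes the same two methods as the anonymous object, so it could be passed as the language_model argument in its place.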
Methods
__init__(phrase_table, language_model) |
compute_future_scores(src_sentence) | Determines the approximate scores for translating every ...
distortion_score(hypothesis, ...) |
expansion_score(hypothesis, ...) | Calculate the score of expanding hypothesis with ...
find_all_src_phrases(src_sentence) | Finds all subsequences in src_sentence that have a phrase ...
future_score(hypothesis, future_score_table, ...) | Determines the approximate score for translating the ...
translate(src_sentence) |
valid_phrases(all_phrases_from, hypothesis) | Extract phrases from all_phrases_from that contain words ...
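The summary for find_all_src_phrases describes locating every subsequence of the source sentence that has an entry in the phrase table. The idea can be sketched without NLTK as follows. This is an illustration only: find_source_phrases, the plain set standing in for the phrase table, and the return layout (a list of end indices per start index) are assumptions made here, not the library's implementation.

```python
def find_source_phrases(src_sentence, known_phrases):
    """For each start index, list the end indices of spans found in known_phrases."""
    spans_from = [[] for _ in src_sentence]
    for start in range(len(src_sentence)):
        for end in range(start + 1, len(src_sentence) + 1):
            # A span is kept only if its words form a known source phrase.
            if tuple(src_sentence[start:end]) in known_phrases:
                spans_from[start].append(end)
    return spans_from


# Source phrases taken from the phrase table in the example above.
known = {
    ('niemand',),
    ('erwartet',),
    ('niemand', 'erwartet'),
    ('die', 'spanische', 'inquisition'),
    ('!',),
}
sentence = ['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!']
spans = find_source_phrases(sentence, known)
print(spans)  # [[1, 2], [2], [5], [], [], [6]]
```

Positions 3 and 4 come back empty because 'spanische' and 'inquisition' are only covered as part of the three-word phrase starting at position 2.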
Attributes
distortion_factor | float: Amount of reordering of source phrases.
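The attribute summary is brief. In phrase-based decoding generally, a distortion factor is often used as the base of a penalty that grows with the distance between consecutively translated source phrases. The sketch below illustrates that general idea only; distortion_log_penalty and its parameters are hypothetical, and the formula is an assumption rather than a statement of how distortion_score is computed in NLTK.

```python
from math import log


def distortion_log_penalty(prev_span_end, next_span_start, distortion_factor):
    """Log-domain penalty for jumping from one source span to the next.

    Assumes a factor in (0, 1]; a factor of 1.0 gives no penalty at all.
    """
    if not 0.0 < distortion_factor <= 1.0:
        raise ValueError("distortion_factor must be in (0, 1]")
    # Distance between where the previous phrase ended and the next one starts.
    distance = abs(next_span_start - prev_span_end)
    return log(distortion_factor) * distance


monotone = distortion_log_penalty(2, 2, 0.5)   # next phrase starts right after the previous one
jump = distortion_log_penalty(2, 5, 0.5)       # three source words are skipped
no_penalty = distortion_log_penalty(2, 5, 1.0)  # factor 1.0 allows free reordering
```

Under this formulation, a smaller factor discourages reordering more strongly, while a factor of 1.0 leaves every ordering equally scored.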