nltk.translate.ibm5.IBMModel5

class nltk.translate.ibm5.IBMModel5(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)

Translation model that keeps track of vacant positions in the target sentence to decide where to place translated words.

>>> from nltk.translate import AlignedSent, IBMModel5
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5 }
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6 }
>>> ibm5 = IBMModel5(bitext, 5, src_classes, trg_classes)
>>> print(round(ibm5.head_vacancy_table[1][1][1], 3))
1.0
>>> print(round(ibm5.head_vacancy_table[2][1][1], 3))
0.0
>>> print(round(ibm5.non_head_vacancy_table[3][3][6], 3))
1.0
>>> print(round(ibm5.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm5.fertility_table[1]['book'], 3))
1.0
>>> print(ibm5.p1)  # doctest: +ELLIPSIS
0.033...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
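
Beyond the vacancy and fertility tables, the trained model exposes the lexical translation table inherited from IBMModel. A quick sanity check on the toy corpus above (not part of the original docstring; it assumes the translation_table[target_word][source_word] layout used by the other IBM model classes):

>>> t = ibm5.translation_table['buch']['book']
>>> 0.0 < t <= 1.0
True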

Methods

__init__(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes[, probability_tables]) Train on sentence_aligned_corpus and create a lexical translation model, vacancy models, a fertility model, and a model for generating NULL-aligned words.
best_model2_alignment(sentence_pair[, j_pegged, i_pegged]) Finds the best alignment according to IBM Model 2.
hillclimb(alignment_info[, j_pegged]) Starting from the alignment in alignment_info, look at neighboring alignments iteratively for the best one.
init_vocab(sentence_aligned_corpus)
maximize_fertility_probabilities(counts)
maximize_lexical_translation_probabilities(counts)
maximize_null_generation_probabilities(counts)
maximize_vacancy_probabilities(counts)
neighboring(alignment_info[, j_pegged]) Determine the neighbors of alignment_info, obtained by moving or swapping one alignment point.
prob_of_alignments(alignments)
prob_t_a_given_s(alignment_info) Probability of target sentence and an alignment given the source sentence.
prune(alignment_infos) Removes alignments from alignment_infos that have substantially lower Model 4 scores than the best alignment.
reset_probabilities()
sample(sentence_pair) Sample the most probable alignments from the entire alignment space according to IBM Model 4.
set_uniform_probabilities(sentence_aligned_corpus) Set vacancy probabilities uniformly to 1 / cardinality of vacancy difference values.
train(parallel_corpus)
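
As a rough sketch of how the search methods above fit together, one can sample alignments for a sentence pair and score the best candidate. This example is illustrative and assumes sample returns a set of AlignmentInfo objects together with the single best one, as in the parent IBMModel class:

>>> sampled_alignments, best_alignment = ibm5.sample(bitext[2])
>>> score = ibm5.prob_t_a_given_s(best_alignment)
>>> score > 0.0
True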

Attributes

MIN_PROB Minimum probability value, imposed to avoid division by zero and precision errors
MIN_SCORE_FACTOR Alignments with scores below this factor are pruned during sampling
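
The role of MIN_SCORE_FACTOR in prune can be sketched as follows. This is a simplified, hypothetical re-implementation for illustration only, using 0.2 as the factor; the actual method computes the Model 4 score of each candidate itself rather than taking precomputed scores:

>>> def prune_sketch(scored_alignments, min_score_factor=0.2):
...     # Keep only candidates scoring above a fraction of the best score
...     best_score = max(score for _, score in scored_alignments)
...     threshold = min_score_factor * best_score
...     return {a for a, score in scored_alignments if score > threshold}
>>> sorted(prune_sketch([('a1', 0.9), ('a2', 0.5), ('a3', 0.1)]))
['a1', 'a2']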