nltk.IBMModel1

class nltk.IBMModel1(sentence_aligned_corpus, iterations, probability_tables=None)

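Lexical translation model that ignores word order (IBM Model 1). Lexical probabilities are stored in translation_table, indexed as translation_table[target_word][source_word], with None standing for the NULL source word.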
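
For reference, the textbook form of IBM Model 1 makes "ignores word order" precise. Write the source sentence as s_1 ... s_l (the mots, with s_0 the NULL word, None in NLTK), the target sentence as t_1 ... t_m (the words), a_j for the source position aligned to t_j, τ(t | s) for the lexical probability held in translation_table[t][s], and ε for a constant length probability. Every alignment receives the same prior weight 1/(l+1)^m regardless of word positions, so only lexical choices matter and the sum over alignments factorises per target word. This is the standard formulation; NLTK's implementation may drop the constant factors.

```latex
P(\mathbf{t}, \mathbf{a} \mid \mathbf{s})
  = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \tau(t_j \mid s_{a_j}),
  \qquad a_j \in \{0, 1, \dots, l\}

P(\mathbf{t} \mid \mathbf{s})
  = \sum_{\mathbf{a}} P(\mathbf{t}, \mathbf{a} \mid \mathbf{s})
  = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} \tau(t_j \mid s_i)
```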

>>> from nltk.translate import AlignedSent, IBMModel1
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> ibm1 = IBMModel1(bitext, 5)
>>> print(ibm1.translation_table['buch']['book'])
0.889...
>>> print(ibm1.translation_table['das']['book'])
0.061...
>>> print(ibm1.translation_table['buch'][None])
0.113...
>>> print(ibm1.translation_table['ja'][None])
0.072...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])
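
To make the training procedure concrete, here is a minimal pure-Python sketch of the expectation-maximization (EM) loop that Model 1 training is based on, run on the same toy corpus. It is an illustration, not NLTK's code: the function name train_ibm1 and the tuple-keyed dictionary are assumptions made for this example. As in NLTK, translation runs from mots (source) to words (target), and None stands for the NULL source word.

```python
from collections import defaultdict


def train_ibm1(bitext, iterations):
    """Illustrative IBM Model 1 EM trainer (a sketch, not NLTK's implementation).

    bitext is a list of (target_words, source_words) pairs, mirroring
    AlignedSent.words and AlignedSent.mots. Returns a dict mapping
    (target_word, source_word) to tau(target | source); the key None
    stands for the NULL source word.
    """
    trg_vocab = {t for trg, _ in bitext for t in trg}
    uniform = 1.0 / len(trg_vocab)
    # Every (target, source) pair starts with the same probability.
    table = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of t aligned to s
        total = defaultdict(float)  # expected count of s aligned to anything
        # E-step: share each target word among all source words plus NULL,
        # in proportion to the current translation probabilities.
        for trg, src in bitext:
            src_with_null = [None] + list(src)
            for t in trg:
                norm = sum(table[(t, s)] for s in src_with_null)
                for s in src_with_null:
                    c = table[(t, s)] / norm
                    count[(t, s)] += c
                    total[s] += c
        # M-step: re-estimate tau(t | s) by normalising per source word.
        for (t, s), c in count.items():
            table[(t, s)] = c / total[s]
    return dict(table)


bitext = [
    (['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']),
    (['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']),
    (['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']),
    (['das', 'haus'], ['the', 'house']),
    (['das', 'buch'], ['the', 'book']),
    (['ein', 'buch'], ['a', 'book']),
]
table = train_ibm1(bitext, 5)
print(round(table[('buch', 'book')], 3))
print(round(table[('ja', None)], 3))
```

With the same corpus and iteration count, the printed values can be compared against the translation_table entries in the doctest. Keying the table by (target, source) pairs keeps the sketch short; NLTK instead uses nested dictionaries.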

Methods

__init__(sentence_aligned_corpus, iterations[, probability_tables]) Train on sentence_aligned_corpus and create a lexical translation model. Translation direction is from AlignedSent.mots to AlignedSent.words.
best_model2_alignment(sentence_pair[, ...]) Finds the best alignment according to IBM Model 2.
hillclimb(alignment_info[, j_pegged]) Starting from the alignment in alignment_info, look at neighboring alignments iteratively for the best one.
init_vocab(sentence_aligned_corpus)
maximize_fertility_probabilities(counts)
maximize_lexical_translation_probabilities(counts)
maximize_null_generation_probabilities(counts)
neighboring(alignment_info[, j_pegged]) Determine the neighbors of alignment_info, obtained by moving or swapping one alignment point.
prob_alignment_point(s, t) Probability that word t in the target sentence is aligned to word s in the source sentence.
prob_all_alignments(src_sentence, trg_sentence) Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t.
prob_of_alignments(alignments)
prob_t_a_given_s(alignment_info) Probability of target sentence and an alignment given the source sentence.
reset_probabilities()
sample(sentence_pair) Sample the most probable alignments from the entire alignment space.
set_uniform_probabilities(...)
train(parallel_corpus)
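
The Model 1 methods above (prob_alignment_point, prob_t_a_given_s, prob_all_alignments) rest on one property: the probability of an alignment is a product of independent per-target-word factors. The sketch below illustrates this with a made-up translation table. The helpers point_prob, joint_prob, total_prob_bruteforce, total_prob_factored and best_alignment are hypothetical standalone functions, not NLTK's signatures; the 1e-12 floor only imitates the role of MIN_PROB, and constant factors are omitted.

```python
import itertools
import math

# Hypothetical translation probabilities tau(target | source), keyed
# (target_word, source_word); None is the NULL source word. Made-up values.
TABLE = {
    ('das', 'the'): 0.8, ('das', 'house'): 0.1, ('das', None): 0.1,
    ('haus', 'the'): 0.1, ('haus', 'house'): 0.8, ('haus', None): 0.1,
}
FLOOR = 1e-12  # stand-in for a minimum probability such as MIN_PROB


def point_prob(table, s, t):
    """tau(t | s), floored so unseen pairs never give exactly zero."""
    return max(table.get((t, s), 0.0), FLOOR)


def joint_prob(table, trg, src, alignment):
    """Product of tau(t_j | s_{a_j}) for one alignment, constant factors omitted.

    alignment[j] = i links target word j to position i of [None] + src,
    so i == 0 means the NULL word.
    """
    src_with_null = [None] + src
    return math.prod(point_prob(table, src_with_null[i], t)
                     for t, i in zip(trg, alignment))


def total_prob_bruteforce(table, trg, src):
    """Sum joint_prob over every one of the (l + 1) ** m alignments."""
    return sum(joint_prob(table, trg, src, a)
               for a in itertools.product(range(len(src) + 1), repeat=len(trg)))


def total_prob_factored(table, trg, src):
    """Same quantity via the Model 1 factorisation: product over t_j of a sum over s_i."""
    src_with_null = [None] + src
    return math.prod(sum(point_prob(table, s, t) for s in src_with_null)
                     for t in trg)


def best_alignment(table, trg, src):
    """Most probable alignment: each target word independently picks its best source position."""
    src_with_null = [None] + src
    return [max(range(len(src_with_null)),
                key=lambda i: point_prob(table, src_with_null[i], t))
            for t in trg]


trg, src = ['das', 'haus'], ['the', 'house']
print(best_alignment(TABLE, trg, src))  # → [1, 2]
print(total_prob_bruteforce(TABLE, trg, src), total_prob_factored(TABLE, trg, src))
```

Because the factors are independent, the exponential sum over alignments collapses to m sums of l + 1 terms, and the best alignment can be found one target word at a time; this no longer holds for the higher IBM models, which is why methods such as hillclimb and sample exist.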

Attributes

MIN_PROB Lower bound applied to estimated probabilities so that no value becomes exactly zero.