nltk.IBMModel5.sample

IBMModel5.sample(sentence_pair)

Sample the most probable alignments from the entire alignment space according to Model 4.

Note that Model 4 scoring is used instead of Model 5 because the latter is too expensive to compute.

First, determine the best alignment according to IBM Model 2. With this initial alignment, use hill climbing to determine the best alignment according to IBM Model 4. Add this alignment and its neighbors to the sample set. Repeat this process with other initial alignments obtained by pegging an alignment point, that is, by holding one link fixed while the rest of the alignment is optimized. Finally, prune alignments that have substantially lower Model 4 scores than the best alignment.
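
The procedure described above can be illustrated with a small, self-contained sketch. This is not NLTK's implementation: the names `neighbors`, `hill_climb`, `sample`, `diagonal_score`, and the `min_factor` parameter are hypothetical, the scoring function is a toy stand-in for a Model 4 probability, and the starting alignment is supplied by the caller rather than computed with Model 2. An alignment is assumed to be a tuple in which entry `j` holds the source position linked to target position `j`, with 0 meaning NULL.

```python
from itertools import combinations


def neighbors(alignment, num_source, pegged=None):
    """Alignments that differ from `alignment` by one move or one swap.

    The target position `pegged`, if given, is never changed.
    """
    result = set()
    positions = [j for j in range(len(alignment)) if j != pegged]
    # Moves: relink one target position to a different source position.
    for j in positions:
        for i in range(num_source + 1):
            if i != alignment[j]:
                result.add(alignment[:j] + (i,) + alignment[j + 1:])
    # Swaps: exchange the links of two target positions.
    for j1, j2 in combinations(positions, 2):
        if alignment[j1] != alignment[j2]:
            swapped = list(alignment)
            swapped[j1], swapped[j2] = swapped[j2], swapped[j1]
            result.add(tuple(swapped))
    return result


def hill_climb(start, num_source, score, pegged=None):
    """Move to the best-scoring neighbor until no neighbor improves."""
    current = start
    while True:
        candidates = neighbors(current, num_source, pegged)
        best = max(candidates, key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best


def sample(initial, num_source, score, min_factor):
    """Collect local optima and their neighbors, then prune weak ones."""
    sampled = set()

    def climb_and_collect(start, pegged=None):
        best = hill_climb(start, num_source, score, pegged)
        sampled.add(best)
        sampled.update(neighbors(best, num_source, pegged))

    climb_and_collect(initial)
    # Restart from alignments in which one point is pegged (held fixed).
    for j in range(len(initial)):
        for i in range(num_source + 1):
            start = initial[:j] + (i,) + initial[j + 1:]
            climb_and_collect(start, pegged=j)

    best = max(sampled, key=score)
    threshold = min_factor * score(best)
    return {a for a in sampled if score(a) >= threshold}, best


def diagonal_score(alignment):
    """Toy score (an assumption): favors linking target j to source j + 1."""
    score = 1.0
    for j, i in enumerate(alignment):
        score *= 0.9 if i == j + 1 else 0.1
    return score


alignments, best = sample((0, 0, 0), num_source=3,
                          score=diagonal_score, min_factor=0.1)
print(best)  # (1, 2, 3)
```

As in the method documented here, the sketch returns both the pruned set and the single best alignment. Pegging matters because hill climbing alone only finds a local optimum; restarting with one link held fixed forces the search into regions it would otherwise leave immediately.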

Parameters: sentence_pair (AlignedSent) – Source and target language sentence pair to generate a sample of alignments from
Returns: A set of best alignments represented by their AlignmentInfo and the best alignment of the set for convenience
Return type: set(AlignmentInfo), AlignmentInfo