nltk.IBMModel5.__init__

IBMModel5.__init__(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)

Train on sentence_aligned_corpus and create a lexical translation model, vacancy models, a fertility model, and a model for generating NULL-aligned words.

Translation direction is from AlignedSent.mots to AlignedSent.words.
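The direction matters when preparing the inputs: the first argument of AlignedSent becomes words (the target side) and the second becomes mots (the source side), so source_word_classes must cover the vocabulary of mots and target_word_classes the vocabulary of words. The following is a minimal sketch using a toy German/English corpus. The sentences, the class ids, and the names pairs and train_ibm5 are illustrative assumptions, and NLTK is imported inside the function only so that the tables can be inspected without it.

```python
# Each pair is (target sentence, source sentence), i.e. (words, mots).
# Here German is the target side and English the source side.
pairs = [
    (['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']),
    (['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']),
    (['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']),
    (['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']),
    (['das', 'haus'], ['the', 'house']),
    (['das', 'buch'], ['the', 'book']),
    (['ein', 'buch'], ['a', 'book']),
    (['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']),
    (['fasse', 'zusammen'], ['summarize']),
]

# Word classes for the source side (AlignedSent.mots, English here).
src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2,
               'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5}

# Word classes for the target side (AlignedSent.words, German here).
trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2,
               'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5,
               'fasse': 6, 'zusammen': 6}


def train_ibm5(iterations=5):
    """Train an IBMModel5 on the toy corpus (hypothetical helper, requires NLTK)."""
    from nltk.translate import AlignedSent, IBMModel5

    # AlignedSent(words, mots): target sentence first, source sentence second.
    bitext = [AlignedSent(words, mots) for words, mots in pairs]
    return IBMModel5(bitext, iterations, src_classes, trg_classes)
```

Calling train_ibm5() returns the trained model. Keeping the class tables separate from the corpus makes it easy to check that every word on each side has a class before training starts.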

Parameters:
  • sentence_aligned_corpus (list(AlignedSent)) – Sentence-aligned parallel corpus
  • iterations (int) – Number of iterations to run the training algorithm
  • source_word_classes (dict[str]: int) – Lookup table that maps a source word to its word class, the latter represented by an integer id
  • target_word_classes (dict[str]: int) – Lookup table that maps a target word to its word class, the latter represented by an integer id
  • probability_tables (dict[str]: object) – Optional. Use this to pass in custom probability values. If not specified, probabilities will be set to a uniform distribution, or some other sensible value. If specified, all the following entries must be present: translation_table, alignment_table, fertility_table, p1, head_distortion_table, non_head_distortion_table, head_vacancy_table, non_head_vacancy_table. See IBMModel, IBMModel4, and IBMModel5 for the type and purpose of these tables.
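Because a custom probability_tables dictionary must contain every entry listed above, it can help to check it before constructing the model. The sketch below is plain Python and not part of NLTK; the names REQUIRED_TABLES and missing_tables are hypothetical, and the placeholder values stand in for real tables.

```python
# Entries that must all be present when probability_tables is supplied,
# in the order given in the parameter description.
REQUIRED_TABLES = (
    "translation_table",
    "alignment_table",
    "fertility_table",
    "p1",
    "head_distortion_table",
    "non_head_distortion_table",
    "head_vacancy_table",
    "non_head_vacancy_table",
)


def missing_tables(probability_tables):
    """Return the required entries absent from probability_tables (hypothetical helper)."""
    return [name for name in REQUIRED_TABLES if name not in probability_tables]


# A partial dictionary with placeholder values: six entries are still missing.
partial = {"translation_table": {}, "p1": 0.5}
print(missing_tables(partial))
```

An empty result means the dictionary has the expected keys. It says nothing about whether the values have the types that IBMModel, IBMModel4, and IBMModel5 expect.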