BrillTagger.batch_tag_incremental(sequences, gold)

Tags by applying each rule to the entire corpus (rather than all rules to a single sequence). The point is to collect statistics on the test set for individual rules.

NOTE: This is inefficient: it does not build any index, so it traverses the entire corpus N times for N rules. Usually you do not care about statistics for individual rules and should use batch_tag() instead.

Parameters:
  • sequences (list of list of strings) – lists of token sequences (sentences, in some applications) to be tagged
  • gold (list of list of strings) – the gold standard

Returns: tuple of (tagged_sequences, ordered list of rule scores (one for each rule))
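
The rule-at-a-time traversal described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the names tag_incremental, count_errors, nn_to_vb_after_to and dt_to_nn are hypothetical, rules are modelled as plain functions that rewrite a tagged sequence in place, and a rule's score is assumed here to be the number of tagging errors it removes (negative if it adds errors).

```python
from typing import Callable, List, Tuple

TaggedSeq = List[Tuple[str, str]]      # [(token, tag), ...]
Rule = Callable[[TaggedSeq], None]     # hypothetical: rewrites tags in place


def count_errors(tagged: List[TaggedSeq], gold: List[TaggedSeq]) -> int:
    """Count tokens whose tag differs from the gold standard."""
    return sum(t[1] != g[1]
               for ts, gs in zip(tagged, gold)
               for t, g in zip(ts, gs))


def tag_incremental(tagged: List[TaggedSeq], gold: List[TaggedSeq],
                    rules: List[Rule]) -> Tuple[List[TaggedSeq], List[int]]:
    """Apply each rule to the whole corpus before moving on to the next rule."""
    errors = [count_errors(tagged, gold)]
    for rule in rules:                 # N rules -> N full passes over the corpus
        for seq in tagged:
            rule(seq)
        errors.append(count_errors(tagged, gold))
    # Assumed score definition: errors before the rule minus errors after it.
    scores = [before - after for before, after in zip(errors, errors[1:])]
    return tagged, scores


def nn_to_vb_after_to(seq: TaggedSeq) -> None:
    """Toy rule: retag NN as VB when the previous tag is TO."""
    for i in range(1, len(seq)):
        if seq[i][1] == "NN" and seq[i - 1][1] == "TO":
            seq[i] = (seq[i][0], "VB")


def dt_to_nn(seq: TaggedSeq) -> None:
    """Toy (deliberately harmful) rule: retag every DT as NN."""
    for i, (tok, tag) in enumerate(seq):
        if tag == "DT":
            seq[i] = (tok, "NN")


# Output of a hypothetical initial tagger, plus the gold standard.
initial = [[("to", "TO"), ("run", "NN")], [("the", "DT"), ("run", "NN")]]
gold = [[("to", "TO"), ("run", "VB")], [("the", "DT"), ("run", "NN")]]

tagged_out, scores = tag_incremental(initial, gold, [nn_to_vb_after_to, dt_to_nn])
print(scores)  # one score per rule, in rule order
```

Because every rule triggers a full pass over the corpus plus a full error recount, the cost grows linearly with the number of rules, which is the inefficiency the note warns about. In exchange, each rule's individual contribution on the test set becomes visible.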