Does the IOB tagging method for Named Entity Recognition (NER) have any advantage in terms of model accuracy or computational time?
Can we do NER without IOB tags, using only the entity types as labels? I am specifically working on token classification for visual documents such as receipts. For example, the Hugging Face tutorial for LayoutLM on the CORD dataset for receipt information extraction does not use the IOB scheme.
I have trained the LayoutLMv2 model without IOB tagging and it trains well. But will doing it with IOB tags make any difference?
Solution 1:[1]
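Imagine that your text is "... dark blue light green ...", where "dark blue" and "light green" are two different colours. If you want the model to capture this distinction, you should use IOB tags and check that the result is I-Color I-Color B-Color I-Color. That is the original IOB scheme, where B- only marks an entity that directly follows another entity of the same type; in the more common IOB2 (BIO) variant the same text is tagged B-Color I-Color B-Color I-Color. If you only care that the model classifies these words as colours, no IOB tagging is needed.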
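To make the difference concrete, here is a minimal sketch that decodes entity spans from both kinds of label sequence. It assumes the IOB2 variant (every entity starts with B-) and an "O" label for non-entity tokens; the function names `spans_from_plain` and `spans_from_iob2` are made up for this illustration and do not come from any library.

```python
def spans_from_plain(labels):
    """Group consecutive tokens with the same non-"O" label into one span.

    Returns (label, start, end) tuples with an exclusive end index.
    """
    spans = []
    start = None
    for i, label in enumerate(labels):
        # A change of label closes the span that is currently open.
        if start is not None and label != labels[start]:
            spans.append((labels[start], start, i))
            start = None
        if start is None and label != "O":
            start = i
    if start is not None:
        spans.append((labels[start], start, len(labels)))
    return spans


def spans_from_iob2(tags):
    """Decode spans from IOB2 tags such as "B-Color" / "I-Color" / "O"."""
    spans = []
    start, kind = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # Close the open span on "O", on any B-, or on an I- of another type.
        if start is not None and (prefix != "I" or label != kind):
            spans.append((kind, start, i))
            start, kind = None, None
        # A stray I- with no open span is treated leniently as a span start.
        if prefix == "B" or (prefix == "I" and start is None):
            start, kind = i, label
    if start is not None:
        spans.append((kind, start, len(tags)))
    return spans


tokens = ["dark", "blue", "light", "green"]
plain = ["Color", "Color", "Color", "Color"]
iob2 = ["B-Color", "I-Color", "B-Color", "I-Color"]

print(spans_from_plain(plain))  # one merged span covering all four tokens
print(spans_from_iob2(iob2))    # two spans: "dark blue" and "light green"
```

With plain labels the two colours are indistinguishable from a single four-token entity, while the B- prefix lets the decoder recover the boundary. If adjacent entities of the same type never occur in your documents, the two schemes carry the same information.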
This makes it clear that the choice of tagging scheme affects the performance metrics: predicting the correct tag-class pair (for example B-Color rather than I-Color) is harder than predicting just the correct class. In terms of computational time, I believe the impact is minor; it comes only from the larger number of output classes once the IOB prefixes are included.
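As a rough illustration of why the extra cost is small, the sketch below counts the output labels with and without IOB prefixes and the size of a single linear classification head on top of the encoder. The entity names, the "O" class, and the hidden size of 768 are assumptions chosen for the example, not values taken from the CORD dataset or from a specific LayoutLM checkpoint.

```python
def plain_labels(entity_types):
    """Label set without IOB prefixes: one class per entity type plus "O"."""
    return ["O"] + list(entity_types)


def iob2_labels(entity_types):
    """Label set with IOB2 prefixes: B- and I- per entity type plus "O"."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels


def head_parameters(num_labels, hidden_size):
    """Weights plus biases of one linear layer mapping hidden states to labels."""
    return hidden_size * num_labels + num_labels


# Hypothetical receipt fields, used only for illustration.
entity_types = ["TOTAL", "DATE", "ITEM_NAME"]
hidden_size = 768  # assumed encoder width

for name, labels in [("plain", plain_labels(entity_types)),
                     ("IOB2", iob2_labels(entity_types))]:
    print(name, len(labels), head_parameters(len(labels), hidden_size))
```

Going from N + 1 to 2N + 1 labels only widens the final layer, which is typically tiny compared with the encoder itself, so training and inference time should barely change.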
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Leo |