How to overwrite a tag for a named entity with CoreNLP's RegexNER without specifying the original tag
I know that CoreNLP's RegexNER allows me to overwrite a tag using the mapping file. For example, CoreNLP recognizes the word EGFR as an ORGANIZATION. If I have the following line in my mapping file, it still tags EGFR as an ORGANIZATION:
EGFR GENE
If I change that line to look like the following:
EGFR GENE ORGANIZATION
Then CoreNLP tags it as a GENE.
To do this, though, I have to know that CoreNLP tags EGFR as an ORGANIZATION, and I can't know the original tag for every word in my mapping file. Is there a way to tell RegexNER to overwrite the tag for EGFR no matter what the original tag is? Something like:
EGFR GENE .*
Solution 1:[1]
You can provide a comma-separated list of the tags that can be overwritten in the third column of the mapping entry.
For instance:
EGFR GENE ORGANIZATION,PERSON,LOCATION,MISC
will allow the rule to overwrite any of those tags.
I don't think there is an "overwrite all" option at the moment, so you do have to list each type you want overwritten.
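Since every rule needs that list, it can help to generate the mapping file instead of typing the third column by hand. The following is a minimal Java sketch, not part of CoreNLP: `MappingLines`, `line` and `file` are hypothetical names, and it assumes the usual RegexNER layout of tab-separated columns (pattern, new type, overwritable types).

```java
import java.util.List;

// Hypothetical helper (not a CoreNLP class) that formats RegexNER mapping lines.
// Assumed column layout: pattern <TAB> new type <TAB> comma-separated overwritable types.
class MappingLines {
    // The tags each rule is allowed to replace, as listed in the answer.
    static final List<String> OVERWRITABLE =
            List.of("ORGANIZATION", "PERSON", "LOCATION", "MISC");

    // Builds one mapping line, e.g. for pattern "EGFR" and type "GENE".
    static String line(String pattern, String newType) {
        return String.join("\t", pattern, newType, String.join(",", OVERWRITABLE));
    }

    // Builds a whole mapping file from {pattern, newType} pairs, one line per entry.
    static String file(List<String[]> entries) {
        StringBuilder sb = new StringBuilder();
        for (String[] entry : entries) {
            sb.append(line(entry[0], entry[1])).append('\n');
        }
        return sb.toString();
    }
}
```

For example, `MappingLines.line("EGFR", "GENE")` yields the EGFR rule with all four tags in its third column; the output of `file(...)` can be written to disk and used as the mapping file.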
If you always want your rules to overwrite everything, you can set this option on the TokensRegexNERAnnotator:
regexner.backgroundSymbol ORGANIZATION,PERSON,LOCATION,MISC,O
Then the individual rules don't need their own list.
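The behavior described in this answer can be summarized as a decision rule: a match replaces a token's current tag if that tag is one of the background symbols or appears in the rule's own overwritable list. The Java sketch below models only that simplified rule; it is not CoreNLP's implementation, and `OverwriteModel` is a hypothetical name. It also parses the `regexner.backgroundSymbol` line above with `java.util.Properties`, which accepts whitespace between key and value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Simplified model (not CoreNLP code) of when a RegexNER rule may replace a tag.
class OverwriteModel {
    // Splits a comma-separated tag list such as "ORGANIZATION,PERSON,O".
    static Set<String> csv(String value) {
        Set<String> out = new HashSet<>();
        if (value == null || value.trim().isEmpty()) {
            return out;
        }
        for (String tag : value.split(",")) {
            out.add(tag.trim());
        }
        return out;
    }

    // Reads regexner.backgroundSymbol from .properties text.
    // The fallback of "O" when the key is missing is this sketch's assumption.
    static Set<String> backgroundSymbols(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return csv(props.getProperty("regexner.backgroundSymbol", "O"));
    }

    // A tag is replaceable if it is a background symbol or in the rule's third column.
    static boolean canOverwrite(String currentTag, Set<String> background, Set<String> ruleOverwritable) {
        return background.contains(currentTag) || ruleOverwritable.contains(currentTag);
    }
}
```

With the background list from the answer, `canOverwrite("ORGANIZATION", bg, Set.of())` is true even for a rule with no third column, which is why the per-rule lists become unnecessary; with only `O` in the background list, the rule needs ORGANIZATION in its own column.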
Solution 2:[2]
Great answer by @StanfordNLPHelp
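However, if you supply your mappings through the ner.fine.regexner properties (the fine-grained RegexNER step of the ner annotator), you also need the ner.fine.regexner prefix on backgroundSymbol to get the overwriting behavior: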
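import java.util.Properties;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
// rulesFiles: path(s) to your own mapping file(s)
props.setProperty("ner.fine.regexner.mapping", rulesFiles);
// The equivalent setting for the standalone regexner annotator would be:
// props.setProperty("regexner.backgroundSymbol", "ORGANIZATION,PERSON,LOCATION,MISC,O");
props.setProperty("ner.fine.regexner.backgroundSymbol", "ORGANIZATION,PERSON,LOCATION,MISC,O");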
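Because the two answers differ only in the property prefix, both settings can be produced by one helper. A minimal sketch, assuming a hypothetical `RegexNerProps` class and assuming, as the properties in this answer suggest, that both `regexner` and `ner.fine.regexner` read `.mapping` and `.backgroundSymbol` under their own prefix.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not a CoreNLP class): sets mapping file(s) and background
// symbols under a given prefix, e.g. "regexner" or "ner.fine.regexner".
class RegexNerProps {
    // Background symbols that let every rule overwrite these tags.
    static final List<String> OVERWRITE_ALL =
            List.of("ORGANIZATION", "PERSON", "LOCATION", "MISC", "O");

    static Properties forPrefix(Properties props, String prefix, String mappingFiles) {
        props.setProperty(prefix + ".mapping", mappingFiles);
        props.setProperty(prefix + ".backgroundSymbol", String.join(",", OVERWRITE_ALL));
        return props;
    }
}
```

The resulting `Properties` object, with `annotators` set as in the snippet above, would then be passed to the `StanfordCoreNLP` constructor.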
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | StanfordNLPHelp |
Solution 2 | Jatin Sutaria |