Add custom NER to spaCy 3 pipeline
I am trying to build a custom spaCy pipeline based on the en_core_web_sm pipeline. From what I can tell, the NER component has been added correctly, since it appears in the pipe names when printed (see below). However, when the combined model is run on text I get no entities, but when the custom NER model is used on its own the correct entities are extracted and labelled. I am using spaCy 3.0.8 and the en_core_web_sm pipeline 3.0.0.
import spacy
crypto_nlp = spacy.load('model-best')
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")
print(nlp.pipe_names)
text = 'Ethereum'
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
Output: ['tok2vec', 'tagger', 'parser', 'crypto_ner', 'ner', 'attribute_ruler', 'lemmatizer']
But when I use my ner model:
doc = crypto_nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
Output: Ethereum ETH
Solution 1:
It's not clear from the details in the question, but my guess is that your crypto_nlp ner depends on a separate tok2vec component that's not being included when you source it. Since this tok2vec won't be shared, it's easiest to modify the ner component to include a standalone copy of the tok2vec, which is called "replacing listeners": https://spacy.io/api/language#replace_listeners
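One way to check whether this is the situation is to inspect the sourced pipeline's config. The following is a minimal sketch, assuming the 'model-best' path from the question and the default training config layout; the exact config keys depend on how the model was trained:
import spacy

crypto_nlp = spacy.load('model-best')
print(crypto_nlp.pipe_names)  # e.g. ['tok2vec', 'ner']

# If the ner was trained against a shared tok2vec, its model config
# typically points at a Tok2VecListener architecture
print(crypto_nlp.config["components"]["ner"]["model"]["tok2vec"])
# an '@architectures': 'spacy.Tok2VecListener.v1' entry indicates a listener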
If crypto_nlp has nlp.pipe_names as ['tok2vec', 'ner'], then this should replace the listener before loading it into the second pipeline, so it becomes a standalone component:
crypto_nlp.replace_listeners("tok2vec", "ner", ["model.tok2vec"])
nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")
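Putting it together, a minimal end-to-end sketch might look like the following. The paths, the example text and the expected ETH label are taken from the question; whether ETH is the actual label depends on how the custom model was trained:
import spacy

# Load the custom pipeline and make its ner self-contained by copying
# the tok2vec weights into the component (i.e. replacing the listener)
crypto_nlp = spacy.load('model-best')
crypto_nlp.replace_listeners("tok2vec", "ner", ["model.tok2vec"])

# Source the now-standalone ner into the stock pipeline
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")
print(nlp.pipe_names)

doc = nlp('Ethereum')
for ent in doc.ents:
    print(ent.text, ent.label_)  # expected: Ethereum ETH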
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | aab |