Convert from Prodigy's JSONL format for labeled NER to spaCy's training format?
I am new to Prodigy and spaCy, as well as to working on the command line. I'd like to use Prodigy to label my data for an NER model, and then use spaCy in Python to train models.
Prodigy stores its annotations in a SQLite database and exports them as JSONL. spaCy takes a different format, which I'm not sure what to call:
TRAIN_DATA = [
    (
        "Horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("Do they bite?", {"entities": []}),
    (
        "horses are too tall and they pretend to care about your feelings",
        {"entities": [(0, 6, LABEL)]},
    ),
    ("horses pretend to care about your feelings", {"entities": [(0, 6, LABEL)]}),
    (
        "they pretend to care about your feelings, those horses",
        {"entities": [(48, 54, LABEL)]},
    ),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]
How can I convert from one to the other? It seems like this should be easy, but I cannot find it anywhere.
I have no problem loading the dataset; it is only the conversion I am stuck on.
Solution 1:[1]
As of version 1.9, Prodigy should be able to export this training format with the data-to-spacy recipe: https://prodi.gy/docs/recipes#data-to-spacy
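If the recipe is not available, or if you prefer to do the conversion by hand in Python, a minimal sketch follows. It assumes each JSONL line (for example, one produced by Prodigy's db-out export) is a JSON object with a "text" key, a "spans" list whose items carry "start", "end" and "label", and an "answer" key. The function name prodigy_to_spacy and the sample records are hypothetical, made up for illustration; check the keys against your own export before relying on it.

```python
import json


def prodigy_to_spacy(lines, keep_answers=("accept",)):
    """Turn Prodigy-style JSONL lines into (text, {"entities": [...]}) tuples.

    Assumption: every record has "text"; "spans" and "answer" may be missing.
    """
    train_data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: only accepted annotations should end up in the training data.
        if record.get("answer", "accept") not in keep_answers:
            continue
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        train_data.append((record["text"], {"entities": entities}))
    return train_data


# Hypothetical sample records standing in for an exported JSONL file.
sample_lines = [
    '{"text": "horses pretend to care about your feelings", '
    '"spans": [{"start": 0, "end": 6, "label": "ANIMAL"}], "answer": "accept"}',
    '{"text": "Do they bite?", "spans": [], "answer": "accept"}',
    '{"text": "skipped example", "spans": [], "answer": "ignore"}',
]

TRAIN_DATA = prodigy_to_spacy(sample_lines)
```

With a real export you would pass an open file object instead of sample_lines, since iterating over a file yields one line at a time.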
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | aab |