Run ner.manual in Prodigy on a CSV file
I am new to Prodigy and haven't fully figured out the paradigm. For a project, I would like to manually annotate names in texts. My team has developed its own model to recognize the names, so I only want to use the annotated texts (produced with Prodigy) as a gold standard for our model.
To do so, I have a CSV file, texts.csv, with the text in one of the columns. Do I need to convert this file to JSON, or can I run Prodigy directly on the CSV file?
Also, what command do I need to run to start ner.manual with this dataset?
I suppose I have to start with:
!python -m prodigy ner.manual
However, it is unclear to me how I should run the rest. Can someone help me with this?
Solution 1:[1]
File Format
I believe that recipes whose input is described as a "Text Source" accept jsonl, json, csv, or txt files (see the "Text Source" section of https://prodi.gy/docs/api-loaders). The ner.manual recipe takes a "Text Source", so a CSV file should work (see https://prodi.gy/docs/recipes#ner-manual).
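If the CSV loader does not pick up your column (as far as I know it looks for a column named text, but check the loader docs to be sure), converting the file to JSONL yourself is a safe fallback. Below is a minimal sketch, not an official Prodigy utility. The function name csv_to_jsonl and the default column name "text" are my own assumptions; change text_column to match the header in your file.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="text"):
    """Write one {"text": ...} JSON object per CSV row.

    Rows whose text column is missing or empty are skipped.
    Returns the number of records written.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # nothing to annotate in this row
            # Prodigy's JSONL input uses a "text" key for the raw text
            dst.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Calling csv_to_jsonl("texts.csv", "texts.jsonl", text_column="your_column") would then give you a texts.jsonl file (a file name I made up) that you can pass to the recipe instead of the CSV.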
ner.manual
For running ner.manual, take a look at the documentation: https://prodi.gy/docs/
The documentation contains a good example:
python -m prodigy ner.manual ner_news_headlines blank:en ./news_headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION
- ner_news_headlines is the name of the dataset (it could be named anything)
- blank:en is a blank English model (used here only to tokenize the text)
- ./news_headlines.jsonl is the path to the jsonl file that you will be annotating (substitute the path to your own file)
- PERSON,ORG,PRODUCT,LOCATION are the labels that you will annotate your data with (change these to whatever labels you want to use, separated by commas with no spaces)
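Applied to the question's texts.csv, the same pattern could be assembled as in the sketch below. The helper build_ner_manual_command, the dataset name names_gold, and the single PERSON label are placeholders I chose for illustration; only the argument order is taken from the documentation example above.

```python
import shlex


def build_ner_manual_command(dataset, source, labels, model="blank:en"):
    """Return the ner.manual command as an argument list.

    Labels are joined with commas and no spaces, which is the form
    the --label option expects.
    """
    return [
        "python", "-m", "prodigy", "ner.manual",
        dataset, model, source,
        "--label", ",".join(labels),
    ]


# "names_gold" and "PERSON" are placeholder choices, not required names
cmd = build_ner_manual_command("names_gold", "./texts.csv", ["PERSON"])
print(shlex.join(cmd))
```

This prints python -m prodigy ner.manual names_gold blank:en ./texts.csv --label PERSON, which you can paste into a terminal (or prefix with ! in a notebook, as in the question).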
I'm also pretty new to Prodigy, so someone else may have a better answer.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow