Hugging Face models only work once, then spit out a tokenizer error
I am following along with this example on Hugging Face's website, trying to work with Twitter sentiment. I am running Python 3.9 in PyCharm. The code works fine the first time I run it, but if I run it again with no changes I get this error:
OSError: Can't load tokenizer for 'cardiffnlp/twitter-roberta-base-emotion'. Make sure that:
- 'cardiffnlp/twitter-roberta-base-emotion' is a correct model identifier listed on 'https://huggingface.co/models'
(make sure 'cardiffnlp/twitter-roberta-base-emotion' is not a path to a local directory with something else, in that case)
- or 'cardiffnlp/twitter-roberta-base-emotion' is the correct path to a directory containing relevant tokenizer file,
One thing I did notice is that PyCharm creates a folder named "cardiffnlp" with subfolders corresponding to the different tasks, such as "twitter-roberta-base-sentiment", in my PyCharm project folder, right above my "venv" folder. If I delete the "twitter-roberta-base-sentiment" folder that was created the first time the code successfully ran, the code works fine again, and the "twitter-roberta-base-sentiment" folder reappears.
My guess is that this part of the code is downloading and saving the model into my project. I just don't understand why it only works the first time. Do I need to change the model location, since it no longer needs to go to the URL to get the file if it's already stored locally?
# download label mapping for the task from the tweeteval repo
import csv
import urllib.request

labels = []
mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    html = f.read().decode('utf-8').split("\n")

# each line of mapping.txt is "<index>\t<label>"
csvreader = csv.reader(html, delimiter='\t')
labels = [row[1] for row in csvreader if len(row) > 1]
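The folders described above are just the downloaded model cache. If the goal is to load from that local copy deliberately (and skip the network when it is already present), the path-vs-identifier choice can be made explicit. A minimal sketch, where the `resolve_model_source` helper and the project layout are my own assumptions, not part of the tutorial:

```python
import os

def resolve_model_source(model_id: str, local_root: str = ".") -> str:
    """Return the local cache directory if the model was already saved
    there, otherwise fall back to the Hub identifier.
    Both this helper and the layout are assumptions for illustration."""
    local_dir = os.path.join(local_root, *model_id.split("/"))
    # A usable local copy needs at least a config file.
    if os.path.isfile(os.path.join(local_dir, "config.json")):
        return local_dir
    return model_id

# Hypothetical usage:
# tokenizer = AutoTokenizer.from_pretrained(
#     resolve_model_source("cardiffnlp/twitter-roberta-base-emotion"))
```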
Thanks for the help, guys.
Solution 1:[1]
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-emotion")
model = RobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-emotion")

inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])
This worked for me; replace 'text' with the string you want the emotion of. This was buried in the docs and luckily works.
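The last two lines of the snippet just take the argmax over the logits and map the winning class index to a label via the model config. That mapping step can be checked offline with made-up numbers; the logits and `id2label` values below are illustrative, not real model output:

```python
# Illustrative logits and label mapping (not real model output).
logits = [0.1, 2.7, -0.3, 0.9]
id2label = {0: "anger", 1: "joy", 2: "optimism", 3: "sadness"}

# Same idea as logits.argmax().item() in the solution above.
predicted_class_id = max(range(len(logits)), key=logits.__getitem__)
print(id2label[predicted_class_id])  # joy
```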
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | cigien