Hugging Face models only work once, then spit out a tokenizer error

I am following along with this example on Hugging Face's website, trying to work with Twitter sentiment. I am running Python 3.9 in PyCharm. The code works fine the first time I run it; however, if I run it again with no changes, I get this error:

OSError: Can't load tokenizer for 'cardiffnlp/twitter-roberta-base-emotion'. Make sure that:

- 'cardiffnlp/twitter-roberta-base-emotion' is a correct model identifier listed on 'https://huggingface.co/models'
  (make sure 'cardiffnlp/twitter-roberta-base-emotion' is not a path to a local directory with something else, in that case)

- or 'cardiffnlp/twitter-roberta-base-emotion' is the correct path to a directory containing relevant tokenizer file,

One thing I did notice is that PyCharm creates a folder named "cardiffnlp", with subfolders corresponding to the different tasks (such as "twitter-roberta-base-sentiment"), in my PyCharm project folder, right above my "venv" folder. If I delete the "twitter-roberta-base-sentiment" folder that was created the first time the code ran successfully, the code works fine again, and the "twitter-roberta-base-sentiment" folder reappears.

My guess is that this part of the example code is downloading and saving the model into my PyCharm project. I just don't understand why it only works the first time. Do I need to change the model location, since it no longer needs to go to the URL to fetch the files if they are already stored locally?

# download label mapping (mapping.txt maps label ids to label names)
import csv
import urllib.request

task = "emotion"  # e.g. "emotion" or "sentiment", depending on the model
mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    html = f.read().decode('utf-8').split("\n")
    csvreader = csv.reader(html, delimiter='\t')
labels = [row[1] for row in csvreader if len(row) > 1]
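
For reference, the lines of the example that I believe create that local folder are the save calls at the end. I'm reproducing them here from the model card from memory, so they may differ slightly from the exact example:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = f"cardiffnlp/twitter-roberta-base-{task}"

# first run: downloads the files from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

# these save a local copy into a folder named after the model id,
# i.e. the "cardiffnlp/twitter-roberta-base-..." folder in my project
model.save_pretrained(MODEL)
tokenizer.save_pretrained(MODEL)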

Thanks for the help, guys.



Solution 1:[1]

import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-emotion")
model = RobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-emotion")

inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])

This worked for me; replace 'text' with the string you want the emotion of. It was buried in the docs and luckily works.

https://huggingface.co/docs/transformers/main/en/model_doc/roberta#transformers.RobertaForSequenceClassification
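
If the problem is the locally saved folder shadowing the hub id (as described in the question), one option is to save any local copy under a different directory name and load from that path on later runs. This is just a sketch; "local_dir" is an arbitrary name I picked:

from transformers import RobertaTokenizer, RobertaForSequenceClassification

local_dir = "twitter-roberta-base-emotion-local"  # any path that is NOT the model id

# first run: download from the Hub, then save a copy locally
tokenizer = RobertaTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-emotion")
model = RobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-emotion")
tokenizer.save_pretrained(local_dir)
model.save_pretrained(local_dir)

# later runs: load the saved copy directly from the local path
tokenizer = RobertaTokenizer.from_pretrained(local_dir)
model = RobertaForSequenceClassification.from_pretrained(local_dir)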

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 cigien