Category "huggingface-tokenizers"

M2M100Tokenizer.from_pretrained 'NoneType' object is not callable

I have the following chunk of code from this link: from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer hi_text = "जीव …
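In most reports of this error, M2M100Tokenizer resolves to None because the sentencepiece package is not installed, so calling .from_pretrained on it raises 'NoneType' object is not callable. A minimal sketch of the usual fix, assuming the facebook/m2m100_418M checkpoint and an illustrative Hindi sentence (the original excerpt's string is truncated):

    # Requires: pip install transformers sentencepiece
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

    tokenizer.src_lang = "hi"                    # source language: Hindi
    hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"  # illustrative sentence
    encoded = tokenizer(hi_text, return_tensors="pt")

    # Translate Hindi -> French by forcing the target-language token.
    generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))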

Tokenization with HuggingFace BartTokenizer

I am trying to use a pretrained BART model to train a pointer-generator network with the Hugging Face Transformers library. Example input of the task: from transformers …
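For the tokenization step itself, BartTokenizer works like any other Transformers tokenizer; the pointer-generator wiring is a separate concern. A minimal sketch, assuming the facebook/bart-base checkpoint and a made-up input sentence:

    from transformers import BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

    source = "The quick brown fox jumps over the lazy dog."
    # BART wraps the sequence in <s> ... </s> automatically.
    batch = tokenizer(source, return_tensors="pt", truncation=True, max_length=1024)

    print(batch["input_ids"])
    print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0]))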

RoBERTa classifier: cannot generate single prediction

I have successfully trained a text-emotion classifier by fine-tuning a RoBERTa language model, mostly using a helpful notebook found online. Now I am trying to write …
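A single prediction usually fails because the input is not shaped as a batch or the model is still in training mode. A minimal sketch of single-example inference, assuming a fine-tuned checkpoint saved at the hypothetical path ./emotion-roberta:

    import torch
    from transformers import RobertaForSequenceClassification, RobertaTokenizer

    # Hypothetical path; substitute your own fine-tuned model directory.
    tokenizer = RobertaTokenizer.from_pretrained("./emotion-roberta")
    model = RobertaForSequenceClassification.from_pretrained("./emotion-roberta")
    model.eval()  # disable dropout for deterministic predictions

    text = "I can't believe how well this worked!"
    # return_tensors="pt" already adds a batch dimension of size 1.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.argmax(dim=-1).item())  # index of the predicted label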

Huggingface models only work once, then spit out Tokenizer error

I am following along with this example on Hugging Face's website, trying to work with Twitter sentiment. I am running Python 3.9 on PyCharm. The code works fine …
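One common cause, assuming the script mirrors the cardiffnlp Twitter-sentiment example: it calls model.save_pretrained(MODEL) with the checkpoint name as the path, so the second run resolves MODEL to a local folder that holds the model weights but no tokenizer files. A minimal sketch of the fix is to save the tokenizer alongside the model:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)

    # After the first run, from_pretrained(MODEL) points at this local folder,
    # so it must contain the tokenizer files too, not just the model weights.
    model.save_pretrained(MODEL)
    tokenizer.save_pretrained(MODEL)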

How to apply max_length to truncate the token sequence from the left in a HuggingFace tokenizer?

In the HuggingFace tokenizer, the max_length argument specifies the length of the tokenized text. I believe it truncates the sequence to max_length-2 (…
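Recent Transformers releases expose this directly: the tokenizer's truncation_side attribute controls which end gets cut. A minimal sketch, assuming bert-base-uncased:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    tokenizer.truncation_side = "left"  # default is "right"

    long_text = " ".join(str(i) for i in range(100))
    encoded = tokenizer(long_text, truncation=True, max_length=10)

    # [CLS] and [SEP] still take 2 of the 10 slots; the remaining 8 tokens
    # now come from the end of the sequence instead of the beginning.
    print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))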

AttributeError: 'GPT2TokenizerFast' object has no attribute 'max_len'

I am just using the Hugging Face Transformers library and get the following message when running run_lm_finetuning.py: AttributeError: 'GPT2TokenizerFast' object …
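This is a version mismatch: max_len was deprecated and later removed in favor of model_max_length, and the old run_lm_finetuning.py script still references the removed name. A minimal sketch of the rename (the alternative is pinning an older transformers release that still has max_len):

    from transformers import GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    # Old scripts: tokenizer.max_len          -> AttributeError on recent versions
    # Replacement: tokenizer.model_max_length -> works
    print(tokenizer.model_max_length)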