fastai.text NameError: name 'BaseTokenizer' is not defined
I am new to fastai and am trying to build a model following Using RoBERTa with fast.ai for NLP.
I was trying to customize the tokenizer, as in the code below:
from fastai.text import *
from fastai.metrics import *
from transformers import RobertaTokenizer

class FastAiRobertaTokenizer(BaseTokenizer):
    """Wrapper around RobertaTokenizer to be compatible with fastai"""
    def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int = 128, **kwargs):
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __call__(self, *args, **kwargs):
        return self

    def tokenizer(self, t: str) -> List[str]:
        """Adds RoBERTa bos and eos tokens and limits the maximum sequence length"""
        # config.start_tok / config.end_tok are defined earlier, following the referenced article
        return [config.start_tok] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + [config.end_tok]
But I got this error message:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-41070aae72d1> in <module>
----> 1 class FastAiRobertaTokenizer(BaseTokenizer):
      2     """Wrapper around RobertaTokenizer to be compatible with fastai"""
      3     def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
      4         self._pretrained_tokenizer = tokenizer
      5         self.max_seq_len = max_seq_len

NameError: name 'BaseTokenizer' is not defined
- fastai version: 2.1.8
- torch version: 1.7.1
- transformers version: 3.4.0
Has anyone run into the same issue before?
Solution 1:[1]
Oh, I finally figured out that I should change from fastai.text import * to from fastai.text.all import *. With that change, the NameError: name 'BaseTokenizer' is not defined message is gone.
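For reference, here is a minimal self-contained sketch of the fix. It assumes fastai v2 (where BaseTokenizer is available via the fastai.text.all namespace, matching the behavior reported above), a pretrained roberta-base tokenizer, and RoBERTa's literal <s>/</s> tokens in place of the config.start_tok/config.end_tok values used in the referenced article:

from fastai.text.all import *          # fastai v2 namespace; BaseTokenizer resolves here
from transformers import RobertaTokenizer
from typing import List

class FastAiRobertaTokenizer(BaseTokenizer):
    """Wrapper around RobertaTokenizer to be compatible with fastai"""
    def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int = 128, **kwargs):
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __call__(self, *args, **kwargs):
        return self

    def tokenizer(self, t: str) -> List[str]:
        # "<s>" and "</s>" are RoBERTa's bos/eos tokens (used here instead of config.start_tok/end_tok)
        return ["<s>"] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + ["</s>"]

roberta_tok = RobertaTokenizer.from_pretrained("roberta-base")
fastai_tok = FastAiRobertaTokenizer(roberta_tok, max_seq_len=128)
print(fastai_tok.tokenizer("Hello world"))   # prints the subword tokens wrapped in bos/eos markers

The class definition now runs without the NameError, since BaseTokenizer is found in the fastai.text.all namespace after the import change.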
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Weber Huang |