fastai.text NameError: name 'BaseTokenizer' is not defined

I am new to fastai and am trying to build a model by following the article "Using RoBERTa with fast.ai for NLP".

I tried to customize the tokenizer with the code below:

from fastai.text import *
from fastai.metrics import *
from transformers import RobertaTokenizer

class FastAiRobertaTokenizer(BaseTokenizer):
    """Wrapper around RobertaTokenizer to be compatible with fastai"""
    def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs): 
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len 
    def __call__(self, *args, **kwargs): 
        return self 
    def tokenizer(self, t: str) -> List[str]:
        """Adds Roberta bos and eos tokens and limits the maximum sequence length"""
        # `config` is not defined in this snippet; it is assumed to be set up earlier with start_tok and end_tok.
        return [config.start_tok] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + [config.end_tok]

But I got this error message:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-41070aae72d1> in <module>
----> 1 class FastAiRobertaTokenizer(BaseTokenizer):
      2     """Wrapper around RobertaTokenizer to be compatible with fastai"""
      3     def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
      4         self._pretrained_tokenizer = tokenizer
      5         self.max_seq_len = max_seq_len

NameError: name 'BaseTokenizer' is not defined

Environment:
  • fastai version: 2.1.8
  • torch version: 1.7.1
  • transformers version: 3.4.0

Has anyone run into the same issue before?



Solution 1:[1]

I finally figured out that I should replace the line from fastai.text import * with from fastai.text.all import *. After that change, the error NameError: name 'BaseTokenizer' is not defined no longer appears.
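
A likely explanation for why the import change matters, offered as a sketch rather than a confirmed description of fastai internals: in fastai 2.x the fastai/text/__init__.py file appears to define no names itself, while fastai/text/all.py re-exports them from the submodules. That layout is an assumption here. The example below does not touch fastai at all. It builds a throwaway package called demo_pkg (a hypothetical name) with that same assumed layout and shows that a star-import from the bare subpackage brings in nothing, while a star-import from its all module brings in BaseTokenizer.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mimics the ASSUMED layout (not fastai itself):
#   demo_pkg/text/__init__.py   (empty)
#   demo_pkg/text/core.py       (defines BaseTokenizer)
#   demo_pkg/text/all.py        (re-exports everything from core)
root = Path(tempfile.mkdtemp())
text_dir = root / "demo_pkg" / "text"
text_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(text_dir / "__init__.py").write_text("")
(text_dir / "core.py").write_text("class BaseTokenizer:\n    pass\n")
(text_dir / "all.py").write_text("from .core import *\n")

# Make the temporary package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Star-import from the bare subpackage: its __init__.py defines no names.
ns_plain = {}
exec("from demo_pkg.text import *", ns_plain)

# Star-import from the `all` module: it pulls in core's public names.
ns_all = {}
exec("from demo_pkg.text.all import *", ns_all)

print("BaseTokenizer" in ns_plain)  # False
print("BaseTokenizer" in ns_all)    # True
```

If the real package is laid out the same way, this is why from fastai.text import * left BaseTokenizer undefined while from fastai.text.all import * made it available.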

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Weber Huang