How was bert_en_uncased_preprocess made? For example, by training a neural network or by manual coding?

I am new to deep learning and have come across BERT. I tried small_bert/bert_en_uncased_L-4_H-512_A-8 as a TensorFlow tutorial did, and the result was quite amazing. I want to dig deeper and am wondering how the corresponding bert_en_uncased_preprocess model was made.

As far as I can understand, it does all the WordPiece tokenizing work, so it should involve some coding rather than just training. The page at https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3 provides the SavedModel. Is there any source code or something similar? And how was it made in general?
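
For context, this is roughly how I am using it (following the tutorial; the output names come from the TF Hub model page), so the question is only about what is behind this model:

    import tensorflow as tf
    import tensorflow_hub as hub

    # The preprocessing model maps raw strings straight to the encoder inputs.
    preprocess = hub.KerasLayer(
        "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
    encoder_inputs = preprocess(tf.constant(["this movie was great!"]))

    # Prints input_word_ids, input_mask, input_type_ids, each of shape (1, 128).
    for name, tensor in encoder_inputs.items():
        print(name, tensor.shape)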

This is not about usage; the usage is fine and clear. The purpose is to study how it was built.

Thanks in advance.



Solution 1:

The preprocessor object is created by the NLP module from the TensorFlow Model Garden, in its create_preprocessing function:

    import tensorflow_hub as hub

    preprocessor = hub.load("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

The tokenize function of the preprocessor is implemented by the BertTokenizer class, and the bert_pack_inputs function by the BertPackInputs class.
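
To see how those two pieces surface in the exported SavedModel, here is the split usage documented on the TF Hub model page (seq_length=128 is just an example value):

    import tensorflow as tf
    import tensorflow_hub as hub

    preprocessor = hub.load(
        "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

    # tokenize: raw strings -> ragged batches of WordPiece ids
    # (the part backed by the BertTokenizer layer).
    tokenize = hub.KerasLayer(preprocessor.tokenize)
    tokens = tokenize(tf.constant(["hello world"]))

    # bert_pack_inputs: one or more tokenized segments -> fixed-size encoder
    # inputs with [CLS]/[SEP], truncation, padding and masks
    # (the part backed by the BertPackInputs layer).
    bert_pack_inputs = hub.KerasLayer(
        preprocessor.bert_pack_inputs,
        arguments=dict(seq_length=128))
    encoder_inputs = bert_pack_inputs([tokens])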

The source code for all of these is in the TensorFlow Model Garden repository (https://github.com/tensorflow/models), under official/nlp.
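
To answer the training-vs-coding part directly: nothing in the preprocessing model is trained. It is ordinary TensorFlow code (tensorflow_text ops) wrapped around the fixed WordPiece vocab file that ships with the BERT checkpoint, then exported as a SavedModel. (The vocabulary itself was derived from a corpus when BERT was built, but not by training a neural network here.) Below is a minimal sketch of that idea, not the actual Model Garden implementation; the vocab path and the [CLS]/[SEP] ids are placeholders.

    import tensorflow as tf
    import tensorflow_text as text


    class BertPreprocess(tf.Module):
        """Hand-written preprocessing: vocab file in, encoder inputs out."""

        def __init__(self, vocab_file, seq_length=128):
            super().__init__()
            # WordPiece tokenizer built from a plain vocab.txt file.
            self.tokenizer = text.BertTokenizer(
                vocab_file, lower_case=True, token_out_type=tf.int32)
            self.seq_length = seq_length
            self.cls_id, self.sep_id = 101, 102  # placeholder special-token ids

        @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
        def __call__(self, sentences):
            # Strings -> ragged WordPiece ids; flatten the per-word dimension.
            tokens = self.tokenizer.tokenize(sentences).merge_dims(-2, -1)
            # Leave room for [CLS] and [SEP], then add them and build type ids.
            tokens = tokens[:, :self.seq_length - 2]
            ids, type_ids = text.combine_segments(
                [tokens], self.cls_id, self.sep_id)
            # Pad everything to a fixed length and derive the input mask.
            word_ids, mask = text.pad_model_inputs(ids, self.seq_length)
            type_ids, _ = text.pad_model_inputs(type_ids, self.seq_length)
            return dict(input_word_ids=word_ids,
                        input_mask=mask,
                        input_type_ids=type_ids)


    # tf.saved_model.save(BertPreprocess("vocab.txt"), "/tmp/bert_preprocess")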

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Solution 1: Pengcheng Fan (Stack Overflow)