How was bert_en_uncased_preprocess made? For example, by training a NN or by manual coding?
I am new to deep learning and have come across BERT. I tried small_bert/bert_en_uncased_L-4_H-512_A-8 as a TensorFlow tutorial did, and the result was quite amazing. I want to dig deeper and am wondering how the corresponding bert_en_uncased_preprocess model
was made.
As far as I can tell, it does all the WordPiece tokenization work, so it should involve some hand-written code rather than just training. https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3
provides only the SavedModel. Is there any source code or something similar? And how was it made in general?
This is not a question about usage; the usage is fine and clear. The purpose is to study.
Thanks in advance.
Solution 1:[1]
The object

preprocessor = hub.load("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

is created by the NLP module of the TensorFlow Model Garden, in its create_preprocessing function. The tokenize function of the preprocessor is implemented by the BertTokenizer class, and the bert_pack_inputs function by the BertPackInputs class. The source code for those classes, in the Model Garden repository (https://github.com/tensorflow/models), shows how the model was put together.
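For study purposes, here is a minimal sketch of how such a preprocessing model can be written by hand with the tensorflow_text library rather than trained. The vocabulary file path "vocab.txt", the fixed sequence length of 128, and the [CLS]/[SEP] token ids 101/102 (from the standard English uncased BERT vocabulary) are assumptions for illustration; the real export in the Model Garden is more configurable.

import tensorflow as tf
import tensorflow_text as text

class BertPreprocess(tf.Module):
    # Assumed vocab file and sequence length; the published model uses
    # the original BERT English uncased vocabulary.
    def __init__(self, vocab_file="vocab.txt", seq_length=128):
        super().__init__()
        self._tokenizer = text.BertTokenizer(vocab_file, lower_case=True)
        self._seq_length = seq_length

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def __call__(self, sentences):
        # WordPiece-tokenize, then flatten the per-word ragged dimension.
        ids = self._tokenizer.tokenize(sentences).merge_dims(-2, -1)
        # Add [CLS]/[SEP] and build segment ids (101/102 in the uncased vocab).
        ids, type_ids = text.combine_segments(
            [ids], start_of_sequence_id=101, end_of_segment_id=102)
        # Pad/truncate to a fixed length and produce the input mask.
        word_ids, mask = text.pad_model_inputs(ids, self._seq_length)
        type_ids, _ = text.pad_model_inputs(type_ids, self._seq_length)
        return dict(input_word_ids=word_ids,
                    input_mask=mask,
                    input_type_ids=type_ids)

A module like this is then exported with tf.saved_model.save, which is essentially what the Model Garden's create_preprocessing export path does to produce the SavedModel hosted on TF Hub; no training is involved.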
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pengcheng Fan |