I am trying to save the model from here https://github.com/greatwhiz/tft_tf2/blob/master/README.md in SavedModel format (preferably with the Functional API). The so…
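A minimal sketch of saving a Keras Functional-API model in SavedModel format under TF 2.x; the tiny Dense model here is a placeholder, not the TFT architecture from that repo:

```python
import tensorflow as tf

# Hypothetical tiny Functional-API model; it stands in for the TFT
# model from the repo, which you would build separately.
inputs = tf.keras.Input(shape=(10,), name="features")
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)

# Passing a directory path (no .h5 suffix) selects the SavedModel format.
model.save("tft_saved_model")
restored = tf.keras.models.load_model("tft_saved_model")
```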
I am trying to replicate the code from this page. At my workplace we have access to the transformers and pytorch libraries but cannot connect to the internet from our Python environment…
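The usual offline pattern is to download the checkpoint once on a connected machine with save_pretrained(), copy the folder across, and load it by local path. A sketch, with "bert-base-uncased" and the ./bert-local path as placeholders:

```python
from transformers import AutoModel, AutoTokenizer

# On a machine with internet access: download once, save to disk.
name = "bert-base-uncased"  # placeholder model id
AutoTokenizer.from_pretrained(name).save_pretrained("./bert-local")
AutoModel.from_pretrained(name).save_pretrained("./bert-local")

# On the offline machine, after copying ./bert-local across:
tok = AutoTokenizer.from_pretrained("./bert-local", local_files_only=True)
model = AutoModel.from_pretrained("./bert-local", local_files_only=True)
```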
I found an answer about training a model from scratch in this question: How to train BERT from scratch on a new domain for both MLM and NSP? One answer uses Trainer and…
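A hedged sketch of that Trainer-based approach: BertForPreTraining carries both the MLM and NSP heads, while TextDatasetForNextSentencePrediction and DataCollatorForLanguageModeling build the sentence-pair, masked inputs. The corpus path and hyperparameters are placeholders:

```python
from transformers import (BertConfig, BertForPreTraining, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TextDatasetForNextSentencePrediction,
                          TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForPreTraining(BertConfig())  # fresh weights, not fine-tuning

# corpus.txt: one sentence per line, documents separated by blank lines.
dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer, file_path="corpus.txt", block_size=128)
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert_out", num_train_epochs=1),
    data_collator=collator,
    train_dataset=dataset)
trainer.train()
```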
Why does DETR need to set an empty class? It has a "Background" class, which means non-object. Why?
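The short version: DETR always emits a fixed number of query predictions, and most queries match no ground-truth box, so the classifier needs an extra "no object" slot for those unmatched queries. An illustration of where that extra class appears (not DETR's actual code):

```python
import torch.nn as nn

num_classes, hidden_dim = 91, 256  # COCO classes and DETR's hidden size
# The head predicts num_classes + 1 logits per query; the extra slot is
# the "no object" class that unmatched queries are trained to output.
class_head = nn.Linear(hidden_dim, num_classes + 1)
```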
I am confused by these two structures. In theory, the outputs of both are connected to their inputs. What magic makes the 'self-attention mechanism' more powerful…
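One way to see the difference in code: a fully connected layer mixes positions with a fixed learned weight matrix, while self-attention computes its mixing weights from the input itself. A toy contrast in PyTorch (shapes and matrices are illustrative only):

```python
import torch
import torch.nn.functional as F

x = torch.randn(5, 16)  # 5 tokens, 16 features each

# Fully connected across positions: the 5x5 mixing weights are fixed
# once trained, no matter what x contains.
W = torch.randn(5, 5)
fc_out = W @ x

# Self-attention: the 5x5 mixing weights are recomputed from x itself.
Wq, Wk, Wv = (torch.randn(16, 16) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = F.softmax(q @ k.T / 16 ** 0.5, dim=-1)  # input-dependent weights
sa_out = attn @ v
```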
I'm currently studying the code of the Transformer, but I cannot understand the masked multi-head attention of the decoder. The paper says that it is to prevent you from seeing the…
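A minimal sketch of that look-ahead (causal) mask: scores for positions after the current one are set to -inf before the softmax, so each position can only attend to itself and earlier positions:

```python
import torch

seq_len = 5
# True above the diagonal = positions that lie in the future.
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

scores = torch.randn(seq_len, seq_len)            # raw attention scores
scores = scores.masked_fill(mask, float("-inf"))  # -inf -> 0 after softmax
weights = torch.softmax(scores, dim=-1)           # row i attends to <= i only
```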
I am just using the Hugging Face transformers library and get the following message when running run_lm_finetuning.py: AttributeError: 'GPT2TokenizerFast' object has no attribute 'max_len'
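This error usually means the script was written for an older transformers release: the tokenizer attribute max_len was renamed to model_max_length. A sketch of the one-line fix in the script:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# Old script: block_size = tokenizer.max_len  (removed attribute)
block_size = tokenizer.model_max_length  # renamed replacement
print(block_size)  # 1024 for GPT-2
```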
I have trained a temporal fusion transformer on some training data and would like to predict on some unseen data. To do so, I'm using the pytorch_forecasting TimeSeriesDataSet…
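A hedged sketch of the usual pytorch_forecasting pattern: rebuild the dataset from the training one with from_dataset() so the encoders and normalizers carry over, then call predict(). Here training, new_df, and the fitted tft model are assumed to exist from the training run:

```python
from pytorch_forecasting import TimeSeriesDataSet

# `training` (the original TimeSeriesDataSet), `new_df` (the unseen
# data), and `tft` (the fitted model) are assumed from the training run.
pred_ds = TimeSeriesDataSet.from_dataset(
    training, new_df, predict=True, stop_randomization=True)
pred_loader = pred_ds.to_dataloader(train=False, batch_size=64)
predictions = tft.predict(pred_loader)
```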