Category "bert-language-model"

RuntimeError: Found dtype Long but expected Float when fine-tuning using Trainer API

I'm trying to fine-tune BERT model for sentiment analysis (classifying text as positive/negative) with Huggingface Trainer API. My dataset has two columns, Text

Bert Model Compile Error - TypeError: Invalid keyword argument(s) in `compile`: {'steps_per_execution'}

I have been using bert and trying to compile the model using the below line of code. model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased'

Could not find function 'spacy-transformers.TransformerModel.v3' in function registry 'architectures'

I was trying to create a custom NER model. I used spacy library to create the model. And this line of code is to create the config file from the base.config fil

How can I use BERT for address matching problem?

I am building an address matching algorithm. The main problem is that previous models like Conditional Random fields (CRF)from Paserator and Averaged Perceptron

How to specify a proxy in transformers pipeline

I am using sentiment-analysis pipeline as described here. from transformers import pipeline classifier = pipeline('sentiment-analysis') It's failing with a con

transformers and BERT downloading to your local machine

I am trying to replicates the code from this page. At my workplace we have access to transformers and pytorch library but cannot connect to internet from our py

how to train a bert model from scratch with huggingface?

i find a answer of training model from scratch in this question: How to train BERT from scratch on a new domain for both MLM and NSP? one answer use Trainer and

Early stopping in Bert Trainer instances

I am fine tuning a BERT model for a multiclass classification task. My problem is that I don't know how to add "early stopping" to those Trainer instances. Any

Continual pre-training vs. Fine-tuning a language model with MLM

I have some custom data I want to use to further pre-train the BERT model. I’ve tried the two following approaches so far: Starting with a pre-trained BER

ReadError: file could not be opened successfully. But I am not sure where the tar file is stored to resolve this

I am using biobert-embeddings==0.1.2 and torch==1.2.0 versions to embed some documents. But, I get the following error when I try to load the model by from biob

Bert embedding layer raises 'ValueError: A target array with shape ' with BiLSTM in keras tensorflow

I've problems integrating Bert Embedding Layer in a BiLSTM model for text classification task. My dataset is in the form where each row has 2 columns: text and

Target Data Missing from tensorflow fit()

So I have a problem when train deep learning with BERT with tensorflow which contain text dataset. So i want to fit() the model but got an error when training.

what's the difference between "self-attention mechanism" and "full-connection" layer?

I am confused with these two structures. In theory, the output of them are all connected to their input. what magic make 'self-attention mechanism' is more powe

Pretraining a language model on a small custom corpus

I was curious if it is possible to use transfer learning in text generation, and re-train/pre-train it on a specific kind of text. For example, having a pre

How to apply max_length to truncate the token sequence from the left in a HuggingFace tokenizer?

In the HuggingFace tokenizer, applying the max_length argument specifies the length of the tokenized text. I believe it truncates the sequence to max_length-2 (

Why does BERT Model fail to find an option that matches my input positional arguments?

While attempting an NLP exercise, I tried to make use of BERT architecture to get a good training model. So I defined a function that builds and compiles the mo

huggingface transformers convert logit scores to probability

I'm a beginner to this field and am stuck. I am following this tutorial (https://towardsdatascience.com/multi-label-multi-class-text-classification-with-bert-tr

Does Fine-tunning Bert Model in multiple times with different dataset make it more accuracy?

i'm totally new in NLP and Bert Model. What im trying to do right now is Sentiment Analysis on Twitter Trending Hashtag ("neg", "neu", "pos") by using DistilBer

why do pooler use tanh as a activation func in bert, rather than gelu?

class BERTPooler(nn.Module): def init(self, config): super(BERTPooler, self).init() self.dense = nn.Linear(config.hidden_size, config.hidden_size) self.activati