Category "nlp"

Asking GPT-2 to finish a sentence with Hugging Face transformers

I am currently generating text from left context using the example script run_generation.py of the huggingface transformers library with gpt-2: $ python transf

Which algorithm does Google Keyboard use for automatic suggestions (personal vocab included)?

I am confused, since Google cannot train their text generation models with each individual's personal vocabulary. I was trying to develop something similar but

How to train a model in SageMaker Studio with .train and .test extension dataset files?

I'm trying to implement ML models with Amazon SageMaker Studio. The model that I want to implement is from Hugging Face and it uses a Dataset

Counting number of co-occurrences of words for a specified vocabulary and within a specified radius?

I have a vocabulary V = ["anarchism", "originated", "term", "abuse"], and list of words test = ['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'fi

How to extract only English words from a big text corpus using NLTK?

I want to remove all non-dictionary English words from a text corpus. I have removed stopwords, tokenized, and count-vectorized the data. I need to extract only the E

Multilingual NLTK for POS Tagging and Lemmatizer

I recently started with NLP and tried to use NLTK and TextBlob for analyzing texts. I would like to develop an app that analyzes reviews made by traveler

NLTK agreement with distance metric

I have a task to calculate inter-annotator agreement in multi-label classification, where for each example more than one label can be assigned. I found that NLT

HTTP error 403 in Python 3 when web scraping publications

This is the traceback of the error that occurs when I pass in the URL of the publication. It works for regular websites such as Stack Overflo

Issues running a Keras model with custom layers

I am currently working on my bachelor's thesis at FIIT STU, the primary goal of which is to attempt to replicate and verify the results of the following study.

Why does the BERT model fail to find an option that matches my input positional arguments?

While attempting an NLP exercise, I tried to make use of the BERT architecture to get a good training model. So I defined a function that builds and compiles the mo

Fine-Tuning DistilBertForSequenceClassification: Is not learning, why is loss not changing? Weights not updated?

I am relatively new to PyTorch and Huggingface-transformers and experimented with DistilBertForSequenceClassification on this Kaggle-Dataset. from transformers

Does the IOB tagging method for Named Entity Recognition (NER) have any advantage in terms of model accuracy or computational time?

Can we do NER without the IOB tags and with only the entities as labels? I am specifically working on token classification for visual documents like receipts. F

Using topic modelling or another NLP approach, is it possible to define words that go into topics/categories for a better-defined topic model?

I have a problem where I am using topic modelling and considering LDA & LSA approaches; however, I have found that some of the topics are not bein

How to split a Thai sentence, which does not use spaces, into words?

How do I split a Thai sentence into words? In English we can split words by spaces. Example: I go to school, split = ['I', 'go', 'to', 'school'] Split by looking only s

Does fine-tuning a BERT model multiple times with different datasets make it more accurate?

I'm totally new to NLP and the BERT model. What I'm trying to do right now is sentiment analysis on Twitter trending hashtags ("neg", "neu", "pos") using DistilBer

TensorFlow seq2seq: keeping a maximum of three checkpoints is not working

I am writing a seq2seq and would like to keep only three checkpoints; I thought I was implementing this with: checkpoint_dir = './training_checkpoints' checkpoi

Why does the pooler use tanh as the activation function in BERT, rather than GELU?

class BERTPooler(nn.Module): def __init__(self, config): super(BERTPooler, self).__init__() self.dense = nn.Linear(config.hidden_size, config.hidden_size) self.activati

TensorFlow's seq2seq: tensorflow.python.framework.errors_impl.InvalidArgumentError

I am following quite closely the Seq2seq for translation tutorial here https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt#define_the_optimizer_and

KALDI: steps/make_mfcc.sh: no such file conf/mfcc.conf

I am very new to Kaldi, so this is probably my own mistake; any help is very much appreciated. I am working with my own dataset. I have cloned the voxforge setup and use