Category "nlp"

Vectorizing text data from dictionary values in a pickle file

I'm new to NLP, learning it on my own, and working on classification. I have a pickle file with data like this: {'food' : {'f1.txt', 'f2.txt', 'f

R: How can I add titles based on grouping variable in word_associate?

I am using the word_associate package in R Markdown to create word clouds across a grouping variable with multiple categories. I would like the titles of each w

How to generate a sentence around words in Keras?

I know how to generate the next word in Keras with an LSTM, but how do I predict the previous word? For example, if you have two words like "car" and "running", then it sho

I wrote TF-IDF code to analyze an annual report and want to know the importance of specific keywords

import pandas as pd from sklearn.feature_extraction.text import TfidfTransformer from sklearn.feature_extraction.text import TfidfVectorizer import path import

Will NER improve Text Categorization?

I was wondering - if I'm doing text categorization (with SpaCy, using their textcat-multi component for example), will those results improve if an NER component

Text Classification on a custom dataset with spacy v3

I am really struggling to make things work with the new spacy v3 version. The documentation is full. However, I am trying to run a training loop in a script. (I

Add Noise to Background for Voice Separation

I want to implement a voice separation project. I have a voice dataset with no background noise and a noise dataset with sounds such as engine sound, horn sound

How to get TF-IDF value of a word from all set of documents?

I need a TF-IDF value for a word that is found in a number of documents, not only in a single or specific document. For example, consider this corpus c
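
One possible reading of this question is: score the word in every document, then aggregate those scores over the corpus. The sketch below is only an illustration under that assumption. The toy corpus, the function names `tfidf_per_document` and `corpus_tfidf`, the choice of sum and mean as aggregates, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 are all assumptions, not taken from the question.

```python
import math


def tfidf_per_document(term, docs):
    """Return the TF-IDF score of `term` in each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain the term.
    df = sum(1 for doc in docs if term in doc)
    # Smoothed IDF (an assumed variant; other definitions exist).
    idf = math.log((1 + n_docs) / (1 + df)) + 1
    # Term frequency is the relative frequency of the term in the document.
    return [doc.count(term) / len(doc) * idf if doc else 0.0 for doc in docs]


def corpus_tfidf(term, docs):
    """Aggregate the per-document scores into corpus-level values."""
    scores = tfidf_per_document(term, docs)
    return {"sum": sum(scores), "mean": sum(scores) / len(scores)}


# Hypothetical toy corpus, tokenized on whitespace.
corpus = ["the cat sat on the mat", "the dog chased the cat", "birds sing"]
docs = [text.split() for text in corpus]
result = corpus_tfidf("cat", docs)
```

Whether the sum, the mean, or some other aggregate is appropriate depends on what "importance across the corpus" should mean for the task.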

Removing Non-English Words from CSV - NLTK

I am relatively new to Python and NLTK. I have got hold of Flickr data stored in CSV and want to remove non-English words from the tags column. I keep getting er

kwic() function returns fewer rows than it should

I'm currently trying to perform a sentiment analysis on a kwic object, but I'm afraid that the kwic() function does not return all rows it should return. I'm no

Question about the "query, key, value" structure of the Transformer

I'm a beginner at NLP, so I'm trying to reproduce the most basic Transformer code from "Attention Is All You Need". But I got a question while doing it. In the MultiHeadAttention l

Tell `kwic()` to ignore stopwords when situating keywords in context?

I once again have a question about the kwic() function from the quanteda package. I want to extract the five words around a specific keyword (in the example bel

Using a target size (torch.Size([2])) that is different to the input size (torch.Size([2, 5])) is deprecated. Please ensure they have the same size

When I use criterion = nn.BCELoss() for my binary classification task, it causes a problem and prints "Using a target size (torch.Size([2])) that is different

Error while creating a model for binary classification for text classification

code: model = create_model() model.compile(optimize=tf.keras.optimizers.Adam(learning_rate=2e-5), loss=tf.keras.losses.BinaryCrossentropy(),

Continuous Bag of Words

I have a question related to the Continuous Bag of Words model. If I have a vocabulary size of 1000, a window size of 2, and the number of nodes in the hidden la
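
As a rough illustration of the arithmetic behind this kind of question: in a standard CBOW model without bias terms and with a full softmax output, the weights are one input matrix of shape (vocabulary, hidden) and one output matrix of shape (hidden, vocabulary). The window size changes how many context words are averaged, not how many weights exist. The hidden layer size is cut off in the excerpt, so `HIDDEN_SIZE` below is a purely hypothetical value, and both function names are made up for this sketch.

```python
def cbow_parameter_count(vocab_size, hidden_size):
    """Weights in a bias-free CBOW model with a full softmax output layer."""
    input_weights = vocab_size * hidden_size   # input-to-hidden (embeddings)
    output_weights = hidden_size * vocab_size  # hidden-to-output
    return input_weights + output_weights


def context_words_per_example(window_size):
    """Number of context words on both sides of the target word."""
    return 2 * window_size


VOCAB_SIZE = 1000   # from the question
WINDOW_SIZE = 2     # from the question
HIDDEN_SIZE = 100   # hypothetical: the real value is truncated in the excerpt

total = cbow_parameter_count(VOCAB_SIZE, HIDDEN_SIZE)
n_context = context_words_per_example(WINDOW_SIZE)
```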

I want to add numeric columns to my tfidf sparse matrix

I tried to do it with sp.hstack() and with

Looping through each row in array to calculate cosine similarity

I have a subset of a dataframe that looks like: <OUT> PageNumber english_only_tags 175 flower architecture people 162 hair red bobbles
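
A minimal sketch of one way to approach a row-by-row comparison, using plain Python instead of a dataframe so it stays self-contained. It treats each tag string as a bag of words and compares it with a query string. The two rows come from the excerpt; the `query` value and the `cosine_similarity` helper are hypothetical, and what the rows should actually be compared against is not visible in the truncated question.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two whitespace-tokenized bags of words."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    # Dot product over the words the two texts share.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# (PageNumber, english_only_tags) rows taken from the excerpt.
rows = [(175, "flower architecture people"), (162, "hair red bobbles")]
query = "red flower"  # hypothetical comparison text

similarities = {page: cosine_similarity(tags, query) for page, tags in rows}
```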

It looks like the config file at 'bert-base-uncased' is not a valid JSON file?

Working fine for months, then I interrupted a "bert-large-cased" download and the following code returns the error in the title: from transformers import BertMo

ValueError when trying to fit a logistic regression with SentenceTransformer output (embedding)

My code: model = SentenceTransformer('hiiamsid/sentence_similarity_spanish_es') I apply the model to the text column of the data frame prueba['encoder'] = prueb

Is there any way to set a timer or automatically end the serving of infographics in displaCy?

While running the code with displaCy, I see the images being created perfectly as expected. They are also projected to a server, the address of which is mention