Category "nlp"

How to solve missing words in nltk.corpus.words.words()?

I have tried to remove non-English words from a text. The problem is that many other words are absent from the NLTK words corpus. My code: import pandas as pd lst = ['

Process and progress for natural language analysis of company communication?

Assume there is a large record of all different kinds of inter-employee and customer communications (e.g. emails, chat transcripts, OCRed letters) which should b

How can I use "NER" for German Language with stanford-corenlp?

I am trying to use NLP for the German language, but it does not work! I built the pipeline and then ran NER to find the entity of each element in a sentence, which is

'CRF' object has no attribute 'keep_tempfiles'

I have imported: from itertools import chain import nltk import sklearn import scipy.stats import sklearn_crfsuite from sklearn_crfsuite import scorers,CR

Tokenization of Compound Words not Working in Quanteda

I'm trying to create a dataframe containing specific keywords-in-context using the kwic() function, but unfortunately, I'm running into an error when attempti

How are the TokenEmbeddings in BERT created?

In the paper describing BERT, there is this paragraph about WordPiece Embeddings. We use WordPiece embeddings (Wu et al., 2016) with a 30,000 token vocab
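As context for this question: BERT's reference tokenizer splits each word into WordPiece units with a greedy longest-match-first search, marking word-internal pieces with a `##` prefix, and the token embedding is then simply the row of a learned lookup table indexed by each piece's vocabulary id (the paper sums this with segment and position embeddings). The sketch below illustrates only the split-and-lookup step. The vocabulary, the embedding values, and the helper names `wordpiece_tokenize` and `token_embeddings` are hypothetical toy stand-ins, not BERT's actual data or code.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece units."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until it is in the vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # word-internal pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split exists: the whole word maps to [UNK]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; BERT's real vocabulary has about 30,000 entries.
vocab = {"[UNK]": 0, "play": 1, "##ing": 2, "##ed": 3, "un": 4, "##aff": 5, "##able": 6}

# Hypothetical toy embedding table: one learned row per vocabulary id.
embedding_table = [[float(i), float(i) * 0.5] for i in range(len(vocab))]


def token_embeddings(word):
    """Look up one embedding row per WordPiece id."""
    ids = [vocab[p] for p in wordpiece_tokenize(word, vocab)]
    return [embedding_table[i] for i in ids]


print(wordpiece_tokenize("unaffable", vocab))
print(token_embeddings("playing"))
```

With this toy vocabulary, "unaffable" splits into `un`, `##aff`, `##able`, and a word with no valid split becomes a single `[UNK]`.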

How do I know the order of the classes in a CatBoost classifier weights?

This is a pretty dumb question, but I couldn't find the answer anywhere, so I will take my chances here... I'm building a classifier using CatBoost. Since this is an NLP

TypeError: "hypothesis" expects pre-tokenized hypothesis (Iterable[str]):

I am trying to calculate the Meteor score for the following: print (nltk.translate.meteor_score.meteor_score( ["this is an apple", "that is an apple"], "an
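The error message states the requirement: in recent NLTK releases, `meteor_score(references, hypothesis)` expects each reference and the hypothesis as lists of tokens rather than raw strings. A minimal sketch of preparing those inputs, assuming whitespace splitting is good enough (`nltk.word_tokenize` is an alternative if punctuation should become separate tokens). The helper name `to_meteor_inputs` is hypothetical, and because the original hypothesis string is truncated, the example hypothesis below is made up.

```python
def to_meteor_inputs(references, hypothesis):
    """Split raw strings into the token lists that newer NLTK METEOR expects.

    Whitespace splitting is an assumption made for this sketch.
    """
    tokenized_refs = [ref.split() for ref in references]  # Iterable[Iterable[str]]
    tokenized_hyp = hypothesis.split()                     # Iterable[str]
    return tokenized_refs, tokenized_hyp


refs, hyp = to_meteor_inputs(
    ["this is an apple", "that is an apple"],
    "an apple is this",  # hypothetical hypothesis; the question's string is truncated
)
print(refs, hyp)

# With NLTK installed and the WordNet data downloaded, the call would then be:
# from nltk.translate.meteor_score import meteor_score
# print(meteor_score(refs, hyp))
```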

NLP textEmbed function

I am trying to run the textEmbed function in R. Setup needed: require(quanteda) require(quanteda.textstats) require(udpipe) require(reticulate) #udpi

How to Vectorize python function

I have made a resume parser, but to parse my resumes I am using a for loop to run my parse function over each resume. Is there a way to vectorize this approach?
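Parsing free-text documents is not vectorizable in the NumPy sense; `np.vectorize` is essentially a loop wrapper. The usual alternatives are to keep the loop as a `map` or list comprehension, or to run independent documents concurrently. The sketch below uses `concurrent.futures.ThreadPoolExecutor`, which helps mainly when parsing is I/O-bound (reading files, calling services). For CPU-bound parsing, `ProcessPoolExecutor` is the usual swap, called under an `if __name__ == "__main__":` guard. The function `parse_resume` is a hypothetical stand-in for the asker's parser.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_resume(text):
    """Hypothetical stand-in parser: first non-empty line as name, first '@' token as email."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0] if lines else ""
    email = next((tok for tok in text.split() if "@" in tok), None)
    return {"name": name, "email": email}


def parse_all(resumes, workers=4):
    # executor.map returns results in input order, just like the original for loop.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(parse_resume, resumes))


resumes = [
    "Jane Doe\njane@example.com\nPython, SQL",
    "John Smith\nContact: john@example.org",
]
results = parse_all(resumes)
print(results)
```

Because `parse_all` takes the same function the loop used, it is a drop-in replacement that changes only how the calls are scheduled.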

How to store Bag of Words or Embeddings in a Database

I would like to store vector features, like Bag-of-Words or Word-Embedding vectors of a large number of texts, in a dataset, stored in a SQL Database. What're t
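One common layout, sketched here with Python's built-in `sqlite3` as an assumed stand-in for the asker's SQL database: store each dense embedding as a binary BLOB of packed 32-bit floats, and store sparse Bag-of-Words counts as one `(doc_id, term, count)` row per non-zero entry so that terms stay queryable. The table names, column names, and sample data are hypothetical.

```python
import sqlite3
from array import array
from collections import Counter


def to_blob(vector):
    # Pack floats as 32-bit floats ('f') in the machine's native byte order.
    return array("f", vector).tobytes()


def from_blob(blob):
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE text_vectors (doc_id INTEGER PRIMARY KEY, body TEXT, dim INTEGER, vec BLOB)")
conn.execute("CREATE TABLE bow (doc_id INTEGER, term TEXT, count INTEGER, PRIMARY KEY (doc_id, term))")

# Hypothetical documents with hypothetical 3-dimensional embeddings.
docs = {1: ("first text", [0.5, -1.25, 2.0]), 2: ("second text", [0.0, 3.5, -0.75])}

for doc_id, (body, vec) in docs.items():
    # Dense embedding: one row per document, vector stored as a BLOB.
    conn.execute("INSERT INTO text_vectors VALUES (?, ?, ?, ?)", (doc_id, body, len(vec), to_blob(vec)))
    # Sparse Bag-of-Words: one row per (document, term) with a non-zero count.
    for term, count in Counter(body.split()).items():
        conn.execute("INSERT INTO bow VALUES (?, ?, ?)", (doc_id, term, count))
conn.commit()

row = conn.execute("SELECT dim, vec FROM text_vectors WHERE doc_id = ?", (2,)).fetchone()
restored = from_blob(row[1])
docs_with_text = conn.execute("SELECT doc_id FROM bow WHERE term = ? ORDER BY doc_id", ("text",)).fetchall()
print(row[0], restored, docs_with_text)
```

A BLOB keeps storage compact but can only be searched after decoding. The per-term table trades space for the ability to filter documents by term in SQL.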

R: Correct Way to Calculate Cosine Similarity?

I am working with the R programming language. I have the following data: text = structure(list(id = 1:8, reviews = c("I guess the employee decided to buy their
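For reference, whichever R functions are used, cosine similarity between two documents is computed on their vectors over a shared vocabulary (for example, two rows of a document-term matrix) as the dot product divided by the product of the Euclidean norms:

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
```

As a worked example, for a = (1, 2, 0) and b = (2, 1, 1), the dot product is 4, the norms are √5 and √6, and the similarity is 4/√30 ≈ 0.730. For non-negative count vectors the result lies between 0 and 1.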

Error 'power iteration failed to converge within 100 iterations' when I tried to summarize a text document using Python networkx

I got a PowerIterationFailedConvergence: (PowerIterationFailedConvergence(...), 'power iteration failed to converge within 100 iterations') when I tried to summ
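For background on this error, `networkx.pagerank` runs power iteration and raises `PowerIterationFailedConvergence` when the summed absolute change between iterations is still not below `n * tol` after `max_iter` sweeps; both `max_iter` and `tol` are parameters of `nx.pagerank`. A pure-Python sketch of that loop on a TextRank-style sentence-similarity matrix, written only to show where the stopping rule and the error come from. It raises a plain `RuntimeError` instead of the networkx exception, and the matrices are hypothetical. One possible cause of non-convergence worth checking is NaN entries in the similarity matrix, for example from cosine similarity on all-zero vectors.

```python
def pagerank(weights, alpha=0.85, max_iter=100, tol=1.0e-6):
    """Power-iteration PageRank on a dense similarity matrix (TextRank-style sketch)."""
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        prev = rank
        # Mass held by dangling nodes (no similarity to any other sentence) is spread uniformly.
        dangling = sum(prev[j] for j in range(n) if out_sums[j] == 0)
        rank = []
        for i in range(n):
            incoming = sum(prev[j] * weights[j][i] / out_sums[j] for j in range(n) if out_sums[j] > 0)
            rank.append(alpha * (incoming + dangling / n) + (1.0 - alpha) / n)
        # Same stopping rule as networkx: total absolute change below n * tol.
        if sum(abs(rank[i] - prev[i]) for i in range(n)) < n * tol:
            return rank
    raise RuntimeError(f"power iteration failed to converge within {max_iter} iterations")


# Hypothetical symmetric similarity matrix for three sentences.
sim = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
scores = pagerank(sim)
print(scores)

try:
    pagerank(sim, max_iter=1)  # too few sweeps: reproduces the failure mode
except RuntimeError as err:
    print(err)
```

In networkx itself, the corresponding adjustment would be passing a larger `max_iter` (or a looser `tol`) to `nx.pagerank` after ruling out bad values in the graph weights.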

Continual pre-training vs. Fine-tuning a language model with MLM

How to get up and running with spaCy for Vietnamese?

Definition of downstream tasks in NLP

How to fix LDA model coherence score runtime error?

Follow-up question regarding a Keras model issue

Extracting names from a text file using spaCy

How do I remove nonsensical or incomplete words from a corpus?