Category "word-embedding"

Calculating Similarity Between Pairs of Documents in R [closed]

How can I calculate the cosine semantic similarity between pairs of documents in R? Specifically, I have the plots (i.e., descriptions) of…
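For reference, the computation itself is small: turn each plot description into a vector (TF-IDF is the simplest choice) and take pairwise cosine similarities. A minimal Python sketch with scikit-learn; the question asks about R, but the same two steps translate directly. The example descriptions are hypothetical placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "A retired assassin is pulled back for one last job.",
    "An ex-hitman comes out of retirement to settle a score.",
    "Two friends open a bakery in a small coastal town.",
]

tfidf = TfidfVectorizer().fit_transform(docs)  # (n_docs, n_terms) sparse matrix
sims = cosine_similarity(tfidf)                # (n_docs, n_docs) similarity matrix
print(sims.round(2))  # diagonal is 1.0; docs 0 and 1 score highest off-diagonal
```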

How are the TokenEmbeddings in BERT created?

In the paper describing BERT, there is this paragraph about WordPiece embeddings: "We use WordPiece embeddings (Wu et al., 2016) with a 30,000 token vocabulary…"
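For context, WordPiece splits rare words into subword units, and each unit's id indexes a row of a learned embedding matrix that is trained along with the rest of the model. A short sketch with the Hugging Face `transformers` library (not part of the paper; used here only to make the mechanism visible):

```python
from transformers import BertTokenizer, BertModel

tok = BertTokenizer.from_pretrained("bert-base-uncased")
print(tok.vocab_size)              # 30522 ~ the paper's "30,000 token vocabulary"
print(tok.tokenize("embeddings"))  # ['em', '##bed', '##ding', '##s']

model = BertModel.from_pretrained("bert-base-uncased")
ids = tok("embeddings", return_tensors="pt")["input_ids"]
vectors = model.embeddings.word_embeddings(ids)  # lookup into the learned matrix
print(vectors.shape)               # (1, seq_len, 768)
```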

How to store Bag of Words or Embeddings in a Database

I would like to store vector features, such as bag-of-words or word-embedding vectors, for a large number of texts in a SQL database. What are the…
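One common pattern, sketched below with the standard library's sqlite3 (an assumption; the same BLOB round-trip works in any SQL database): serialize each vector to raw bytes and store it alongside its dimensionality.

```python
import sqlite3
import numpy as np

con = sqlite3.connect("vectors.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS doc_vectors (doc_id TEXT PRIMARY KEY, dim INTEGER, vec BLOB)")

vec = np.random.rand(300).astype(np.float32)  # e.g. a 300-d embedding
con.execute("INSERT OR REPLACE INTO doc_vectors VALUES (?, ?, ?)",
            ("doc-1", vec.shape[0], vec.tobytes()))
con.commit()

dim, blob = con.execute(
    "SELECT dim, vec FROM doc_vectors WHERE doc_id = ?", ("doc-1",)).fetchone()
restored = np.frombuffer(blob, dtype=np.float32)  # round-trips exactly
assert restored.shape == (dim,)
```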

ReadError: file could not be opened successfully, but I am not sure where the tar file is stored, so I cannot resolve it

I am using biobert-embeddings==0.1.2 and torch==1.2.0 to embed some documents, but I get the following error when I try to load the model with from biob…
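A guess at the usual cause: the package's model archive was only partially downloaded, so tarfile refuses to open it. A hedged sketch that hunts for candidate archives under the home directory (the actual cache location is an assumption; the package may store it elsewhere) and flags corrupt ones. Deleting the bad file and re-running usually triggers a fresh download.

```python
import tarfile
from pathlib import Path

# Scan for tar archives and flag any that cannot be read end to end.
for path in Path.home().rglob("*.tar*"):
    try:
        with tarfile.open(path) as archive:
            archive.getmembers()  # forces a full read of the archive index
    except (tarfile.ReadError, EOFError):
        print(f"corrupt archive: {path}")
```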

BERT embedding layer raises 'ValueError: A target array with shape …' with BiLSTM in Keras/TensorFlow

I'm having problems integrating a BERT embedding layer into a BiLSTM model for a text classification task. My dataset is in a form where each row has 2 columns: text and…
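That error typically means the model's output shape and the label array disagree, e.g. integer labels paired with categorical_crossentropy, or return_sequences=True left on the final LSTM so the output stays 3-D. A minimal shape-consistent sketch (all sizes are assumptions), with precomputed BERT token embeddings feeding a BiLSTM classifier:

```python
import numpy as np
import tensorflow as tf

SEQ_LEN, HIDDEN, N_CLASSES = 128, 768, 2

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, HIDDEN)),                 # BERT token embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)), # no return_sequences -> (batch, 128)
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),  # (batch, n_classes)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels, shape (batch,)
              metrics=["accuracy"])

x = np.random.rand(8, SEQ_LEN, HIDDEN).astype("float32")
y = np.random.randint(0, N_CLASSES, size=(8,))
model.fit(x, y, epochs=1, verbose=0)
```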

TensorFlow 2.x Keras Embedding layer error when processing a tf.data.Dataset

This question is a follow-up to "tensorflow 2 TextVectorization process tensor and dataset error". I would like to do a word embedding for the processed text…
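For reference, the usual order of operations, sketched under the assumption of TF >= 2.6 (where TextVectorization is a stable Keras layer): adapt the vectorizer on the raw text first, map it over the dataset, then embed the resulting integer ids.

```python
import tensorflow as tf

texts = tf.data.Dataset.from_tensor_slices(
    ["the cat sat on the mat", "dogs bark at the moon"])

vectorize = tf.keras.layers.TextVectorization(
    max_tokens=1000, output_sequence_length=8)
vectorize.adapt(texts.batch(32))        # build the vocabulary before mapping

ids = texts.batch(2).map(vectorize)     # dataset of (batch, 8) integer tensors
embed = tf.keras.layers.Embedding(input_dim=1000, output_dim=16)

for batch in ids.take(1):
    print(embed(batch).shape)           # (2, 8, 16)
```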

Read GloVe pre-trained embeddings into R, as a matrix

Working in R. I know the pre-trained GloVe embeddings (e.g., "glove.6B.50d.txt") can be found here: https://nlp.stanford.edu/projects/glove/. However, I've had…
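The file itself is plain text, one "word v1 ... v50" line per row, so the parsing is the same whichever language reads it. A minimal Python sketch of the idea:

```python
import numpy as np

words, rows = [], []
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])
        rows.append(np.asarray(parts[1:], dtype=np.float32))

emb = np.vstack(rows)                  # (400000, 50) matrix for glove.6B
index = {w: i for i, w in enumerate(words)}
print(emb[index["beer"]][:5])          # first 5 dimensions of one row
```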

How to calculate similarity for pre-trained word embeddings

I want to find the most similar words to a given word from pretrained embedding vectors in R, e.g., words similar to "beer". For this, I downloaded the pretrained emb…
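For comparison, gensim can answer this in a few lines (an assumption, since the question works from the raw file in R); gensim >= 4.0 reads the header-less GloVe text format directly via no_header=True:

```python
from gensim.models import KeyedVectors

# Nearest neighbors by cosine similarity over the pretrained vectors.
kv = KeyedVectors.load_word2vec_format(
    "glove.6B.50d.txt", binary=False, no_header=True)
print(kv.most_similar("beer", topn=5))   # e.g. [('beers', ...), ('wine', ...), ...]
```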

Ensure that gensim generates the same Word2Vec model across different runs on the same data

An LDA model generates different topics every time I train on the same corpus; by setting np.random.seed(0), the LDA model will always be initialized and tr…
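For Word2Vec itself, the standard recipe is a fixed seed plus a single worker thread, since multi-threaded training makes the update order nondeterministic. A sketch against gensim 4.x (where the old size parameter became vector_size); across separate interpreter runs, the PYTHONHASHSEED environment variable may also need pinning because vocabulary hashing uses Python's string hash:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["the", "dog", "barked"]]

def train():
    # seed fixes the RNG; workers=1 removes thread-order nondeterminism
    return Word2Vec(sentences, vector_size=50, min_count=1,
                    seed=42, workers=1, epochs=10)

a, b = train(), train()
assert (a.wv["cat"] == b.wv["cat"]).all()   # identical within the same process
```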