Flexible word count in PyTorch Embedding
The Embedding class in PyTorch takes num_embeddings as a parameter. According to the docs, num_embeddings is the "size of the dictionary of embeddings". I am curious about the following two cases when creating an embedding object:
- The num_embeddings, or word count in the database, is unknown before we create the embedding.
- The num_embeddings, or word count, is flexible. For example, initially I create an embedding with num_embeddings = 1000. Later, new elements are added in; say I have 10 new words on top of the existing 1000 words. How do I modify the existing embedding (keeping embedding_dim the same) to adapt to the change? (A minimal sketch of this scenario follows the list.)
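For concreteness, here is a minimal sketch of the second scenario; the vocabulary size of 1000 and embedding_dim of 64 are only illustrative values, not taken from the original question:

import torch
import torch.nn as nn

# initial vocabulary of 1000 words, each mapped to a 64-dimensional vector
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=64)

# indices 0..999 are valid
print(embedding(torch.LongTensor([0, 999])).shape)  # torch.Size([2, 64])

# a new word assigned index 1000 is out of range and raises an IndexError
# embedding(torch.LongTensor([1000]))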
Solution 1:
I don't think you can change num_embeddings after initializing the embedding. However, you could concatenate new rows onto the embedding weights to form a new weight tensor that extends the vocabulary.
I hope this example helps:
import torch
import torch.nn as nn

# original embedding: valid input indices are in the interval [0, 4];
# anything outside that range raises an error
embedding = nn.Embedding(num_embeddings=5, embedding_dim=3)

# append one randomly initialized row to the weight matrix;
# valid input indices are now in the interval [0, 5]
embedding.weight = nn.Parameter(torch.cat((embedding.weight, torch.randn(1, 3))))

input = torch.LongTensor([[1, 5], [4, 3]])
output = embedding(input)  # shape: (2, 2, 3)
Similarly, you could add more than one new vocabulary entry at once by changing n in torch.randn(n, 3).
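As a sketch of that idea, the extension can be wrapped in a small helper; the function name expand_embedding and the random initialization of the new rows are my own choices, not part of the original answer:

import torch
import torch.nn as nn

def expand_embedding(embedding: nn.Embedding, n_new: int) -> nn.Embedding:
    """Return a new nn.Embedding with n_new extra rows, keeping the old weights."""
    old_weight = embedding.weight.data
    new_rows = torch.randn(n_new, embedding.embedding_dim)
    new_embedding = nn.Embedding(embedding.num_embeddings + n_new, embedding.embedding_dim)
    new_embedding.weight = nn.Parameter(torch.cat((old_weight, new_rows)))
    return new_embedding

# example: grow a 1000-word embedding to hold 10 additional words
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=64)
embedding = expand_embedding(embedding, 10)
print(embedding.num_embeddings)                       # 1010
print(embedding(torch.LongTensor([1005])).shape)      # torch.Size([1, 64])

Creating a fresh module this way keeps the num_embeddings attribute consistent with the new weight shape, which the in-place weight assignment above does not.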
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user18842383 |