'NLP textEmbed function
I am trying to run the textEmbed function in R. Set up needed:
require(quanteda)
require(quanteda.textstats)
require(udpipe)
require(reticulate)
#udpipe_download_model(language = "english")
ud_eng <- udpipe_load_model(here::here('english-ewt-ud-2.5-191206.udpipe'))
virtualenv_list()
reticulate::import('torch')
reticulate::import('numpy')
reticulate::import('transformers')
reticulate::import('nltk')
reticulate::import('tokenizers')
require(text)
It runs the following code
tmp1 <- textEmbed(x = 'sofa help',
model = 'roberta-base',
layers = 11)
tmp1$x
However, it does not run the following code
tmp1 <- textEmbed(x = 'sofa help',
model = 'roberta-base',
layers = 11)
tmp1$x
It gives me the following error
Error in x[[1]] : subscript out of bounds
In addition: Warning message:
Unknown or uninitialised column: `words`.
Any suggestions would be highly appreciated
Solution 1:[1]
I believe that this error has been fixed with a newer version of the text
-package (version .9.50 and above).
(I cannot see any difference in the two code parts – but I think that this error is related to only submitting one token/word to textEmbed, which now works).
Also, see updated instructions for how to install the text
-package http://r-text.org/articles/Extended_Installation_Guide.html
library(text)
library(reticulate)
# Install text required python packages in a conda environment (with defaults).
text::textrpp_install()
# Show available conda environments.
reticulate::conda_list()
# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R.
text::textrpp_initialize(save_profile = TRUE)
# Test so that the text package work.
textEmbed("hello")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Gorp |