NLP textEmbed function

I am trying to run the textEmbed function in R. This is my setup:

  require(quanteda)
  require(quanteda.textstats)
  require(udpipe)
  require(reticulate)

  # Download the English UDPipe model (only needed once)
  #udpipe_download_model(language = "english")

  ud_eng <- udpipe_load_model(here::here('english-ewt-ud-2.5-191206.udpipe'))

  # List the available Python virtual environments
  virtualenv_list()

  # Import the Python modules that the text package relies on
  reticulate::import('torch')
  reticulate::import('numpy')
  reticulate::import('transformers')
  reticulate::import('nltk')
  reticulate::import('tokenizers')
  require(text)
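Before loading text, it can help to confirm that each Python module imported above is actually visible to reticulate in the environment it is bound to. A minimal sketch: `missing_python_modules()` is a hypothetical helper, and the check itself is done by `reticulate::py_module_available()`, which is used as the default but can be swapped for another function (for example, to test without Python):

```r
# Hypothetical helper: return the names of required Python modules that
# cannot be found. By default it asks reticulate::py_module_available(),
# which checks the Python environment reticulate is currently bound to.
missing_python_modules <- function(modules,
                                   available = reticulate::py_module_available) {
  # One TRUE/FALSE per module name
  found <- vapply(modules, available, logical(1))
  # Keep only the modules that were not found
  modules[!found]
}

# Usage (requires reticulate and a configured Python environment):
# missing_python_modules(c("torch", "numpy", "transformers", "nltk", "tokenizers"))
```

An empty result (`character(0)`) means every listed module was found.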

The following code runs:

tmp1 <- textEmbed(x = 'sofa help',
                  model = 'roberta-base',
                  layers = 11)

tmp1$x

However, the following code does not run:

tmp1 <- textEmbed(x = 'sofa help',
                  model = 'roberta-base',
                  layers = 11)

tmp1$x

It gives me the following error:

Error in x[[1]] : subscript out of bounds
In addition: Warning message:
Unknown or uninitialised column: `words`. 

Any suggestions would be highly appreciated.



Solution 1:[1]

I believe this error has been fixed in newer versions of the text package (version 0.9.50 and above).

(I cannot see any difference between the two code snippets, but I think this error is caused by passing only a single token/word to textEmbed, which now works.)
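To check whether the installed text package is recent enough to include this fix, you can compare its version against 0.9.50. A minimal sketch: `text_version_ok()` is a hypothetical helper name, while `system.file()` and `utils::packageVersion()` are base R. The installed-check uses `system.file()` so the package is not loaded (and Python is not initialized) just to read its version:

```r
# Hypothetical helper: TRUE if `pkg` is installed at version >= min_version,
# FALSE if it is older or not installed. 0.9.50 is the version the answer
# says contains the fix.
text_version_ok <- function(min_version = "0.9.50", pkg = "text") {
  # system.file(package = ...) returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  # packageVersion() reads the DESCRIPTION file without attaching the package
  utils::packageVersion(pkg) >= min_version
}

if (!text_version_ok()) {
  message("text is missing or older than 0.9.50; consider updating it ",
          "(e.g. install.packages(\"text\")) before calling textEmbed().")
}
```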

Also, see the updated installation instructions for the text package: http://r-text.org/articles/Extended_Installation_Guide.html


library(text)
library(reticulate)

# Install the Python packages required by text in a conda environment (with default settings).
text::textrpp_install()

# Show available conda environments.
reticulate::conda_list()

# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R. 
text::textrpp_initialize(save_profile = TRUE)

# Test that the text package works.
textEmbed("hello")

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gorp