LDA returning numbers instead of words from Term Document Matrix

I am trying to use the LDA function to evaluate a corpus of text in R. However, when I do so, the output seems to use the row names of the observations rather than the actual words in the corpus. I can't find anything else about this online, so I imagine I must be doing something very basic incorrectly.

library(tm)
library(SnowballC)
library(tidytext)
library(stringr)
library(tidyr)
library(topicmodels)
library(dplyr)

# Read in the data (column 1 = ID, column 2 = text)
data <- read.csv('CSV_format_data.csv', sep = ',')

# Create the corpus and the term-document matrix
interviews <- as.matrix(data[, 2])
ints.corpus <- Corpus(VectorSource(interviews))
ints.dtm <- TermDocumentMatrix(ints.corpus)

# Fit a 4-topic LDA model and extract the per-topic term probabilities
chapters_lda <- LDA(ints.dtm, k = 4, control = list(seed = 5421685))
chapters_lda_td <- tidy(chapters_lda, matrix = "beta")
chapters_lda_td

head(ints.dtm$dimnames$Terms)

Printing 'chapters_lda_td' outputs:

# A tibble: 4,084 x 3
   topic term        beta
   <int> <chr>      <dbl>
 1     1 1     0.000555  
 2     2 1     0.00399   
 3     3 1     0.000614  
 4     4 1     0.000699  
 5     1 2     0.0000195 
 6     2 2     0.000708  
 7     3 2     0.000731  
 8     4 2     0.00000155
 9     1 3     0.000974  
10     2 3     0.0000363 
# ... with 4,074 more rows

Note that the "term" column contains numbers where it should contain words. The number of rows equals the number of documents times the number of topics, when it should equal the number of terms times the number of topics. I ran 'head(ints.dtm$dimnames$Terms)' to check that the DTM actually contains words, which it does. The result is:

[1] "aaye"      "able"      "adjust"    "admission" "after"     "age" 

The data file itself is a pretty standard two-column CSV file with an ID and a block of text, and hasn't given me any problem while doing other text-mining stuff with it and the tm package. Any help would be appreciated, thank you!



Solution 1:[1]

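I figured it out! The problem is that I was creating the matrix with

ints.dtm <- TermDocumentMatrix(ints.corpus)

rather than

ints.dtm <- DocumentTermMatrix(ints.corpus)

LDA expects documents in the rows and terms in the columns, which is the layout of a DocumentTermMatrix. A TermDocumentMatrix is the transpose of that, so LDA treated each term as a document and each document ID as a term. That is why the "term" column showed numbers and why the row count matched the number of documents.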
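To make the difference concrete, here is a minimal sketch on a toy corpus. The four sentences and the `toy.*` object names are made up for illustration and are not the original interview data. It compares the dimensions of the two matrix types and then fits LDA on the DocumentTermMatrix, so the "term" column should contain words.

```r
library(tm)
library(topicmodels)
library(tidytext)

# Hypothetical toy documents standing in for the interview text
docs <- c("students adjust after admission",
          "age matters for admission",
          "able students adjust quickly",
          "admission depends on age")
toy.corpus <- Corpus(VectorSource(docs))

toy.tdm <- TermDocumentMatrix(toy.corpus)  # rows = terms, columns = documents
toy.dtm <- DocumentTermMatrix(toy.corpus)  # rows = documents, columns = terms

# The two matrices are transposes of each other
dim(toy.tdm)
dim(toy.dtm)

# LDA reads rows as documents and columns as terms, so pass the DTM
toy_lda <- LDA(toy.dtm, k = 2, control = list(seed = 5421685))
toy_td <- tidy(toy_lda, matrix = "beta")
head(toy_td)
```

If you already have a TermDocumentMatrix and don't want to rebuild it, tm's t() method for these objects should give you the DocumentTermMatrix orientation, though rebuilding with DocumentTermMatrix() as above is the more direct fix.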

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 NickCHK