How to get PRAW (the Python Reddit API Wrapper) to read a submission ID?
Goal: I have collected the details of hundreds of Reddit posts in Excel sheets. Now, I want to collect the comments on these Reddit posts using PRAW.
Method: At first, I simply copied and pasted a post ID (e.g. "nyf6o3") into my Python script.
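import praw
import pandas as pd
# `reddit` is assumed to be an authenticated praw.Reddit instance created beforehand (credentials omitted).
submission = reddit.submission(id="nyf6o3")
# Expand every "load more comments" placeholder so the full comment tree is available.
submission.comments.replace_more(limit=None)
comments_rows = [[comment.subreddit, comment.submission, comment.id, comment.author, comment.score, comment.created, comment.body] for comment in submission.comments.list()]
comments_1 = pd.DataFrame(comments_rows, columns=["Subreddit", "Post ID", "Comment ID", "Author", "Score", "Created", "Body"])
print(comments_1)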
This script works, but it is quite a hassle. So I got Python to read my Excel sheet and assign a variable name to each of the post IDs (e.g., "post1", "post2", etc.).
import pandas as pd
file_loc = r"C:\Users\Someone\Downloads\Datasets\Airbnb_hosts.xlsx"
df = pd.read_excel(file_loc, sheet_name=None, header=0, usecols="B")
post1=df["airbnb_hosts"].values[0]
post2=df["airbnb_hosts"].values[1]
post3=df["airbnb_hosts"].values[2]
post4=df["airbnb_hosts"].values[3]
post5=df["airbnb_hosts"].values[4]
post6=df["airbnb_hosts"].values[5]
post7=df["airbnb_hosts"].values[6]
# ETCETERA...
Problem: Unfortunately, "post1" does not get read as a post ID and now the comment-obtaining piece of code doesn't work anymore.
import praw
submission = reddit.submission(id=post1)
submission.comments.replace_more(limit=None)
comments_rows = [[comment.subreddit, comment.submission, comment.id, comment.author, comment.score, comment.created, comment.body] for comment in submission.comments.list()]
comments_1 = pd.DataFrame(comments_rows, columns=["Subreddit", "Post ID", "Comment ID", "Author", "Score", "Created", "Body"])
print(comments_1)
Do you have any ideas on what I could try? I really want to avoid having to manually copy and paste post IDs into a script.
Solution 1:[1]
What is the error that you get?
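One likely cause, judging only from the `read_excel` call in the question: with `sheet_name=None`, pandas returns a dict of DataFrames keyed by sheet name, so `df["airbnb_hosts"]` is a whole DataFrame and `.values[0]` is a one-element row array rather than a string. The sketch below reproduces that shape with an in-memory DataFrame instead of the real workbook; the column header `"Post ID"` and the second ID are made up for the demo.

```python
import pandas as pd

# Stand-in for pd.read_excel(file_loc, sheet_name=None, header=0, usecols="B"):
# sheet_name=None returns a dict that maps each sheet name to a DataFrame.
# The header "Post ID" and the ID "abc123" are hypothetical demo values.
sheets = {"airbnb_hosts": pd.DataFrame({"Post ID": ["nyf6o3", "abc123"]})}

# What the question does: .values on a DataFrame is 2-D, so [0] is a whole row.
row = sheets["airbnb_hosts"].values[0]
print(type(row).__name__, row.shape)  # a one-element ndarray, not a str

# Selecting the single column first, then converting, yields plain strings.
post_ids = (
    sheets["airbnb_hosts"]
    .iloc[:, 0]        # first (and only) column that was read
    .dropna()          # skip empty cells
    .astype(str)
    .str.strip()       # remove stray whitespace around the IDs
    .tolist()
)
post1 = post_ids[0]
print(type(post1).__name__, post1)
```

If that is what is happening, passing `post1` from the second half to `reddit.submission(id=post1)` hands PRAW an ordinary string, just like the pasted `"nyf6o3"`.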
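import pandas as pd
# Read the first sheet into a single DataFrame (sheet_name is left at its default).
sheet = pd.read_excel(r"Airbnb_hosts.xlsx")
# Keep one named column; "Author" is the header used in this example.
df = pd.DataFrame(sheet, columns=["Author"])
# Indexing a single column returns scalar cell values rather than row arrays.
post1 = df["Author"].values[0]
post2 = df["Author"].values[1]
print(post1)
print(post2)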
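Once the IDs are plain strings, the numbered variables (`post1`, `post2`, ...) are not really needed; a loop over a list of IDs can do the same work. Here is a minimal sketch of that idea. The helper name `collect_comment_rows` is hypothetical, `reddit` is assumed to be an already authenticated `praw.Reddit` instance, and `created_utc` is used for the timestamp in place of the question's `created`.

```python
def collect_comment_rows(reddit, post_ids):
    """Return one row per comment for every submission ID in post_ids.

    `reddit` is assumed to be an authenticated praw.Reddit instance;
    `post_ids` is any iterable of submission IDs such as "nyf6o3".
    """
    rows = []
    for post_id in post_ids:
        clean_id = str(post_id).strip()  # guard against whitespace from Excel
        submission = reddit.submission(id=clean_id)
        # Expand every "load more comments" placeholder before flattening.
        submission.comments.replace_more(limit=None)
        for comment in submission.comments.list():
            rows.append([
                str(comment.subreddit),
                clean_id,
                comment.id,
                str(comment.author),  # "None" when the author was deleted
                comment.score,
                comment.created_utc,
                comment.body,
            ])
    return rows
```

The result can go straight into one table for all posts, for example `pd.DataFrame(collect_comment_rows(reddit, post_ids), columns=["Subreddit", "Post ID", "Comment ID", "Author", "Score", "Created", "Body"])`, where `post_ids` is the list of ID strings read from the sheet.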
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Khan |