Tweepy Cursor not reaching its limit
I am coding a Twitter bot that joins giveaways posted by users I follow.
The problem is that when I use a `for` loop to iterate over an `ItemIterator` `Cursor` of 50 items, it stops before finishing. It usually does 20 or 39-40 iterations.
My main function is:
```python
import random
from time import sleep

from funciones import *
from config import *

api = login(user)
i = 0
while 1 > i:
    tweets = get_tweets(api, 50, True, None, None)
    file = start_stats()
    for tweet in tweets:
        try:
            i = i + 1
            tweet = is_RT(tweet)
            show(tweet)
            check(api, tweet, file)
            print(f'{i}) 1.5 - 2m tweets cd')
            sleep(random.randrange(40, 60, 1))
        except Exception as e:
            print(str(e))
            st.append(e)
    print('15-20 min cooldown')
    sleep(random.randrange(900, 1200, 1))
```
So when the loop stops early (it usually does 39 iterations), the code jumps into the 15-minute cooldown, and `tweets` has these values:

```python
len(tweets.current_page) - 1  # Out[251]: 19
tweets.page_index             # Out[252]: 19
tweets.limit                  # Out[253]: 50
tweets.num_tweets             # Out[254]: 20
```
I've looked at this method in Tweepy's `cursor.py`, but I still don't know how to fix it:
```python
def next(self):
    if self.limit > 0:
        if self.num_tweets == self.limit:
            raise StopIteration
    if self.current_page is None or self.page_index == len(self.current_page) - 1:
        # Reached end of current page, get the next page...
        self.current_page = self.page_iterator.next()
        self.page_index = -1
    self.page_index += 1
    self.num_tweets += 1
    return self.current_page[self.page_index]
```
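To see how this logic can stop short of `limit`, here is a minimal pure-Python re-creation of the `next` method above. It is a sketch, not Tweepy itself: `FakePageIterator` and `ItemIteratorSketch` are hypothetical names, and the page iterator is assumed to raise `StopIteration` as soon as it fetches an empty page, which then ends the whole item iteration.

```python
class FakePageIterator:
    """Hypothetical stand-in for Tweepy's page iterator: returns pre-built pages
    and raises StopIteration as soon as a page comes back empty."""

    def __init__(self, pages):
        self._pages = iter(pages)

    def next(self):
        page = next(self._pages, [])  # running out of pages behaves like an empty page
        if not page:
            raise StopIteration
        return page


class ItemIteratorSketch:
    """Re-creation of the ItemIterator.next logic quoted above."""

    def __init__(self, page_iterator, limit=0):
        self.page_iterator = page_iterator
        self.limit = limit
        self.current_page = None
        self.page_index = -1
        self.num_tweets = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.limit > 0 and self.num_tweets == self.limit:
            raise StopIteration
        if self.current_page is None or self.page_index == len(self.current_page) - 1:
            # End of the current page: fetch the next one. If the page iterator
            # raises StopIteration here, the item iteration ends too.
            self.current_page = self.page_iterator.next()
            self.page_index = -1
        self.page_index += 1
        self.num_tweets += 1
        return self.current_page[self.page_index]


# limit is 50, but the second page comes back empty
items = ItemIteratorSketch(FakePageIterator([list(range(20)), []]), limit=50)
collected = list(items)
print(len(collected), items.num_tweets, items.page_index, len(items.current_page) - 1)
# prints: 20 20 19 19
```

The printed `num_tweets`, `page_index` and `len(current_page) - 1` match the values in the question: iteration ended at a page boundary because the next page was empty, not because the limit of 50 was reached.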
The function I use in my main function to get the cursor is this:
```python
import tweepy as tp

def get_tweets(api, count=1, cursor=False, user=None, id=None):
    # A single tweet by ID
    if id is not None:
        tweets = api.get_status(id=id, tweet_mode='extended')
        return tweets
    if cursor:
        # Cursor paginates lazily; .items(count) yields at most `count` tweets
        if user is not None:
            if count > 0:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items()
        else:
            if count > 0:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items()
    else:
        # A single request, without pagination
        if user is not None:
            tweets = api.user_timeline(screen_name=user, count=count, tweet_mode='extended')
        else:
            tweets = api.home_timeline(count=count, tweet_mode='extended')
    return tweets
```
When I've tried test code like this:

```python
j = 0
tweets = get_tweets(api, 50, True)
for i in tweets:
    j = j + 1
    print(j)
```

both `j` and `tweets.num_tweets` are almost always 50. I think that when they aren't 50, it's because I don't wait between requests, since I've reached `j=300` with this. So maybe the problem is in the `check` function:
(This is an earlier version of `check` that has the same problem; I noticed it when I started collecting stats. The only difference is that the current version returns values indicating whether the tweet has been liked, retweeted, etc.)
```python
def check(tweet):
    if (bool(is_seen(tweet))
            + bool(age_check(tweet, 3))
            + bool(ignore_check(tweet)) == 0):
        rt_check(tweet)
        like_check(tweet)
        follow_check(tweet)
        tag_n_drop_check(tweet)
        quoted_check(tweet)
```
This is the first time I've asked for help, so I don't know if I've posted all the information needed. This has been driving me mad since last week and I don't know who to ask :(
Thanks in advance!
Solution 1:[1]
The `IdIterator` that `Cursor` returns when used with `API.home_timeline` stops when it receives a page with no results. This is most likely what's happening, since the default `count` for the endpoint is 20 and:

> The value of count is best thought of as a limit to the number of tweets to return because suspended or deleted content is removed after the count has been applied.
This is a limitation of this Twitter API endpoint, as there's not another good way to determine when to stop paginating.
However, you can pass a higher `count` (e.g. 100 if that works for you, up to 200) to the endpoint while using `Cursor` with it, and you'll be less likely to receive a premature empty page.
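The effect of a larger `count` can be illustrated without the Twitter API. This is a toy model under stated assumptions, not Tweepy's `IdIterator`: `fetch_page` and `collect` are hypothetical names, the tweet IDs and the run of removed tweets are made-up data, and `fetch_page` mimics the documented behaviour of taking `count` tweets first and removing deleted or suspended ones afterwards. The paginator moves backwards with a `max_id`-style cursor and treats an empty page as the end of the timeline.

```python
def fetch_page(timeline, deleted, max_id, count):
    """Hypothetical home_timeline stand-in: take `count` tweets with ID <= max_id,
    THEN drop removed ones, so a page can come back short or even empty."""
    window = [t for t in timeline if max_id is None or t <= max_id][:count]
    return [t for t in window if t not in deleted]


def collect(timeline, deleted, limit, count):
    """Paginate newest-first; stop at `limit` or on the first empty page."""
    items, max_id = [], None
    while len(items) < limit:
        page = fetch_page(timeline, deleted, max_id, count)
        if not page:  # empty page: treated as the end of the timeline
            break
        items.extend(page[: limit - len(items)])
        max_id = page[-1] - 1  # next request asks only for strictly older tweets
    return items


timeline = list(range(1000, 0, -1))  # made-up tweet IDs, newest first
deleted = set(range(956, 981))        # 25 consecutive removed tweets (IDs 956..980)

print(len(collect(timeline, deleted, limit=50, count=20)))   # 20: second page is empty
print(len(collect(timeline, deleted, limit=50, count=200)))  # 50
```

With a page size of 20, the second window consists entirely of removed tweets, so pagination ends at 20 items; with 200, the same gap only shortens the first page. In the question's `get_tweets`, the corresponding change would be passing `count` through the cursor, e.g. `tp.Cursor(api.home_timeline, count=200, tweet_mode='extended').items(50)`, since `Cursor` forwards extra keyword arguments to the method it wraps. A larger page makes a premature empty page less likely but cannot rule it out.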
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Harmon758 |