Tweepy Cursor not reaching its limit

I am coding a Twitter bot which joins giveaways of users that I follow.

The problem is that when I use a for loop to iterate over an ItemIterator (a Cursor limited to 50 items), the loop stops before finishing. It usually does only 20 or 39-40 iterations.

My main function is:

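from funciones import *
from config import *
# login, get_tweets, is_RT, show, check, start_stats, st, sleep, random and user
# are not defined in this file, so they come from the two star imports above.
api = login(user)
i = 0
while 1 > i:
    # Cursor over the home timeline, limited to 50 tweets
    tweets = get_tweets(api, 50, True, None, None)
    file = start_stats()
    for tweet in tweets:
        try:
            i = i + 1
            tweet = is_RT(tweet)
            show(tweet)
            check(api, tweet, file)
            print(f'{i}) 1.5 - 2m tweets cd')
            sleep(random.randrange(40, 60, 1))  # 40-59 s between tweets
        except Exception as e:
            print(str(e))
            st.append(e)
    print('15-20 min cooldown')
    sleep(random.randrange(900, 1200, 1))  # 15-20 min between batches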

So when the loop stops early (usually after about 39 iterations), the code jumps into the 15-minute cooldown, and the tweets cursor has these values:

len(tweets.current_page) - 1  ->  19
tweets.page_index             ->  19
tweets.limit                  ->  50
tweets.num_tweets             ->  20

I've looked at this method in Tweepy's cursor.py, but I still don't know how to fix it:

def next(self):
    if self.limit > 0:
        if self.num_tweets == self.limit:
            raise StopIteration
    if self.current_page is None or self.page_index == len(self.current_page) - 1:
        # Reached end of current page, get the next page...
        self.current_page = self.page_iterator.next()
        self.page_index = -1
    self.page_index += 1
    self.num_tweets += 1
    return self.current_page[self.page_index]
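To illustrate how a for loop over this iterator can end before limit is reached, here is a self-contained replica of the quoted next() logic (written as Python 3's __next__), fed by a hypothetical page source that raises StopIteration when it gets an empty page. FakeIdPages and ItemIteratorSketch are made-up names for this sketch, not Tweepy classes, and the page contents are invented. Because the StopIteration comes from page_iterator.next(), it propagates out of __next__ and ends the loop even though num_tweets never reaches limit.

```python
class FakeIdPages:
    """Hypothetical page source (not Tweepy's): returns the given pages in
    order and raises StopIteration as soon as a page is empty."""

    def __init__(self, pages):
        self._pages = iter(pages)

    def next(self):
        page = next(self._pages, [])  # treat "no more pages" as an empty page
        if len(page) == 0:
            raise StopIteration
        return page


class ItemIteratorSketch:
    """Replica of the quoted ItemIterator.next() logic, for illustration only."""

    def __init__(self, page_iterator, limit):
        self.page_iterator = page_iterator
        self.limit = limit
        self.current_page = None
        self.page_index = -1
        self.num_tweets = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.limit > 0 and self.num_tweets == self.limit:
            raise StopIteration  # the limit was reached
        if self.current_page is None or self.page_index == len(self.current_page) - 1:
            # End of the current page: fetch the next one. If the page source
            # raises StopIteration, it propagates and ends the for loop early.
            self.current_page = self.page_iterator.next()
            self.page_index = -1
        self.page_index += 1
        self.num_tweets += 1
        return self.current_page[self.page_index]


# One page of 20 tweets, then an empty page: the loop stops at 20, not 50.
items = ItemIteratorSketch(FakeIdPages([list(range(20)), []]), limit=50)
seen = [tweet for tweet in items]
print(len(seen), items.num_tweets, items.page_index, items.limit)  # 20 20 19 50

# Three full pages of 20: the loop stops at the limit of 50.
full = ItemIteratorSketch(FakeIdPages([list(range(20))] * 3), limit=50)
print(len(list(full)))  # 50
```

In this sketch, a full page followed by an empty one leaves exactly the state listed above (num_tweets 20, page_index 19, limit 50), and a 20-tweet page followed by a 19-tweet page and then an empty one would stop at 39.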

The helper my main function uses to get the cursor is:

import tweepy as tp

def get_tweets(api, count=1, cursor=False, user=None, id=None):
    # A single tweet by id
    if id is not None:
        tweets = api.get_status(id=id, tweet_mode='extended')
        return tweets

    if cursor:
        # Paginated: .items(count) yields at most count tweets, .items() has no limit.
        # No count kwarg is passed here, so each request uses the endpoint's default page size.
        if user is not None:
            if count > 0:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.user_timeline, screen_name=user, tweet_mode='extended').items()
        else:
            if count > 0:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items(count)
            else:
                tweets = tp.Cursor(api.home_timeline, tweet_mode='extended').items()
    else:
        # Single request: count is passed straight to the endpoint
        if user is not None:
            tweets = api.user_timeline(screen_name=user, count=count, tweet_mode='extended')
        else:
            tweets = api.home_timeline(count=count, tweet_mode='extended')
    return tweets

When I've tried test code like this:

j = 0
tweets = get_tweets(api,50,True)
for i in tweets:
    j=j+1
print(j)

j and tweets.num_tweets are almost always 50. When they aren't, I think it's because I don't wait between requests (I've reached j=300 with this). So maybe the problem is in the check function:

(This is an earlier version of the check function, which has the same problem; I noticed it when I started collecting stats. The only difference is that the current version returns values saying whether the tweet has been liked, retweeted, etc.)

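def check(tweet):
    # Act only if none of the three checks returns a truthy value.
    # A list (not a generator) keeps all three checks running, as before.
    if not any([is_seen(tweet), age_check(tweet, 3), ignore_check(tweet)]):
        rt_check(tweet)
        like_check(tweet)
        follow_check(tweet)
        tag_n_drop_check(tweet)
        quoted_check(tweet)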
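A test written as bool(a) + bool(b) + bool(c) == 0 parses as (bool(a) + bool(b) + bool(c)) == 0, because + binds more tightly than ==. It therefore means "none of the three values is truthy", which is the same as not any([a, b, c]); building a list rather than a generator means all three checks are still called. The small comparison below checks the two forms against each other over some sample values; sum_form and any_form are illustrative names only.

```python
from itertools import product


def sum_form(a, b, c):
    # Parsed as (bool(a) + bool(b) + bool(c)) == 0
    return (bool(a)
            + bool(b)
            + bool(c) == 0)


def any_form(a, b, c):
    # True only when none of the values is truthy
    return not any([a, b, c])


samples = [None, 0, "", 1, "seen", True, False]
mismatches = [combo for combo in product(samples, repeat=3)
              if sum_form(*combo) != any_form(*combo)]
print(mismatches)  # []
```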

This is the first time I've asked for help, so I don't know whether I've posted all the info needed. This has been driving me mad since last week and I don't know who to ask :(

Thanks in advance!



Solution 1:[1]

The IdIterator that Cursor returns when used with API.home_timeline stops when it receives a page with no results. This is most likely what's happening, since the default count for the endpoint is 20 and:

The value of count is best thought of as a limit to the number of tweets to return because suspended or deleted content is removed after the count has been applied.

https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/api-reference/get-statuses-home_timeline

This is a limitation of this Twitter API endpoint, as there's not another good way to determine when to stop paginating.

However, you can pass a higher count (e.g. 100 if that works for you, up to 200) to the endpoint while using Cursor with it and you'll be less likely to receive a premature empty page.
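In Tweepy terms, that suggestion amounts to passing count through the Cursor, e.g. tp.Cursor(api.home_timeline, count=200, tweet_mode='extended').items(50), so each request asks for up to 200 tweets while items(50) still caps the total. The sketch below does not call Twitter; it is a simplified, hypothetical model (made-up timeline; fetch_page and collect are illustrative names) of the behaviour described above: count is applied before removed tweets are filtered out, and pagination stops at the first empty page. With a run of removed tweets longer than the page size, a 20-tweet request comes back empty while a 200-tweet request does not.

```python
# Simplified, hypothetical model of an id-paginated timeline endpoint where
# `count` is applied first and removed (deleted/suspended) tweets are dropped
# afterwards, so a request can return fewer tweets than asked, or none.

def fetch_page(timeline, start, count):
    """Simulate one request. timeline holds (tweet_id, removed) pairs, newest first."""
    raw = timeline[start:start + count]                     # count applied first...
    visible = [tid for tid, removed in raw if not removed]  # ...then removals
    return visible, start + count


def collect(timeline, limit, count):
    """Rough stand-in for Cursor(...).items(limit): stop at limit or on an empty page."""
    collected, start = [], 0
    while len(collected) < limit:
        page, start = fetch_page(timeline, start, count)
        if not page:  # empty page: pagination stops, as with IdIterator
            break
        collected.extend(page[:limit - len(collected)])
    return collected


# Made-up timeline: 20 visible tweets, 25 removed ones, then 100 visible ones.
timeline = ([(i, False) for i in range(20)]
            + [(i, True) for i in range(20, 45)]
            + [(i, False) for i in range(45, 145)])

print(len(collect(timeline, 50, count=20)))   # 20: the second page is empty
print(len(collect(timeline, 50, count=200)))  # 50
```

A larger count does not rule out an empty page; it only makes one less likely, which matches the wording of the answer.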

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Harmon758