Crawling the Twitter API for specific tweets

I am trying to crawl Twitter for specific keywords, which I have put into the list

keywords = ["art", "railway", "neck"]

I am trying to search for these words in a specific location, which I have written as

PLACE_LAT = 29.7604
PLACE_LON = -95.3698
PLACE_RAD = 200

I have then tried to write a function to collect at least 200 tweets, but I know that only 100 tweets can be returned per query, so the search has to be repeated in batches. My code so far is below; however, it did not work.

def retrieve_tweets(api, keyword, batch_count, total_count, latitude, longitude, radius):
    """
    collects tweets using the Twitter search API

    api:         Twitter API instance
    keyword:     search keyword
    batch_count: maximum number of tweets to collect per request
    total_count: maximum number of tweets in total
    """


    # the collection of tweets to be returned
    tweets_unfiltered = []
    tweets = []

    # the number of tweets within a single query
    batch_count = str(batch_count)

    '''
    You are required to insert your own code where instructed to perform the first query to Twitter API.
    Hint: revise the practical session on Twitter API on how to perform query to Twitter API.
    '''
    # perform the first query, to obtain max_id_str, which will be used for the subsequent queries
    resp = api.request('search/tweets', {'q': keywords,
                                         'count': '100',
                                         'lang':'en',
                                         'result_type':'recent',
                                         'geocode':'{PLACE_LAT},{PLACE_LONG},{PLACE_RAD}mi'.format(latitude, longitude, radius)})

    # store the tweets in a list

    # check first if there was an error
    if ('errors' in resp.json()):
        errors = resp.json()['errors']
        if (errors[0]['code'] == 88):
            print('Too many attempts to load tweets.')
            print('You need to wait for a few minutes before accessing Twitter API again.')

    if ('statuses' in resp.json()):
        tweets_unfiltered += resp.json()['statuses']
        tweets = [tweet for tweet in tweets_unfiltered if ((tweet['retweeted'] != True) and ('RT @' not in tweet['text']))]

        # find the max_id_str for the next batch
        ids = [tweet['id'] for tweet in tweets_unfiltered]
        max_id_str = str(min(ids))

        # loop until as many tweets as total_count is collected
        number_of_tweets = len(tweets)
        while number_of_tweets < total_count:

            resp = api.request('search/tweets', {'q': keywords,
                                                 'count': '50',
                                                 'lang':'en',
                                                 'result_type': 'recent',
                                                 'max_id': max_id_str,
                                                 'geocode':'{PLACE_LAT},{PLACE_LONG},{PLACE_RAD}mi'.format(latitude, longitude, radius)}
                          )

            if ('statuses' in resp.json()):
                tweets_unfiltered += resp.json()['statuses']
                tweets = [tweet for tweet in tweets_unfiltered if ((tweet['retweeted'] != True) and ('RT @' not in tweet['text']))]

                ids = [tweet['id'] for tweet in tweets_unfiltered]
                max_id_str = str(min(ids))

                number_of_tweets = len(tweets)

            print("{} tweets are collected for keyword {}. Last tweet created at {}".format(number_of_tweets, 
                                                                                    keyword, 
                                                                                    tweets[number_of_tweets-1]['created_at']))
    return tweets

I only needed to write code where it says #INSERT YOUR CODE; the original template is below. What changes do I need to make to get this to work?

def retrieve_tweets(api, keyword, batch_count, total_count, latitude, longitude, radius):
    """
    collects tweets using the Twitter search API

    api:         Twitter API instance
    keyword:     search keyword
    batch_count: maximum number of tweets to collect per request
    total_count: maximum number of tweets in total
    """


    # the collection of tweets to be returned
    tweets_unfiltered = []
    tweets = []

    # the number of tweets within a single query
    batch_count = str(batch_count)

    '''
    You are required to insert your own code where instructed to perform the first query to Twitter API.
    Hint: revise the practical session on Twitter API on how to perform query to Twitter API.
    '''
    # perform the first query, to obtain max_id_str, which will be used for the subsequent queries
    resp = api.request('search/tweets', {'q': #INSERT YOUR CODE
                                         'count': #INSERT YOUR CODE
                                         'lang':'en',
                                         'result_type':'recent',
                                         'geocode':'{},{},{}mi'.format(latitude, longitude, radius)})

    # store the tweets in a list

    # check first if there was an error
    if ('errors' in resp.json()):
        errors = resp.json()['errors']
        if (errors[0]['code'] == 88):
            print('Too many attempts to load tweets.')
            print('You need to wait for a few minutes before accessing Twitter API again.')

    if ('statuses' in resp.json()):
        tweets_unfiltered += resp.json()['statuses']
        tweets = [tweet for tweet in tweets_unfiltered if ((tweet['retweeted'] != True) and ('RT @' not in tweet['text']))]

        # find the max_id_str for the next batch
        ids = [tweet['id'] for tweet in tweets_unfiltered]
        max_id_str = str(min(ids))

        # loop until as many tweets as total_count is collected
        number_of_tweets = len(tweets)
        while number_of_tweets < total_count:

            resp = api.request('search/tweets', {'q': #INSERT YOUR CODE
                                             'count': #INSERT YOUR CODE
                                             'lang':'en',
                                             'result_type':  #INSERT YOUR CODE
                                             'max_id': max_id_str,
                                             'geocode': #INSERT YOUR CODE
                          )

            if ('statuses' in resp.json()):
                tweets_unfiltered += resp.json()['statuses']
                tweets = [tweet for tweet in tweets_unfiltered if ((tweet['retweeted'] != True) and ('RT @' not in tweet['text']))]

                ids = [tweet['id'] for tweet in tweets_unfiltered]
                max_id_str = str(min(ids))

                number_of_tweets = len(tweets)

            print("{} tweets are collected for keyword {}. Last tweet created at {}".format(number_of_tweets, 
                                                                                    keyword, 
                                                                                    tweets[number_of_tweets-1]['created_at']))
    return tweets


Solution 1:[1]

What is your question or issue? I didn't see one in your post.

A couple of suggestions: remove the lang and result_type parameters from your request. And because you are using geocode, you should not expect very many results, since hardly anyone turns location on when they tweet.
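
For illustration, the first request could then be reduced to something like the sketch below. It reuses the parameters of your retrieve_tweets function and the geocode format from your template; it is only a sketch, not a tested fix:

# Sketch: first request with only the essential parameters, assuming the same
# TwitterAPI `api` instance and the parameters passed to retrieve_tweets.
resp = api.request('search/tweets',
                   {'q': keyword,
                    'count': batch_count,
                    'geocode': '{},{},{}mi'.format(latitude, longitude, radius)})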

Also, rather than handling the max_id parameter yourself, you may want to look at the TwitterPager class, which takes care of the paging for you. Here is an example: https://github.com/geduldig/TwitterAPI/blob/master/examples/page_tweets.py.
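
A minimal sketch of that approach, assuming the TwitterAPI package, placeholder credentials, and one keyword plus the location from your question (it follows the linked example but is not tested against your account):

from TwitterAPI import TwitterAPI, TwitterPager

# placeholder credentials -- replace with your own keys
api = TwitterAPI('<consumer key>', '<consumer secret>',
                 '<access token key>', '<access token secret>')

# TwitterPager re-issues the request with max_id adjusted for you
pager = TwitterPager(api, 'search/tweets',
                     {'q': 'art',
                      'geocode': '29.7604,-95.3698,200mi'})

tweets = []
for item in pager.get_iterator(wait=6):   # pause 6 seconds between requests
    if 'text' in item:
        # keep only original tweets, mirroring the filter in the question
        if not item['retweeted'] and 'RT @' not in item['text']:
            tweets.append(item)
        if len(tweets) >= 200:
            break
    elif 'message' in item and item['code'] == 88:
        # error code 88: rate limit exceeded
        print('Rate limit exceeded:', item['message'])
        break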

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Jonas