How to check if extended entity is present or not in tweepy response

I am able to fetch various fields from each tweet:

    keyword = tweepy.Cursor(api.search, val, tweet_mode='extended', lang='en').items(2)
    tweetdone = 0
    all_tweet = []
    for tweet in keyword:
        tweet_record = {}
        tweet_record['tweet.text'] = tweet.full_text
        tweet_record['tweet.user.name'] = tweet.user.name
        tweet_record['tweet.user.location'] = tweet.user.location
        tweet_record['tweet.user.verified'] = tweet.user.verified
        tweet_record['tweet.lang'] = tweet.lang
        tweet_record['tweet.created_at'] = tweet.created_at
        tweet_record['tweet.user'] = tweet.user
        tweet_record['tweet.retweet_count'] = tweet.retweet_count
        tweet_record['tweet.favorite_count'] = tweet.favorite_count
        # keep the finished record so it is not lost on the next iteration
        all_tweet.append(tweet_record)

I want to parse media objects from the tweet, but extended_entities, which holds media_url, is not present in every tweet. So if I try to fetch it like this:

    tweet_record['media_url'] = tweet.extended_entities.media_url

It errors out because extended_entities may not be present in some tweets.

How do I deal with this issue and fetch the media content correctly?



Solution 1:[1]

You have a couple of options here: you can check whether the key exists, or use try/except.

Check whether key exists:

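You can do this because tweepy returns a Status object that keeps the raw response as a Python dictionary in tweet._json, so you essentially have key:value pairs. The Status object itself is not a dictionary and does not support the in operator, so run the check against tweet._json. Also note that extended_entities has no media_url of its own: it holds a media list, and each item in that list has a media_url. Going by your code above, you should be able to use

    if 'extended_entities' in tweet._json:
        tweet_record['media_url'] = tweet.extended_entities['media'][0]['media_url']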

Of course, the reverse is also possible:

    if 'extended_entities' not in tweet._json:
        pass  # whatever you want to do
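
As a minimal sketch of both directions of the check, the snippet below runs the membership test on plain dictionaries shaped like the raw tweet JSON (the dictionary tweepy keeps on tweet._json). The sample payloads, the example.com URL, and the helper name has_media are made up for illustration.

```python
# Hypothetical payloads shaped like the raw JSON of two tweets;
# only the keys needed for the check are included.
tweet_with_media = {
    "full_text": "look at this",
    "extended_entities": {
        "media": [{"media_url": "http://example.com/a.jpg", "type": "photo"}]
    },
}
tweet_without_media = {"full_text": "just text"}


def has_media(raw_tweet):
    """Return True when the raw tweet dict carries an 'extended_entities' key."""
    return "extended_entities" in raw_tweet


for raw in (tweet_with_media, tweet_without_media):
    if not has_media(raw):
        # The reverse check: nothing to extract, so move on.
        print("no media")
        continue
    print(raw["extended_entities"]["media"][0]["media_url"])
```

With a real tweepy result you would pass tweet._json to the helper instead of a hand-written dictionary.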

This could lead to problems though: what if extended_entities exists, but for some reason its media list doesn't? And what if you want to dig even deeper than that (there is nothing deeper for a status object, but I'm just trying to future-proof here)? You'll end up with long or deeply nested if statements, which won't look the best

    if 'extended_entities' in tweet._json:
        if 'media' in tweet._json['extended_entities']:
            pass  # etc

so it might be easier to just wrap it in a try/except:

    try:
        tweet_record['media_url'] = tweet.extended_entities['media'][0]['media_url']
    except AttributeError:
        pass  # etc

This means the program won't raise an error when a particular element isn't found. AttributeError is raised when you access an attribute that an object doesn't have. You may of course want to re-order this for readability. Keep in mind, though, that while this is Pythonic, it can be a bit hard to read if used too often, in my opinion.
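
Here is a minimal sketch of the try/except approach wrapped in a helper. It assumes extended_entities is exposed as a dictionary with a media list whose items carry media_url (the shape of the v1.1 tweet JSON). The function name first_media_url and the SimpleNamespace stand-ins for Status objects are hypothetical, used only so the example runs without Twitter credentials.

```python
from types import SimpleNamespace


def first_media_url(tweet):
    """Return the first media URL of a tweet, or None if it has no media."""
    try:
        return tweet.extended_entities["media"][0]["media_url"]
    except (AttributeError, KeyError, IndexError):
        # AttributeError: the tweet has no extended_entities attribute.
        # KeyError / IndexError: the attribute exists but holds no media.
        return None


# Stand-ins for tweepy Status objects, for illustration only.
with_media = SimpleNamespace(
    extended_entities={"media": [{"media_url": "http://example.com/a.jpg"}]}
)
without_media = SimpleNamespace(full_text="just text")

print(first_media_url(with_media))
print(first_media_url(without_media))
```

Inside the loop this would reduce to tweet_record['media_url'] = first_media_url(tweet), which keeps the try/except out of the main body.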

I referred to this question when looking up things for this answer. It gives some good ideas for this sort of thing if you need further help.

Hope that helps.

Solution 2:[2]

Another good option is to use hasattr(object, name) within an if statement:

    if hasattr(tweet, "extended_entities"):
        pass  # do whatever
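
Building on hasattr, getattr with a default value can fold the check and the lookup into one step. The sketch below assumes extended_entities is a dictionary with a media list whose items carry media_url. The helper name all_media_urls and the SimpleNamespace stand-ins for Status objects are hypothetical.

```python
from types import SimpleNamespace


def all_media_urls(tweet):
    """Collect every media_url on a tweet; return an empty list when there is none."""
    # getattr's third argument is returned when the attribute is missing,
    # so no separate hasattr check is needed.
    entities = getattr(tweet, "extended_entities", {})
    return [m["media_url"] for m in entities.get("media", []) if "media_url" in m]


# Stand-ins for tweepy Status objects, for illustration only.
two_photos = SimpleNamespace(
    extended_entities={
        "media": [
            {"media_url": "http://example.com/a.jpg"},
            {"media_url": "http://example.com/b.jpg"},
        ]
    }
)
text_only = SimpleNamespace(full_text="just text")

print(all_media_urls(two_photos))
print(all_media_urls(text_only))
```

Returning a list for every tweet means the calling code never has to special-case tweets without media.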

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Generic Snake
Solution 2  Paul Brennan