How to query the count of every tag on a Stack Exchange site in a single request

I'm experimenting with some machine learning techniques, in this case PSO-KMeans (particle swarm optimization combined with k-means) for clustering.

I thought I might test it out by hitting up the Stack Exchange API to grab a list of tags and the count of each tag, then convert that into an array of floats representing each site's position in "tag-space".
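One way to do the "tag-space" conversion described above is sketched below. It assumes the tag counts end up in a dict shaped like `{site domain: {tag name: count}}` (the `site_data` structure the question builds). The function name `tag_space_vectors` and the optional per-site normalization are illustrative choices, not part of any library.

```python
from typing import Dict, List, Tuple


def tag_space_vectors(
    site_data: Dict[str, Dict[str, int]], normalize: bool = True
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Turn {site: {tag: count}} into one float vector per site over a shared tag axis."""
    # Shared, sorted vocabulary: the union of every site's tags.
    vocabulary = sorted({tag for tags in site_data.values() for tag in tags})
    vectors: Dict[str, List[float]] = {}
    for site, tags in site_data.items():
        # Tags a site does not use get a count of 0.
        vec = [float(tags.get(tag, 0)) for tag in vocabulary]
        if normalize:
            total = sum(vec)
            if total > 0:
                vec = [v / total for v in vec]  # fractions that sum to 1
        vectors[site] = vec
    return vocabulary, vectors
```

Normalizing each vector to sum to 1 keeps large sites from dominating a distance-based clustering purely by size; pass `normalize=False` to keep raw counts. Because the vocabulary is the union of all sites' tags, the vectors can be long and mostly zeros.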

I am using Py-Stack-Exchange:

from stackauth import StackAuth
import stackexchange

site_data = {}  # site domain -> {tag name: tag count}
n_sites = 20
for site_auth in StackAuth().sites()[3:n_sites + 3]:  # skip the "big 3"
    site = site_auth.get_site()
    site_tags = {}
    for tag in site.all_tags():
        site_tags[tag.name] = tag.count
    site_data[site.domain] = site_tags

Now, this must have gone over the 10,000-request limit after I had messed around with it a few times, because I got:

StackExchangeError: 502 [throttle_violation]: too many requests from this IP, more requests available in 81719 seconds

So I guess it is making a request for each and every tag on the site to get its count. That is no good for anyone: it is slower for me and more work for the Stack Exchange infrastructure. I feel like there must be a way to get this information in one request per site, but I am not familiar enough with the API to work it out.



Solution 1:[1]

You cannot pull all the tags with a single API call. On Stack Overflow alone there are 38,484 tags as of this answer. At 100 tags per page (the API's maximum page size), that means 385 separate calls.
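The paging described above can be sketched with the standard library alone. This is a minimal sketch, not Py-Stack-Exchange's internals: it assumes version 2.3 of the API and its `/tags` method with the `site`, `pagesize`, `page`, `order`, `sort` and optional `key` parameters, and it reads the `items`, `has_more`, `backoff` and `error_id` fields of the response wrapper. The function names `pages_needed`, `fetch_json` and `all_tag_counts` are hypothetical, and `fetch` is injectable so the paging logic can be exercised without network access.

```python
import gzip
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

# Assumed endpoint: version 2.3 of the Stack Exchange API.
TAGS_URL = "https://api.stackexchange.com/2.3/tags"
PAGE_SIZE = 100  # the API's maximum page size


def pages_needed(total_tags: int, page_size: int = PAGE_SIZE) -> int:
    """Requests needed to page through total_tags (ceiling division)."""
    return -(-total_tags // page_size)


def fetch_json(url: str) -> dict:
    """GET one page of results; the API serves gzip-compressed JSON."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        body = response.read()
        if response.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


def all_tag_counts(
    site: str,
    fetch: Callable[[str], dict] = fetch_json,
    key: Optional[str] = None,
) -> Dict[str, int]:
    """Return {tag name: count} for one site, 100 tags per request."""
    counts: Dict[str, int] = {}
    page = 1
    while True:
        params = {
            "site": site,
            "pagesize": PAGE_SIZE,
            "page": page,
            "order": "desc",
            "sort": "popular",
        }
        if key is not None:
            params["key"] = key  # an app key raises the daily quota
        data = fetch(TAGS_URL + "?" + urllib.parse.urlencode(params))
        if "error_id" in data:
            raise RuntimeError(f"API error {data['error_id']}: {data.get('error_message')}")
        for item in data.get("items", []):
            counts[item["name"]] = item["count"]
        if not data.get("has_more"):
            return counts
        # The API may ask clients to wait before calling the method again.
        if "backoff" in data:
            time.sleep(data["backoff"])
        page += 1
```

For example, `all_tag_counts("stackoverflow")` would cost `pages_needed(38484)`, i.e. 385 requests, at the tag total quoted above, rather than one request per tag.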

An alternative to the API for this problem may be the Data Explorer. Without more details, I can point you to a very simple query that pulls all tag information for Stack Overflow:

select * from tags

This information is updated on a weekly basis, so it is not live data.
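If the query results are downloaded from the Data Explorer as CSV, they can be loaded back into the same `{tag name: count}` shape the question builds. This sketch assumes the export's header row contains the Tags table's `TagName` and `Count` columns, which `select * from tags` returns; the function name `tag_counts_from_sede_csv` is hypothetical.

```python
import csv
from typing import Dict, TextIO


def tag_counts_from_sede_csv(stream: TextIO) -> Dict[str, int]:
    """Map TagName -> Count from a Data Explorer CSV export of the Tags table."""
    reader = csv.DictReader(stream)  # first row supplies the column names
    return {row["TagName"]: int(row["Count"]) for row in reader}
```

A downloaded file (its name here, `QueryResults.csv`, is an assumption) would be opened with `open("QueryResults.csv", newline="", encoding="utf-8-sig")`; the `utf-8-sig` codec tolerates a leading byte-order mark if the export has one.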

Finally, you could use the data dump for offline analysis. This is a large archive that Stack Exchange makes available roughly every quarter. As of this answer, the newest dump is from September 2014, so it is fairly up to date.
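For the data dump route, a per-site tag file can be streamed rather than loaded whole. This is a sketch under an assumption worth checking against the dump you download: that each site's archive includes a `Tags.xml` whose `row` elements carry `TagName` and `Count` attributes. The function name `tag_counts_from_dump` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Union


def tag_counts_from_dump(source: Union[str, BinaryIO]) -> Dict[str, int]:
    """Map TagName -> Count from a data-dump Tags.xml (path or binary file object)."""
    counts: Dict[str, int] = {}
    # iterparse streams the document, reporting each element once it is closed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            counts[elem.get("TagName")] = int(elem.get("Count", "0"))
            elem.clear()  # drop the row's attributes once read to keep memory low
    return counts
```

Calling it once per extracted site archive would yield the same `{tag name: count}` dicts as the API loop, with no requests against the live API.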

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Community