Strange errors with asynchronous requests
async def rss_downloader(rss):
    global counter
    async with download_limit:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'
        }
        try:
            response = await httpx.get(rss, headers=headers, verify=False)
            if response.status_code == 200:
                r_text = response.text
                await downloaded_rss.put({'url': rss, 'feed': r_text})
            else:
                counter += 1
                print(f'№{counter} - {response.status_code} - {rss}')
        except (ConnectTimeout, ConnectionClosed):
            not_found_rss.append(rss)
        except Exception:
            not_found_rss.append(rss)
            logging.exception(f'{rss}')


async def main():
    parser_task = asyncio.create_task(parser_queue())
    tasks = [
        asyncio.create_task(rss_downloader(item['url']))
        for item in db[config['mongodb']['channels_collection']].find({'url': {'$ne': 'No RSS'}})
    ]
    await asyncio.gather(*tasks, parser_task)
Very often, this code fails to load some pages, raising various errors (ReadTimeout among them). But when I try to load the same pages one at a time, everything works fine:
In [1]: import httpx
In [2]: r = await httpx.get('http://www.spinmob.com/nirvanictrance.xml')
In [3]: r
Out[3]: <Response [200 OK]>
For the semaphore, I set a limit of 20 workers, which is not that many; I tried smaller limits as well, but the errors still appear. Why does this happen, and what can I do about it?
Solution 1
The httpx documentation covers the ReadTimeout you are experiencing:
HTTPX is careful to enforce timeouts everywhere by default. The default behavior is to raise a TimeoutException after 5 seconds of network inactivity. The read timeout specifies the maximum duration to wait for a chunk of data to be received (for example, a chunk of the response body). If HTTPX is unable to receive data within this time frame, a ReadTimeout exception is raised.
Try first disabling the timeout duration for reads (adapted from the example in the above link):
timeout = httpx.Timeout(10.0, read_timeout=None)  # note: newer httpx releases renamed this keyword to read=None
response = await httpx.get(rss, headers=headers, verify=False, timeout=timeout)
And then experiment with different timeout durations to see what is reasonable for your use-case.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Fernando Gomes |