'HTTP error 403 in Python 3 web scraping the publications
This is the traceback of the error that is happening when I am trying to put the URL of the publication. It works for the regular websites such as Stack Overflow or Wikipedia, but when I try it on the publications such as https://www.sciencedirect.com/science/article/pii/S1388248120302113?via%3Dihub, The error shows up.
Here is my code:
req = Request(' https://www.sciencedirect.com/science/article/pii/S1388248120302113?via%3Dihub', headers={'User-Agent': 'Mozilla/5.0'})
html_plain = urlopen(req).read()
Here is the traceback of the error:
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
return self._call_chain(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Solution 1:[1]
The HTTP 403 Forbidden client error status response code indicates that the server understood the request but refuses to authorize it. This is not an error in the code, this is the website refusing to server that page when you are doing the web scraping.
Solution 2:[2]
try:
# If you're getting a 403 response, use this. ("HTTP error occurred: 403 Client Error: Forbidden")
user_agent = 'Mozilla/5.0'
response = requests.get(url, headers={'User-Agent': user_agent})
# If the response was successful, no Exception will be raised
response.raise_for_status()
except HTTPError as http_err:
print(f'HTTP error occurred: {http_err}') # Python 3.6
except Exception as err:
print(f'Other error occurred: {err}') # Python 3.6
else:
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | DisappointedByUnaccountableMod |
Solution 2 | user19063751 |