Why is the Python Requests Module not returning links?

I created a Python web scraper for my college capstone project that crawls the web and follows links chosen at random from each page. I used Python's requests module to return links from a GET request. It was working flawlessly, alongside a graphing program that visualized the crawl in real time. Then I fired it up to show my professor, and now .links returns an empty dictionary for every single website.

Originally I had added a skip for any site that returned no links, but now all of them return empty. I've reinstalled Python, reinstalled the requests module, and tried feeding the program websites manually, and I cannot find a reason for the change.

For reference, I have been using Portswigger.net as a baseline to test whether .links returns anything. It worked before, and now it does not.

Here is the get request sample:

import requests

Url = "https://portswigger.net"

def GetRequest(url):
    # Fetch the page and return response.links if it is non-empty
    with requests.get(url=url) as response:
        try:
            links = response.links
            if links:
                return links
            else:
                return False
        except Exception as error:
            return error

print(GetRequest(Url))

UPDATE: Out of the 200 sites I tested this morning, the only one to return links was kite.com. It returned its links with no problem and my program was able to follow them and collect the data. Literally a week ago the whole program ran fine and returned page links from almost every single website.



Solution 1:[1]

requests.Response.links doesn't work like that [1]. It parses the Link header of the HTTP response, not <a> link elements in the response body.
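For illustration, .links is only populated when the server includes a Link response header, which is common for paginated APIs but rare for ordinary HTML pages. A minimal sketch (the GitHub endpoint here is just an assumed example of a service that paginates via Link headers):

import requests

# A typical HTML page: servers rarely send a Link header, so .links is
# expected to be an empty dict even though the body is full of <a> elements
page = requests.get('https://portswigger.net')
print(page.links)   # {}

# A paginated API response: the Link header is parsed into a dict keyed by rel
repos = requests.get('https://api.github.com/users/octocat/repos?per_page=2')
print(repos.links)  # e.g. {'next': {'url': '...', 'rel': 'next'}, 'last': {...}}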

What you want is to extract link elements from the response body, so I would recommend something like lxml or BeautifulSoup.

Seeing as this is fairly common and straightforward, and this is a school project, I'll leave the details up to the reader (a rough lxml sketch follows below for orientation).

[1] - https://docs.python-requests.org/en/latest/api/#requests.Response.links
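For orientation only, a minimal lxml-based sketch of that task might look like the following (a rough outline rather than a definitive implementation; the get_links name is just illustrative):

import requests
from lxml import html

def get_links(url: str) -> list[str]:
    # Fetch the page and parse the HTML body with lxml
    response = requests.get(url)
    tree = html.fromstring(response.content)
    # Resolve relative hrefs (e.g. "/web-security") against the page URL
    tree.make_links_absolute(url)
    # Return the href attribute of every <a> element in the document
    return tree.xpath('//a/@href')

print(*get_links('https://portswigger.net'), sep='\n')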

Solution 2:[2]

Parsing links with beautifulsoup4 is a possible solution:

import requests
from bs4 import BeautifulSoup


def get_links(url: str) -> list[str]:
    with requests.get(url) as response:
        # Parse the HTML body and collect the href of every absolute <a> link
        soup = BeautifulSoup(response.text, features='html.parser')
        links = []
        for link in soup.find_all('a'):
            target = link.get('href')
            # Skip <a> tags without an href, and keep only absolute links
            if target and target.startswith('http'):
                links.append(target)

    return links

links = get_links('https://portswigger.com')
print(*links, sep='\n')

# https://forum.portswigger.net/
# https://portswigger.net/web-security
# https://portswigger.net/research
# https://portswigger.net/daily-swig
# ...
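One caveat about the snippet above: the startswith('http') filter keeps only absolute links, so relative hrefs such as /web-security are dropped. If the crawler should follow those too, a hedged variation using urllib.parse.urljoin (the get_all_links helper is just illustrative, not part of the original answer) resolves them against the page URL:

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def get_all_links(url: str) -> list[str]:
    with requests.get(url) as response:
        soup = BeautifulSoup(response.text, features='html.parser')
        links = []
        for link in soup.find_all('a'):
            target = link.get('href')
            if target:
                # urljoin turns relative hrefs like "/web-security" into
                # absolute URLs and leaves absolute hrefs unchanged
                links.append(urljoin(url, target))
    return links

print(*get_all_links('https://portswigger.net'), sep='\n')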

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
Solution 1: plunker
Solution 2: Stefan B