Scraping page with no output to desired links

I wanted to scrape this webpage:

http://protected.to/f-42cbf8ce2521d615

But I have to click on "continue to folder" to get to those links. I cannot see these links in the HTML source, only when I physically use a mouse to click on the "continue to folder" button.

How can I avoid that physical click to get to those links in the website?

I am new to web scraping so please help me solve this issue.

Thanks for your attention and time.

Ozooha

import requests
from bs4 import BeautifulSoup

s = requests.Session()
url='http://protected.to/f-c9036f7a236b1511'
r = s.get(url)
soup = BeautifulSoup(r.text, features="html.parser")

params = {i['name']:i.get('value') for i in soup.find('div', {'class':'col-md-12 text-center'}).find_all('input')}
headers = {"Host": "protected.to", 
           "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0",
           "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
           "Accept-Language": "en-US,en;q=0.5",
           "Accept-Encoding": "gzip, deflate",
           "Connection":"keep-alive",
           "Cookie": r.headers['Set-Cookie'],
           "Upgrade-Insecure-Requests": "1", 
           "Sec-GPC": "1",
           "DNT": "1"}

print(params)
r_ = s.post(url, headers = headers, cookies = r.cookies, params=params)
print(r_.status_code)


Solution 1:[1]

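You could use a full browser-automation library such as Selenium, which is written to behave like a user. But I would simply call .click() on the button from JavaScript running in the page (for example, in the browser console) and then parse the HTML.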

const button = document.querySelector('[value="Continue to folder"]');
button.click();
// Parse the HTML

Solution 2:[2]

"Continue to Folder" is a submit button for the form which POSTs the "__RequestVerificationToken" value and the slug token to the page to display the contents of the folder.

So, in theory, you have to parse the HTML of http://protected.to/f-42cbf8ce2521d615 to extract the value of the hidden field named "__RequestVerificationToken", which is the input holding the token value. To obtain the slug token, look between the tags; you will see that the page dynamically creates a slug token when it loads.

Once you have those values, make a POST to the same URL, http://protected.to/f-42cbf8ce2521d615, with the token and slug. The request body will look something like this:

__RequestVerificationToken=8BYeNPftVEEivO2imhtWIuWAb0mjhPg-5pAhq1mlpL_pTyYR1AyScbfqB8QZDudwGY_1LkV79FCDgpyffRPuktApd2ZQYBdi2ySA5ATUZ601&Slug=42cbf8ce2521d615

The above would return the page with the folder contents. You can confirm this by opening your browser's dev tools and inspecting what happens when you hit "Continue to Folder": you will see the POST request and its body, along with the elements of the page that contain the items needed to make the POST call (the verification token and slug token).
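The GET-then-POST flow described in this answer could be sketched in Python roughly as follows. This is only a sketch under assumptions: the helper names hidden_fields and open_folder are made up for illustration, the token is assumed to sit in a hidden input, and the slug is assumed to be either another hidden input or the part of the URL after "f-" (as the example body above suggests). The session argument is expected to behave like requests.Session(). Note that the fields go in data=, which sends them in the request body; params= would append them to the query string instead.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of every hidden form field found in the HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def open_folder(session, url):
    """GET the landing page, then POST its hidden fields back to the same URL.

    `session` is assumed to offer get/post like requests.Session(), so the
    cookies set by the GET are sent with the POST automatically.
    """
    landing = session.get(url)
    form = hidden_fields(landing.text)
    # Assumption: if the page has no hidden "Slug" input, fall back to the
    # part of the URL after "f-", which matches the example body above.
    form.setdefault("Slug", url.rsplit("f-", 1)[-1])
    # data= puts the fields in the form-encoded request body.
    unlocked = session.post(url, data=form)
    unlocked.raise_for_status()
    return unlocked.text
```

With requests installed, the call would look like open_folder(requests.Session(), "http://protected.to/f-42cbf8ce2521d615"), and the returned HTML can then be parsed for the links. Whether the site accepts the request without further headers is untested here.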

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Umut Gerçek
Solution 2 Ilan P