Puppeteer: bypass Cloudflare by enabling cookies and JavaScript
(In Node.js, server side only.)
I'm doing some web scraping, and some pages are protected by the Cloudflare anti-DDoS page. I'm trying to bypass this page. Searching around, I found a lot of articles on the stealth method or reCAPTCHA. But Cloudflare is not even trying to give me a CAPTCHA: it stays stuck on the "wait for 5 seconds" page because it displays in red (TURN ON JAVASCRIPT AND RELOAD) and (TURN ON COOKIES AND RELOAD). My JavaScript seems to be active, though, because my program runs on a lot of websites and processes their JavaScript.
This is my code:
//vm = this;
vm.puppeteer.use(vm.StealthPlugin())
vm.puppeteer.use(vm.AdblockerPlugin({
  blockTrackers: true
}))
let browser = await vm.puppeteer.launch({
  headless: true
});
let browserPage = await browser.newPage();
await browserPage.goto(link, {
  waitUntil: 'networkidle2',
  timeout: 40 * 1000
});
await browserPage.waitForTimeout(20 * 1000);
let body = await browserPage.evaluate(() => {
  return document.documentElement.outerHTML;
});
I also tried removing StealthPlugin and AdblockerPlugin, but Cloudflare keeps telling me there is no JavaScript and no cookies.
Can anyone help me, please?
Solution 1:[1]
Setting your own User-Agent and Accept-Language headers should work, because your headless browser needs to look like a browser used by a real person.
You can use page.setExtraHTTPHeaders() and page.setUserAgent() to do so.
await browserPage.setExtraHTTPHeaders({
  'Accept-Language': 'en'
});
// You can use any UserAgent you want
await browserPage.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36');
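The answer does not say where these calls belong relative to the question's code. A minimal sketch, assuming they should run before browserPage.goto() so that the first request already carries the headers: the helper name gotoWithIdentity and its parameters are hypothetical, and the goto options are simply copied from the question. Whether this is enough to get past the challenge page depends on the site.

```javascript
// Hypothetical helper (not part of Puppeteer): applies the headers from the
// answer, then navigates. `page` is expected to expose setExtraHTTPHeaders(),
// setUserAgent() and goto(), like the object returned by browser.newPage().
async function gotoWithIdentity(page, link, userAgent, acceptLanguage = 'en') {
  // Set the headers first so the very first request already carries them.
  await page.setExtraHTTPHeaders({ 'Accept-Language': acceptLanguage });
  await page.setUserAgent(userAgent);

  // Same navigation options as in the question.
  return page.goto(link, {
    waitUntil: 'networkidle2',
    timeout: 40 * 1000
  });
}
```

In the question's code, this would replace the direct browserPage.goto(link, ...) call, for example: await gotoWithIdentity(browserPage, link, 'Mozilla/5.0 ...').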
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Matt |