Bypass Cloudflare with Puppeteer

I am trying to scrape startup data from a site with Puppeteer. When I navigate to the next page, the Cloudflare waiting screen appears and disrupts the scraper. I tried changing the IP address, but the result is the same. Is there a way to bypass it with Puppeteer?

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });

  const page = await browser.newPage();

  // Disable the navigation timeout so a slow challenge page does not abort the run
  page.setDefaultNavigationTimeout(0);

  let links = [];

  // Initial page
  await page.goto("https://www.startupranking.com/top/india", {
    waitUntil: "networkidle0",
  });

  // Collect the links on the current page, then navigate to page i.
  // Note: the links on the last page loaded (page 7) are not collected.
  for (let i = 2; i <= 7; i++) {
    if (i === 3) {
      console.log("waiting");

      // Fixed 20-second pause to let the Cloudflare check finish
      // (page.waitFor is deprecated in newer Puppeteer releases)
      await new Promise((resolve) => setTimeout(resolve, 20000));

      console.log("waited");
    }

    const onPageLinks = await page.$$eval("tr .name a", (arr) =>
      arr.map((cur) => cur.href)
    );

    links = links.concat(onPageLinks);

    console.log(onPageLinks, "inside loop");

    await page.goto(`https://www.startupranking.com/top/india/${i}`, {
      waitUntil: "networkidle0",
    });
  }

  console.log(links, links.length, "outside loop");
})();

Because the check only runs around the first loop iteration, I added a fixed 20-second wait to cover the time the check takes. This works fine on some IPs, but on others Cloudflare presents challenges to solve. I have to run this on a server, so I would like to bypass the check completely.
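Instead of pausing for a fixed 20 seconds, one option is to wait until the result rows themselves appear. The following is only a minimal sketch: it assumes the Cloudflare waiting screen contains no element matching `tr .name a` and that the check clears on its own. The helper name `gotoAndCollectLinks` and the 60-second timeout are hypothetical, and this does not solve interactive challenges.

```javascript
// Hypothetical helper: navigate, then wait for the result rows instead of
// sleeping for a fixed time. Assumes the Cloudflare waiting screen does not
// contain any element matching `selector`.
async function gotoAndCollectLinks(
  page,
  url,
  selector = "tr .name a",
  timeoutMs = 60000
) {
  await page.goto(url, { waitUntil: "domcontentloaded" });

  // Rejects with a timeout error if the rows never appear, for example
  // when an interactive challenge is shown instead of the listing.
  await page.waitForSelector(selector, { timeout: timeoutMs });

  // Same extraction as in the question: one href per matching anchor.
  return page.$$eval(selector, (anchors) => anchors.map((a) => a.href));
}
```

Inside the loop from the question, the fixed pause and the separate `$$eval` call could then be replaced by something like `links = links.concat(await gotoAndCollectLinks(page, url))`.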

Solution 1:^[1]

You can use the cloudflare-scraper package, which is also based on puppeteer.

Taken from the documentation:

Installation

npm install cloudflare-scraper puppeteer

Usage

const cloudflareScraper = require('cloudflare-scraper');

(async () => {
  try {
    const response = await cloudflareScraper.get('https://cloudflare-url.com');
    console.log(response);
  } catch (error) {
    console.log(error);
  }
})();
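To cover the paginated listing from the question, the same `get` call could be repeated once per page URL. This is a sketch under assumptions: it presumes `cloudflareScraper.get` resolves with the page response as in the usage example above, `fetchRankingPages` is a hypothetical name, and extracting the links from each response is left out.

```javascript
// Hypothetical helper: fetch pages 1..lastPage one at a time.
// `get` is any function that takes a URL and resolves with the response,
// for example cloudflareScraper.get from the usage example above.
async function fetchRankingPages(get, baseUrl, lastPage) {
  const responses = [];
  for (let i = 1; i <= lastPage; i++) {
    // Page 1 lives at the base URL; later pages append the page number.
    const url = i === 1 ? baseUrl : `${baseUrl}/${i}`;
    // Sequential requests keep the load on the site low.
    responses.push(await get(url));
  }
  return responses;
}
```

A call for the question's site might look like `fetchRankingPages(cloudflareScraper.get, "https://www.startupranking.com/top/india", 7)`.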

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	dcts
