Web scrape pagination in a single URL (cheerio and axios)

I'm new to web scraping and am working on a scraping project, and I'd like some guidance on handling pagination. I'm scraping this site: https://www.imoney.my/unit-trust-investments. As you can see, I want to retrieve the "Total return" percentage for each of the available year options. Right now I'm using cheerio and axios.

const http = require("http");
const axios = require("axios");
const cheerio = require("cheerio");

http
    .createServer(async function (_, res) {
        try {
            const response = await axios.get(
                "https://www.imoney.my/unit-trust-investments"
            );

            const $ = cheerio.load(response.data);

            const funds = [];
            $("[class='list-item']").each((_i, row) => {
                const $row = $(row);

                const fund = $row.find("[class*='product-title']").find("a").text();
                const price = $row.find("[class*='is-narrow product-profit']").find("b").text();
                const risk = $row.find("[class*='product-title']").find("[class*='font-xsm extra-info']").text().replace('/10','');
                const totalreturn = $row.find("[class*='product-return']").find("[class='font-lg']").find("b").text().replace('%','');

                funds.push({ fund, price, risk, totalreturn});
            });
            
            res.statusCode = 200;
            res.write(JSON.stringify(funds, null, 4));
        } catch (err) {
            res.statusCode = 500; // the failure is on the server side (fetch or parse), not a bad client request
            res.write("Unable to process request.");
        }
        res.end();
    })
    .listen(8080);

Note that the URL does not change when a different year is selected; only the total return value changes.



Solution 1:[1]

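This happens because the page uses JavaScript to generate its content, so the HTML that axios downloads is unlikely to contain the values you see after selecting a different year. In this case you need a tool that drives a real browser, such as Puppeteer. For example: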

const puppeteer = require("puppeteer");

const availableFunds = "10000";
const years = 2; // number of ArrowUp presses in the tenure dropdown: 3 for 0.5 years, 2 for 1 year, 1 for 2 years, 0 for 3 years

async function start() {
  const browser = await puppeteer.launch({
    headless: false,
  });

  const page = await browser.newPage();
  await page.goto("https://www.imoney.my/unit-trust-investments");
  await page.waitForSelector(".product-item");

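  // Clear the pre-filled investment amount (assumed to be at most 5 characters), then type the new one.
  await page.focus("#amount");
  for (let i = 0; i < 5; i++) {
    await page.keyboard.press("Backspace");
  }
  await page.type("#amount", availableFunds);

  // Open the tenure dropdown and move up to the wanted option with the keyboard.
  await page.click("#tenure");
  for (let i = 0; i < years; i++) {
    await page.keyboard.press("ArrowUp");
  }
  await page.keyboard.press("Enter");

  // Read the fund rows from the rendered page.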
  const funds = await page.evaluate(() => {
    const funds = [];
    Array.from(document.querySelectorAll(".product-item")).forEach((el) => {
      const fund = el.querySelector(".title")?.textContent.trim();
      const price = el.querySelector(".investmentReturnValue")?.textContent.trim();
      const risk = el.querySelector(".col-title .info-desc dd")?.textContent.trim();
      const totalreturn = el.querySelector(".col-rate.text-left .info-desc .ir-value")?.textContent.trim();
      if (fund && price && risk && totalreturn) funds.push({ fund, price, risk, totalreturn });
    });
    return funds;
  });

  console.log(funds);

  await browser.close();
}

start();

Output:

[
  {
    fund: 'Aberdeen Standard Islamic World Equity Fund - Class A',
    price: 'RM 12,651.20',
    risk: 'Medium\n                                7/10',
    totalreturn: '26.51'
  },
  {
    fund: 'Affin Hwang Select Balanced Fund',
    price: 'RM 10,355.52',
    risk: 'Medium\n                                5/10',
    totalreturn: '3.56'
  },
... and others
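
The risk field in the output above still contains a newline and a run of spaces, and the price is a formatted string. If you want typed values, a small post-processing step might help. This is only a sketch: normalizeFund and the field names riskLabel, riskScore and totalReturn are hypothetical names introduced here, and it assumes the strings always look like the ones in the sample output.

```javascript
// Hypothetical helper (not part of the original answer): converts the raw
// scraped strings into cleaner, typed values.
function normalizeFund(raw) {
  // Collapse newlines and repeated spaces: "Medium\n      7/10" -> "Medium 7/10".
  const risk = raw.risk.replace(/\s+/g, " ").trim();
  // Split "Medium 7/10" into a label and a numeric score when the pattern matches.
  const match = risk.match(/^(.*?)\s*(\d+)\s*\/\s*10$/);

  return {
    fund: raw.fund.trim(),
    // "RM 12,651.20" -> 12651.2 (keep only digits, the dot and a minus sign).
    price: Number.parseFloat(raw.price.replace(/[^0-9.-]/g, "")),
    riskLabel: match ? match[1] : risk,
    riskScore: match ? Number(match[2]) : null,
    // "26.51" or "26.51%" -> 26.51
    totalReturn: Number.parseFloat(raw.totalreturn),
  };
}

// Usage with one entry copied from the sample output:
const example = normalizeFund({
  fund: "Affin Hwang Select Balanced Fund",
  price: "RM 10,355.52",
  risk: "Medium\n                                5/10",
  totalreturn: "3.56",
});
console.log(example);
```

You could apply it to the whole result with funds.map(normalizeFund) after page.evaluate returns.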

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mikhail Zub