How to scrape an image src using Puppeteer in Node.js?
I'm trying to scrape the source of the first image with a specific class. The page contains multiple images with different additional classes, but they all share the class opwvks06. I have tried the following:
const puppeteer = require('puppeteer');

(async () => {
  let browser, page;
  const url = 'https://www.facebook.com/radiosalue/photos/?ref=page_internal';
  try {
    browser = await puppeteer.launch({ headless: true });
    page = await browser.newPage();
    await page.setViewport({ width: 1366, height: 500 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
    // Read the src of the first matching image inside the page context.
    const image = await page.evaluate(() => {
      const getImage = document
        .querySelector('img[class="opwvks06"]')
        .getAttribute('src');
      return getImage;
    });
    console.log(image);
  } catch (error) {
    console.log(error.message);
  } finally {
    if (browser) {
      await browser.close();
      console.log('closing browser');
    }
  }
})();
However, this returns null. Following is the html structure.
Solution 1:[1]
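To add to Mike 'Pomax' Kamermans's answer: all you had to do was wait for the image to appear before querying it, since it may not be in the DOM yet when domcontentloaded fires. Add: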
await page.waitForSelector("img.opwvks06:first-child");
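The selector change matters as well as the wait. In CSS, img[class="opwvks06"] is an exact attribute match, so it only matches when the class attribute is exactly opwvks06 and nothing else, whereas img.opwvks06 matches when opwvks06 is one of the space-separated classes. The sketch below imitates both rules in plain JavaScript purely for illustration. It is not a CSS engine, the function names are made up, and the extra class names are hypothetical stand-ins for the "additional classes" mentioned in the question.

```javascript
// Illustrative only: imitates how the two selector forms compare a class attribute.

// [class="name"]: the whole attribute value must equal the name exactly.
function matchesExactClassAttribute(classAttr, name) {
  return classAttr === name;
}

// .name: the name must be one of the whitespace-separated class tokens.
function matchesClassToken(classAttr, name) {
  return classAttr.split(/\s+/).filter(Boolean).includes(name);
}

// Hypothetical class attribute: the shared class plus made-up extra classes.
const classAttr = 'extra1 opwvks06 extra2';

console.log(matchesExactClassAttribute(classAttr, 'opwvks06')); // false
console.log(matchesClassToken(classAttr, 'opwvks06')); // true
```

With several classes on the element, the exact-match form finds nothing, so document.querySelector returns null no matter how long you wait.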
If the site is protected against bots, you can also try puppeteer-extra with its stealth plugin (puppeteer-extra-plugin-stealth), but that is not necessary in this case. Here is the final code:
const puppeteer = require("puppeteer");

(async () => {
  let browser, page;
  const url = "https://www.facebook.com/radiosalue/photos/?ref=page_internal";
  try {
    browser = await puppeteer.launch({ headless: true });
    page = await browser.newPage();
    await page.setViewport({ width: 1366, height: 500 });
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
    // Wait until a matching image is present in the DOM before querying it.
    await page.waitForSelector("img.opwvks06:first-child");
    const image = await page.evaluate(() => {
      const getImage = document.querySelector("img.opwvks06:first-child").getAttribute("src");
      return getImage;
    });
    console.log(image);
  } catch (error) {
    console.log(error.message);
  } finally {
    if (browser) {
      await browser.close();
      console.log("closing browser");
    }
  }
})();
Output:
https://scontent.fiev13-1.fna.fbcdn.net/v/t39.30808-6/279856934_10159266106247585_585375152905621309_n.jpg?stp=dst-jpg_p206x206&_nc_cat=106&ccb=1-6&_nc_sid=8024bb&_nc_ohc=owbdAyQwP3wAX-8rdo5&_nc_ht=scontent.fiev13-1.fna&oh=00_AT8yJizEIWx8oEFLUBb90ZIIj-Q4WLmmiWtpd1aRVy-UkA&oe=627C10A5
closing browser
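If you need this in more than one place, the wait-then-read step from the final code above could be wrapped in a small helper. This is only a sketch: getFirstImageSrc is a hypothetical name, the 30000 ms default is an arbitrary choice, and page is assumed to be a Puppeteer Page (the helper relies only on its waitForSelector and $eval methods).

```javascript
// Sketch of a reusable helper; the name and default timeout are assumptions.
async function getFirstImageSrc(page, selector, timeout = 30000) {
  // Rejects if no element matches the selector within `timeout` milliseconds.
  await page.waitForSelector(selector, { timeout });
  // $eval runs the callback in the page against the first matching element.
  return page.$eval(selector, (img) => img.getAttribute('src'));
}
```

Inside the async function from the answer it might be called as const image = await getFirstImageSrc(page, "img.opwvks06");. Because $eval, like document.querySelector, already returns the first match in document order, the plain class selector should be enough here. Note that :first-child instead restricts the match to images that are the first child of their parent.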
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mikhail Zub |