Cannot scrape the correct aspect ratio of the image - Python
I'm having trouble extracting an image from a "Manga" website using Python. Below is an example of the element on the website:
- <img id="comic" class="loading" onerror="this.src='data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'; this.removeAttribute('onerror'); this.className = 'loaderror';" src="https://example_on_the_image.jpg">
I'm able to parse out the "src" link, and the image size and aspect ratio should be as follows when viewed in a normal browser:
- Rendered size: 920 × 1301 px
- Rendered aspect ratio: 920∶1301
- Intrinsic size: 720 × 1018 px
- Intrinsic aspect ratio: 360∶509
- File size: 101 kB
- Current source: (url of the image)
Yet the image that I downloaded is 160 × 160 px, and its file size is smaller. I have tried BeautifulSoup, Selenium, etc., and still get the same result.
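To make the mismatch concrete, here is a minimal sketch that reduces the three sizes quoted above to their lowest-terms aspect ratios. The helper name `reduce_ratio` is only an illustration and is not part of any library.

```python
from math import gcd


def reduce_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms (hypothetical helper)."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


rendered = reduce_ratio(920, 1301)   # size the browser displays
intrinsic = reduce_ratio(720, 1018)  # size of the file the browser loaded
downloaded = reduce_ratio(160, 160)  # size of the file the script saved

print("rendered  ", rendered)
print("intrinsic ", intrinsic)
print("downloaded", downloaded)
```

The rendered and intrinsic ratios are nearly the same portrait shape, while the downloaded file is square, which suggests the script received a different image rather than a rescaled copy of the same one.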
But if I use:
- the browser and right-click "Save Image As"
- Inspect -> on the image element -> right click -> Capture node screenshot
I was able to save the image at the "Rendered size" with the above two methods in a normal browser. Why can't I get the correct aspect ratio when I scrape with Python?
I hope somebody can guide me on this or point out where I went wrong. Thanks.
Solution 1:[1]
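Here's my Playwright code:

    import os

    from playwright.sync_api import sync_playwright

    manga_url = "the url that you are going to scrape"  # placeholder
    dwn_path = "your_directory"  # placeholder: folder to save the screenshot in
    os.chdir(dwn_path)

    with sync_playwright() as p:
        # Launch a visible Chromium window; slow_mo delays each action by 500 ms
        browser = p.chromium.launch(headless=False, slow_mo=500)
        page = browser.new_page()
        page.goto(manga_url)
        # Screenshot only the #comic element, as rendered on the page
        page.locator("#comic").screenshot(path="screenshot.png")
        print(page.title())
        browser.close()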
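To confirm that the saved screenshot has the expected shape, one option is to read the width and height straight from the PNG header using only the standard library. This is a rough sketch: `png_size` and `file_png_size` are hypothetical helper names, and the pixel size of the screenshot may also depend on the browser's device scale factor, so it need not equal 920 × 1301 exactly.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk at the start of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def file_png_size(path: str) -> tuple[int, int]:
    """Read only the first 24 bytes of a file and return its PNG size."""
    with open(path, "rb") as f:
        return png_size(f.read(24))
```

After the script above has run, `file_png_size("screenshot.png")` should return a portrait size rather than a square one.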
Solution 2:[2]
Solved the problem: Selenium could not screenshot the element at its full rendered size, but Playwright lets me screenshot it at the correct aspect ratio, as displayed once the browser has loaded the page.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Tang Chee Ming |