Can't grab coordinates from ArcGIS iframe in a webpage using requests

I've created a script to get coordinates (-119.412 49.023 in this case) from a map embedded in a webpage using the requests module. When I run my script below I get nothing. I know I can get that portion using Selenium, but I'd like to do it with requests. I looked in the dev tools for any clue as to how to grab it, but no luck.

This is where the coordinates are located.

import requests
from bs4 import BeautifulSoup

link = 'https://www.rdos.bc.ca/development-services/planning/current-applications-decisions/electoral-area-a/a2018207-zone/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    res = s.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    print(soup.select_one("[data-dojo-attach-point='coordinateInfo']"))

How can I scrape the coordinates from that site using requests?



Solution 1:[1]

* The coordinates value is rendered entirely by JavaScript, and the requests module can't execute JavaScript.

* To make the coordinates appear, the page has to be scrolled down, which is done here by executing JavaScript.

* The coordinates element lives inside an iframe (a quick check with requests is sketched below).

* So to get the coordinates value you need browser automation, something like Selenium.

* This uses Selenium 4 (pip install selenium) and webdriver-manager (pip install webdriver-manager).

* Don't maximize the window (maximize_window()); if you do, the map only tells you to move the mouse to see the coordinates. Normally the coordinates appear in the bottom-left corner once the Selenium run finishes.
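
A quick check of those points (an illustrative sketch, not part of the original answer, and assuming the map is the first iframe on the page): fetching the page with requests shows that the static HTML contains the iframe and its src, but not the coordinateInfo element, which is only added later by JavaScript inside that iframe.

import requests
from bs4 import BeautifulSoup

link = 'https://www.rdos.bc.ca/development-services/planning/current-applications-decisions/electoral-area-a/a2018207-zone/'

res = requests.get(link, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'})
soup = BeautifulSoup(res.text, "lxml")

# The iframe and its src are present in the static HTML...
print(soup.select_one("iframe")["src"])
# ...but the coordinate element is not (this prints None)
print(soup.select_one("[data-dojo-attach-point='coordinateInfo']"))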

Script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

option = webdriver.ChromeOptions()

# Keep Chrome open after the script finishes
option.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
driver.get('https://www.rdos.bc.ca/development-services/planning/current-applications-decisions/electoral-area-a/a2018207-zone/')

wait = WebDriverWait(driver, 30)

# Execute JavaScript to scroll the map section into view
driver.execute_script("arguments[0].scrollIntoView();", wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@class="bb-textmedia__content"]'))))

# Navigate directly to the iframe's src URL (rather than switching into the iframe)
driver.get(wait.until(EC.visibility_of_element_located((By.XPATH, '(//iframe)[1]'))).get_attribute('src'))

coordinates = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@class="coordinate-info jimu-float-leading jimu-align-leading"]'))).text.replace('Degrees', '')
print(coordinates)

Output:

-119.554 49.229 
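
If you only need the numbers, a small post-processing step (an illustrative addition, not part of the original answer) splits the scraped text into floats:

# Hypothetical post-processing; assumes the scraped text looks like "-119.554 49.229 "
lon, lat = map(float, coordinates.split())
print(lon, lat)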

Solution 2:[2]

You can use requests-html, which will automatically download Chromium on the first render.
https://pypi.org/project/requests-html/

It doesn't render the content of the <iframe> element though, so we .search() for the iframe's src link, .render() that page separately, and then wait for the coordinateInfo element to load.

import asyncio

from bs4 import BeautifulSoup
import requests_html

link = 'https://www.rdos.bc.ca/development-services/planning/current-applications-decisions/electoral-area-a/a2018207-zone/'


async def get_content(page):
    # Poll the rendered page until the coordinate element is present and has finished loading
    content = await page.content()
    while 'coordinateInfo' not in content or 'loading...' in content:
        await asyncio.sleep(1)
        content = await page.content()
    await page.close()
    return content

with requests_html.HTMLSession() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    res = s.get(link)
    # Pull the iframe's src out of the static HTML and unescape the query string
    iframe_link = res.html.search('iframe src="{}"')[0].replace('&amp;', '&')
    iframe_res = s.get(iframe_link)
    # Render the iframe page in Chromium and keep the page open so it can be polled
    iframe_res.html.render(keep_page=True)
    content = s.loop.run_until_complete(get_content(iframe_res.html.page))
    soup = BeautifulSoup(content, "lxml")
    print(soup.select_one("[data-dojo-attach-point='coordinateInfo']"))
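
The final print shows the whole div element; if you only want its text, one extra line (an illustrative addition, assuming the element has loaded by this point) would be:

# Extract just the text of the coordinate element
print(soup.select_one("[data-dojo-attach-point='coordinateInfo']").get_text(strip=True))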

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   F.Hoque
Solution 2   aaron