Some websites don't fully load/render in Selenium headless mode

I have noticed a problem when I run Selenium in headless mode: on some pages, certain elements never load or render. I don't know exactly what prevents the page from loading completely; maybe the JavaScript is not running?

My code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from decouple import config
from time import sleep

DEBUG = config('DEBUG')

class DiscordME(object):
    def __init__(self):
        self.LINUX = config('LINUX', cast=bool)
        self.DRIVER_VERSION = config('DRIVER_VERSION')
        self.HEADLESS = True

        options = Options()
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-gpu')
        options.add_argument('--ignore-certificate-errors')
        options.add_argument('--disable-extensions')
        options.add_argument('--disable-dev-shm-usage')
        if self.HEADLESS:
            options.add_argument('--headless')
            options.add_argument('--window-size=1920,1200')

        if self.LINUX:
            self.browser = webdriver.Chrome(executable_path=f'./drivers/chromedriver-{self.DRIVER_VERSION}', options=options)
        else:
            # Raw f-string so the backslashes in the Windows path are not treated as escape sequences
            self.browser = webdriver.Chrome(executable_path=rf'.\drivers\chromedriver-{self.DRIVER_VERSION}.exe', options=options)

    def get_website(self):
        self.browser.get('https://discord.me/login')
        WebDriverWait(self.browser, 10).until(
            EC.url_changes('https://discord.me/login')
        )
        print(self.browser.current_url)
        print(self.browser.page_source)
        #print(self.browser.find_element_by_xpath('//*[@id="app-mount"]/div[2]/div/div[2]/div/div/form/div/div/div[1]/div[3]/div[1]/div/div[2]/input'))

DiscordME().get_website()

With this script, the login inputs are not loaded when the browser reaches the Discord API login page. Looking at page_source, I noticed that the page is not being mounted, so that could be the problem.
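
A quick way to confirm that suspicion is to check whether the mount element in page_source has any child tags. The following is only a rough sketch: the id app-mount is taken from the XPath in the script above, MountProbe and is_mounted are hypothetical helper names, and it relies solely on Python's standard html.parser module.

```python
from html.parser import HTMLParser


class MountProbe(HTMLParser):
    """Records whether the element with a given id contains any child tags."""

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self.target_tag = None      # tag name of the mount element, once seen
        self.inside = False         # True while we are inside the mount element
        self.has_children = False   # True once any tag is opened inside it

    def handle_starttag(self, tag, attrs):
        if self.inside:
            self.has_children = True
        elif self.target_tag is None and dict(attrs).get("id") == self.target_id:
            self.target_tag = tag
            self.inside = True

    def handle_endtag(self, tag):
        # With no children seen yet, the first matching end tag closes the mount element.
        if self.inside and not self.has_children and tag == self.target_tag:
            self.inside = False


def is_mounted(page_source: str, target_id: str = "app-mount") -> bool:
    """Return True if the element with id=target_id has at least one child tag."""
    probe = MountProbe(target_id)
    probe.feed(page_source)
    probe.close()
    return probe.has_children
```

Inside get_website this could be called as is_mounted(self.browser.page_source); a False result would suggest that the JavaScript application never rendered into the mount element.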



Solution 1:[1]

from selenium import webdriver
from time import sleep

options = webdriver.ChromeOptions()
options.add_argument("--window-size=1920,1080")
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
browser = webdriver.Chrome(options=options)

Some websites use the user agent to detect whether the browser is running in headless mode, because a headless browser reports a different user agent than a normal one (headless Chrome typically identifies itself as "HeadlessChrome"). So set the user agent explicitly.

Headless browser detection
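
Instead of hard-coding a user agent for one specific Chrome version, the string can also be derived from what the headless browser itself reports. This is a minimal sketch under the assumption that the headless user agent contains the token "HeadlessChrome"; visible_user_agent and make_headless_chrome are hypothetical helper names, not part of Selenium.

```python
def visible_user_agent(headless_ua: str) -> str:
    """Return the user agent with the 'HeadlessChrome' token replaced by 'Chrome'."""
    return headless_ua.replace("HeadlessChrome", "Chrome")


def make_headless_chrome():
    """Start headless Chrome with a user agent that lacks the headless marker."""
    # Imported lazily so visible_user_agent can be used without Selenium installed.
    from selenium import webdriver

    def build_options(user_agent=None):
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        options.add_argument("--window-size=1920,1080")
        if user_agent:
            options.add_argument(f"--user-agent={user_agent}")
        return options

    # First launch: ask the headless browser which user agent it reports by default.
    probe = webdriver.Chrome(options=build_options())
    try:
        default_ua = probe.execute_script("return navigator.userAgent")
    finally:
        probe.quit()

    # Second launch: same browser version, but without the headless marker.
    return webdriver.Chrome(options=build_options(visible_user_agent(default_ua)))
```

The extra browser launch costs a few seconds, but the user agent then always matches the installed Chrome version. Sites that detect headless mode by other means will not be fooled by this.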

Solution 2:[2]

Another thing to consider if you are having trouble loading a website with Selenium is processing power.

I was using a micro AWS instance with a single CPU, which worked for many websites. On a more complex site, however, a search such as find_elements_by_xpath('//a[@href]') intermittently returned 0 elements, while at other times it succeeded and found the hyperlinks. I upgraded to an instance with more CPUs (4, but 2 would probably have been sufficient), and that allowed me to fully load the site and scrape the elements.

I would definitely try the other two solutions posted here first (Chrome options or the Firefox browser), but processing power could be the problem as well.
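
To tell "the page is still loading on a slow machine" apart from "the elements are really missing", the search can be repeated until it returns something or a deadline passes. This is only a sketch: poll_until_nonempty is a hypothetical helper, and the timeout and interval values are arbitrary assumptions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def poll_until_nonempty(fetch: Callable[[], List[T]],
                        timeout: float = 30.0,
                        interval: float = 0.5) -> List[T]:
    """Call fetch() repeatedly until it returns a non-empty list or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            return result  # still empty after the deadline
        time.sleep(interval)
```

With a driver this could be used as poll_until_nonempty(lambda: driver.find_elements_by_xpath('//a[@href]')). If the list is still empty after a generous timeout, slow rendering is less likely to be the cause.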

Solution 3:[3]

I would just like to share my experience, because solving this issue cost me a lot of time trying many options and settings for the Chrome webdriver.

The user-agent setting solved the problem for some of the websites I scraped. For some other websites, however, the only solution that worked for me was to use the Firefox webdriver instead of Chrome, as follows:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

fireFoxOptions = Options()  
fireFoxOptions.add_argument("--headless") 
fireFoxOptions.add_argument("--window-size=1920,1080")
fireFoxOptions.add_argument('--start-maximized')
fireFoxOptions.add_argument('--disable-gpu')
fireFoxOptions.add_argument('--no-sandbox')

driver = webdriver.Firefox(
    options=fireFoxOptions,
    executable_path=r'C:\[your path to firefox webdriver exe file]\geckodriver.exe')

driver.get('https://discord.me/login')

Use the link here to download the latest geckodriver for Firefox, and make sure the Firefox browser is already installed on your machine.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 PDHide
Solution 2
Solution 3 Rola