Webscraping Symfony/Panther: Can't get HTML

I want to scrape a site with the symfony panther package within a Laravel application. According to the documentation https://github.com/symfony/panther#a-polymorphic-feline I cannot use the HttpBrowser nor the HttpClient classes because they do not support JS.

Therefore I am trying to use the ChromeClient, which uses a local Chrome executable and a chromedriver binary shipped with the Panther package.

$client = Client::createChromeClient();
$crawler = $client->request('GET', 'http://example.com');
dd($crawler->html());

Unfortunately, I only receive the empty default chrome page as HTML:

<html><head></head><body></body></html>

Every approach to do something else with the $client or the $crawler-instance leads to an error "no nodes available".

Additionally, I tried the basic example from the documentation https://github.com/symfony/panther#basic-usage --> same result.

I'm using Ubuntu 18.04 Server under WSL on Windows and installed the google-chrome-stable deb package. This seemed to work, because after the installation the error "the binary was not found" no longer occurs.

I also tried pointing Panther at the Chrome executable of the Windows host system, but this only opens an empty CMD window that reopens every time I close it. I have to kill the process via Task Manager.

Is this because the Ubuntu server does not have an X server available?
What can I do to receive any HTML?



Solution 1:[1]

use Symfony\Component\Panther\Client;

$client = Client::createChromeClient();
$crawler = $client->request('GET', 'http://example.com');

/**
 * Get the full HTML of the page
 */
$html = $client->getCrawler()->html();

/**
 * For example, select the element with ID "AuthenticationBlock" and get its text
 */
$loginUsername = $client->getCrawler()->filter('#AuthenticationBlock')->text();
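
If the page fills in its content via JavaScript after the initial load, reading the HTML immediately may still return an empty document. A minimal sketch of waiting for an element first, using Panther's waitFor() method, could look like the following. The helper name fetchRenderedHtml, the 10-second default timeout and the 'h1' selector are assumptions made for illustration, and it is assumed that Composer's autoloader is already loaded (as it is inside a Laravel application). This may not help if Chrome itself fails to load the page.

```php
<?php

use Symfony\Component\Panther\Client;

/**
 * Hypothetical helper (not part of Panther): loads $url in Chrome, waits
 * until an element matching $selector exists, then returns the HTML
 * reported by the crawler.
 */
function fetchRenderedHtml(string $url, string $selector, int $timeoutSeconds = 10): string
{
    $client = Client::createChromeClient();

    try {
        $client->request('GET', $url);

        // waitFor() polls the page until the selector matches; if the
        // timeout expires first, the underlying WebDriver library throws.
        $client->waitFor($selector, $timeoutSeconds);

        // Re-read the crawler after waiting so it reflects the current DOM.
        return $client->getCrawler()->html();
    } finally {
        // Always shut down the browser and the chromedriver process.
        $client->quit();
    }
}
```

It could be called as fetchRenderedHtml('http://example.com', 'h1'). Closing the client in a finally block avoids leaving Chrome processes behind when the wait times out.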

Solution 2:[2]

So, I'm probably late, but I had the same problem and found a pretty easy solution: just create a plain crawler from the response content.

This crawler differs from the Panther DomCrawler, especially in its available methods, but it is safer for evaluating HTML structures.

$client = Client::createChromeClient();
$client->request('GET', 'http://example.com');

$html = $client->getInternalResponse()->getContent();
$crawler = new Symfony\Component\DomCrawler\Crawler($html);

// you can use following to get the whole HTML
$crawler->outerHtml();

// or specific parts
$crawler->filter('.some-class')->outerHtml();
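
Calling outerHtml() on a selection that matched nothing throws an exception, similar to the "no nodes available" error mentioned in the question. A small sketch of guarding against that with the plain DomCrawler might look like this. The helper name firstOuterHtml and the sample markup are made up for illustration, and it assumes the symfony/dom-crawler and symfony/css-selector packages are installed and autoloaded.

```php
<?php

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper (not part of Symfony): returns the outer HTML of the
 * first element matching $selector, or null when nothing matches.
 */
function firstOuterHtml(string $html, string $selector): ?string
{
    $crawler = new Crawler($html);

    // filter() takes a CSS selector and relies on symfony/css-selector.
    $matches = $crawler->filter($selector);

    // An empty selection would make outerHtml() throw, so check first.
    if ($matches->count() === 0) {
        return null;
    }

    return $matches->first()->outerHtml();
}

// Sample markup standing in for $client->getInternalResponse()->getContent().
$html = '<html><body><div class="some-class"><p>Hello</p></div></body></html>';

$found = firstOuterHtml($html, '.some-class'); // markup of the matching div
$missing = firstOuterHtml($html, '.missing');  // null, because nothing matches
```

Returning null instead of throwing lets the calling code decide how to handle a page that did not render the expected element.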

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Dharman
Solution 2  Sengorius