Webscraping Symfony/Panther: Can't get HTML
I want to scrape a site with the Symfony Panther package within a Laravel application. According to the documentation https://github.com/symfony/panther#a-polymorphic-feline I can use neither the HttpBrowser nor the HttpClient class because they do not support JS.
Therefore I am trying to use the Chrome client, which uses a local Chrome executable and a chromedriver binary shipped with the Panther package.
$client = Client::createChromeClient();
$crawler = $client->request('GET', 'http://example.com');
dd($crawler->html());
Unfortunately, I only receive the empty default chrome page as HTML:
<html><head></head><body></body></html>
Every attempt to do something else with the $client or the $crawler instance leads to the error "no nodes available".
Additionally, I tried the basic example from the documentation https://github.com/symfony/panther#basic-usage, with the same result.
I'm using Ubuntu 18.04 Server under WSL on Windows and installed the google-chrome-stable deb package. This seemed to work, because after the installation the error "the binary was not found" no longer occurs.
I also tried to manually use the executable of the Windows host system, but this only opens an empty CMD window that reopens every time I close it. I have to kill the process via Task Manager.
Is this because the Ubuntu server does not have an X server available?
What can I do to receive any HTML?
Solution 1:[1]
$client = Client::createChromeClient();
$crawler = $client->request('GET', 'http://example.com');
/**
* Get all Html code of page
*/
$client->getCrawler()->html();
/**
* For example to filter field by ID = AuthenticationBlock and get text
*/
$loginUsername = $client->getCrawler()->filter('#AuthenticationBlock')->text();
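If the element is rendered by JavaScript, it may not exist yet when filter() runs. A minimal sketch, assuming a recent Panther version where waitFor() returns the refreshed crawler; the function name fetchElementText and its parameters are hypothetical, and Composer's autoloader is assumed to be loaded already, as it is inside a Laravel application:

```php
<?php

use Symfony\Component\Panther\Client;

/**
 * Load a page in Chrome and return the text of one element, waiting
 * until that element is present in the DOM.
 *
 * Hypothetical helper: $url and $selector are supplied by the caller.
 */
function fetchElementText(string $url, string $selector, int $timeoutSeconds = 10): string
{
    $client = Client::createChromeClient();

    try {
        $client->request('GET', $url);

        // Poll until the selector matches; a timeout surfaces as an exception.
        $crawler = $client->waitFor($selector, $timeoutSeconds);

        return $crawler->filter($selector)->text();
    } finally {
        // Shut down the browser and the chromedriver process.
        $client->quit();
    }
}
```

A call such as fetchElementText('http://example.com', '#AuthenticationBlock') would then correspond to the filter example above.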
Solution 2:[2]
So, I'm probably late, but I had the same problem and found a pretty easy solution: just open a plain DomCrawler on the response content.
This crawler differs from the Panther crawler, mainly in the methods it offers, but it is safer for evaluating HTML structures.
$client = Client::createChromeClient();
$client->request('GET', 'http://example.com');
$html = $client->getInternalResponse()->getContent();
$crawler = new Symfony\Component\DomCrawler\Crawler($html);
// you can use following to get the whole HTML
$crawler->outerHtml();
// or specific parts
$crawler->filter('.some-class')->outerHtml();
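When the selector matches several nodes, the same approach can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name snapshotOuterHtml is hypothetical, PHP 7.4+ is assumed for the arrow function, and symfony/css-selector is assumed to be installed so that filter() accepts CSS selectors:

```php
<?php

use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\Panther\Client;

/**
 * Return the outer HTML of every node matching $selector, parsed from a
 * static snapshot of the page source.
 *
 * Hypothetical helper: $url and $selector are supplied by the caller.
 *
 * @return string[]
 */
function snapshotOuterHtml(string $url, string $selector): array
{
    $client = Client::createChromeClient();

    try {
        $client->request('GET', $url);
        // Page source as Chrome reported it when the request finished.
        $html = $client->getInternalResponse()->getContent();
    } finally {
        // The snapshot is a plain string, so the browser can be closed now.
        $client->quit();
    }

    $crawler = new Crawler($html);

    // each() runs the closure once per matched node and collects the results.
    return $crawler->filter($selector)->each(
        static fn (Crawler $node): string => $node->outerHtml()
    );
}
```

Because the standalone crawler works on a string, it no longer needs the browser session, but it also will not see anything the page renders after the snapshot was taken.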
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Dharman |
Solution 2 | Sengorius |