How to handle "load more"/"show more" pagination with scrapy-selenium
I'm getting a response but scraping nothing!
```python
import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep


class ProductSpider(scrapy.Spider):
    name = "card"
    allowed_domains = ['moneyfacts.co.uk']
    start_urls = ['https://moneyfacts.co.uk/credit-cards/balance-transfer-credit-cards/?fbclid=IwAR05-Sa1hIcYTRx8DXYYQd0UfDRjWF-jD2-u51jiLP-WKlkxSddKjzUcnWA']

    def __init__(self):
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)
        actions = ActionChains(self.driver)
        while True:
            next = self.driver.find_elements_by_css_selector("button#show-more")
            if next:
                last_height = self.driver.execute_script("return document.body.scrollHeight")
                self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                actions.move_to_element(next[0]).click().perform()
                lists = Selector(text=self.driver.page_source)
                for list in lists.xpath('//ul[@id="finder-table"]/li'):
                    yield {
                        'Name': list.xpath('.//*[@class="table-item-heading-product-name"]/span/strong/text()').get(),
                        'Title': list.xpath('.//*[@class="table-item-heading-product-name"]/span/text()').get()
                    }
            else:
                break
        self.driver.close()
```
Solution 1:[1]
I guess you need to scroll to the "show more" button before clicking it, since it is not in the visible area of the screen until you scroll down.
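For example, here is a minimal sketch of scrolling the button itself into view with JavaScript rather than scrolling the whole window (this uses the Selenium 4 `By` locator style, and a bare `driver` stands in for the spider's `self.driver`):

```python
from selenium.webdriver.common.by import By

# bring the button into the viewport before clicking it
button = driver.find_element(By.CSS_SELECTOR, "button#show-more")
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", button)
button.click()
```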
Also, it's better to locate the element by a class name (or ID) rather than by its text.
Also, once there are no more "show more" buttons on the page, your code will throw an exception. So I used `find_elements` instead of what you wrote to get the list of elements; it never throws. If no elements are found it returns an empty list and your code exits normally, and if an element is found you use the first element of the returned list.
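To illustrate the difference (a sketch, again in the Selenium 4 locator style): `find_element` raises `NoSuchElementException` when nothing matches, while `find_elements` simply returns a list you can test:

```python
from selenium.webdriver.common.by import By

buttons = driver.find_elements(By.CSS_SELECTOR, "button#show-more")
if buttons:
    buttons[0].click()  # at least one match: click the first one
else:
    pass  # empty list, no exception raised: safe to stop the loop
```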
This is what I finally got after rebuilding your code. I also moved the item extraction out of the click loop, so each row is yielded only once after the page is fully expanded:
```python
import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep


class ProductSpider(scrapy.Spider):
    name = "card"
    allowed_domains = ['moneyfacts.co.uk']
    start_urls = ['https://moneyfacts.co.uk/credit-cards/balance-transfer-credit-cards/?fbclid=IwAR05-Sa1hIcYTRx8DXYYQd0UfDRjWF-jD2-u51jiLP-WKlkxSddKjzUcnWA']

    def __init__(self):
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            # find_elements returns an empty list instead of raising when the
            # button is gone, so the loop can end cleanly
            buttons = self.driver.find_elements_by_css_selector("button#show-more")
            if not buttons:
                break
            # bring the button into view, then click it via a fresh action chain
            self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            ActionChains(self.driver).move_to_element(buttons[0]).click().perform()
            sleep(2)  # give the newly revealed rows time to load
        # parse the fully expanded page once, so each row is yielded only once
        lists = Selector(text=self.driver.page_source)
        for item in lists.xpath('//ul[@id="finder-table"]/li'):
            yield {
                'Name': item.xpath('.//*[@class="table-item-heading-product-name"]/span/strong/text()').get(),
                'Title': item.xpath('.//*[@class="table-item-heading-product-name"]/span/text()').get(),
            }
        self.driver.close()
```
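One more note: the fixed `sleep()` above is the simplest way to wait for the new rows, but an explicit wait is usually more robust. A sketch using Selenium's `WebDriverWait` (the 10-second timeout is an arbitrary choice):

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the button to become clickable, then click it
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#show-more"))).click()
```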
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |