Extract table from the website using scrapy
I want to scrape the text from a table. The spider below returns data, but not in the format shown in the picture.
from scrapy import Spider
from scrapy.http import Request

class AuthorSpider(Spider):
    name = 'book'
    start_urls = ['https://blogsrl.it/gb/4-no-food']

    def parse(self, response):
        books = response.xpath("//h3/a/@href").extract()
        for book in books:
            url = response.urljoin(book)
            yield Request(url, callback=self.parse_book)

    def parse_book(self, response):
        rows = response.xpath("//dl[@class='data-sheet']")
        details = {}
        for row in rows:
            key = row.xpath('.//dt//text()').get(default='').strip()
            value = row.xpath('.//dd/text()').getall()
            value = ''.join(value).strip()
            details[key] = value
        yield details
Solution 1:[1]
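I think the issue is with your XPath. It targets only one element, the dl itself, so rows holds a single item rather than one item per table row: the loop runs once, .get() picks up only the first dt, and all the dd text is joined into one string.
Perhaps try selecting the dt and dd elements themselves: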
rows = response.xpath("//dl[@class='data-sheet']//dt | //dl[@class='data-sheet']//dd")
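Note that selecting dt and dd together gives one flat list that alternates between labels and values, so the loop body in parse_book would need to change as well. One way to build the dictionary is to collect the dt texts and the dd texts separately and pair them by position. The sketch below is a minimal illustration of that pairing, not a tested fix for this site: the helper name pair_data_sheet and the sample values are made up, and it assumes every dt has exactly one matching dd with a single text node.

```python
def pair_data_sheet(terms, definitions):
    """Pair each <dt> text with the <dd> text at the same position."""
    details = {}
    for key, value in zip(terms, definitions):
        # Strip the surrounding whitespace that scraped text nodes often carry
        details[key.strip()] = value.strip()
    return details

# Inside the spider, the two lists could come from (assumed page structure):
#   terms = response.xpath("//dl[@class='data-sheet']/dt/text()").getall()
#   definitions = response.xpath("//dl[@class='data-sheet']/dd/text()").getall()
#   yield pair_data_sheet(terms, definitions)

# Made-up sample values standing in for the scraped text
print(pair_data_sheet([" Brand ", "Weight"], ["Acme", " 2 kg "]))
```

If a dd contains nested tags, its text is split across several nodes and the two lists fall out of step. In that case it may be safer to loop over the dd elements and call .xpath("normalize-space()").get() on each one.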
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | CodeWithAwais |