Iterate through the links of a website recursively
I am new to Scrapy and I wrote a function that should recursively iterate through all the links of a website and return the links ending in '.csv' or '.pdf' that contain the word 'XYZ'. Here is what I wrote, but it returns nothing. What am I doing wrong?
def parse(self, response):
    for each in response.xpath('//a/@href').getall():
        if each.endswith(".csv") or each.endswith(".pdf") and "XYZ" in each:
            mylist.append(each)
        yield response.follow(each, self.parse)
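As a side note on the condition in the code above: in Python, `and` binds more tightly than `or`, so without parentheses the test does not mean what the question describes. A minimal, self-contained demonstration (the link strings are made up for illustration):

```python
# The question's condition parses as:
#   each.endswith(".csv") or (each.endswith(".pdf") and "XYZ" in each)
# so every .csv link passes, whether or not it contains "XYZ".

def matches_as_written(each):
    return each.endswith(".csv") or each.endswith(".pdf") and "XYZ" in each

def matches_intended(each):
    # Parentheses force the intended grouping.
    return (each.endswith(".csv") or each.endswith(".pdf")) and "XYZ" in each

print(matches_as_written("report.csv"))    # True  — passes without "XYZ"
print(matches_intended("report.csv"))      # False
print(matches_intended("XYZ-report.csv"))  # True
```

This precedence slip alone would not make the spider return nothing, but it would make the collected list include unwanted .csv links once the crawl works.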
Solution 1:[1]
Please provide an example of the URL you're working with, because it helps detect more quickly whether the issues are XPath-related or something else.

My guess is the following:

- You likely want to add 4 spaces to the yield so it's within the if statement, given you only want the links ending in .csv or .pdf.
- Your xpath is not returning the links, so nothing is being followed.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | dollar bill |