Get all link text and href in a page using Scrapy

class LinkSpider(scrapy.Spider):
    name = "link"
    def start_requests(self):
        urlBasang = "https://bloomberg.com"
        yield scrapy.Request(url = urlBasang, callback = self.parse)
    def parse(self, response):
        newCsv = open('data_information/link.csv', 'a')
        for j in response.xpath('//a'):

            title_to_save = j.xpath('/text()').extract_first()
            href_to_save= j.xpath('/@href').extract_first()

            print("test")

            print(title_to_save)
            print(href_to_save)

            newCsv.write(title_to_save+ "\n")
        newCsv.close()

This is my code, but title_to_save and href_to_save are both None.

I want to get the text inside every "a" tag, along with its href.



Solution 1:[1]

You want

title_to_save = j.xpath('./text()').get()
href_to_save= j.xpath('./@href').get()

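Note the dot before the path. Without it, '/text()' and '/@href' are absolute paths that are evaluated from the document root rather than from the current "a" element, so they match nothing and you get None. (I use get instead of extract_first because it is the name the current Scrapy documentation recommends; the two behave the same.)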

As for the CSV output: perhaps you are already aware, but you should probably yield the information you want to write out and then run your spider with the -o option (for example, scrapy crawl link -o data_information/link.csv). That is a bit more flexible than opening a file for appending in your parse method. Your code would then look something like this:

import scrapy


class LinkSpider(scrapy.Spider):
    name = "link"
    # No need for start_requests, as requesting start_urls is the default anyway
    start_urls = ["https://bloomberg.com"]

    def parse(self, response):
        for j in response.xpath('//a'):
            # The leading dot makes each path relative to the current <a> element
            title_to_save = j.xpath('./text()').get()
            href_to_save = j.xpath('./@href').get()

            # Yielded items are written out by the feed export (-o option)
            yield {'title': title_to_save, 'href': href_to_save}

Solution 2:[2]

url: https://ingatlan.com/lista/elado+lakas+budapest

My snippet is:

'url': product.xpath("//a[@class='listing__thumbnail js-listing-active-area']/@href").get()

I get the same URL in every item of the output (the one ending in epitesu-lakas/32609638"}). Everything else is fine, but something goes wrong when fetching the href attribute:

{"eladasi_ar": " 29.65 M Ft ", "pernm2": " 417 606 Ft/m", "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": null, "pernm2": null, "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": " 59 M Ft ", "pernm2": " 1 229 167 Ft/m", "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": null, "pernm2": null, "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": " 25.35 M Ft ", "pernm2": " 507 000 Ft/m", "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": null, "pernm2": null, "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tomjn
Solution 2 Andronicus