Get all link text and href in a page using scrapy
import scrapy

class LinkSpider(scrapy.Spider):
    name = "link"

    def start_requests(self):
        urlBasang = "https://bloomberg.com"
        yield scrapy.Request(url=urlBasang, callback=self.parse)

    def parse(self, response):
        newCsv = open('data_information/link.csv', 'a')
        for j in response.xpath('//a'):
            title_to_save = j.xpath('/text()').extract_first()
            href_to_save = j.xpath('/@href').extract_first()
            print("test")
            print(title_to_save)
            print(href_to_save)
            newCsv.write(title_to_save + "\n")
        newCsv.close()
This is my code, but title_to_save and href_to_save are both None.
I want to get the text inside every "a" tag along with its href.
Solution 1:[1]
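You want
title_to_save = j.xpath('./text()').get()
href_to_save = j.xpath('./@href').get()
Note the dot before the path. It makes the XPath relative to the current "a" element. Without it, '/text()' is an absolute path evaluated from the document root, where it matches nothing, so you get None.
(I use get instead of extract_first because the Scrapy documentation now uses get and getall; extract_first still works.)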
On the output CSV: you may already know this, but you should probably yield
the information you want to write out and then run your spider with the -o data_information/link.csv
option, which is a bit more flexible than opening a file for appending in your parse
method. So your code would look something like
import scrapy

class LinkSpider(scrapy.Spider):
    name = "link"
    # No need for start_requests, as requesting start_urls is the default anyway
    start_urls = ["https://bloomberg.com"]

    def parse(self, response):
        for j in response.xpath('//a'):
            title_to_save = j.xpath('./text()').get()
            href_to_save = j.xpath('./@href').get()
            print("test")
            print(title_to_save)
            print(href_to_save)
            # Yield both fields so the -o export contains the text and the href
            yield {'title': title_to_save, 'href': href_to_save}
Solution 2:[2]
url: https://ingatlan.com/lista/elado+lakas+budapest
My snippet is:
'url': product.xpath("//a[@class='listing__thumbnail js-listing-active-area']/@href").get()
I get the same URL (ending in epitesu-lakas/32609638) in every item of the output. Everything else is fine, but fetching the href attribute goes wrong:
{"eladasi_ar": " 29.65 M Ft ", "pernm2": " 417 606 Ft/m", "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": null, "pernm2": null, "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": " 59 M Ft ", "pernm2": " 1 229 167 Ft/m", "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": null, "pernm2": null, "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": " 25.35 M Ft ", "pernm2": " 507 000 Ft/m", "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
{"eladasi_ar": null, "pernm2": null, "szoba_szam": null, "url": "/xi-ker/elado+lakas/tegla-epitesu-lakas/32609638"},
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | tomjn |
Solution 2 | Andronicus |