Scraping the first post from a phpBB3 forum with Python
I have a link like this:
http://www.arabcomics.net/phpbb3/viewtopic.php?f=98&t=71718
The page is a topic on a phpBB3 forum, and its first post contains several links.
How can I get the links in the first post?
I tried this, but it does not work:
import requests
from bs4 import BeautifulSoup as bs
url = 'http://www.arabcomics.net/phpbb3/viewtopic.php?f=98&t=71718'
response = requests.get(url)
soup = bs(response.text, 'html5lib')
itemstr = soup.findAll('div', {'class': 'postbody'})
for link in itemstr.findAll('a'):
    links = link.get('href')
    print(links)
Solution 1:[1]
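Honestly, you can just use a regex for this; there is no need for bs. The pattern below matches any URL that is followed by the attribute class="postlink", so it keeps working as long as the forum still marks links inside posts with that class.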
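import re
# response is the requests.get(url) result from the question
myurlregex = re.compile(r'''(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))\" class=\"postlink\"''')
# The pattern has several capture groups, so findall() returns tuples; group 1 of each match is the URL
urls = [m.group(1) for m in myurlregex.finditer(response.text)]
print(urls)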
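To see how anchoring on class="postlink" behaves without hitting the network, here is a minimal sketch on a made-up HTML fragment. It uses a deliberately simpler pattern than the one above (my simplification, not the original pattern), and it assumes the href attribute comes directly before the class attribute, as in the markup the pattern above targets. The names POSTLINK_RE, postlinks and sample are hypothetical. Since HTML source writes & as &amp;, the sketch also decodes entities before returning the URLs.

```python
import re
from html import unescape

# Simplified pattern (an assumption, not the pattern from the answer):
# capture the href value of a tag whose next attribute is class="postlink".
POSTLINK_RE = re.compile(r'href="([^"]+)"\s+class="postlink"', re.IGNORECASE)

def postlinks(html):
    """Return every postlink URL found in html, with entities like &amp; decoded."""
    return [unescape(m.group(1)) for m in POSTLINK_RE.finditer(html)]

# Made-up fragment standing in for response.text
sample = (
    '<div class="postbody">'
    '<a href="http://example.com/a?x=1&amp;y=2" class="postlink">one</a> '
    '<a href="http://example.com/b" class="postlink">two</a> '
    '<a href="./memberlist.php" class="username">user</a>'
    '</div>'
)

print(postlinks(sample))
```

Note that a pattern like this matches postlinks anywhere on the page, not only in the first post, so it fits best when the topic page has a single post or when you want every posted link.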
Also, as a programmer, regex is one of the skills you will need again and again.
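If you would rather stay with BeautifulSoup, the attempt in the question fails because findAll() returns a list-like ResultSet, and a ResultSet has no findAll() method of its own. A minimal sketch of a fix follows, assuming the first div with class postbody on the page is the first post (an assumption about the forum theme, not checked against this site). The function name first_post_links and the sample HTML are hypothetical, and the built-in html.parser is used here instead of html5lib only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

def first_post_links(html):
    """Return the href of every <a> inside the first div.postbody in html."""
    soup = BeautifulSoup(html, 'html.parser')
    # find() returns a single Tag (or None); findAll() would return a
    # ResultSet, which cannot be searched with findAll() again.
    first_post = soup.find('div', class_='postbody')
    if first_post is None:
        return []
    # href=True skips anchors that have no href attribute
    return [a['href'] for a in first_post.find_all('a', href=True)]

# Made-up page with two posts; only links from the first should be returned
sample = (
    '<div class="postbody"><a href="http://example.com/1">1</a>'
    '<a href="http://example.com/2">2</a></div>'
    '<div class="postbody"><a href="http://example.com/reply">r</a></div>'
)

print(first_post_links(sample))
```

With the real page you would pass requests.get(url).text instead of sample. The post body may also contain profile or signature links, so you may still need to filter the result, for example by keeping only anchors with class postlink.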
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Strings |