Scraping Wikipedia for information with Beautiful Soup
I managed to scrape Wikipedia for the names of US presidents using Beautiful Soup, and then converted them into a dataframe.
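import pandas as pd

# Sample values taken from the output below; the real lists hold one entry per president
names = ['George Washington', 'John Adams', 'Thomas Jefferson']
wiki = ['/wiki/George_Washington', '/wiki/John_Adams', '/wiki/Thomas_Jefferson']
combine = {'name': names, 'wiki_url': wiki}
df = pd.DataFrame(combine)
df.index.name = 'id'
display(df)  # display() is available in Jupyter/IPython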
id name wiki_url father mother
0 George Washington /wiki/George_Washington
1 John Adams /wiki/John_Adams
2 Thomas Jefferson /wiki/Thomas_Jefferson
How do I scrape through each president entry and update the dataframe with their father's and mother's names using Beautiful Soup?
I know that each page has an HTML <table class="infobox vcard"> element, which contains the father's and mother's names.
Solution 1:[1]
from bs4 import BeautifulSoup
import requests

# Download the page and parse it
r = requests.get('https://en.wikipedia.org/wiki/George_Washington').text
s = BeautifulSoup(r, 'lxml')

# Find the infobox row labelled "Parent(s)" and print each linked name in it
for th in s.find_all('th', class_='infobox-label'):
    if th.text == "Parent(s)":
        td = th.next_sibling
        for a in td.find_all('a'):
            print(a.text)
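To apply the same idea to every row of the dataframe from the question, something like the following might work. It is only a sketch under several assumptions: the helper names extract_parents and add_parent_columns are made up for illustration, the label may appear as "Parent(s)", "Parents" or "Parent" depending on the page, the User-Agent string is a placeholder, and the first linked name is treated as the father and the second as the mother, which the infobox does not guarantee. It uses the built-in 'html.parser' instead of 'lxml' to avoid the extra dependency.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org"
# Assumption: label text varies between pages, so accept a few variants.
PARENT_LABELS = {"Parent(s)", "Parents", "Parent"}


def extract_parents(html):
    """Return the article-linked names from the infobox row labelled as parents."""
    soup = BeautifulSoup(html, "html.parser")
    for th in soup.find_all("th", class_="infobox-label"):
        if th.get_text(strip=True) in PARENT_LABELS:
            td = th.find_next_sibling("td")
            if td is None:
                return []
            # Keep only links to articles, skipping footnote links like "[1]".
            return [
                a.get_text(strip=True)
                for a in td.find_all("a", href=True)
                if a["href"].startswith("/wiki/")
            ]
    return []


def fetch_page(url):
    """Download one page; the User-Agent value is a hypothetical placeholder."""
    headers = {"User-Agent": "presidents-scraper-example/0.1"}
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.text


def add_parent_columns(df, fetch=fetch_page):
    """Return a copy of df with 'father' and 'mother' filled from each wiki_url."""
    df = df.copy()
    fathers, mothers = [], []
    for path in df["wiki_url"]:
        parents = extract_parents(fetch(BASE_URL + path))
        # Assumption: first link is the father, second is the mother.
        fathers.append(parents[0] if len(parents) > 0 else None)
        mothers.append(parents[1] if len(parents) > 1 else None)
    df["father"] = fathers
    df["mother"] = mothers
    return df
```

With the dataframe from the question, calling df = add_parent_columns(df) would request each page in turn. Rows where no matching label is found end up with None in both columns, so it is worth checking those pages by hand.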
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | wallfell00 |