soup.find() function is not working, how do I find the ID value?

I have the following HTML, which was parsed with BeautifulSoup. Can someone explain why print(soup.find(id="style")) or print(soup.find(id="id")) does not work? I am trying to find the ID number, specifically the one in this line:

<td style="text-align:center"><a href="?id=6359075900">6359075900</a></td>

 </span>
<br/><br/>
<table>
<tr>
<th class="outer">Criteria</th>
<td class="outer">Type: Identity    Match: ILIKE    Search: 'example.org'</td>
</tr>
</table>
<br/>
<table>
<tr>
<th class="outer">Certificates</th>
<td class="outer">
<table>
<tr>
<th>
<a href="?q=example.org&amp;dir=v&amp;sort=0&amp;group=none">crt.sh ID</a>
</th>
<th style="white-space:nowrap">
       <a href="?q=example.org&amp;dir=v&amp;sort=1&amp;group=none">Logged At</a> 
 ⇧    </th>
<th style="white-space:nowrap"><a href="?q=example.org&amp;dir=v&amp;sort=2&amp;group=none">Not Before</a>
</th>
<th style="white-space:nowrap"><a href="?q=example.org&amp;dir=v&amp;sort=4&amp;group=none">Not After</a>
</th>
<th>Common Name</th>
<th>Matching Identities</th>
<th>
<a href="?q=example.org&amp;dir=v&amp;sort=3&amp;group=none">Issuer Name</a>
</th>
</tr>
<tr>
<td style="text-align:center"><a href="?id=6359075900">6359075900</a></td>
<td style="text-align:center;white-space:nowrap">2022-03-17</td>
<td style="text-align:center;white-space:nowrap">2022-03-14</td>
<td style="text-align:center;white-space:nowrap">2023-03-14</td>
<td>www.example.org</td>
<td>example.org<br/>www.example.org</td>
<td><a href="?caid=185756" style="white-space:normal">C=US, O=DigiCert Inc, CN=DigiCert TLS RSA SHA256 2020 CA1</a></td>
</tr>

Solution 1:^[1]

This should do it (it will find the displayed number, not the value of the id parameter in the link, but I assume it is the same):

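from bs4 import BeautifulSoup
import re

# Parse the saved page (index.html is your HTML)
with open("index.html") as f:
    soup = BeautifulSoup(f, 'html.parser')

# Match every tag whose href contains "?id" (raw string, so \? is a literal "?")
res = soup.find_all(href=re.compile(r"\?id"))
print(res[0].contents[0])   # 6359075900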

This works with your example. If you have more than one link with data to extract, you will need to adjust the regex passed to re.compile and iterate through the results instead of using a hardcoded index such as the [0] in the code above.
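The iteration mentioned above could look roughly like the sketch below, which also reads the value of the id parameter from the link itself rather than the displayed text. The helper name extract_ids and the shortened sample rows are assumptions made for illustration; urlparse and parse_qs come from the standard library.

```python
import re
from urllib.parse import urlparse, parse_qs

from bs4 import BeautifulSoup

# Shortened sample rows (assumed for illustration, based on the question's HTML)
html = '''
<table>
<tr><td><a href="?id=6359075900">6359075900</a></td>
<td><a href="?caid=185756">C=US, O=DigiCert Inc</a></td></tr>
<tr><td><a href="?id=6359075901">6359075901</a></td></tr>
</table>
'''
soup = BeautifulSoup(html, "html.parser")


def extract_ids(soup):
    """Hypothetical helper: collect the id query parameter of every matching link."""
    ids = []
    # Anchor the regex so that only hrefs starting with "?id=" match
    for a in soup.find_all("a", href=re.compile(r"^\?id=")):
        query = parse_qs(urlparse(a["href"]).query)
        ids.extend(query.get("id", []))
    return ids


print(extract_ids(soup))  # ['6359075900', '6359075901']
```

Reading the parameter from href avoids relying on the link text matching the id, which is only an assumption about the page.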

Solution 2:^[2]

The main issue is that there is no tag with an attribute called id in your soup, so find(id=...) won't find anything.
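To make the difference concrete, here is a small sketch of what the id keyword actually matches. The div with id="main" is a hypothetical addition for contrast and is not part of the original page.

```python
from bs4 import BeautifulSoup

# One cell from the question, wrapped in a hypothetical <div id="main"> for contrast
html = '''
<div id="main">
<td style="text-align:center"><a href="?id=6359075900">6359075900</a></td>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# id="style" asks for a tag whose id attribute equals "style"; there is none
missing = soup.find(id="style")
# This matches, because the div really has an id attribute
by_id = soup.find(id="main")
# To filter on the style attribute, pass it as its own keyword
by_style = soup.find("td", style="text-align:center")
# href=True matches any <a> tag that has an href attribute at all
link = soup.find("a", href=True)

print(missing)          # None
print(by_id.name)       # div
print(by_style.a.text)  # 6359075900
print(link["href"])     # ?id=6359075900
```

In other words, the keyword name is the attribute to look at and the value is what that attribute must contain, so the "id" inside the href value is never considered by find(id=...).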

Try to select your elements more specifically, e.g. with CSS selectors. The following selects all a tags whose href contains the parameter ?id:

soup.select('a[href*="?id"]')

Example

from bs4 import BeautifulSoup

html = '''
<tr>
<td style="text-align:center"><a href="?id=6359075900">6359075900</a></td>
<td><a href="?caid=185756" style="white-space:normal">C=US, O=DigiCert Inc, CN=DigiCert TLS RSA SHA256 2020 CA1</a></td>
</tr>
<tr>
<td style="text-align:center"><a href="?id=6359075901">6359075901</a></td>
<td><a href="?caid=185756" style="white-space:normal">C=US, O=DigiCert Inc, CN=DigiCert TLS RSA SHA256 2020 CA1</a></td>
</tr>
<tr>
<td style="text-align:center"><a href="?id=6359075902">6359075902</a></td>
<td><a href="?caid=185756" style="white-space:normal">C=US, O=DigiCert Inc, CN=DigiCert TLS RSA SHA256 2020 CA1</a></td>
</tr>
'''
soup = BeautifulSoup(html, 'html.parser')
for a in soup.select('a[href*="?id"]'):
    print(a.text)

Output

6359075900
6359075901
6359075902
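As a variation, the selector can be anchored to the start of the attribute with ^= and the value read from href instead of the link text. This is only a sketch; the shortened rows and the variable names links and ids_from_href are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Shortened sample rows (assumed for illustration)
html = '''
<table>
<tr>
<td style="text-align:center"><a href="?id=6359075900">6359075900</a></td>
<td><a href="?caid=185756">C=US, O=DigiCert Inc</a></td>
</tr>
<tr>
<td style="text-align:center"><a href="?id=6359075901">6359075901</a></td>
<td><a href="?caid=185756">C=US, O=DigiCert Inc</a></td>
</tr>
</table>
'''
soup = BeautifulSoup(html, "html.parser")

# ^= means "attribute value starts with", so "?caid=..." links are excluded
links = soup.select('td > a[href^="?id="]')

# Take everything after the first "=" in the href as the id value
ids_from_href = [a["href"].split("=", 1)[1] for a in links]
print(ids_from_href)  # ['6359075900', '6359075901']
```

The split on "=" assumes the href carries only the single id parameter, as in the example rows.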

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	evilmandarine
Solution 2	HedgeHog
