Using lxml to query a website in Python 3?

I am trying to query the following website (HPBD) using lxml in Python 3 and need to scrape some information from each search result. So far I have a list of about 100 titles to search for. The first five are shown below as an example:

for book in BOOK_TITLES:
    print('-', book, '\n')


- Apologie des ministres du culte, qui ont prêté la déclaration exigée par la loi du 7 vendém. an 4. contre les critiques de mm. Dédoyar & Vanhoren, les Motifs de Malines & autres brochures 

- Observations sur la déclaration exigée des ministres des cultes, en vertu de la loi du 7 vendémiaire, an 4 

- l'Eloquence chrétienne dans l'idée et dans la pratique 

- Verscheyde leeringen en exempelen der oude vaders 

- Godtvruchtige leeringen en gebeden voor de eerste communie

However, I am quite new to web scraping. I started by experimenting with BeautifulSoup to check whether the approach was sound, but could not get anything useful out of it (see code below).

import urllib.parse

import requests
from bs4 import BeautifulSoup as BS


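for book in BOOK_TITLES:
    # Build the search URL by appending the percent-encoded title
    text = requests.get('https://kxp.k10plus.de/DB=1.77/SET=5/TTL=1/CMD?MATCFILTER=N&MATCSET=N&ACT0=&ACT=SRCHA&IKT=1016&SRT=YOP&ADI_IKT9200=&TRM=' + urllib.parse.quote(book)).text
    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BS(text, 'html.parser')
    print(soup)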

The problem is that the variable part of the URL (the search term), e.g. for title 3 in the list above, is:

l%27Eloquence+chrétienne+dans+l%27idée+et+dans+la+pratique

While urllib.parse.quote(book) returns:

l%27Eloquence%20chr%C3%A9tienne%20dans%20l%27id%C3%A9e%20et%20dans%20la%20pratique
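
The difference between the two strings can be reproduced with the standard library alone. The sketch below compares urllib.parse.quote, which encodes spaces as %20, with urllib.parse.quote_plus, which encodes them as + as in the browser URL. The é shows up as %C3%A9 in both outputs; browsers typically display such characters decoded in the address bar, so that part is probably not a real mismatch. The helper name build_search_url is hypothetical, and the base URL is simply copied from the code above; whether the site then returns the expected results has not been verified here.

```python
from urllib.parse import quote, quote_plus

# Base URL copied from the question; everything after TRM= is the search term.
BASE_URL = (
    "https://kxp.k10plus.de/DB=1.77/SET=5/TTL=1/CMD"
    "?MATCFILTER=N&MATCSET=N&ACT0=&ACT=SRCHA&IKT=1016&SRT=YOP&ADI_IKT9200=&TRM="
)


def build_search_url(title):
    """Hypothetical helper: append the form-encoded title to the base URL."""
    return BASE_URL + quote_plus(title)


title = "l'Eloquence chrétienne dans l'idée et dans la pratique"

# quote() turns spaces into %20
print(quote(title))
# quote_plus() turns spaces into +, matching the form-style query string
print(quote_plus(title))
print(build_search_url(title))
```

Swapping urllib.parse.quote(book) for urllib.parse.quote_plus(book) in the loop would therefore produce the + separated form of the search term.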

I am using Google Chrome and Python 3. Any fix for the above code and suggestions for alternatives using lxml are appreciated.
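
For the lxml side, a minimal sketch of parsing a downloaded page is given below. It assumes nothing about the actual markup of the result pages: the function name extract_link_texts is hypothetical, the XPath expression //a[@href] is only a generic placeholder that collects every link, and the sample HTML is invented for illustration. A real scraper would need XPath expressions written against the site's own result markup.

```python
from lxml import html


def extract_link_texts(page_source):
    """Hypothetical helper: return the page title and the text of every link."""
    tree = html.fromstring(page_source)
    # <title> of the document, or None if the page has none
    page_title = tree.findtext(".//title")
    # Placeholder XPath: every <a> with an href, not tailored to any real site
    link_texts = [a.text_content().strip() for a in tree.xpath("//a[@href]")]
    # Drop links that have no visible text
    return page_title, [text for text in link_texts if text]


# Invented sample document standing in for a downloaded result page
sample = (
    "<html><head><title>Results</title></head><body>"
    "<a href='/r1'>First hit</a>"
    "<a href='/r2'> Second hit </a>"
    "<a href='/r3'></a>"
    "</body></html>"
)

page_title, link_texts = extract_link_texts(sample)
print(page_title)
print(link_texts)
```

In the loop above, the text returned by requests.get(...) could be passed to such a function in place of the sample string.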

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
