How to scrape all data from the first page to the last page using BeautifulSoup
I have been trying to scrape all data from the first page to the last page, but it returns only the first page as the output. How can I solve this? Below is my code:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from time import sleep
from random import randint

pages = np.arange(2, 1589, 20)

for page in pages:
    page = requests.get("https://estateintel.com/app/projects/search?q=%7B%22sectors%22%3A%5B%22residential%22%5D%7D&page=" + str(page))
    sleep(randint(2, 10))
    soup = BeautifulSoup(page.content, 'html.parser')
    lists = soup.find_all('div', class_="project-card-vertical h-full flex flex-col rounded border-thin border-inactive-blue overflow-hidden pointer")
    for list in lists:
        title = list.find('p', class_="project-location text-body text-base mb-3").text.replace('\n', '').strip()
        location = list.find('span', class_="text-gray-1").text.replace('\n', '').strip()
        status = list.find('span', class_="text-purple-1 font-bold").text.replace('\n', '').strip()
        units = list.find('span', class_="text-body font-semibold").text.replace('\n', '').strip()
        info = [title, location, status, units]
        print(info)
Solution 1:[1]
The page is loaded dynamically through an API, so a regular GET request for the HTML will always return only the first page of results. You need to study how the page communicates with the server in the browser's developer tools and find the request you need. I wrote an example for review:
import requests

def get_info(page):
    # Query the JSON API directly instead of parsing the rendered HTML
    url = f"https://services.estateintel.com/api/v2/properties?type[]=residential&page={page}"
    headers = {
        'accept': 'application/json',
        'authorization': 'false',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    for data in response.json()['data']:
        print(data['name'])
        print(data['area'], data['state'])
        print(data['status'])
        print(data['size']['value'], data['size']['unit'])
        print('------')

for page in range(1, 134):
    get_info(page)
This is just an example; you can choose whichever fields you need, and you can also collect them into a DataFrame instead of printing (a sketch follows the output below). Output:
Twin Oaks Apartment
Kilimani Nairobi
Completed
0 units
------
Duchess Park
Lavington Nairobi
Completed
62 units
------
Greenvale Apartments
Kileleshwa Nairobi
Completed
36 units
------
The Urban apartments & Suites
Osu Greater Accra
Completed
28 units
------
Chateau Towers
Osu Greater Accra
Completed
120 units
------
Cedar Haus Gardens
Oluyole Oyo
Under Construction
38 units
------
10 Agoro Street
Oluyole Oyo
Completed
1 units
..............
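As a minimal sketch of the "collect into a DataFrame" suggestion above, the same records can be accumulated into a list of dicts and handed to pandas. The endpoint, headers, field names, and the page count of 133 are taken from the example above; the scrape_all helper and rows list are illustrative names, not part of the original answer:

import pandas as pd
import requests

HEADERS = {
    'accept': 'application/json',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36'
}

def scrape_all(last_page):
    # Collect every page's records into a list of dicts, then build a DataFrame
    rows = []
    for page in range(1, last_page + 1):
        url = f"https://services.estateintel.com/api/v2/properties?type[]=residential&page={page}"
        payload = requests.get(url, headers=HEADERS).json()
        for item in payload['data']:
            rows.append({
                'name': item['name'],
                'location': f"{item['area']} {item['state']}",
                'status': item['status'],
                'units': f"{item['size']['value']} {item['size']['unit']}",
            })
    return pd.DataFrame(rows)

df = scrape_all(133)
print(df.head())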
Solution 2:[2]
Think it is working well, it just needs time because of the sleep between requests. Just in case, you could select your elements more specifically, e.g. with CSS selectors, and store the information in a list of dicts instead of just printing it.
Example
import pandas as pd
import requests
from bs4 import BeautifulSoup
from time import sleep
from random import randint

data = []

for page in range(1, 134):
    print(page)
    response = requests.get("https://estateintel.com/app/projects/search?q=%7B%22sectors%22%3A%5B%22residential%22%5D%7D&page=" + str(page))
    sleep(randint(2, 10))
    soup = BeautifulSoup(response.content, 'html.parser')

    for item in soup.select('div.project-grid > a'):
        data.append({
            'title': item.h3.text.strip(),
            'location': item.find('span', class_="text-gray-1").text.strip(),
            'status': item.find('span', class_="text-purple-1 font-bold").text.strip(),
            'units': item.find('span', class_="text-body font-semibold").text.strip()
        })

pd.DataFrame(data)
Output
 | title | location | status | units |
---|---|---|---|---|
0 | Twin Oaks Apartment | Kilimani, Nairobi | Completed | Size: -- |
1 | Duchess Park | Lavington, Nairobi | Completed | Size: 62 units |
2 | Greenvale Apartments | Kileleshwa, Nairobi | Completed | Size: 36 units |
3 | The Urban apartments & Suites | Osu, Greater Accra | Completed | Size: 28 units |
4 | Chateau Towers | Osu, Greater Accra | Completed | Size: 120 units |
5 | Cedar Haus Gardens | Oluyole, Oyo | Under Construction | Size: 38 units |
6 | 10 Agoro Street | Oluyole, Oyo | Completed | Size: 1 units |
7 | Villa O | Oluyole, Oyo | Completed | Size: 2 units |
8 | Avenue Road Apartments | Oluyole, Oyo | Completed | Size: 6 units |
9 | 15 Alafia Street | Oluyole, Oyo | Completed | Size: 4 units |
10 | 12 Saint Mary Street | Oluyole, Oyo | Nearing Completion | Size: 8 units |
11 | RATCON Estate | Oluyole, Oyo | Completed | Size: -- |
12 | 1 Goodwill Road | Oluyole, Oyo | Completed | Size: 4 units |
13 | Anike's Court | Oluyole, Oyo | Completed | Size: 3 units |
14 | 9 Adeyemo Quarters | Oluyole, Oyo | Completed | Size: 4 units |
15 | Marigold Residency | Nairobi West, Nairobi | Under Construction | Size: -- |
16 | Kings Distinction | Kilimani, Nairobi | Completed | Size: -- |
17 | Riverview Apartments | Kyumvi, Machakos | Completed | Size: -- |
18 | Serene Park | Kyumvi, Machakos | Under Construction | Size: -- |
19 | Gitanga Duplexes | Lavington, Nairobi | Under Construction | Size: 36 units |
20 | Westpointe Apartments | Upper Hill, Nairobi | Completed | Size: 254 units |
21 | 10 Olaoluwa Street | Oluyole, Oyo | Under Construction | Size: 12 units |
22 | Rosslyn Grove | Nairobi West, Nairobi | Under Construction | Size: 90 units |
23 | 7 Kamoru Ajimobi Street | Oluyole, Oyo | Completed | Size: 2 units |
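Note that the units column keeps the raw "Size: …" text from the page. If you need numbers, a small post-processing sketch (the regex assumes the string format shown in the table above; "Size: --" rows become NaN):

import pandas as pd

df = pd.DataFrame(data)
# Extract the digits from strings like "Size: 62 units"; rows with
# "Size: --" contain no digits and are coerced to NaN.
df['units_num'] = pd.to_numeric(df['units'].str.extract(r'(\d+)')[0], errors='coerce')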
Solution 3:[3]
An asynchronous alternative: fetch pages concurrently from the API with httpx and trio instead of making sequential requests.
# pip install trio httpx pandas
import trio
import httpx
import pandas as pd

allin = []
keys1 = ['name', 'area', 'state']
keys2 = ['value', 'unit']

async def scraper(client, page):
    # Pass the page as a per-request parameter; mutating the shared
    # client.params from concurrent tasks would be a race condition.
    r = await client.get('/properties', params={'page': page})
    allin.extend([[i.get(k, 'N/A') for k in keys1] +
                  [i['size'].get(b, 'N/A') for b in keys2]
                  for i in r.json()['data']])

async def main():
    async with httpx.AsyncClient(timeout=None, base_url='https://services.estateintel.com/api/v2') as client, trio.open_nursery() as nurse:
        client.params = {
            'type[]': 'residential'
        }
        for page in range(1, 3):
            nurse.start_soon(scraper, client, page)

    # The nursery has finished all its tasks once the async with block exits
    df = pd.DataFrame(allin, columns=keys1 + keys2)
    print(df)

if __name__ == "__main__":
    trio.run(main)
Output:
0 Cedar Haus Gardens Oluyole Oyo 38 units
1 10 Agoro Street Oluyole Oyo 1 units
2 Villa O Oluyole Oyo 2 units
3 Avenue Road Apartments Oluyole Oyo 6 units
4 15 Alafia Street Oluyole Oyo 4 units
5 12 Saint Mary Street Oluyole Oyo 8 units
6 RATCON Estate Oluyole Oyo 0 units
7 1 Goodwill Road Oluyole Oyo 4 units
8 Anike's Court Oluyole Oyo 3 units
9 9 Adeyemo Quarters Oluyole Oyo 4 units
10 Marigold Residency Nairobi West Nairobi 0 units
11 Riverview Apartments Kyumvi Machakos 0 units
12 Socian Villa Apartments Kileleshwa Nairobi 36 units
13 Kings Pearl Residency Lavington Nairobi 55 units
14 Touchwood Gardens Kilimani Nairobi 32 units
15 Panorama Apartments Upper Hill Nairobi 0 units
16 Gitanga Duplexes Lavington Nairobi 36 units
17 Serene Park Kyumvi Machakos 25 units
18 Kings Distinction Kilimani Nairobi 48 units
19 Twin Oaks Apartment Kilimani Nairobi 0 units
20 Duchess Park Lavington Nairobi 70 units
21 Greenvale Apartments Kileleshwa Nairobi 36 units
22 The Urban apartments & Suites Osu Greater Accra 28 units
23 Chateau Towers Osu Greater Accra 120 units
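One caveat with the concurrent version: starting one task per page fires all requests at once. A minimal sketch of bounding the concurrency with trio.CapacityLimiter, as a drop-in variant of the scraper coroutine above (the limit of five is an illustrative value, not part of the original answer):

import trio

limiter = trio.CapacityLimiter(5)  # at most five requests in flight (illustrative)

async def scraper(client, page):
    # Wait for a free slot before issuing the request, so only a
    # bounded number of pages are fetched concurrently.
    async with limiter:
        r = await client.get('/properties', params={'page': page})
        allin.extend([[i.get(k, 'N/A') for k in keys1] +
                      [i['size'].get(b, 'N/A') for b in keys2]
                      for i in r.json()['data']])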
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Sergey K |
Solution 2 | HedgeHog |
Solution 3 | αԋɱҽԃ αмєяιcαη |