How to scrape all data from the first page to the last page using BeautifulSoup

I have been trying to scrape all data from the first page to the last page, but it returns only the first page as the output. How can I solve this? Below is my code:

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from time import sleep
from random import randint


pages = np.arange(2, 1589, 20)

for page in pages:
    response = requests.get("https://estateintel.com/app/projects/search?q=%7B%22sectors%22%3A%5B%22residential%22%5D%7D&page=" + str(page))
    sleep(randint(2, 10))

    soup = BeautifulSoup(response.content, 'html.parser')
    cards = soup.find_all('div', class_="project-card-vertical h-full flex flex-col rounded border-thin border-inactive-blue overflow-hidden pointer")

    for card in cards:
        title = card.find('p', class_="project-location text-body text-base mb-3").text.replace('\n', '').strip()
        location = card.find('span', class_="text-gray-1").text.replace('\n', '').strip()
        status = card.find('span', class_="text-purple-1 font-bold").text.replace('\n', '').strip()
        units = card.find('span', class_="text-body font-semibold").text.replace('\n', '').strip()

        info = [title, location, status, units]
        print(info)


Solution 1:[1]

The page is loaded dynamically via an API, so a plain GET request for the HTML will always return the first page. You need to study how the page communicates with the server (for example, in your browser's network tab) and find the request you need. I wrote an example for review:

import requests


def get_info(page):
    # query the JSON API directly instead of the rendered HTML page
    url = f"https://services.estateintel.com/api/v2/properties?type[]=residential&page={page}"

    headers = {
        'accept': 'application/json',
        'authorization': 'false',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    for data in response.json()['data']:
        print(data['name'])
        print(data['area'], data['state'])
        print(data['status'])
        print(data['size']['value'], data['size']['unit'])
        print('------')


for page in range(1, 134):
    get_info(page)

These are just example fields; you can choose the ones you need and append them to a DataFrame instead of printing (see the sketch after the output listing). Output:

Twin Oaks Apartment
Kilimani Nairobi
Completed
0 units
------
Duchess Park
Lavington Nairobi
Completed
62 units
------
Greenvale Apartments
Kileleshwa Nairobi
Completed
36 units
------
The Urban apartments & Suites
Osu Greater Accra
Completed
28 units
------
Chateau Towers
Osu Greater Accra
Completed
120 units
------
Cedar Haus Gardens
Oluyole Oyo
Under Construction
38 units
------
10 Agoro Street
Oluyole Oyo
Completed
1 units
..............
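As a follow-up to the DataFrame remark above, here is a minimal sketch that collects the same API fields into a pandas DataFrame and stops paging when the response runs dry, instead of hard-coding the page count. That the API returns an empty 'data' array past the last page is an assumption on my part, not something this answer verifies:

import requests
import pandas as pd

headers = {
    'accept': 'application/json',
    'authorization': 'false',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36'
}

rows = []
page = 1
while True:
    url = f"https://services.estateintel.com/api/v2/properties?type[]=residential&page={page}"
    data = requests.get(url, headers=headers).json()['data']
    if not data:  # assumption: an empty 'data' array marks the last page
        break
    # keep only the fields printed above; add or drop keys as needed
    rows.extend({'name': d['name'], 'area': d['area'], 'state': d['state'],
                 'status': d['status'], 'units': d['size']['value']} for d in data)
    page += 1

df = pd.DataFrame(rows)
print(df)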

Solution 2:[2]

I think this is working well, but each request needs time to sleep. In any case, you could select your elements more specifically, e.g. with CSS selectors, and store the information in a list of dicts instead of just printing it.

Example
import pandas as pd
import requests
from bs4 import BeautifulSoup
from time import sleep
from random import randint

data = []
for page in range(1, 134):
    print(page)
    response = requests.get("https://estateintel.com/app/projects/search?q=%7B%22sectors%22%3A%5B%22residential%22%5D%7D&page=" + str(page))
    sleep(randint(2, 10))

    soup = BeautifulSoup(response.content, 'html.parser')

    for item in soup.select('div.project-grid > a'):
        data.append({
            'title': item.h3.text.strip(),
            'location': item.find('span', class_="text-gray-1").text.strip(),
            'status': item.find('span', class_="text-purple-1 font-bold").text.strip(),
            'units': item.find('span', class_="text-body font-semibold").text.strip()
        })

pd.DataFrame(data)
Output
title location status units
0 Twin Oaks Apartment Kilimani, Nairobi Completed Size: --
1 Duchess Park Lavington, Nairobi Completed Size: 62 units
2 Greenvale Apartments Kileleshwa, Nairobi Completed Size: 36 units
3 The Urban apartments & Suites Osu, Greater Accra Completed Size: 28 units
4 Chateau Towers Osu, Greater Accra Completed Size: 120 units
5 Cedar Haus Gardens Oluyole, Oyo Under Construction Size: 38 units
6 10 Agoro Street Oluyole, Oyo Completed Size: 1 units
7 Villa O Oluyole, Oyo Completed Size: 2 units
8 Avenue Road Apartments Oluyole, Oyo Completed Size: 6 units
9 15 Alafia Street Oluyole, Oyo Completed Size: 4 units
10 12 Saint Mary Street Oluyole, Oyo Nearing Completion Size: 8 units
11 RATCON Estate Oluyole, Oyo Completed Size: --
12 1 Goodwill Road Oluyole, Oyo Completed Size: 4 units
13 Anike's Court Oluyole, Oyo Completed Size: 3 units
14 9 Adeyemo Quarters Oluyole, Oyo Completed Size: 4 units
15 Marigold Residency Nairobi West, Nairobi Under Construction Size: --
16 Kings Distinction Kilimani, Nairobi Completed Size: --
17 Riverview Apartments Kyumvi, Machakos Completed Size: --
18 Serene Park Kyumvi, Machakos Under Construction Size: --
19 Gitanga Duplexes Lavington, Nairobi Under Construction Size: 36 units
20 Westpointe Apartments Upper Hill, Nairobi Completed Size: 254 units
21 10 Olaoluwa Street Oluyole, Oyo Under Construction Size: 12 units
22 Rosslyn Grove Nairobi West, Nairobi Under Construction Size: 90 units
23 7 Kamoru Ajimobi Street Oluyole, Oyo Completed Size: 2 units
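The units column comes back as display strings such as "Size: 62 units" or "Size: --". If you need numeric values, a small post-processing step helps. This is my own addition, not part of the answer; it assumes the count is the first run of digits in the string and treats "Size: --" as missing:

import pandas as pd

df = pd.DataFrame(data)  # 'data' as built by the loop above

# pull the first run of digits out of strings like "Size: 62 units";
# rows showing "Size: --" contain no digits and become NaN
df['units'] = pd.to_numeric(df['units'].str.extract(r'(\d+)', expand=False))
print(df.head())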

Solution 3:[3]

#pip install trio httpx pandas

import trio
import httpx
import pandas as pd

allin = []

keys1 = ['name', 'area', 'state']
keys2 = ['value', 'unit']


async def scraper(client, page):
    # pass the query as per-request params; mutating the shared client.params
    # from concurrently running tasks would be a race condition
    r = await client.get('/properties', params={'type[]': 'residential', 'page': page})
    allin.extend([[i.get(k, 'N/A') for k in keys1] +
                  [i['size'].get(b, 'N/A')
                   for b in keys2] for i in r.json()['data']])


async def main():
    async with httpx.AsyncClient(timeout=None, base_url='https://services.estateintel.com/api/v2') as client, trio.open_nursery() as nurse:
        for page in range(1, 3):
            nurse.start_soon(scraper, client, page)
    df = pd.DataFrame(allin, columns=keys1 + keys2)
    print(df)


if __name__ == "__main__":
    trio.run(main)

Output:

0              Cedar Haus Gardens       Oluyole            Oyo    38  units
1                 10 Agoro Street       Oluyole            Oyo     1  units
2                         Villa O       Oluyole            Oyo     2  units
3          Avenue Road Apartments       Oluyole            Oyo     6  units
4                15 Alafia Street       Oluyole            Oyo     4  units
5            12 Saint Mary Street       Oluyole            Oyo     8  units
6                   RATCON Estate       Oluyole            Oyo     0  units
7                 1 Goodwill Road       Oluyole            Oyo     4  units
8                   Anike's Court       Oluyole            Oyo     3  units
9              9 Adeyemo Quarters       Oluyole            Oyo     4  units
10             Marigold Residency  Nairobi West        Nairobi     0  units
11           Riverview Apartments        Kyumvi       Machakos     0  units
12        Socian Villa Apartments    Kileleshwa        Nairobi    36  units
13          Kings Pearl Residency     Lavington        Nairobi    55  units
14              Touchwood Gardens      Kilimani        Nairobi    32  units
15            Panorama Apartments    Upper Hill        Nairobi     0  units
16               Gitanga Duplexes     Lavington        Nairobi    36  units
17                    Serene Park        Kyumvi       Machakos    25  units
18              Kings Distinction      Kilimani        Nairobi    48  units
19            Twin Oaks Apartment      Kilimani        Nairobi     0  units
20                   Duchess Park     Lavington        Nairobi    70  units
21           Greenvale Apartments    Kileleshwa        Nairobi    36  units
22  The Urban apartments & Suites           Osu  Greater Accra    28  units
23                 Chateau Towers           Osu  Greater Accra   120  units
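The example above fetches only pages 1 and 2; starting all 133 pages at once would fire 133 concurrent requests at the server. One way to scale it up politely is to cap concurrency with a trio.CapacityLimiter, sketched below. The limit of 5 is an arbitrary assumption on my part, not a documented rate limit:

import trio
import httpx
import pandas as pd

allin = []
keys1 = ['name', 'area', 'state']
keys2 = ['value', 'unit']

limiter = trio.CapacityLimiter(5)  # assumption: allow at most 5 requests in flight


async def scraper(client, page):
    async with limiter:  # wait for a free slot before hitting the API
        r = await client.get('/properties', params={'type[]': 'residential', 'page': page})
    allin.extend([[i.get(k, 'N/A') for k in keys1] +
                  [i['size'].get(b, 'N/A') for b in keys2]
                  for i in r.json()['data']])


async def main():
    async with httpx.AsyncClient(timeout=None, base_url='https://services.estateintel.com/api/v2') as client, trio.open_nursery() as nurse:
        for page in range(1, 134):  # all pages, per the other answers
            nurse.start_soon(scraper, client, page)
    df = pd.DataFrame(allin, columns=keys1 + keys2)
    print(df)


if __name__ == "__main__":
    trio.run(main)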

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Sergey K
Solution 2: HedgeHog
Solution 3: αԋɱҽԃ αмєяιcαη