Scraping data from oddsportal.com with Python
I have been trying to extract the data for each cell from this AJAX website; the details for each cell only pop up when the mouse pointer is over the cell.
I have used Python's selenium webdriver and phantomjs to load the page and extract the page_source, but the data wasn't in it. I used Firebug to look for any .json file that the content might be loaded from, but found none.
Please take a look at the site and suggest how I can scrape the content from the hover box displayed when pointing at each cell on the map.
P.S.: I have done a lot of research, both on Stack Overflow and on several other sites, all to no avail.
Solution 1:[1]
The data is there for the taking; all you need to do is take a good look at what's going on behind the scenes. How? Use the Developer Tools and inspect the requests. You'll notice that there are two that have what you need:
- the bookies request -> https://www.oddsportal.com/res/x/bookies-201014103652-1602877009.js
- the data request -> https://fb.oddsportal.com/feed/match/1-1-ld9FDhEI-1-2-yje1c.dat?_=1603009421226
The first one ends with the current time in seconds, the other one with the current time in milliseconds. You have to update that value each time you make a request.
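Since both timestamps need to be fresh on every request, one option is to build the URLs in small helpers instead of once at import time. This is only a sketch: the function names are made up for illustration, and the `bookies-201014103652` prefix and the match id are copied from the two URLs above, so they may have changed since.

```python
import time

# URL templates copied from the two requests listed above.
BOOKIES_URL = "https://www.oddsportal.com/res/x/bookies-201014103652-{ts}.js"
MATCH_URL = "https://fb.oddsportal.com/feed/match/1-1-ld9FDhEI-1-2-yje1c.dat?_={ts}"


def bookies_url(now=None):
    """Bookies URL stamped with the time in whole seconds."""
    now = time.time() if now is None else now
    return BOOKIES_URL.format(ts=int(now))


def match_url(now=None):
    """Odds feed URL stamped with the time in milliseconds."""
    now = time.time() if now is None else now
    return MATCH_URL.format(ts=int(round(now * 1000)))
```

Calling `bookies_url()` and `match_url()` right before each request keeps the timestamps current; the optional `now` argument is there only to make the helpers easy to test.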
The bookies request is a mapping of all bookies from the site. This is how I got the 1xBet code, which is 417. You can easily map that onto the odds data, for example, to fetch the whole odds history for a given bookie or for several bookies. There are plenty of possibilities here.
The odds request is, well, all the data that you see in those tables.
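To illustrate the mapping idea, here is a rough sketch that joins bookie codes to names. The two dictionaries are made-up stand-ins: only the `WebName` field and the 417 code for 1xBet come from this answer, while the `"999"` entry and the `"0"`/`"1"`/`"2"` outcome keys are assumptions about the payload shape.

```python
# Made-up sample data; the real payloads are much larger.
bookies = {
    "417": {"WebName": "1xBet"},
    "999": {"WebName": "ExampleBook"},  # hypothetical bookie
}
odds = {
    "417": {"0": 2.61, "1": 3.5, "2": 2.88},
    "999": {"0": 2.6, "1": 3.4, "2": 2.9},
}


def odds_by_name(bookies, odds, wanted=None):
    """Map bookie names to their odds, optionally limited to some codes."""
    rows = {}
    for code, prices in odds.items():
        if wanted is not None and code not in wanted:
            continue
        # Fall back to the raw code if the bookie is not in the mapping.
        name = bookies.get(code, {}).get("WebName", code)
        # Sorting the keys mirrors the sorted() call in the script.
        rows[name] = [prices[key] for key in sorted(prices)]
    return rows


for name, prices in odds_by_name(bookies, odds, wanted={"417"}).items():
    print(f"{name} - {' '.join(str(p) for p in prices)}")
```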
Then, you need a bit of regex to grab the JSON data that comes back with those two requests.
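The regex step can be tried offline first. In the sketch below, the two strings are made-up, heavily trimmed stand-ins for the response bodies (the real ones are far longer and may be wrapped differently), and `extract_json` is a hypothetical helper; the two patterns are the ones used in the script in this answer.

```python
import json
import re

# Made-up, trimmed-down stand-ins for the two response bodies.
bookies_js = 'var bookmakersData={"417":{"WebName":"1xBet"}};var x={};'
odds_js = (
    "globals.jsonpCallback('/feed/match/1-1-ld9FDhEI-1-2-yje1c.dat', "
    '{"s":1,"d":{"oddsdata":{}}});'
)


def extract_json(pattern, text):
    """Return the JSON object captured by the first group of `pattern`."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found in response")
    return json.loads(match.group(1))


bookies = extract_json(r"bookmakersData=({.*});var", bookies_js)
odds_data = extract_json(r"\.dat',\s({.*})", odds_js)
print(bookies["417"]["WebName"])
print(list(odds_data["d"]))
```

Raising a `ValueError` when nothing matches gives a clearer message than the `IndexError` that indexing an empty `re.findall` result would produce.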
Finally, you can start poking around the payload and get what you need. For example, let's say you're after 1xBet and the X column. So you'd expect something like this:
1xBet - 2.61 3.5 2.88
and the history for the X column (here with the value 3.5 above) could look something like this:
2020-10-18 05:03:45 - 3.58
2020-10-18 04:29:41 - 3.54
2020-10-18 02:53:17 - 3.56
2020-10-17 22:25:56 - 3.58
2020-10-17 17:13:53 - 3.60
2020-10-17 13:03:37 - 3.64
2020-10-17 10:12:06 - 3.66
and so on...
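Each history entry appears to be a triple of value, an unused field, and a Unix timestamp, which is how the script in this answer unpacks it. A small sketch of the formatting step, using made-up rows and a hypothetical `format_history` helper; it converts to UTC explicitly, whereas a plain `datetime.fromtimestamp(timestamp)` uses the local time zone of your machine.

```python
from datetime import datetime, timezone

# Made-up rows in the assumed shape: [value, <unused>, unix_timestamp].
sample_history = [
    ["3.58", 0, 1602997425],
    ["3.54", 0, 1602995381],
]


def format_history(rows):
    """Turn history rows into 'YYYY-MM-DD HH:MM:SS - value' lines (UTC)."""
    lines = []
    for value, _, timestamp in rows:
        when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        lines.append(f"{when:%Y-%m-%d %H:%M:%S} - {value}")
    return lines


for line in format_history(sample_history):
    print(line)
```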
So, putting this all together, here's what I've come up with:
import json
import re
import time
from datetime import datetime
import requests
headers = {
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9,pl;q=0.8",
    "referer": "https://www.oddsportal.com/",
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.99 Safari/537.36"
}


def get_response(url: str) -> str:
    return requests.get(url, headers=headers).text


time_now_s = int(time.time())
time_now_ms = int(round(time.time() * 1000))
bookies_js = f"https://www.oddsportal.com/res/x/bookies-201014103652-{time_now_s}.js"
odds_data_js = f"https://fb.oddsportal.com/feed/match/1-1-ld9FDhEI-1-2-yje1c.dat?_={time_now_ms}"
bookies = json.loads(re.findall(r'bookmakersData=({.*});var', get_response(bookies_js))[0])
odds_data = json.loads(re.findall(r"\.dat',\s({.*})", get_response(odds_data_js))[0])
bookie = bookies['417']['WebName'] # 417 is 1xBet's code
bookies_odds = odds_data['d']['oddsdata']['back']['E-1-2-0-0-0']['odds']['417'] # current odds for a bookie
odds_sorted = dict(sorted(bookies_odds.items())).values() # sorted as on the website 1 - X - 2
print(f"{bookie} - {' '.join(str(i) for i in odds_sorted)}")
history_columns = {
    "1": "4ccecxv464x0xbcsm1",  # 1 column
    "2": "4ccecxv464x0xbcsm2",  # 2 column
    "X": "4ccecxv498x0x0",  # X column
}
# odds history for the X column for a given bookie
history_data = odds_data['d']['history']['back'][history_columns['X']]['417']
for item in history_data:
    value, _, timestamp = item
    print(f"{datetime.fromtimestamp(timestamp)} - {value}")
Outputs:
1xBet - 2.61 3.5 2.88
2020-10-18 05:03:45 - 3.58
2020-10-18 04:29:41 - 3.54
2020-10-18 02:53:17 - 3.56
2020-10-17 22:25:56 - 3.58
2020-10-17 17:13:53 - 3.60
2020-10-17 13:03:37 - 3.64
2020-10-17 10:12:06 - 3.66
2020-10-17 09:58:04 - 3.64
2020-10-17 08:38:52 - 3.62
2020-10-17 08:08:54 - 3.64
2020-10-17 07:44:47 - 3.62
2020-10-17 06:17:33 - 3.64
2020-10-17 06:07:36 - 3.62
2020-10-17 00:04:35 - 3.64
2020-10-16 23:54:39 - 3.62
2020-10-16 23:38:40 - 3.64
2020-10-16 18:54:48 - 3.62
2020-10-16 15:14:17 - 3.64
2020-10-16 13:27:06 - 3.66
2020-10-01 21:43:39 - 3.40
Which is what you see on the web page for 1xBet in column X once you hover over the value.
Tip: if you ever run into the missing oddsdata key problem, take a look at the Developer Tools and see if the endpoint has changed.
Just grab the new endpoint https://fb.oddsportal.com/feed/postmatchscore/1-ld9FDhEI-yje1c.dat?_= or, better yet, take only the yje1c part and swap it with the one in the code.
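To make the missing oddsdata key easier to spot, the lookup could be wrapped in a helper that fails with a descriptive message. This is a sketch only: `bookie_odds` is a hypothetical name, the sample payload is made up, and the `E-1-2-0-0-0` market key is simply the one used in the script above.

```python
def bookie_odds(odds_data, bookie_code, market="E-1-2-0-0-0"):
    """Return the current odds for one bookie, or raise a descriptive error."""
    payload = odds_data.get("d")
    if not isinstance(payload, dict) or "oddsdata" not in payload:
        raise KeyError(
            "'oddsdata' is missing from the response; "
            "the feed endpoint may have changed"
        )
    return payload["oddsdata"]["back"][market]["odds"][bookie_code]


# Made-up payload in the shape the script above expects.
sample = {
    "d": {
        "oddsdata": {
            "back": {
                "E-1-2-0-0-0": {"odds": {"417": {"0": 2.61, "1": 3.5, "2": 2.88}}}
            }
        }
    }
}
print(bookie_odds(sample, "417"))
```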
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |