Can't manipulate dataframe in pandas
I don't understand why I can't do even the simplest data manipulation with this data I've scraped. I've tried all sorts of methods to manipulate the data, but they all come up with the same sort of error. Is my data even in a data frame yet? I can't tell.
import pandas as pd
from urllib.request import Request, urlopen
req = Request('https://smallcaps.com.au/director-transactions/'
, headers={'User-Agent': 'Mozilla/5.0'})
trades = urlopen(req).read()
df = pd.read_html(trades)
print(df) #<-- This line prints the df and works fine
df.drop([0, 1]) #<-- This line raises the error below
print(df)
Error:
Traceback (most recent call last):
File "C:\Users\User\PycharmProjects\Scraper\DirectorTrades.py", line 10, in <module>
df.drop([0, 1])
AttributeError: 'list' object has no attribute 'drop'
Solution 1:[1]
The main issue, as mentioned, is that pandas.read_html() returns a list of dataframes, and you have to specify by index which one you want.
Is my data even in a data frame yet?
df = pd.read_html(trades)
No, it is not, because this gives you a list of dataframes.
df = pd.read_html(trades)[0]
Yes, this gives you the first dataframe from the list.
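To see the difference without hitting the website, here is a minimal sketch that parses a small inline HTML string instead of the real page. The `html` content and the `tables` name are made up for illustration, and `pd.read_html` needs an HTML parser such as lxml installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded page: two small HTML tables.
html = """
<table>
  <tr><th>Date</th><th>Code</th></tr>
  <tr><td>27/4/2022</td><td>ESR</td></tr>
  <tr><td>27/4/2022</td><td>LNY</td></tr>
  <tr><td>26/4/2022</td><td>FGX</td></tr>
</table>
<table>
  <tr><th>Name</th></tr>
  <tr><td>example</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> it finds, wrapped in a list.
tables = pd.read_html(io.StringIO(html))
print(type(tables))  # a plain Python list, not a DataFrame
print(len(tables))   # number of tables found

# Inspect each table to decide which index you want.
for i, table in enumerate(tables):
    print(i, type(table).__name__, table.shape)

# Pick a single DataFrame out of the list before calling DataFrame methods.
df = tables[0]
```

Printing the shape of each entry is a quick way to work out which index holds the table you are after when a page contains several.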
Example
import pandas as pd
from urllib.request import Request, urlopen
req = Request('https://smallcaps.com.au/director-transactions/'
, headers={'User-Agent': 'Mozilla/5.0'})
trades = urlopen(req).read()
df = pd.read_html(trades)[0]
df.drop([0, 1])  # returns a new DataFrame; df itself is unchanged
df
Output
  | Date | Code | Company | Director | Value |
---|---|---|---|---|---|
0 | 27/4/2022 | ESR | Estrella Resources | L. Pereira | ?$1,075 |
1 | 27/4/2022 | LNY | Laneway Resources | S. Bizzell | ?126,750 |
2 | 26/4/2022 | FGX | Future Generation Investment Company | G. Wilson | ?$13,363 |
3 | 26/4/2022 | CDM | Cadence Capital | J. Webster | ?$25,110 |
4 | 26/4/2022 | TEK | Thorney Technologies | A. Waislitz | ?$35,384 |
5 | 26/4/2022 | FGX | Future Generation Investment Company | K. Thorley | ?$7,980 |
...
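Note that rows 0 and 1 are still present in the output above. `DataFrame.drop` does not modify the frame in place by default, so the result has to be assigned to keep it. A minimal sketch, using a hand-built frame with a few of the rows shown above rather than the scraped data:

```python
import pandas as pd

# Sample rows copied from the output above (only two columns, for brevity).
df = pd.DataFrame({
    "Date": ["27/4/2022", "27/4/2022", "26/4/2022", "26/4/2022"],
    "Code": ["ESR", "LNY", "FGX", "CDM"],
})

# drop() returns a new DataFrame without the rows labelled 0 and 1.
dropped = df.drop([0, 1])
print(len(df))                 # 4 -> the original is untouched
print(dropped.index.tolist())  # [2, 3]

# Assign the result back to keep the change; reset_index is optional
# and only renumbers the remaining rows from 0.
df = df.drop([0, 1]).reset_index(drop=True)
print(df)
```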
Solution 2:[2]
read_html returns a list of dataframes.
Try:
dfs = pd.read_html(trades)
dfs = [df.drop([0,1]) for df in dfs]
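One caveat with dropping the same labels from every table: if any table in the list lacks one of those labels, `drop` raises a `KeyError`. The sketch below passes `errors="ignore"` to skip missing labels. The two frames are hypothetical stand-ins for whatever `pd.read_html` returns, not the real page.

```python
import pandas as pd

# Hypothetical stand-ins for the list returned by pd.read_html.
dfs = [
    pd.DataFrame({"Code": ["ESR", "LNY", "FGX"]}),
    pd.DataFrame({"Name": ["only row"]}),  # has no row labelled 1
]

# errors="ignore" drops the labels that exist and skips the ones that don't,
# so a short table does not abort the whole comprehension.
cleaned = [table.drop([0, 1], errors="ignore") for table in dfs]

for table in cleaned:
    print(table)
```

If only one table on the page matters, selecting it by index as in Solution 1 avoids the issue altogether.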
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | HedgeHog |
Solution 2 | Learning is a mess |