Can't manipulate dataframe in pandas

I don't understand why I can't do even the simplest data manipulation with this data I've scraped. I've tried all sorts of methods to manipulate the data, but they all fail with the same kind of error. Is my data even in a DataFrame yet? I can't tell.

import pandas as pd
from urllib.request import Request, urlopen

req = Request('https://smallcaps.com.au/director-transactions/'
              , headers={'User-Agent': 'Mozilla/5.0'})
trades = urlopen(req).read()
df = pd.read_html(trades)
print(df)  # <-- this line prints df and works fine

df.drop([0, 1])  # <-- this line raises the error below
print(df) 

Error:

Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\Scraper\DirectorTrades.py", line 10, in <module>
    df.drop([0, 1])
AttributeError: 'list' object has no attribute 'drop'


Solution 1:[1]

As mentioned, the main issue is that pandas.read_html() returns a list of DataFrames, so you have to select the one you want by index.

Is my data even in a data frame yet?

  • df = pd.read_html(trades) No, it is not, because this returns a list of DataFrames.

  • df = pd.read_html(trades)[0] Yes, this gives you the first DataFrame from that list.
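A quick way to answer the "is it a DataFrame yet?" question yourself is to inspect the type before calling any DataFrame method. The sketch below is only an illustration: `tables` is a hand-built, hypothetical stand-in for what `pd.read_html(trades)` returns, so that it runs without network access, and the sample rows are borrowed from the page output.

```python
import pandas as pd

# Hypothetical stand-in for the result of pd.read_html(trades):
# a plain Python list that holds one DataFrame per HTML table found.
tables = [
    pd.DataFrame(
        {
            "Code": ["ESR", "LNY", "FGX"],
            "Director": ["L. Pereira", "S. Bizzell", "G. Wilson"],
        }
    )
]

print(type(tables))  # a list, which is why .drop() is not available
print(len(tables))   # how many tables were parsed

# Pick one table by position to get an actual DataFrame.
df = tables[0]
print(type(df))
print(isinstance(df, pd.DataFrame))
```

If `len(tables)` is greater than 1, it is worth printing each element's `.head()` to find the table you actually want rather than assuming it sits at index 0.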

Example
import pandas as pd
from urllib.request import Request, urlopen

req = Request('https://smallcaps.com.au/director-transactions/'
              , headers={'User-Agent': 'Mozilla/5.0'})
trades = urlopen(req).read()
df = pd.read_html(trades)[0]
df.drop([0, 1])  # returns a new DataFrame; use df = df.drop([0, 1]) to keep the result
df
Output
Date Code Company Director Value
0 27/4/2022 ESR Estrella Resources L. Pereira ?$1,075
1 27/4/2022 LNY Laneway Resources S. Bizzell ?126,750
2 26/4/2022 FGX Future Generation Investment Company G. Wilson ?$13,363
3 26/4/2022 CDM Cadence Capital J. Webster ?$25,110
4 26/4/2022 TEK Thorney Technologies A. Waislitz ?$35,384
5 26/4/2022 FGX Future Generation Investment Company K. Thorley ?$7,980

...
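Note that rows 0 and 1 still appear in the output: `DataFrame.drop` returns a new DataFrame and leaves the original untouched unless the result is assigned. A minimal sketch of the difference, using a small hand-made frame (the sample values are taken from the output above, the variable names are just for illustration):

```python
import pandas as pd

# Small hand-made frame standing in for the scraped table.
df = pd.DataFrame(
    {
        "Code": ["ESR", "LNY", "FGX", "CDM"],
        "Director": ["L. Pereira", "S. Bizzell", "G. Wilson", "J. Webster"],
    }
)

# The result is discarded here, so df still has all four rows.
df.drop([0, 1])
print(len(df))

# Assigning the result keeps the change.
trimmed = df.drop([0, 1])

# Optional: renumber the remaining rows from 0.
trimmed = trimmed.reset_index(drop=True)
print(trimmed)
```

Whether to call `reset_index(drop=True)` depends on whether later code relies on the original row labels.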

Solution 2:[2]

read_html returns a list of dataframes.

Try:

dfs = pd.read_html(trades)
dfs = [df.drop([0,1]) for df in dfs]
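One caveat with dropping from every table: `drop` raises a `KeyError` if a table does not contain both labels 0 and 1 (for example, a one-row table). A possible variation, sketched here with two hypothetical hand-built frames in place of the real `pd.read_html(trades)` result, is to pass `errors="ignore"` so missing labels are skipped:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(trades): two tables of different length.
dfs = [
    pd.DataFrame({"Code": ["ESR", "LNY", "FGX"]}),
    pd.DataFrame({"Code": ["CDM"]}),  # only has row label 0
]

# errors="ignore" drops the labels that exist and silently skips the rest.
cleaned = [frame.drop([0, 1], errors="ignore") for frame in dfs]

for frame in cleaned:
    print(len(frame))
```

Silently ignoring missing labels can hide surprises, so this is only appropriate when short tables are expected.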

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   HedgeHog
Solution 2   Learning is a mess